Thread: index on ILIKE/LIKE - PostgreSQL 9.2

index on ILIKE/LIKE - PostgreSQL 9.2

From
Lucas Possamai
Date:
Hi there!

I've got a simple but slow query:

 SELECT DISTINCT title  
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239  AND time_job > 1457826264
order BY title
limit 10

Explain analyze: 

Limit  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)
  ->  Unique  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)
        ->  Sort  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)
              Sort Key: "title"
              Sort Method: quicksort  Memory: 25kB
              ->  Bitmap Heap Scan on "ja_jobs"  (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)
                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
                    Filter: (("title")::"text" ~~* '%
RYAN WER%'::"text")
                    Rows Removed by Filter: 791
                    ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)
                          Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Total runtime: 2746.879 ms

Then, I created a trgm index:

CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);

Explain analyze after the index: (Yes, I ran the analyze)

Limit  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)
  ->  Unique  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)
        ->  Sort  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)
              Sort Key: "title"
              Sort Method: quicksort  Memory: 25kB
              ->  Bitmap Heap Scan on "ja_jobs"  (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%
RYAN WER%'::"text"))
                    Rows Removed by Index Recheck: 4
                    ->  BitmapAnd  (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1)
                          ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1)
                                Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
                          ->  Bitmap Index Scan on "ix_ja_jobs_trgm_gin"  (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1)
                                Index Cond: (("title")::"text" ~~ '%
RYAN WER%'::"text")
Total runtime: 3720.653 ms 


so.. the query is still slow.. 
Do you guys  know what can be done ? related to the ILIKE?

cheers
Lucas

Re: index on ILIKE/LIKE - PostgreSQL 9.2

From
Melvin Davidson
Date:


On Wed, May 11, 2016 at 8:30 PM, Lucas Possamai <drum.lucas@gmail.com> wrote:
Hi there!

I've got a simple but slow query:

 SELECT DISTINCT title  
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239  AND time_job > 1457826264
order BY title
limit 10

Explain analyze: 

Limit  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)
  ->  Unique  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)
        ->  Sort  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)
              Sort Key: "title"
              Sort Method: quicksort  Memory: 25kB
              ->  Bitmap Heap Scan on "ja_jobs"  (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)
                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
                    Filter: (("title")::"text" ~~* '%
RYAN WER%'::"text")
                    Rows Removed by Filter: 791
                    ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)
                          Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Total runtime: 2746.879 ms

Then, I created a trgm index:

CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);

Explain analyze after the index: (Yes, I ran the analyze)

Limit  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)
  ->  Unique  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)
        ->  Sort  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)
              Sort Key: "title"
              Sort Method: quicksort  Memory: 25kB
              ->  Bitmap Heap Scan on "ja_jobs"  (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%
RYAN WER%'::"text"))
                    Rows Removed by Index Recheck: 4
                    ->  BitmapAnd  (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1)
                          ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1)
                                Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
                          ->  Bitmap Index Scan on "ix_ja_jobs_trgm_gin"  (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1)
                                Index Cond: (("title")::"text" ~~ '%
RYAN WER%'::"text")
Total runtime: 3720.653 ms 


so.. the query is still slow.. 
Do you guys  know what can be done ? related to the ILIKE?

cheers
Lucas

The main problem is WHERE title ILIKE '%RYAN WER%'
When you put a % on the left of the text, there is no way to optimize that, so yes, it will be slow.

If you can eliminate the leading percent and just have trailing, it will be much faster.


--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: index on ILIKE/LIKE - PostgreSQL 9.2

From
Lucas Possamai
Date:


The main problem is WHERE title ILIKE '%RYAN WER%'
When you put a % on the left of the text, there is no way to optimize that, so yes, it will be slow.

If you can eliminate the leading percent and just have trailing, it will be much faster.



Hmm.. yep.. I suppose I can do that.

But, taking the left % off, the query is still slow:

Limit  (cost=418.57..418.58 rows=1 width=20) (actual time=4439.367..4439.381 rows=1 loops=1)
  Buffers: shared hit=6847
  ->  Unique  (cost=418.57..418.58 rows=1 width=20) (actual time=4439.363..4439.374 rows=1 loops=1)
        Buffers: shared hit=6847
        ->  Sort  (cost=418.57..418.58 rows=1 width=20) (actual time=4439.360..4439.365 rows=4 loops=1)
              Sort Key: "title"
              Sort Method: quicksort  Memory: 25kB
              Buffers: shared hit=6847
              ->  Bitmap Heap Scan on "ja_jobs"  (cost=414.55..418.56 rows=1 width=20) (actual time=4439.312..4439.329 rows=4 loops=1)
                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~* 'RYAN SHOWER%'::"text"))
                    Buffers: shared hit=6847
                    ->  BitmapAnd  (cost=414.55..414.55 rows=1 width=0) (actual time=4439.280..4439.280 rows=0 loops=1)
                          Buffers: shared hit=6843
                          ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..50.67 rows=1711 width=0) (actual time=0.142..0.142 rows=795 loops=1)
                                Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
                                Buffers: shared hit=8
                          ->  Bitmap Index Scan on "ix_ja_jobs_trgm_gin"  (cost=0.00..363.62 rows=483 width=0) (actual time=4439.014..4439.014 rows=32 loops=1)
                                Index Cond: (("title")::"text" ~~* '
RYAN SHOWER%'::"text")
                                Buffers: shared hit=6835
Total runtime: 4439.427 ms

Here [1] it appears to be working even with two %.. But it's not for me.... 



Any ideia? lol 

Re: index on ILIKE/LIKE - PostgreSQL 9.2

From
Melvin Davidson
Date:

On Wed, May 11, 2016 at 9:36 PM, Lucas Possamai <drum.lucas@gmail.com> wrote:


The main problem is WHERE title ILIKE '%RYAN WER%'
When you put a % on the left of the text, there is no way to optimize that, so yes, it will be slow.

If you can eliminate the leading percent and just have trailing, it will be much faster.



Hmm.. yep.. I suppose I can do that.

But, taking the left % off, the query is still slow:

Limit  (cost=418.57..418.58 rows=1 width=20) (actual time=4439.367..4439.381 rows=1 loops=1)
  Buffers: shared hit=6847
  ->  Unique  (cost=418.57..418.58 rows=1 width=20) (actual time=4439.363..4439.374 rows=1 loops=1)
        Buffers: shared hit=6847
        ->  Sort  (cost=418.57..418.58 rows=1 width=20) (actual time=4439.360..4439.365 rows=4 loops=1)
              Sort Key: "title"
              Sort Method: quicksort  Memory: 25kB
              Buffers: shared hit=6847
              ->  Bitmap Heap Scan on "ja_jobs"  (cost=414.55..418.56 rows=1 width=20) (actual time=4439.312..4439.329 rows=4 loops=1)
                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~* 'RYAN SHOWER%'::"text"))
                    Buffers: shared hit=6847
                    ->  BitmapAnd  (cost=414.55..414.55 rows=1 width=0) (actual time=4439.280..4439.280 rows=0 loops=1)
                          Buffers: shared hit=6843
                          ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..50.67 rows=1711 width=0) (actual time=0.142..0.142 rows=795 loops=1)
                                Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
                                Buffers: shared hit=8
                          ->  Bitmap Index Scan on "ix_ja_jobs_trgm_gin"  (cost=0.00..363.62 rows=483 width=0) (actual time=4439.014..4439.014 rows=32 loops=1)
                                Index Cond: (("title")::"text" ~~* '
RYAN SHOWER%'::"text")
                                Buffers: shared hit=6835
Total runtime: 4439.427 ms

Here [1] it appears to be working even with two %.. But it's not for me.... 



Any ideia? lol 

Trying redoing the query with CTE as below:

WITH ja_jobs as
  (SELECT DISTINCT title
     FROM ja_jobs
    WHERE clientid = 31239  AND time_job > 1457826264
  )
SELECT title 
  FROM ja_jobs
 WHERE title ILIKE 'RYAN WER%'
 ORDER BY title
 LIMIT 10;


--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: index on ILIKE/LIKE - PostgreSQL 9.2

From
Lucas Possamai
Date:


Trying redoing the query with CTE as below:

WITH ja_jobs as
  (SELECT DISTINCT title
     FROM ja_jobs
    WHERE clientid = 31239  AND time_job > 1457826264
  )
SELECT title 
  FROM ja_jobs
 WHERE title ILIKE 'RYAN WER%'
 ORDER BY title
 LIMIT 10;

hmm.. still slow =(


and it's not hitting the index: (i had to change the clientid because the previous one was in cache)

Limit  (cost=93790.08..93790.09 rows=1 width=218) (actual time=284.293..284.308 rows=5 loops=1)
  Buffers: shared hit=42284
  CTE ja_jobs
    ->  HashAggregate  (cost=93774.31..93779.16 rows=485 width=20) (actual time=207.235..228.141 rows=16320 loops=1)
          Buffers: shared hit=42284
          ->  Bitmap Heap Scan on "ja_jobs"  (cost=882.98..93697.86 rows=30578 width=20) (actual time=21.942..133.380 rows=48472 loops=1)
                Recheck Cond: (("clientid" = 14635) AND ("time_job" > 1436731799))
                Buffers: shared hit=42284
                ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..875.34 rows=30578 width=0) (actual time=12.389..12.389 rows=48472 loops=1)
                      Index Cond: (("clientid" = 14635) AND ("time_job" > 1436731799))
                      Buffers: shared hit=243
  ->  Sort  (cost=10.92..10.93 rows=1 width=218) (actual time=284.289..284.293 rows=5 loops=1)
        Sort Key: "ja_jobs"."title"
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=42284
        ->  CTE Scan on "ja_jobs"  (cost=0.00..10.91 rows=1 width=218) (actual time=236.248..284.263 rows=5 loops=1)
              Filter: (("title")::"text" ~~* '%To Electrical%'::"text")
              Rows Removed by Filter: 16315
              Buffers: shared hit=42284
Total runtime: 287.633 ms 

Re: index on ILIKE/LIKE - PostgreSQL 9.2

From
Jan de Visser
Date:
On Wed, May 11, 2016 at 10:03 PM, Lucas Possamai <drum.lucas@gmail.com> wrote:


Trying redoing the query with CTE as below:

WITH ja_jobs as
  (SELECT DISTINCT title
     FROM ja_jobs
    WHERE clientid = 31239  AND time_job > 1457826264
  )
SELECT title 
  FROM ja_jobs
 WHERE title ILIKE 'RYAN WER%'
 ORDER BY title
 LIMIT 10;

hmm.. still slow =(


and it's not hitting the index: (i had to change the clientid because the previous one was in cache)

Limit  (cost=93790.08..93790.09 rows=1 width=218) (actual time=284.293..284.308 rows=5 loops=1)
  Buffers: shared hit=42284
  CTE ja_jobs
    ->  HashAggregate  (cost=93774.31..93779.16 rows=485 width=20) (actual time=207.235..228.141 rows=16320 loops=1)
          Buffers: shared hit=42284
          ->  Bitmap Heap Scan on "ja_jobs"  (cost=882.98..93697.86 rows=30578 width=20) (actual time=21.942..133.380 rows=48472 loops=1)
                Recheck Cond: (("clientid" = 14635) AND ("time_job" > 1436731799))
                Buffers: shared hit=42284
                ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..875.34 rows=30578 width=0) (actual time=12.389..12.389 rows=48472 loops=1)
                      Index Cond: (("clientid" = 14635) AND ("time_job" > 1436731799))
                      Buffers: shared hit=243
  ->  Sort  (cost=10.92..10.93 rows=1 width=218) (actual time=284.289..284.293 rows=5 loops=1)
        Sort Key: "ja_jobs"."title"
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=42284
        ->  CTE Scan on "ja_jobs"  (cost=0.00..10.91 rows=1 width=218) (actual time=236.248..284.263 rows=5 loops=1)
              Filter: (("title")::"text" ~~* '%To Electrical%'::"text")
              Rows Removed by Filter: 16315
              Buffers: shared hit=42284
Total runtime: 287.633 ms 



I think a GIN index can't be used for sorting.

Re: index on ILIKE/LIKE - PostgreSQL 9.2

From
Jeff Janes
Date:
On Wed, May 11, 2016 at 5:30 PM, Lucas Possamai <drum.lucas@gmail.com> wrote:
> Hi there!
>
> I've got a simple but slow query:
>
>>  SELECT DISTINCT title
>> FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
>> and clientid = 31239  AND time_job > 1457826264
>> order BY title
>> limit 10


>
> CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title
> gin_trgm_ops);
>
> Explain analyze after the index: (Yes, I ran the analyze)
>
>> Limit  (cost=389.91..389.91 rows=1 width=20) (actual
>> time=3720.511..3720.511 rows=0 loops=1)
>>   ->  Unique  (cost=389.91..389.91 rows=1 width=20) (actual
>> time=3720.507..3720.507 rows=0 loops=1)
>>         ->  Sort  (cost=389.91..389.91 rows=1 width=20) (actual
>> time=3720.505..3720.505 rows=0 loops=1)
>>               Sort Key: "title"
>>               Sort Method: quicksort  Memory: 25kB
>>               ->  Bitmap Heap Scan on "ja_jobs"  (cost=385.88..389.90
>> rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
>>                     Recheck Cond: (("clientid" = 31239) AND ("time_job" >
>> 1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))
>>                     Rows Removed by Index Recheck: 4
>>                     ->  BitmapAnd  (cost=385.88..385.88 rows=1 width=0)
>> (actual time=3720.469..3720.469 rows=0 loops=1)
>>                           ->  Bitmap Index Scan on "ix_jobs_client_times"
>> (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795
>> loops=1)
>>                                 Index Cond: (("clientid" = 31239) AND
>> ("time_job" > 1457826264))
>>                           ->  Bitmap Index Scan on "ix_ja_jobs_trgm_gin"
>> (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32
>> loops=1)
>>                                 Index Cond: (("title")::"text" ~~ '%RYAN
>> WER%'::"text")
>> Total runtime: 3720.653 ms


How big is the table?  The gin index?  shared_buffers?  RAM?  What
kind of IO system do you have, and how many other things were going on
with it?

It would be interesting to see the output of explain (analyze,
buffers) with track_io_timing turned on.

There have been improvements in this area since 9.2, you should
consider upgrading to at least 9.4.

Cheers,

Jeff


Re: index on ILIKE/LIKE - PostgreSQL 9.2

From
Lucas Possamai
Date:
 
How big is the table?  The gin index?  shared_buffers?  RAM?  What
kind of IO system do you have, and how many other things were going on
with it?

- Just a reminder that I'm not running these tests on my prod server.. I'm running on my test server. So the confs will be different

The table is 9GB big
The gin index is 400MB big
shared_buffers = 1536MB
RAM = 8 GB

just wanted to understand why the GIN index is not working, but it works here: https://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/
 

It would be interesting to see the output of explain (analyze,
buffers) with track_io_timing turned on.

explain analyze buffer with track_io_timing turned on:


Limit  (cost=93466.83..93466.83 rows=1 width=218) (actual time=24025.463..24025.478 rows=5 loops=1)
  Buffers: shared hit=8 read=42285
  I/O Timings: read=23599.672
  CTE ja_jobs
    ->  HashAggregate  (cost=93451.05..93455.90 rows=485 width=20) (actual time=23946.801..23967.660 rows=16320 loops=1)
          Buffers: shared hit=3 read=42285
          I/O Timings: read=23599.672
          ->  Bitmap Heap Scan on "ja_jobs"  (cost=877.70..93374.92 rows=30453 width=20) (actual time=161.372..23835.632 rows=48472 loops=1)
                Recheck Cond: (("clientid" = 14635) AND ("time_job" > 1436731799))
                Buffers: shared hit=3 read=42285
                I/O Timings: read=23599.672
                ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..870.09 rows=30453 width=0) (actual time=133.920..133.920 rows=48472 loops=1)
                      Index Cond: (("clientid" = 14635) AND ("time_job" > 1436731799))
                      Buffers: shared hit=3 read=244
                      I/O Timings: read=120.137
  ->  Sort  (cost=10.92..10.93 rows=1 width=218) (actual time=24025.457..24025.462 rows=5 loops=1)
        Sort Key: "ja_jobs"."title"
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=8 read=42285
        I/O Timings: read=23599.672
        ->  CTE Scan on "ja_jobs"  (cost=0.00..10.91 rows=1 width=218) (actual time=23977.095..24025.325 rows=5 loops=1)
              Filter: (("title")::"text" ~~* '%To Electrical%'::"text")
              Rows Removed by Filter: 16315
              Buffers: shared hit=3 read=42285
              I/O Timings: read=23599.672
Total runtime: 24028.551 ms

 

There have been improvements in this area since 9.2, you should
consider upgrading to at least 9.4.

 
Yep I know. The upgrade will happen, but I don't know when.

Re: index on ILIKE/LIKE - PostgreSQL 9.2

From
Andreas Joseph Krogh
Date:
På torsdag 12. mai 2016 kl. 02:30:33, skrev Lucas Possamai <drum.lucas@gmail.com>:
Hi there!
 
I've got a simple but slow query:
 
 SELECT DISTINCT title  
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239  AND time_job > 1457826264
order BY title
limit 10
 
Explain analyze: 
 
Limit  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)
  ->  Unique  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)
        ->  Sort  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)
              Sort Key: "title"
              Sort Method: quicksort  Memory: 25kB
              ->  Bitmap Heap Scan on "ja_jobs"  (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)
                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
                    Filter: (("title")::"text" ~~* '%
RYAN WER%'::"text")
                    Rows Removed by Filter: 791
                    ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)
                          Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Total runtime: 2746.879 ms
 
Then, I created a trgm index:
 
CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);
 
Explain analyze after the index: (Yes, I ran the analyze)
 
Limit  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)
  ->  Unique  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)
        ->  Sort  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)
              Sort Key: "title"
              Sort Method: quicksort  Memory: 25kB
              ->  Bitmap Heap Scan on "ja_jobs"  (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%
RYAN WER%'::"text"))
                    Rows Removed by Index Recheck: 4
                    ->  BitmapAnd  (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1)
                          ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1)
                                Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
                          ->  Bitmap Index Scan on "ix_ja_jobs_trgm_gin"  (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1)
                                Index Cond: (("title")::"text" ~~ '%
RYAN WER%'::"text")
Total runtime: 3720.653 ms 
 
 
so.. the query is still slow.. 
Do you guys  know what can be done ? related to the ILIKE?
 
cheers
Lucas
 
It uses available indexes, but that isn't good enough.
 
Try including clientid in the index, using the btree_gin extension:
 
CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title gin_trgm_ops, clientid);
 
Note that if clientid is a bigint you have to cast the value to bigint for btree_gin to use it (note that this isn't necessary if you use a prepared statement):
 
SELECT DISTINCT title  
        FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
        and clientid = 31239::bigint  AND time_job > 1457826264
        order BY title
        limit 10
 
Also note that the index cannot ant won't be used for sorting. A bitmap-AND is also inevitable because GIN-indexes cannot be used for the '>' operator, so PG uses the ix_jobs_client_times btree-index and bigmap-ANDs the result.
 
Can you post you complete schema?
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: index on ILIKE/LIKE - PostgreSQL 9.2

From
Andreas Joseph Krogh
Date:
På torsdag 12. mai 2016 kl. 09:57:58, skrev Andreas Joseph Krogh <andreas@visena.com>:
På torsdag 12. mai 2016 kl. 02:30:33, skrev Lucas Possamai <drum.lucas@gmail.com>:
Hi there!
 
I've got a simple but slow query:
 
 SELECT DISTINCT title  
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239  AND time_job > 1457826264
order BY title
limit 10
 
Explain analyze: 
 
Limit  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)
  ->  Unique  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)
        ->  Sort  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)
              Sort Key: "title"
              Sort Method: quicksort  Memory: 25kB
              ->  Bitmap Heap Scan on "ja_jobs"  (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)
                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
                    Filter: (("title")::"text" ~~* '%
RYAN WER%'::"text")
                    Rows Removed by Filter: 791
                    ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)
                          Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Total runtime: 2746.879 ms
 
Then, I created a trgm index:
 
CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);
 
Explain analyze after the index: (Yes, I ran the analyze)
 
Limit  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)
  ->  Unique  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)
        ->  Sort  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)
              Sort Key: "title"
              Sort Method: quicksort  Memory: 25kB
              ->  Bitmap Heap Scan on "ja_jobs"  (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
                    Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%
RYAN WER%'::"text"))
                    Rows Removed by Index Recheck: 4
                    ->  BitmapAnd  (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1)
                          ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1)
                                Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
                          ->  Bitmap Index Scan on "ix_ja_jobs_trgm_gin"  (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1)
                                Index Cond: (("title")::"text" ~~ '%
RYAN WER%'::"text")
Total runtime: 3720.653 ms 
 
 
so.. the query is still slow.. 
Do you guys  know what can be done ? related to the ILIKE?
 
cheers
Lucas
 
It uses available indexes, but that isn't good enough.
 
Try including clientid in the index, using the btree_gin extension:
 
CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title gin_trgm_ops, clientid);
 
Note that if clientid is a bigint you have to cast the value to bigint for btree_gin to use it (note that this isn't necessary if you use a prepared statement):
 
SELECT DISTINCT title  
        FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
        and clientid = 31239::bigint  AND time_job > 1457826264
        order BY title
        limit 10
 
Also note that the index cannot ant won't be used for sorting. A bitmap-AND is also inevitable because GIN-indexes cannot be used for the '>' operator, so PG uses the ix_jobs_client_times btree-index and bigmap-ANDs the result.
 
Can you post you complete schema?
 
I created this test:
 
create table ja_jobs(id bigserial primary key, title varchar not null, clientid bigint not null, time_job bigint not null);
CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title gin_trgm_ops, clientid);

--- insert some test-data
 
As you see, this uses the index (when casting clientid to bigint):
 
andreak=# explain analyze SELECT DISTINCT title  
FROM ja_jobs WHERE title ILIKE '%ras du%'
and clientid = 12::bigint AND time_job > 257826264
order BY title
limit 10;
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8.43..8.44 rows=1 width=32) (actual time=0.033..0.034 rows=1 loops=1)
   ->  Unique  (cost=8.43..8.44 rows=1 width=32) (actual time=0.032..0.032 rows=1 loops=1)
         ->  Sort  (cost=8.43..8.43 rows=1 width=32) (actual time=0.032..0.032 rows=1 loops=1)
               Sort Key: title
               Sort Method: quicksort  Memory: 25kB
               ->  Bitmap Heap Scan on ja_jobs  (cost=7.20..8.42 rows=1 width=32) (actual time=0.025..0.025 rows=1 loops=1)
                     Recheck Cond: (((title)::text ~~* '%ras du%'::text) AND (clientid = '12'::bigint))
                     Filter: (time_job > 257826264)
                     Heap Blocks: exact=1
                     ->  Bitmap Index Scan on ix_ja_jobs_trgm_clientid_gin  (cost=0.00..7.20 rows=1 width=0) (actual time=0.016..0.016 rows=1 loops=1)
                           Index Cond: (((title)::text ~~* '%ras du%'::text) AND (clientid = '12'::bigint))
 Planning time: 0.169 ms
 Execution time: 0.061 ms
(13 rows)
 
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: index on ILIKE/LIKE - PostgreSQL 9.2

From
Andreas Joseph Krogh
Date:
På torsdag 12. mai 2016 kl. 10:05:01, skrev Andreas Joseph Krogh <andreas@visena.com>:
[snp]
I created this test:
 
create table ja_jobs(id bigserial primary key, title varchar not null, clientid bigint not null, time_job bigint not null);
CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title gin_trgm_ops, clientid);

--- insert some test-data
 
As you see, this uses the index (when casting clientid to bigint):
 
andreak=# explain analyze SELECT DISTINCT title  
FROM ja_jobs WHERE title ILIKE '%ras du%'
and clientid = 12::bigint AND time_job > 257826264
order BY title
limit 10;
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8.43..8.44 rows=1 width=32) (actual time=0.033..0.034 rows=1 loops=1)
   ->  Unique  (cost=8.43..8.44 rows=1 width=32) (actual time=0.032..0.032 rows=1 loops=1)
         ->  Sort  (cost=8.43..8.43 rows=1 width=32) (actual time=0.032..0.032 rows=1 loops=1)
               Sort Key: title
               Sort Method: quicksort  Memory: 25kB
               ->  Bitmap Heap Scan on ja_jobs  (cost=7.20..8.42 rows=1 width=32) (actual time=0.025..0.025 rows=1 loops=1)
                     Recheck Cond: (((title)::text ~~* '%ras du%'::text) AND (clientid = '12'::bigint))
                     Filter: (time_job > 257826264)
                     Heap Blocks: exact=1
                     ->  Bitmap Index Scan on ix_ja_jobs_trgm_clientid_gin  (cost=0.00..7.20 rows=1 width=0) (actual time=0.016..0.016 rows=1 loops=1)
                           Index Cond: (((title)::text ~~* '%ras du%'::text) AND (clientid = '12'::bigint))
 Planning time: 0.169 ms
 Execution time: 0.061 ms
(13 rows)
 
Forgot to say, this is in PG-9.6 (master), but should work on previous versions.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: index on ILIKE/LIKE - PostgreSQL 9.2

From
Jeff Janes
Date:
On Wed, May 11, 2016 at 11:59 PM, Lucas Possamai <drum.lucas@gmail.com> wrote:
>
>>
>> How big is the table?  The gin index?  shared_buffers?  RAM?  What
>> kind of IO system do you have, and how many other things were going on
>> with it?
>
>
> - Just a reminder that I'm not running these tests on my prod server.. I'm
> running on my test server. So the confs will be different
>
>> The table is 9GB big
>> The gin index is 400MB big
>> shared_buffers = 1536MB
>> RAM = 8 GB

With those sizes, the gin index will probably be naturally kept mostly
in the file-system cache, if it is used regularly.  So the original
slowness of your first query is likely just a cold-cache problem.  Can
you generate a stream of realistic queries and see what it stabilizes
at?


> I just wanted to understand why the GIN index is not working, but it works
> here: https://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/

In your first email, the gin index did "work", according to the
execution plan.  It just wasn't as fast as you wanted.  In general,
the longer the query string is between the %%, the worse it will
perform (until version 9.6, and to a smaller degree even with 9.6).
But it still seems oddly slow to me, unless you have a cold-cache and
really bad (or overloaded) IO.

>
>>
>>
>> It would be interesting to see the output of explain (analyze,
>> buffers) with track_io_timing turned on.
>
>
> explain analyze buffer with track_io_timing turned on:
...

That is the wrong query.  The CTE (i.e. the WITH part) is an
optimization fence, so it can't use the gin index, simply because of
the way you query is written.  (I think Melvin suggested it because he
noticed that using the gin index actually slowed down the query, so he
wanted to force it to not be used.)

Cheers,

Jeff


Re: index on ILIKE/LIKE - PostgreSQL 9.2

From
Lucas Possamai
Date:
With those sizes, the gin index will probably be naturally kept mostly
in the file-system cache, if it is used regularly.  So the original
slowness of your first query is likely just a cold-cache problem.  Can
you generate a stream of realistic queries and see what it stabilizes
at?


> I just wanted to understand why the GIN index is not working, but it works
> here: https://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/

In your first email, the gin index did "work", according to the
execution plan.  It just wasn't as fast as you wanted.  In general,
the longer the query string is between the %%, the worse it will
perform (until version 9.6, and to a smaller degree even with 9.6).
But it still seems oddly slow to me, unless you have a cold-cache and
really bad (or overloaded) IO.

>
>>
>>
>> It would be interesting to see the output of explain (analyze,
>> buffers) with track_io_timing turned on.
>
>
> explain analyze buffer with track_io_timing turned on:
...

That is the wrong query.  The CTE (i.e. the WITH part) is an
optimization fence, so it can't use the gin index, simply because of
the way you query is written.  (I think Melvin suggested it because he
noticed that using the gin index actually slowed down the query, so he
wanted to force it to not be used.)


Oh ok. 

- Here is the explain analyze buffer with the original query I posted here with the gin index:

Query:
 explain (analyze, buffers)
 SELECT title  
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239  AND time_job > 1457826264
order BY title
limit 10

Explain analyze:
Limit  (cost=390.07..390.08 rows=1 width=20) (actual time=3945.263..3945.280 rows=4 loops=1)
  Buffers: shared hit=5956 read=10
  I/O Timings: read=60.323
  ->  Sort  (cost=390.07..390.08 rows=1 width=20) (actual time=3945.256..3945.260 rows=4 loops=1)
        Sort Key: "title"
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=5956 read=10
        I/O Timings: read=60.323
        ->  Bitmap Heap Scan on "ja_jobs"  (cost=386.05..390.06 rows=1 width=20) (actual time=3944.857..3945.127 rows=4 loops=1)
              Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~* '% WER%'::"text"))
              Buffers: shared hit=5951 read=10
              I/O Timings: read=60.323
              ->  BitmapAnd  (cost=386.05..386.05 rows=1 width=0) (actual time=3929.540..3929.540 rows=0 loops=1)
                    Buffers: shared hit=5950 read=7
                    I/O Timings: read=45.021
                    ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..50.16 rows=1660 width=0) (actual time=45.536..45.536 rows=795 loops=1)
                          Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
                          Buffers: shared hit=5 read=7
                          I/O Timings: read=45.021
                    ->  Bitmap Index Scan on "ix_jobs_trgm_gin"  (cost=0.00..335.64 rows=485 width=0) (actual time=3883.886..3883.886 rows=32 loops=1)
                          Index Cond: (("title")::"text" ~~* '%RYAN WER%'::"text")
                          Buffers: shared hit=5945
Total runtime: 3945.554 ms
 

Re: index on ILIKE/LIKE - PostgreSQL 9.2

From
Jeff Janes
Date:
On Thu, May 12, 2016 at 2:32 PM, Lucas Possamai <drum.lucas@gmail.com> wrote:

>>                     ->  Bitmap Index Scan on "ix_jobs_trgm_gin"
>> (cost=0.00..335.64 rows=485 width=0) (actual time=3883.886..3883.886 rows=32
>> loops=1)
>>                           Index Cond: (("title")::"text" ~~* '%RYAN
>> WER%'::"text")
>>                           Buffers: shared hit=5945
>> Total runtime: 3945.554 ms

So it is not cold-cache or IO problems, but a CPU problem.  Your query
only has 6 trigrams in it, and that is causing nearly 6000 buffer
hits.  I'm guessing the "  w" trigram is extremely common in your data
set.  Anyway, you have some huge posting lists there, and they were
not dealt with very well in 9.2 or 9.3.

Cheers,

Jeff


Re: index on ILIKE/LIKE - PostgreSQL 9.2

From
Lucas Possamai
Date:
Hi there,

The problem was solved by using lowercase

New index:
CREATE INDEX CONCURRENTLY  ON public.ja_jobs (clientid, lower(title) varchar_pattern_ops, time_job);

New query:

SELECT DISTINCT title
                FROM public.ja_jobs WHERE lower(title) LIKE lower('
RYAN WER')
                AND clientid = 31239 
                AND time_job > 1457826264
                order BY title
limit 10;


- Improvement of 1400%


Thanks

Lucas