Thread: index on ILIKE/LIKE - PostgreSQL 9.2
Hi there!
I've got a simple but slow query:
SELECT DISTINCT title
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239 AND time_job > 1457826264
order BY title
limit 10
Explain analyze:
Limit (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)
-> Unique (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)
-> Sort (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")
Rows Removed by Filter: 791
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Total runtime: 2746.879 ms
Then, I created a trgm index:
CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);
Explain analyze after the index: (Yes, I ran the analyze)
Limit (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)
-> Unique (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)
-> Sort (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))
Rows Removed by Index Recheck: 4
-> BitmapAnd (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1)
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
-> Bitmap Index Scan on "ix_ja_jobs_trgm_gin" (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1)
Index Cond: (("title")::"text" ~~ '%RYAN WER%'::"text")
Total runtime: 3720.653 ms
so.. the query is still slow..
Do you guys know what can be done ? related to the ILIKE?
cheers
Lucas
On Wed, May 11, 2016 at 8:30 PM, Lucas Possamai <drum.lucas@gmail.com> wrote:
Hi there!I've got a simple but slow query:SELECT DISTINCT title
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239 AND time_job > 1457826264
order BY title
limit 10Explain analyze:Limit (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)
-> Unique (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)
-> Sort (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")
Rows Removed by Filter: 791
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Total runtime: 2746.879 msThen, I created a trgm index:CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);Explain analyze after the index: (Yes, I ran the analyze)Limit (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)
-> Unique (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)
-> Sort (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))
Rows Removed by Index Recheck: 4
-> BitmapAnd (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1)
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
-> Bitmap Index Scan on "ix_ja_jobs_trgm_gin" (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1)
Index Cond: (("title")::"text" ~~ '%RYAN WER%'::"text")
Total runtime: 3720.653 msso.. the query is still slow..Do you guys know what can be done ? related to the ILIKE?cheersLucas
The main problem is WHERE title ILIKE '%RYAN WER%'
When you put a % on the left of the text, there is no way to optimize that, so yes, it will be slow.
If you can eliminate the leading percent and just have trailing, it will be much faster.
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
The main problem is WHERE title ILIKE '%RYAN WER%'When you put a % on the left of the text, there is no way to optimize that, so yes, it will be slow.If you can eliminate the leading percent and just have trailing, it will be much faster.
Hmm.. yep.. I suppose I can do that.
But, taking the left % off, the query is still slow:
Limit (cost=418.57..418.58 rows=1 width=20) (actual time=4439.367..4439.381 rows=1 loops=1)
Buffers: shared hit=6847
-> Unique (cost=418.57..418.58 rows=1 width=20) (actual time=4439.363..4439.374 rows=1 loops=1)
Buffers: shared hit=6847
-> Sort (cost=418.57..418.58 rows=1 width=20) (actual time=4439.360..4439.365 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=6847
-> Bitmap Heap Scan on "ja_jobs" (cost=414.55..418.56 rows=1 width=20) (actual time=4439.312..4439.329 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~* 'RYAN SHOWER%'::"text"))
Buffers: shared hit=6847
-> BitmapAnd (cost=414.55..414.55 rows=1 width=0) (actual time=4439.280..4439.280 rows=0 loops=1)
Buffers: shared hit=6843
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..50.67 rows=1711 width=0) (actual time=0.142..0.142 rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Buffers: shared hit=8
-> Bitmap Index Scan on "ix_ja_jobs_trgm_gin" (cost=0.00..363.62 rows=483 width=0) (actual time=4439.014..4439.014 rows=32 loops=1)
Index Cond: (("title")::"text" ~~* 'RYAN SHOWER%'::"text")
Buffers: shared hit=6835
Total runtime: 4439.427 ms
Here [1] it appears to be working even with two %.. But it's not for me....
Any ideia? lol
On Wed, May 11, 2016 at 9:36 PM, Lucas Possamai <drum.lucas@gmail.com> wrote:
The main problem is WHERE title ILIKE '%RYAN WER%'When you put a % on the left of the text, there is no way to optimize that, so yes, it will be slow.If you can eliminate the leading percent and just have trailing, it will be much faster.Hmm.. yep.. I suppose I can do that.But, taking the left % off, the query is still slow:Limit (cost=418.57..418.58 rows=1 width=20) (actual time=4439.367..4439.381 rows=1 loops=1)
Buffers: shared hit=6847
-> Unique (cost=418.57..418.58 rows=1 width=20) (actual time=4439.363..4439.374 rows=1 loops=1)
Buffers: shared hit=6847
-> Sort (cost=418.57..418.58 rows=1 width=20) (actual time=4439.360..4439.365 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=6847
-> Bitmap Heap Scan on "ja_jobs" (cost=414.55..418.56 rows=1 width=20) (actual time=4439.312..4439.329 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~* 'RYAN SHOWER%'::"text"))
Buffers: shared hit=6847
-> BitmapAnd (cost=414.55..414.55 rows=1 width=0) (actual time=4439.280..4439.280 rows=0 loops=1)
Buffers: shared hit=6843
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..50.67 rows=1711 width=0) (actual time=0.142..0.142 rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Buffers: shared hit=8
-> Bitmap Index Scan on "ix_ja_jobs_trgm_gin" (cost=0.00..363.62 rows=483 width=0) (actual time=4439.014..4439.014 rows=32 loops=1)
Index Cond: (("title")::"text" ~~* 'RYAN SHOWER%'::"text")
Buffers: shared hit=6835
Total runtime: 4439.427 msHere [1] it appears to be working even with two %.. But it's not for me....Any ideia? lol
Trying redoing the query with CTE as below:
WITH ja_jobs as
(SELECT DISTINCT title
FROM ja_jobs
WHERE clientid = 31239 AND time_job > 1457826264
)
SELECT title
FROM ja_jobs
WHERE title ILIKE 'RYAN WER%'
ORDER BY title
LIMIT 10;
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
Trying redoing the query with CTE as below:
WITH ja_jobs as
(SELECT DISTINCT title
FROM ja_jobs
WHERE clientid = 31239 AND time_job > 1457826264
)
SELECT title
FROM ja_jobs
WHERE title ILIKE 'RYAN WER%'
ORDER BY title
LIMIT 10;
hmm.. still slow =(
and it's not hitting the index: (i had to change the clientid because the previous one was in cache)
Limit (cost=93790.08..93790.09 rows=1 width=218) (actual time=284.293..284.308 rows=5 loops=1)
Buffers: shared hit=42284
CTE ja_jobs
-> HashAggregate (cost=93774.31..93779.16 rows=485 width=20) (actual time=207.235..228.141 rows=16320 loops=1)
Buffers: shared hit=42284
-> Bitmap Heap Scan on "ja_jobs" (cost=882.98..93697.86 rows=30578 width=20) (actual time=21.942..133.380 rows=48472 loops=1)
Recheck Cond: (("clientid" = 14635) AND ("time_job" > 1436731799))
Buffers: shared hit=42284
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..875.34 rows=30578 width=0) (actual time=12.389..12.389 rows=48472 loops=1)
Index Cond: (("clientid" = 14635) AND ("time_job" > 1436731799))
Buffers: shared hit=243
-> Sort (cost=10.92..10.93 rows=1 width=218) (actual time=284.289..284.293 rows=5 loops=1)
Sort Key: "ja_jobs"."title"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=42284
-> CTE Scan on "ja_jobs" (cost=0.00..10.91 rows=1 width=218) (actual time=236.248..284.263 rows=5 loops=1)
Filter: (("title")::"text" ~~* '%To Electrical%'::"text")
Rows Removed by Filter: 16315
Buffers: shared hit=42284
Total runtime: 287.633 ms
On Wed, May 11, 2016 at 10:03 PM, Lucas Possamai <drum.lucas@gmail.com> wrote:
Trying redoing the query with CTE as below:
WITH ja_jobs as
(SELECT DISTINCT title
FROM ja_jobs
WHERE clientid = 31239 AND time_job > 1457826264
)
SELECT title
FROM ja_jobs
WHERE title ILIKE 'RYAN WER%'
ORDER BY title
LIMIT 10;hmm.. still slow =(and it's not hitting the index: (i had to change the clientid because the previous one was in cache)Limit (cost=93790.08..93790.09 rows=1 width=218) (actual time=284.293..284.308 rows=5 loops=1)
Buffers: shared hit=42284
CTE ja_jobs
-> HashAggregate (cost=93774.31..93779.16 rows=485 width=20) (actual time=207.235..228.141 rows=16320 loops=1)
Buffers: shared hit=42284
-> Bitmap Heap Scan on "ja_jobs" (cost=882.98..93697.86 rows=30578 width=20) (actual time=21.942..133.380 rows=48472 loops=1)
Recheck Cond: (("clientid" = 14635) AND ("time_job" > 1436731799))
Buffers: shared hit=42284
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..875.34 rows=30578 width=0) (actual time=12.389..12.389 rows=48472 loops=1)
Index Cond: (("clientid" = 14635) AND ("time_job" > 1436731799))
Buffers: shared hit=243
-> Sort (cost=10.92..10.93 rows=1 width=218) (actual time=284.289..284.293 rows=5 loops=1)
Sort Key: "ja_jobs"."title"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=42284
-> CTE Scan on "ja_jobs" (cost=0.00..10.91 rows=1 width=218) (actual time=236.248..284.263 rows=5 loops=1)
Filter: (("title")::"text" ~~* '%To Electrical%'::"text")
Rows Removed by Filter: 16315
Buffers: shared hit=42284
Total runtime: 287.633 ms
I think a GIN index can't be used for sorting.
On Wed, May 11, 2016 at 5:30 PM, Lucas Possamai <drum.lucas@gmail.com> wrote: > Hi there! > > I've got a simple but slow query: > >> SELECT DISTINCT title >> FROM ja_jobs WHERE title ILIKE '%RYAN WER%' >> and clientid = 31239 AND time_job > 1457826264 >> order BY title >> limit 10 > > CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title > gin_trgm_ops); > > Explain analyze after the index: (Yes, I ran the analyze) > >> Limit (cost=389.91..389.91 rows=1 width=20) (actual >> time=3720.511..3720.511 rows=0 loops=1) >> -> Unique (cost=389.91..389.91 rows=1 width=20) (actual >> time=3720.507..3720.507 rows=0 loops=1) >> -> Sort (cost=389.91..389.91 rows=1 width=20) (actual >> time=3720.505..3720.505 rows=0 loops=1) >> Sort Key: "title" >> Sort Method: quicksort Memory: 25kB >> -> Bitmap Heap Scan on "ja_jobs" (cost=385.88..389.90 >> rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1) >> Recheck Cond: (("clientid" = 31239) AND ("time_job" > >> 1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text")) >> Rows Removed by Index Recheck: 4 >> -> BitmapAnd (cost=385.88..385.88 rows=1 width=0) >> (actual time=3720.469..3720.469 rows=0 loops=1) >> -> Bitmap Index Scan on "ix_jobs_client_times" >> (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 >> loops=1) >> Index Cond: (("clientid" = 31239) AND >> ("time_job" > 1457826264)) >> -> Bitmap Index Scan on "ix_ja_jobs_trgm_gin" >> (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 >> loops=1) >> Index Cond: (("title")::"text" ~~ '%RYAN >> WER%'::"text") >> Total runtime: 3720.653 ms How big is the table? The gin index? shared_buffers? RAM? What kind of IO system do you have, and how many other things were going on with it? It would be interesting to see the output of explain (analyze, buffers) with track_io_timing turned on. There have been improvements in this area since 9.2, you should consider upgrading to at least 9.4. Cheers, Jeff
kind of IO system do you have, and how many other things were going onHow big is the table? The gin index? shared_buffers? RAM? What
with it?
- Just a reminder that I'm not running these tests on my prod server.. I'm running on my test server. So the confs will be different
The table is 9GB big
The gin index is 400MB big
shared_buffers = 1536MB
RAM = 8 GB
I just wanted to understand why the GIN index is not working, but it works here: https://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/
It would be interesting to see the output of explain (analyze,
buffers) with track_io_timing turned on.
explain analyze buffer with track_io_timing turned on:
Limit (cost=93466.83..93466.83 rows=1 width=218) (actual time=24025.463..24025.478 rows=5 loops=1)
Buffers: shared hit=8 read=42285
I/O Timings: read=23599.672
CTE ja_jobs
-> HashAggregate (cost=93451.05..93455.90 rows=485 width=20) (actual time=23946.801..23967.660 rows=16320 loops=1)
Buffers: shared hit=3 read=42285
I/O Timings: read=23599.672
-> Bitmap Heap Scan on "ja_jobs" (cost=877.70..93374.92 rows=30453 width=20) (actual time=161.372..23835.632 rows=48472 loops=1)
Recheck Cond: (("clientid" = 14635) AND ("time_job" > 1436731799))
Buffers: shared hit=3 read=42285
I/O Timings: read=23599.672
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..870.09 rows=30453 width=0) (actual time=133.920..133.920 rows=48472 loops=1)
Index Cond: (("clientid" = 14635) AND ("time_job" > 1436731799))
Buffers: shared hit=3 read=244
I/O Timings: read=120.137
-> Sort (cost=10.92..10.93 rows=1 width=218) (actual time=24025.457..24025.462 rows=5 loops=1)
Sort Key: "ja_jobs"."title"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=8 read=42285
I/O Timings: read=23599.672
-> CTE Scan on "ja_jobs" (cost=0.00..10.91 rows=1 width=218) (actual time=23977.095..24025.325 rows=5 loops=1)
Filter: (("title")::"text" ~~* '%To Electrical%'::"text")
Rows Removed by Filter: 16315
Buffers: shared hit=3 read=42285
I/O Timings: read=23599.672
Total runtime: 24028.551 ms
There have been improvements in this area since 9.2, you should
consider upgrading to at least 9.4.
På torsdag 12. mai 2016 kl. 02:30:33, skrev Lucas Possamai <drum.lucas@gmail.com>:
Hi there!I've got a simple but slow query:SELECT DISTINCT title
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239 AND time_job > 1457826264
order BY title
limit 10Explain analyze:Limit (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)
-> Unique (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)
-> Sort (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")
Rows Removed by Filter: 791
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Total runtime: 2746.879 msThen, I created a trgm index:CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);Explain analyze after the index: (Yes, I ran the analyze)Limit (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)
-> Unique (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)
-> Sort (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))
Rows Removed by Index Recheck: 4
-> BitmapAnd (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1)
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
-> Bitmap Index Scan on "ix_ja_jobs_trgm_gin" (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1)
Index Cond: (("title")::"text" ~~ '%RYAN WER%'::"text")
Total runtime: 3720.653 msso.. the query is still slow..Do you guys know what can be done ? related to the ILIKE?cheersLucas
It uses available indexes, but that isn't good enough.
Try including clientid in the index, using the btree_gin extension:
CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title gin_trgm_ops, clientid);
Note that if clientid is a bigint you have to cast the value to bigint for btree_gin to use it (note that this isn't necessary if you use a prepared statement):
SELECT DISTINCT title
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239::bigint AND time_job > 1457826264
order BY title
limit 10
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239::bigint AND time_job > 1457826264
order BY title
limit 10
Also note that the index cannot ant won't be used for sorting. A bitmap-AND is also inevitable because GIN-indexes cannot be used for the '>' operator, so PG uses the ix_jobs_client_times btree-index and bigmap-ANDs the result.
Can you post you complete schema?
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
På torsdag 12. mai 2016 kl. 09:57:58, skrev Andreas Joseph Krogh <andreas@visena.com>:
På torsdag 12. mai 2016 kl. 02:30:33, skrev Lucas Possamai <drum.lucas@gmail.com>:Hi there!I've got a simple but slow query:SELECT DISTINCT title
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239 AND time_job > 1457826264
order BY title
limit 10Explain analyze:Limit (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)
-> Unique (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)
-> Sort (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")
Rows Removed by Filter: 791
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Total runtime: 2746.879 msThen, I created a trgm index:CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);Explain analyze after the index: (Yes, I ran the analyze)Limit (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)
-> Unique (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)
-> Sort (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))
Rows Removed by Index Recheck: 4
-> BitmapAnd (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1)
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
-> Bitmap Index Scan on "ix_ja_jobs_trgm_gin" (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1)
Index Cond: (("title")::"text" ~~ '%RYAN WER%'::"text")
Total runtime: 3720.653 msso.. the query is still slow..Do you guys know what can be done ? related to the ILIKE?cheersLucasIt uses available indexes, but that isn't good enough.Try including clientid in the index, using the btree_gin extension:CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title gin_trgm_ops, clientid);Note that if clientid is a bigint you have to cast the value to bigint for btree_gin to use it (note that this isn't necessary if you use a prepared statement):SELECT DISTINCT title
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239::bigint AND time_job > 1457826264
order BY title
limit 10Also note that the index cannot ant won't be used for sorting. A bitmap-AND is also inevitable because GIN-indexes cannot be used for the '>' operator, so PG uses the ix_jobs_client_times btree-index and bigmap-ANDs the result.Can you post you complete schema?
I created this test:
create table ja_jobs(id bigserial primary key, title varchar not null, clientid bigint not null, time_job bigint not null);
CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title gin_trgm_ops, clientid);
--- insert some test-data
As you see, this uses the index (when casting clientid to bigint):
andreak=# explain analyze SELECT DISTINCT title
FROM ja_jobs WHERE title ILIKE '%ras du%'
and clientid = 12::bigint AND time_job > 257826264
order BY title
limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=8.43..8.44 rows=1 width=32) (actual time=0.033..0.034 rows=1 loops=1)
-> Unique (cost=8.43..8.44 rows=1 width=32) (actual time=0.032..0.032 rows=1 loops=1)
-> Sort (cost=8.43..8.43 rows=1 width=32) (actual time=0.032..0.032 rows=1 loops=1)
Sort Key: title
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on ja_jobs (cost=7.20..8.42 rows=1 width=32) (actual time=0.025..0.025 rows=1 loops=1)
Recheck Cond: (((title)::text ~~* '%ras du%'::text) AND (clientid = '12'::bigint))
Filter: (time_job > 257826264)
Heap Blocks: exact=1
-> Bitmap Index Scan on ix_ja_jobs_trgm_clientid_gin (cost=0.00..7.20 rows=1 width=0) (actual time=0.016..0.016 rows=1 loops=1)
Index Cond: (((title)::text ~~* '%ras du%'::text) AND (clientid = '12'::bigint))
Planning time: 0.169 ms
Execution time: 0.061 ms
(13 rows)
FROM ja_jobs WHERE title ILIKE '%ras du%'
and clientid = 12::bigint AND time_job > 257826264
order BY title
limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=8.43..8.44 rows=1 width=32) (actual time=0.033..0.034 rows=1 loops=1)
-> Unique (cost=8.43..8.44 rows=1 width=32) (actual time=0.032..0.032 rows=1 loops=1)
-> Sort (cost=8.43..8.43 rows=1 width=32) (actual time=0.032..0.032 rows=1 loops=1)
Sort Key: title
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on ja_jobs (cost=7.20..8.42 rows=1 width=32) (actual time=0.025..0.025 rows=1 loops=1)
Recheck Cond: (((title)::text ~~* '%ras du%'::text) AND (clientid = '12'::bigint))
Filter: (time_job > 257826264)
Heap Blocks: exact=1
-> Bitmap Index Scan on ix_ja_jobs_trgm_clientid_gin (cost=0.00..7.20 rows=1 width=0) (actual time=0.016..0.016 rows=1 loops=1)
Index Cond: (((title)::text ~~* '%ras du%'::text) AND (clientid = '12'::bigint))
Planning time: 0.169 ms
Execution time: 0.061 ms
(13 rows)
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
På torsdag 12. mai 2016 kl. 10:05:01, skrev Andreas Joseph Krogh <andreas@visena.com>:
[snp]I created this test:create table ja_jobs(id bigserial primary key, title varchar not null, clientid bigint not null, time_job bigint not null);CREATE INDEX ix_ja_jobs_trgm_clientid_gin ON public.ja_jobs USING gin (title gin_trgm_ops, clientid);
--- insert some test-dataAs you see, this uses the index (when casting clientid to bigint):andreak=# explain analyze SELECT DISTINCT title
FROM ja_jobs WHERE title ILIKE '%ras du%'
and clientid = 12::bigint AND time_job > 257826264
order BY title
limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=8.43..8.44 rows=1 width=32) (actual time=0.033..0.034 rows=1 loops=1)
-> Unique (cost=8.43..8.44 rows=1 width=32) (actual time=0.032..0.032 rows=1 loops=1)
-> Sort (cost=8.43..8.43 rows=1 width=32) (actual time=0.032..0.032 rows=1 loops=1)
Sort Key: title
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on ja_jobs (cost=7.20..8.42 rows=1 width=32) (actual time=0.025..0.025 rows=1 loops=1)
Recheck Cond: (((title)::text ~~* '%ras du%'::text) AND (clientid = '12'::bigint))
Filter: (time_job > 257826264)
Heap Blocks: exact=1
-> Bitmap Index Scan on ix_ja_jobs_trgm_clientid_gin (cost=0.00..7.20 rows=1 width=0) (actual time=0.016..0.016 rows=1 loops=1)
Index Cond: (((title)::text ~~* '%ras du%'::text) AND (clientid = '12'::bigint))
Planning time: 0.169 ms
Execution time: 0.061 ms
(13 rows)
Forgot to say, this is in PG-9.6 (master), but should work on previous versions.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
On Wed, May 11, 2016 at 11:59 PM, Lucas Possamai <drum.lucas@gmail.com> wrote: > >> >> How big is the table? The gin index? shared_buffers? RAM? What >> kind of IO system do you have, and how many other things were going on >> with it? > > > - Just a reminder that I'm not running these tests on my prod server.. I'm > running on my test server. So the confs will be different > >> The table is 9GB big >> The gin index is 400MB big >> shared_buffers = 1536MB >> RAM = 8 GB With those sizes, the gin index will probably be naturally kept mostly in the file-system cache, if it is used regularly. So the original slowness of your first query is likely just a cold-cache problem. Can you generate a stream of realistic queries and see what it stabilizes at? > I just wanted to understand why the GIN index is not working, but it works > here: https://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/ In your first email, the gin index did "work", according to the execution plan. It just wasn't as fast as you wanted. In general, the longer the query string is between the %%, the worse it will perform (until version 9.6, and to a smaller degree even with 9.6). But it still seems oddly slow to me, unless you have a cold-cache and really bad (or overloaded) IO. > >> >> >> It would be interesting to see the output of explain (analyze, >> buffers) with track_io_timing turned on. > > > explain analyze buffer with track_io_timing turned on: ... That is the wrong query. The CTE (i.e. the WITH part) is an optimization fence, so it can't use the gin index, simply because of the way you query is written. (I think Melvin suggested it because he noticed that using the gin index actually slowed down the query, so he wanted to force it to not be used.) Cheers, Jeff
With those sizes, the gin index will probably be naturally kept mostly
in the file-system cache, if it is used regularly. So the original
slowness of your first query is likely just a cold-cache problem. Can
you generate a stream of realistic queries and see what it stabilizes
at?
> I just wanted to understand why the GIN index is not working, but it works
> here: https://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/
In your first email, the gin index did "work", according to the
execution plan. It just wasn't as fast as you wanted. In general,
the longer the query string is between the %%, the worse it will
perform (until version 9.6, and to a smaller degree even with 9.6).
But it still seems oddly slow to me, unless you have a cold-cache and
really bad (or overloaded) IO.
>
>>
>>
>> It would be interesting to see the output of explain (analyze,
>> buffers) with track_io_timing turned on.
>
>
> explain analyze buffer with track_io_timing turned on:
...
That is the wrong query. The CTE (i.e. the WITH part) is an
optimization fence, so it can't use the gin index, simply because of
the way you query is written. (I think Melvin suggested it because he
noticed that using the gin index actually slowed down the query, so he
wanted to force it to not be used.)
Oh ok.
- Here is the explain analyze buffer with the original query I posted here with the gin index:
Query:
explain (analyze, buffers)
SELECT title
FROM ja_jobs WHERE title ILIKE '%RYAN WER%'
and clientid = 31239 AND time_job > 1457826264
order BY title
limit 10
Explain analyze:
Limit (cost=390.07..390.08 rows=1 width=20) (actual time=3945.263..3945.280 rows=4 loops=1)
Buffers: shared hit=5956 read=10
I/O Timings: read=60.323
-> Sort (cost=390.07..390.08 rows=1 width=20) (actual time=3945.256..3945.260 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=5956 read=10
I/O Timings: read=60.323
-> Bitmap Heap Scan on "ja_jobs" (cost=386.05..390.06 rows=1 width=20) (actual time=3944.857..3945.127 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~* '% WER%'::"text"))
Buffers: shared hit=5951 read=10
I/O Timings: read=60.323
-> BitmapAnd (cost=386.05..386.05 rows=1 width=0) (actual time=3929.540..3929.540 rows=0 loops=1)
Buffers: shared hit=5950 read=7
I/O Timings: read=45.021
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..50.16 rows=1660 width=0) (actual time=45.536..45.536 rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Buffers: shared hit=5 read=7
I/O Timings: read=45.021
-> Bitmap Index Scan on "ix_jobs_trgm_gin" (cost=0.00..335.64 rows=485 width=0) (actual time=3883.886..3883.886 rows=32 loops=1)
Index Cond: (("title")::"text" ~~* '%RYAN WER%'::"text")
Buffers: shared hit=5945
Total runtime: 3945.554 ms
On Thu, May 12, 2016 at 2:32 PM, Lucas Possamai <drum.lucas@gmail.com> wrote: >> -> Bitmap Index Scan on "ix_jobs_trgm_gin" >> (cost=0.00..335.64 rows=485 width=0) (actual time=3883.886..3883.886 rows=32 >> loops=1) >> Index Cond: (("title")::"text" ~~* '%RYAN >> WER%'::"text") >> Buffers: shared hit=5945 >> Total runtime: 3945.554 ms So it is not cold-cache or IO problems, but a CPU problem. Your query only has 6 trigrams in it, and that is causing nearly 6000 buffer hits. I'm guessing the " w" trigram is extremely common in your data set. Anyway, you have some huge posting lists there, and they were not dealt with very well in 9.2 or 9.3. Cheers, Jeff
Hi there,
The problem was solved by using lowercase
New index:
CREATE INDEX CONCURRENTLY ON public.ja_jobs (clientid, lower(title) varchar_pattern_ops, time_job);
New query:
SELECT DISTINCT title
FROM public.ja_jobs WHERE lower(title) LIKE lower('RYAN WER')
AND clientid = 31239
AND time_job > 1457826264
order BY title
limit 10;
- Improvement of 1400%
Thanks
Lucas