Re: Query performance on ILIKE with AND operator... - Mailing list pgsql-performance

From Federico Simonetti (Liveye)
Subject Re: Query performance on ILIKE with AND operator...
Date
Msg-id 434A862C.9010701@liveye.net
Whole thread Raw
In response to Re: Query performance on ILIKE with AND operator...  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Sorry but this does not seem to improve performance, it takes even more time, have a look at these data:

explain analyze SELECT * FROM ViewHttp
WHERE (vchost || txcontenttype) ilike '%www.%html%'
ORDER BY iDStart DESC, iSensorID DESC, iForensicID DESC, iSubID DESC
OFFSET 0 LIMIT 201


"Limit  (cost=22740.77..22741.28 rows=201 width=1250) (actual time=14234.000..14234.000 rows=201 loops=1)"
"  ->  Sort  (cost=22740.77..22741.89 rows=447 width=1250) (actual time=14234.000..14234.000 rows=201 loops=1)"
"        Sort Key: detail0009.idstart, detail0009.isensorid, detail0009.iforensicid, detail0009.isubid"
"        ->  Hash Join  (cost=13.13..22721.10 rows=447 width=1250) (actual time=469.000..12140.000 rows=54035 loops=1)"
"              Hash Cond: ("outer".isensorid = "inner".isensorid)"
"              ->  Nested Loop  (cost=0.00..22701.27 rows=447 width=1165) (actual time=469.000..11428.000 rows=54035 loops=1)"
"                    ->  Seq Scan on detail0009  (cost=0.00..20763.77 rows=26 width=1005) (actual time=453.000..6345.000 rows=54064 loops=1)"
"                          Filter: (((vchost)::text || txcontenttype) ~~* '%www.%html%'::text)"
"                    ->  Index Scan using connections_pkey on connections  (cost=0.00..74.25 rows=18 width=168) (actual time=0.073..0.077 rows=1 loops=54064)"
"                          Index Cond: (("outer".isensorid = connections.isensorid) AND ("outer".iforensicid = connections.iforensicid))"
"              ->  Hash  (cost=12.50..12.50 rows=250 width=101) (actual time=0.000..0.000 rows=0 loops=1)"
"                    ->  Seq Scan on sensors  (cost=0.00..12.50 rows=250 width=101) (actual time=0.000..0.000 rows=1 loops=1)"
"Total runtime: 14234.000 ms"


Thanks for your help anyway...

Federico






Tom Lane ha scritto:
"Federico Simonetti (Liveye)" <federico@liveye.net> writes: 
I'm encountering a quite strange performance problem.   
The problem stems from the horrid misestimation of the number of rows
fetched from detail0009:
 
"                    ->  Seq Scan on detail0009  (cost=0.00..20500.11
rows=26 width=1005) (actual time=453.000..5983.000 rows=53588 loops=1)"
"                          Filter: ((txcontenttype ~~* '%html%'::text)
AND ((vchost)::text ~~* '%www.%'::text))"   
When the planner is off by a factor of two thousand about the number of
rows involved, it's not very likely to produce a good plan :-(

In the OR case the rowcount estimate is 6334, which is somewhat closer
to reality (only about a factor of 10 off, looks like), and that changes
the plan to something that works acceptably well.

Assuming that this is web-log data, the prevalence of www and html
together is hardly surprising, but PG's statistical mechanisms will
never realize it.  Not sure about a good workaround.  Does it make
sense to combine the two conditions into one?(vchost || txcontenttype) ilike '%www.%html%'
		regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

 

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Query performance on ILIKE with AND operator...
Next
From: Jon Brisbin
Date:
Subject: Performance on SUSE w/ reiserfs