Thread: planer picks a bad plan (seq-scan instead of index) when adding an additional join
planer picks a bad plan (seq-scan instead of index) when adding an additional join
From
"Thomas H."
Date:
hi list. as soon as i left-join an additional table, the query takes 24sec instead of 0.2sec, although the added fields have no impact on the resultset: -------------------- SELECT * FROM shop.dvds LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like '%superman re%' OR lower(dvd_edition) LIKE '%superman re%') -------------------- Hash Left Join (cost=8402.16..10733.16 rows=39900 width=1276) (actual time=260.712..260.722 rows=2 loops=1) Hash Cond: (dvds.dvd_mov_id = movies.mov_id) Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text)) -> Seq Scan on dvds (cost=0.00..1292.00 rows=39900 width=1062) (actual time=0.036..23.594 rows=20866 loops=1) -> Hash (cost=8194.93..8194.93 rows=82893 width=214) (actual time=168.121..168.121 rows=37417 loops=1) -> Seq Scan on movies (cost=0.00..8194.93 rows=82893 width=214) (actual time=0.024..131.401 rows=37417 loops=1) Total runtime: 264.193 ms 2 rows fetched -------------------- now, an additional table (containing 600k records) is added through a left join. all the sudden the query takes 24sec. although there are indices on both dvds.dvd_ean and data_soundmedia.sm_info_ean, the planer does not make use of the indices but rather chooses to do 2 seq-scans. -------------------- SELECT * FROM shop.dvds LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like '%superman re%' OR lower(dvd_edition) LIKE '%superman re%') -------------------- Hash Left Join (cost=317592.21..326882.92 rows=159086 width=1936) (actual time=21021.023..22242.253 rows=2 loops=1) Hash Cond: (dvds.dvd_mov_id = movies.mov_id) Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR (lower((dvds.dvd_name)::text) ~~ '%superman re%'::text) OR (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text)) -> Merge Left Join (cost=309190.05..313899.09 rows=159086 width=1722) (actual time=19876.552..21902.007 rows=20866 loops=1) Merge Cond: ("outer"."?column20?" = "inner"."?column29?") -> Sort (cost=23027.68..23127.43 rows=39900 width=1062) (actual time=507.886..520.143 rows=20866 loops=1) Sort Key: (dvds.dvd_ean)::text -> Seq Scan on dvds (cost=0.00..1292.00 rows=39900 width=1062) (actual time=0.047..100.415 rows=20866 loops=1) -> Sort (cost=286162.37..287781.38 rows=647601 width=660) (actual time=19336.011..20328.247 rows=646633 loops=1) Sort Key: (data_soundmedia.sm_info_ean)::text -> Seq Scan on data_soundmedia (cost=0.00..31080.01 rows=647601 width=660) (actual time=0.074..2834.831 rows=647601 loops=1) -> Hash (cost=8194.93..8194.93 rows=82893 width=214) (actual time=177.033..177.033 rows=37417 loops=1) -> Seq Scan on movies (cost=0.00..8194.93 rows=82893 width=214) (actual time=0.118..129.716 rows=37417 loops=1) Total runtime: 24419.939 ms 2 rows fetched -------------------- shouldn't the planer join the additional table *after* filtering? even if it does first joining then filtering, why isn't the existing index not used? pgsql is 8.2beta2 thanks, thomas
Have you run analyze on all the three tables since creating the database? What On Thu, 2006-11-09 at 02:31 +0100, Thomas H. wrote: > hi list. > > as soon as i left-join an additional table, the query takes 24sec instead of > 0.2sec, although the added fields have no impact on the resultset: > > -------------------- > SELECT * FROM shop.dvds > LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id > WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like > '%superman re%' OR lower(dvd_edition) LIKE '%superman re%') > -------------------- > Hash Left Join (cost=8402.16..10733.16 rows=39900 width=1276) (actual > time=260.712..260.722 rows=2 loops=1) > Hash Cond: (dvds.dvd_mov_id = movies.mov_id) > Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR > (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text)) > -> Seq Scan on dvds (cost=0.00..1292.00 rows=39900 width=1062) (actual > time=0.036..23.594 rows=20866 loops=1) > -> Hash (cost=8194.93..8194.93 rows=82893 width=214) (actual > time=168.121..168.121 rows=37417 loops=1) > -> Seq Scan on movies (cost=0.00..8194.93 rows=82893 width=214) > (actual time=0.024..131.401 rows=37417 loops=1) > Total runtime: 264.193 ms > 2 rows fetched > -------------------- > > now, an additional table (containing 600k records) is added through a left > join. all the sudden the query takes 24sec. although there are indices on > both dvds.dvd_ean and data_soundmedia.sm_info_ean, the planer does not make > use of the indices but rather chooses to do 2 seq-scans. > > -------------------- > SELECT * FROM shop.dvds > LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id > LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean > WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like > '%superman re%' OR lower(dvd_edition) LIKE '%superman re%') > -------------------- > Hash Left Join (cost=317592.21..326882.92 rows=159086 width=1936) (actual > time=21021.023..22242.253 rows=2 loops=1) > Hash Cond: (dvds.dvd_mov_id = movies.mov_id) > Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR > (lower((dvds.dvd_name)::text) ~~ '%superman re%'::text) OR > (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text)) > -> Merge Left Join (cost=309190.05..313899.09 rows=159086 width=1722) > (actual time=19876.552..21902.007 rows=20866 loops=1) > Merge Cond: ("outer"."?column20?" = "inner"."?column29?") > -> Sort (cost=23027.68..23127.43 rows=39900 width=1062) (actual > time=507.886..520.143 rows=20866 loops=1) > Sort Key: (dvds.dvd_ean)::text > -> Seq Scan on dvds (cost=0.00..1292.00 rows=39900 > width=1062) (actual time=0.047..100.415 rows=20866 loops=1) > -> Sort (cost=286162.37..287781.38 rows=647601 width=660) (actual > time=19336.011..20328.247 rows=646633 loops=1) > Sort Key: (data_soundmedia.sm_info_ean)::text > -> Seq Scan on data_soundmedia (cost=0.00..31080.01 > rows=647601 width=660) (actual time=0.074..2834.831 rows=647601 loops=1) > -> Hash (cost=8194.93..8194.93 rows=82893 width=214) (actual > time=177.033..177.033 rows=37417 loops=1) > -> Seq Scan on movies (cost=0.00..8194.93 rows=82893 width=214) > (actual time=0.118..129.716 rows=37417 loops=1) > Total runtime: 24419.939 ms > 2 rows fetched > -------------------- > > shouldn't the planer join the additional table *after* filtering? even if it > does first joining then filtering, why isn't the existing index not used? > > pgsql is 8.2beta2 > > thanks, > thomas > > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match
> Have you run analyze on all the three tables since creating the > database? yes. even a forced ANALYZE FULL after the table loads: the tables were TRUNCATE'd, refilled and ANALYZE FULL'ed some minutes before the tests. there where no UPDATEs after the INSERTs... - thomas
Thomas H. wrote: > hi list. > > as soon as i left-join an additional table, the query takes 24sec > instead of 0.2sec, although the added fields have no impact on the > resultset: > > -------------------- > SELECT * FROM shop.dvds > LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id > WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like > '%superman re%' OR lower(dvd_edition) LIKE '%superman re%') > -------------------- > Hash Left Join (cost=8402.16..10733.16 rows=39900 width=1276) (actual > time=260.712..260.722 rows=2 loops=1) > Hash Cond: (dvds.dvd_mov_id = movies.mov_id) > Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR > (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text)) > -> Seq Scan on dvds (cost=0.00..1292.00 rows=39900 width=1062) > (actual time=0.036..23.594 rows=20866 loops=1) > -> Hash (cost=8194.93..8194.93 rows=82893 width=214) (actual > time=168.121..168.121 rows=37417 loops=1) > -> Seq Scan on movies (cost=0.00..8194.93 rows=82893 width=214) > (actual time=0.024..131.401 rows=37417 loops=1) > Total runtime: 264.193 ms > 2 rows fetched > -------------------- That's a pretty bad plan already, considering it does two seq-scans. I'm pretty sure you can get that query to return in something close to 1ms. Do you have indexes on any of dvds.dvd_mov_id, movies.mov_id, lower(mov_name), lower(dvd_edition) or lower(dvd_name)? I think that'd help. If you already do have those indices, you may be running out of memory; check for how much memory your postgres is set, the defaults are rather modest. > now, an additional table (containing 600k records) is added through a > left join. all the sudden the query takes 24sec. although there are > indices on both dvds.dvd_ean and data_soundmedia.sm_info_ean, the planer > does not make use of the indices but rather chooses to do 2 seq-scans. > > -------------------- > SELECT * FROM shop.dvds > LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id > LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean Make sure you have indexes on both sm_info_ean and dvd_ean. Regards, -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Thomas H. wrote: > -------------------- > SELECT * FROM shop.dvds > LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id > LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean > WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like > '%superman re%' OR lower(dvd_edition) LIKE '%superman re%') > -------------------- Try putting your conditions as part of the join: SELECT * FROM shop.dvds LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id AND ( lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like '%superman re%' OR lower(dvd_edition) LIKE '%superman re%' ) LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean I'd also be tempted to look at a tsearch2 setup for the word searches. -- Richard Huxton Archonet Ltd
I am admittedly speaking up somewhat late here, and may be completely off base, but it seems to me that the "LIKE" operationis almost always going to be a loser, performance-wise, when there is an initial wildcard, e.g. "%superman re%"will require a sequential scan, while "superman re%" would not (assuming proper indexes matching case and type). I'd suggest tsearch2, possibly, which uses GIST indexes and may perhaps be a better match for this sort of problem. HTH, Greg Williamson DBA GlobeXplorer LLC -----Original Message----- From: pgsql-general-owner@postgresql.org on behalf of Richard Huxton Sent: Thu 11/9/2006 1:22 AM To: Thomas H. Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] planer picks a bad plan (seq-scan instead of index) Thomas H. wrote: > -------------------- > SELECT * FROM shop.dvds > LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id > LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean > WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like > '%superman re%' OR lower(dvd_edition) LIKE '%superman re%') > -------------------- Try putting your conditions as part of the join: SELECT * FROM shop.dvds LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id AND ( lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like '%superman re%' OR lower(dvd_edition) LIKE '%superman re%' ) LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean I'd also be tempted to look at a tsearch2 setup for the word searches. -- Richard Huxton Archonet Ltd ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ------------------------------------------------------- Click link below if it is SPAM gsw@globexplorer.com "https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=4552efed289104295495211&user=gsw@globexplorer.com&retrain=spam&template=history&history_page=1" !DSPAM:4552efed289104295495211! -------------------------------------------------------
Thomas H. wrote: > hi list. > > as soon as i left-join an additional table, the query takes 24sec > instead of 0.2sec, although the added fields have no impact on the > resultset: > > -------------------- > SELECT * FROM shop.dvds > LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id > WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like > '%superman re%' OR lower(dvd_edition) LIKE '%superman re%') > -------------------- > Hash Left Join (cost=8402.16..10733.16 rows=39900 width=1276) (actual > time=260.712..260.722 rows=2 loops=1) > Hash Cond: (dvds.dvd_mov_id = movies.mov_id) > Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR > (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text)) > -> Seq Scan on dvds (cost=0.00..1292.00 rows=39900 width=1062) > (actual time=0.036..23.594 rows=20866 loops=1) > -> Hash (cost=8194.93..8194.93 rows=82893 width=214) (actual > time=168.121..168.121 rows=37417 loops=1) > -> Seq Scan on movies (cost=0.00..8194.93 rows=82893 > width=214) (actual time=0.024..131.401 rows=37417 loops=1) > Total runtime: 264.193 ms > 2 rows fetched > -------------------- > 1. You MUST sequence scan dvds, as there is no way to do an index search on a like with % at the beginning. 2. You are asking for a left join on dvds, which means you want all records, so you must sequence scan dvds. The filters are all OR, so you can't say that a records is excluded until AFTER the join is completed. 3. The join estimates that 39900 rows will come out, but only 2 do, thats out by a large factor 10^4, which means that it's not going to join movies after the filter is applied. > now, an additional table (containing 600k records) is added through a > left join. all the sudden the query takes 24sec. although there are > indices on both dvds.dvd_ean and data_soundmedia.sm_info_ean, the > planer does not make use of the indices but rather chooses to do 2 > seq-scans. The 2 items from the first query are still true. You just have a lot more records to play with now by joining in a 3rd table. Which means there is no way to reduce the possible output set before the join. -> Sort (cost=286162.37..287781.38 rows=647601 width=660) (actual time=19336.011..20328.247 rows=646633 loops=1) This sort here is where nearly all of the 24 seconds goes. I am assuming at this point that the Merge Join is chosen because of the stats problem listed next; > Merge Left Join (cost=309190.05..313899.09 rows=159086 width=1722) (actual time=19876.552..21902.007 rows=20866 loops=1) This estimate is out by a factor of 10. I'd increase the statistics on the join columns to see if it decides to use a different join method. If the estimate for the Merge join is fixed to be closed, then it's likely an index scan would be chosen, 159000 is about 25% of the table. Assuming it's small on disk then it's faster to do a seq_scan than all the random I/O to read the index, then the heap to produce results. How big is data_soundmedia? Mb size, not rows. How much is your sort_mem/work_mem? > > -------------------- > SELECT * FROM shop.dvds > LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id > LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean > WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like > '%superman re%' OR lower(dvd_edition) LIKE '%superman re%') > -------------------- > Hash Left Join (cost=317592.21..326882.92 rows=159086 width=1936) > (actual time=21021.023..22242.253 rows=2 loops=1) > Hash Cond: (dvds.dvd_mov_id = movies.mov_id) > Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR > (lower((dvds.dvd_name)::text) ~~ '%superman re%'::text) OR > (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text)) > -> Merge Left Join (cost=309190.05..313899.09 rows=159086 > width=1722) (actual time=19876.552..21902.007 rows=20866 loops=1) > Merge Cond: ("outer"."?column20?" = "inner"."?column29?") > -> Sort (cost=23027.68..23127.43 rows=39900 width=1062) > (actual time=507.886..520.143 rows=20866 loops=1) > Sort Key: (dvds.dvd_ean)::text > -> Seq Scan on dvds (cost=0.00..1292.00 rows=39900 > width=1062) (actual time=0.047..100.415 rows=20866 loops=1) > -> Sort (cost=286162.37..287781.38 rows=647601 width=660) > (actual time=19336.011..20328.247 rows=646633 loops=1) > Sort Key: (data_soundmedia.sm_info_ean)::text > -> Seq Scan on data_soundmedia (cost=0.00..31080.01 > rows=647601 width=660) (actual time=0.074..2834.831 rows=647601 loops=1) > -> Hash (cost=8194.93..8194.93 rows=82893 width=214) (actual > time=177.033..177.033 rows=37417 loops=1) > -> Seq Scan on movies (cost=0.00..8194.93 rows=82893 > width=214) (actual time=0.118..129.716 rows=37417 loops=1) > Total runtime: 24419.939 ms > 2 rows fetched > -------------------- > > shouldn't the planer join the additional table *after* filtering? even > if it does first joining then filtering, why isn't the existing index > not used? I would fix the above anomalies before asking these questions. Regards Russell Smith
Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join
From
"Thomas H."
Date:
> 1. You MUST sequence scan dvds, as there is no way to do an index search > on a like with % at the beginning. > 2. You are asking for a left join on dvds, which means you want all > records, so you must sequence scan dvds. The filters are all OR, so you > can't say that a records is excluded until AFTER the join is completed. true, but thats fast (200ms). > 3. The join estimates that 39900 rows will come out, but only 2 do, thats > out by a large factor 10^4, which means that it's not going to join movies > after the filter is applied. 20866 is the total number of rows in the dvd table. the planer is now showing the accurate rate after another (auto)vacuum run. of course it can't know the estimate of a '%...' comparsion, so estimating the full result set is ok. >> now, an additional table (containing 600k records) is added through a >> left join. all the sudden the query takes 24sec. although there are >> indices on both dvds.dvd_ean and data_soundmedia.sm_info_ean, the planer >> does not make use of the indices but rather chooses to do 2 seq-scans. > The 2 items from the first query are still true. You just have a lot more > records to play with now by joining in a 3rd table. Which means there is > no way to reduce the possible output set before the join. well, under normal cases, the output set would be determined by index lookups (see bottom) > > -> Sort (cost=286162.37..287781.38 rows=647601 width=660) (actual > time=19336.011..20328.247 rows=646633 loops=1) > This sort here is where nearly all of the 24 seconds goes. I am assuming > at this point that the Merge Join is chosen because of the stats problem > listed next; exactly. but that sort shouldn't happen as there is an index on the join-field, and that is usualy pretty fast (~400ms), but not here... > > > Merge Left Join (cost=309190.05..313899.09 rows=159086 width=1722) > (actual time=19876.552..21902.007 rows=20866 loops=1) > This estimate is out by a factor of 10. I'd increase the statistics on > the join columns to see if it decides to use a different join method. > > If the estimate for the Merge join is fixed to be closed, then it's likely > an index scan would be chosen, 159000 is about 25% of the table. Assuming > it's small on disk then it's faster to do a seq_scan than all the random > I/O to read the index, then the heap to produce results. the discs are not the fastest, but there is plenty of free ram available. seq_page_cost is in its default state (1.0). should i raise this? > How big is data_soundmedia? Mb size, not rows. > How much is your sort_mem/work_mem? data_soundmedia is 195mb + 105mb for indices (6 fields indexed). work_mem = 30MB, sort_mem is undefined in 8.2's postgresql.conf what troubles me is that its only slow with this table (data_soundmedia). we have other tables (400k and 200k entries) that are joined with the same query in under 400ms total. these tables do have the exact same structure and indices defined, but in these joins the planer properly uses an Index Scann... if i force enable_seqscan = off, the planer makes use of the index, resulting in acceptable query speed: Nested Loop Left Join (cost=8402.16..257761.36 rows=83223 width=1067) (actual time=361.931..713.405 rows=2 loops=1) -> Hash Left Join (cost=8402.16..11292.37 rows=20873 width=407) (actual time=322.085..666.519 rows=2 loops=1) Hash Cond: (dvds.dvd_mov_id = movies.mov_id) Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR (lower((dvds.dvd_name)::text) ~~ '%superman re%'::text) OR (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text)) -> Seq Scan on dvds (cost=0.00..804.73 rows=20873 width=193) (actual time=11.781..329.672 rows=20866 loops=1) -> Hash (cost=8194.93..8194.93 rows=82893 width=214) (actual time=200.823..200.823 rows=37418 loops=1) -> Seq Scan on movies (cost=0.00..8194.93 rows=82893 width=214) (actual time=0.070..155.178 rows=37418 loops=1) -> Index Scan using data_soundmedia_info_ean_idx on data_soundmedia (cost=0.00..11.76 rows=4 width=660) (actual time=23.424..23.428 rows=1 loops=2) Index Cond: ((data_soundmedia.sm_info_ean)::text = (dvds.dvd_ean)::text) Total runtime: 716.988 ms 2 rows fetched (821 ms) could it be the index gets somehow corrupted? but on the other hand, if i do a TRUNCATE before loading new data, it should be rebuild anyway, shouldn't it? thanks, thomas
> Try putting your conditions as part of the join: > SELECT * FROM shop.dvds > LEFT JOIN > oldtables.movies > ON > mov_id = dvd_mov_id > AND ( > lower(mov_name) LIKE '%superman re%' > OR lower(dvd_name) like '%superman re%' > OR lower(dvd_edition) LIKE '%superman re%' > ) > LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean > unfortunately its getting optimized into the same plan :-) > I'd also be tempted to look at a tsearch2 setup for the word searches. tsearch2 doesn't work that well for exact matches (including special chars). but the culprit here isn't the '%...'%' seqscan, but rather the additional joined table (where no lookup except for the join-column takes place) that makes the query going from 200ms to 24sec. regards, thomas
Thomas H. wrote: >> Try putting your conditions as part of the join: >> SELECT * FROM shop.dvds >> LEFT JOIN >> oldtables.movies >> ON >> mov_id = dvd_mov_id >> AND ( >> lower(mov_name) LIKE '%superman re%' >> OR lower(dvd_name) like '%superman re%' >> OR lower(dvd_edition) LIKE '%superman re%' >> ) >> LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean > > unfortunately its getting optimized into the same plan :-) OK - in that case try explicit subqueries: SELECT ... FROM (SELECT * FROM shop.dvds LEFT JOIN shop.oldtables.movies WHERE lower(mov_name) LIKE ... ) AS bar LEFT JOIN shop.data_soundmedia >> I'd also be tempted to look at a tsearch2 setup for the word searches. > > > tsearch2 doesn't work that well for exact matches (including special > chars). but the culprit here isn't the '%...'%' seqscan, but rather the > additional joined table (where no lookup except for the join-column > takes place) that makes the query going from 200ms to 24sec. Agreed, but I'd still be inclined to let tsearch do a first filter then limit the results with LIKE. -- Richard Huxton Archonet Ltd
> OK - in that case try explicit subqueries: > > SELECT ... FROM > (SELECT * FROM shop.dvds > LEFT JOIN shop.oldtables.movies > WHERE lower(mov_name) LIKE ... > ) AS bar > LEFT JOIN shop.data_soundmedia same result, have tried this as well (22sec). it's the LEFT JOIN shop.data_soundmedia for which the planer picks a seqscan instead of index scan, no matter what... >>> I'd also be tempted to look at a tsearch2 setup for the word searches. >> >> >> tsearch2 doesn't work that well for exact matches (including special >> chars). but the culprit here isn't the '%...'%' seqscan, but rather the >> additional joined table (where no lookup except for the join-column takes >> place) that makes the query going from 200ms to 24sec. > > Agreed, but I'd still be inclined to let tsearch do a first filter then > limit the results with LIKE. would be a way to probably speed up the seqscan on shop.dvds that takes now 200ms. unfortunately, tsearch2 is broken for me in 8.2 (filling tsearch2 tvector columns crashes backend). but thats a different story :-) - thomas
Thomas H. wrote: >> OK - in that case try explicit subqueries: >> >> SELECT ... FROM >> (SELECT * FROM shop.dvds >> LEFT JOIN shop.oldtables.movies >> WHERE lower(mov_name) LIKE ... >> ) AS bar >> LEFT JOIN shop.data_soundmedia > > > same result, have tried this as well (22sec). it's the LEFT JOIN > shop.data_soundmedia for which the planer picks a seqscan instead of > index scan, no matter what... Two things to try: 1. "SET enable_seqscan = false" and see if that forces it. If not there's something very odd 2. Try adding a LIMIT 99 to the inner query (bar) so PG knows how many (few) rows will emerge. I'm guessing we're up against PG's poor estimate on the '%...%' filter. If you were getting 160,000 rows in the final result then a seq-scan might well be the way to go. The only workaround that I can think of (if we can't persuade the planner to cooperate) is to build a temp-table containing dvd_ean's for the first part of the query then analyse it and join against that. That way PG's row estimate will be accurate regardless of your text filtering. -- Richard Huxton Archonet Ltd
Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join
From
Tom Lane
Date:
"Thomas H." <me@alternize.com> writes: > SELECT * FROM shop.dvds > LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id > LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean > WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like > '%superman re%' OR lower(dvd_edition) LIKE '%superman re%') Um, what's the datatype of sm_info_ean and dvd_ean exactly? regards, tom lane
Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join
From
"Thomas H."
Date:
>> SELECT * FROM shop.dvds >> LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id >> LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean >> WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like >> '%superman re%' OR lower(dvd_edition) LIKE '%superman re%') > > Um, what's the datatype of sm_info_ean and dvd_ean exactly? varchar(15) and varchar(14) i can make them same width if that could help - just saw the same field on the other tables are limited to 14... regards, thomas
Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join
From
Tom Lane
Date:
"Thomas H." <me@alternize.com> writes: >> Um, what's the datatype of sm_info_ean and dvd_ean exactly? > varchar(15) and varchar(14) OK. I was wondering if you'd tried to use the new contrib/isn code and it was messing up the estimates somehow. Seems like a red herring. After looking more closely, I think the issue is that the size of the dvds/movies join is estimated at 39900 rows when it's really only 2, and this estimate discourages the planner from using a nestloop join of that join against data_soundmedia. We were thinking this was entirely because of poor estimation of the ~~ operators, but I suddenly realize that what's happening is that the size of the left join is being clamped to be at least the size of its left input --- that is, the planner is failing to distinguish JOIN/ON clauses (which can't suppress left-side rows) from WHERE clauses (which can). Per comment in set_joinrel_size_estimates: * Basically, we multiply size of Cartesian product by selectivity. * * If we are doing an outer join, take that into account: the output must * be at least as large as the non-nullable input. (Is there any chance * of being even smarter?) (XXX this is not really right, because it * assumes all the restriction clauses are join clauses; we should figure * pushed-down clauses separately.) This didn't matter a whole lot back when the planner couldn't reorder outer joins, but now that it can, it's more important that the estimates be accurate. I'm not sure if this is feasible to fix before 8.2, but I'll take a look. regards, tom lane
Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join
From
Tom Lane
Date:
I wrote: > This didn't matter a whole lot back when the planner couldn't reorder > outer joins, but now that it can, it's more important that the estimates > be accurate. > I'm not sure if this is feasible to fix before 8.2, but I'll take a > look. Actually, the changes were far more localized than I first feared. Please apply the attached patch to your copy and see what you get for your problem query. regards, tom lane
Attachment
Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join
From
"Thomas H."
Date:
>I wrote: >> This didn't matter a whole lot back when the planner couldn't reorder >> outer joins, but now that it can, it's more important that the estimates >> be accurate. > >> I'm not sure if this is feasible to fix before 8.2, but I'll take a >> look. > > Actually, the changes were far more localized than I first feared. > Please apply the attached patch to your copy and see what you get for > your problem query. thanks for the quick patch. unfortunately i'm stuck with the win32 version for now and haven't got the possibility to compile from source (yet)... maybe magnus can provide me a custom win32 executable of b3 that contains this patch [and the one for the xlog lockup]? best wishes, thomas