Thread: planer picks a bad plan (seq-scan instead of index) when adding an additional join

planer picks a bad plan (seq-scan instead of index) when adding an additional join

From

"Thomas H."

Date:

08 November 2006, 21:32:00

hi list.

as soon as i left-join an additional table, the query takes 24sec instead of
0.2sec, although the added fields have no impact on the resultset:

--------------------
SELECT * FROM shop.dvds
LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id
WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like
'%superman re%' OR lower(dvd_edition) LIKE '%superman re%')
--------------------
Hash Left Join  (cost=8402.16..10733.16 rows=39900 width=1276) (actual
time=260.712..260.722 rows=2 loops=1)
  Hash Cond: (dvds.dvd_mov_id = movies.mov_id)
  Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR
(lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text))
  ->  Seq Scan on dvds  (cost=0.00..1292.00 rows=39900 width=1062) (actual
time=0.036..23.594 rows=20866 loops=1)
  ->  Hash  (cost=8194.93..8194.93 rows=82893 width=214) (actual
time=168.121..168.121 rows=37417 loops=1)
        ->  Seq Scan on movies  (cost=0.00..8194.93 rows=82893 width=214)
(actual time=0.024..131.401 rows=37417 loops=1)
Total runtime: 264.193 ms
2 rows fetched
--------------------

now, an additional table (containing 600k records) is added through a left
join. all the sudden the query takes 24sec. although there are indices on
both dvds.dvd_ean and data_soundmedia.sm_info_ean, the planer does not make
use of the indices but rather chooses to do 2 seq-scans.

--------------------
SELECT * FROM shop.dvds
LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id
LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean
WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like
'%superman re%' OR lower(dvd_edition) LIKE '%superman re%')
--------------------
Hash Left Join  (cost=317592.21..326882.92 rows=159086 width=1936) (actual
time=21021.023..22242.253 rows=2 loops=1)
  Hash Cond: (dvds.dvd_mov_id = movies.mov_id)
  Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR
(lower((dvds.dvd_name)::text) ~~ '%superman re%'::text) OR
(lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text))
  ->  Merge Left Join  (cost=309190.05..313899.09 rows=159086 width=1722)
(actual time=19876.552..21902.007 rows=20866 loops=1)
        Merge Cond: ("outer"."?column20?" = "inner"."?column29?")
        ->  Sort  (cost=23027.68..23127.43 rows=39900 width=1062) (actual
time=507.886..520.143 rows=20866 loops=1)
              Sort Key: (dvds.dvd_ean)::text
              ->  Seq Scan on dvds  (cost=0.00..1292.00 rows=39900
width=1062) (actual time=0.047..100.415 rows=20866 loops=1)
        ->  Sort  (cost=286162.37..287781.38 rows=647601 width=660) (actual
time=19336.011..20328.247 rows=646633 loops=1)
              Sort Key: (data_soundmedia.sm_info_ean)::text
              ->  Seq Scan on data_soundmedia  (cost=0.00..31080.01
rows=647601 width=660) (actual time=0.074..2834.831 rows=647601 loops=1)
  ->  Hash  (cost=8194.93..8194.93 rows=82893 width=214) (actual
time=177.033..177.033 rows=37417 loops=1)
        ->  Seq Scan on movies  (cost=0.00..8194.93 rows=82893 width=214)
(actual time=0.118..129.716 rows=37417 loops=1)
Total runtime: 24419.939 ms
2 rows fetched
--------------------

shouldn't the planer join the additional table *after* filtering? even if it
does first joining then filtering, why isn't the existing index not used?

pgsql is 8.2beta2

thanks,
thomas

Re: planer picks a bad plan (seq-scan instead of index)

From

mike

Date:

09 November 2006, 00:02:57

Have you run analyze on all the three tables since creating the
database?

What On Thu, 2006-11-09 at 02:31 +0100, Thomas H. wrote:
> hi list.
>
> as soon as i left-join an additional table, the query takes 24sec instead of
> 0.2sec, although the added fields have no impact on the resultset:
>
> --------------------
> SELECT * FROM shop.dvds
> LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id
> WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like
> '%superman re%' OR lower(dvd_edition) LIKE '%superman re%')
> --------------------
> Hash Left Join  (cost=8402.16..10733.16 rows=39900 width=1276) (actual
> time=260.712..260.722 rows=2 loops=1)
>   Hash Cond: (dvds.dvd_mov_id = movies.mov_id)
>   Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR
> (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text))
>   ->  Seq Scan on dvds  (cost=0.00..1292.00 rows=39900 width=1062) (actual
> time=0.036..23.594 rows=20866 loops=1)
>   ->  Hash  (cost=8194.93..8194.93 rows=82893 width=214) (actual
> time=168.121..168.121 rows=37417 loops=1)
>         ->  Seq Scan on movies  (cost=0.00..8194.93 rows=82893 width=214)
> (actual time=0.024..131.401 rows=37417 loops=1)
> Total runtime: 264.193 ms
> 2 rows fetched
> --------------------
>
> now, an additional table (containing 600k records) is added through a left
> join. all the sudden the query takes 24sec. although there are indices on
> both dvds.dvd_ean and data_soundmedia.sm_info_ean, the planer does not make
> use of the indices but rather chooses to do 2 seq-scans.
>
> --------------------
> SELECT * FROM shop.dvds
> LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id
> LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean
> WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like
> '%superman re%' OR lower(dvd_edition) LIKE '%superman re%')
> --------------------
> Hash Left Join  (cost=317592.21..326882.92 rows=159086 width=1936) (actual
> time=21021.023..22242.253 rows=2 loops=1)
>   Hash Cond: (dvds.dvd_mov_id = movies.mov_id)
>   Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR
> (lower((dvds.dvd_name)::text) ~~ '%superman re%'::text) OR
> (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text))
>   ->  Merge Left Join  (cost=309190.05..313899.09 rows=159086 width=1722)
> (actual time=19876.552..21902.007 rows=20866 loops=1)
>         Merge Cond: ("outer"."?column20?" = "inner"."?column29?")
>         ->  Sort  (cost=23027.68..23127.43 rows=39900 width=1062) (actual
> time=507.886..520.143 rows=20866 loops=1)
>               Sort Key: (dvds.dvd_ean)::text
>               ->  Seq Scan on dvds  (cost=0.00..1292.00 rows=39900
> width=1062) (actual time=0.047..100.415 rows=20866 loops=1)
>         ->  Sort  (cost=286162.37..287781.38 rows=647601 width=660) (actual
> time=19336.011..20328.247 rows=646633 loops=1)
>               Sort Key: (data_soundmedia.sm_info_ean)::text
>               ->  Seq Scan on data_soundmedia  (cost=0.00..31080.01
> rows=647601 width=660) (actual time=0.074..2834.831 rows=647601 loops=1)
>   ->  Hash  (cost=8194.93..8194.93 rows=82893 width=214) (actual
> time=177.033..177.033 rows=37417 loops=1)
>         ->  Seq Scan on movies  (cost=0.00..8194.93 rows=82893 width=214)
> (actual time=0.118..129.716 rows=37417 loops=1)
> Total runtime: 24419.939 ms
> 2 rows fetched
> --------------------
>
> shouldn't the planer join the additional table *after* filtering? even if it
> does first joining then filtering, why isn't the existing index not used?
>
> pgsql is 8.2beta2
>
> thanks,
> thomas
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match

Re: planer picks a bad plan (seq-scan instead of index)

From

"Thomas H."

Date:

09 November 2006, 03:54:18

> Have you run analyze on all the three tables since creating the
> database?

yes. even a forced ANALYZE FULL after the table loads: the tables were
TRUNCATE'd, refilled and ANALYZE FULL'ed some minutes before the tests.
there where no UPDATEs after the INSERTs...

- thomas

Re: planer picks a bad plan (seq-scan instead of index)

From

Alban Hertroys

Date:

09 November 2006, 05:06:44

Thomas H. wrote:
> hi list.
>
> as soon as i left-join an additional table, the query takes 24sec
> instead of 0.2sec, although the added fields have no impact on the
> resultset:
>
> --------------------
> SELECT * FROM shop.dvds
> LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id
> WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like
> '%superman re%' OR lower(dvd_edition) LIKE '%superman re%')
> --------------------
> Hash Left Join  (cost=8402.16..10733.16 rows=39900 width=1276) (actual
> time=260.712..260.722 rows=2 loops=1)
>  Hash Cond: (dvds.dvd_mov_id = movies.mov_id)
>  Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR
> (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text))
>  ->  Seq Scan on dvds  (cost=0.00..1292.00 rows=39900 width=1062)
> (actual time=0.036..23.594 rows=20866 loops=1)
>  ->  Hash  (cost=8194.93..8194.93 rows=82893 width=214) (actual
> time=168.121..168.121 rows=37417 loops=1)
>        ->  Seq Scan on movies  (cost=0.00..8194.93 rows=82893 width=214)
> (actual time=0.024..131.401 rows=37417 loops=1)
> Total runtime: 264.193 ms
> 2 rows fetched
> --------------------

That's a pretty bad plan already, considering it does two seq-scans. I'm
pretty sure you can get that query to return in something close to 1ms.

Do you have indexes on any of dvds.dvd_mov_id, movies.mov_id,
lower(mov_name), lower(dvd_edition) or lower(dvd_name)?

I think that'd help.

If you already do have those indices, you may be running out of memory;
check for how much memory your postgres is set, the defaults are rather
modest.

> now, an additional table (containing 600k records) is added through a
> left join. all the sudden the query takes 24sec. although there are
> indices on both dvds.dvd_ean and data_soundmedia.sm_info_ean, the planer
> does not make use of the indices but rather chooses to do 2 seq-scans.
>
> --------------------
> SELECT * FROM shop.dvds
> LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id
> LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean

Make sure you have indexes on both sm_info_ean and dvd_ean.

Regards,
--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //

Re: planer picks a bad plan (seq-scan instead of index)

From

Richard Huxton

Date:

09 November 2006, 05:20:21

Thomas H. wrote:
> --------------------
> SELECT * FROM shop.dvds
> LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id
> LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean
> WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like
> '%superman re%' OR lower(dvd_edition) LIKE '%superman re%')
> --------------------

Try putting your conditions as part of the join:
SELECT * FROM shop.dvds
LEFT JOIN
   oldtables.movies
ON
   mov_id = dvd_mov_id
   AND (
     lower(mov_name) LIKE '%superman re%'
     OR lower(dvd_name) like '%superman re%'
     OR lower(dvd_edition) LIKE '%superman re%'
   )
LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean

I'd also be tempted to look at a tsearch2 setup for the word searches.
--
   Richard Huxton
   Archonet Ltd

Re: planer picks a bad plan (seq-scan instead of index)

From

"Gregory S. Williamson"

Date:

09 November 2006, 06:34:00

I am admittedly speaking up somewhat late here, and may be completely off base, but it seems to me that the "LIKE"
operationis almost always going to be a loser, performance-wise, when there is an initial wildcard, e.g. "%superman
re%"will require a sequential scan, while "superman re%" would not (assuming proper indexes matching case and type). 

I'd suggest tsearch2, possibly, which uses GIST indexes and may perhaps be a better match for this sort of problem.

HTH,

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From:    pgsql-general-owner@postgresql.org on behalf of Richard Huxton
Sent:    Thu 11/9/2006 1:22 AM
To:    Thomas H.
Cc:    pgsql-general@postgresql.org
Subject:    Re: [GENERAL] planer picks a bad plan (seq-scan instead of index)

Thomas H. wrote:
> --------------------
> SELECT * FROM shop.dvds
> LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id
> LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean
> WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like
> '%superman re%' OR lower(dvd_edition) LIKE '%superman re%')
> --------------------

Try putting your conditions as part of the join:
SELECT * FROM shop.dvds
LEFT JOIN
   oldtables.movies
ON
   mov_id = dvd_mov_id
   AND (
     lower(mov_name) LIKE '%superman re%'
     OR lower(dvd_name) like '%superman re%'
     OR lower(dvd_edition) LIKE '%superman re%'
   )
LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean

I'd also be tempted to look at a tsearch2 setup for the word searches.
--
   Richard Huxton
   Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match


-------------------------------------------------------
Click link below if it is SPAM gsw@globexplorer.com

"https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=4552efed289104295495211&user=gsw@globexplorer.com&retrain=spam&template=history&history_page=1"
!DSPAM:4552efed289104295495211!
-------------------------------------------------------

Re: planer picks a bad plan (seq-scan instead of index)

From

Russell Smith

Date:

09 November 2006, 07:31:34

Thomas H. wrote:
> hi list.
>
> as soon as i left-join an additional table, the query takes 24sec
> instead of 0.2sec, although the added fields have no impact on the
> resultset:
>
> --------------------
> SELECT * FROM shop.dvds
> LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id
> WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like
> '%superman re%' OR lower(dvd_edition) LIKE '%superman re%')
> --------------------
> Hash Left Join  (cost=8402.16..10733.16 rows=39900 width=1276) (actual
> time=260.712..260.722 rows=2 loops=1)
>  Hash Cond: (dvds.dvd_mov_id = movies.mov_id)
>  Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR
> (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text))
>  ->  Seq Scan on dvds  (cost=0.00..1292.00 rows=39900 width=1062)
> (actual time=0.036..23.594 rows=20866 loops=1)
>  ->  Hash  (cost=8194.93..8194.93 rows=82893 width=214) (actual
> time=168.121..168.121 rows=37417 loops=1)
>        ->  Seq Scan on movies  (cost=0.00..8194.93 rows=82893
> width=214) (actual time=0.024..131.401 rows=37417 loops=1)
> Total runtime: 264.193 ms
> 2 rows fetched
> --------------------
>
1. You MUST sequence scan dvds, as there is no way to do an index search
on a like with % at the beginning.
2. You are asking for a left join on dvds, which means you want all
records, so you must sequence scan dvds.  The filters are all OR, so you
can't say that a records is excluded until AFTER the join is completed.
3. The join estimates that 39900 rows will come out, but only 2 do,
thats out by a large factor 10^4, which means that it's not going to
join movies after the filter is applied.

> now, an additional table (containing 600k records) is added through a
> left join. all the sudden the query takes 24sec. although there are
> indices on both dvds.dvd_ean and data_soundmedia.sm_info_ean, the
> planer does not make use of the indices but rather chooses to do 2
> seq-scans.
The 2 items from the first query are still true.  You just have a lot
more records to play with now by joining in a 3rd table.  Which means
there is no way to reduce the possible output set before the join.

->  Sort  (cost=286162.37..287781.38 rows=647601 width=660) (actual
time=19336.011..20328.247 rows=646633 loops=1)
This sort here is where nearly all of the 24 seconds goes. I am assuming
at this point that the Merge Join is chosen because of the stats problem
listed next;

 >  Merge Left Join  (cost=309190.05..313899.09 rows=159086 width=1722)
(actual time=19876.552..21902.007 rows=20866 loops=1)
This estimate is out by a factor of 10.  I'd increase the statistics on
the join columns to see if it decides to use a different join method.

If the estimate for the Merge join is fixed to be closed, then it's
likely an index scan would be chosen, 159000 is about 25% of the table.
Assuming it's small on disk then it's faster to do a seq_scan than all
the random I/O to read the index, then the heap to produce results.

How big is data_soundmedia? Mb size, not rows.
How much is your sort_mem/work_mem?
>
> --------------------
> SELECT * FROM shop.dvds
> LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id
> LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean
> WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like
> '%superman re%' OR lower(dvd_edition) LIKE '%superman re%')
> --------------------
> Hash Left Join  (cost=317592.21..326882.92 rows=159086 width=1936)
> (actual time=21021.023..22242.253 rows=2 loops=1)
>  Hash Cond: (dvds.dvd_mov_id = movies.mov_id)
>  Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR
> (lower((dvds.dvd_name)::text) ~~ '%superman re%'::text) OR
> (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text))
>  ->  Merge Left Join  (cost=309190.05..313899.09 rows=159086
> width=1722) (actual time=19876.552..21902.007 rows=20866 loops=1)
>        Merge Cond: ("outer"."?column20?" = "inner"."?column29?")
>        ->  Sort  (cost=23027.68..23127.43 rows=39900 width=1062)
> (actual time=507.886..520.143 rows=20866 loops=1)
>              Sort Key: (dvds.dvd_ean)::text
>              ->  Seq Scan on dvds  (cost=0.00..1292.00 rows=39900
> width=1062) (actual time=0.047..100.415 rows=20866 loops=1)
>        ->  Sort  (cost=286162.37..287781.38 rows=647601 width=660)
> (actual time=19336.011..20328.247 rows=646633 loops=1)
>              Sort Key: (data_soundmedia.sm_info_ean)::text
>              ->  Seq Scan on data_soundmedia  (cost=0.00..31080.01
> rows=647601 width=660) (actual time=0.074..2834.831 rows=647601 loops=1)
>  ->  Hash  (cost=8194.93..8194.93 rows=82893 width=214) (actual
> time=177.033..177.033 rows=37417 loops=1)
>        ->  Seq Scan on movies  (cost=0.00..8194.93 rows=82893
> width=214) (actual time=0.118..129.716 rows=37417 loops=1)
> Total runtime: 24419.939 ms
> 2 rows fetched
> --------------------
>
> shouldn't the planer join the additional table *after* filtering? even
> if it does first joining then filtering, why isn't the existing index
> not used?
I would fix the above anomalies before asking these questions.

Regards

Russell Smith

Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join

From

"Thomas H."

Date:

09 November 2006, 07:59:57

> 1. You MUST sequence scan dvds, as there is no way to do an index search
> on a like with % at the beginning.
> 2. You are asking for a left join on dvds, which means you want all
> records, so you must sequence scan dvds.  The filters are all OR, so you
> can't say that a records is excluded until AFTER the join is completed.

true, but thats fast (200ms).

> 3. The join estimates that 39900 rows will come out, but only 2 do, thats
> out by a large factor 10^4, which means that it's not going to join movies
> after the filter is applied.


20866 is the total number of rows in the dvd table. the planer is now
showing the accurate rate after another (auto)vacuum run. of course it can't
know the estimate of a '%...' comparsion, so estimating the full result set
is ok.

>> now, an additional table (containing 600k records) is added through a
>> left join. all the sudden the query takes 24sec. although there are
>> indices on both dvds.dvd_ean and data_soundmedia.sm_info_ean, the planer
>> does not make use of the indices but rather chooses to do 2 seq-scans.
> The 2 items from the first query are still true.  You just have a lot more
> records to play with now by joining in a 3rd table.  Which means there is
> no way to reduce the possible output set before the join.


well, under normal cases, the output set would be determined by index
lookups (see bottom)

>
> ->  Sort  (cost=286162.37..287781.38 rows=647601 width=660) (actual
> time=19336.011..20328.247 rows=646633 loops=1)
> This sort here is where nearly all of the 24 seconds goes. I am assuming
> at this point that the Merge Join is chosen because of the stats problem
> listed next;


exactly. but that sort shouldn't happen as there is an index on the
join-field, and that is usualy pretty fast (~400ms), but not here...

>
> >  Merge Left Join  (cost=309190.05..313899.09 rows=159086 width=1722)
> (actual time=19876.552..21902.007 rows=20866 loops=1)
> This estimate is out by a factor of 10.  I'd increase the statistics on
> the join columns to see if it decides to use a different join method.
>
> If the estimate for the Merge join is fixed to be closed, then it's likely
> an index scan would be chosen, 159000 is about 25% of the table.  Assuming
> it's small on disk then it's faster to do a seq_scan than all the random
> I/O to read the index, then the heap to produce results.


the discs are not the fastest, but there is plenty of free ram available.
seq_page_cost is in its default state (1.0). should i raise this?

> How big is data_soundmedia? Mb size, not rows.
> How much is your sort_mem/work_mem?

data_soundmedia is 195mb + 105mb for indices (6 fields indexed).
work_mem = 30MB, sort_mem is undefined in 8.2's postgresql.conf

what troubles me is that its only slow with this table (data_soundmedia). we
have other tables (400k and 200k entries) that are joined with the same
query in under 400ms total. these tables do have the exact same structure
and indices defined, but in these joins the planer properly uses an Index
Scann...

if i force enable_seqscan = off, the planer makes use of the index,
resulting in acceptable query speed:

Nested Loop Left Join  (cost=8402.16..257761.36 rows=83223 width=1067)
(actual time=361.931..713.405 rows=2 loops=1)
  ->  Hash Left Join  (cost=8402.16..11292.37 rows=20873 width=407) (actual
time=322.085..666.519 rows=2 loops=1)
        Hash Cond: (dvds.dvd_mov_id = movies.mov_id)
        Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text)
OR (lower((dvds.dvd_name)::text) ~~ '%superman re%'::text) OR
(lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text))
        ->  Seq Scan on dvds  (cost=0.00..804.73 rows=20873 width=193)
(actual time=11.781..329.672 rows=20866 loops=1)
        ->  Hash  (cost=8194.93..8194.93 rows=82893 width=214) (actual
time=200.823..200.823 rows=37418 loops=1)
              ->  Seq Scan on movies  (cost=0.00..8194.93 rows=82893
width=214) (actual time=0.070..155.178 rows=37418 loops=1)
  ->  Index Scan using data_soundmedia_info_ean_idx on data_soundmedia
(cost=0.00..11.76 rows=4 width=660) (actual time=23.424..23.428 rows=1
loops=2)
        Index Cond: ((data_soundmedia.sm_info_ean)::text =
(dvds.dvd_ean)::text)
Total runtime: 716.988 ms

2 rows fetched (821 ms)


could it be the index gets somehow corrupted? but on the other hand, if i do
a TRUNCATE before loading new data, it should be rebuild anyway, shouldn't
it?

thanks,
thomas

Re: planer picks a bad plan (seq-scan instead of index)

From

"Thomas H."

Date:

09 November 2006, 08:02:46

> Try putting your conditions as part of the join:
> SELECT * FROM shop.dvds
> LEFT JOIN
>   oldtables.movies
> ON
>   mov_id = dvd_mov_id
>   AND (
>     lower(mov_name) LIKE '%superman re%'
>     OR lower(dvd_name) like '%superman re%'
>     OR lower(dvd_edition) LIKE '%superman re%'
>   )
> LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean
>


unfortunately its getting optimized into the same plan :-)


> I'd also be tempted to look at a tsearch2 setup for the word searches.


tsearch2 doesn't work that well for exact matches (including special chars).
but the culprit here isn't the '%...'%' seqscan, but rather the additional
joined table (where no lookup except for the join-column takes place) that
makes the query going from 200ms to 24sec.

regards,
thomas

Re: planer picks a bad plan (seq-scan instead of index)

From

Richard Huxton

Date:

09 November 2006, 08:10:38

Thomas H. wrote:
>> Try putting your conditions as part of the join:
>> SELECT * FROM shop.dvds
>> LEFT JOIN
>>   oldtables.movies
>> ON
>>   mov_id = dvd_mov_id
>>   AND (
>>     lower(mov_name) LIKE '%superman re%'
>>     OR lower(dvd_name) like '%superman re%'
>>     OR lower(dvd_edition) LIKE '%superman re%'
>>   )
>> LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean
>
> unfortunately its getting optimized into the same plan :-)

OK - in that case try explicit subqueries:

SELECT ... FROM
(SELECT * FROM shop.dvds
  LEFT JOIN shop.oldtables.movies
  WHERE lower(mov_name) LIKE ...
) AS bar
LEFT JOIN shop.data_soundmedia

>> I'd also be tempted to look at a tsearch2 setup for the word searches.
>
>
> tsearch2 doesn't work that well for exact matches (including special
> chars). but the culprit here isn't the '%...'%' seqscan, but rather the
> additional joined table (where no lookup except for the join-column
> takes place) that makes the query going from 200ms to 24sec.

Agreed, but I'd still be inclined to let tsearch do a first filter then
limit the results with LIKE.

--
   Richard Huxton
   Archonet Ltd

Re: planer picks a bad plan (seq-scan instead of index)

From

"Thomas H."

Date:

09 November 2006, 08:42:21

> OK - in that case try explicit subqueries:
>
> SELECT ... FROM
> (SELECT * FROM shop.dvds
>  LEFT JOIN shop.oldtables.movies
>  WHERE lower(mov_name) LIKE ...
> ) AS bar
> LEFT JOIN shop.data_soundmedia


same result, have tried this as well (22sec). it's the LEFT JOIN
shop.data_soundmedia for which the planer picks a seqscan instead of index
scan, no matter what...

>>> I'd also be tempted to look at a tsearch2 setup for the word searches.
>>
>>
>> tsearch2 doesn't work that well for exact matches (including special
>> chars). but the culprit here isn't the '%...'%' seqscan, but rather the
>> additional joined table (where no lookup except for the join-column takes
>> place) that makes the query going from 200ms to 24sec.
>
> Agreed, but I'd still be inclined to let tsearch do a first filter then
> limit the results with LIKE.


would be a way to probably speed up the seqscan on shop.dvds that takes now
200ms. unfortunately, tsearch2 is broken for me in 8.2 (filling tsearch2
tvector columns crashes backend). but thats a different story :-)

- thomas

Re: planer picks a bad plan (seq-scan instead of index)

From

Richard Huxton

Date:

09 November 2006, 09:16:15

Thomas H. wrote:
>> OK - in that case try explicit subqueries:
>>
>> SELECT ... FROM
>> (SELECT * FROM shop.dvds
>>  LEFT JOIN shop.oldtables.movies
>>  WHERE lower(mov_name) LIKE ...
>> ) AS bar
>> LEFT JOIN shop.data_soundmedia
>
>
> same result, have tried this as well (22sec). it's the LEFT JOIN
> shop.data_soundmedia for which the planer picks a seqscan instead of
> index scan, no matter what...

Two things to try:
1. "SET enable_seqscan = false" and see if that forces it. If not
there's something very odd
2. Try adding a LIMIT 99 to the inner query (bar) so PG knows how many
(few) rows will emerge.

I'm guessing we're up against PG's poor estimate on the '%...%' filter.
If you were getting 160,000 rows in the final result then a seq-scan
might well be the way to go.

The only workaround that I can think of (if we can't persuade the
planner to cooperate) is to build a temp-table containing dvd_ean's for
the first part of the query then analyse it and join against that. That
way PG's row estimate will be accurate regardless of your text filtering.

--
   Richard Huxton
   Archonet Ltd

Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join

From

Tom Lane

Date:

09 November 2006, 19:52:53

"Thomas H." <me@alternize.com> writes:
> SELECT * FROM shop.dvds
> LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id
> LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean
> WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like
> '%superman re%' OR lower(dvd_edition) LIKE '%superman re%')

Um, what's the datatype of sm_info_ean and dvd_ean exactly?

            regards, tom lane

Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join

From

"Thomas H."

Date:

09 November 2006, 20:04:15

>> SELECT * FROM shop.dvds
>> LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id
>> LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean
>> WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like
>> '%superman re%' OR lower(dvd_edition) LIKE '%superman re%')
>
> Um, what's the datatype of sm_info_ean and dvd_ean exactly?

varchar(15) and varchar(14)

i can make them same width if that could help - just saw the same field on
the other tables are limited to 14...

regards,
thomas

Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join

From

Tom Lane

Date:

09 November 2006, 20:40:10

"Thomas H." <me@alternize.com> writes:
>> Um, what's the datatype of sm_info_ean and dvd_ean exactly?

> varchar(15) and varchar(14)

OK.  I was wondering if you'd tried to use the new contrib/isn code and
it was messing up the estimates somehow.  Seems like a red herring.

After looking more closely, I think the issue is that the size of the
dvds/movies join is estimated at 39900 rows when it's really only 2,
and this estimate discourages the planner from using a nestloop join of
that join against data_soundmedia.  We were thinking this was entirely
because of poor estimation of the ~~ operators, but I suddenly realize
that what's happening is that the size of the left join is being clamped
to be at least the size of its left input --- that is, the planner is
failing to distinguish JOIN/ON clauses (which can't suppress left-side
rows) from WHERE clauses (which can).  Per comment in
set_joinrel_size_estimates:

     * Basically, we multiply size of Cartesian product by selectivity.
     *
     * If we are doing an outer join, take that into account: the output must
     * be at least as large as the non-nullable input.  (Is there any chance
     * of being even smarter?)  (XXX this is not really right, because it
     * assumes all the restriction clauses are join clauses; we should figure
     * pushed-down clauses separately.)

This didn't matter a whole lot back when the planner couldn't reorder
outer joins, but now that it can, it's more important that the estimates
be accurate.

I'm not sure if this is feasible to fix before 8.2, but I'll take a
look.

            regards, tom lane

Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join

From

Tom Lane

Date:

09 November 2006, 21:29:42

I wrote:
> This didn't matter a whole lot back when the planner couldn't reorder
> outer joins, but now that it can, it's more important that the estimates
> be accurate.

> I'm not sure if this is feasible to fix before 8.2, but I'll take a
> look.

Actually, the changes were far more localized than I first feared.
Please apply the attached patch to your copy and see what you get for
your problem query.

            regards, tom lane

Attachment

msg-32091-126171.bin

Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join

From

"Thomas H."

Date:

09 November 2006, 22:09:37

>I wrote:
>> This didn't matter a whole lot back when the planner couldn't reorder
>> outer joins, but now that it can, it's more important that the estimates
>> be accurate.
>
>> I'm not sure if this is feasible to fix before 8.2, but I'll take a
>> look.
>
> Actually, the changes were far more localized than I first feared.
> Please apply the attached patch to your copy and see what you get for
> your problem query.

thanks for the quick patch. unfortunately i'm stuck with the win32 version
for now and haven't got the possibility to compile from source (yet)...

maybe magnus can provide me a custom win32 executable of b3 that contains
this patch [and the one for the xlog lockup]?

best wishes,
thomas