Thread: postgres 8.2 seems to prefer Seq Scan

postgres 8.2 seems to prefer Seq Scan

From
"Alex Deucher"
Date:
One more anomaly between 7.4 and 8.2.  DB dumped from 7.4 and loaded
onto 8.2, both have locale set to C.  8.2 seems to prefer Seq Scans
for the first query while the ordering in the second query seems to
perform worse on 8.2.  I ran analyze.  I've tried with the encoding
set to UTF-8 and SQL_ASCII; same numbers and plans.  Any ideas how to
improve this?

Thanks,

Alex

postgres 7.4

EXPLAIN ANALYZE select pnum, event_pid, code_name, code_description,
code_mcam, event_date, effective_date, ref_country,
ref_country_legal_code, corresponding_pnum, withdrawal_date,
payment_date, extension_date, fee_payment_year, requester, free_form
from code inner join event on code_pid = code_pid_fk where pnum
='AB5819188';
                                                       QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..60.87 rows=19 width=231) (actual
time=0.065..0.065 rows=0 loops=1)
   ->  Index Scan using pnum_idx on event  (cost=0.00..3.37 rows=19
width=172) (actual time=0.063..0.063 rows=0 loops=1)
         Index Cond: ((pnum)::text = 'AB5819188'::text)
   ->  Index Scan using code_pkey on code  (cost=0.00..3.01 rows=1
width=67) (never executed)
         Index Cond: (code.code_pid = "outer".code_pid_fk)
 Total runtime: 0.242 ms
(6 rows)


postgres 8.2

EXPLAIN ANALYZE select pnum, event_pid, code_name, code_description,
code_mcam, event_date, effective_date, ref_country,
ref_country_legal_code, corresponding_pnum, withdrawal_date,
payment_date, extension_date, fee_payment_year, requester, free_form
from code inner join event on code_pid = code_pid_fk where pnum
='AB5819188';
                                                          QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=106.91..3283.46 rows=1779 width=230) (actual
time=10.383..10.390 rows=1 loops=1)
   Hash Cond: (event.code_pid_fk = code.code_pid)
   ->  Index Scan using pnum_idx on event  (cost=0.00..3147.63
rows=1779 width=171) (actual time=0.030..0.033 rows=1 loops=1)
         Index Cond: ((pnum)::text = 'AB5819188'::text)
   ->  Hash  (cost=70.85..70.85 rows=2885 width=67) (actual
time=10.329..10.329 rows=2885 loops=1)
         ->  Seq Scan on code  (cost=0.00..70.85 rows=2885 width=67)
(actual time=0.013..4.805 rows=2885 loops=1)
 Total runtime: 10.490 ms
(7 rows)


postgres 7.4

EXPLAIN ANALYZE select e.pnum, c.code_description, c.code_mcam,
e.event_pid from event e, code c where c.code_name =
e.ref_country_legal_code and c.code_country = e.ref_country and e.pnum
= 'AB5819188';

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3.47..106.28 rows=1 width=73) (actual
time=7.795..7.795 rows=0 loops=1)
   Hash Cond: ((("outer".code_name)::text =
("inner".ref_country_legal_code)::text) AND
(("outer".code_country)::text = ("inner".ref_country)::text))
   ->  Seq Scan on code c  (cost=0.00..63.92 rows=2592 width=69)
(actual time=0.010..3.881 rows=2592 loops=1)
   ->  Hash  (cost=3.37..3.37 rows=19 width=30) (actual
time=0.064..0.064 rows=0 loops=1)
         ->  Index Scan using pnum_idx on event e  (cost=0.00..3.37
rows=19 width=30) (actual time=0.062..0.062 rows=0 loops=1)
               Index Cond: ((pnum)::text = 'AB5819188'::text)
 Total runtime: 7.947 ms
(7 rows)


postgres 8.2

EXPLAIN ANALYZE select e.pnum, c.code_description, c.code_mcam,
e.event_pid from event e, code c where c.code_name =
e.ref_country_legal_code and c.code_country = e.ref_country and e.pnum
= 'AB5819188';
                                                          QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=114.12..3368.51 rows=1 width=73) (actual
time=10.900..10.900 rows=0 loops=1)
   Hash Cond: (((e.ref_country_legal_code)::text =
(c.code_name)::text) AND ((e.ref_country)::text =
(c.code_country)::text))
   ->  Index Scan using pnum_idx on event e  (cost=0.00..3147.63
rows=1779 width=30) (actual time=0.027..0.031 rows=1 loops=1)
         Index Cond: ((pnum)::text = 'AB5819188'::text)
   ->  Hash  (cost=70.85..70.85 rows=2885 width=69) (actual
time=10.838..10.838 rows=2885 loops=1)
         ->  Seq Scan on code c  (cost=0.00..70.85 rows=2885 width=69)
(actual time=0.011..4.863 rows=2885 loops=1)
 Total runtime: 11.018 ms
(7 rows)

Re: postgres 8.2 seems to prefer Seq Scan

From
Michael Fuhr
Date:
On Fri, Apr 06, 2007 at 04:38:33PM -0400, Alex Deucher wrote:
> One more anomaly between 7.4 and 8.2.  DB dumped from 7.4 and loaded
> onto 8.2, both have locale set to C.  8.2 seems to prefer Seq Scans
> for the first query while the ordering in the second query seems to
> perform worse on 8.2.  I ran analyze.  I've tried with the encoding
> set to UTF-8 and SQL_ASCII; same numbers and plans.  Any ideas how to
> improve this?

Are you sure the data sets are identical?  The 7.4 query returned
0 rows; the 8.2 query returned 1 row.  If you're running the same
query against the same data in both versions then at least one of
them appears to be returning the wrong result.  Exactly which
versions of 7.4 and 8.2 are you running?

Have you analyzed all tables in both versions?  The row count
estimate in 7.4 is much closer to reality than in 8.2:

7.4
>   ->  Index Scan using pnum_idx on event  (cost=0.00..3.37 rows=19
> width=172) (actual time=0.063..0.063 rows=0 loops=1)
>         Index Cond: ((pnum)::text = 'AB5819188'::text)

8.2
>   ->  Index Scan using pnum_idx on event  (cost=0.00..3147.63
> rows=1779 width=171) (actual time=0.030..0.033 rows=1 loops=1)
>         Index Cond: ((pnum)::text = 'AB5819188'::text)

If analyzing the event table doesn't improve the row count estimate
then try increasing the statistics target for event.pnum and analyzing
again.  Example:

ALTER TABLE event ALTER pnum SET STATISTICS 100;
ANALYZE event;

You can set the statistics target as high as 1000 to get more
accurate results at the cost of longer ANALYZE times.

--
Michael Fuhr

Re: postgres 8.2 seems to prefer Seq Scan

From
"Alex Deucher"
Date:
On 4/6/07, Michael Fuhr <mike@fuhr.org> wrote:
> On Fri, Apr 06, 2007 at 04:38:33PM -0400, Alex Deucher wrote:
> > One more anomaly between 7.4 and 8.2.  DB dumped from 7.4 and loaded
> > onto 8.2, both have locale set to C.  8.2 seems to prefer Seq Scans
> > for the first query while the ordering in the second query seems to
> > perform worse on 8.2.  I ran analyze.  I've tried with the encoding
> > set to UTF-8 and SQL_ASCII; same numbers and plans.  Any ideas how to
> > improve this?
>
> Are you sure the data sets are identical?  The 7.4 query returned
> 0 rows; the 8.2 query returned 1 row.  If you're running the same
> query against the same data in both versions then at least one of
> them appears to be returning the wrong result.  Exactly which
> versions of 7.4 and 8.2 are you running?

They should be although it's possible one of my co-workers updated one
of the DB's since I last dumped it, but should be a negligible amount
of data.  Not sure of the exact version of 7.4; psql just says:
psql --version
psql (PostgreSQL) 7.4
contains support for command-line editing

8.2 is 8.2.3

>
> Have you analyzed all tables in both versions?  The row count
> estimate in 7.4 is much closer to reality than in 8.2:
>

Yes.

> 7.4
> >   ->  Index Scan using pnum_idx on event  (cost=0.00..3.37 rows=19
> > width=172) (actual time=0.063..0.063 rows=0 loops=1)
> >         Index Cond: ((pnum)::text = 'AB5819188'::text)
>
> 8.2
> >   ->  Index Scan using pnum_idx on event  (cost=0.00..3147.63
> > rows=1779 width=171) (actual time=0.030..0.033 rows=1 loops=1)
> >         Index Cond: ((pnum)::text = 'AB5819188'::text)
>
> If analyzing the event table doesn't improve the row count estimate
> then try increasing the statistics target for event.pnum and analyzing
> again.  Example:
>
> ALTER TABLE event ALTER pnum SET STATISTICS 100;
> ANALYZE event;
>
> You can set the statistics target as high as 1000 to get more
> accurate results at the cost of longer ANALYZE times.
>

Thanks!  I'll give that a try and report back.

Alex

Re: postgres 8.2 seems to prefer Seq Scan

From
"Alex Deucher"
Date:
On 4/6/07, Michael Fuhr <mike@fuhr.org> wrote:
> On Fri, Apr 06, 2007 at 04:38:33PM -0400, Alex Deucher wrote:
> > One more anomaly between 7.4 and 8.2.  DB dumped from 7.4 and loaded
> > onto 8.2, both have locale set to C.  8.2 seems to prefer Seq Scans
> > for the first query while the ordering in the second query seems to
> > perform worse on 8.2.  I ran analyze.  I've tried with the encoding
> > set to UTF-8 and SQL_ASCII; same numbers and plans.  Any ideas how to
> > improve this?
>
> Are you sure the data sets are identical?  The 7.4 query returned
> 0 rows; the 8.2 query returned 1 row.  If you're running the same
> query against the same data in both versions then at least one of
> them appears to be returning the wrong result.  Exactly which
> versions of 7.4 and 8.2 are you running?
>
> Have you analyzed all tables in both versions?  The row count
> estimate in 7.4 is much closer to reality than in 8.2:
>
> 7.4
> >   ->  Index Scan using pnum_idx on event  (cost=0.00..3.37 rows=19
> > width=172) (actual time=0.063..0.063 rows=0 loops=1)
> >         Index Cond: ((pnum)::text = 'AB5819188'::text)
>
> 8.2
> >   ->  Index Scan using pnum_idx on event  (cost=0.00..3147.63
> > rows=1779 width=171) (actual time=0.030..0.033 rows=1 loops=1)
> >         Index Cond: ((pnum)::text = 'AB5819188'::text)
>
> If analyzing the event table doesn't improve the row count estimate
> then try increasing the statistics target for event.pnum and analyzing
> again.  Example:
>
> ALTER TABLE event ALTER pnum SET STATISTICS 100;
> ANALYZE event;
>
> You can set the statistics target as high as 1000 to get more
> accurate results at the cost of longer ANALYZE times.
>

Setting statistics to 400 seems to be the sweet spot.  Values above
that seem to only marginally improve performance.  However, I have to
disable seqscan in order for the query to be fast.  Why does the query
planner insist on doing a seq scan?  Is there anyway to make it prefer
the index scan?

Thanks,

Alex

postgres 8.2

db=# EXPLAIN ANALYZE select pnum, event_pid, code_name,
code_description, code_mcam, event_date, effective_date, ref_country,
ref_country_legal_code, corresponding_pnum, withdrawal_date,
payment_date, extension_date, fee_payment_year, requester, free_form
from code inner join event on code_pid = code_pid_fk where pnum
='US5819188';
                                                        QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=106.91..210.37 rows=54 width=229) (actual
time=11.245..11.253 rows=1 loops=1)
   Hash Cond: (event.code_pid_fk = code.code_pid)
   ->  Index Scan using pnum_idx on event  (cost=0.00..102.58 rows=54
width=170) (actual time=0.108..0.112 rows=1 loops=1)
         Index Cond: ((pnum)::text = 'US5819188'::text)
   ->  Hash  (cost=70.85..70.85 rows=2885 width=67) (actual
time=11.006..11.006 rows=2885 loops=1)
         ->  Seq Scan on code  (cost=0.00..70.85 rows=2885 width=67)
(actual time=0.025..5.392 rows=2885 loops=1)
 Total runtime: 11.429 ms
(7 rows)

db=# set enable_seqscan=0;
SET
db=# EXPLAIN ANALYZE select pnum, event_pid, code_name,
code_description, code_mcam, event_date, effective_date, ref_country,
ref_country_legal_code, corresponding_pnum, withdrawal_date,
payment_date, extension_date, fee_payment_year, requester, free_form
from code inner join event on code_pid = code_pid_fk where pnum
='US5819188';
                                                        QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..289.72 rows=54 width=229) (actual
time=0.068..0.076 rows=1 loops=1)
   ->  Index Scan using pnum_idx on event  (cost=0.00..102.58 rows=54
width=170) (actual time=0.019..0.020 rows=1 loops=1)
         Index Cond: ((pnum)::text = 'US5819188'::text)
   ->  Index Scan using code_pkey on code  (cost=0.00..3.45 rows=1
width=67) (actual time=0.041..0.043 rows=1 loops=1)
         Index Cond: (code.code_pid = event.code_pid_fk)
 Total runtime: 0.126 ms
(6 rows)

Re: postgres 8.2 seems to prefer Seq Scan

From
"Alex Deucher"
Date:
On 4/9/07, Alex Deucher <alexdeucher@gmail.com> wrote:
> On 4/6/07, Michael Fuhr <mike@fuhr.org> wrote:
> > On Fri, Apr 06, 2007 at 04:38:33PM -0400, Alex Deucher wrote:
> > > One more anomaly between 7.4 and 8.2.  DB dumped from 7.4 and loaded
> > > onto 8.2, both have locale set to C.  8.2 seems to prefer Seq Scans
> > > for the first query while the ordering in the second query seems to
> > > perform worse on 8.2.  I ran analyze.  I've tried with the encoding
> > > set to UTF-8 and SQL_ASCII; same numbers and plans.  Any ideas how to
> > > improve this?
> >
> > Are you sure the data sets are identical?  The 7.4 query returned
> > 0 rows; the 8.2 query returned 1 row.  If you're running the same
> > query against the same data in both versions then at least one of
> > them appears to be returning the wrong result.  Exactly which
> > versions of 7.4 and 8.2 are you running?
> >
> > Have you analyzed all tables in both versions?  The row count
> > estimate in 7.4 is much closer to reality than in 8.2:
> >
> > 7.4
> > >   ->  Index Scan using pnum_idx on event  (cost=0.00..3.37 rows=19
> > > width=172) (actual time=0.063..0.063 rows=0 loops=1)
> > >         Index Cond: ((pnum)::text = 'AB5819188'::text)
> >
> > 8.2
> > >   ->  Index Scan using pnum_idx on event  (cost=0.00..3147.63
> > > rows=1779 width=171) (actual time=0.030..0.033 rows=1 loops=1)
> > >         Index Cond: ((pnum)::text = 'AB5819188'::text)
> >
> > If analyzing the event table doesn't improve the row count estimate
> > then try increasing the statistics target for event.pnum and analyzing
> > again.  Example:
> >
> > ALTER TABLE event ALTER pnum SET STATISTICS 100;
> > ANALYZE event;
> >
> > You can set the statistics target as high as 1000 to get more
> > accurate results at the cost of longer ANALYZE times.
> >
>
> Setting statistics to 400 seems to be the sweet spot.  Values above
> that seem to only marginally improve performance.  However, I have to
> disable seqscan in order for the query to be fast.  Why does the query
> planner insist on doing a seq scan?  Is there anyway to make it prefer
> the index scan?
>

FWIW, disabling seqscan also makes the second query much faster:

EXPLAIN ANALYZE select e.pnum, c.code_description, c.code_mcam,
e.event_pid from event e, code c where c.code_name =
e.ref_country_legal_code and c.code_country = e.ref_country and e.pnum
= 'US5819188';
                                                             QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=104.13..257.65 rows=1 width=73) (actual
time=0.038..0.038 rows=0 loops=1)
   Merge Cond: ((c.code_country)::text = "inner"."?column5?")
   Join Filter: ((c.code_name)::text = (e.ref_country_legal_code)::text)
   ->  Index Scan using code_country_idx on code c  (cost=0.00..134.00
rows=2885 width=69) (actual time=0.012..0.012 rows=1 loops=1)
   ->  Sort  (cost=104.13..104.27 rows=54 width=30) (actual
time=0.019..0.021 rows=1 loops=1)
         Sort Key: (e.ref_country)::text
         ->  Index Scan using pnum_idx on event e  (cost=0.00..102.58
rows=54 width=30) (actual time=0.010..0.012 rows=1 loops=1)
               Index Cond: ((pnum)::text = 'US5819188'::text)
 Total runtime: 0.072 ms
(9 rows)

Re: postgres 8.2 seems to prefer Seq Scan

From
"Alex Deucher"
Date:
On 4/9/07, Alex Deucher <alexdeucher@gmail.com> wrote:
> On 4/9/07, Alex Deucher <alexdeucher@gmail.com> wrote:
> > On 4/6/07, Michael Fuhr <mike@fuhr.org> wrote:
> > > On Fri, Apr 06, 2007 at 04:38:33PM -0400, Alex Deucher wrote:
> > > > One more anomaly between 7.4 and 8.2.  DB dumped from 7.4 and loaded
> > > > onto 8.2, both have locale set to C.  8.2 seems to prefer Seq Scans
> > > > for the first query while the ordering in the second query seems to
> > > > perform worse on 8.2.  I ran analyze.  I've tried with the encoding
> > > > set to UTF-8 and SQL_ASCII; same numbers and plans.  Any ideas how to
> > > > improve this?
> > >
> > > Are you sure the data sets are identical?  The 7.4 query returned
> > > 0 rows; the 8.2 query returned 1 row.  If you're running the same
> > > query against the same data in both versions then at least one of
> > > them appears to be returning the wrong result.  Exactly which
> > > versions of 7.4 and 8.2 are you running?
> > >
> > > Have you analyzed all tables in both versions?  The row count
> > > estimate in 7.4 is much closer to reality than in 8.2:
> > >
> > > 7.4
> > > >   ->  Index Scan using pnum_idx on event  (cost=0.00..3.37 rows=19
> > > > width=172) (actual time=0.063..0.063 rows=0 loops=1)
> > > >         Index Cond: ((pnum)::text = 'AB5819188'::text)
> > >
> > > 8.2
> > > >   ->  Index Scan using pnum_idx on event  (cost=0.00..3147.63
> > > > rows=1779 width=171) (actual time=0.030..0.033 rows=1 loops=1)
> > > >         Index Cond: ((pnum)::text = 'AB5819188'::text)
> > >
> > > If analyzing the event table doesn't improve the row count estimate
> > > then try increasing the statistics target for event.pnum and analyzing
> > > again.  Example:
> > >
> > > ALTER TABLE event ALTER pnum SET STATISTICS 100;
> > > ANALYZE event;
> > >
> > > You can set the statistics target as high as 1000 to get more
> > > accurate results at the cost of longer ANALYZE times.
> > >
> >
> > Setting statistics to 400 seems to be the sweet spot.  Values above
> > that seem to only marginally improve performance.  However, I have to
> > disable seqscan in order for the query to be fast.  Why does the query
> > planner insist on doing a seq scan?  Is there anyway to make it prefer
> > the index scan?
> >

Ok, it looks like bumping up the the stats to 400 did the trick.  It
seems my test sets were not a good representation of the queries.  The
sets I was using were more of an exception to the rule since they were
hitting comparatively fewer rows that most others.  Thanks to everyone
on the list and IRC for their help.

Alex