Re: Bad plan by Planner (Already resolved?) - Mailing list pgsql-performance

From Tom Lane
Subject Re: Bad plan by Planner (Already resolved?)
Date
Msg-id 9507.1319659042@sss.pgh.pa.us
In response to Re: Bad plan by Planner (Already resolved?)  (Robins Tharakan <robins.tharakan@comodo.com>)
Responses Re: Bad plan by Planner (Already resolved?)  (Robins Tharakan <robins.tharakan@comodo.com>)
List pgsql-performance
Robins Tharakan <robins.tharakan@comodo.com> writes:
> ORIGINAL QUERY (on PostgreSQL 8.4.9):
> http://explain.depesz.com/s/bTm

> EXPLAIN ANALYZE SELECT field_a FROM large_table_a JOIN large_table_b USING (field_a)
> WHERE field_b IN (SELECT large_table_b.field_b FROM large_table_b WHERE field_a = 2673056);

> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Nested Loop  (cost=132.97..194243.54 rows=156031 width=4) (actual time=6.612..43179.524 rows=2120 loops=1)
>     ->  Nested Loop  (cost=132.97..1107.63 rows=156031 width=4) (actual time=6.576..29122.017 rows=6938 loops=1)
>           ->  HashAggregate  (cost=132.97..133.96 rows=99 width=4) (actual time=6.543..12.726 rows=2173 loops=1)
>                 ->  Index Scan using "IX_large_table_b_SigId" on large_table_b  (cost=0.00..132.56 rows=164 width=4) (actual time=0.029..3.425 rows=2173 loops=1)
>                       Index Cond: (field_a = 2673056)
>           ->  Index Scan using "IX_large_table_b_field_b" on large_table_b  (cost=0.00..9.81 rows=2 width=8) (actual time=6.732..13.384 rows=3 loops=2173)
>                 Index Cond: (public.large_table_b.field_b = public.large_table_b.field_b)
>     ->  Index Scan using "PK_large_table_a" on large_table_a  (cost=0.00..1.23 rows=1 width=4) (actual time=2.021..2.021 rows=0 loops=6938)
>           Index Cond: (large_table_a.field_a = public.large_table_b.field_a)
>   Total runtime: 43182.975 ms


> OPTIMIZED QUERY (on PostgreSQL 8.4.7):
> http://explain.depesz.com/s/emO

> EXPLAIN ANALYZE SELECT s1.field_a FROM large_table_a JOIN large_table_b s1 USING (field_a)
> JOIN large_table_b s2 ON s1.field_b = s2.field_b WHERE s2.field_a = 2673056;

> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Nested Loop  (cost=0.00..2356.98 rows=494 width=4) (actual time=0.086..96.056 rows=2120 loops=1)
>     ->  Nested Loop  (cost=0.00..1745.51 rows=494 width=4) (actual time=0.051..48.900 rows=6938 loops=1)
>           ->  Index Scan using "IX_large_table_b_SigId" on large_table_b s2  (cost=0.00..132.56 rows=164 width=4) (actual time=0.028..3.411 rows=2173 loops=1)
>                 Index Cond: (field_a = 2673056)
>           ->  Index Scan using "IX_large_table_b_field_b" on large_table_b s1  (cost=0.00..9.81 rows=2 width=8) (actual time=0.007..0.012 rows=3 loops=2173)
>                 Index Cond: (s1.field_b = s2.field_b)
>     ->  Index Scan using "PK_large_table_a" on large_table_a  (cost=0.00..1.23 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=6938)
>           Index Cond: (large_table_a.field_a = s1.field_a)
>   Total runtime: 98.165 ms


I suspect that you're just fooling yourself here, and the "optimized"
query is no such thing.  Those plans are identical except for the
insertion of the HashAggregate step, which in itself adds less than
10msec to the runtime, and we can see it's not eliminating any rows
either.  So why does the second one run so much faster?  I can think
of three theories:

1. The tables are horrendously bloated on the first database, so that
many more pages have to be touched to get the same number of tuples.
This would likely indicate an improper autovacuum configuration.
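[Editor's note: one quick way to check this theory, using the standard pg_stat_user_tables statistics view (available since 8.3); this sketch is not part of the original mail.]

```sql
-- Tables with many dead tuples relative to live ones are bloat suspects;
-- a stale (or NULL) last_autovacuum suggests autovacuum isn't keeping up.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```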

2. You failed to account for caching effects, ie the first example
is being run "cold" and has to actually read everything from disk,
whereas the second example has everything it needs already in RAM.
In that case the speed differential is quite illusory.
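[Editor's note: a simple check for this theory, not from the original thread: run the identical statement twice in the same session and compare timings.]

```sql
-- If the second execution is dramatically faster than the first, the
-- difference was cold-cache I/O, not plan quality. (On 9.0 and later,
-- EXPLAIN (ANALYZE, BUFFERS) also reports shared-buffer hits vs. reads.)
EXPLAIN ANALYZE SELECT field_a FROM large_table_a JOIN large_table_b USING (field_a)
WHERE field_b IN (SELECT field_b FROM large_table_b WHERE field_a = 2673056);
-- ...then run the same statement again and compare "Total runtime".
```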

3. The HashAggregate would likely spit out the rows in a completely
different order than it received them.  If scanning large_table_b in
the order of IX_large_table_b_SigId happens to yield field_b values
that are very well ordered, it's possible that locality of access in
the other indexscans would be enough better in the second plan to
account for the speedup.  This seems the least likely theory, though.

BTW, how is it that "SELECT large_table_b.field_b FROM
large_table_b WHERE field_a = 2673056" produces no duplicate field_b
values?  Is that just luck?  Is there a unique constraint on the table
that guarantees it?
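[Editor's note: the question matters because if duplicates can occur, the two queries are not even equivalent — IN deduplicates the subselect, while the self-join multiplies matching rows. A minimal sqlite3 sketch with made-up data (an illustration, not the poster's actual schema) shows the divergence.]

```python
import sqlite3

# Miniature stand-in for the thread's schema: field_a = 99 plays the role
# of the constant 2673056, and large_table_b deliberately holds a duplicate
# field_b value for it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE large_table_a (field_a INTEGER PRIMARY KEY);
CREATE TABLE large_table_b (field_a INTEGER, field_b INTEGER);
INSERT INTO large_table_a VALUES (1), (2);
-- field_a = 99 yields field_b = 7 twice: a duplicate
INSERT INTO large_table_b VALUES (99, 7), (99, 7), (1, 7), (2, 7);
""")

# Original form: IN deduplicates the subselect's field_b values.
in_rows = conn.execute("""
    SELECT a.field_a FROM large_table_a a JOIN large_table_b b
    USING (field_a)
    WHERE b.field_b IN (SELECT field_b FROM large_table_b WHERE field_a = 99)
""").fetchall()

# "Optimized" form: the self-join matches each duplicate separately.
join_rows = conn.execute("""
    SELECT s1.field_a FROM large_table_a a
    JOIN large_table_b s1 ON a.field_a = s1.field_a
    JOIN large_table_b s2 ON s1.field_b = s2.field_b
    WHERE s2.field_a = 99
""").fetchall()

# The join form returns twice as many rows, because the duplicate field_b
# value multiplies every match.
print(len(in_rows), len(join_rows))  # prints "2 4"
```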

            regards, tom lane
