Re: Different plan for very similar queries - Mailing list pgsql-performance

From Tom Lane
Subject Re: Different plan for very similar queries
Date
Msg-id 29819.1433012674@sss.pgh.pa.us
Whole thread Raw
In response to Re: Different plan for very similar queries  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Different plan for very similar queries
List pgsql-performance
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> Why exactly does the second query use a much slower plan I'm not sure. I
> believe I've found an issue in planning semi joins (reported to
> pgsql-hackers a few minutes ago), but may be wrong and the code is OK.

I think you are probably right that there's a bug there: the planner is
vastly overestimating the cost of the nestloop-with-inner-indexscan
plan.  However, the reason why the mergejoin plan gets chosen in some
cases seems to be that an additional estimation error is needed to make
that happen; otherwise the nestloop still comes out looking cheaper.
The undesirable case looks like:

>>  Merge Semi Join  (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1)
>>    Merge Cond: ((t.term)::text = (f.berechnungsart)::text)
>>    ->  Index Scan using term_term_idx on term t  (cost=0.00..319880.73 rows=636 width=81) (actual
time=7703.809..7703.938rows=3 loops=1) 
>>          Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text =
'berechnungsart'::text))
>>    ->  Index Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f  (cost=0.00..2545748.85
rows=43577940width=2) (actual time=0.089..16263.582 rows=21336180 loops=1) 
>>  Total runtime: 30948.648 ms

Notice that it's estimating the cost of the join as significantly less
than the cost of the inner-side indexscan.  This means it believes that
the inner indexscan will not be run to completion.  That's not because of
semijoin semantics; there's no stop-after-first-match benefit for mergejoins.
It must be that it thinks the range of keys on the outer side of the join
is much less than the range of keys on the inner.  Given that it knows
that facttable_stat_fta4.berechnungsart only contains the values "m"
and "n", this implies that it thinks term.term only contains "m" and
not "n".  So this estimation error presumably comes from "n" not having
been seen in ANALYZE's last sample of term.term, and raising the stats
target for term.term would probably be a way to fix that.

However, this would all be moot if the cost estimate for the nestloop
plan were nearer to reality.  Since you started a separate -hackers
thread for that issue, let's go discuss that there.

            regards, tom lane


pgsql-performance by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Postgres is using 100% CPU
Next
From: birimblongas
Date:
Subject: Postmaster eating up all my cpu