Re: Different plan for very similar queries - Mailing list pgsql-performance

From Peter J. Holzer
Subject Re: Different plan for very similar queries
Date
Msg-id 20150531110010.GA25173@hjp.at
In response to Re: Different plan for very similar queries  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Different plan for very similar queries
Re: Different plan for very similar queries
List pgsql-performance
[I've seen in -hackers that you already seem to have a fix]

On 2015-05-30 15:04:34 -0400, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> > Why exactly does the second query use a much slower plan I'm not sure. I
> > believe I've found an issue in planning semi joins (reported to
> > pgsql-hackers a few minutes ago), but may be wrong and the code is OK.
>
> I think you are probably right that there's a bug there: the planner is
> vastly overestimating the cost of the nestloop-with-inner-indexscan
> plan.  However, the reason why the mergejoin plan gets chosen in some
> cases seems to be that an additional estimation error is needed to make
> that happen; otherwise the nestloop still comes out looking cheaper.
> The undesirable case looks like:
>
> >>  Merge Semi Join  (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1)
> >>    Merge Cond: ((t.term)::text = (f.berechnungsart)::text)
> >>    ->  Index Scan using term_term_idx on term t  (cost=0.00..319880.73 rows=636 width=81) (actual time=7703.809..7703.938 rows=3 loops=1)
> >>          Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'berechnungsart'::text))

Just noticed that this is a bit strange, too:

This scans the whole index term_term_idx and for every row found it
checks the table for the filter condition. So it has to read the whole
index and the whole table, right? But the planner estimates that it will
return only 636 rows (out of 6.1E6), so using
term_facttablename_columnname_idx to extract those 636 and then sorting
them should be quite a bit faster (even just a plain full table scan
and then sorting should be faster).
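
One way to see what the planner thinks that alternative would cost is to
ask it for just the filtered rows in sorted order, which is what the
outer side of the merge join needs. A sketch (I haven't verified what
costs this produces on the machine in question):

wdsah=> explain analyze
        select term from term
        where facttablename='facttable_stat_fta4' and columnname='berechnungsart'
        order by term;

If that comes out much cheaper than 319880.73, the full scan of
term_term_idx looks even more suspicious.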

> >>    ->  Index Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f  (cost=0.00..2545748.85 rows=43577940 width=2) (actual time=0.089..16263.582 rows=21336180 loops=1)
> >>  Total runtime: 30948.648 ms
>
> Notice that it's estimating the cost of the join as significantly less
> than the cost of the inner-side indexscan.  This means it believes that
> the inner indexscan will not be run to completion.  That's not because of
> semijoin semantics; there's no stop-after-first-match benefit for mergejoins.
> It must be that it thinks the range of keys on the outer side of the join
> is much less than the range of keys on the inner.  Given that it knows
> that facttable_stat_fta4.berechnungsart only contains the values "m"
> and "n", this implies that it thinks term.term only contains "m" and
> not "n".  So this estimation error presumably comes from "n" not having
> been seen in ANALYZE's last sample of term.term, and raising the stats
> target for term.term would probably be a way to fix that.

The term column has a relatively flat distribution and about 3.5 million
distinct values (in 6.1 million rows). So it's unlikely for any specific
value to be included in an ANALYZE sample, and it would be an error to
assume that a value doesn't occur at all just because ANALYZE didn't see
it. OTOH, a value which is seen by ANALYZE will have its frequency
overestimated by quite a bit. But that shouldn't influence whether the
whole index needs to be scanned or not.
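
For reference, Tom's suggestion of raising the stats target would look
roughly like this (the target of 1000 is just a number I picked for the
sketch, not something I have tested here):

wdsah=> alter table term alter column term set statistics 1000;
wdsah=> analyze term;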

Another thought: For the merge semi join postgresql doesn't actually
have to scan the whole inner index. It can skip from the first 'm' entry
to the first 'n' entry reading only a few non-leaf blocks, skipping many
leaf blocks in the process. The times (7703.917..30948.271) indicate that
it doesn't actually do this, but maybe the planner assumes it does?
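
Assuming this server is on 9.0 or later, EXPLAIN (ANALYZE, BUFFERS)
could settle that: if the merge join really skipped most leaf pages, the
buffer counts reported for the inner index scan should be far below the
size of facttable_stat_fta4_berechnungsart_idx. Something like:

wdsah=> explain (analyze, buffers)
        select facttablename, columnname, term, concept_id, t.hidden, language, register
        from term t
        where facttablename='facttable_stat_fta4' and columnname='berechnungsart'
          and exists (select 1 from facttable_stat_fta4 f where f.berechnungsart = t.term);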

I also suspected that the culprit was the "columnname" column. That one
has a very skewed distribution:

wdsah=> select columnname, count(*) from term group by columnname order by count(*);
     columnname      |  count
---------------------+---------
 warenstrom          |       3
 einheit             |       3
 berechnungsart      |       3
 og                  |      26
 berichtsregion      |     242
 partnerregion       |     246
 sitcr4              |    4719
 kurzbezeichnung     | 1221319
 macrobondtimeseries | 1221320
                     | 3661206
(10 rows)

So random variation in the sample could throw off the estimated
frequencies of the least frequent columnnames by quite a bit.

But given that both plans estimated the number of rows returned by the
outer index scan as 636, that was probably a red herring.
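
(For reference, the pg_stats numbers I mention below can be seen with a
query along these lines, re-run after each ANALYZE:)

wdsah=> select n_distinct, most_common_vals, most_common_freqs
        from pg_stats
        where tablename = 'term' and attname = 'columnname';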

But there does seem to be a connection to this column: In one case
pg_stats contained n_distinct=7 and only the two most common values.
Then the plan looked like this:

wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
        from term t where facttablename='facttable_stat_fta4' and columnname='warenstrom' and exists (select 1 from facttable_stat_fta4 f where f.warenstrom=t.term );
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.00..386141.13 rows=1 width=81) (actual time=0.202..0.253 rows=2 loops=1)
   ->  Index Scan using term_facttablename_columnname_idx on term t  (cost=0.00..264.03 rows=437 width=81) (actual time=0.097..0.099 rows=3 loops=1)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'warenstrom'::text))
   ->  Index Scan using facttable_stat_fta4_warenstrom_idx on facttable_stat_fta4 f  (cost=0.00..385869.36 rows=21787688 width=2) (actual time=0.033..0.033 rows=1 loops=3)
         Index Cond: ((warenstrom)::text = (t.term)::text)
 Total runtime: 0.314 ms

But after another analyze, pg_stats contained n_distinct=5 and the 5 most
common values. And now the plan looks like this (after disabling
bitmapscan and hashagg):

wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
        from term t where facttablename='facttable_stat_fta4' and columnname='warenstrom' and exists (select 1 from facttable_stat_fta4 f where f.warenstrom=t.term );
                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.00..2124104.23 rows=1 width=81) (actual time=0.080..0.129 rows=2 loops=1)
   ->  Index Scan using term_facttablename_columnname_idx on term t  (cost=0.00..3.23 rows=1 width=81) (actual time=0.030..0.031 rows=3 loops=1)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'warenstrom'::text))
   ->  Index Scan using facttable_stat_fta4_warenstrom_idx on facttable_stat_fta4 f  (cost=0.00..2124100.90 rows=21787688 width=2) (actual time=0.029..0.029 rows=1 loops=3)
         Index Cond: ((warenstrom)::text = (t.term)::text)
 Total runtime: 0.180 ms
(6 rows)

The estimated number of rows in the outer scan is way more accurate in
the second plan (1 instead of 437), but for some reason the cost for the
inner scan is higher (2124100.90 instead of 385869.36), although it
should be lower (we only need to search for 1 value, not 437).

(There was no analyze (automatic or manual) on facttable_stat_fta4
between those two tests, so the statistics on facttable_stat_fta4
shouldn't have changed - only those for term.)
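
One way to narrow that down might be to explain just the inner scan on
its own, with a literal standing in for t.term (the 'X' below is only a
placeholder, not a real warenstrom value):

wdsah=> explain
        select 1 from facttable_stat_fta4 f
        where f.warenstrom = 'X';  -- 'X' is a placeholder value

Since the statistics on facttable_stat_fta4 didn't change, I'd expect
the estimate for that stand-alone scan to stay the same between the two
runs; if it does, the difference has to come from how the join costs the
parameterized scan, not from the scan itself.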

    hp

--
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/

