Re: Different plan for very similar queries - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Different plan for very similar queries
Date
Msg-id 556B3160.2010706@2ndquadrant.com
In response to Re: Different plan for very similar queries  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
Responses Re: Different plan for very similar queries
List pgsql-performance
On 05/31/15 13:00, Peter J. Holzer wrote:
> [I've seen in -hackers that you already seem to have a fix]
>
> On 2015-05-30 15:04:34 -0400, Tom Lane wrote:
>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>> Why exactly does the second query use a much slower plan I'm not sure. I
>>> believe I've found an issue in planning semi joins (reported to
>>> pgsql-hackers a few minutes ago), but may be wrong and the code is OK.
>>
>> I think you are probably right that there's a bug there: the planner is
>> vastly overestimating the cost of the nestloop-with-inner-indexscan
>> plan.  However, the reason why the mergejoin plan gets chosen in some
>> cases seems to be that an additional estimation error is needed to make
>> that happen; otherwise the nestloop still comes out looking cheaper.
>> The undesirable case looks like:
>>
>>>>   Merge Semi Join  (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1)
>>>>     Merge Cond: ((t.term)::text = (f.berechnungsart)::text)
>>>>     ->  Index Scan using term_term_idx on term t  (cost=0.00..319880.73 rows=636 width=81) (actual time=7703.809..7703.938 rows=3 loops=1)
>>>>           Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'berechnungsart'::text))
>
> Just noticed that this is a bit strange, too:
>
> This scans the whole index term_term_idx and for every row found it
> checks the table for the filter condition. So it has to read the whole
> index and the whole table, right? But the planner estimates that it will
> return only 636 rows (out of 6.1E6), so using
> term_facttablename_columnname_idx to extract those 636 and then sorting
> them should be quite a bit faster (even just a plain full table scan
> and then sorting should be faster).

That seems a bit strange, yes. I don't see why a simple index scan (with
an Index Cond), expected to produce 636 rows, should be more expensive
than scanning the whole index (with a Filter), even with an additional
Sort node on top to sort those 636 rows.

But I've been unable to reproduce that (on both 9.1 and HEAD) without
significant 'SET enable_*' gymnastics, so I'm not sure why it happens.
Do you perhaps have 'enable_sort=off' or something like that set?
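
A quick way to check whether any of the planner switches are disabled
in your session:

    SELECT name, setting FROM pg_settings
     WHERE name LIKE 'enable%' AND setting = 'off';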

A test case with a data set would help a lot in this case.
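
Something along these lines would do - a minimal, self-contained sketch
with made-up data and distributions, and with table/index/column names
only guessed from the plans in this thread, so adjust as needed:

    CREATE TABLE term (
        facttablename text,
        columnname    text,
        term          text
    );
    CREATE INDEX term_term_idx ON term (term);
    CREATE INDEX term_facttablename_columnname_idx ON term (facttablename, columnname);

    CREATE TABLE facttable_stat_fta4 (berechnungsart text);
    CREATE INDEX facttable_stat_fta4_berechnungsart_idx
        ON facttable_stat_fta4 (berechnungsart);

    -- entirely made-up data, just to get a roughly similar shape
    INSERT INTO term
      SELECT 'facttable_stat_fta4', 'berechnungsart', chr(65 + i % 3)
        FROM generate_series(1, 1000) i;
    INSERT INTO facttable_stat_fta4
      SELECT chr(65 + i % 3) FROM generate_series(1, 1000000) i;

    ANALYZE term;
    ANALYZE facttable_stat_fta4;

    EXPLAIN ANALYZE
      SELECT * FROM term t
       WHERE facttablename = 'facttable_stat_fta4'
         AND columnname = 'berechnungsart'
         AND EXISTS (SELECT 1 FROM facttable_stat_fta4 f
                      WHERE f.berechnungsart = t.term);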


> Another thought: For the merge semi join postgresql doesn't actually
> have to scan the whole inner index. It can skip from the first 'm' entry
> to the first 'n' entry reading only a few non-leaf blocks, skipping many
> leaf blocks in the process. The times (7703.917..30948.271) indicate that
> it doesn't actually do this, but maybe the planner assumes it does?

How would it know how far to skip? I mean, assume you're on the first
'm' entry - how do you know where the first 'n' entry is?

If you only really need to check existence, a nested loop with an inner
index scan is probably the right thing anyway, especially if the number
of outer rows (and thus loops performed) is quite low. This is clearly
demonstrated by the first plan in this thread:

                   QUERY PLAN
------------------------------------------------------------- ...
  Nested Loop Semi Join  (cost=0.00..384860.48 rows=1 width=81 ...
    ->  Index Scan using term_facttablename_columnname_idx on  ...
          Index Cond: (((facttablename)::text = 'facttable_sta ...
    ->  Index Scan using facttable_stat_fta4_einheit_idx on fa ...
          Index Cond: ((einheit)::text = (t.term)::text)
  Total runtime: 0.173 ms
(6 rows)

This is probably the best plan you can get in cases like this ...

>
> I also suspected that the culprit is the "columnname" column. That one has a very
> skewed distribution:
>
> wdsah=> select columnname, count(*) from term group by columnname order by count(*);
>       columnname      |  count
> ---------------------+---------
>   warenstrom          |       3
>   einheit             |       3
>   berechnungsart      |       3
>   og                  |      26
>   berichtsregion      |     242
>   partnerregion       |     246
>   sitcr4              |    4719
>   kurzbezeichnung     | 1221319
>   macrobondtimeseries | 1221320
>                       | 3661206
> (10 rows)
>
> So random variation in the sample could throw off the estimated
> frequencies of the least frequent columnnames by quite a bit.
>
> But given that both plans estimated the number of rows returned by the
> outer index scan as 636, that was probably a red herring.
>
> But there does seem to be a connection to this column: In one case
> pg_stats contained n_distinct=7 and only the two most common values.
> Then the plan looked like this:
>
> wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
>          from term t where facttablename='facttable_stat_fta4' and columnname='warenstrom' and exists (select 1 from facttable_stat_fta4 f where f.warenstrom=t.term );
>                                                                                   QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Nested Loop Semi Join  (cost=0.00..386141.13 rows=1 width=81) (actual time=0.202..0.253 rows=2 loops=1)
>     ->  Index Scan using term_facttablename_columnname_idx on term t  (cost=0.00..264.03 rows=437 width=81) (actual time=0.097..0.099 rows=3 loops=1)
>           Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'warenstrom'::text))
>     ->  Index Scan using facttable_stat_fta4_warenstrom_idx on facttable_stat_fta4 f  (cost=0.00..385869.36 rows=21787688 width=2) (actual time=0.033..0.033 rows=1 loops=3)
>           Index Cond: ((warenstrom)::text = (t.term)::text)
>   Total runtime: 0.314 ms
>
> But after another analyze, pg_stats contained n_distinct=5 and the 5 most
> common values. And now the plan looks like this (after disabling
> bitmapscan and hashagg):
>
> wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
>          from term t where facttablename='facttable_stat_fta4' and columnname='warenstrom' and exists (select 1 from facttable_stat_fta4 f where f.warenstrom=t.term );
>                                                                                   QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Nested Loop Semi Join  (cost=0.00..2124104.23 rows=1 width=81) (actual time=0.080..0.129 rows=2 loops=1)
>     ->  Index Scan using term_facttablename_columnname_idx on term t  (cost=0.00..3.23 rows=1 width=81) (actual time=0.030..0.031 rows=3 loops=1)
>           Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'warenstrom'::text))
>     ->  Index Scan using facttable_stat_fta4_warenstrom_idx on facttable_stat_fta4 f  (cost=0.00..2124100.90 rows=21787688 width=2) (actual time=0.029..0.029 rows=1 loops=3)
>           Index Cond: ((warenstrom)::text = (t.term)::text)
>   Total runtime: 0.180 ms
> (6 rows)
>
> The estimated number of rows in the outer scan is way more accurate in
> the second plan (1 instead of 437), but for some reason the cost for the
> inner scan is higher (2124100.90 instead of 385869.36) although it
> should be lower (we only need to search for 1 value, not 437).

As I explained in the pgsql-hackers thread (sorry for the confusion, it
seemed like a more appropriate place for a discussion of planner
internals), I believe this happens because the planner only compares the
total costs of the inner paths. That is a problem, because in this case
we only really care about the cost of fetching the first tuple, not all
the tuples - that's what a semijoin needs.

Could you post the plan with bitmapscan enabled? I'd bet its cost is
somewhere between 385869.36 and 2124100.90, so that small variations in
the statistics (and thus in the costs) cause such plan changes. When the
indexscan cost drops below the bitmapscan, you get the first (good)
plan, otherwise you get the other one.
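
I.e. something like this - the query copied from your message, just
with the two switches turned back on (assuming you disabled them with
SET in the same session):

    SET enable_bitmapscan = on;
    SET enable_hashagg = on;
    EXPLAIN ANALYZE
    SELECT facttablename, columnname, term, concept_id, t.hidden, language, register
      FROM term t
     WHERE facttablename = 'facttable_stat_fta4'
       AND columnname = 'warenstrom'
       AND EXISTS (SELECT 1 FROM facttable_stat_fta4 f WHERE f.warenstrom = t.term);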

Also, this may easily be caused by variations within the same
statistics, e.g. between columns or between values within the same
column (so an MCV item with a 51% frequency gets one plan, while an item
with 49% gets a different one).
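
The relevant numbers are easy to check directly in pg_stats (assuming
the interesting column is 'columnname' on 'term', as in your example):

    SELECT n_distinct, most_common_vals, most_common_freqs
      FROM pg_stats
     WHERE tablename = 'term' AND attname = 'columnname';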

This might be improved by using a larger sample - 9.1 uses a default
statistics target of 100, i.e. a sample of 30,000 rows. Try increasing
that to 1000 (SET default_statistics_target = 1000) - that should give
more consistent statistics and hopefully more stable plans (though
possibly stable in the wrong direction).
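
For example, either session-wide or just for the skewed column (the
per-column form avoids a larger sample everywhere else):

    SET default_statistics_target = 1000;   -- affects ANALYZE in this session
    ANALYZE term;

    -- or permanently, only for the column with the skewed distribution
    ALTER TABLE term ALTER COLUMN columnname SET STATISTICS 1000;
    ANALYZE term;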

Also, you might tweak the cost variables a bit to make the cost
differences more significant. But that's secondary, I guess, as the costs
(385869 vs. 2124100) are already quite far apart.
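
For example (the values are purely illustrative - the right settings
depend on your hardware and how much of the data fits in cache):

    SET random_page_cost = 2.0;        -- default is 4.0; lower it if the data is mostly cached
    SET effective_cache_size = '4GB';  -- not a cost per se, but also affects index scan costing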

> (There was no analyze (automatic or manual) on facttable_stat_fta4
> between those two tests, so the statistics on facttable_stat_fta4
> shouldn't have changed - only those for term.)

So maybe there was an autoanalyze after all, because otherwise the cost
really should be the same in both plans ...
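
You can verify that from pg_stat_user_tables, e.g.:

    SELECT relname, last_analyze, last_autoanalyze
      FROM pg_stat_user_tables
     WHERE relname IN ('term', 'facttable_stat_fta4');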


--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

