Thread: bad plan using nested loops

bad plan using nested loops

From

Johan Fredriksson

Date:

01 February 2018, 16:42:07

Hello!

I brought this issue up about two years ago but without getting any
real explanation or solution. The problem is that PostgreSQL does
really bad plans using nested loops. With "enable_nestloop = 0" the
same query is run about 20 times faster.

The sugested solution I got back then was to upgrade to the latest
version of PostgreSQL (then 9.5). It did not help. The solution we
finally applied was a horribly ugly patch to the perl-module
SearchBuilder that recognized queries that would perform badly and put
them inside transaction blocks with "SET LOCAL enable_nestloop = 0".

Last week I upgraded PostgreSQL for this application (Request Tracker)
to version 10.1 and just for fun I decied to test to remove the patch
to see if the problem still persisted. For two cases it did not. The
planner handled them just fine. For one case however, the same problem
still remains.

Bad plan: https://explain.depesz.com/s/avtZ
Good plan: https://explain.depesz.com/s/SJSt

Any suggestions on how to make the planner make better decisions for
this query?


        / Eskil

Re: bad plan using nested loops

From

Tom Lane

Date:

01 February 2018, 21:00:41

Johan Fredriksson <eskil@kth.se> writes:
> Bad plan: https://explain.depesz.com/s/avtZ
> Good plan: https://explain.depesz.com/s/SJSt
> Any suggestions on how to make the planner make better decisions for
> this query?

Core of the problem looks to be the misestimation here:

    Index Only Scan using shredder_cgm1 on public.cachedgroupmembers cachedgroupmembers_4 (cost=0.43..2.33 rows=79
width=8)(actual time=0.020..0.903 rows=1492 loops=804) 
      Output: cachedgroupmembers_4.memberid, cachedgroupmembers_4.groupid, cachedgroupmembers_4.disabled
      Index Cond: ((cachedgroupmembers_4.memberid = principals_1.id) AND (cachedgroupmembers_4.disabled = 0))
      Heap Fetches: 5018

Probably, memberid and disabled are correlated but the planner doesn't
know that, so it thinks the index condition is way more selective than it
actually is.  In PG 10, you could very possibly fix that by installing
extended statistics on that pair of columns.  See

https://www.postgresql.org/docs/current/static/planner-stats.html#PLANNER-STATS-EXTENDED

            regards, tom lane

SV: bad plan using nested loops

From

Johan Fredriksson

Date:

02 February 2018, 02:34:24

> Johan Fredriksson <eskil@kth.se> writes:
> > Bad plan: https://explain.depesz.com/s/avtZ
> > Good plan: https://explain.depesz.com/s/SJSt
> > Any suggestions on how to make the planner make better decisions for
> > this query?
>
> Core of the problem looks to be the misestimation here:
>
>        Index Only Scan using shredder_cgm1 on public.cachedgroupmembers cachedgroupmembers_4
> (cost=0.43..2.33 rows=79 width=8) (actual time=0.020..0.903 rows=1492 loops=804)
>          Output: cachedgroupmembers_4.memberid, cachedgroupmembers_4.groupid,
> cachedgroupmembers_4.disabled
>          Index Cond: ((cachedgroupmembers_4.memberid = principals_1.id) AND
> (cachedgroupmembers_4.disabled = 0))
>          Heap Fetches: 5018
>
> Probably, memberid and disabled are correlated but the planner doesn't
> know that, so it thinks the index condition is way more selective than it
> actually is.  In PG 10, you could very possibly fix that by installing
> extended statistics on that pair of columns.  See
>
> https://www.postgresql.org/docs/current/static/planner-stats.html#PLANNER-STATS-EXTENDED

I'm not sure what you mean by correlated, but there are only a handful (164 when I check it) disabled groupmembers out
oftotal 7.5 million. 
I'll give CREATE STATISTICS on those columns a shot and see if it gets any better.

        / Eskil

Re: SV: bad plan using nested loops

From

Johan Fredriksson

Date:

02 February 2018, 15:02:07

tor 2018-02-01 klockan 20:34 +0000 skrev Johan Fredriksson:
> > Johan Fredriksson <eskil@kth.se> writes:
> > > Bad plan: https://explain.depesz.com/s/avtZ
> > > Good plan: https://explain.depesz.com/s/SJSt
> > > Any suggestions on how to make the planner make better decisions
> > > for
> > > this query?
> > 
> > Core of the problem looks to be the misestimation here:
> > 
> >        Index Only Scan using shredder_cgm1 on
> > public.cachedgroupmembers cachedgroupmembers_4
> > (cost=0.43..2.33 rows=79 width=8) (actual time=0.020..0.903
> > rows=1492 loops=804)
> >          Output: cachedgroupmembers_4.memberid,
> > cachedgroupmembers_4.groupid,
> > cachedgroupmembers_4.disabled
> >          Index Cond: ((cachedgroupmembers_4.memberid =
> > principals_1.id) AND
> > (cachedgroupmembers_4.disabled = 0))
> >          Heap Fetches: 5018
> > 
> > Probably, memberid and disabled are correlated but the planner
> > doesn't
> > know that, so it thinks the index condition is way more selective
> > than it
> > actually is.  In PG 10, you could very possibly fix that by
> > installing
> > extended statistics on that pair of columns.  See
> > 
> > https://www.postgresql.org/docs/current/static/planner-stats.html#P
> > LANNER-STATS-EXTENDED
> 
> I'm not sure what you mean by correlated, but there are only a
> handful (164 when I check it) disabled groupmembers out of total 7.5
> million.
> I'll give CREATE STATISTICS on those columns a shot and see if it
> gets any better.

It looks like you are right, Tom. There actually exists full
correlation between memberid, groupid and disabled.

rt4=# SELECT stxname, stxkeys, stxdependencies FROM pg_statistic_ext;
 stxname  | stxkeys |   stxdependencies    
-----------+---------+----------------------
 cgm_stat2 | 2 6     | {"2
=> 6": 1.000000}
 cgm_stat1 | 3 6     | {"3 => 6": 1.000000}
(2 rows)

However, this does not help the planner. It still picks the bad plan.


       / Eskil

Re: SV: bad plan using nested loops

From

Tomas Vondra

Date:

06 February 2018, 01:15:02


On 02/02/2018 10:02 AM, Johan Fredriksson wrote:
> tor 2018-02-01 klockan 20:34 +0000 skrev Johan Fredriksson:
>>> Johan Fredriksson <eskil@kth.se> writes:
>>>> Bad plan: https://explain.depesz.com/s/avtZ
>>>> Good plan: https://explain.depesz.com/s/SJSt
>>>> Any suggestions on how to make the planner make better decisions
>>>> for
>>>> this query?
>>>
>>> Core of the problem looks to be the misestimation here:
>>>
>>>        Index Only Scan using shredder_cgm1 on
>>> public.cachedgroupmembers cachedgroupmembers_4
>>> (cost=0.43..2.33 rows=79 width=8) (actual time=0.020..0.903
>>> rows=1492 loops=804)
>>>          Output: cachedgroupmembers_4.memberid,
>>> cachedgroupmembers_4.groupid,
>>> cachedgroupmembers_4.disabled
>>>          Index Cond: ((cachedgroupmembers_4.memberid =
>>> principals_1.id) AND
>>> (cachedgroupmembers_4.disabled = 0))
>>>          Heap Fetches: 5018
>>>
>>> Probably, memberid and disabled are correlated but the planner
>>> doesn't
>>> know that, so it thinks the index condition is way more selective
>>> than it
>>> actually is.  In PG 10, you could very possibly fix that by
>>> installing
>>> extended statistics on that pair of columns.  See
>>>
>>> https://www.postgresql.org/docs/current/static/planner-stats.html#P
>>> LANNER-STATS-EXTENDED
>>
>> I'm not sure what you mean by correlated, but there are only a
>> handful (164 when I check it) disabled groupmembers out of total 7.5
>> million.
>> I'll give CREATE STATISTICS on those columns a shot and see if it
>> gets any better.
> 
> It looks like you are right, Tom. There actually exists full
> correlation between memberid, groupid and disabled.
> 
> rt4=# SELECT stxname, stxkeys, stxdependencies FROM pg_statistic_ext;
>  stxname  | stxkeys |   stxdependencies    
> -----------+---------+----------------------
>  cgm_stat2 | 2 6     | {"2
> => 6": 1.000000}
>  cgm_stat1 | 3 6     | {"3 => 6": 1.000000}
> (2 rows)
> 
> However, this does not help the planner. It still picks the bad plan.
> 

Yeah :-( Unfortunately, we're not using the extended statistics to
improve join cardinality estimates yet. PostgreSQL 10 can only use them
to improve estimates on individual tables, and judging by the progress
on already submitted improvements, it doesn't seem very likely to change
in PostgreSQL 11.

regards
Tomas

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services