Home > mailing lists

Re: slow query plans caused by under-estimation of CTE cardinality - Mailing list pgsql-performance

From	Vitalii Tymchyshyn
Subject	Re: slow query plans caused by under-estimation of CTE cardinality
Date	February 18, 2013 23:40:45
Msg-id	CABWW-d3um0XrHEqORRSTGezV=vY3W0CBN=UJP245VHMyosrhRg@mail.gmail.com Whole thread Raw
In response to	slow query plans caused by under-estimation of CTE cardinality (John Lumby <johnlumby@hotmail.com>)
Responses	Re: slow query plans caused by under-estimation of CTE cardinality (John Lumby <johnlumby@hotmail.com>)
List	pgsql-performance

Tree view

Since cte is already an optimization fence, you can go further and make it temporary table.
Create table;analyze;select should make optimizer's work much easier.

18 лют. 2013 18:45, "John Lumby" <johnlumby@hotmail.com> напис.

On 2012-10-09 23:09:21
Tom Lane wrote:
>

> re subject Why am I getting great/terrible estimates with these CTE queries?
> You're assuming the case where the estimate is better is better for a
> reason ... but it's only better as a result of blind dumb luck. The
> outer-level query planner doesn't know anything about the CTE's output
> except the estimated number of rows --- in particular, it doesn't drill
> down to find any statistics about the join column.
>

I am also struggling with a problem involving CTEs although in my case
it is caused by huge *under*-estimation of cardinality rather then *over*-estimation.
The statement is quite complex and the problem arises because there is a chain of
RECURSIVE CTEs each defined as a query involving an earlier CTE and more tables.
Eventually there is no hope for making a good cardinality estimate.

One CTE in particular has a cardinality estimate of 1 (I guess the actual
estimate is nearer zero and rounded up) but actual count is over 100000.
The planner puts this CTE as inner of a nested loop accessed by simple linear CTE scan
and the full query then takes over 20 minutes.

   -> Nested Loop (cost=0.00..0.06 rows=1 width=588) (actual time=2340.421..1201593.856 rows=105984 loops=1)
          Join Filter: ((winnum.subnet_id = binoptasc.subnet_id) AND (winnum.option_code = binoptasc.option_code) AND ((winnum.option_discriminator)::text = (binoptasc.option_discriminator)::text) AND (winnum.net_rel_level = binoptasc.net_rel_level))
          Rows Removed by Join Filter: 7001612448
          Buffers: shared hit=2290941
          -> CTE Scan on winning_option_nums winnum (cost=0.00..0.02 rows=1 width=536) (actual time=2338.422..2543.684 rows=62904 loops=1)
                Buffers: shared hit=2290941
          -> CTE Scan on subnet_inhrt_options_asc binoptasc (cost=0.00..0.02 rows=1 width=584) (actual time=0.000..9.728 rows=111308 loops=62904)

Whereas, (by altering various statistics to be very wrong) the entire query runs in 21 seconds.

There have been several debates about how to address situations like this where
no practical non-query-specific statistics-gathering scheme can ever hope to
gather enough statistics to model the later derived tables.     E.g. the frowned-on
SELECTIVITY clause and ideas for query-specific statistics.

Meanwhile,   I have one other suggestion aimed specifically at problematic CTEs:
Would it be reasonable to provide a new Planner Configuration option :

enable_nestloop_cte_inner (boolean)
Enables or disables the query planner's use of nested-loop join plans in which a CTE is the inner.
It is impossible to suppress such nested-loop joins entirely,
but turning this variable off discourages the planner from using one
if there are other methods available, such as sorting the CTE for merge-join
or hashing it for hash-join.
The default is on.

John

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

pgsql-performance by date:

From: John Lumby
Date: 18 February 2013, 19:45:10
Subject: slow query plans caused by under-estimation of CTE cardinality

From: John Lumby
Date: 19 February 2013, 02:26:34
Subject: Re: slow query plans caused by under-estimation of CTE cardinality

Re: slow query plans caused by under-estimation of CTE cardinality - Mailing list pgsql-performance

Previous

Next