Home > mailing lists

Re: FETCH FIRST clause WITH TIES option - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: FETCH FIRST clause WITH TIES option
Date	January 2, 2019 18:19:37
Msg-id	e5c5129c-2518-4a5d-fc28-0a080a08526f@2ndquadrant.com Whole thread Raw
In response to	Re: FETCH FIRST clause WITH TIES option (Surafel Temesgen <surafel3000@gmail.com>)
Responses	Re: FETCH FIRST clause WITH TIES option (Surafel Temesgen <surafel3000@gmail.com>)
List	pgsql-hackers

Tree view

On 1/2/19 11:51 AM, Surafel Temesgen wrote:
> 
> 
> On Tue, Jan 1, 2019 at 8:38 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
>     >
>     >     The attached patch include all the comment given by Tomas and i
>     >     check sql standard about LIMIT and this feature
>     >
> 
>     Unfortunately, it seems the "limit" regression tests fail for some
>     reason - the output mismatches the expected results for some reason. It
>     seems as if the WITH TIES code affects ordering of the results within
>     the group. See the attached file.
> 
> 
> Yes the reason is the order of returned row is not always the same. I
> remove other columns from the result set to get constant result
> 

Thanks, that seems reasonable.

After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this
patch should tweak estimates in some way. Currently, the cardinality
estimate is the same as for plain LIMIT, using the requested number of
rows. But let's say there are very few large groups - that will
naturally increase the number of rows produced.

As an example, let's say the subplan produces 1M rows, and there are
1000 groups (when split according to the ORDER BY clause). Consider a
query with "FETCH FIRST 10 ROWS WITH TIES". AFAICS the current estimate
will be 10, but in fact we know that it's likely to produce ~1000 rows
(because that's the average group size).

So I think the patch should tweak the estimate to be

  limitCount + (avgGroupSize/2);

or perhaps

  Max(avgGroupSize, limitCount + (avgGroupSize/2))

The 1/2 is there because we don't know where the group starts.

Of course, using average group size like this is rather crude, but it's
about the best thing we can do. In principle, increasing the cardinality
estimate is the right thing to do.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

From: Peter Eisentraut
Date: 02 January 2019, 18:07:40
Subject: Re: chained transactions

From: Peter Eisentraut
Date: 02 January 2019, 18:20:06
Subject: Re: explain plans with information about (modified) gucs

Re: FETCH FIRST clause WITH TIES option - Mailing list pgsql-hackers

Previous

Next