Re: FETCH FIRST clause WITH TIES option - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: FETCH FIRST clause WITH TIES option
Date
Msg-id e5c5129c-2518-4a5d-fc28-0a080a08526f@2ndquadrant.com
Whole thread Raw
In response to Re: FETCH FIRST clause WITH TIES option  (Surafel Temesgen <surafel3000@gmail.com>)
Responses Re: FETCH FIRST clause WITH TIES option  (Surafel Temesgen <surafel3000@gmail.com>)
List pgsql-hackers

On 1/2/19 11:51 AM, Surafel Temesgen wrote:
> 
> 
> On Tue, Jan 1, 2019 at 8:38 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
>     >
>     >     The attached patch include all the comment given by Tomas and i
>     >     check sql standard about LIMIT and this feature
>     >
> 
>     Unfortunately, it seems the "limit" regression tests fail for some
>     reason - the output mismatches the expected results for some reason. It
>     seems as if the WITH TIES code affects ordering of the results within
>     the group. See the attached file.
> 
> 
> Yes the reason is the order of returned row is not always the same. I
> remove other columns from the result set to get constant result
> 

Thanks, that seems reasonable.

After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this
patch should tweak estimates in some way. Currently, the cardinality
estimate is the same as for plain LIMIT, using the requested number of
rows. But let's say there are very few large groups - that will
naturally increase the number of rows produced.

As an example, let's say the subplan produces 1M rows, and there are
1000 groups (when split according to the ORDER BY clause). Consider a
query with "FETCH FIRST 10 ROWS WITH TIES". AFAICS the current estimate
will be 10, but in fact we know that it's likely to produce ~1000 rows
(because that's the average group size).

So I think the patch should tweak the estimate to be

  limitCount + (avgGroupSize/2);

or perhaps

  Max(avgGroupSize, limitCount + (avgGroupSize/2))

The 1/2 is there because we don't know where the group starts.

Of course, using average group size like this is rather crude, but it's
about the best thing we can do. In principle, increasing the cardinality
estimate is the right thing to do.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: chained transactions
Next
From: Peter Eisentraut
Date:
Subject: Re: explain plans with information about (modified) gucs