Re: FETCH FIRST clause WITH TIES option - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: FETCH FIRST clause WITH TIES option
Date
Msg-id 73660663-d1c6-12cc-200f-315616560a93@2ndquadrant.com
Whole thread Raw
In response to Re: FETCH FIRST clause WITH TIES option  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List pgsql-hackers
On 10/29/2018 05:47 PM, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> 
>   >> I still think that this is the wrong approach. Implementing WITH
>   >> TIES and PERCENT together using an implicit window function call
>   >> kills two birds with one very small stone (the only executor change
>   >> needed would be teaching LIMIT to be able to stop on a boolean
>   >> condition), with maximum reuse of existing facilities.
> 
>   Tomas> Hmmm, maybe. How would that work, exactly? Wouldn't that mean
>   Tomas> extra overhead (the window functions are hardly free) and
>   Tomas> limitations?
> 
> They're not free, but then this case probably shouldn't be treated as a
> particularly hot code path.
> 
> The basic idea is this: extend the Limit node to be able to stop on a
> boolean expression, such that the first false value ends the result.
> 
> Then FETCH FIRST N WITH TIES becomes "stop when the expression
>    rank() over (order by ...) <= N
> is no longer true" (where the ... is the existing top level order by)
> 
> and FETCH FIRST N PERCENT is the same but with percent_rank (or
> cume_dist, I'd have to check the exact semantics)
> 
> rank() doesn't need to read ahead in the input. percent_rank of course
> does, but it's clearly impossible to implement PERCENT without doing
> that.
> 

OK. I was under the impression the window function would need to see all 
the input rows, but that seems not to be the case in general.

> Also, one could consider extending LIMIT to support arbitrary
> expressions, with some syntax like LIMIT WHEN (condition) which would be
> the general form.
> 

Hmmm. I can't really recall needing such thing, but of course - that 
does not prove it'd not be a good thing.

>   Tomas> FWIW, I doubt the patch can be much smaller/simpler - a
>   Tomas> significant part of the new stuff is in gram.y and node read/out
>   Tomas> infrastructure, the changes to LIMIT node are fairly minimal.
> 
> It's not a matter of patch size as such, but reuse of mechanisms rather
> than adding new ones. Also doing WITH TIES as a special case in the
> Limit node doesn't make adding PERCENT any easier at all, whereas the
> above gets it basically for free.
> 

Makes sense, I guess. At first I was thinking that this certainly does 
not add more new mechanisms than allowing LIMIT to terminate on boolean 
expression. But you're right about the WITH PERCENT part - using window 
functions would make adding this much simpler.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Speeding up INSERTs and UPDATEs to partitioned tables
Next
From: Michael Paquier
Date:
Subject: Re: partition tree inspection functions