Thread: A rather hackish POC for alternative implementation of WITH TIES
This patch is a rather hacky implementation of the basic idea for implementing FETCH ... WITH TIES, and potentially also PERCENT, by using a window function expression to compute a stopping point. Large chunks of this (the parser/ruleutils changes, docs, tests) are taken from Surafel Temesgen's patch. The difference is that the executor change in my version is minimal: Limit allows a boolean column in the input to signal the point at which to stop. The planner inserts a WindowAgg node to compute the necessary condition using the rank() function. The way this is done in the planner isn't (IMO) the best and should probably be improved; in particular it currently misses some possible optimizations (most notably constant-folding of the offset+limit subexpression). I also haven't tested it properly to see whether I broke anything, though it does pass regression. -- Andrew (irc:RhodiumToad)
Attachment
This patch is a rather hacky implementation of the basic idea for
implementing FETCH ... WITH TIES, and potentially also PERCENT, by using
a window function expression to compute a stopping point.
Large chunks of this (the parser/ruleutils changes, docs, tests) are
taken from Surafel Temesgen's patch. The difference is that the executor
change in my version is minimal: Limit allows a boolean column in the
input to signal the point at which to stop. The planner inserts a
WindowAgg node to compute the necessary condition using the rank()
function.
The way this is done in the planner isn't (IMO) the best and should
probably be improved; in particular it currently misses some possible
optimizations (most notably constant-folding of the offset+limit
subexpression). I also haven't tested it properly to see whether I broke
anything, though it does pass regression.
Unlike most other executor node limit node has implementation for handling backward scan that support cursor operation but your approach didn't do this inherently because it outsource limitNode functionality to window function and window function didn't do this
eg.
postgres=# begin;
BEGIN
postgres=# declare c cursor for select i from generate_series(1,1000000) s(i) order by i fetch first 2 rows with ties;
DECLARE CURSOR
postgres=# fetch all in c;
i
---
1
2
(2 rows)
postgres=# fetch backward all in c;
ERROR: cursor can only scan forward
HINT: Declare it with SCROLL option to enable backward scan.
Even with SCROLL option it is not working as limitNode does. It store the result and return in backward scan that use more space than current limit and limit with ties implementation.
If am not mistaken the patch also reevaluate limit every time returning row beside its not good for performance its will return incorrect result with limit involving volatile function
regards
Surafel
>>>>> "Surafel" == Surafel Temesgen <surafel3000@gmail.com> writes: Surafel> Unlike most other executor node limit node has implementation Surafel> for handling backward scan that support cursor operation but Surafel> your approach didn't do this inherently because it outsource Surafel> limitNode functionality to window function and window function Surafel> didn't do this Correct. But this is a non-issue: if you want to be able to do backward scan you are supposed to declare the cursor as SCROLL; if it happens to work without it, that is pure coincidence. (Cursors declared with neither SCROLL nor NO SCROLL support backwards scan only if the underlying plan supports backward scan with no additional overhead, which is something you can't predict from the query.) The Limit node declares that it supports backwards scan if, and only if, its immediate child node supports it. It happens that WindowAgg does not, so in this implementation, LIMIT ... WITH TIES will not support backward scan without a tuplestore. I don't consider this an especially big deal; backward scans are extremely rare (as shown by the fact that bugs in backward scan have tended to go unnoticed for decades, e.g. bug #15336), and therefore we should not optimize for them. Surafel> If am not mistaken the patch also reevaluate limit every time The (offset+limit) expression is, yes. I noted in the original post that this needs work - probably it should be pushed out to an InitPlan if it doesn't fold to a constant. i.e. using the expression rank() over (...) > (select offset+limit) where it currently has rank() over (...) > (offset+limit) (Generating the limit expression so late in planning is the main thing that needs changing to get this from a hack POC to usable code) The main point here is that the same rather minimal executor changes allow support for not only WITH TIES but also PERCENT and possibly arbitrary stop conditions as well. (I know I've often wanted LIMIT WHEN to stop a query at a data-dependent point without having to resort to recursion - this patch doesn't quite get there, because of the scope issues involved in analyzing the WHEN condition, but it at least sets up the concept.) -- Andrew (irc:RhodiumToad)
Hello As this is a valuable feature, it would be good to have something happen here. I wouldn't like to have pg13 ship with no implementation of WITH TIES at all. My own inclination is that Andrew's implementation, being more general in nature, would be the better one to have in the codebase; but we don't have a complete patch yet. Can we reach some compromise such as if Andrew's patch is not completed then we push Surafel's? Thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes: Alvaro> My own inclination is that Andrew's implementation, being more Alvaro> general in nature, would be the better one to have in the Alvaro> codebase; but we don't have a complete patch yet. Can we reach Alvaro> some compromise such as if Andrew's patch is not completed then Alvaro> we push Surafel's? Mine needs some attention to where exactly in planning the necessary transformation work should be done; right now the planner part is a hack, intended to demonstrate the idea (and to let the executor changes work) rather than actually be the final version. As I mentioned before, some stuff does need to be pushed out to an InitPlan to make it work without multiple-evaluation problems. (A second opinion from another planner expert would be welcome on that part) I was largely holding off on doing further work hoping for some discussion of which way we should go. If you think my approach is worth pursuing (I haven't seriously tested the performance, but I'd expect it to be slower than Surafel's - the price you pay for flexibility) then I can look at it further, but figuring out the planner stuff will take some time. -- Andrew.
> in nature, would be the better one to have in the codebase; but we don't
> have a complete patch yet. Can we reach some compromise such as if
> Andrew's patch is not completed then we push Surafel's?
+1
On Wed, Jan 22, 2020 at 4:35 PM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> I was largely holding off on doing further work hoping for some
> discussion of which way we should go. If you think my approach is worth
> pursuing (I haven't seriously tested the performance, but I'd expect it
> to be slower than Surafel's - the price you pay for flexibility) then I
> can look at it further, but figuring out the planner stuff will take
> some time.
Flexibility with more generalized code is good, though if performance is significantly slower I would be concerned. I quickly reviewed the patch but haven't tested it yet.
>>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes:
I was largely holding off on doing further work hoping for some
discussion of which way we should go. If you think my approach is worth
pursuing (I haven't seriously tested the performance, but I'd expect it
to be slower than Surafel's - the price you pay for flexibility) then I
can look at it further, but figuring out the planner stuff will take
some time.
On 2020-Mar-26, Surafel Temesgen wrote: > On Wed, Jan 22, 2020 at 3:35 PM Andrew Gierth <andrew@tao11.riddles.org.uk> > wrote: > > > >>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > > > I was largely holding off on doing further work hoping for some > > discussion of which way we should go. If you think my approach is worth > > pursuing (I haven't seriously tested the performance, but I'd expect it > > to be slower than Surafel's - the price you pay for flexibility) then I > > can look at it further, but figuring out the planner stuff will take > > some time. > > Other alternative can be pushing the existing implementation > which will be open to change in case of better-finished > implementation. At this point, I think that's what we should do. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services