Re: Window functions patch v04 for the September commit fest - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Window functions patch v04 for the September commit fest
Date
Msg-id 48BCE25D.8040804@enterprisedb.com
Whole thread Raw
In response to Re: Window functions patch v04 for the September commit fest  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: Window functions patch v04 for the September commit fest  ("Hitoshi Harada" <umi.tanuki@gmail.com>)
List pgsql-hackers
Gregory Stark wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> 
>> sfunc is called once for each input row. Unlike with normal aggregates, sfunc
>> is passed the whole input row, so that e.g RANK can compare it against the
>> previous row, or LEAD can buffer it.
> 
> I'm not sure I follow this bit about being passed the whole input row. How
> would that relate to something like lag(x*y, 2) for example?

Well, sfunc needs to be passed not only the whole input tuple, but also 
the values of the arguments.

Hmm. The spec says that the offset argument of lead and lag needs to be 
a numeric literal. To enforce that at parse time, we'd have to hard-code 
that into the grammar. Or we could check it at run-time, and throw an 
error if its value changes across invocations of the sfunc function.

>> outfunc is a set-returning function, and it is called until it returns no more
>> rows, after each sfunc invocation.
> 
> And in any case I feel like this is abstraction on the cheap. The only reason
> it's so general is because it's leaving all the work to the functions to
> implement. 
> 
> It also means we get no benefit from cases like
> 
> SELECT lag(x,5),lag(y,5),lag(z,5)
> 
> where the executor could keep one tuplestore and for all of them. For that
> matter it could keep one tuplestore even for cases like lag(x,5),lag(y,4).

True. I guess it comes down to how much complexity we're willing to have 
in the executor node to handle that more efficiently.

Note that the memory usage is the same either way, except for the extra 
constant overhead of multiple tuplestores, because each tuplestore 
inside the lag implementation only needs to store its own column.

> What would the executor do for a query like
> 
> SELECT lead(x,1),lead(y,2),lead(y,3)
> 
> It would not only have to keep a tuplestore to buffer the output but it would
> have to deal with receiving data from different SRFs at different times. The
> best approach I can think of would be to keep a tuplestore for each SRF using
> themas queues, reading old values from the head as soon as they all have at
> least one new value in them.

Hitoshi solved that creating a separate Window node for each window 
function. So the plan tree for that would look something like:

Window (lead(x,1))  Window (lead(y,2))    Window (lead(y,3))      Seq Scan ...

That keeps the Window node implementation quite simple because it only 
needs to handle one window function at a time.

> And it doesn't answer how to deal with things like
> 
> SELECT lag(x,1) OVER (ORDER BY a), lag(x,1) OVER (ORDER BY b)
> 
> I, uh, don't actually have any ideas of how to deal with that one :(

The above handles that by putting extra sort nodes in between the Window 
nodes. Not too efficient, but I don't see any way around it.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: "Pavel Stehule"
Date:
Subject: Re: Is this really really as designed or defined in some standard
Next
From: David Fetter
Date:
Subject: Re: Window functions patch v04 for the September commit fest