Thread: Window Functions: buffering strategy

Window Functions: buffering strategy

From
"Hitoshi Harada"
Date:
> I can find how to do it with the new (window execution model) design,
> (and the design is suitable to fix it above,) but at first before
> going into trivial specs, I would like core hackers to review the
> model is better than before or not. Thank you for your cooperation.

So no objections appeared. I am progressing with the new model.

Thanks to feedbacks against the previous patch, I am inclined to think
that we need to have different row buffering strategy for window
functions. As we discussed, depends on functions different bufferings
are needed. Then I read spec again and classified functions.

- row_number() -- row bufferingObviously no buffering, or what you need is only where you are in the partition

- rank() -- row bufferingYou need two rows, previous row and current row to determine if you
rank up or not. But it is solvable if the function cache the previous
row so row buffering is needed.

- dense_rank() -- row bufferingSame as rank().

- percent_rank() -- partition bufferingIn addition to rank() requirement, you have to see the last row in
the partition to count up how many rows are contained in the
partition. The spec says: defined as (RK–1)/(NR–1), where RK is
defined to be the RANK of R and NR is defined to be the number of rows
in the window partition of R.

- cume_dist() -- partition bufferingAlmost same as percent_rank(), partition buffering.

- ntile() -- partition bufferingYou have to know how many rows in the partition to determine how many
rows are contained in each bucket.

- lead() -- partition bufferingThe spec says: returns the value of VE evaluated on a row that is
OFFSET number of rows after R within P (partition).

- lag() -- partition bufferingSame as lead().

- first_value() -- frame bufferingThe spec says: return the value of VE evaluated on the first row of
the window frame.

- last_value() -- frame bufferingSame as first_value().

- nth_value() -- frame bufferingSame as first_value()

- aggregates -- frame bufferingaggregates all rows in the frame.

So I propose three Window node buffering strategies,
row/frame/partition bufferinig so as to avoid unnecessary row
buffering. Each window functions have buffering strategy number and
planner detect which strategy is at least needed for the execution. If
the node contains only row_number() then it needs row buffering but if
row_number() and lead() then partition bufferinig is needed.
Temporarily until window function APIs are public out, the strategy
numbers for each function can be defined in the source code as macro
or something, and when they are public we must have pg_wfunc or
something to register the strategy. If the function violate the
declared strategy and touched different API, for example if the
row_number() is about to call window_paritition_rows(), error will
occur.

After reading spec closer, I found row_number(), rank(), etc. doesn't
care if there is a moving frame. Only aggregates and
first/last/nth_value()s care it.

Now I guess I know what to do but I don't know how to do it :)  But I
am going to send another patch until next commit fest. I appreciate
for you comments.

Regeards,


--
Hitoshi Harada


Re: Window Functions: buffering strategy

From
Simon Riggs
Date:
On Mon, 2008-10-20 at 10:32 +0900, Hitoshi Harada wrote:

> So I propose three Window node buffering strategies,
> row/frame/partition buffering so as to avoid unnecessary row
> buffering.

Sounds good from here. Can I suggest you release the code in phases?

It would be better if we got just one of those done (row?), than to
attempt all 3 and end up with none because of emerging details.

Good luck.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Window Functions: buffering strategy

From
"Hitoshi Harada"
Date:
Hi,

2008/10/20 Simon Riggs <simon@2ndquadrant.com>:
>
> On Mon, 2008-10-20 at 10:32 +0900, Hitoshi Harada wrote:
>
>> So I propose three Window node buffering strategies,
>> row/frame/partition buffering so as to avoid unnecessary row
>> buffering.
>
> Sounds good from here. Can I suggest you release the code in phases?
>
> It would be better if we got just one of those done (row?), than to
> attempt all 3 and end up with none because of emerging details.

Thank you for your feedback.
Ok, actually the first will be partition buffering, because it covers
all of the functions' requirement. The row buffering is kind of
optimization in the special case.

Regards,

-- 
Hitoshi Harada


Re: Window Functions: buffering strategy

From
Heikki Linnakangas
Date:
Hitoshi Harada wrote:
> Hi,
> 
> 2008/10/20 Simon Riggs <simon@2ndquadrant.com>:
>> On Mon, 2008-10-20 at 10:32 +0900, Hitoshi Harada wrote:
>>
>>> So I propose three Window node buffering strategies,
>>> row/frame/partition buffering so as to avoid unnecessary row
>>> buffering.
>> Sounds good from here. Can I suggest you release the code in phases?
>>
>> It would be better if we got just one of those done (row?), than to
>> attempt all 3 and end up with none because of emerging details.
> 
> Thank you for your feedback.
> Ok, actually the first will be partition buffering, because it covers
> all of the functions' requirement. The row buffering is kind of
> optimization in the special case.

The thought I had during the last commit fest was that the function 
would call a callback, something like window_forget(pos), that would 
tell the system that it can release any rows before the given position. 
That way you only need one method, and it's also be optimal for 
functions like lag(), that doesn't fit perfectly into either the row or 
frame buffering category. Or do we need the information at plan-time?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Window Functions: buffering strategy

From
"Hitoshi Harada"
Date:
2008/10/20 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
> Hitoshi Harada wrote:
>>
>> Hi,
>>
>> 2008/10/20 Simon Riggs <simon@2ndquadrant.com>:
>>>
>>> On Mon, 2008-10-20 at 10:32 +0900, Hitoshi Harada wrote:
>>>
>>>> So I propose three Window node buffering strategies,
>>>> row/frame/partition buffering so as to avoid unnecessary row
>>>> buffering.
>>>
>>> Sounds good from here. Can I suggest you release the code in phases?
>>>
>>> It would be better if we got just one of those done (row?), than to
>>> attempt all 3 and end up with none because of emerging details.
>>
>> Thank you for your feedback.
>> Ok, actually the first will be partition buffering, because it covers
>> all of the functions' requirement. The row buffering is kind of
>> optimization in the special case.
>
> The thought I had during the last commit fest was that the function would
> call a callback, something like window_forget(pos), that would tell the
> system that it can release any rows before the given position. That way you
> only need one method, and it's also be optimal for functions like lag(),
> that doesn't fit perfectly into either the row or frame buffering category.
> Or do we need the information at plan-time?
>

Right. In the last commit fest we discussed about run-time cut-off
signal API. But I have finally come to that we must know how to buffer
*before* any execution.

The real problem is not how to cut off preceding rows, but how to read
ahead after the current row. I intend to avoid reading ahead until end
of the partition for only row_number() that doesn't need any following
rows. Sometimes we have to store whole the partition before returning
the first result and sometimes not. It depends on function categories,
or function access range. My current idea is classify Window function
API to three parallel to buffering strategies.

And the lag()/lead(), spec says "OFFSET is exact numeric literal" but
we postgres doesn't have mechanism to limit the function argument data
type to Const integer only. So I am thinking about OFFSET for
lag()/lead() may be dynamic integer variable. If it comes, even those
functions don't know how many rows should be cut off. The lag()/lead()
can access any row of partition, per spec.

Regards,


-- 
Hitoshi Harada


Re: Window Functions: buffering strategy

From
Heikki Linnakangas
Date:
Hitoshi Harada wrote:
> The real problem is not how to cut off preceding rows, but how to read
> ahead after the current row. I intend to avoid reading ahead until end
> of the partition for only row_number() that doesn't need any following
> rows. Sometimes we have to store whole the partition before returning
> the first result and sometimes not. It depends on function categories,
> or function access range. My current idea is classify Window function
> API to three parallel to buffering strategies.

Could the rows be read ahead on demand? If the window function calls 
window_getarg on a row that's not yet fetched, fetch forward to that row.

> And the lag()/lead(), spec says "OFFSET is exact numeric literal" but
> we postgres doesn't have mechanism to limit the function argument data
> type to Const integer only. So I am thinking about OFFSET for
> lag()/lead() may be dynamic integer variable. If it comes, even those
> functions don't know how many rows should be cut off. The lag()/lead()
> can access any row of partition, per spec.

Hmm, yeah, that's a bit of a problem :-(.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Window Functions: buffering strategy

From
"Hitoshi Harada"
Date:
2008/10/21 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
> Hitoshi Harada wrote:
>>
>> The real problem is not how to cut off preceding rows, but how to read
>> ahead after the current row. I intend to avoid reading ahead until end
>> of the partition for only row_number() that doesn't need any following
>> rows. Sometimes we have to store whole the partition before returning
>> the first result and sometimes not. It depends on function categories,
>> or function access range. My current idea is classify Window function
>> API to three parallel to buffering strategies.
>
> Could the rows be read ahead on demand? If the window function calls
> window_getarg on a row that's not yet fetched, fetch forward to that row.

Well, it could be possible. But from my current view it will be very
complicated and might be impossible. So I will try to implement basic
approach, and let's consider your approach then. We keep stay in
private API so that we have time to consider again.

Regards,


-- 
Hitoshi Harada