> Here is the v12 patch to implement the optimization on top of Oliver's
> v11 patch. Only src/backend/executor/nodeWindowAgg.c was modified
> (especially ignorenulls_getfuncarginframe). In the patch I created
> 2-bit not null information array, representing following status for
> each row:
>
> UNKNOWN: the row is not determined whether it's NULL or NOT yet.
> This is the initial value.
> NULL: the row has been determined to be NULL.
> NOT NULL: the row has been determined to be NOT NULL.
>
> In ignorenulls_getfuncarginframe:
>
> For the first time window function visits a row in a frame, the row is
> fetched using window_gettupleslot() and it is checked whether it is in
> the frame using row_is_in_frame(). If it's in the frame and the
> information in the array is UNKNOWN, ExecEvalExpr() is executed to
> find out if the expression on the function argument is NULL or
> not. And the result (NULL or NOT NULL) is stored in the array.
>
> If the information in the array is not UNKNOWN, we can skip calling
> ExecEvalExpr() because the information is already in the array.
>
> Note that I do not skip calling window_gettupleslot() and
> row_is_in_frame(), skip only calling ExecEvalExpr(), because whether a
> row is in a frame or not could be changing as the current row position
> moves while processing window functions.
>
> With this technique I observed around 40% speed up in my environment
> using the script attached, comparing with Oliver's v11 patch.
>
> v11:
> rows duration (msec)
> 1000 41.019
> 2000 148.957
> 3000 248.291
> 4000 442.478
> 5000 687.395
>
> v12:
> rows duration (msec)
> 1000 27.515
> 2000 78.913
> 3000 174.737
> 4000 311.412
> 5000 482.156
>
> The patch is now generated using the standard git format-patch. Also
> I have slightly adjusted the coding style so that it aligns with the
> one used in nodeWindowAgg.c, and ran pgindent.
>
> Note that I have not modified ignorenulls_getfuncarginpartition yet. I
> think we could optimize it using the not null info infrastructure as
> well. Will come up with it.
Attached is the v13 patch to address this: i.e. optimize window
functions (lead/lag) working in a partition. In summary I get 40x
speed up comparing with v12 patch. Here is the test script.
EXPLAIN ANALYZE
SELECT lead(x, 5000) IGNORE NULLS OVER ()
FROM generate_series(1,10000) g(x);
This looks for the 5k th row in a partition including 10k rows.
The average duration of 3 trials are:
v12: 2563.665 ms
v13: 126.259 ms
So I got 40x speed up with the v13 patch. In v12, we needed to scan
10k row partition over and over again. In v13 I used the same NULL/NOT
NULL cache infrastructure created in v12, and we need to scan the
partition only once. This is the reason for the speed up.
Also I removed ignorenulls_getfuncarginpartition(), which was the work
horse for null treatment for window functions working for partitions
in v12 patch. Basically it was a copy and modified version of
WinGetFuncArgInPartition(), thus had quite a few code duplication. In
v13, instead I modified WinGetFuncArgInPartition() so that it can
handle directly null treatment procedures.
BTW I am still not satisfied by the performance improvement for window
functions for frames, that was only 40%. I will study the code to look
for more optimization.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp