Home > mailing lists

Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options - Mailing list pgsql-hackers

From	Tatsuo Ishii
Subject	Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options
Date	June 12 01:52:35
Msg-id	20250612.075235.1886649649521129924.ishii@postgresql.org Whole thread Raw
In response to	Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options (Tatsuo Ishii <ishii@postgresql.org>)
List	pgsql-hackers

Tree view

> One thing I worry about the patch is, now the non-nulls array
> optimization was removed.  Since then I have been thinking about if
> there could be other way to optimize searching for non null rows.

Here is the v12 patch to implement the optimization on top of Oliver's
v11 patch. Only src/backend/executor/nodeWindowAgg.c was modified
(especially ignorenulls_getfuncarginframe). In the patch I created
2-bit not null information array, representing following status for
each row:

UNKNOWN: the row is not determined whether it's NULL or NOT yet.
     This is the initial value.
NULL: the row has been determined to be NULL.
NOT NULL: the row has been determined to be NOT NULL.

In ignorenulls_getfuncarginframe:

For the first time window function visits a row in a frame, the row is
fetched using window_gettupleslot() and it is checked whether it is in
the frame using row_is_in_frame(). If it's in the frame and the
information in the array is UNKNOWN, ExecEvalExpr() is executed to
find out if the expression on the function argument is NULL or
not. And the result (NULL or NOT NULL) is stored in the array.

If the information in the array is not UNKNOWN, we can skip calling
ExecEvalExpr() because the information is already in the array.

Note that I do not skip calling window_gettupleslot() and
row_is_in_frame(), skip only calling ExecEvalExpr(), because whether a
row is in a frame or not could be changing as the current row position
moves while processing window functions.

With this technique I observed around 40% speed up in my environment
using the script attached, comparing with Oliver's v11 patch.

v11:
rows    duration (msec)
1000    41.019 
2000    148.957 
3000    248.291 
4000    442.478 
5000    687.395

v12:
rows    duration (msec)
1000    27.515 
2000    78.913 
3000    174.737 
4000    311.412 
5000    482.156 

The patch is now generated using the standard git format-patch.  Also
I have slightly adjusted the coding style so that it aligns with the
one used in nodeWindowAgg.c, and ran pgindent.

Note that I have not modified ignorenulls_getfuncarginpartition yet. I
think we could optimize it using the not null info infrastructure as
well. Will come up with it.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
for i in 1000 2000 3000 4000 5000
do
    echo "$i rows: "
    pos=`expr $i / 2`
    psql -a test <<EOF
\timing
explain analyze
SELECT
  x,
nth_value(x,$pos) IGNORE NULLS OVER w
FROM generate_series(1,$i) g(x)
WINDOW w AS (ORDER BY x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING);
EOF

done | egrep "rows:|Time:" | egrep -v "Planning|Execution"|
    sed -e 's/rows: *//' -e 's/Time: //' -e 's/ms//'

Attachment

v12-0001-Add-IGNORE-NULLS-RESPECT-NULLS-option-to-Window-.patch

pgsql-hackers by date:

From: Peter Smith
Date: 12 June, 01:43:10
Subject: Re: Replication slot is not able to sync up

From: Noboru Saito
Date: 12 June, 02:49:11
Subject: Re: [PATCH] Proposal: Improvements to PDF stylesheet and table column widths

Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options - Mailing list pgsql-hackers

Attachment

Previous

Next