Re: Qual push down to table AM - Mailing list pgsql-hackers
From | Mark Dilger |
---|---|
Subject | Re: Qual push down to table AM |
Date | |
Msg-id | CAHgHdKt9fY37iBwbhD4=5e4wGGuQE=sr-Tt8C9cf9Q16QkT=zQ@mail.gmail.com Whole thread Raw |
In response to | Qual push down to table AM (Julien Tachoires <julien@tachoires.me>) |
List | pgsql-hackers |
Thanks for the patchset, Julien.
v1-0001:
All current callers of scan_rescan() pass NULL for the `key` parameter, making it unclear to
Table AM authors if this is supposed to be a pointer to a single key, or an array. If an array,
how is it terminated? None of this is addressed in the current code comments, and given
that nobody uses this field, the intent is undiscoverable. By adding `int nkeys` to the parameter
list, your patch makes the intention clearer. Perhaps you could also update the documentation
for these functions?
v1-0002
Changing the EXPLAIN output to include where the exclusion happened is quite nice!
Out of curiosity, why did you wait until this patch to add the `int nkeys` parameter to
initscan()? It seems more on-topic for v1-0001. Likewise, renaming `key` as `keys` helps,
but could have been done in v1-0001.
As for Andres' concern upstream, I am including a Work-In-Progress patch (WIP) to check
that no light-weight locks are held during qual evaluation. I don't intend this for commit so much
as for discussion. It seems to me fairly clear that your patch does not evaluate the quals while
holding such locks, but I might have misunderstood the concern.
On Wed, Aug 27, 2025 at 1:28 PM Julien Tachoires <julien@tachoires.me> wrote:
Hi,
Please find attached a patch set proposal intended to implement WHERE
clauses (qual) push down to the underlying table AM during
table/sequential scan execution.
The primary goal of this project is to convert quals to ScanKeys and
pass them to the table AMs. Table AMs are then allowed to apply early
tuple filtering during table (sequential) scans. Applying filtering at
the table storage level is something necessary for non row-oriented
table storage like columnar storage. Index organized table is another
table storage that would need quals push down.
AFAIK, CustomScan is the one and only way to go for having table scan
using quals pushed down, but each table AM must implement its own
mechanism. IMHO, having this feature available in core would help the
development of new table AMs. About Heap, some performance testing
(detailed at the end of this message) shows between 45% and 60%
improvement in seq scan execution time when only one tuple is returned
from the table.
Only a few expressions are supported: OpExpr (<key> <operator> <value>),
ScalarArrayOpExpr (<key> <operator> ANY|ALL(ARRAY[...]), and NullTest.
Row comparison is not yet supported as this part is still not clear to
me. On the right part of the expression, we support: constant, variable,
function call, and subquery (InitPlan only).
In terms of security, we check if the function related to the operator
is not user defined: only functions from the catalog are supported. We
also check that the function is "leakproof".
Pushing down quals does not guaranty to the executor that the tuples
returned during table scan satisfy a qual, as we don't know if the table
AM (potentially implemented via an extension) has applied tuple
filtering. In order to ensure to produce the right response to the where
clause, pushed down quals are executed twice per tuple returned: once by
the table AM, and once by the executor. This produces a performance
regression (15-17%) where almost the entire table is returned (see perf.
test results at the end of the message). This could be optimized by
flagging the tuples filtered by the table AM, this way we could avoid
the re-execution of the pushed down quals.
Details about the patch files
v1-0001-Pass-the-number-of-ScanKeys-to-scan_rescan.patch: This patch
adds the number of ScanKeys passed via scan_rescan() as a new argument.
The number of ScanKeys was only passed to the table AM via begin_scan(),
but not in scan_rescan().
v1-0002-Simple-quals-push-down-to-table-AMs.patch: Core of the feature,
this patch adds qual push down support for OpExpr expressions.
v1-0003-Add-the-table-reloption-quals_push_down.patch: Adds a new
reloption: quals_push_down used to enable/disable qual push down for a
table. Disabled by default.
v1-0004-Add-tests-for-quals-push-down-to-table-AM.patch: Regression
tests.
v1-0005-Push-down-IN-NOT-IN-array-quals-to-table-AMs.patch:
ScalarArrayOpExpr support.
v1-0006-Push-down-IS-IS-NOT-NULL-quals-to-table-AMs.patch: NullTest
support.
Performance testing
Head:
CREATE TABLE t (i INTEGER);
Patch:
CREATE TABLE t (i INTEGER) WITH (quals_push_down = on);
n=1M:
INSERT INTO t SELECT generate_series(1, 1000000);
VACUUM t;
n=10M:
TRUNCATE t;
INSERT INTO t SELECT generate_series(1, 10000000);
VACUUM t;
n=100M:
TRUNCATE t;
INSERT INTO t SELECT generate_series(1, 100000000);
VACUUM t;
Case #1: SELECT COUNT(*) FROM t WHERE i = 50000;
| n=1M | n=10M | n=100M
+--------+--------+---------+---------+----------+---------
| Head | Patch | Head | Patch | Head | Patch
--------+--------+--------+---------+---------+----------+---------
Test #1 | 38.903 | 21.308 | 365.707 | 155.429 | 3939.937 | 1564.182
Test #2 | 39.239 | 21.271 | 364.206 | 153.127 | 3872.370 | 1527.988
Test #3 | 39.015 | 21.958 | 365.434 | 154.498 | 3812.382 | 1525.535
--------+--------+--------+---------+---------+----------+---------
--------+--------+--------+---------+---------+----------+---------
Average | 39.052 | 21.512 | 365.116 | 154.351 | 3874.896 | 1539.235
Std dev | 0.171 | 0.386 | 0.800 | 1.158 | 63.815 | 21.640
--------+--------+--------+---------+---------+----------+---------
Gain | 44.91% | 57.73% | 60.28%
Case #2: SELECT COUNT(*) FROM t WHERE i >= 2;
| n=1M | n=10M | n=100M
+--------+--------+---------+---------+----------+---------
| Head | Patch | Head | Patch | Head | Patch
--------+--------+--------+---------+---------+----------+---------
Test #1 | 68.422 | 81.233 | 674.397 | 778.427 | 6845.165 | 8071.627
Test #2 | 69.237 | 80.868 | 682.976 | 774.417 | 6533.091 | 7668.477
Test #3 | 69.579 | 80.418 | 676.072 | 791.465 | 6917.964 | 7916.182
--------+--------+--------+---------+---------+----------+---------
--------+--------+--------+---------+---------+----------+---------
Average | 69.079 | 80.840 | 677.815 | 781.436 | 6765.407 | 7885.429
Std dev | 0.594 | 0.408 | 4.547 | 8.914 | 204.457 | 203.327
--------+--------+--------+---------+---------+----------+---------
Gain | -17.02% | -15.29% | -16.56%
Thoughts?
Best regards,
--
Julien Tachoires
Mark Dilger
Attachment
pgsql-hackers by date: