Re: Row pattern recognition - Mailing list pgsql-hackers
| From | Henson Choi |
|---|---|
| Subject | Re: Row pattern recognition |
| Date | |
| Msg-id | CAAAe_zDEDtJEf6aO=x3qhPOs8kXmsha7Q8ROJXE8VZNp+Dxe2A@mail.gmail.com Whole thread |
| In response to | Re: Row pattern recognition (Henson Choi <assam258@gmail.com>) |
| Responses |
Re: Row pattern recognition
|
| List | pgsql-hackers |
Hi Tatsuo,
Attached are 15 incremental patches on top of v46 (replacing
the previous 8-patch set). Here is a summary of the changes
from the previous version:
0001-0005: Unchanged from the previous set.
0006: Fix DEFINE expression handling in RPR window
planning (revised)
Integration tests in 0007 revealed two crashes:
(1) subquery wrapping with outer aggregate causes
WindowAgg removal when RPR window function output
is unused, (2) RPR and non-RPR windows coexisting
causes SIGSEGV from RPRNavExpr propagating to the
wrong WindowAgg. This version extends the fix to
extract only Var nodes via pull_var_clause()
instead of adding the whole DEFINE expression to
the targetlist. The allpaths.c guard is extended
to also preserve columns referenced by DEFINE
clauses.
0007: Add RPR planner integration tests (new)
Planner optimization interactions with RPR windows
have been revealing crashes, so this is a dedicated
test file: window dedup, subquery flattening,
DEFINE propagation, LATERAL, recursive CTE,
etc. Will be expanded further.
0008: Replace reduced frame map with single match result
(new)
RPR processes rows sequentially and maintains at
most one active match per start position. This
replaces the per-row boolean frame map array with
four scalar fields (valid, matched, start, length),
simplifying nodeWindowAgg.c.
0009: Add fixed-length group absorption for RPR (new)
Extends context absorption to accept fixed-length
quantifiers (min == max) inside unbounded groups.
The NFA compiler optimizes (A B B)+ into
(A B{2})+, which previously lost absorption
eligibility because only {1,1} children
qualified. Now (A B{2})+ and ((A{2} B{3}){2})+
also qualify with no runtime changes. Also fixes
two NFA bugs: (1) incorrect absorption at
non-judgment points -- added ABSORBABLE check
in nfa_states_covered and fixed isAbsorbable
propagation in advance exit paths, (2) VAR
visited bitmap moved to nfa_add_state_unique
to prevent false cycle detection on new-iteration
loop-back caused by generalized END chain
traversal. Design note posted separately [1].
0010: Rename rpr_explain test views to descriptive
names (new)
Replaces sequential view names (rpr_ev83, rpr_ev84)
with descriptive names (rpr_ev_nav_prev1, etc.) for
maintainability when adding new tests.
0011: Implement 1-slot PREV/NEXT navigation for RPR
(was 0007, rebased)
Context changes only: optimizer.h include moved to
0006, test view names updated by 0010.
0012: Add JIT compilation support for RPR PREV/NEXT
(was 0008, unchanged)
Note: master recently disabled JIT by default
[3]. A quick benchmark with RPR did not show
measurable improvement from JIT. This patch is
retained for correctness (the new opcodes must
be handled when JIT is enabled), but the
practical benefit may be limited.
0013: Add tuplestore trim optimization for RPR PREV
navigation (new)
The planner extracts max_offset from all PREV
calls in DEFINE clauses and sets mark = currentpos
- max_offset at runtime, so only max_offset rows
are retained. Constant expressions are folded by
the planner; non-constant offsets (parameters,
etc.) are evaluated by the executor at init time
via RPR_NAV_OFFSET_NEEDS_EVAL.
0014: Update RPR code comments to reflect 1-slot
navigation model (new)
Updates stale comments in execRPR.c and
parse_rpr.c that still referenced the old 3-slot
design.
0015: Implement FIRST/LAST navigation for RPR
(work in progress)
Adds FIRST(column) and LAST(column) navigation
functions per SQL standard. Parser, planner,
executor, and tests are included. See the design
note [2] for the interaction with context
absorption.
Not yet done: compound navigation (e.g.
PREV(FIRST(col))), tests for FIRST/LAST offset
variants, tuplestore mark handling for FIRST,
and additional edge-case tests.
In the FIRST/LAST design note [2], I listed remaining
items that can be done within the current architecture.
With this patch set, all items except PREFIX pattern
absorption are now covered: fixed-length group absorption
(0009), JIT proper support (0012), and FIRST/LAST
navigation (0015). PREFIX absorption is planned
separately, but stabilizing the base patch takes
priority. If it is ready in time, would it make sense
to decide then whether to include it in this round?
Apart from bug fixes, are there any features you
consider essential before the patch can be committed?
[1] Fixed-length group absorption design note
https://www.postgresql.org/message-id/CAAAe_zAKAGKpK9iHx3ZSuG59sP93r5dfootqv5tCfaMt%3Dw6wzA%40mail.gmail.com
[2] FIRST/LAST navigation design note
https://www.postgresql.org/message-id/CAAAe_zCUrKGBgZdaazdO_v9QWHsS_1DXuP%3DrLeNhO3iwsHwwbg%40mail.gmail.com
[3] jit: Change the default to off
https://github.com/postgres/postgres/commit/7f8c88c2b872
Best regards,
Henson
Attached are 15 incremental patches on top of v46 (replacing
the previous 8-patch set). Here is a summary of the changes
from the previous version:
0001-0005: Unchanged from the previous set.
0006: Fix DEFINE expression handling in RPR window
planning (revised)
Integration tests in 0007 revealed two crashes:
(1) subquery wrapping with outer aggregate causes
WindowAgg removal when RPR window function output
is unused, (2) RPR and non-RPR windows coexisting
causes SIGSEGV from RPRNavExpr propagating to the
wrong WindowAgg. This version extends the fix to
extract only Var nodes via pull_var_clause()
instead of adding the whole DEFINE expression to
the targetlist. The allpaths.c guard is extended
to also preserve columns referenced by DEFINE
clauses.
0007: Add RPR planner integration tests (new)
Planner optimization interactions with RPR windows
have been revealing crashes, so this is a dedicated
test file: window dedup, subquery flattening,
DEFINE propagation, LATERAL, recursive CTE,
etc. Will be expanded further.
0008: Replace reduced frame map with single match result
(new)
RPR processes rows sequentially and maintains at
most one active match per start position. This
replaces the per-row boolean frame map array with
four scalar fields (valid, matched, start, length),
simplifying nodeWindowAgg.c.
0009: Add fixed-length group absorption for RPR (new)
Extends context absorption to accept fixed-length
quantifiers (min == max) inside unbounded groups.
The NFA compiler optimizes (A B B)+ into
(A B{2})+, which previously lost absorption
eligibility because only {1,1} children
qualified. Now (A B{2})+ and ((A{2} B{3}){2})+
also qualify with no runtime changes. Also fixes
two NFA bugs: (1) incorrect absorption at
non-judgment points -- added ABSORBABLE check
in nfa_states_covered and fixed isAbsorbable
propagation in advance exit paths, (2) VAR
visited bitmap moved to nfa_add_state_unique
to prevent false cycle detection on new-iteration
loop-back caused by generalized END chain
traversal. Design note posted separately [1].
0010: Rename rpr_explain test views to descriptive
names (new)
Replaces sequential view names (rpr_ev83, rpr_ev84)
with descriptive names (rpr_ev_nav_prev1, etc.) for
maintainability when adding new tests.
0011: Implement 1-slot PREV/NEXT navigation for RPR
(was 0007, rebased)
Context changes only: optimizer.h include moved to
0006, test view names updated by 0010.
0012: Add JIT compilation support for RPR PREV/NEXT
(was 0008, unchanged)
Note: master recently disabled JIT by default
[3]. A quick benchmark with RPR did not show
measurable improvement from JIT. This patch is
retained for correctness (the new opcodes must
be handled when JIT is enabled), but the
practical benefit may be limited.
0013: Add tuplestore trim optimization for RPR PREV
navigation (new)
The planner extracts max_offset from all PREV
calls in DEFINE clauses and sets mark = currentpos
- max_offset at runtime, so only max_offset rows
are retained. Constant expressions are folded by
the planner; non-constant offsets (parameters,
etc.) are evaluated by the executor at init time
via RPR_NAV_OFFSET_NEEDS_EVAL.
0014: Update RPR code comments to reflect 1-slot
navigation model (new)
Updates stale comments in execRPR.c and
parse_rpr.c that still referenced the old 3-slot
design.
0015: Implement FIRST/LAST navigation for RPR
(work in progress)
Adds FIRST(column) and LAST(column) navigation
functions per SQL standard. Parser, planner,
executor, and tests are included. See the design
note [2] for the interaction with context
absorption.
Not yet done: compound navigation (e.g.
PREV(FIRST(col))), tests for FIRST/LAST offset
variants, tuplestore mark handling for FIRST,
and additional edge-case tests.
In the FIRST/LAST design note [2], I listed remaining
items that can be done within the current architecture.
With this patch set, all items except PREFIX pattern
absorption are now covered: fixed-length group absorption
(0009), JIT proper support (0012), and FIRST/LAST
navigation (0015). PREFIX absorption is planned
separately, but stabilizing the base patch takes
priority. If it is ready in time, would it make sense
to decide then whether to include it in this round?
Apart from bug fixes, are there any features you
consider essential before the patch can be committed?
[1] Fixed-length group absorption design note
https://www.postgresql.org/message-id/CAAAe_zAKAGKpK9iHx3ZSuG59sP93r5dfootqv5tCfaMt%3Dw6wzA%40mail.gmail.com
[2] FIRST/LAST navigation design note
https://www.postgresql.org/message-id/CAAAe_zCUrKGBgZdaazdO_v9QWHsS_1DXuP%3DrLeNhO3iwsHwwbg%40mail.gmail.com
[3] jit: Change the default to off
https://github.com/postgres/postgres/commit/7f8c88c2b872
Best regards,
Henson
2026년 4월 2일 (목) 오후 12:51, Henson Choi <assam258@gmail.com>님이 작성:
Hi Tatsuo,
Thank you for the review and the attached patch for 0005.
I appreciate you taking the time to look at each patch
carefully.
Attached are 8 incremental patches on top of v46 (replacing
the previous 6-patch set). 0001-0003 are unchanged. 0004 and
0005 incorporate your feedback below. 0006 is a new planner
fix. Previous 0006 becomes 0007. 0008 is new.> 0004: Fix in-place modification of defineClause TargetEntry
Probably we want to modify the comment above since it implies an
in-place modification? What about something like this? (Modifies -> Replace)
/*
* Replace an expression tree in each DEFINE clause so that all Var
* nodes's varno refers to OUTER_VAR.
*/Good point, thank you. Applied in the updated 0004.> 0005: Fix mark handling for last_value() under RPR
I think instead we can set a mark at frameheadpos when seek type is
SEEK_TAIL and RPR is enabled. See attached patch.Thank you for the patch. Your approach is cleaner -- setting
the mark at frameheadpos is more direct than suppressing
advancement. I've adopted it in the updated 0005.> 0006: Implement 1-slot PREV/NEXT navigation for RPR
Excellent! I will take a look at it. (it will take for a while).No rush at all. In the meantime, while testing the PREV/NEXT
patch in various query patterns, I found a planner issue.
I've also added JIT support. Here is a summary of the new
patches:
0006: Prevent removal of RPR window functions in unused
subquery outputs
Wrapping an RPR window query in a subquery with an
outer aggregate causes a crash:
SELECT count(*) FROM (
SELECT count(*) OVER w FROM ...
WINDOW w AS (... DEFINE A AS i > PREV(i))
) t;
remove_unused_subquery_outputs() replaces unused subquery
target entries with NULL constants. When an RPR window
function's result is not referenced by the outer query,
this replacement eliminates all active window functions
for the WindowClause, causing the planner to omit the
WindowAgg node. DEFINE clause expressions containing
RPRNavExpr (PREV/NEXT) then lose their execution context,
leading to a crash.
The fix skips the NULL replacement for window functions
whose WindowClause has a defineClause. Even when the
window function result is unused, RPR pattern matching
(frame reduction) must still execute -- the WindowAgg
node must be preserved.
An alternative would be to let the planner remove the
window function but teach it to still generate the
WindowAgg node when defineClause is present, even with
no active window functions. That would be a more
targeted optimization but requires deeper planner
changes. Do you think the current approach is
sufficient, or would you prefer a different strategy?
0007: Implement 1-slot PREV/NEXT navigation for RPR
(unchanged from previous 0006)> 2. LLVM JIT fallback (in 0006): The mid-expression slot swap
I am not an expert of JIT, but for me, it sounds reasonable. We can
enhance it later on.0008: Add JIT compilation support for RPR PREV/NEXT navigation
EEOP_OUTER_VAR normally uses slot pointers cached in the
entry block, but the mid-expression slot swap in
NAV_SET/RESTORE invalidates them. To avoid penalizing
existing expressions, the reload path is only generated
when RPR navigation opcodes are present in the
expression (compile-time decision). Expressions without
PREV/NEXT produce identical machine code as before.
NAV_SET/RESTORE use the standard build_EvalXFunc pattern.
A 100K-row PREV/NEXT test case that runs under
jit_above_cost=0 is included in rpr.sql.
I would appreciate your thoughts on these when you have time.
Best regards,
Henson
Attachment
- nocfbot-0001-Remove-unused-regex-include.txt
- nocfbot-0002-CHECK_FOR_INTERRUPTS-nfa_add_state_unique.txt
- nocfbot-0003-CHECK_FOR_INTERRUPTS-nfa_try_absorb_context.txt
- nocfbot-0004-Fix-defineClause-TargetEntry-copy.txt
- nocfbot-0005-Fix-mark-handling-last_value-RPR.txt
- nocfbot-0006-Fix-DEFINE-expression-handling-RPR-window-planning.txt
- nocfbot-0007-Add-RPR-planner-integration-tests.txt
- nocfbot-0008-Replace-reduced-frame-map-with-single-match-result.txt
- nocfbot-0009-Add-fixed-length-group-absorption-for-RPR.txt
- nocfbot-0010-Rename-rpr_explain-test-views-to-descriptive-names.txt
- nocfbot-0011-Implement-1-slot-PREV-NEXT-navigation-for-RPR.txt
- nocfbot-0012-JIT-support-for-PREV-NEXT.txt
- nocfbot-0013-Add-tuplestore-trim-optimization-for-RPR-PREV.txt
- nocfbot-0014-Update-RPR-code-comments-for-1-slot-navigation.txt
- wip-0015-Implement-FIRST-LAST-navigation-for-RPR.txt
pgsql-hackers by date: