Re: Row pattern recognition - Mailing list pgsql-hackers

From Henson Choi
Subject Re: Row pattern recognition
Date
Msg-id CAAAe_zAzOgB2KR9ACDD2o3QNP_gaCKnUd2bRGgbhcD=og50XXA@mail.gmail.com
Whole thread Raw
In response to Re: Row pattern recognition  (Henson Choi <assam258@gmail.com>)
Responses Re: Row pattern recognition
List pgsql-hackers
Hi Tatsuo,

However, this raises interesting questions: should we optimize patterns
by removing {0} quantifiers or simplifying them? And if so, how should
we handle patterns that become empty after such optimization?

For example:
- PATTERN (A{0})          → empty pattern
- PATTERN (A{0} B{0})     → empty pattern
- PATTERN (A{0} B)        → PATTERN (B) after optimization

Empty patterns would result in zero-length matches, which our current
implementation explicitly treats as invalid (see initialAdvance flag
logic in nodeWindowAgg.c).

More importantly, I recall that zero-length matches caused serious
issues during development, which is why we added logic to explicitly
avoid them.

The reason I cannot immediately provide a concrete plan for A{0}
support is that I need to deeply understand the semantic meaning of
zero-length matches in the SQL standard first. Without this
understanding, any implementation approach could be fundamentally
flawed.

Specifically, I need to investigate:
- What zero-length matches mean semantically in RPR
- How to handle empty patterns according to the standard
- The correct behavior when a pattern optimizes to nothing

After the current code review phase is complete, I'm also considering
setting up an Oracle test environment to observe how it handles these
edge cases. This could provide valuable insights into the expected
behavior, especially for zero-length matches and empty patterns.

Our current implementation cannot support A{0} due to a structural limitation.

The reduced_frame_map uses row-based representation (reduced_frame_map[pos] = val),
which can only express matches consuming at least one row. It cannot represent
zero-length matches that occur between rows without consuming any row position.

Patterns like A{0}, A*, or A? can produce zero-length matches with no row to mark
as RF_FRAME_HEAD and no position to register in the frame map.

We currently prevent this using the initialAdvance flag (nodeWindowAgg.c),
which skips FIN recording during initial epsilon transitions.

Supporting A{0} requires either restructuring reduced_frame_map to handle
virtual positions, or separate handling for zero-length matches. Before
choosing an approach, we need clarity on what the SQL standard expects for
zero-length match semantics (output generation, aggregate behavior, etc.).

Given this structural limitation, I'd like to ask: should we keep the current
initialAdvance mechanism (which prevents zero-length matches) and handle A{0}
separately?

Best regards,
Henson 

pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: recovery.signal not cleaned up when both signal files are present
Next
From: Chao Li
Date:
Subject: Re: COMMENTS are not being copied in CREATE TABLE LIKE