Re: Row pattern recognition - Mailing list pgsql-hackers

From Henson Choi
Subject Re: Row pattern recognition
Date
Msg-id CAAAe_zC6DEf-3mRYh6KJGO8ighVq13gpAhY0NGhTGVvCUKweGw@mail.gmail.com
Whole thread
In response to Re: Row pattern recognition  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
Hi Tatsuo,

I found Trino uses "empty match" too [2]. So for SQL users, I guess
"empty match" is more familiar wording.

Agreed.  Since RPR is a feature defined by the SQL standard, using the
standard's own terminology is the right choice: it lets users
cross-reference the specification directly.  That said, "zero-length
match" is widely used in the regex literature (PCRE2, Perl, Python,
Java, etc.), so developers researching the topic should treat the two
terms as synonymous.  In the SQL context, "empty match" is the preferred term.

 
>   (a) Reorder within nodeWindowAgg.c: move the nfa_* functions up and
>       keep the "API exposed to window functions" section at the bottom,
>       matching master's layout.
>
>   (b) Separate file under src/backend/executor/, keeping it close to
>       nodeWindowAgg.c while making the boundary explicit.
>
>   (c) A dedicated src/backend/rpr/ directory modeled on
>       src/backend/regex/, giving the NFA engine its own namespace.
>       This could also be an opportunity to consolidate the existing
>       src/backend/optimizer/plan/rpr.c into the same directory.

I prefer (a) or (b) for now, at least for the first commit. The reason
is, current nfa functions take a WindowAggState argument. If we prefer
(c), I think we need to change some of (or most of) nfa functions so
that they do not take the WindowAggState argument. What do you think?
 
That is a valid concern.  (c) will be necessary in the long run, but
MATCH_RECOGNIZE (R010) is not yet on the horizon, and restructuring the
NFA layer now -- before that requirement is concrete -- carries a real
risk of introducing defects during the refactoring.  The risks outweigh the
benefits at this stage.

Instead, I'll try (b): a separate file under src/backend/executor/
can retain the WindowAggState argument while making the NFA boundary
explicit, and the change is still reviewable.  If (b) turns out to be
harder than expected or surfaces unexpected issues, I'll fall back to
(a).

If (b) works out, I'll add a structured header comment to the new file
to preserve the algorithm description for future reviewers.


I am not familiar with ECPG. Do you know if ECPG has Window clause
tests? If ECPG does not have any Window clause tests, is it worth to
add RPR tests to ECPG?
 
I checked src/interfaces/ecpg/test/ and confirmed there are no Window
clause tests there at all.  If ECPG has no Window clause coverage, I
see no reason to add RPR tests there at this stage.  I think it would be better to leave ECPG tests out of this patch set for now.


Best regards,
Henson

pgsql-hackers by date:

Previous
From: Nick Cleaton
Date:
Subject: Re: [PATCH] libpq: try all addresses for a host before moving to next on target_session_attrs mismatch
Next
From: Junwang Zhao
Date:
Subject: Re: More speedups for tuple deformation