Re: Row pattern recognition - Mailing list pgsql-hackers

From	Henson Choi
Subject	Re: Row pattern recognition
Date	March 7 02:18:27
Msg-id	CAAAe_zDgbDQf=RMtGw_axaNm1d_HY-e4=0ddDy809p7kTuUwHQ@mail.gmail.com
In response to	Re: Row pattern recognition (Tatsuo Ishii <ishii@postgresql.org>)
Responses	Re: Row pattern recognition
List	pgsql-hackers

Hi, Tatsuo

> Does "a zero-length match" mean "an empty match"?

Yes, they refer to the same thing. "Zero-length match" is the more
common term in general regex implementations (PCRE2, Perl, Python,
Java, etc.[1]), but the RPR standard (ISO/IEC 19075-5, Section 4.12.2)
uses "empty match" exclusively.

[1] https://www.regular-expressions.info/zerolength.html
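
To make the terminology concrete, here is a small C sketch using the POSIX `regcomp`/`regexec` API. This is plain string regex, not PostgreSQL's regex engine and not RPR. The pattern `a*` succeeds against `"bbb"` at offset 0 while consuming nothing, which is exactly an empty (zero-length) match. The helper name `leading_match_length` is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: if the leftmost match of the POSIX extended regex
 * `pattern` in `subject` starts at offset 0, store its length in *len and
 * return true. A stored length of 0 is a zero-length ("empty") match.
 */
static bool
leading_match_length(const char *pattern, const char *subject, int *len)
{
    regex_t     re;
    regmatch_t  m;
    bool        found = false;

    assert(pattern != NULL && subject != NULL && len != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return false;           /* invalid pattern */
    if (regexec(&re, subject, 1, &m, 0) == 0 && m.rm_so == 0)
    {
        *len = (int) (m.rm_eo - m.rm_so);
        found = true;
    }
    regfree(&re);
    return found;
}
```

For example, `leading_match_length("a*", "bbb", &len)` returns true with `len == 0`. By contrast, `"a+"` against the same string finds no match at all.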

> BTW, currently we place all nfa_* functions at the bottom of
> nodeWindowAgg.c. However nodeWindowAgg.c in master branch places "API
> exposed to window functions" at the bottom of the file. Do you think
> we should follow the way?

Yes, we should follow master's convention. I see three options:

(a) Reorder within nodeWindowAgg.c: move the nfa_* functions up and
keep the "API exposed to window functions" section at the bottom,
matching master's layout.

(b) Separate file under src/backend/executor/, keeping it close to
nodeWindowAgg.c while making the boundary explicit.

(c) A dedicated src/backend/rpr/ directory modeled on
src/backend/regex/, giving the NFA engine its own namespace.
This could also be an opportunity to consolidate the existing
src/backend/optimizer/plan/rpr.c into the same directory.

For now, (a) is the safest change. Longer term, (b) or (c) would make
more sense -- especially when we extend to MATCH_RECOGNIZE (R010),
where the NFA engine will need to be shared by both the WINDOW clause
(R020) and MATCH_RECOGNIZE code paths.
Either way, the NFA engine can be exposed via a header so that R010
can share it without further restructuring.
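
As a rough illustration of what "exposed via a header" could look like, here is a self-contained C sketch of a state-set NFA simulation behind a narrow interface. The caller (window aggregation today, MATCH_RECOGNIZE later) evaluates the DEFINE conditions and passes the engine only a bitmask of satisfied pattern variables per row. Every name here (`NfaPattern`, `nfa_add`, `nfa_step`, `nfa_longest_match`) is hypothetical and does not reflect the patch's actual nfa_* functions. The sketch supports only concatenation and the `*`/`+` quantifiers, and it reports the longest match rather than implementing the standard's preferment ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical engine interface; not an existing PostgreSQL API. */
#define NFA_MAX_ELEMS 63        /* element states + 1 accept state fit in 64 bits */

typedef struct NfaElem
{
    int         var;            /* pattern variable index, 0..31 */
    bool        star;           /* true for X*, false for a single X */
} NfaElem;

typedef struct NfaPattern
{
    int         nelems;
    NfaElem     elems[NFA_MAX_ELEMS];
} NfaPattern;

/* Append X (quant '1'), X* ('*') or X+ ('+', expanded to X X*). */
static bool
nfa_add(NfaPattern *p, int var, char quant)
{
    int         need = (quant == '+') ? 2 : 1;

    assert(p != NULL);
    if (var < 0 || var > 31 || (quant != '1' && quant != '*' && quant != '+'))
        return false;
    if (p->nelems + need > NFA_MAX_ELEMS)
        return false;
    if (quant != '*')
        p->elems[p->nelems++] = (NfaElem) {var, false};
    if (quant != '1')
        p->elems[p->nelems++] = (NfaElem) {var, true};
    return true;
}

/* Epsilon closure: a starred element may be skipped (zero iterations). */
static uint64_t
nfa_closure(const NfaPattern *p, uint64_t states)
{
    /* Skips only move forward, so one ascending pass reaches the fixpoint. */
    for (int i = 0; i < p->nelems; i++)
        if ((states & (UINT64_C(1) << i)) && p->elems[i].star)
            states |= UINT64_C(1) << (i + 1);
    return states;
}

/* Consume one row; bit v of row_vars is set if the row satisfies variable v. */
static uint64_t
nfa_step(const NfaPattern *p, uint64_t states, uint32_t row_vars)
{
    uint64_t    next = 0;

    for (int i = 0; i < p->nelems; i++)
    {
        if (!(states & (UINT64_C(1) << i)) ||
            !(row_vars & (UINT32_C(1) << p->elems[i].var)))
            continue;
        /* X* loops on itself; a single X advances to the next element. */
        next |= UINT64_C(1) << (p->elems[i].star ? i : i + 1);
    }
    return nfa_closure(p, next);
}

/*
 * Length of the longest match starting at rows[0], or -1 if there is none.
 * A result of 0 is an empty (zero-length) match. State p->nelems accepts.
 */
static int
nfa_longest_match(const NfaPattern *p, const uint32_t *rows, int nrows)
{
    uint64_t    accept = UINT64_C(1) << p->nelems;
    uint64_t    states = nfa_closure(p, UINT64_C(1));  /* start at element 0 */
    int         best = (states & accept) ? 0 : -1;

    for (int r = 0; r < nrows && states != 0; r++)
    {
        states = nfa_step(p, states, rows[r]);
        if (states & accept)
            best = r + 1;
    }
    return best;
}
```

With A = bit 0, B = bit 1 and C = bit 2, the pattern `A B+ C` over rows classified as A, B, B, C, A yields 4. The pattern `A*` over two rows that satisfy only B yields 0, an empty match. Keeping DEFINE evaluation on the caller's side is the point of the split: the engine never touches tuples or expression state, so both code paths could call the same functions through a shared header.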

Since the NFA algorithm is not familiar territory for most DBMS
developers, it would also be worth preserving the detailed algorithm
description posted earlier in this thread -- either as structured
comments or as a dedicated README alongside the code.

What do you think? Should we start with (a) now and revisit the
broader restructuring approaches -- (b) or (c) -- later, or would you
prefer to discuss them first? Either of those would also resolve the
file layout convention issue naturally, since new files would follow
proper conventions from the start.

One more thing: there are no ECPG example programs or regression tests
for RPR yet. I'd like to propose adding them. Shall I draft an
initial set, or would you prefer to coordinate with the ECPG
maintainers first?

Best regards,
Henson