Re: Row pattern recognition - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Row pattern recognition
Date
Msg-id 20260427.174220.1939160662649810289.ishii@postgresql.org
Whole thread
In response to Re: Row pattern recognition  (Henson Choi <assam258@gmail.com>)
List pgsql-hackers
Hi Henson,

> Thank you for pointing this out.  I will analyse A1 and reflect
> the findings in the next revision.

Thanks in advance.

>> Comment to 0009:
>>
>> +       /* Mark VAR in visited before duplicate check to prevent DFS loops
>> */
>> +       winstate->nfaVisitedElems[WORDNUM(state->elemIdx)] |=
>> +               ((bitmapword) 1 << BITNUM(state->elemIdx));
>>
>> Why do not use bms here? Maybe winstate->nfaVisitedElems could become
>> quite big?
>>
> 
> Thank you for the suggestion.  I was not aware of Bitmapset when
> this was written; I will switch nfaVisitedElems to Bitmapset.
> 
> On sizing, nfa->nelems is structurally bounded by RPR_ELEMIDX_MAX
> (INT16_MAX), since RPRNFAState.elemIdx is int16 and scanRPRPattern()
> rejects anything past that with "pattern too complex".  So the
> bitmap is at most 4 KB per WindowAgg even in the worst case.

Hmm. From the comment from bitmapset.c:
 * A bitmap set can represent any set of nonnegative integers, although
 * it is mainly intended for sets where the maximum value is not large,
 * say at most a few hundred.

Maybe 4 KB is tool large? If so, there's no need to rush to change the
implementation. You can leave it as it is for now.

>> Comment to 0024:
>>
>> +        <function>first</function> ( <parameter>value</parameter>
>> <type>anyelement</type> [, <parameter>offset</parameter>
>> <type>bigint</type> ] )
>> +        <returnvalue>anyelement</returnvalue>
>> +       </para>
>> +       <para>
>> +        Returns the column value at the row <parameter>offset</parameter>
>> +        rows after the match start row;
>> +        returns NULL if the target row is beyond the current row.
>> +        <parameter>offset</parameter> defaults to 0 if omitted, referring
>> to the
>> +        match start row itself.
>> +        <parameter>offset</parameter> must be a non-negative integer.
>> +        <parameter>offset</parameter> must not be NULL.
>> +        Can only be used in a <literal>DEFINE</literal> clause.
>> +       </para></entry>
>>
>> "Returns the column value at the row" is not appropriate because
>> first() accepts any expression as the first argument. e.g. first(col1
>> + col2).  Instead what about "Returns value evaluated at the row"?
>> Same thing can be said to other row pattern navigation operations.
>> I think they inherits my mistakes in my earlier patch. Sorry for this.
>>
> 
> Thank you, and please do not apologise -- I should have caught this
> when I carried the wording across.  I will reword the synopses for
> first(), last(), prev(), next() (and any other entry that inherited
> the same phrasing) along the line you suggested.

Thank you.

>> Let's prohibit volatile functions in DEFINE. Although this needs extra
>> checking, it would make proceeding codes and tests simpler.  If
>> others, especially Vik and Jacob, think differently, please join the
>> discussion.
>>
> 
> Thank you for the decision.  Since this is a policy change that may
> still attract further community discussion, I would like to handle
> it as a lower-priority, self-contained follow-up on top of 0031, so
> it can be picked up, held, or revised independently.  Tentative
> plan: reject volatile functions in DEFINE during parse-analysis via
> contain_volatile_functions(), fold with the existing offset
> runtime-constant check where the two share traversal, and -- if the
> prohibition lands -- convert the B9 test into an error-case test
> and drop the XXX note.

Your tentative plan looks good to me.

>> > On sequencing, if we do take it up, I would suggest handling it
>> > after the 31-patch set, alongside the README.rpr split as
>> > follow-up work on top of 0031.  Whether it ultimately lands
>> > inside v47 or as a separate piece on top does not need to be
>> > decided right now -- there is still room to discuss it as the
>> > review progresses, and I am happy to adjust either way based on
>> > your direction.
>>
>> I too prefer to have it after the 31-patch set.
>>
> 
> Thank you, confirmed.  Both items (volatile-DEFINE prohibition and
> README.rpr split) will land as follow-up patches on top of 0031,
> after the current series is committed.
> 
> One quick question: for the 0007 / 0009 / 0017 / 0024 review fixes
> summarised below, would additional follow-up patches on top of 0031
> be preferable to refreshing the already-reviewed series?

Yes, that's fine. I will apply the delta patch for 0007 / 0009 / 0017
/ 0024 on top of 0031.

> For ease of tracking, here is a one-line summary per patch of the
> changes I plan to fold into the next revision:
> 
>   0007: analyse A1 and, if the diagnosis holds, revise it so the
>         frame optimization bypass is observable in EXPLAIN
>         (adjusting the comment and expected output accordingly).
>   0009: switch WindowAggState.nfaVisitedElems from a bitmapword[]
>         to Bitmapset, replacing the manual WORDNUM/BITNUM sites
>         with bms_add_member() / bms_is_member() / bms_free().
>   0017: fix the "Checks for" -> "Check for" comment, rewrite the
>         called-out passive-voice errmsg() strings (and the similar
>         ones nearby) in the active voice, and use Max() at the two
>         offset-update sites.
>   0024: reword the navigation synopses (first/last/prev/next) to
>         "Returns the value of <value> evaluated at the row ...",
>         and replace the non-ASCII ≠ / → in rpr.out with != / ->
>         (sweeping the rest of rpr*.out for other non-ASCII).

Confirmed.

> In addition, I plan to follow the current series with two new
> patches on top of 0031 (final numbering deferred until they are
> posted):
> 
>   new: README.rpr split -- extract the design notes currently
>        inlined in execRPR.c into src/backend/executor/README.rpr,
>        as previously discussed.
>   new: Prohibit volatile functions in DEFINE -- separate,
>        self-contained patch as described above; intended as a
>        lower-priority follow-up that can be picked up, held, or
>        revised independently of the main series depending on how
>        the community discussion settles.

Ok.

> Separately, one item is deferred for later discussion rather than
> folded into a patch:
> 
>   deferred: B7 Recursive CTE XXX in 0007 -- whether this case
>             falls under ISO/IEC 9075-2 4.18.5 / 6.17.5 prohibition
>             ("Ok, let's discuss on this later on.").  Noted, happy
>             to pick this up whenever convenient.

Ok.

> Once you let me know your preference on the sequencing question
> above, I will send the corresponding patches as soon as they are
> ready, and post the two follow-ups separately so that they can be
> reviewed and committed on top of the main series.
> 
> Thank you again for the thorough review.

Thank you for your effort!
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



pgsql-hackers by date:

Previous
From: SATYANARAYANA NARLAPURAM
Date:
Subject: [Patch]: Fix excessive ProcArrayLock acquisitions with subscription max_retention_duration=0
Next
From: Peter Eisentraut
Date:
Subject: Re: DELETE/UPDATE FOR PORTION OF with rule system is not working