Re: Row pattern recognition - Mailing list pgsql-hackers
| From | Tatsuo Ishii |
|---|---|
| Subject | Re: Row pattern recognition |
| Date | |
| Msg-id | 20260314.164003.2119775273148039288.ishii@postgresql.org Whole thread Raw |
| In response to | Re: Row pattern recognition (Henson Choi <assam258@gmail.com>) |
| List | pgsql-hackers |
Hi Henson,
> Hi Tatsuo,
>
> Thank you for your thoughtful feedback on the tle identity concern.
>
> Ok. we should be consistent with WHERE etc. unless the standard
>> requres DEFINE stricter. But if the tle is in the target list
>> (possibly with resjunk attribute), it could be different from the one
>> in the DEFINE because of the casting by coerce_to_boolean(). i.e.:
>>
>> - tle in the target list has no cast node
>> - tle in the DEFINE may have cast node
>>
>> I don't know how this could affect subsequent RPR process but I would
>> like to keep the prerequisites that tle in DEINE and the traget list
>> is identical.
>>
>
>> is identical.
>
> That is a very good point regarding tle identity.
>
> On the implementation side, the DEFINE clause TargetEntries are
> already deep-copied via copyObject in the parser (parse_rpr.c),
> so adding a cast node to the DEFINE tle does not affect the
> target list tle -- they are independent copies at that point.
>
> On the semantic side, the two serve different purposes: the target
> list produces output columns (where the user expects the original
> type), while the DEFINE expression is evaluated as a boolean for
> pattern matching. This is the same pattern as WHERE, where the
> WHERE expression may carry a cast node but the target list for the
> same column does not.
>
> That said, if you would prefer to keep them structurally identical,
> an alternative would be to reject non-boolean types in DEFINE with
> a clear error (using exprType() check). However, this would be
> stricter than WHERE/HAVING behavior, which might surprise users.
>
> I would suggest that allowing coercion is the right choice for
> consistency, but I am happy to go with whichever approach you
> prefer.
I have run a SELECT below after applying your patch:
SELECT id, COUNT(*) OVER w AS cnt
FROM test_coerce WHERE val
WINDOW w AS (
ORDER BY id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
PATTERN (A+)
DEFINE A AS val
)
ORDER BY id;
The point with the SELECT is, the target list lacks val column but it
has WHERE clause with a qual "val". Thus the SELECT's query tree has
TargetEntry with VAR varattno = 2 (val) and resjunk attribute is true,
and "quals" has cast expression on top of VAR varattno = 2. So the
target entry and qual has different expressions. Probably I should not
expect that the expression in target list is identical to the
expression in other places (WHERE, DEFINE).
> I would suggest that allowing coercion is the right choice for
So I Agree with you.
> On a separate topic, I have found a server crash bug in the current
> PREV/NEXT implementation. Nesting navigation functions causes the
> backend to crash:
>
> CREATE TABLE t (id int, price int);
> INSERT INTO t VALUES (1,10),(2,20),(3,15),(4,25),(5,30);
>
> SELECT * FROM t WINDOW w AS (
> ORDER BY id
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> PATTERN (A B+)
> DEFINE B AS prev(prev(b.price)) > 0
> );
> -- server process crashes
>
> The SQL standard (ISO/IEC 19075-5, Section 5.6.2) prohibits nesting
> PREV/NEXT within PREV/NEXT. The first argument of PREV/NEXT must
> contain a column reference, not another navigation function call.
> Currently there is no parser-level check for this, so the nested
> call passes through to the executor and crashes.
>
> I have attached a test file (rpr_nav_nesting_test.txt) containing
> 6 test cases that cover the prohibited nesting patterns:
>
> - PREV(PREV(price)) direct same-kind nesting
> - NEXT(NEXT(price)) direct same-kind nesting
> - PREV(NEXT(price)) cross nesting
> - NEXT(PREV(price)) cross nesting
> - PREV(ABS(PREV(price))) nesting hidden inside another function
> - NEXT(ABS(NEXT(price))) nesting hidden inside another function
>
> All of these currently crash the server. After a fix, they should
> all produce a syntax error.
I thought PREV, NEXT nesting check was there but apparently I was
wrong.
> Finally, while reviewing the standard and the current
> PREV/NEXT implementation, I have some thoughts on the
> design.
>
> The current approach uses all three ExprContext slots
> (scan/outer/inner) with varno rewriting, which fixes offsets at
> +/-1. The standard requires variable offsets like PREV(price, 3),
> and the three-slot model cannot support this.
>
> I think a simpler approach would be to use just one slot
> (ecxt_outertuple) and swap it during expression evaluation:
>
> For `price > PREV(price, 2)`:
> ^^^^ ^^^^ ^^^^^
> | | |
> v v v
> 1. [price] read from current slot --> datum_a
> 2. [PREV( ,2)] swap slot to (currentpos - 2)
> 3. [price] read from swapped slot --> datum_b
> 4. [)] restore slot
> 5. [>] evaluate datum_a > datum_b
>
> Since step 1 already extracts the value as a Datum before the
> swap in step 2, the result is safe. No varno rewriting is needed
> -- all column reads just use the one slot, and the swap/restore
> controls which row they see.
>
> One thing I noticed is that the SQL standard explicitly requires
> all column references inside a navigation function to be qualified
> by the same row pattern variable (Section 5.2):
>
> "All row pattern column references in an aggregate or row
> pattern navigation operation are qualified by the same row
> pattern variable."
>
> This means the navigation argument is always evaluated in a single
> row context -- which is exactly what makes the one-slot swap model
> work. It seems like the standard may have been designed with this
> kind of implementation in mind.
>
> With this approach, the existing varno rewriting infrastructure and
> the extra slots for PREV/NEXT would no longer be needed, which
> should simplify the code considerably.
>
> I also believe this design would scale well to future extensions.
> Variable offsets (PREV(price, N)) would work naturally, and when
> we add FIRST/LAST navigation or row pattern variable qualifiers,
> only the target row calculation would change -- the swap/restore
> mechanism itself would remain the same.
>
> What do you think about this approach?
Looks great. Do you have any idea how to let the existing ExecEvalExpr
handle the swap/restore mechanism?
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
pgsql-hackers by date: