Re: Row pattern recognition - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Row pattern recognition
Date
Msg-id 20260314.164003.2119775273148039288.ishii@postgresql.org
Whole thread Raw
In response to Re: Row pattern recognition  (Henson Choi <assam258@gmail.com>)
List pgsql-hackers
Hi Henson,

> Hi Tatsuo,
> 
> Thank you for your thoughtful feedback on the tle identity concern.
> 
> Ok. we should be consistent with WHERE etc. unless the standard
>> requres DEFINE stricter. But if the tle is in the target list
>> (possibly with resjunk attribute), it could be different from the one
>> in the DEFINE because of the casting by coerce_to_boolean(). i.e.:
>>
>> - tle in the target list has no cast node
>> - tle in the DEFINE may have cast node
>>
>> I don't know how this could affect subsequent RPR process but I would
>> like to keep the prerequisites that tle in DEINE and the traget list
>> is identical.
>>
> 
>> is identical.
> 
> That is a very good point regarding tle identity.
> 
> On the implementation side, the DEFINE clause TargetEntries are
> already deep-copied via copyObject in the parser (parse_rpr.c),
> so adding a cast node to the DEFINE tle does not affect the
> target list tle -- they are independent copies at that point.
> 
> On the semantic side, the two serve different purposes: the target
> list produces output columns (where the user expects the original
> type), while the DEFINE expression is evaluated as a boolean for
> pattern matching. This is the same pattern as WHERE, where the
> WHERE expression may carry a cast node but the target list for the
> same column does not.
> 
> That said, if you would prefer to keep them structurally identical,
> an alternative would be to reject non-boolean types in DEFINE with
> a clear error (using exprType() check). However, this would be
> stricter than WHERE/HAVING behavior, which might surprise users.
> 
> I would suggest that allowing coercion is the right choice for
> consistency, but I am happy to go with whichever approach you
> prefer.

I have run a SELECT below after applying your patch:

SELECT id, COUNT(*) OVER w AS cnt
FROM test_coerce WHERE val
WINDOW w AS (
    ORDER BY id
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    PATTERN (A+)
    DEFINE A AS val
)
ORDER BY id;

The point with the SELECT is, the target list lacks val column but it
has WHERE clause with a qual "val".  Thus the SELECT's query tree has
TargetEntry with VAR varattno = 2 (val) and resjunk attribute is true,
and "quals" has cast expression on top of VAR varattno = 2. So the
target entry and qual has different expressions. Probably I should not
expect that the expression in target list is identical to the
expression in other places (WHERE, DEFINE).

> I would suggest that allowing coercion is the right choice for

So I Agree with you.

> On a separate topic, I have found a server crash bug in the current
> PREV/NEXT implementation. Nesting navigation functions causes the
> backend to crash:
> 
>     CREATE TABLE t (id int, price int);
>     INSERT INTO t VALUES (1,10),(2,20),(3,15),(4,25),(5,30);
> 
>     SELECT * FROM t WINDOW w AS (
>         ORDER BY id
>         ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>         PATTERN (A B+)
>         DEFINE B AS prev(prev(b.price)) > 0
>     );
>     -- server process crashes
> 
> The SQL standard (ISO/IEC 19075-5, Section 5.6.2) prohibits nesting
> PREV/NEXT within PREV/NEXT. The first argument of PREV/NEXT must
> contain a column reference, not another navigation function call.
> Currently there is no parser-level check for this, so the nested
> call passes through to the executor and crashes.
> 
> I have attached a test file (rpr_nav_nesting_test.txt) containing
> 6 test cases that cover the prohibited nesting patterns:
> 
>   - PREV(PREV(price))        direct same-kind nesting
>   - NEXT(NEXT(price))        direct same-kind nesting
>   - PREV(NEXT(price))        cross nesting
>   - NEXT(PREV(price))        cross nesting
>   - PREV(ABS(PREV(price)))   nesting hidden inside another function
>   - NEXT(ABS(NEXT(price)))   nesting hidden inside another function
> 
> All of these currently crash the server. After a fix, they should
> all produce a syntax error.

I thought PREV, NEXT nesting check was there but apparently I was
wrong.

> Finally, while reviewing the standard and the current
> PREV/NEXT implementation, I have some thoughts on the
> design.
> 
> The current approach uses all three ExprContext slots
> (scan/outer/inner) with varno rewriting, which fixes offsets at
> +/-1. The standard requires variable offsets like PREV(price, 3),
> and the three-slot model cannot support this.
> 
> I think a simpler approach would be to use just one slot
> (ecxt_outertuple) and swap it during expression evaluation:
> 
>   For `price > PREV(price, 2)`:
>         ^^^^    ^^^^  ^^^^^
>          |       |      |
>          v       v      v
>     1. [price]       read from current slot   --> datum_a
>     2. [PREV( ,2)]   swap slot to (currentpos - 2)
>     3.   [price]     read from swapped slot   --> datum_b
>     4. [)]           restore slot
>     5. [>]           evaluate datum_a > datum_b
> 
> Since step 1 already extracts the value as a Datum before the
> swap in step 2, the result is safe. No varno rewriting is needed
> -- all column reads just use the one slot, and the swap/restore
> controls which row they see.
> 
> One thing I noticed is that the SQL standard explicitly requires
> all column references inside a navigation function to be qualified
> by the same row pattern variable (Section 5.2):
> 
>   "All row pattern column references in an aggregate or row
>    pattern navigation operation are qualified by the same row
>    pattern variable."
> 
> This means the navigation argument is always evaluated in a single
> row context -- which is exactly what makes the one-slot swap model
> work. It seems like the standard may have been designed with this
> kind of implementation in mind.
> 
> With this approach, the existing varno rewriting infrastructure and
> the extra slots for PREV/NEXT would no longer be needed, which
> should simplify the code considerably.
> 
> I also believe this design would scale well to future extensions.
> Variable offsets (PREV(price, N)) would work naturally, and when
> we add FIRST/LAST navigation or row pattern variable qualifiers,
> only the target row calculation would change -- the swap/restore
> mechanism itself would remain the same.
> 
> What do you think about this approach?

Looks great. Do you have any idea how to let the existing ExecEvalExpr
handle the swap/restore mechanism?

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



pgsql-hackers by date:

Previous
From: Alexander Lakhin
Date:
Subject: Re: Instability of phycodorus in pg_upgrade tests with JIT
Next
From: Álvaro Herrera
Date:
Subject: Re: some more include removal from headers