Re: Row pattern recognition - Mailing list pgsql-hackers

From Henson Choi
Subject Re: Row pattern recognition
Date
Msg-id CAAAe_zCwY4Lt_OS44DMuqua-szeDVVTgJDeH=c-d5Qco5v2sOQ@mail.gmail.com
Whole thread
In response to Re: Row pattern recognition  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
Hi Tatsuo,

Thanks for the close look.  Inline below, then a sketch of what I
plan to put in v48 so you can pick what to pull first.

v47 starts to check whether range variable qualified expressions are
used in DEFINE clause. If used, raise an error. This is good because
we don't support the syntax yet (pattern variable range var), or it's
prohibited (from clause range var). However, the error message may not
be appropreate for the case when complex data type is involved.

CREATE TEMP TABLE item (name TEXT, amount INT);
CREATE TABLE
CREATE TEMP TABLE sales(items item);
CREATE TABLE
SELECT (items).name, (items).amount, count(*) OVER w
FROM sales WINDOW w AS (
    ORDER BY (items).name
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (A+)
    DEFINE A AS (A.items).amount > 10
);
ERROR:  pattern variable qualified column reference "a.items" is not supported in DEFINE clause
LINE 7:     DEFINE A AS (A.items).amount > 10
                         ^
If I change the DEFINE clause to:

    DEFINE A AS (sales.items).amount > 10

I get:

ERROR:  range variable qualified column reference "sales.items" is not allowed in DEFINE clause
LINE 8:     DEFINE A AS (sales.items).amount > 10
                         ^

In both cases, "a.items" or "sales.items" in the error messages are
not column names, therefore the wording "column reference" in the
error messages are not appropriate.

Agreed.  Even for the simple non-composite cases the noun is fuzzy
once A_Indirection enters the picture.  "expression" reads naturally
in both cases.
 
In order to fix the issue, I think we need to add code to understand
range var qualification form "A.item" or "sales.items" in complex data
types case. But is it worth the trouble? The reword is just nicer
error messages. If we support MEASURES, the code is useful. But so far
we have decided to not support it in the first cut of RPR.

Not now.  Teaching the pre-check about the surrounding A_Indirection
duplicates work that MEASURES will need anyway, and the only visible
gain before MEASURES is a more accurate echo of the source text.

Instead I'll mark the limitation in three places so future MEASURES
work can't miss it: a block comment above the pre-check that names
the limitation and the revisit point, an XXX cross-reference in
parse_rpr.c pointing back to the pre-check, and composite-type cases
in rpr_base whose expected output quotes only the ColumnRef portion
-- so when indirection-aware quoting lands, those outputs churn as
a tripwire. 
 
Maybe we should just change the error messages something like below
for now?

ERROR:  pattern variable qualified expression "a.items" is not supported in DEFINE clause
ERROR:  range variable qualified expression "sales.items" is not allowed in DEFINE clause

Yes -- planned for v48.  While I'm in that path I'll also clean up
two adjacent issues:

  - the pre-check is binary, so an unknown qualifier (typo,
    undefined name) is misreported as a range-variable reference
    with SYNTAX_ERROR instead of falling through to the standard
    "column does not exist" / "missing FROM-clause entry"
    diagnostic;

  - parse_rpr.c and the SELECT docs claim DEFINE-name "collection"
    and "filtering during planning"; the actual behavior is
    validate-and-reject at parse analysis.


------------------------------------------------------------------
v48 follow-up plan
------------------------------------------------------------------

The items below are independent, so they can ship as separate
patches or as a single batched posting -- whichever you prefer.
Order is just for narrative; nothing depends on anything else.

  [A] Sizable refactor: collapse the four DEFINE walkers across
      parser/planner/executor into a single phase-tagged
      traversal.  On that base, reject volatile / NextValueExpr
      in DEFINE (the NFA may re-evaluate predicates during
      backtracking; STABLE / IMMUTABLE remain accepted), and
      bundle a STABLE/IMMUTABLE baseline test as a guard against
      accidental over-rejection.

  [B] Make the empty-match path observable in tests: replace the
      stale XXX comments in rpr_nfa with the actual behavior, and
      add EXPLAIN ANALYZE coverage in rpr_explain that surfaces
      "NFA: N matched (len 0/0/0.0)" so the NFA-found-but-window-
      empty case is regression-visible.

  [C] Trim the per-advance NFA visited-bitmap reset to a high-water
      mark range instead of the full bitmap.  Tradeoff: two int16
      comparisons added per visit, paying off for larger NFAs but
      added overhead for single-word bitmaps; semantics unchanged.
      I'll leave the decision to apply to your judgment.

  [D] DEFINE qualifier diagnostic: tri-classify (pattern var /
      range var / fall-through), reword to "expression", add
      unknown-qualifier and composite-type tests, and sync the
      adjacent stale comments and SELECT doc.

If any of [A]..[D] looks misjudged or you'd prefer a different
slicing, I'll reshape before posting.

Best,
Henson 

pgsql-hackers by date:

Previous
From: shveta malik
Date:
Subject: Re: Proposal: Conflict log history table for Logical Replication
Next
From: Amit Kapila
Date:
Subject: Re: Include schema-qualified names in publication error messages.