Re: Row pattern recognition - Mailing list pgsql-hackers

From Henson Choi
Subject Re: Row pattern recognition
Date
Msg-id CAAAe_zDaaOH4kaNfeN3Gk9u8VyTpUm-UgFebKc1KKB8o+PSjPg@mail.gmail.com
Whole thread Raw
In response to Re: Row pattern recognition  (Tatsuo Ishii <ishii@postgresql.org>)
Responses Re: Row pattern recognition
List pgsql-hackers
Hi Tatsuo,

Currently we do not account the cost of RPR while planning.  Attached
is the first attempt to try to estimate the RPR costs. The cost model
is very simple:

expression cost per PATTERN variable * number of input tuples

Any idea to make this estimation better?

 >   foreach(lc, windowFuncs)
>   {
>   ...
> + /* also add DEFINE clause expressions' cost to per-input-row costs */
> + if (winclause->rpPattern)
> + {
> + List   *pattern_vars; /* list of pattern variable names */
> + ListCell   *lc2;
> +
> + pattern_vars = collectPatternVariables(winclause->rpPattern);
> +
> + /* iterate according to the pattern variable */
> + foreach(lc2, pattern_vars)
> + {
> + char   *ptname = strVal((char *) lfirst(lc2));

`collectPatternVariables` returns a list of String nodes (via
`makeString()`), so `strVal(lfirst(lc2))` is the idiomatic form.
The `(char *)` cast is misleading.

There is also a correctness issue: DEFINE expressions belong to the
window clause, not to individual window functions, so their cost
should not be multiplied by the number of window functions sharing
the clause.

The fix is to compute the DEFINE cost once outside the loop and add
it to `startup_cost` and `total_cost` directly, after the
`foreach(lc, windowFuncs)` block.

Regarding the cost model: the NFA executor evaluates all DEFINE
expressions once per row into a shared `nfaVarMatched[]` array that
all active contexts read from, and contexts advance strictly forward
so no prior row is ever re-evaluated.  The one-evaluation-per-row
cost model is therefore accurate.  NFA-aware cost
modeling could be built on top of this foundation in a separate patch
down the road, once the NFA implementation has matured.

For now, the DEFINE expression costs themselves already serve as a
natural penalty — a window clause with RPR will consistently appear
more expensive than a comparable plain window function.  This gives
the surrounding plan a reasonable cost signal for decisions such as
join ordering and materialization of RPR subqueries.  So the current
approach is reasonable as a first step.

Other than that, the approach looks good to me.  Would it be okay if
I revise the patch along those lines?

Best regards,
Henson

pgsql-hackers by date:

Previous
From: Andrey Borodin
Date:
Subject: Re: amcheck: add index-all-keys-match verification for B-Tree
Next
From: Nisha Moond
Date:
Subject: Re: Skipping schema changes in publication