Re: The flinfo->fn_extra question, from me this time. - Mailing list pgsql-hackers

From Dent John
Subject Re: The flinfo->fn_extra question, from me this time.
Date
Msg-id DB853715-9906-4E11-A183-E0DEE4840751@QQdd.eu
Whole thread Raw
In response to Re: The flinfo->fn_extra question, from me this time.  (Dent John <denty@QQdd.eu>)
Responses Re: The flinfo->fn_extra question, from me this time.  (Dent John <denty@QQdd.eu>)
List pgsql-hackers
Hi,

It turns out, to my embarrassment, that pretty much all of the regression tests failed with my patch. No idea if anyone spotted that and withheld a reply in revenge, but I wouldn't blame you if you did!

I have spent a bit more time on it. The attached patch is in much better shape, though there are still a few regressions and undoubtedly it's still rough.

(Attached patch is against 12.0.)

As was perhaps predictable, some of the regression tests do indeed break in the case of rescan. The specific class of failure looks like this:

SELECT * FROM (VALUES (1),(2),(3)) v(r), ROWS FROM( rngfunc_sql(11,11), rngfunc_mat(10+r,13) );
  r | i  | s | i  | s 
 ---+----+---+----+---
  1 | 11 | 1 | 11 | 1
  1 |    |   | 12 | 2
  1 |    |   | 13 | 3
- 2 | 11 | 1 | 12 | 4
+ 2 | 11 | 2 | 12 | 4
  2 |    |   | 13 | 5
- 3 | 11 | 1 | 13 | 6
+ 3 | 11 | 3 | 13 | 6
 (6 rows)

The reason for the change is that 's' comes from rngfunc_mat(), which computes s as nextval(). The patch currently prefers to re-execute the function on rescan, rather than materialising it into a tuplestore.
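To make the failure mode concrete, here is a minimal Python sketch (not PostgreSQL code; the function and counter names are invented for illustration) of why re-executing a volatile function on rescan diverges from replaying a materialised tuplestore:

```python
import itertools

counter = itertools.count(1)  # stands in for a sequence driving nextval()

def rngfunc_mat(lo, hi):
    # Each row pairs i with a fresh counter value, so re-executing
    # the function yields different 's' values on every call.
    return [(i, next(counter)) for i in range(lo, hi + 1)]

# Materialised behaviour: execute once, replay the stored rows on rescan.
stored = rngfunc_mat(11, 13)
materialised = [stored, stored]          # both scans see identical rows

# Re-execution behaviour: call the function again for each rescan,
# advancing the counter and changing 's'.
counter = itertools.count(1)
reexecuted = [rngfunc_mat(11, 13) for _ in range(2)]

print(materialised[1][0])  # (11, 1) -- same as the first scan
print(reexecuted[1][0])    # (11, 4) -- counter advanced between scans
```

This is exactly the difference the regression diff above shows in the 's' column.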

Tom suggested not dropping the tuplestore creation logic, and I can't fathom a way of avoiding a behaviour change for folk who have gotten used to the current semantics without doing that. So I'm tempted to pipeline the rows back from a function (if it returns ValuePerCall), while also recording them in a tuplestore, just in case a rescan happens. There's still wastage in this approach, but it would preserve the current behaviour while still enabling the early abort of ValuePerCall SRFs at relatively low cost, which is certainly one of my goals.
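A rough Python sketch of that "pipeline plus record" idea (again illustrative only; class and function names are invented, and a partially completed first scan is not handled here):

```python
class PipelinedCache:
    """Pipeline rows from a row source while recording them, so that a
    rescan replays the recorded rows instead of re-executing the source.
    A loose analogue of ValuePerCall backed by a tuplestore."""

    def __init__(self, row_source):
        self._source = row_source   # callable returning an iterator of rows
        self._store = None          # filled after the first complete scan

    def scan(self):
        if self._store is not None:
            # Rescan: replay from the "tuplestore"; source not re-executed.
            yield from self._store
            return
        store = []
        for row in self._source():
            store.append(row)       # record for a possible rescan...
            yield row               # ...but hand the row up immediately
        self._store = store

calls = []
def volatile_rows():
    calls.append(1)                 # count how often the source runs
    return iter([(11, len(calls)), (12, len(calls) + 1)])

node = PipelinedCache(volatile_rows)
first = list(node.scan())
second = list(node.scan())          # replayed; function executed only once
print(first == second, len(calls))  # True 1
```

The cost is that every pipelined row is also copied into the store, even when no rescan ever happens, which is the wastage mentioned above.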

I'd welcome opinions on whether there are downsides to that approach, as I might move to integrate it next.

But I would also like to kick around ideas for how to avoid the tuplestore entirely.

Earlier, I suggested that we might make the decision logic prefer to materialise a tuplestore for VOLATILE functions, and prefer to pipeline directly from STABLE (and IMMUTABLE) functions. But the docs on volatility categories say that the optimiser will re-evaluate a VOLATILE function at every row where its value is needed, whereas it may cache STABLE and IMMUTABLE results more aggressively. That's basically the polar opposite of what I want to achieve.

It is arguably also in conflict with current behaviour. I think we should make the docs clearer about that.

So, on second thoughts, I don’t think overloading the meaning of STABLE, et al., is the right thing to do. I wonder if we could invent a new modifier to CREATE FUNCTION, perhaps “PIPELINED”, which would simply declare a function's ability and preference for ValuePerCall mode.

Or perhaps modify the ROWS FROM extension, and adopt WITH’s [ NOT ] MATERIALIZED clause. For example, the following would achieve the above proposed behaviour:

ROWS FROM( rngfunc_sql(11,11) MATERIALIZED, rngfunc_mat(10+r,13) MATERIALIZED ) 

Of course, NOT MATERIALIZED would achieve ValuePerCall mode, and omit materialisation. I guess MATERIALIZED would have to be the default.

I wonder if another alternative would be to decide materialization based on what the outer plan includes. I guess we can tell if we’re part of a join, or if the plan requires the ability to scan backwards. Could that work?

denty.
