Re: WIP Patch: Precalculate stable functions, infrastructure v1 - Mailing list pgsql-hackers

From Marina Polyakova
Subject Re: WIP Patch: Precalculate stable functions, infrastructure v1
Date
Msg-id b9a484a81a59b724edb90c16288843cb@postgrespro.ru
Whole thread Raw
In response to Re: WIP Patch: Precalculate stable functions, infrastructure v1  (Marina Polyakova <m.polyakova@postgrespro.ru>)
Responses Re: WIP Patch: Precalculate stable functions, infrastructure v1
Re: WIP Patch: Precalculate stable functions, infrastructure v1
List pgsql-hackers
Hello, hackers!

This is the 8-th version of the patch for the precalculation of stable 
or immutable functions, stable or immutable operators and other 
nonvolatile expressions. This is a try to fix the most problems (I'm 
sorry, it took some time..) that Tom Lane and Andres Freund mentioned in 
[1], [2] and [3]. It is based on the top of master, and on my computer 
make check-world passes. And I'll continue work on it.

Any suggestions are welcome!

[1] 
https://www.postgresql.org/message-id/27837.1516138246%40sss.pgh.pa.us
[2] 
https://www.postgresql.org/message-id/29643.1516140301%40sss.pgh.pa.us
[3] 
https://www.postgresql.org/message-id/20180124203616.3gx4vm45hpoijpw3%40alap3.anarazel.de

> On 17-01-2018 0:30, Tom Lane wrote:
> ...
> This is indeed quite a large patch, but it seems to me it could become
> smaller. After a bit of review:
> 
> 1. I do not like what you've done with ParamListInfo. The changes 
> around
> that are invasive, accounting for a noticeable part of the patch bulk,
> and I don't think they're well designed. Having to cast back and forth
> between ParamListInfo and ParamListInfoCommon and so on is ugly and 
> prone
> to cause (or hide) errors. And I don't really understand why
> ParamListInfoPrecalculationData exists at all. Couldn't you have gotten
> the same result with far fewer notational changes by defining another
> PARAM_FLAG bit in the existing pflags field? (Or alternatively, maybe
> the real need here is for another ParamKind value for Param nodes?)
> 
> I also dislike this approach because it effectively throws away the
> support for "virtual" param arrays that I added in commit 6719b238e:
> ParamListInfoPrecalculationData has no support for dynamically 
> determined
> parameter properties, which is surely something that somebody will 
> need.
> (It's just luck that the patch doesn't break plpgsql today.) I realize
> that that's a recent commit and the code I'm complaining about predates
> it, but we need to adjust this so that it fits in with the new 
> approach.
> See comment block at lines 25ff in params.h.

Changed, now only the new flag PARAM_FLAG_PRECALCULATED

> 2. I don't follow the need for the also-rather-invasive changes to 
> domain
> constraint data structures.  I do see that the patch attempts to make
> CoerceToDomain nodes cacheable, which is flat wrong and has to be 
> ripped
> out.  You *cannot* assume that the planner has access to the same 
> domain
> constraints that will apply at runtime.

Removed

> 4. I don't like the way that you've inserted
> "replace_qual_cached_expressions" and
> "replace_pathtarget_cached_expressions" calls into seemingly random 
> places
> in the planner.  Why isn't that being done uniformly during expression
> preprocessing?  There's no apparent structure to where you've put these
> calls, and so they seem really vulnerable to errors of omission.  Also,
> if this were done in expression preprocessing, there'd be a chance of
> combining it with some existing pass over expression trees instead of
> having to do a separate (and expensive) expression tree mutation.
> I can't help suspecting that eval_const_expressions could take this on
> as an additional responsibility with a lot less than a thousand new 
> lines
> of code.

eval_const_expressions is changed accordingly and, thank you, now 
there're fewer omissions)

> 5. BTW, cost_eval_cacheable_expr seems like useless restructuring as 
> well.
> Why aren't you just recursively applying the regular costing function?

Fixed

> And what in the world is
> CheckBoundParams about?  The internal documentation in this patch
> isn't quite nonexistent, but it's well short of being in a
> committable state IMO.

This is a try to improve it..

> 3. I think you should also try hard to get rid of the need for
> PlannedStmt.hasCachedExpr.  AFAICS there's only one place that is
> using that flag, which is exec_simple_check_plan, and I have to
> think there are better ways we could deal with that.  In particular,
> I don't understand why you haven't simply set up plpgsql parameter
> references to be noncacheable.  Or maybe what we'd better do is
> disable CacheExpr insertions into potentially-simple plans in the
> first place.  As you have it here, it's possible for recompilation
> of an expression to result in a change in whether it should be deemed
> simple or not, which will break things (cf commit 00418c612).

<...>

> Another thing that's bothering me is that the execution semantics
> you're proposing for CachedExpr seem rather inflexible.  AFAICS, once a
> CachedExpr has run once, it will hang on to the result value and keep
> returning that for the entire lifespan of the compiled expression.
> We already noted that that breaks plpgsql's "simple expression"
> logic, and it seems inevitable to me that it will be an issue for
> other places as well.  I think it'd be a better design if we had some
> provision for resetting the cached values, short of recompiling the
> expression from scratch.
> 
> One way that occurs to me to do this is to replace the simple boolean
> isExecuted flags with a generation counter, and add a master generation
> counter to ExprState.  The rule for executing CachedExpr would be "if 
> my
> generation counter is different from the ExprState's counter, then
> evaluate the subexpression and copy the ExprState's counter into mine".
> Then the procedure for forcing recalculation of cached values is just 
> to
> increment the ExprState's counter.  There are other ways one could 
> imagine
> doing this --- for instance, I initially thought of keeping the master
> counter in the ExprContext being used to run the expression.  But you 
> need
> some way to remember what counter value was used last with a particular
> expression, so probably keeping it in ExprState is better.

I did something like that..

>> Keeping the stored value of a CachedExpr in a Param slot is an
>> interesting idea indeed.
> 
> We keep coming back to this, IIRC we had a pretty similar discussion
> around redesigning caseValue_datum/isNull domainValue_datum/isNull to 
> be
> less ugly. There also was
> https://www.postgresql.org/message-id/20171116182208.kcvf75nfaldv36uh@alap3.anarazel.de
> where we discussed using something similar to PARAM_EXEC Param nodes to
> allow inlining of volatile functions.
> 
> ISTM, there might be some value to consider all of them in the design 
> of
> the new mechanism.

I'm sorry, the other parts have occupied all the time, and I'll work on 
it..

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [HACKERS] why not parallel seq scan for slow functions
Next
From: Michael Paquier
Date:
Subject: Re: [HACKERS] generated columns