Re: WIP Patch: Precalculate stable functions, infrastructure v1 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: WIP Patch: Precalculate stable functions, infrastructure v1
Date
Msg-id 20180124192048.rvt67r2lqpdixhhv@alap3.anarazel.de
Whole thread Raw
In response to Re: WIP Patch: Precalculate stable functions, infrastructure v1  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: WIP Patch: Precalculate stable functions, infrastructure v1
List pgsql-hackers
Hi,


On 2018-01-16 17:05:01 -0500, Tom Lane wrote:
> [ I'm sending this comment separately because I think it's an issue
> Andres might take an interest in. ]

Thanks for that. I indeed am interested. Sorry for the late response,
was very deep into the JIT patch.


> Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> > [ v7-0001-Precalculate-stable-and-immutable-functions.patch ]
> 
> Another thing that's bothering me is that the execution semantics
> you're proposing for CachedExpr seem rather inflexible.  AFAICS, once a
> CachedExpr has run once, it will hang on to the result value and keep
> returning that for the entire lifespan of the compiled expression.
> We already noted that that breaks plpgsql's "simple expression"
> logic, and it seems inevitable to me that it will be an issue for
> other places as well.  I think it'd be a better design if we had some
> provision for resetting the cached values, short of recompiling the
> expression from scratch.

Hm. Yes, that makes me uncomfortable as well.


> One way that occurs to me to do this is to replace the simple boolean
> isExecuted flags with a generation counter, and add a master generation
> counter to ExprState.  The rule for executing CachedExpr would be "if my
> generation counter is different from the ExprState's counter, then
> evaluate the subexpression and copy the ExprState's counter into mine".
> Then the procedure for forcing recalculation of cached values is just to
> increment the ExprState's counter.  There are other ways one could imagine
> doing this --- for instance, I initially thought of keeping the master
> counter in the ExprContext being used to run the expression.  But you need
> some way to remember what counter value was used last with a particular
> expression, so probably keeping it in ExprState is better.

I'm not a big fan of this solution. We seem to be inventing more and
more places we keep state, rather than the contrary.


> Or we could just brute-force it by providing a function that runs through
> a compiled expression step list and resets the isExecuted flag for each
> EEOP_CACHEDEXPR_IF_CACHED step it finds.  A slightly less brute-force
> way is to link those steps together in a list, so that the function
> doesn't have to visit irrelevant steps.  If the reset function were seldom
> used then the extra cycles for this wouldn't be very expensive.  But I'm
> not sure it will be seldom used --- it seems like plpgsql simple
> expressions will be doing this every time --- so I think the counter
> approach might be a better idea.

Hm, that sounds like it'd not be cheap.


> I'm curious to know whether Andres has some other ideas, or whether he
> feels this is all a big wart on the compiled-expression concept.

I don't have too many "artistic" concerns from the compiled expression
POV. The biggest issue I see is that it'll make it a bit harder to
separate out the expression compilation phase from the expression
instantiation phase - something I think we definitely want.


> I don't think there are any existing cases where we keep any
> meaningful state across executions of a compiled-expression data
> structure; maybe that's a bad idea in itself.

To me, who has *not* followed the thread in detail, it sounds like the
relevant data shouldn't be stored inside the expression itself.  For
one, we do not want to have to visit every single simple expression and
reset them, for another it architecturally doesn't seem the right place
to me.  Storing all cached values in an EState or ExprContext (the
latter referring to the former) somewhat alike the values for Param's
sounds a lot more reasonable to me.

Besides that it seems to make it a lot easier to reset the values, it
also seems like it makes it a lot cleaner to cache stable functions
across multiple expressions in different places in a query? ISTM having
expression steps to actually compute the expression value in every
referencing expression is quite the waste.

This all reminds me a lot of the infrastructure for Params...

Greetings,

Andres Freund


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Next
From: Bruce Momjian
Date:
Subject: Re: Would a BGW need shmem_access or database_connection toenumerate databases?