Summary: State of Caching Stable Subexpressions - Mailing list pgsql-hackers

From James Coleman
Subject Summary: State of Caching Stable Subexpressions
Date
Msg-id CAAaqYe9SnzGeXDvzcRNTcWPYbxShht-iW0v5i5x-7BhzAGK3mw@mail.gmail.com
Whole thread Raw
List pgsql-hackers
I've recently been thinking about some optimizations to scalar array
op expression evaluation [1], and Tom mentioned that I might want to
consider looking into previous efforts into caching stable
subexpressions as a component of that (particularly if I wanted it to
be useful for more than constant arrays.

I read through the many threads over the years on this topic, and I
thought it'd be worth sending a summary email -- both as a record of
the current state of things (for either my or someone else's reference
in getting this effort going again) and possibly as a way to generate
interest in the subject.

The general idea is that non-volatile expressions in a query ought to
be able to be calculated once (or once per param change) and reused
each time through the scan loop. A somewhat related idea (but I don't
believe to be necessary for a version 1) would be to subsequently
reduce multiple usages of the same expression within a query to a
single evaluation [again, possibly per param change].

This could potentially speed up a wide range of queries, potentially
benefiting everything from (executing repeatedly for each tuple) a
comparison to an expensive function to detoasting to casting each
member of a subquery to internal preprocessing of a value for to allow
an optimization in expression evaluation.

The first thread on $SUBJECT I'm aware of (courtesy of Tom Lane) was
in the 2011-2102 timeframe: [2] "[WIP] Caching for stable expressions
with constant arguments v2". This message had a patch attached, but
had no replies. Following on the heels of that (and by the same
author) we have [3] "Caching for stable expressions with constant
arguments v6". There was some decent discussion here, but ultimately
the author was unable to continue working on it.

In 2017 we have [4] "WIP Patch: Precalculate stable functions" which
noted the value for full text search expressions like `WHERE
body_tsvector @@ to_tsquery('postgres');`. After a suggestion by Tom
to look at the aforementioned thread from 2012, this patch re-emerged
in [5] "WIP Patch: Precalculate stable functions, infrastructure v1".
From what I can tell this effort advanced the state of this project
fairly significantly, and moved to implementing the caching as a
PARAM_EXEC param after suggestions from Tom and Andres. This thread
also died out, however, but is probably a pretty good starting point
for future work and discussion.

In 2017 we also have a note in [6] that this effort might also be
useful in "Re: Inlining functions with 'expensive' parameters"
(specifically for PostGIS in this case). Essentially, if we inline
function calls, then we have to worry about cost because we might
execute it more than once, but that can be fixed by being able to use
one evaluation to back multiple usages in the query.

In early 2019 Tom mentioned in [7] that this infrastructure would also
likely resolve performance issue Tomas Vondra had noted in "Re:
overhead due to casting extra parameters with aggregates (over and
over)" . Essentially a subquery returning a large number of numeric
values was being implicitly casted (repeatedly) in the main query.
Adding an explicit cast in the subquery resolved the issue, but seemed
like a pretty significant (and perceptually unnecessary) gotcha.

I'm hoping collating this all in one place is helpful; at the very
least it will be helpful to me as a reference should I find the time
to push this forward some more.

James

[1]: https://www.postgresql.org/message-id/flat/CAAaqYe-UQBba7sScrucDOyHb7cDoNbWf_rcLrOWeD4ikP3_qTQ%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/flat/CABRT9RBdRFS8sQNsJHxZOhC0tJe1x2jnomiz%3DFOhFkS07yRwQA%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/flat/CABRT9RA-RomVS-yzQ2wUtZ=m-eV61LcbrL1P1J3jydPStTfc6Q@mail.gmail.com
[4]: https://www.postgresql.org/message-id/flat/ba261b9fc25dea4069d8ba9a8fcadf35%40postgrespro.ru
[5]: https://www.postgresql.org/message-id/flat/da87bb6a014e029176a04f6e50033cfb%40postgrespro.ru
[6]: https://www.postgresql.org/message-id/flat/6480.1510861492%40sss.pgh.pa.us#c296736e96a3ea7a61dc1dd88f1891bc
[7]: https://www.postgresql.org/message-id/flat/10046.1569257616%40sss.pgh.pa.us#569f0f9f20be8212201b1df6cdb22ee0



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Poll: are people okay with function/operator table redesign?
Next
From: James Coleman
Date:
Subject: Re: execExprInterp() questions / How to improve scalar array op expr eval?