Summary: State of Caching Stable Subexpressions - Mailing list pgsql-hackers
From | James Coleman |
---|---|
Subject | Summary: State of Caching Stable Subexpressions |
Date | |
Msg-id | CAAaqYe9SnzGeXDvzcRNTcWPYbxShht-iW0v5i5x-7BhzAGK3mw@mail.gmail.com Whole thread Raw |
List | pgsql-hackers |
I've recently been thinking about some optimizations to scalar array op expression evaluation [1], and Tom mentioned that I might want to consider looking into previous efforts into caching stable subexpressions as a component of that (particularly if I wanted it to be useful for more than constant arrays. I read through the many threads over the years on this topic, and I thought it'd be worth sending a summary email -- both as a record of the current state of things (for either my or someone else's reference in getting this effort going again) and possibly as a way to generate interest in the subject. The general idea is that non-volatile expressions in a query ought to be able to be calculated once (or once per param change) and reused each time through the scan loop. A somewhat related idea (but I don't believe to be necessary for a version 1) would be to subsequently reduce multiple usages of the same expression within a query to a single evaluation [again, possibly per param change]. This could potentially speed up a wide range of queries, potentially benefiting everything from (executing repeatedly for each tuple) a comparison to an expensive function to detoasting to casting each member of a subquery to internal preprocessing of a value for to allow an optimization in expression evaluation. The first thread on $SUBJECT I'm aware of (courtesy of Tom Lane) was in the 2011-2102 timeframe: [2] "[WIP] Caching for stable expressions with constant arguments v2". This message had a patch attached, but had no replies. Following on the heels of that (and by the same author) we have [3] "Caching for stable expressions with constant arguments v6". There was some decent discussion here, but ultimately the author was unable to continue working on it. In 2017 we have [4] "WIP Patch: Precalculate stable functions" which noted the value for full text search expressions like `WHERE body_tsvector @@ to_tsquery('postgres');`. After a suggestion by Tom to look at the aforementioned thread from 2012, this patch re-emerged in [5] "WIP Patch: Precalculate stable functions, infrastructure v1". From what I can tell this effort advanced the state of this project fairly significantly, and moved to implementing the caching as a PARAM_EXEC param after suggestions from Tom and Andres. This thread also died out, however, but is probably a pretty good starting point for future work and discussion. In 2017 we also have a note in [6] that this effort might also be useful in "Re: Inlining functions with 'expensive' parameters" (specifically for PostGIS in this case). Essentially, if we inline function calls, then we have to worry about cost because we might execute it more than once, but that can be fixed by being able to use one evaluation to back multiple usages in the query. In early 2019 Tom mentioned in [7] that this infrastructure would also likely resolve performance issue Tomas Vondra had noted in "Re: overhead due to casting extra parameters with aggregates (over and over)" . Essentially a subquery returning a large number of numeric values was being implicitly casted (repeatedly) in the main query. Adding an explicit cast in the subquery resolved the issue, but seemed like a pretty significant (and perceptually unnecessary) gotcha. I'm hoping collating this all in one place is helpful; at the very least it will be helpful to me as a reference should I find the time to push this forward some more. James [1]: https://www.postgresql.org/message-id/flat/CAAaqYe-UQBba7sScrucDOyHb7cDoNbWf_rcLrOWeD4ikP3_qTQ%40mail.gmail.com [2]: https://www.postgresql.org/message-id/flat/CABRT9RBdRFS8sQNsJHxZOhC0tJe1x2jnomiz%3DFOhFkS07yRwQA%40mail.gmail.com [3]: https://www.postgresql.org/message-id/flat/CABRT9RA-RomVS-yzQ2wUtZ=m-eV61LcbrL1P1J3jydPStTfc6Q@mail.gmail.com [4]: https://www.postgresql.org/message-id/flat/ba261b9fc25dea4069d8ba9a8fcadf35%40postgrespro.ru [5]: https://www.postgresql.org/message-id/flat/da87bb6a014e029176a04f6e50033cfb%40postgrespro.ru [6]: https://www.postgresql.org/message-id/flat/6480.1510861492%40sss.pgh.pa.us#c296736e96a3ea7a61dc1dd88f1891bc [7]: https://www.postgresql.org/message-id/flat/10046.1569257616%40sss.pgh.pa.us#569f0f9f20be8212201b1df6cdb22ee0
pgsql-hackers by date: