Thread: Prepared statements plan_cache_mode considerations

Prepared statements plan_cache_mode considerations

From: Zain Kabani
Hi team,

I was looking into using prepared statements with the auto plan_cache_mode. The issue is that the current sample of data collected to determine whether to use custom plans or the generic one is very small and susceptible to a bad set of queries that picks the suboptimal choice. It's currently hard-coded as 5 custom plans [https://github.com/postgres/postgres/blob/master/src/backend/utils/cache/plancache.c#L1051], but I'd like to see this become a configurable parameter if possible. Failing that, I'd love to hear any recommendations on how to deal with this.
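
To make the behavior concrete, here is roughly how I've been observing the switch on a scratch database (the table and statement names are made up for illustration); with plan_cache_mode = auto the first five executions get custom plans, and the sixth is where the generic plan can take over:

CREATE TABLE t (id bigint PRIMARY KEY, status text);
CREATE INDEX ON t (status);

PREPARE q(text) AS SELECT count(*) FROM t WHERE status = $1;

-- EXPLAIN EXECUTE goes through the same plan cache as a normal execution.
-- The first five runs show the literal 'active' in the plan (custom plans);
-- from the sixth run onward the filter may show $1 instead, meaning the
-- generic plan was adopted because its estimated cost compared favorably.
EXPLAIN EXECUTE q('active');
EXPLAIN EXECUTE q('active');
EXPLAIN EXECUTE q('active');
EXPLAIN EXECUTE q('active');
EXPLAIN EXECUTE q('active');
EXPLAIN EXECUTE q('active');  -- generic plan is considered starting here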

I'd want PG to pick accurately between the custom-plan and generic-plan strategies, so that we could safely realize the benefits that cached plans can give us.
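
For reference, the only control I'm aware of today (PostgreSQL 12 and later) is choosing the strategy wholesale, per session, role, or database; the role name below is just an example:

SET plan_cache_mode = 'auto';                -- default: custom plans first, then compare against the generic plan
SET plan_cache_mode = 'force_custom_plan';   -- always replan with the actual parameter values
SET plan_cache_mode = 'force_generic_plan';  -- always use the generic plan

-- pin a strategy for one application role
ALTER ROLE app_rw SET plan_cache_mode = 'force_custom_plan';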

I'd also be curious to hear about people's experience with this, and whether moving to prepared statements has resulted in latency regressions in production due to a bad strategy being chosen.
I'm a contributor to the PgCat [https://github.com/postgresml/pgcat] project and recently added prepared statement support there, so I'm looking to understand the space a little better, as we would be migrating services that previously did not use prepared statements over to them.

--
Thanks,
Zain Kabani

Re: Prepared statements plan_cache_mode considerations

From: "David G. Johnston"
On Thu, Oct 26, 2023 at 10:56 AM Zain Kabani <zain.kabani@instacart.com> wrote:
I was looking into using prepared statements with the auto plan_cache_mode. The issue is that the current sample of data collected to determine whether to use custom plans or the generic one is very small and susceptible to a bad set of queries that picks the suboptimal choice.

It seems unreasonable for the live instance of PostgreSQL to figure this out on the fly. A more useful approach would seem to be: use "always custom" for inexpensive plans over simple queries until you hit a point of pain. Then test whether changing to "always generic" results in a better outcome given the specific situation you encountered.
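
Something like the following sketch is what I have in mind (the table, statement, and parameter value are made up): run the same prepared statement under both forced modes and compare.

PREPARE q(text) AS SELECT count(*) FROM orders WHERE status = $1;

SET plan_cache_mode = 'force_custom_plan';
EXPLAIN (ANALYZE, BUFFERS) EXECUTE q('rare_status');   -- replanned for this specific value

SET plan_cache_mode = 'force_generic_plan';
EXPLAIN (ANALYZE, BUFFERS) EXECUTE q('rare_status');   -- the one-size-fits-all plan

Whichever mode consistently wins for the parameter mix you actually see is the one worth pinning for that workload.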

The choice of 5 is basically a trade-off between people who use prepared statements solely for their SQL injection protection benefits without actually reusing them, and never getting to the point of switching to a generic plan because the amount of time it takes to be confident (to get a statistically significant sample) is too far out. How would you go about deciding what an appropriate value for this setting would be if you did have control of it?

If anything, removing the arbitrary number and basing the decision on, say, whether or not the incoming parameter is within the MCV array would seem to be a way to expend a bit more effort on a cheap lookup that likely saves the need for a custom plan. I would think some plans, say ones related to PK lookups, would simply be assumed better off done with a generic plan involving an index scan (maybe based upon a table size check) derived from the initial custom plan.
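
By the MCV array I mean the planner's per-column list of most common values, which you can inspect in pg_stats (the table and column names here are only illustrative):

SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'orders'
  AND attname    = 'status';

Whether the incoming value shows up there, and how frequent it is, is cheap to check and says more about whether a custom plan is worth building than a fixed counter does.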

David J.