On Fri, Feb 24, 2023 at 08:54:00PM +0000, Imseih (AWS), Sami wrote:
> I think the only thing to do here is to call this out in docs with a
> suggestion to increase pg_stat_statements.max to reduce the
> likelihood. I also attached the suggested  doc enhancement as well.
Improving the docs about that sounds like a good idea.  This would be
less surprising for users, if we had some details about that.
> Any thoughts?
The risk of deallocation of an entry between the post-analyze hook and
the planner/utility hook represented with two calls of pgss_store()
means that this will never be able to work correctly as long as we
don't know the shape of the normalized query in the second code path
(planner, utility execution) updating the entries with the call
information, etc.  And we only want to pay the cost of normalization
once, after the post-analyze where we do the query jumbling.
Could things be done in a more stable way?  For example, imagine that
we have an extra Query field called void *private_data that extensions
can use to store custom data associated to a query ID, then we could
do something like that:
- In the post-analyze hook, check if an entry with the query ID
calculated exists.
-- If the entry exists, grab a copy of the existing query string,
which may be normalized or not, and save it into Query->private_data.
-- If the entry does not exist, normalize the query, store it in
Query->private_data but do not yet create an entry in the hash table.
- In the planner/utility hook, fetch the normalized query from
private_data, then use it if an entry needs to be created in the hash
table.  The entry may have been deallocated since the post-analyze
hook, in which case it is re-created with the normalized copy saved in
the first phase.
--
Michael