Hi,
Currently prepared statements store the whole query string that was submitted
by the client at the time of the PREPARE as-is. This is usually fine, but if
that query was a multi-statement query string it can lead to a waste of memory.
There are some pattern that are more likely to have such overhead, mine being
an application with a fixed set of prepared statements that are sent at the
connection start using a single query to avoid extra round trips.
One naive example of the outcome is as follow:
#= PREPARE s1 AS SELECT 1\; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE
relname = $1\; PREPARE s3(int, int) AS SELECT $1 + $2;
PREPARE
PREPARE
PREPARE
=# SELECT name, statement FROM pg_prepared_statements ;
name | statement
------+----------------------------------------------------------------------------
s1 | PREPARE s1 AS SELECT 1; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+
| relname = $1; PREPARE s3(int, int) AS SELECT $1 + $2;
s2 | PREPARE s1 AS SELECT 1; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+
| relname = $1; PREPARE s3(int, int) AS SELECT $1 + $2;
s3 | PREPARE s1 AS SELECT 1; PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+
| relname = $1; PREPARE s3(int, int) AS SELECT $1 + $2;
(3 rows)
The more prepared statements you have the bigger the waste. This is also not
particularly readable for people who want to rely on the pg_prepared_statements
views, as you need to parse the query again yourself to figure out what exactly
is the associated query.
I assume that some other patterns could lead to other kind of problems. For
instance if the query string includes a prepared statement and some DML, it
could lead some automated program to replay both the PREPARE and DML when only
the PREPARE was intended.
I'm attaching a POC patch to fix that behavior by teaching PREPARE to clean the
passed query text the same way as pg_stat_statements. Since it relies on the
location saved during parsing the overhead should be minimal, and only present
when some space can actually be saved. Note that I first tried to have the
cleanup done in CreateCachedPlan so that it's done everywhere including things
like the extended protocol but this lead to too many issues so I ended up doing
it for an explicit PREPARE statement only.
With this patch applied, the above scenario gives this new output:
=# SELECT name, statement FROM pg_prepared_statements ;
name | statement
------+----------------------------------------------------
s1 | PREPARE s1 AS SELECT 1
s2 | PREPARE s2(text) AS SELECT oid FROM pg_class WHERE+
| relname = $1
s3 | PREPARE s3(int, int) AS SELECT $1 + $2
(3 rows)
One possible issue is that any comment present at the beginning of the query
text would be discarded. I'm not sure if that's something used by e.g.
pg_hint_plan, but if yes it's always possible to put the statement in front of
the SELECT (or other actual first keyword) rather than the PREPARE itself to
preserve it.