Hi hackers,
As Greg Stark noted elsewhere [0], it is presently very difficult to
identify the PID of the session using a temporary schema, which is
particularly unfortunate when a temporary table is putting a cluster in
danger of transaction ID wraparound. I noted [1] that the following query
can be used to identify the PID for a given backend ID:
SELECT bid, pg_stat_get_backend_pid(bid) AS pid FROM pg_stat_get_backend_idset() bid;
But on closer inspection, this is just plain wrong. The backend IDs
returned by pg_stat_get_backend_idset() might initially bear some
resemblance to the backend IDs stored in PGPROC, so my suggested query
might work some of the time, but the pg_stat_get_backend_* backend IDs
typically diverge from the PGPROC backend IDs as sessions connect and
disconnect.
I think it would be nice to have a reliable way to discover the PID for a
given temporary schema via SQL. The other thread [2] introduces a helpful
log message that indicates the PID for temporary tables that are in danger
of causing transaction ID wraparound, and I intend for this proposal to be
complementary to that work.
At first, I thought about adding a new function for retrieving the PGPROC
backend IDs, but I am worried that having two sets of backend IDs would be
even more confusing than having one set that can't reliably be used for
temporary schemas. Instead, I tried adjusting the pg_stat_get_backend_*()
suite of functions to use the PGPROC backend IDs. This ended up being
simpler than anticipated. I added a backend_id field to the
LocalPgBackendStatus struct (which is populated within
pgstat_read_current_status()), and I changed pgstat_fetch_stat_beentry() to
bsearch() for the entry with the given PGPROC backend ID.
This does result in a small behavior change. Currently,
pg_stat_get_backend_idset() simply returns a range of numbers (1 to the
number of active backends). With the attached patch, this function will
still return a set of numbers, but there might be gaps between the IDs, and
the maximum backend ID will usually be greater than the number of active
backends. I suppose this might break some existing uses, but I'm not sure
how much we should worry about that. IMO uniting the backend IDs is a net
improvement.
Thoughts?
[0] https://postgr.es/m/CAM-w4HPCOuJDs4fdkgNdA8FFMeYMULPCAxjPpsOgvCO24KOAVg%40mail.gmail.com
[1] https://postgr.es/m/DDF0D1BC-261D-45C2-961C-5CBDBB41EE71%40amazon.com
[2] https://commitfest.postgresql.org/39/3358/
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com