identifying the backend that owns a temporary schema - Mailing list pgsql-hackers

From Nathan Bossart
Subject identifying the backend that owns a temporary schema
Date
Msg-id 20220815205811.GA250990@nathanxps13
Whole thread Raw
Responses Re: identifying the backend that owns a temporary schema
Re: identifying the backend that owns a temporary schema
List pgsql-hackers
Hi hackers,

As Greg Stark noted elsewhere [0], it is presently very difficult to
identify the PID of the session using a temporary schema, which is
particularly unfortunate when a temporary table is putting a cluster in
danger of transaction ID wraparound.  I noted [1] that the following query
can be used to identify the PID for a given backend ID:

    SELECT bid, pg_stat_get_backend_pid(bid) AS pid FROM pg_stat_get_backend_idset() bid;

But on closer inspection, this is just plain wrong.  The backend IDs
returned by pg_stat_get_backend_idset() might initially bear some
resemblance to the backend IDs stored in PGPROC, so my suggested query
might work some of the time, but the pg_stat_get_backend_* backend IDs
typically diverge from the PGPROC backend IDs as sessions connect and
disconnect.

I think it would be nice to have a reliable way to discover the PID for a
given temporary schema via SQL.  The other thread [2] introduces a helpful
log message that indicates the PID for temporary tables that are in danger
of causing transaction ID wraparound, and I intend for this proposal to be
complementary to that work.

At first, I thought about adding a new function for retrieving the PGPROC
backend IDs, but I am worried that having two sets of backend IDs would be
even more confusing than having one set that can't reliably be used for
temporary schemas.  Instead, I tried adjusting the pg_stat_get_backend_*()
suite of functions to use the PGPROC backend IDs.  This ended up being
simpler than anticipated.  I added a backend_id field to the
LocalPgBackendStatus struct (which is populated within
pgstat_read_current_status()), and I changed pgstat_fetch_stat_beentry() to
bsearch() for the entry with the given PGPROC backend ID.

This does result in a small behavior change.  Currently,
pg_stat_get_backend_idset() simply returns a range of numbers (1 to the
number of active backends).  With the attached patch, this function will
still return a set of numbers, but there might be gaps between the IDs, and
the maximum backend ID will usually be greater than the number of active
backends.  I suppose this might break some existing uses, but I'm not sure
how much we should worry about that.  IMO uniting the backend IDs is a net
improvement.

Thoughts?

[0] https://postgr.es/m/CAM-w4HPCOuJDs4fdkgNdA8FFMeYMULPCAxjPpsOgvCO24KOAVg%40mail.gmail.com
[1] https://postgr.es/m/DDF0D1BC-261D-45C2-961C-5CBDBB41EE71%40amazon.com
[2] https://commitfest.postgresql.org/39/3358/

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Cleaning up historical portability baggage
Next
From: Nathan Bossart
Date:
Subject: Re: [PATCH] Optimize json_lex_string by batching character copying