Re: A micro-optimisation for ProcSendSignal() - Mailing list pgsql-hackers
From | Soumyadeep Chakraborty |
---|---|
Subject | Re: A micro-optimisation for ProcSendSignal() |
Date | |
Msg-id | CAE-ML+__zDoUfwh9ntAhwMF=rJkPoK3iF=RY5Du05FSmHidBEw@mail.gmail.com Whole thread Raw |
In response to | Re: A micro-optimisation for ProcSendSignal() (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: A micro-optimisation for ProcSendSignal()
|
List | pgsql-hackers |
HI Thomas, On Tue, Jul 20, 2021 at 10:40 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > Slightly tangential: we should add a comment to PGPROC.pgprocno, for more > > immediate understandability: > > > > + int pgprocno; /* index of this PGPROC in ProcGlobal->allProcs */ > > I wonder why we need this member anyway, when you can compute it from > the address... #define GetPGProcNumber(p) ((p) - ProcGlobal->allProcs) > or something like that? Kinda wonder why we don't use > GetPGProcByNumber() in more places instead of open-coding access to > ProcGlobal->allProcs, too... I tried this out. See attached v4 of your patch with these changes. > > Also, why don't we take the opportunity to get rid of SERIALIZABLEXACT->pid? We > > took a stab. Attached is v2 of your patch with these changes. > > SERIALIZABLEXACT objects can live longer than the backends that > created them. They hang around to sabotage other transactions' plans, > depending on what else they overlapped with before they committed. > With that change, the pg_locks view might show the pid of some > unrelated session that moves into the same PGPROC. I see. > > It's only an "informational" pid, and pids are imperfect information > anyway because (1) they are themselves recycled, and (2) they won't be > interesting in a hypothetical multi-threaded future. One solution > would be to hide the pids from the view after the backend disconnects > (somehow -- add a generation number?), but they're also still kinda > useful, despite the weaknesses. I wonder what the ideal way would be > to refer to sessions, anyway, including those that are no longer > active; perhaps we could invent a new "session ID" concept. Updating the pg_locks view: Yes, the pids may be valuable for future debugging/audit purposes. Also, systems where pid_max is high enough to not see wraparound, will have pids that are not recycled. I would lean towards showing the pid even after the backend has exited. Perhaps we could overload the stored pid to be negated (i.e. a backend with pid 20000 will become -20000) to indicate that the pid belongs to a backend that has exited. Additionally, we could introduce a boolean field in pg_locks "backendisalive", so that the end user doesn't have to reason about negative pids. Session ID: Interesting, Greenplum uses the concept of session ID pervasively. Being a distributed database, the session ID helps to tie individual backends on each PG instance to the same session. The session ID of course comes with its share of bookkeeping: * These IDs are incrementally dished out with a counter (with pg_atomic_add_fetch_u32), in increments of 1, on the Greenplum coordinator PG instance in InitProcess. * The counter is a part of ProcGlobal and itself is initialized to 0 in InitProcGlobal, which means that session IDs are reset on cluster restart. * The sessionID tied to each proceess is maintained in PGPROC. * The sessionID finds its way into PgBackendStatus, which is further reported with pg_stat_get_activity. A session ID seems a bit heavy just to indicate whether a backend has exited. Regards, Soumyadeep
Attachment
pgsql-hackers by date: