Re: A micro-optimisation for ProcSendSignal() - Mailing list pgsql-hackers

From Soumyadeep Chakraborty
Subject Re: A micro-optimisation for ProcSendSignal()
Date
Msg-id CAE-ML+__zDoUfwh9ntAhwMF=rJkPoK3iF=RY5Du05FSmHidBEw@mail.gmail.com
Whole thread Raw
In response to Re: A micro-optimisation for ProcSendSignal()  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: A micro-optimisation for ProcSendSignal()  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
HI Thomas,

On Tue, Jul 20, 2021 at 10:40 PM Thomas Munro <thomas.munro@gmail.com> wrote:

> > Slightly tangential: we should add a comment to PGPROC.pgprocno, for more
> > immediate understandability:
> >
> > + int pgprocno; /* index of this PGPROC in ProcGlobal->allProcs */
>
> I wonder why we need this member anyway, when you can compute it from
> the address... #define GetPGProcNumber(p) ((p) - ProcGlobal->allProcs)
> or something like that?  Kinda wonder why we don't use
> GetPGProcByNumber() in more places instead of open-coding access to
> ProcGlobal->allProcs, too...

I tried this out. See attached v4 of your patch with these changes.

> > Also, why don't we take the opportunity to get rid of SERIALIZABLEXACT->pid? We
> > took a stab. Attached is v2 of your patch with these changes.
>
> SERIALIZABLEXACT objects can live longer than the backends that
> created them.  They hang around to sabotage other transactions' plans,
> depending on what else they overlapped with before they committed.
> With that change, the pg_locks view might show the pid of some
> unrelated session that moves into the same PGPROC.

I see.

>
> It's only an "informational" pid, and pids are imperfect information
> anyway because (1) they are themselves recycled, and (2) they won't be
> interesting in a hypothetical multi-threaded future.  One solution
> would be to hide the pids from the view after the backend disconnects
> (somehow -- add a generation number?), but they're also still kinda
> useful, despite the weaknesses.  I wonder what the ideal way would be
> to refer to sessions, anyway, including those that are no longer
> active; perhaps we could invent a new "session ID" concept.

Updating the pg_locks view:

Yes, the pids may be valuable for future debugging/audit purposes. Also,
systems where pid_max is high enough to not see wraparound, will have
pids that are not recycled. I would lean towards showing the pid even
after the backend has exited.

Perhaps we could overload the stored pid to be negated (i.e. a backend
with pid 20000 will become -20000) to indicate that the pid belongs to
a backend that has exited. Additionally, we could introduce a boolean
field in pg_locks "backendisalive", so that the end user doesn't have
to reason about negative pids.

Session ID:

Interesting, Greenplum uses the concept of session ID pervasively. Being
a distributed database, the session ID helps to tie individual backends
on each PG instance to the same session. The session ID of course comes
with its share of bookkeeping:

* These IDs are incrementally dished out with a counter
  (with pg_atomic_add_fetch_u32), in increments of 1, on the Greenplum
  coordinator PG instance in InitProcess.

* The counter is a part of ProcGlobal and itself is initialized to 0 in
  InitProcGlobal, which means that session IDs are reset on cluster
  restart.

* The sessionID tied to each proceess is maintained in PGPROC.

* The sessionID finds its way into PgBackendStatus, which is further
  reported with pg_stat_get_activity.

A session ID seems a bit heavy just to indicate whether a backend has
exited.

Regards,
Soumyadeep

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Followup Timestamp to timestamp with TZ conversion
Next
From: Pavel Stehule
Date:
Subject: proposal: plpgsql: special comments that will be part of AST