Re: RFC: replace pg_stat_activity.waiting with something more descriptive - Mailing list pgsql-hackers

From Robert Haas
Subject Re: RFC: replace pg_stat_activity.waiting with something more descriptive
Date
Msg-id CA+TgmobUGGEnQrisPKWYFJeicm24CVkYov7zN89Xjbw4kf5SwQ@mail.gmail.com
In response to Re: RFC: replace pg_stat_activity.waiting with something more descriptive  (Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>)
Responses Re: RFC: replace pg_stat_activity.waiting with something more descriptive  (Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>)
List pgsql-hackers
On Wed, Aug 5, 2015 at 1:10 PM, Ildus Kurbangaliev
<i.kurbangaliev@postgrespro.ru> wrote:
> About `memcpy`, the PgBackendStatus struct already has a bunch of
> multi-byte variables, so it will not be consistent anyway if somebody
> wants to copy it that way. On the other hand, two bytes in this case
> give less overhead because we can avoid the offset calculations. And as
> I've mentioned before, the class of wait will be useful when monitoring
> of waits is extended.

You're missing the point.  Those multi-byte fields have additional
synchronization requirements, as I explained in some detail in my
previous email. You can't just wave that away.

When people raise points in review, you need to respond to those with
discussion, not just blast out a new patch version that may or may not
have made some changes in that area.  Otherwise, you're wasting the
time of the people who are reviewing, which is not nice.

>> 1. have you tested this under -DEXEC_BACKEND ?  I wonder if those
>> initializations are going to work on Windows.
>
> No, it wasn't tested on Windows.

I don't think it's going to work on Windows.
CreateSharedMemoryAndSemaphores() is called once only from the
postmaster on non-EXEC_BACKEND builds, but on EXEC_BACKEND builds
(i.e. Windows) it's called for every process startup.  Thus, it's got
to be idempotent: if the shared memory structures it's looking for
already exist, it must not try to recreate them.  You have, for
example, InitBufferPool() calling LWLockCreateTranche(), which
unconditionally assigns a new tranche ID.  It can't do that; all of
the processes in the system have to agree on what the tranche IDs are.

The general problem that I see with splitting apart the main lwlock
array into many tranches is that all of those tranches need to get set
up properly - with matching tranche IDs - in every backend.  In
non-EXEC_BACKEND builds, that's basically free, but on EXEC_BACKEND
builds it isn't.  I think that's OK; this won't be the first piece of
state where EXEC_BACKEND builds incur some extra overhead.  But we
should make an effort to keep that overhead small.

The way to do that, I think, is to store some state in shared memory
that, in EXEC_BACKEND builds, will allow new postmaster children to
correctly re-register all of the tranches.  It seems to me that we can
do this as follows:

1. Have a compiled-in array of built-in tranche names.
2. In LWLockShmemSize(), work out the number of lwlocks each tranche
should contain.
3. In CreateLWLocks(), if IsUnderPostmaster, grab enough shared memory
for all the lwlocks themselves plus enough extra shared memory for one
pointer per tranche.  Store pointers to the start of each tranche in
shared memory, and initialize all the lwlocks.
4. In CreateLWLocks(), if tranche registration has not yet been done
(either because we are the postmaster, or because this is the
EXEC_BACKEND case) loop over the array of built-in tranche names and
register each one with the corresponding address grabbed from shared
memory.

A more radical redesign would be to have each tranche of LWLocks as a
separate chunk of shared memory, registered with ShmemInitStruct(),
and let EXEC_BACKEND children find those chunks by name using
ShmemIndex.  But that seems like more work to me for not much benefit,
especially since there's this weird thing where lwlocks are
initialized before ShmemIndex gets set up.

Yet another possible approach is to have each module register its own
tranche and track its own tranche ID using the same kind of strategy
that replication origins and WAL insert locks already employ.  That
may well be the right long-term strategy, but it seems like sort of a
pain to go to all that effort right now for this project, so I'm inclined to
hack on the approach described above and see if I can get that working
for now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


