Re: Tracking wait event for latches - Mailing list pgsql-hackers
From | Robert Haas
---|---
Subject | Re: Tracking wait event for latches
Date |
Msg-id | CA+TgmoYoT-reYMTN5kau_mK+UY2dB9O6hrKboGEizQ93A1Cm8g@mail.gmail.com
In response to | Re: Tracking wait event for latches (Michael Paquier <michael.paquier@gmail.com>)
Responses | Re: Tracking wait event for latches
List | pgsql-hackers
On Wed, Sep 28, 2016 at 9:40 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Sep 28, 2016 at 9:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Sep 28, 2016 at 8:38 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> So should I change back the patch to have only one argument for the
>>> eventId, and guess classId from it?
>>
>> Why would you need to guess?
>
> Incorrect wording from me perhaps? I just meant that processing needs
> to know what is the classId coming for a specific eventId.
>
>> But, yes, I think one argument is much preferable.
>
> OK. Here is the wanted patch. I have reduced the routines of WaitLatch
> & friends to use only one argument, and added what is the classId
> associated with a given eventId in an array of multiple fields, giving
> something like that:
> + const struct wait_event_entry WaitEventEntries[] = {
> +   /* Activity */
> +   {WAIT_ACTIVITY, "ArchiverMain"},
> [...]
>
> I have cleaned up as well the inclusions of pgstat.h that I added
> previously. Patch is attached.

It seems to me that you haven't tested this patch very carefully, because as far as I can see it breaks wait event reporting for both heavyweight locks and buffer pins - or in other words, two out of the three existing cases covered by the mechanism you are patching.

The heavyweight lock portion is broken because WaitOnLock() reports a Lock wait before calling ProcSleep(), which now clobbers it. Instead of reporting that we're waiting for Lock/relation, a LOCK TABLE statement that hangs now reports IPC/ProcSleep. Similarly, a conflict over a tuple lock is now also reported as IPC/ProcSleep, and ditto for any other kind of lock, which were all reported separately before. Obviously, that's no good. I think it would be just fine to push down setting the wait event into ProcSleep(), doing it when we actually WaitLatch() or ResolveRecoveryConflictWithLock(), but we ought to report that we're waiting for the heavyweight lock, not ProcSleep().

As for buffer pins, note that LockBufferForCleanup() calls ProcWaitForSignal(), which now overwrites the wait event set by its caller. I think we could just make ProcWaitForSignal() take a wait event as an argument; then LockBufferForCleanup() could pass an appropriate constant, and other callers of ProcWaitForSignal() could pass their own wait events.

The way that we're constructing the wait event ID that ultimately gets advertised in pg_stat_activity is a bit silly in this patch. We start with an event ID (which is an integer) and we then do an array lookup (via GetWaitEventClass) to get a second integer, which is then XOR'd back into the first integer (via pgstat_report_wait_start) before storing it in the PGPROC. New idea: let's change pgstat_report_wait_start() to take a single integer argument which it stores without interpretation. Let's separate the construction of the wait event into a separate macro, say make_wait_event(). Then existing callers of pgstat_report_wait_start(a, b) will instead do pgstat_report_wait_start(make_wait_event(a, b)), but callers that want to construct the wait event and push it through a few levels of the call stack before advertising it only need to pass one integer. If done correctly, this should be cleaner and faster than what you've got right now.
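To make that concrete, here is a rough, self-contained sketch of the sort of packing I have in mind; the macro names, the 8-bit/24-bit split, and the example numbers are placeholders rather than a proposal for the actual values:

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch only: pack a small class ID and an event ID into a single
 * uint32 so the combined value can be handed down through a few stack
 * frames and stored without any further lookups.
 */
#define make_wait_event(classId, eventId) \
	((((uint32_t) (classId)) << 24) | ((uint32_t) (eventId) & 0x00FFFFFF))

#define wait_event_class(w)	((uint8_t) ((w) >> 24))
#define wait_event_id(w)	((w) & 0x00FFFFFF)

int
main(void)
{
	/* made-up numbers: some "Activity" class and an "ArchiverMain" event */
	uint32_t	we = make_wait_event(5, 1);

	/* pgstat_report_wait_start() would simply store 'we' uninterpreted */
	printf("class=%u event=%u\n",
		   (unsigned) wait_event_class(we),
		   (unsigned) wait_event_id(we));
	return 0;
}

ProcWaitForSignal() could then take the same combined uint32 as its argument, which would also cover the LockBufferForCleanup() case above.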
I am not a huge fan of the "WE_" prefix you've used. It's not terrible, but "we" doesn't obviously stand for "wait event", and it's also a word itself. I think a little more verbosity would add clarity. Maybe we could go with WAIT_EVENT_ instead.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company