Re: Tracking wait event for latches - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Tracking wait event for latches
Date
Msg-id CAEepm=3tC5uuZdqYq9vN4HwijRzqbpARa-8FoT2Q1RuAAJ9+6Q@mail.gmail.com
Whole thread Raw
In response to Re: Tracking wait event for latches  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Tracking wait event for latches  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Sep 23, 2016 at 1:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Sep 21, 2016 at 5:49 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>>> Moreover, it's pretty confusing that we have this general concept of
>>> wait events in pg_stat_activity, and then here the specific type of
>>> wait event we're waiting for is the ... wait event kind.  Uh, what?
>>
>> Yeah, that is confusing.  It comes about because of the coincidence
>> that pg_stat_activity finished up with a wait_event column, and our IO
>> multiplexing abstraction finished up with the name WaitEventSet.
>> <stuck-record-mode>We could instead call these new things "wait
>> points", because, well, erm, they name points in the code at which we
>> wait.  They don't name events (they just happen to use the
>> WaitEventSet mechanism, which itself does involve events: but those
>> events are things like "hey, this socket is now ready for
>> writing").</>
>
> Sure, we could do that, but it means breaking backward compatibility
> for pg_stat_activity *again*.  I'd be willing to do it but I don't
> think I'd win any popularity contests.

I didn't mean changing the column headings in pg_stat_activity.  I
meant that the enum called WaitEventIdentifier in Michael's v5 patch
should be called WaitPointIdentifier, and if we go with a single name
to appear in the wait_event_type column then it could be "WaitPoint".
But I would also prefer to see something more informative in that
column, as discussed below (and upthread).

>> Well we already discussed a couple of different ways to get "Socket",
>> "Latch", "Socket|Latch", ... or something like that into the
>> wait_event_type column or new columns.  Wouldn't that be better, and
>> automatically fall out of the code rather than needing human curated
>> categories?  Although Michael suggested that that should be done as a
>> separate patch on top of the basic feature.
>
> I think making that a separate patch is just punting the decision down
> the field to a day that may never come.  Let's try to agree on
> something that we can all live with and implement it now.  In terms of
> avoiding human-curated categories, I basically see two options:
>
> 1. Find a name that is sufficiently generic that it covers everything
> that might reach WaitEventSetWait either now or in the future when it
> might wait for even more kinds of things than it does now.  For
> example, we could call it "Stuff" or "Thing".  Less tongue-in-cheek
> suggestions are welcome, but it's hard to come up with something that
> is sufficiently-generic without being tautological.  "Event" is an
> example of a name which is general enough to encompass everything but
> also stupid: the column is called "wait_event" so everything that
> appears in it is an event by definition.
>
> 2. Classify the events that fall into this category by some rigid
> principle based on the types of things being awaited.  For example, we
> could decide that if any sockets are awaited, the event class will be
> "Client" if it is connected to a user and "IPC" for auxiliary
> processes.
>
> [...]
> occur.  Likewise, it was pretty easy to find all of the events that
> were waiting for client I/O, and I grouped those all under "Client".
> A few of the remaining wait events seemed like they were clearly
> waiting for a particular timeout to expire, so I gave those their own
> "Timeout" category.

Interesting.  OK, I agree that it'd be useful to show that we're
waiting because there's nothing happening, or waiting because the user
asked us to sleep, or waiting on IO, or waiting for an IPC response
because something is happening, and that higher level information is
difficult/impossible to extract automatically from the WaitEventSet.

I understand that "Activity" is the category of wait points that are
waiting for activity, but I wonder if it might be clearer to users if
that were called "Idle", because it's the category of idle waits
caused by non-activity.

Why is WalSenderMain not in that category alongside WalReceiverMain?
Hmm, actually it's kind of a tricky one: whether it's really idle or
waiting for IO depends.  It's always ready to wait for clients to send
messages, but I'd say that's part of its "idle" behaviour.  But it's
sometimes waiting for the socket to be writable: if
(pq_is_send_pending()) wakeEvents |= WL_SOCKET_WRITABLE, and that's
when it's definitely not idle, it's actively trying to feed WAL down
the pipe.  Do we want to get into dynamic categories depending on
conditions like that?

I was thinking about suggesting a category "Replication" to cover the
waits for client IO relating to replication, as opposed to client IO
waits relating to regular user connections.  Then you could put sync
rep into that category instead of IPC, even though technically it is
waiting for IPC from walsender process(es), on the basis that it's
more newsworthy to a DBA that it's really waiting for a remote replica
to respond.  But it's probably pretty clear what's going on from the
the wait point names, so maybe it's not worth a category.  Thoughts?

WalSenderWaitForWAL is waiting for both IPC and Client, but if you
have to pick one isn't IPC the main thing it's waiting for?  It'll
stop waiting when it gets IPC telling it about flushed location
reaching some point.  It's multiplexing client IO processing sort of
"incidentally" while it does so.

WalReceiverWaitStart is waiting for IPC, not Client.

>> Obviously this gets complicated by the existence of static functions
>> whose names are ambiguous and lack context, and multiple wait points
>> in a single function.  Previously I've suggested a hierarchical
>> arrangement for these names which might help with that.  Imagine names
>> like: executor.Hash.<function> (reported by a background worker
>> executing a hypothetical parallel hash join),
>> executor.Hash.<function>.<something> to disambiguate multiple wait
>> points in one function, walsender.<function> etc.  That way we could
>> have a tidy curated meaningful naming scheme based on modules, but a
>> no-brainer systematic way to name the most numerous leaf bits of that
>> hierarchy.  Just an idea...
>
> Considering we only have a few dozen of those, this feels like it
> might be overkill to me, and I suspect we'll end up finding that it's
> a bit harder to make it consistent and useful than one might hope.  I
> am basically happy with the way Michael named them, but that's not to
> say I could never be happy with anything else.

Yeah, it's fine as it is.

I do suspect that the set of wait points will grow quite a bit as we
develop more parallel stuff though.  For example, I have been working
on a patch that adds several more wait points, indirectly via
condition variables (using your patch).  Actually in my case it's
BarrierWait -> ConditionVariableWait -> WaitEventSetWait.  I propose
that these higher level wait primitives should support passing a wait
point identifier through to WaitEventSetWait.  Some of my patch's wait
points are in the same function and some are in static functions, so
I'll finish up making up various affixes to disambiguate, and then I
might be tempted to include some separator characters...  I am also
aware of three other hackers who are working on patches that also make
use of condition variables, so (if you agree that condition variable
waits are wait points) they'll be adding more too.  That said, ad-hoc
name construction is fine and there is no reason we couldn't revise
things if names start to look messy when concrete patches that add
more names emerge.

-- 
Thomas Munro
http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: pg_upgrade vs user created range type extension
Next
From: Tom Lane
Date:
Subject: Re: pg_upgrade vs user created range type extension