Re: [PATCH] Identify LWLocks in tracepoints - Mailing list pgsql-hackers
From | Craig Ringer |
---|---|
Subject | Re: [PATCH] Identify LWLocks in tracepoints |
Msg-id | CAGRY4nxrZiGrnZOqZ+92JFMwE8q=0SP1hBZn8v6ZtYBfBWMroA@mail.gmail.com |
In response to | Re: [PATCH] Identify LWLocks in tracepoints (Andres Freund <andres@anarazel.de>) |
Responses | Re: [PATCH] Identify LWLocks in tracepoints |
List | pgsql-hackers |
On Tue, 13 Apr 2021 at 11:06, Andres Freund <andres@anarazel.de> wrote:

> > Each backend can have different tranche IDs (right?)
>
> No, they have to be the same in each. Note how the tranche ID is part of
> struct LWLock. Which is why LWLockNewTrancheId() has to acquire a lock
> etc.

Ah. I misunderstood that at some point.

That makes it potentially more sensible to skip reporting tranche names. Not great, because it's much less convenient to work with trace data full of internal ordinals that must be re-mapped in post-processing. But I'm generally OK with deferring runtime costs to tooling rather than the db itself, so long as doing so is moderately practical.

In this case, I think we could likely get away with removing the tranche names from the tracepoints if we instead emit a trace event on each dynamic tranche registration that reports the tranche id -> name mapping. It still sucks for tools, since they have to scrape up the static tranche registrations from somewhere else, but ... it'd be tolerable.

> > The kernel is packed with extremely useful trace events, and for very
> > good reasons. Some on very hot paths.
>
> IIRC those aren't really comparable - the kernel actually does modify
> the executable code to replace the tracepoints with nops.

Same with userspace static trace markers (USDTs): the probe site is a nop until a tracer attaches.

A follow-up mail will contain a test case and samples demonstrating this.

> Although I still don't really buy that static tracepoints are the best
> way to measure this kind of thing, given the delay introducing them and
> the cost of having them around. I think I pointed out
> https://postgr.es/m/20200813004233.hdsdfvufqrbdwzgr%40alap3.anarazel.de
> before.

Yeah. Semaphores are something hot enough that I'd hesitate to touch them.

> > LWLock lock-ordering deadlocks
>
> This seems unrelated to tracepoints to me.
If I can observe which locks are acquired in which order by each proc, I can then detect excessive waits and report the stack of held locks of both procs and their order of acquisition. Since the LWLock shmem state doesn't, AFAICS, track any information on the lock holder(s), I don't see a way to do this in-process.

It's not vital; it's just one of the use cases I have in mind. I suspect that any case where such deadlocks are possible represents a misuse of LWLocks anyway.

> > and there's no way to know what a given non-built-in tranche ID means
> > for any given backend without accessing backend-specific in-memory
> > state. Including for non-user-accessible backends like bgworkers and
> > auxprocs, where it's not possible to just query the state from a view
> > directly.
>
> The only per-backend part is that some backends might not know the
> tranche name for dynamically registered tranches where the
> LWLockRegisterTranche() hasn't been executed in a backend. Which should
> pretty much never be an aux process or such. And even for bgworkers it
> seems like a pretty rare thing, because those need to be started by
> something...
>
> It might be worth proposing a shared hashtable with tranche names and
> just reserving enough space for ~hundred entries...

Yeah, that'd probably work and be cheap enough not to really matter. It might even save us a chunk of memory by not turning CoW pages into private mappings for each backend during registration.

> > And you can always build without `--enable-dtrace` and ... just not care.
>
> Practically speaking, distributions enable it, which then incurs the
> cost for everyone.

Yep. That's part of why I was so surprised to notice the GetLWTrancheName() function call in LWLock tracepoints. Nearly anywhere else it wouldn't matter at all, but LWLocks are hot enough that it just might matter for the no-wait fastpath.
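For the proposal of dropping tranche names from the tracepoints and emitting a registration event instead, the tooling side could look roughly like this minimal sketch. It assumes a hypothetical trace event carrying `(tranche_id, name)` for each dynamic tranche registration; the function names, the fixed capacity and the "unknown" fallback are illustrative only, not PostgreSQL or tracing-tool API. Because tranche IDs are the same in every backend, one global table is enough; names for the static tranches would still have to be loaded into it from some other source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical post-processing table; capacity chosen arbitrarily. */
#define MAX_TRANCHES 512
#define MAX_NAME_LEN 64

/* Tranche IDs are the same in every backend, so one table covers all procs. */
static char tranche_names[MAX_TRANCHES][MAX_NAME_LEN];

/* Handle one hypothetical "dynamic tranche registered" trace event. */
void on_tranche_registered(int tranche_id, const char *name)
{
    assert(tranche_id >= 0 && tranche_id < MAX_TRANCHES);

    /* A global ID should never map to two names; report it if it does. */
    if (tranche_names[tranche_id][0] != '\0' &&
        strcmp(tranche_names[tranche_id], name) != 0)
        fprintf(stderr, "tranche %d renamed from \"%s\" to \"%s\"\n",
                tranche_id, tranche_names[tranche_id], name);

    /* snprintf always NUL-terminates, truncating over-long names. */
    snprintf(tranche_names[tranche_id], MAX_NAME_LEN, "%s", name);
}

/* Map the bare tranche ordinal from an LWLock trace event back to a name. */
const char *resolve_tranche(int tranche_id)
{
    if (tranche_id < 0 || tranche_id >= MAX_TRANCHES ||
        tranche_names[tranche_id][0] == '\0')
        return "unknown";
    return tranche_names[tranche_id];
}
```

A post-processor would call `on_tranche_registered()` for each registration event while streaming a time-ordered trace, and `resolve_tranche()` when printing LWLock events.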
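For the lock-ordering use case, one way tooling could consume per-proc acquire/release events is a simplified, pairwise version of the lock-order checking done by the Linux kernel's lockdep, swapped in here for illustration: remember every "held A, then acquired B" pair and flag a proc that later acquires A while holding B. This flags an inconsistent ordering that could deadlock, rather than the excessive-wait detection described above, and it only catches two-lock inversions, not longer cycles. The event shape (small integer proc and lock indexes) and all names are hypothetical assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical limits: procs and locks are identified by small integers. */
#define MAX_LOCKS 64
#define MAX_PROCS 8
#define MAX_HELD 16

/* order_seen[a][b] becomes true once some proc acquired b while holding a. */
static bool order_seen[MAX_LOCKS][MAX_LOCKS];

/* Locks currently held by each proc, in acquisition order. */
static int held[MAX_PROCS][MAX_HELD];
static int nheld[MAX_PROCS];

static int inversions = 0;

/* Feed one hypothetical "proc acquired lock" trace event. */
void on_acquire(int proc, int lock)
{
    assert(proc >= 0 && proc < MAX_PROCS);
    assert(lock >= 0 && lock < MAX_LOCKS);
    assert(nheld[proc] < MAX_HELD);

    for (int i = 0; i < nheld[proc]; i++)
    {
        int h = held[proc][i];

        if (h == lock)
            continue;
        /* Opposite order already observed: two procs could deadlock. */
        if (order_seen[lock][h])
        {
            inversions++;
            fprintf(stderr, "order inversion: proc %d took %d while holding %d, "
                    "but %d-then-%d was seen before\n", proc, lock, h, lock, h);
        }
        order_seen[h][lock] = true;
    }
    held[proc][nheld[proc]++] = lock;
}

/* Feed one "proc released lock" event; releases are not assumed to be LIFO. */
void on_release(int proc, int lock)
{
    for (int i = 0; i < nheld[proc]; i++)
    {
        if (held[proc][i] == lock)
        {
            /* Shift later entries down to preserve acquisition order. */
            for (int j = i; j < nheld[proc] - 1; j++)
                held[proc][j] = held[proc][j + 1];
            nheld[proc]--;
            return;
        }
    }
}
```

Keeping the held-lock stack per proc is what would let a tool print both procs' stacks, in acquisition order, when it reports a problem.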
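The shared-hashtable suggestion could be sketched as a fixed-capacity, open-addressing table with names stored inline, so the whole thing could sit in one shared-memory allocation and every backend would see a registration as soon as it is written. Everything here (type names, sizes, 0 as the empty-slot marker, linear probing) is a hypothetical illustration, not PostgreSQL code, and a real version would need a lock around `tranche_table_put()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* "~hundred entries", rounded up to a power of two. Hypothetical sizing. */
#define TRANCHE_TABLE_SIZE 128
#define TRANCHE_NAME_LEN 64

typedef struct
{
    int  tranche_id;              /* 0 marks an empty slot in this sketch */
    char name[TRANCHE_NAME_LEN];  /* stored inline: no backend-private pointers */
} TrancheSlot;

/* Fixed-size table intended to live in a single shared-memory allocation. */
typedef struct
{
    TrancheSlot slots[TRANCHE_TABLE_SIZE];
} SharedTrancheTable;

/* Zero the table once, when the shared segment is set up. */
void tranche_table_init(SharedTrancheTable *t)
{
    memset(t, 0, sizeof(*t));
}

/* Insert or update by linear probing; false if full. Locking is omitted. */
bool tranche_table_put(SharedTrancheTable *t, int id, const char *name)
{
    assert(id > 0);
    for (unsigned probe = 0; probe < TRANCHE_TABLE_SIZE; probe++)
    {
        TrancheSlot *s = &t->slots[((unsigned) id + probe) % TRANCHE_TABLE_SIZE];

        if (s->tranche_id == 0 || s->tranche_id == id)
        {
            s->tranche_id = id;
            snprintf(s->name, sizeof(s->name), "%s", name);
            return true;
        }
    }
    return false;
}

/* Look up a name; NULL if the ID was never registered. */
const char *tranche_table_get(const SharedTrancheTable *t, int id)
{
    if (id <= 0)
        return NULL;
    for (unsigned probe = 0; probe < TRANCHE_TABLE_SIZE; probe++)
    {
        const TrancheSlot *s = &t->slots[((unsigned) id + probe) % TRANCHE_TABLE_SIZE];

        if (s->tranche_id == id)
            return s->name;
        if (s->tranche_id == 0)
            return NULL;    /* an empty slot ends the probe chain */
    }
    return NULL;
}
```

Linear probing without tombstones is sufficient here only because tranches are never unregistered. Storing names inline matters because a pointer into one backend's private memory would be meaningless in another process.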
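The concern about the GetLWTrancheName() call is that a probe's arguments are computed at the call site even when no tracer is attached; only the probe itself is a nop. The self-contained model below illustrates that effect. `probe_enabled`, `lookup_tranche_name()` and `fire_probe()` are hypothetical stand-ins. In a real `--enable-dtrace` build, the guard would be the generated `TRACE_POSTGRESQL_<PROBE>_ENABLED()` macro rather than a plain flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-probe "is a tracer attached?" flag. */
static volatile bool probe_enabled = false;

/* Counts how often the expensive probe argument has been computed. */
static int name_lookups = 0;

/* Hypothetical stand-in for a non-trivial lookup such as a tranche name. */
static const char *lookup_tranche_name(int tranche_id)
{
    name_lookups++;
    return tranche_id == 1 ? "example_tranche" : "other_tranche";
}

/* Stand-in for the probe site: does nothing, like an unattached USDT nop. */
static void fire_probe(const char *tranche_name, int mode)
{
    (void) tranche_name;
    (void) mode;
}

/* Unguarded: the argument is computed on every call, traced or not. */
void acquire_unguarded(int tranche_id, int mode)
{
    fire_probe(lookup_tranche_name(tranche_id), mode);
}

/* Guarded: the lookup runs only once a tracer has enabled the probe. */
void acquire_guarded(int tranche_id, int mode)
{
    if (probe_enabled)
        fire_probe(lookup_tranche_name(tranche_id), mode);
}
```

The guard is not free either. It adds a load and a branch to the call site, so on something as hot as the no-wait fastpath, even that cost would be worth measuring.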