Re: Adding wait events statistics - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: Adding wait events statistics
Date:
Msg-id: 7wh6dalioz2kxc43efxeiwgb6gjzhfq4hz6zxkggzpqopk57rp@ji22dyzvjem5
In response to: Re: Adding wait events statistics (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List: pgsql-hackers
Hi,

On 2025-07-22 12:24:46 +0000, Bertrand Drouvot wrote:
> Anyway, let's forget about eBPF, I ran another experiment by counting the
> cycles with:
>
> static inline uint64_t rdtsc(void) {
>     uint32_t lo, hi;
>     __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
>     return ((uint64_t)hi << 32) | lo;
> }
>
> and then calling this function before and after waitEventIncrementCounter()
> and also at wait_start() and wait_end() (without the increment counters
> patches).

I think you're still going to get massively increased baseline numbers that
way - the normal cost of a wait event is under 10 cycles, while doing two
rdtscs costs somewhere between 60-90 cycles. Which means that any increase due
to counters & timers will look a lot cheaper compared to that increased
baseline than compared to the actual current cost of a wait event.

> So that we can compare with the percentile cycles per wait events (see
> attached).
>
> We can see that, for those wait classes, all their wait events overhead would
> be < 5% and more precisely:
>
> Overhead on the lock class is about 0.03%
> Overhead on the timeout class is less than 0.01%
>
> and now we can also see that:
>
> Overhead on the lwlock class is about 1%
> Overhead on the client class is about 0.5%
> Overhead on the bufferpin class is about 0.2%

I think that's largely because there are relatively few such wait events:
there is very, very little contention in the regression tests, and we just
don't do a whole lot of intensive things in the tests. I suspect that at least
some of the high events here will actually be due to tests that explicitly
test the contention behaviour, and thus will have very high wait times.

E.g. if you measure client timings, the overhead here will be fairly low,
because we're not going to be CPU bound by the back/forth between client and
server, and thus many of the waits will be longer. If you instead measure a
single-client read-only pgbench, it'll look different.
Similarly, if you have lwlock contention in a real-world workload, most of the
waits will be incredibly short, but in our tests that will not necessarily be
the case.

> while the io and ipc classes have mixed results.
>
> So based on the cycles metric I think it looks pretty safe to implement for
> the whole majority of classes.

This precisely is why I am scared of this effort. If you only look at it in
the right light, it'll look cheap, but in other cases it'll cause measurable
slowdowns.

> > I also continue to not believe that pure event counters are going to be
> > useful for the majority of wait events. I'm not sure it is really
> > interesting for *any* wait event that we don't already have independent
> > stats for.
>
> For pure counters only I can see your point, but for counters + timings are
> you also not convinced?

For counters + timings I can see that it'd be useful. But I don't believe it's
close to as cheap as you say it is.

> > I think if we do want to have wait events that have more details, we need
> > to:
> >
> > a) make them explicitly opt-in, i.e. code has to be changed over to use the
> >    extended wait events
> > b) the extended wait events need to count both the number of encounters as
> >    well as the duration, the number of encounters is not useful on its own
> > c) for each callsite that is converted to the extended wait event, you
> >    either need to reason why the added overhead is ok, or do a careful
> >    experiment
>
> I do agree with the above, what do you think about this latest experiment
> counting the cycles?

I continue to not believe it at all, sorry. Even if the counting method were
accurate, you can't use our tests to measure the relative overhead, as they
aren't actually exercising the paths leading to waits.

> > Personally I'd rather have an in-core sampling collector, counting how
> > often it sees certain wait events when sampling.
>
> Yeah but even if we are okay with losing "counters" by sampling, we'd still
> not get the duration.
> For the duration to be meaningful we also need the exact number of counters.

You don't need a precise duration to see which wait events are a problem. If
you see that some event is sampled a lot, you know that there either are a
*lot* of those wait events or the wait events are entered into for a long
time.

Greetings,

Andres Freund