Re: Adding wait events statistics - Mailing list pgsql-hackers
| From | Andres Freund | 
|---|---|
| Subject | Re: Adding wait events statistics | 
| Date | |
| Msg-id | lt6n664ijbmfftnuv3bgvt47q7kjz4tflu4kg3ingv6njjtvld@kesknxnidemo Whole thread Raw | 
| In response to | Adding wait events statistics (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>) | 
| Responses | Re: Adding wait events statistics | 
| List | pgsql-hackers | 
Hi, On 2025-06-30 13:36:12 +0000, Bertrand Drouvot wrote: > Wait events are useful to know what backends are waiting for when there is/was > a performance issue: for this we can sample pg_stat_activity at regular intervals > and record historical data. That’s how it is commonly used. > > It could also be useful to observe the engine/backends behavior over time and > help answer questions like: > > * Is the engine’s wait pattern the same over time? > * Is application’s "A" wait pattern the same over time? > * I observe a peak in wait event "W": is it because "W" is now waiting longer or > is it because it is hit more frequently? > * I observe a peak in some of them (say for example MultiXact%), is it due to a > workload change? > > For the above use cases, we need a way to track the wait events that occur between > samples: please find attached a proof of concept patch series doing so. > > > 0002 - It adds wait events statistics > > It adds a new stat kind PGSTAT_KIND_WAIT_EVENT for the wait event statistics. > > This new statistic kind is a fixed one because we know the maximum number of wait > events. Indeed: > > * it does not take into account custom wait events as extensions have all they need > at their disposal to create custom stats on their own wait events should they > want to (limited by WAIT_EVENT_CUSTOM_HASH_MAX_SIZE though). > > * it does not take into account LWLock > LWTRANCHE_FIRST_USER_DEFINED for the same > reasons as above. That said, there is no maximum limitation in LWLockNewTrancheId(). > > * we don't want to allocate memory in some places where the counters are > incremented (see 4feba03d8b9). We could still use the same implementation as for > backend statistics (i.e, make use of flush_static_cb) if we really want/need to > switch to variable stats. Even without allocating memory, this makes wait events considerably more expensive. Going from ~4 instructions to something more like ~40. Including external function calls. Today: start wait: 0x0000000000a08e8f <+63>: lea 0x735662(%rip),%r12 # 0x113e4f8 <my_wait_event_info> 0x0000000000a08e96 <+70>: mov (%r12),%rax 0x0000000000a08e9a <+74>: movl $0x9000006,(%rax) < do wait operation > end wait: 0x0000000000a08eaa <+90>: mov (%r12),%rax After: start wait: 0x0000000000a0b39f <+63>: lea 0x73673a(%rip),%r12 # 0x1141ae0 <my_wait_event_info> 0x0000000000a0b3a6 <+70>: mov (%r12),%rax 0x0000000000a0b3aa <+74>: movl $0x9000006,(%rax) < do wait operation > end wait: 0x0000000000a0b3ba <+90>: mov (%r12),%rax 0x0000000000a0b3be <+94>: mov (%rax),%edi 0x0000000000a0b3c0 <+96>: rex call 0xa452e0 <waitEventIncrementCounter> 0x0000000000a0b3c6 <+102>: mov (%r12),%rax Where waitEventIncrementCounter ends up as: 0x0000000000a452e0 <+0>: lea 0x6fc7f9(%rip),%rax # 0x1141ae0 <my_wait_event_info> 0x0000000000a452e7 <+7>: mov (%rax),%rax 0x0000000000a452ea <+10>: mov (%rax),%eax 0x0000000000a452ec <+12>: mov %eax,%edx 0x0000000000a452ee <+14>: and $0xff000000,%edx 0x0000000000a452f4 <+20>: jne 0xa45300 <waitEventIncrementCounter+32> 0x0000000000a452f6 <+22>: test %ax,%ax 0x0000000000a452f9 <+25>: jne 0xa45300 <waitEventIncrementCounter+32> 0x0000000000a452fb <+27>: ret 0x0000000000a452fc <+28>: nopl 0x0(%rax) 0x0000000000a45300 <+32>: cmp $0x1000000,%edx 0x0000000000a45306 <+38>: jne 0xa4530e <waitEventIncrementCounter+46> 0x0000000000a45308 <+40>: cmp $0x5e,%ax 0x0000000000a4530c <+44>: ja 0xa452fb <waitEventIncrementCounter+27> 0x0000000000a4530e <+46>: cmp $0x7000000,%edx 0x0000000000a45314 <+52>: sete %cl 0x0000000000a45317 <+55>: test %ax,%ax 0x0000000000a4531a <+58>: setne %al 0x0000000000a4531d <+61>: test %al,%cl 0x0000000000a4531f <+63>: jne 0xa452fb <waitEventIncrementCounter+27> 0x0000000000a45321 <+65>: cmp $0xb000000,%edx 0x0000000000a45327 <+71>: je 0xa452fb <waitEventIncrementCounter+27> 0x0000000000a45329 <+73>: mov %edi,%eax 0x0000000000a4532b <+75>: movzwl %di,%edi 0x0000000000a4532e <+78>: lea 0x714e4b(%rip),%rdx # 0x115a180 <PendingWaitEventStats> 0x0000000000a45335 <+85>: shr $0x18,%eax 0x0000000000a45338 <+88>: shl $0x5,%rax 0x0000000000a4533c <+92>: add 0x4e1735(%rip),%rax # 0xf26a78 0x0000000000a45343 <+99>: movslq 0x8(%rax),%rax 0x0000000000a45347 <+103>: add %rdi,%rax 0x0000000000a4534a <+106>: incq (%rdx,%rax,8) 0x0000000000a4534e <+110>: lea 0x714e0b(%rip),%rax # 0x115a160 <have_wait_event_stats> 0x0000000000a45355 <+117>: movb $0x1,(%rax) 0x0000000000a45358 <+120>: ret That's about an order of magnitude increase in overhead, just based on the number of instructions, and probably considerably xmore in runtime (you're adding external function calls etc). I think to make this viable, we would have to remove some wait events and be more careful about adding them. And optimize this code a good bit. This really shouldn't end up with this mouch code. Greetings, Andres Freund
pgsql-hackers by date: