Re: 64-bit wait_event and introduction of 32-bit wait_event_arg - Mailing list pgsql-hackers
| From | Jakub Wartak |
|---|---|
| Subject | Re: 64-bit wait_event and introduction of 32-bit wait_event_arg |
| Date | |
| Msg-id | CAKZiRmxeci4QypgYrZbjWqqGZN1+6Ozz+53jPQ4vNP8gGh4aQg@mail.gmail.com |
| In response to | Re: 64-bit wait_event and introduction of 32-bit wait_event_arg (Heikki Linnakangas <hlinnaka@iki.fi>) |
| List | pgsql-hackers |
Hi Heikki, thanks for having a look!

On Mon, Dec 8, 2025 at 11:12 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 08/12/2025 11:54, Jakub Wartak wrote:
> > While thinking about cons, the only cons that I could think of is that
> > when we would be exposing something as 32-bits , then if the following
> > major release changes some internal structure/data type to be a bit
> > more heavy, it couldn't be exposed anymore like that (think of e.g.
> > 64-bit OIDs?)
> >
> > Any help, opinions, ideas and code/co-authors are more than welcome.
>
> Expanding it to 64 bit seems fine as far as performance is concerned. I
> think the difficult and laborious part is to design the facilities to
> make use of it.

Right, I'm very interested in hearing what could be added there/what people want (bonus points if that is causing some performance issues today, we do not have the area covered, and exposing it would fit in 32 bits ;) )

> For example, if you encode an table OID in it, how do
> you interpret that when you're looking at pg_stat_activity? A new
> pg_explain_wait_event(bigint waitevent) that returns a text
> representation of the event perhaps?

Well, initially I was thinking about just leaving it as is (bigint), with the interpretation left to the operator (based on the docs). That is not yet part of the patch, because I still don't know if the idea is worth developing further. Technically, the wait_event_arg value is sometimes going to be some OID, sometimes a pid (like in the SyncRep case), most often probably a reason_code (of the wait), and sometimes maybe even a hash of something to make it fit? So yeah, I think we could. I like the idea of having a built-in pg_explain_wait_event_argument(bigint)::text that could add some hint about what the argument really shows without looking at the docs.
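To make the encoding a bit more concrete, here is a minimal standalone sketch. It is not taken from the patch: the layout (today's 32-bit wait_event_info in the low half, wait_event_arg in the high half) is an assumption, and the names wait_event64, wait_event_pack() and WaitEventArgKind are hypothetical, used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical layout (an assumption, not the patch's actual one):
 *   bits  0..31  existing 32-bit wait_event_info (class + event id)
 *   bits 32..63  wait_event_arg
 */
typedef uint64_t wait_event64;

static_assert(sizeof(wait_event64) == 8, "wait event must be exactly 64 bits");

/* Combine the existing 32-bit wait event with a 32-bit argument. */
static inline wait_event64
wait_event_pack(uint32_t wait_event_info, uint32_t arg)
{
    return ((uint64_t) arg << 32) | (uint64_t) wait_event_info;
}

/* Extract the classic 32-bit wait event. */
static inline uint32_t
wait_event_info_of(wait_event64 w)
{
    return (uint32_t) (w & UINT64_C(0xFFFFFFFF));
}

/* Extract the 32-bit argument. */
static inline uint32_t
wait_event_arg_of(wait_event64 w)
{
    return (uint32_t) (w >> 32);
}

/* Hypothetical classification of what the argument holds. */
typedef enum
{
    WEA_NONE,
    WEA_OID,
    WEA_PID,
    WEA_REASON_CODE,
    WEA_HASH
} WaitEventArgKind;

/* Short label that a pg_explain_wait_event_argument()-like function might return. */
static const char *
wait_event_arg_kind_name(WaitEventArgKind kind)
{
    switch (kind)
    {
        case WEA_NONE:        return "none";
        case WEA_OID:         return "OID";
        case WEA_PID:         return "pid";
        case WEA_REASON_CODE: return "reason";
        case WEA_HASH:        return "hash";
    }
    return "unknown";
}
```

With this layout a reader that only cares about the old 32-bit value can keep masking the low half, and the argument stays a single shift away.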
The question is what it should return: a simple ::text like 'reason'/'pid'/'OID', or something more descriptive in English? And wouldn't English-only output be a problem for translators? The alternative would be to just have a table in the docs (for a start?) explaining the meaning. In practice you would hunt for a specific wait_event, or have some big CASE WHEN/ELSE SQL query to interpret the values properly.

> Wait events can be defined in extensions; how does an extension plug into this facility?

I have not given extensions a lot of thought or coverage yet, but the answer is probably: they don't seem to plug heavily into this. I think an extension could just use WaitEventExtensionNew() / pgstat_report_wait_start() as usual and later logically OR in some 32-bit number; however, the interpretation of the wait_event_arg would have to be provided by the extension itself (via docs), I guess. Would that approach be acceptable, or did you have some other idea? Maybe with your idea of having pg_explain_wait_event_argument(), we would have to alter WaitEventExtensionNew(const char *wait_event_name) and add something like 'const char *wait_event_arg_description' there?

> Inevitably, the extra 32 bits won't be enough to expose everything that
> you might want to expose. Should we already think about what to do then?

Well, I wanted to stick to exposing only stuff that will _always_ fit in 32 bits. If additional/more detailed instrumentation is necessary, then a separate monitoring/observability subsystem (with its own variables) should probably be built for that specific use case. So if some information can grow beyond 32 bits, it should not be encoded into wait_event_arg, just to avoid debating performance regressions for any additional wait-event infrastructure. I simply do not want to open a can of worms: Bertrand tried that in [1], but I don't want this $thread to follow that route, where Andres and Robert expressed their concerns earlier. E.g.
one of the key questions is that I'm somewhat lost as to whether we would like to continue the earlier 56-bit [2] / 64-bit OID/RelFileNode attempt(s). If the project wants to continue with that, then we probably couldn't express a relation id as a 32-bit wait_event_arg, or maybe I am missing something. (Of course, we could hash a potential 64-bit OID back into a 32-bit value one day, but it sounds like a hack, doesn't it?)

> For lock waits, for example, should we have another array in shared
> memory with more details, and just store an offset into that array in
> the extra wait event bits, for example? (we already have pg_locks, but
> let's imagine we didn't. How would you design it in a green field scenario?

If we didn't have pg_locks, I would probably stick with encoding the mode, maybe mode|granted|fastpath (assuming OIDs are a no-go).

Some brainstorming and other crazy(?) ideas on how we could expose some intrinsic PG behavior:

- writing while reading (AKA setting hint bits) - could be exposed as a reason_code for write-like wait events? (e.g. for IO/WALWrite we could encode a reason_code?)
- same as above (hint bits), but for CLOG/SLRU and also for others? Maybe we could expose which SLRU exactly we are reading/writing in IO/SLRU_READ|WRITE waits and further encode some "reason" there too?
- still for IO/WALWrite, we could also add another reason_code bit meaning: are we writing an FPI or not? (that would make wait_event_arg for IO/WALWrite a bitmap: e.g. writing_FPI | writing_hintbits)

-J.

[1] - https://www.postgresql.org/message-id/lt6n664ijbmfftnuv3bgvt47q7kjz4tflu4kg3ingv6njjtvld%40kesknxnidemo
[2] - https://www.postgresql.org/message-id/flat/CA%2BTgmobM5FN5x0u3tSpoNvk_TZPFCdbcHxsXCoY1ytn1dXROvg%40mail.gmail.com#1070c79256f2330ec52f063cdbe2add0
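
As a rough illustration of the bitmap idea for IO/WALWrite from the brainstorming list above: the flag names, bit positions and the helper walwrite_arg_describe() below are made up for this sketch and do not exist in PostgreSQL or in the patch. It only shows how a 32-bit wait_event_arg could carry writing_FPI | writing_hintbits and be turned back into text.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reason bits for the IO/WALWrite wait_event_arg. */
#define WALWRITE_ARG_FPI       (UINT32_C(1) << 0)   /* writing a full page image */
#define WALWRITE_ARG_HINTBITS  (UINT32_C(1) << 1)   /* write caused by hint bits */

/*
 * Render the bitmap as "writing_FPI|writing_hintbits" style text.
 * An argument with no known bits set yields an empty string.
 */
static void
walwrite_arg_describe(uint32_t arg, char *buf, size_t buflen)
{
    int has_fpi = (arg & WALWRITE_ARG_FPI) != 0;
    int has_hint = (arg & WALWRITE_ARG_HINTBITS) != 0;

    assert(buf != NULL && buflen > 0);

    snprintf(buf, buflen, "%s%s%s",
             has_fpi ? "writing_FPI" : "",
             (has_fpi && has_hint) ? "|" : "",
             has_hint ? "writing_hintbits" : "");
}
```

Since each reason is a single bit, a monitoring query could also test the bits directly on the bigint instead of going through any text representation.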