Re: 64-bit wait_event and introduction of 32-bit wait_event_arg - Mailing list pgsql-hackers

From Jakub Wartak
Subject Re: 64-bit wait_event and introduction of 32-bit wait_event_arg
Msg-id CAKZiRmyZzmOODYS6n8mns9zN4RcS3o9kfrdQDyeRupqaGp9PmQ@mail.gmail.com
In response to Re: 64-bit wait_event and introduction of 32-bit wait_event_arg  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
List pgsql-hackers
On Tue, Dec 9, 2025 at 10:11 AM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
>
> Hi Heikki, thanks for having a look!
>
> On Mon, Dec 8, 2025 at 11:12 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >
> > On 08/12/2025 11:54, Jakub Wartak wrote:
> > > While thinking about cons, the only cons that I could think of is that
> > > when we would be exposing something as 32-bits , then if the following
> > > major release changes some internal structure/data type to be a bit
> > > more heavy, it couldn't be exposed anymore like that  (think of e.g.
> > > 64-bit OIDs?)
> > >
> > > Any help, opinions, ideas and code/co-authors are more than welcome.
>
> > Expanding it to 64 bit seems fine as far as performance is concerned. I
> > think the difficult and laborious part is to design the facilities to
> > make use of it.
>
> Right, I'm very interested in hearing what could be added there/what
> people want (bonus points if that is causing some performance issues
> today and we do not have the area covered and exposing that would fit
> in 32-bits ;) )
>

OK, so v3 is attached. Changes in v3:
- added proper RelFileNumber as wait_event_arg for DataFileRead/Write/etc
  waits instead of simply using "filedescriptor" as wait_event_arg
- cfbot complained hard on win32 because MSVC lacks support for uint64 enums
  ("warning C4309: 'initializing': truncation of constant value"), so I
  tried two ways of forcing the enum into 64-bit ints instead of the
  default (32-bit int). However, neither trick helps in the MSVC case:
    a) `typedef enum : uint64_t` causes "error C2332: 'enum': missing tag name"
    b) putting `PG_WAIT_ACTIVITY_MAX = 0xFFFFFFFFFFFFFFFFULL` at the end of
       the enum also doesn't work
  so I had to get rid of enum{} and stick to #defines to make cfbot happy there

- pass RelFileNumber/tablespaceId as wait_event_arg for recovery conflict waits
  (earlier you would get that information only from the log, but here we
  pinpoint the exact RelFileNumber for which startup is waiting). Use case
  demo: we run some long analytical query on the standby (while read/write
  pgbench is hitting the primary hard and we run without hot_standby_feedback):

    s1) "SELECT count(*) FROM pgbench_accounts a CROSS JOIN pgbench_accounts b;"

    s2) we can immediately query wait_event_arg and it shows recovery being
        stuck on the specific RelFileNumber:
      pid  | backend_type | type |        wait_event        | wait_event_arg
    -------+--------------+------+--------------------------+----------------
     68824 | startup      | IPC  | RecoveryConflictSnapshot |          16427

    postgres=# select relname from pg_class where relfilenode = 16427;
        relname
    ------------------
    pgbench_branches

    s1) after some time (max_standby_streaming_delay) we get:
        ERROR:  canceling statement due to conflict with recovery


- added description of wait_event_arg to wait event infrastructure
  (pg_wait_events view and docs)

- if there's high I/O on SLRU we can get data from pg_stat_slru, however
  previously one couldn't pinpoint which SLRU type affects which backend,
  so I thought I'd add the SLRU type to IO/Slru{Read,Write} as
  wait_event_arg to make it easier on multitenant DBs, e.g. it shows:

   pid  |         query                    | type | wait_event | wait_event_arg
  ------+----------------------------------+------+------------+----------------
  57400 | update locations set loc_name .. | IO   | SlruRead   |              5
  57605 | INSERT INTO users (loc_id, fna.. | IO   | SlruRead   |              6
  (2 rows)

  postgres=# select waiteventarg_description from pg_wait_events where name='SlruRead';
                                waiteventarg_description
  ---------------------------------------------------------------------------------------
  SlruType: unknown(0), [..] multixactoffset (5), multixactmembers(6), serializable(7)

  -- \d will show the FK (so we connect the dots with less ambiguity
  -- about FK <-> multixacts):
  postgres=# \d+ users
  [..]
  Foreign-key constraints:
      "fk1" FOREIGN KEY (loc_id) REFERENCES locations(loc_id)

  postgres=# \d+ locations
  [..]
  Referenced by:
      TABLE "users" CONSTRAINT "fk1" FOREIGN KEY (loc_id) REFERENCES locations(loc_id)
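
To make the #define approach and the SlruType argument above more concrete,
here is a minimal sketch (not the patch's actual code). It assumes a
hypothetical layout where the upper 32 bits hold the wait class and event and
the lower 32 bits hold wait_event_arg. All DEMO_/demo_ names and the constant
values are made up for illustration; only the SlruType numbers 0, 5, 6 and 7
come from the pg_wait_events output shown above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMPTION: hypothetical layout, upper 32 bits = wait class + event,
 * lower 32 bits = wait_event_arg. The real patch may lay this out differently.
 */
#define DEMO_WAIT_ARG_MASK   UINT64_C(0x00000000FFFFFFFF)

/*
 * Plain #defines with UINT64_C() instead of enum members: a macro carries an
 * explicit 64-bit type, so nothing depends on the compiler's choice of
 * underlying enum type. Values are illustrative only.
 */
#define DEMO_PG_WAIT_IO      UINT64_C(0x0A00000000000000)
#define DEMO_WAIT_SLRU_READ  (DEMO_PG_WAIT_IO | UINT64_C(0x0000000100000000))

/* Compile-time check that the event bits never overlap the argument bits. */
static_assert((DEMO_WAIT_SLRU_READ & DEMO_WAIT_ARG_MASK) == 0,
              "event id must not use the wait_event_arg bits");

/* Combine an event id with a 32-bit argument into one 64-bit value. */
static inline uint64_t
demo_wait_event_pack(uint64_t event, uint32_t arg)
{
    return (event & ~DEMO_WAIT_ARG_MASK) | (uint64_t) arg;
}

/* Extract the 32-bit argument. */
static inline uint32_t
demo_wait_event_arg(uint64_t info)
{
    return (uint32_t) (info & DEMO_WAIT_ARG_MASK);
}

/* Extract the class + event part, with the argument bits cleared. */
static inline uint64_t
demo_wait_event_id(uint64_t info)
{
    return info & ~DEMO_WAIT_ARG_MASK;
}

/*
 * Map a SlruType argument to a name. Only the values listed in the
 * description above are handled; everything else, including the values
 * elided there as [..], is reported as "unknown".
 */
static inline const char *
demo_slru_type_name(uint32_t arg)
{
    switch (arg)
    {
        case 5:
            return "multixactoffset";
        case 6:
            return "multixactmembers";
        case 7:
            return "serializable";
        default:
            return "unknown";
    }
}
```

With this layout, demo_wait_event_pack(DEMO_WAIT_SLRU_READ, 5) would be the
value published while waiting on a multixactoffset page, and a monitoring
query could split it back into event and argument with the two extractors.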


> > For example, if you encode an table OID in it, how do
> > you interpret that when you're looking at pg_stat_activity? A new
> > pg_explain_wait_event(bigint waitevent) that returns a text
> > representation of the event perhaps?
>
> Well I was thinking initially[..irrelevant, so snipped out]

Right, so v3 has a built-in self-description of wait_event_arg in
pg_wait_events (and the docs contain these details too).

[..]

> > Inevitably, the extra 32 bits won't be enough to expose everything that
> > you might want to expose. Should we already think about what to do then?
>
> Well I wanted to stick to exposing only stuff that will _always_ fit
> 32-bits. If additional/more detailed instrumentation would be
> necessary then separate monitoring/observability/variables/subsystem
> probably should be built for that specific use case. So if that
> information can become over 32-bit, it should not be encoded into
> wait_event_arg, just to avoid debating performance regressions for any
> other additional wait-event infrastructure. I simply do not want to
> open a can of worms: see Bertrand tried that in [1], but I don't want
> this $thread to follow that route where Andres and Robert expressed
> their concerns earlier.  E.g. one of the key questions is that I'm
> somehow lost if we would like to continue the earlier 56-bit [2] /
> 64-bit OID/RelFileNode attempt(s). If the project wants to continue
> with that, then probably we couldn't express ::relation id as 32-bit
> wait_event_arg or maybe I am missing something. (ofc, we could hash
> potential 64-bit OID back into 32-bit OID one day, but it sounds like
> a hack, doesn't it?)
>
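
The 32-bit constraint discussed in the quoted text can be sketched as follows.
This is only an illustration under stated assumptions: the demo_ names are
hypothetical, and the XOR fold merely stands in for "some hash" to show why
squeezing a wider identifier into 32 bits loses information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Width of the relfilenumber proposed in the earlier (reverted) attempt. */
#define DEMO_WIDE_RELFILENUMBER_BITS 56

static_assert(DEMO_WIDE_RELFILENUMBER_BITS > 32,
              "a 56-bit identifier cannot be stored losslessly in 32 bits");

/* True if the identifier can be exposed as a 32-bit wait_event_arg as-is. */
static inline bool
demo_fits_in_wait_arg(uint64_t id)
{
    return id <= UINT32_MAX;
}

/*
 * The "hack": fold a 64-bit identifier into 32 bits by XOR-ing both halves.
 * Distinct identifiers can map to the same result, so the argument no longer
 * pinpoints a single relation.
 */
static inline uint32_t
demo_fold_to_wait_arg(uint64_t id)
{
    return (uint32_t) (id ^ (id >> 32));
}
```

For example, 16427 and ((1 << 32) | 16426) fold to the same 32-bit value, so a
lookup by the folded argument would be ambiguous.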

Questions:

1. Question about the 56-bit relfilenode idea [1] (05d4cbf9b6ba, reverted by
   a448e49bcbe): can I assume that it is dead in the water, and that a
   RelFileNode wider than 32 bits is not going to happen?
   (if my 64-bit wait_event uses its 32-bit wait_event_arg to carry a
   RelFileNode, a wider RelFileNode would make it incompatible)

2. Please ignore the quality of 0002 (multixact), but I would be grateful for
   feedback on whether extending the MultiXact routines like this (to carry
   RelFileNumber) is OK or not. And if not, what would be a better way to
   pass such information through?

-J.

[1] - https://www.postgresql.org/message-id/CA+TgmobM5FN5x0u3tSpoNvk_TZPFCdbcHxsXCoY1ytn1dXROvg@mail.gmail.com
