Re: 64-bit wait_event and introduction of 32-bit wait_event_arg - Mailing list pgsql-hackers

From Jakub Wartak
Subject Re: 64-bit wait_event and introduction of 32-bit wait_event_arg
Msg-id CAKZiRmxw1KwEPJZk8equXFyFweSt_X9hH59RdSAzpNROGEKG=w@mail.gmail.com
In response to Re: 64-bit wait_event and introduction of 32-bit wait_event_arg  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
Responses Re: 64-bit wait_event and introduction of 32-bit wait_event_arg
List pgsql-hackers
On Wed, Jan 14, 2026 at 9:56 AM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
>
> On Wed, Jan 14, 2026 at 9:38 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Fri, Jan 09, 2026 at 11:34:09AM +0100, Jakub Wartak wrote:
> > > On Tue, Dec 9, 2025 at 10:11 AM Jakub Wartak
> > > <jakub.wartak@enterprisedb.com> wrote:
> > > >
> > > > Hi Heikki, thanks for having a look!
> > > >
> > > > On Mon, Dec 8, 2025 at 11:12 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > > > >
> > > > > On 08/12/2025 11:54, Jakub Wartak wrote:
> > > > > > While thinking about cons, the only cons that I could think of is that
> > > > > > when we would be exposing something as 32-bits , then if the following
> > > > > > major release changes some internal structure/data type to be a bit
> > > > > > more heavy, it couldn't be exposed anymore like that  (think of e.g.
> > > > > > 64-bit OIDs?)
> > > > > >
> > > > > > Any help, opinions, ideas and code/co-authors are more than welcome.
> > > >
> > > > > Expanding it to 64 bit seems fine as far as performance is concerned. I
> > > > > think the difficult and laborious part is to design the facilities to
> > > > > make use of it.
> > > >
> > > > Right, I'm very interested in hearing what could be added there/what
> > > > people want (bonus points if that is causing some performance issues
> > > > today and we do not have the area covered and exposing that would fit
> > > > in 32-bits ;) )
> > > >
> > >
> > > OK, so v3 is attached. Changes in v3:
> >
> > Thanks for the new version!
> >
> > It looks like that it needs a rebase. Also, FWIW, a quick scan shows a few
> > numbers of "XXX" and elog calls commented out (that are probably used during
> > your own debugging?).
>
> Yes, indeed, that's intentional right now - it's more like a draft
> rather than something that should be polished.
>
> To be honest I would like to avoid sinking more time on it, if the
> sole idea gets shot down or there is opposition due e.g. to concerns
> of exposing 32-bit relfilenodes that way (see that  56-bit relfilenode
> idea).

Good afternoon gentlemen,

I was considering marking this as Rejected/RwF and giving up, because
RelFileNodes could become wider than 32 bits, which kind of goes against
the main intention of this patch (showing the relations involved
in some complex LWLock/MultiXact performance scenarios).

In offline discussions with Andres and Robert I've learned that:
1. there's still a chance that RelFileNodes could become 56 bits wide one day
2. introducing another uint64 just for wait_event_arg is a no-go
   due to performance concerns.
3. exposing something like "relfilenode % (2^32)" is seen as a hack and could
   cause issues (problems with interpretation/conflicts in the future when
   RelFileNode gets bigger)
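
To illustrate point 3, here is a minimal sketch (not code from the patchset) of why a modulo-2^32 projection becomes ambiguous once relfilenodes are wider than 32 bits. The helper names truncate_relfilenode() and truncation_collides() are hypothetical, and the 56-bit width is only the possibility mentioned above, not a decided format.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: keep only the low 32 bits of a wide relfilenode,
 * i.e. compute relfilenode % 2^32. The 56-bit upper bound is an assumption
 * taken from the "56-bit relfilenode" idea, not an existing format.
 */
static uint32_t
truncate_relfilenode(uint64_t relfilenode)
{
    assert(relfilenode < (UINT64_C(1) << 56));
    return (uint32_t) (relfilenode & UINT64_C(0xFFFFFFFF));
}

/*
 * Returns 1 when two distinct wide relfilenodes would be reported as the
 * same 32-bit value, which is the interpretation problem from point 3.
 */
static int
truncation_collides(uint64_t a, uint64_t b)
{
    return a != b && truncate_relfilenode(a) == truncate_relfilenode(b);
}
```

Any two values that differ only above bit 31 collide, so a reader of the 32-bit value could not tell which relation was meant.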

Anyway, today this WIP/PoC patchset gives:

postgres=# select type, substring(name, 1, 20) wait,
substring(waiteventarg_description,1,43) as desc from pg_get_wait_events()
where waiteventarg_description != '';
  type   |         wait         |                    desc
---------+----------------------+---------------------------------------------
 Buffer  | BufferCleanup        | Buffer# or UINT32_MAX for local(temporary)..
 Buffer  | BufferExclusive      | Buffer# or UINT32_MAX for local(temporary)..
 Buffer  | BufferShared         | Buffer# or UINT32_MAX for local(temporary)..
 Buffer  | BufferShareExclusive | Buffer# or UINT32_MAX for local(temporary)..
 IO      | SlruFlushSync        | SlruType: unknown(0), notify(1), clog(2), ..
 IO      | SlruRead             | SlruType: unknown(0), notify(1), clog(2), ..
 IO      | SlruSync             | SlruType: unknown(0), notify(1), clog(2), ..
 IO      | SlruWrite            | SlruType: unknown(0), notify(1), clog(2), ..
 IPC     | BufferIo             | Buffer# or UINT32_MAX for local(temporary)
 IPC     | RecoveryConflictTabl | tablespace Oid causing conflict.
 IPC     | SyncRep              | PID of the slowest walsender.
 Timeout | PgSleep              | how many seconds to sleep for.
 Timeout | SpinDelay            | Number of spinlock delays.

Summary of changes since previous version:

- Removed all relfilenode references, including
    ProcSleep()->WaitLatch(..PG_WAIT_LOCK | locktag_field2 );
  as we cannot take locktag_field2 (which maps to reloid, set by
  SET_LOCKTAG_RELATION)

- In pgstat_report_wait_end(), replaced the direct volatile store of zero
  with the more proper pg_atomic_write_u64(..,0);

- separated out the patch for SyncRepWaitForLSN(), as I have plenty of performance
  concerns there (with an abnormally high max_wal_senders). Today there is a full
  scan under spinlocks every time the latch is reset; I could reduce this so that
  the spinlocks are taken no more often than every N iterations, but how often
  should the scan run then?

- added exposing Buffer# (one can look up the relation via pg_buffercache),
  idea by Andres; it seems to work (simulated by fetching from a cursor):

    pid   |  type  |  wait_event   | wait_event_arg | state  | query
  --------+--------+---------------+----------------+--------+------------------
   250556 | Buffer | BufferCleanup |            225 | active | VACUUM (FREEZE)..

   postgres=# select
        pg_filenode_relation(0, relfilenode)::regclass,
        pinning_backends
    from pg_buffercache where bufferid = 225;

   pg_filenode_relation | pinning_backends
  ----------------------+-----------------
   pin_test             |                2

- added exposing Timeout/SpinDelay, not sure if that would be helpful
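
For the SyncRepWaitForLSN() item above, the "every N iterations" idea could look roughly like the sketch below. This is not code from the patch: SlowestSenderCache, slowest_sender_pid(), RESCAN_EVERY_N and fake_full_scan() are all hypothetical names, and the full scan under spinlocks is replaced by a stub that only counts how often it is called.

```c
#include <assert.h>
#include <stdint.h>

#define RESCAN_EVERY_N 8        /* hypothetical value; the right N is the open question */

/* Hypothetical per-waiter cache of the last scan result. */
typedef struct SlowestSenderCache
{
    uint32_t    iterations;     /* wait-loop iterations seen so far */
    int32_t     cached_pid;     /* PID found by the most recent full scan */
} SlowestSenderCache;

typedef int32_t (*full_scan_fn) (void);

/* Stub standing in for the expensive scan over all walsenders. */
static int  scan_calls = 0;

static int32_t
fake_full_scan(void)
{
    scan_calls++;
    return 4242;                /* made-up PID, for illustration only */
}

/*
 * Return the PID to report as wait_event_arg, running the full scan only
 * on iterations 0, N, 2N, ... and reusing the cached value otherwise.
 */
static int32_t
slowest_sender_pid(SlowestSenderCache *cache, full_scan_fn full_scan)
{
    if (cache->iterations % RESCAN_EVERY_N == 0)
        cache->cached_pid = full_scan();
    cache->iterations++;
    return cache->cached_pid;
}
```

The trade-off is staleness: between rescans the reported PID may no longer be the slowest walsender, so a larger N means fewer spinlock acquisitions but a less accurate argument.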

What's left:
- Earlier Heikki raised the question "Wait events can be defined in extensions;
  how does an extension plug into this facility?" - that's still unanswered.
  I think they could just OR in a 32-bit value themselves, but maybe we could
  just provide a way to plug into pg_get_wait_events().waiteventarg_description?
- docs
- of course it could be extended with some reporting if one finds further
  ideas
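
As a rough illustration of "OR in a 32-bit value themselves", the sketch below packs an event ID and an argument into one 64-bit word and clears it with a single atomic store. It is not code from the patchset: the layout (event in the upper 32 bits, argument in the lower 32) is an assumption, C11 <stdatomic.h> stands in for pg_atomic_write_u64(), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the per-backend 64-bit wait event slot. */
static _Atomic uint64_t wait_event_info;

/* Assumed layout: event ID in the upper 32 bits, argument in the lower 32. */
static uint64_t
pack_wait_event(uint32_t event, uint32_t arg)
{
    return ((uint64_t) event << 32) | (uint64_t) arg;
}

/* What a caller (core or extension) would do when starting to wait. */
static void
report_wait_start(uint32_t event, uint32_t arg)
{
    atomic_store_explicit(&wait_event_info, pack_wait_event(event, arg),
                          memory_order_relaxed);
}

/* One atomic store of zero clears event and argument together. */
static void
report_wait_end(void)
{
    atomic_store_explicit(&wait_event_info, 0, memory_order_relaxed);
}

/* Reader side: split the word back into its two halves. */
static uint32_t
current_wait_event(void)
{
    return (uint32_t) (atomic_load_explicit(&wait_event_info,
                                            memory_order_relaxed) >> 32);
}

static uint32_t
current_wait_event_arg(void)
{
    return (uint32_t) (atomic_load_explicit(&wait_event_info,
                                            memory_order_relaxed)
                       & UINT64_C(0xFFFFFFFF));
}
```

Because both halves live in one word, a reader never sees an event paired with an argument from a different wait, which is the property a separate uint64 for the argument would not give for free.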

-J.

