Re: Can a child process detect postmaster death when in pg_usleep? - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Can a child process detect postmaster death when in pg_usleep?
Date
Msg-id CALj2ACX=pWymzbHOV2RHLWAzZ_a8Pv894Nj-miM4GNwv25WHvg@mail.gmail.com
Whole thread Raw
In response to Re: Can a child process detect postmaster death when in pg_usleep?  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Can a child process detect postmaster death when in pg_usleep?
List pgsql-hackers
On Thu, Apr 15, 2021 at 5:28 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Thanks a lot for the detailed explanation.

> On Thu, Apr 15, 2021 at 2:06 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > 1) Is it really harmful to use pg_usleep in a postmaster child process
> > as it doesn't let the child process detect postmaster death?
>
> Yeah, that's a bad idea.  Any long-term waiting (including short waits
> in a loop) should ideally be done with the latch infrastructure.

Agree. Along with short waits in a loop, I think we also should
replace pg_usleep with WaitLatch that has a user configurable
parameter like below:

pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);
pg_usleep(PostAuthDelay * 1000000L);
pg_usleep(CommitDelay);

> 4) Are there any places where we need to replace pg_usleep with
> > WaitLatch/equivalent of pg_sleep to detect the postmaster death
> > properly?
>
> We definitely have replaced a lot of sleeps with latch.c primitives
> over the past few years, since we got WL_EXIT_ON_PM_DEATH and
> condition variables.  There may be many more to improve...  You
> mentioned autovacuum... yeah, Stephen fixed one of these with commit
> 4753ef37, but yeah it's not great to have those others in there...

I have not looked at the commit 4753ef37 previously, but it
essentially addresses the problem with pg_usleep for vacuum delay. I'm
thinking we can also replace pg_usleep in below places based on the
fact that pg_usleep should be avoided in 1) short waits in a loop 2)
when wait time is dependent on user configurable parameters. And using
WaitLatch may require us to add wait event types to WaitEventTimeout
enum, but that's okay.

1) pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L); in lazy_truncate_heap
2) pg_usleep(CommitDelay); in XLogFlush
3) pg_usleep(10000L); in CreateCheckPoint
4) pg_usleep(1000000L); in do_pg_stop_backup
5) pg_usleep(1000L); in read_local_xlog_page
6) pg_usleep(PostAuthDelay * 1000000L); in AutoVacLauncherMain,
AutoVacWorkerMain, StartBackgroundWorker, InitPostgres
7) pg_usleep(100000L); in RequestCheckpoint
8) pg_usleep(1000000L); in pgarch_ArchiverCopyLoop
9) pg_usleep(PGSTAT_RETRY_DELAY * 1000L); in backend_read_statsfile
10) pg_usleep(PreAuthDelay * 1000000L); in BackendInitialize
11) pg_usleep(10000L); in WalSndWaitStopping
12) pg_usleep(standbyWait_us); in WaitExceedsMaxStandbyDelay
13) pg_usleep(10000L); in RegisterSyncRequest

I'm sure we won't be changing in all of the above places. It will be
good to review and correct the above list.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: File truncation within PostgresNode::issues_sql_like() wrong on Windows
Next
From: Amit Kapila
Date:
Subject: Re: Replication slot stats misgivings