Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)
Msg-id 2cc94e71-af66-e510-5c51-cb254f260264@iki.fi
In response to PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On 17/09/2020 12:48, Thomas Munro wrote:
> Hello,
> 
> In commits 9f095299 and f98b8476 we improved recovery performance on
> Linux and FreeBSD but we didn't help other operating systems.  David
> just confirmed for me that commenting out the PostmasterIsAlive() call
> in the main recovery loop speeds up crash recovery considerably on his
> Windows system: 93s -> 70s or 1.32x faster.

Nice speedup!

> So I think we should do
> something like what Heikki originally proposed to lower the frequency
> of checks, on systems where we don't have the ability to skip the
> check completely.  Please see attached.

If you put the counter in HandleStartupProcInterrupts(), it could take a
long time to notice that the postmaster has died when the startup process
is, for example, waiting for WAL to arrive in the loop in
WaitForWALToBecomeAvailable(), or paused in recoveryPausesHere(). Those
loops call it far less often than the redo loop does. My original patch
only reduced the frequency in the WAL redo loop, where you're actively
replaying records.
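
To illustrate the distinction, here is a minimal standalone sketch, not
the actual patch. It assumes a hypothetical interval of 1024 records and
uses a stub in place of PostmasterIsAlive(); all names other than those
mentioned above are made up for the example. The counter is applied only
on the per-record redo path, while the wait paths check every time.

```c
#include <stdbool.h>

/* Hypothetical interval: do the real check once per this many records. */
#define POSTMASTER_POLL_INTERVAL 1024

/* Counts how many "expensive" liveness checks were made (illustration only). */
static unsigned long liveness_checks = 0;

/* Calls seen so far on the redo path. */
static unsigned int redo_calls = 0;

/*
 * Stand-in for PostmasterIsAlive(). On builds without a postmaster death
 * signal the real function needs a system call, which is the cost being
 * avoided. This stub always reports the postmaster as alive.
 */
static bool
stub_postmaster_is_alive(void)
{
    liveness_checks++;
    return true;
}

/*
 * Called once per replayed record in the redo loop. Only every
 * POSTMASTER_POLL_INTERVAL-th call performs the liveness check.
 */
static bool
redo_loop_check_postmaster(void)
{
    if (++redo_calls % POSTMASTER_POLL_INTERVAL != 0)
        return true;            /* skip the check this time */
    return stub_postmaster_is_alive();
}

/*
 * Called from wait loops (waiting for WAL, recovery paused). These run
 * rarely, so they check on every call rather than counting.
 */
static bool
wait_loop_check_postmaster(void)
{
    return stub_postmaster_is_alive();
}

/* Replays nrecords and returns how many liveness checks that cost. */
static unsigned long
simulate_replay(unsigned long nrecords)
{
    unsigned long before = liveness_checks;

    for (unsigned long i = 0; i < nrecords; i++)
    {
        if (!redo_loop_check_postmaster())
            break;              /* postmaster gone: stop replaying */
    }
    return liveness_checks - before;
}
```

With this split, replaying records pays for the check only occasionally,
but a startup process sitting in a wait loop still notices postmaster
death on its next wakeup.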

We could probably do better on Windows. Maybe the signal handler thread 
could wait on the PostmasterHandle at the same time that it waits on the 
signal pipe, and set postmaster_possibly_dead. But I'm not going to work 
on that, and it would only help on Windows, so I'm OK with just adding 
the counter.

- Heikki


