Home > mailing lists

Re: Latch implementation that wakes on postmaster death on both win32 and Unix - Mailing list pgsql-hackers

From	Florian Pflug
Subject	Re: Latch implementation that wakes on postmaster death on both win32 and Unix
Date	July 8, 2011 14:58:19
Msg-id	1D4E5807-8E49-4441-9E80-210E71A91501@phlo.org Whole thread Raw
In response to	Re: Latch implementation that wakes on postmaster death on both win32 and Unix (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses	Re: Latch implementation that wakes on postmaster death on both win32 and Unix
List	pgsql-hackers

Tree view

On Jul8, 2011, at 14:40 , Heikki Linnakangas wrote:
> On 08.07.2011 13:58, Florian Pflug wrote:
>> On Jul8, 2011, at 11:57 , Peter Geoghegan wrote:
>>> On 7 July 2011 19:15, Robert Haas<robertmhaas@gmail.com>  wrote:
>>>>> I'm not concerned about the possibility of spurious extra cycles of
>>>>> auxiliary process event loops - should I be?
>>>>
>>>> A tight loop would be bad, but an occasional spurious wake-up seems harmless.
>>>
>>> We should also assert !PostmasterIsAlive() from within the latch code
>>> after waking due to apparent Postmaster death. The reason that I don't
>>> want to follow Florian's suggestion to check it in production is that
>>> I don't know what to do if the postmaster turns out to be alive. Why
>>> is it more reasonable to try again than to just return?
>>
>> I'd say return, but don't indicate postmaster death in the return value
>> if PostmasterIsAlive() returns true. Or don't call PostmasterIsAlive() in
>> WaitLatch(), and return indicating postmaster death whenever select()
>> says so, and put the burden of re-checking on the callers.
>
> I put the burden on the callers. Removing the return value from WaitLatch() altogether just makes life unnecessarily
difficultfor callers that could safely use that information, even if you sometimes get spurious wakeups. In particular,
thecoding in pgarch.c is nicer if you can simply check the return code for WL_TIMEOUT, rather than call time(NULL) to
figureout if the timeout was reached. 
>
> Attached is a new version of this patch. PostmasterIsAlive() now uses read() on the pipe instead of kill().

I did notice a few (very minor) loose ends...

SyncRepWaitForLSN() says /*  * Wait on latch for up to 60 seconds. This allows us to check for  * postmaster death
regularlywhile waiting. Note that timeout here  * does not necessarily release from loop.  */
WaitLatch(&MyProc->waitLatch,60000000L); 

I guess that 60-second timeout is unnecessary now that we'll wake up
on postmaster death anyway.

Also, none of the callers of WaitLatch() seems to actually inspect
the WL_POSTMASTER_DEATH bit of the result. We might want to make
their !PostmasterIsAlive() check conditional on the WL_POSTMASTER_DEATH
bit being set. At least in the case of SyncRepWaitForLSN(), it seems
that avoiding the extra read() syscall might be beneficial.

Maybe these cleanups would better be done in a separate patch, though...

best regards,
Florian Pflug

pgsql-hackers by date:

From: Heikki Linnakangas
Date: 08 July 2011, 14:57:18
Subject: Re: Re: [COMMITTERS] pgsql: Adjust OLDSERXID_MAX_PAGE based on BLCKSZ.

From: "Kevin Grittner"
Date: 08 July 2011, 15:17:06
Subject: Re: Re: [COMMITTERS] pgsql: Adjust OLDSERXID_MAX_PAGE based on BLCKSZ.

Re: Latch implementation that wakes on postmaster death on both win32 and Unix - Mailing list pgsql-hackers

Previous

Next