Re: Latch implementation that wakes on postmaster death on both win32 and Unix - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: Latch implementation that wakes on postmaster death on both win32 and Unix
Date
Msg-id 1D4E5807-8E49-4441-9E80-210E71A91501@phlo.org
Whole thread Raw
In response to Re: Latch implementation that wakes on postmaster death on both win32 and Unix  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Latch implementation that wakes on postmaster death on both win32 and Unix
List pgsql-hackers
On Jul8, 2011, at 14:40 , Heikki Linnakangas wrote:
> On 08.07.2011 13:58, Florian Pflug wrote:
>> On Jul8, 2011, at 11:57 , Peter Geoghegan wrote:
>>> On 7 July 2011 19:15, Robert Haas<robertmhaas@gmail.com>  wrote:
>>>>> I'm not concerned about the possibility of spurious extra cycles of
>>>>> auxiliary process event loops - should I be?
>>>>
>>>> A tight loop would be bad, but an occasional spurious wake-up seems harmless.
>>>
>>> We should also assert !PostmasterIsAlive() from within the latch code
>>> after waking due to apparent Postmaster death. The reason that I don't
>>> want to follow Florian's suggestion to check it in production is that
>>> I don't know what to do if the postmaster turns out to be alive. Why
>>> is it more reasonable to try again than to just return?
>>
>> I'd say return, but don't indicate postmaster death in the return value
>> if PostmasterIsAlive() returns true. Or don't call PostmasterIsAlive() in
>> WaitLatch(), and return indicating postmaster death whenever select()
>> says so, and put the burden of re-checking on the callers.
>
> I put the burden on the callers. Removing the return value from WaitLatch() altogether just makes life unnecessarily
difficultfor callers that could safely use that information, even if you sometimes get spurious wakeups. In particular,
thecoding in pgarch.c is nicer if you can simply check the return code for WL_TIMEOUT, rather than call time(NULL) to
figureout if the timeout was reached. 
>
> Attached is a new version of this patch. PostmasterIsAlive() now uses read() on the pipe instead of kill().

I did notice a few (very minor) loose ends...

SyncRepWaitForLSN() says /*  * Wait on latch for up to 60 seconds. This allows us to check for  * postmaster death
regularlywhile waiting. Note that timeout here  * does not necessarily release from loop.  */
WaitLatch(&MyProc->waitLatch,60000000L); 

I guess that 60-second timeout is unnecessary now that we'll wake up
on postmaster death anyway.

Also, none of the callers of WaitLatch() seems to actually inspect
the WL_POSTMASTER_DEATH bit of the result. We might want to make
their !PostmasterIsAlive() check conditional on the WL_POSTMASTER_DEATH
bit being set. At least in the case of SyncRepWaitForLSN(), it seems
that avoiding the extra read() syscall might be beneficial.

Maybe these cleanups would better be done in a separate patch, though...

best regards,
Florian Pflug



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Re: [COMMITTERS] pgsql: Adjust OLDSERXID_MAX_PAGE based on BLCKSZ.
Next
From: "Kevin Grittner"
Date:
Subject: Re: Re: [COMMITTERS] pgsql: Adjust OLDSERXID_MAX_PAGE based on BLCKSZ.