Re: retry shm attach for windows (WAS: Re: [HACKERS] OK, so culicidaeis *still* broken) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: retry shm attach for windows (WAS: Re: [HACKERS] OK, so culicidaeis *still* broken)
Date
Msg-id CAA4eK1+moFp1=nVCrJYdnoNM7ZHjzyhHNUB+g3Ha0KPBA=H63w@mail.gmail.com
Whole thread Raw
In response to Re: retry shm attach for windows (WAS: Re: [HACKERS] OK, so culicidaeis *still* broken)  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Wed, May 24, 2017 at 6:59 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, May 23, 2017 at 8:14 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> So it seems both you and Tom are leaning towards some sort of retry
>> mechanism for shm reattach on Windows.  I also think that is a viable
>> option to negate the impact of ASLR.  Attached patch does that.  Note
>> that, as I have mentioned above I think we need to do it for shm
>> reserve operation as well.  I think we need to decide how many retries
>> are sufficient before bailing out.  As of now, I have used 10 to have
>> some similarity with PGSharedMemoryCreate(), but we can choose some
>> different count as well.  One might say that we can have "number of
>> retries" as a guc parameter, but I am not sure about it, so not used.
>
> New GUCs can be backpatched if necessary, though this does not seem
> necessary. Who is going to set up that anyway if we have a limit high
> enough. 10 looks like a sufficient number to me.
>

Okay.

>> Another point to consider is that do we want the same retry mechanism
>> for EXEC_BACKEND builds (function
>> PGSharedMemoryReAttach is different on Windows and EXEC_BACKEND
>> builds).  I think it makes sense to have retry mechanism for
>> EXEC_BACKEND builds, so done that way in the patch.  Yet another point
>> which needs some thought is for reattach operation, before retrying do
>> we want to reserve the shm by using VirtualAllocEx?
>
> -       elog(FATAL, "could not reattach to shared memory (key=%p,
> addr=%p): error code %lu",
> +   {
> +       elog(LOG, "could not reattach to shared memory (key=%p,
> addr=%p): error code %lu",
>              UsedShmemSegID, UsedShmemSegAddr, GetLastError());
> +       return false;
> +   }
> This should be a WARNING, with the attempt number reported as well?
>

I think for retry we just want to log, why you want to display it as a
warning?  During startup, other similar places (where we continue
startup even after the call has failed) also use LOG (refer
PGSharedMemoryDetach), so why do differently here?   However, I think
adding retry_count should be okay.

> -void
> -PGSharedMemoryReAttach(void)
> +bool
> +PGSharedMemoryReAttach(int retry_count)
> I think that the loop logic should be kept within
> PGSharedMemoryReAttach, this makes the code of postmaster.c cleaner,
>

Sure, we can do that, but then we need to repeat the same looping
logic in both sysv and win32 case.  Now, if decide not to do for the
sysv case, then it might make sense to consider it doing it in
function PGSharedMemoryReAttach().

> and it seems to me that each step of PGSharedMemoryReAttach() should
> be retried in order. Do we need also to worry about SysV? I agree with
> you that having consistency is better, but I don't recall seeing
> failures or complains related to cygwin for ASLR.
>

I am also not aware of Cygwin failures, but I think keeping the code
same for the cases where we are not using fork mechanism seems like an
advisable approach.  Also, if someone is testing EXEC_BACKEND on Linux
then randomization is 'on' by default, so one can hit this issue
during tests which doesn't matter much, but it still seems better to
have retry logic to prevent the issue.

> I think that you are forgetting PGSharedMemoryCreate in the retry process.
>

No, we don't need retry for PGSharedMemoryCreate as we need this only
we are trying to attach to some pre-reserved shared memory.  Do you
have something else in mind?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] pg_upgrade translation
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] unreachable code in partition.c