Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I am seeing Postgres 8.3.7 running as a service on Windows Server 2003
>> repeatedly fail to restart after a backend crash because of the
>> following code in port/win32_shmem.c:
>
> On further review, I see an entirely different explanation for possible
> failures of that code.
>
> It says here:
> http://msdn.microsoft.com/en-us/library/ms885627.aspx
FWIW, this is the Windows CE documentation. The one for win32 is at:
http://msdn.microsoft.com/en-us/library/ms679360(VS.85).aspx
> that GetLastError() continues to return the same error code until
> someone calls SetLastError() to change it. It further says that
> only a few operating system functions call SetLastError(0) on success,
> and that it is explicitly documented whenever a function does so.
> I see no such statement for CreateFileMapping:
> http://msdn.microsoft.com/en-us/library/aa366537(VS.85).aspx
>
> This leads me to conclude that after a successful creation,
> GetLastError will return whatever the errno previously was,
> meaning that you cannot reliably distinguish creation from non
> creation unless you do SetLastError(0) beforehand. Which we don't.
>
> Now this would only explain problems if there were some code path
> through the postmaster that could leave the errno set to
> ERROR_ALREADY_EXISTS (a/k/a EEXIST) when this code is reached. I'm not
> sure there is one, and I have even less of a theory as to why system
> load might make it more probable to happen. Still, this looks like a
> bug from here, and repeating the create call won't fix it.
The ref page for CreateFileMapping you linked has:
"If the object exists before the function call, the function returns a
handle to the existing object (with its current size, not the specified
size), and GetLastError returns ERROR_ALREADY_EXISTS. "
I think that qualifies as documenting that it sets the last-error code,
no? That could never work unless the code is set to something other than
ERROR_ALREADY_EXISTS (probably zero) when the object *didn't* already exist.
The quick test would be to stick a SetLastError(0) in right before the
CreateFileMapping() call, just to be sure... Could be worth a try?
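To make the failure mode concrete, here is a rough, portable toy model of
it. This is not the Win32 API and not the code in port/win32_shmem.c: every
name below (model_set_last_error, model_create_mapping, and so on) is made
up for illustration, and the model assumes the pessimistic case where a
successful create leaves the last-error code untouched. Whether the real
CreateFileMapping behaves that way is exactly the open question.

```c
#include <stdbool.h>

/*
 * Toy model only. All names are hypothetical; nothing here calls Windows.
 * 183 is the numeric value of ERROR_ALREADY_EXISTS on Windows.
 */
#define MODEL_ERROR_ALREADY_EXISTS 183UL

static unsigned long model_last_error = 0;

static void
model_set_last_error(unsigned long code)
{
    model_last_error = code;
}

static unsigned long
model_get_last_error(void)
{
    return model_last_error;
}

/*
 * Pretend create call. Assumption: it sets the error code only when the
 * object already existed, and leaves it alone on a fresh creation.
 * Always returns a nonzero pseudo-handle.
 */
static int
model_create_mapping(bool object_exists)
{
    if (object_exists)
        model_set_last_error(MODEL_ERROR_ALREADY_EXISTS);
    return 1;
}

/* Create, then inspect the error code, without clearing it first. */
static bool
attached_to_existing_unguarded(bool object_exists)
{
    (void) model_create_mapping(object_exists);
    return model_get_last_error() == MODEL_ERROR_ALREADY_EXISTS;
}

/* Same check, but clear the error code first so a stale value can't leak. */
static bool
attached_to_existing_guarded(bool object_exists)
{
    model_set_last_error(0);
    (void) model_create_mapping(object_exists);
    return model_get_last_error() == MODEL_ERROR_ALREADY_EXISTS;
}
```

In this model, if something earlier left the error code at
ERROR_ALREADY_EXISTS, the unguarded variant reports "already exists" even
for a brand new object, while the guarded one does not. If the real call
does reset the code on success, the extra clear is harmless.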
Andrew, just to confirm: you've found a case where this happens
*repeatably*? That's what we've failed to do before - it's happened now
and then, but never during testing...
//Magnus