Thread: [HACKERS] [bug fix] postgres.exe crashes with access violation on Windowswhile starting up

[HACKERS] [bug fix] postgres.exe crashes with access violation on Windowswhile starting up

From
"Tsunakawa, Takayuki"
Date:
Hello,

We encountered a rare and hard-to-investigate problem on Windows, which one of our customers reported.  Please find the
attachedpatch to fix that.  I'll add this to the next CF.
 


PROBLEM
==============================

PostgreSQL sometimes crashes with the following messages.  This is infrequent (but frequent for the customer); it
occurredabout 10 times in the past 5 months.
 

LOG:  server process (PID 2712) was terminated by exception 0xC0000005
HINT:  See C include file "ntstatus.h" for a description of the hexadecimal value.
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because
anotherserver process exited abnormally and possibly corrupted shared memory.
 
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
LOG:  all server processes terminated; reinitializing

"server process" shows that an client backend crashed.  The above messages indicate that the process was not running an
SQLcommand.
 

PostgreSQL runs as a Windows service.

No crash dump was produced anywhere, despite the facts:
- <PGDATA>/crashdumps folder exists and is writable by the PostgreSQL user account (which is the user postgres.exe runs
as)
- The Windows registry configuration allows dumping the crash dump


CAUSE
==============================

We believe WSAStartup() in main.c failed.  The only conceivable error is:

WSAEPROCLIM
10067
Too many processes.
A Windows Sockets implementation may have a limit on the number of applications that can use it simultaneously.
WSAStartupmay fail with this error if the limit has been reached.
 

But I couldn't find what the limit is and whether we can tune it.  We couldn't reproduce the problem.

When I pretend that WSAStartup() failed while a client backend is starting up, I could see the same phenomenon as the
customer. This problem only occurs when PostgreSQL runs as a Windows service.
 

The bug is in write_eventlog().  It calls pgwin32_message_to_utf16() which in turn calls palloc(), which requires the
memorymanagement system to be set up (CurrentMemoryContext != NULL).
 


FIX
==============================

Add the check "CurrentMemoryContext != NULL" in write_eventlog() as in write_console().


NOTE
==============================

The reason is for not outputing the crash dump is a) the crash occurred before installing the Windows exception handler
(pgwin32_install_crashdump_handler()call) and b) the effect of the following call in postmaster is inherited in the
childprocess.
 

        /* In case of general protection fault, don't show GUI popup box */
        SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);

But I'm not sure in what order we should do pgwin32_install_crashdump_handler(), startup_hacks() and steps therein,
MemoryContextInit(). I think that's another patch.
 

Regards
Takayuki Tsunakawa



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tsunakawa,
> Takayuki
> The reason is for not outputing the crash dump is a) the crash occurred
> before installing the Windows exception handler
> (pgwin32_install_crashdump_handler() call) and b) the effect of the
> following call in postmaster is inherited in the child process.
> 
>         /* In case of general protection fault, don't show GUI popup
> box */
>         SetErrorMode(SEM_FAILCRITICALERRORS |
> SEM_NOGPFAULTERRORBOX);
> 
> But I'm not sure in what order we should do
> pgwin32_install_crashdump_handler(), startup_hacks() and steps therein,
> MemoryContextInit().  I think that's another patch.

Just installing the handler at the beginning of main() seems fine.  Patch attached.

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
On Thu, Oct 26, 2017 at 7:10 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> FIX
> ==============================
>
> Add the check "CurrentMemoryContext != NULL" in write_eventlog() as in write_console().
        * Also verify that we are not on our way into error recursion
trouble due        * to error messages thrown deep inside pgwin32_message_to_UTF16().        */       if
(!in_error_recursion_trouble()&&
 
+               CurrentMemoryContext != NULL &&               GetMessageEncoding() != GetACPEncoding())       {
So you are basically ready to lose any message that could be pushed
here if there is no memory context? That does not sound like a good
trade-off to me. A static buffer does not look like the best idea
either to not truncate message, so couldn't we envisage to just use
malloc? pgwin32_message_to_UTF16() is called in two places in elog.c,
and there is a full control on the error code paths.

> NOTE
> ==============================
>
> The reason is for not outputing the crash dump is a) the crash occurred before installing the Windows exception
handler(pgwin32_install_crashdump_handler() call) and b) the effect of the following call in postmaster is inherited in
thechild process.
 
>
>                 /* In case of general protection fault, don't show GUI popup box */
>                 SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
>
> But I'm not sure in what order we should do pgwin32_install_crashdump_handler(), startup_hacks() and steps therein,
MemoryContextInit(). I think that's another patch.
 

Perhaps. I don't have a final opinion on this matter.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
> So you are basically ready to lose any message that could be pushed
> here if there is no memory context? That does not sound like a good
> trade-off to me. A static buffer does not look like the best idea
> either to not truncate message, so couldn't we envisage to just use
> malloc? pgwin32_message_to_UTF16() is called in two places in elog.c,
> and there is a full control on the error code paths.

Thank you for reviewing a rare bug fix on Windows that most people wouldn't be interested in. When CurrentMemoryContext
isNULL, the message  is logged with ReportEventA().  This is similar to write_console().
 

Regards
Takayuki Tsunakawa



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Tue, Oct 31, 2017 at 6:59 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> From: pgsql-hackers-owner@postgresql.org
>> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
>> So you are basically ready to lose any message that could be pushed
>> here if there is no memory context? That does not sound like a good
>> trade-off to me. A static buffer does not look like the best idea
>> either to not truncate message, so couldn't we envisage to just use
>> malloc? pgwin32_message_to_UTF16() is called in two places in elog.c,
>> and there is a full control on the error code paths.
>
> Thank you for reviewing a rare bug fix on Windows that most people wouldn't be interested in.

Yeah, it may take a while until a committer gets interested I am
afraid. See my bug about pg_basebackup on Windows with path names..

> When CurrentMemoryContext is NULL, the message  is logged with ReportEventA().  This is similar to write_console().

My point is that as Postgres is running as a service, isn't it wrong
to write a message to stderr as a fallback if the memory context is
not set? You would lose a message. It seems to me that for an
operation that can happen at a low-level like the postmaster startup,
we should really use a low-level operation as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [bug fix] postgres.exe crashes with access violationon Windows while starting up

From
"Tsunakawa, Takayuki"
Date:
From: Michael Paquier [mailto:michael.paquier@gmail.com]
> On Tue, Oct 31, 2017 at 6:59 AM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
> > When CurrentMemoryContext is NULL, the message  is logged with
> ReportEventA().  This is similar to write_console().
> 
> My point is that as Postgres is running as a service, isn't it wrong to
> write a message to stderr as a fallback if the memory context is not set?
> You would lose a message. It seems to me that for an operation that can
> happen at a low-level like the postmaster startup, we should really use
> a low-level operation as well.

I'm sorry I may not have been clear.  With this patch, write_eventlog() outputs the message to event log with
ReportEventA()when CurrentMemoryContext is NULL.  It doesn't write to stderr.  So the message won't be lost.
 

Regards
Takayuki Tsunakawa



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Wed, Nov 1, 2017 at 12:07 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> From: Michael Paquier [mailto:michael.paquier@gmail.com]
>> On Tue, Oct 31, 2017 at 6:59 AM, Tsunakawa, Takayuki
>> <tsunakawa.takay@jp.fujitsu.com> wrote:
>> > When CurrentMemoryContext is NULL, the message  is logged with
>> ReportEventA().  This is similar to write_console().
>>
>> My point is that as Postgres is running as a service, isn't it wrong to
>> write a message to stderr as a fallback if the memory context is not set?
>> You would lose a message. It seems to me that for an operation that can
>> happen at a low-level like the postmaster startup, we should really use
>> a low-level operation as well.
>
> I'm sorry I may not have been clear.  With this patch, write_eventlog() outputs the message to event log with
ReportEventA()when CurrentMemoryContext is NULL.  It doesn't write to stderr.  So the message won't be lost.
 

Oh, yes. Sorry. I got confused a bit by write_eventlog(), which is
already doing what your patch is adding for write_eventlog(). I am
switching the patch as ready for committer, I definitely agree that
you are taking the write approach here.

I am also adding Noah to get some input on this issue, as he is the
author and committer of 5f538ad0 which has improved the handling of
non-ASCII characters in this code path, and more importantly has
tweaked 43adc7a7 to handle properly transaction contexts in
pgwin32_message_to_UTF16() which is where the palloc calls happen. I
would be the one in the pool of committers who would most likely
commit your patch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Sat, Oct 28, 2017 at 03:43:02PM -0700, Michael Paquier wrote:
> couldn't we envisage to just use
> malloc? pgwin32_message_to_UTF16() is called in two places in elog.c,
> and there is a full control on the error code paths.

Switching to malloc is feasible, but it wouldn't enable PostgreSQL to handle
non-ASCII messages any earlier.  Messages should be ASCII-only until the
init_locale(LC_CTYPE) call initializes MessageEncoding.  (Before that call,
pgwin32_message_to_UTF16() assumes the message is UTF8-encoded.  I've expanded
the comments slightly.  We easily comply with that restriction today.)

On Fri, Nov 03, 2017 at 11:10:14AM +0000, Michael Paquier wrote:
> I am
> switching the patch as ready for committer, I definitely agree that
> you are taking the write approach here.

Committed both patches.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Mon, Nov 13, 2017 at 7:37 AM, Noah Misch <noah@leadboat.com> wrote:
> On Fri, Nov 03, 2017 at 11:10:14AM +0000, Michael Paquier wrote:
>> I am
>> switching the patch as ready for committer, I definitely agree that
>> you are taking the write approach here.

s/write/right/.

> Committed both patches.

Thanks for double-checking, Noah.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers