Thread: [HACKERS] [bug fix] postgres.exe crashes with access violation on Windowswhile starting up
[HACKERS] [bug fix] postgres.exe crashes with access violation on Windowswhile starting up
From
"Tsunakawa, Takayuki"
Date:
Hello, We encountered a rare and hard-to-investigate problem on Windows, which one of our customers reported. Please find the attachedpatch to fix that. I'll add this to the next CF. PROBLEM ============================== PostgreSQL sometimes crashes with the following messages. This is infrequent (but frequent for the customer); it occurredabout 10 times in the past 5 months. LOG: server process (PID 2712) was terminated by exception 0xC0000005 HINT: See C include file "ntstatus.h" for a description of the hexadecimal value. LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because anotherserver process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. LOG: all server processes terminated; reinitializing "server process" shows that an client backend crashed. The above messages indicate that the process was not running an SQLcommand. PostgreSQL runs as a Windows service. No crash dump was produced anywhere, despite the facts: - <PGDATA>/crashdumps folder exists and is writable by the PostgreSQL user account (which is the user postgres.exe runs as) - The Windows registry configuration allows dumping the crash dump CAUSE ============================== We believe WSAStartup() in main.c failed. The only conceivable error is: WSAEPROCLIM 10067 Too many processes. A Windows Sockets implementation may have a limit on the number of applications that can use it simultaneously. WSAStartupmay fail with this error if the limit has been reached. But I couldn't find what the limit is and whether we can tune it. We couldn't reproduce the problem. When I pretend that WSAStartup() failed while a client backend is starting up, I could see the same phenomenon as the customer. This problem only occurs when PostgreSQL runs as a Windows service. The bug is in write_eventlog(). It calls pgwin32_message_to_utf16() which in turn calls palloc(), which requires the memorymanagement system to be set up (CurrentMemoryContext != NULL). FIX ============================== Add the check "CurrentMemoryContext != NULL" in write_eventlog() as in write_console(). NOTE ============================== The reason is for not outputing the crash dump is a) the crash occurred before installing the Windows exception handler (pgwin32_install_crashdump_handler()call) and b) the effect of the following call in postmaster is inherited in the childprocess. /* In case of general protection fault, don't show GUI popup box */ SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX); But I'm not sure in what order we should do pgwin32_install_crashdump_handler(), startup_hacks() and steps therein, MemoryContextInit(). I think that's another patch. Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
[HACKERS] Re: [bug fix] postgres.exe crashes with access violation on Windowswhile starting up
From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tsunakawa, > Takayuki > The reason is for not outputing the crash dump is a) the crash occurred > before installing the Windows exception handler > (pgwin32_install_crashdump_handler() call) and b) the effect of the > following call in postmaster is inherited in the child process. > > /* In case of general protection fault, don't show GUI popup > box */ > SetErrorMode(SEM_FAILCRITICALERRORS | > SEM_NOGPFAULTERRORBOX); > > But I'm not sure in what order we should do > pgwin32_install_crashdump_handler(), startup_hacks() and steps therein, > MemoryContextInit(). I think that's another patch. Just installing the handler at the beginning of main() seems fine. Patch attached. Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] [bug fix] postgres.exe crashes with access violation onWindows while starting up
From
Michael Paquier
Date:
On Thu, Oct 26, 2017 at 7:10 PM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > FIX > ============================== > > Add the check "CurrentMemoryContext != NULL" in write_eventlog() as in write_console(). * Also verify that we are not on our way into error recursion trouble due * to error messages thrown deep inside pgwin32_message_to_UTF16(). */ if (!in_error_recursion_trouble()&& + CurrentMemoryContext != NULL && GetMessageEncoding() != GetACPEncoding()) { So you are basically ready to lose any message that could be pushed here if there is no memory context? That does not sound like a good trade-off to me. A static buffer does not look like the best idea either to not truncate message, so couldn't we envisage to just use malloc? pgwin32_message_to_UTF16() is called in two places in elog.c, and there is a full control on the error code paths. > NOTE > ============================== > > The reason is for not outputing the crash dump is a) the crash occurred before installing the Windows exception handler(pgwin32_install_crashdump_handler() call) and b) the effect of the following call in postmaster is inherited in thechild process. > > /* In case of general protection fault, don't show GUI popup box */ > SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX); > > But I'm not sure in what order we should do pgwin32_install_crashdump_handler(), startup_hacks() and steps therein, MemoryContextInit(). I think that's another patch. Perhaps. I don't have a final opinion on this matter. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] postgres.exe crashes with access violationon Windows while starting up
From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier > So you are basically ready to lose any message that could be pushed > here if there is no memory context? That does not sound like a good > trade-off to me. A static buffer does not look like the best idea > either to not truncate message, so couldn't we envisage to just use > malloc? pgwin32_message_to_UTF16() is called in two places in elog.c, > and there is a full control on the error code paths. Thank you for reviewing a rare bug fix on Windows that most people wouldn't be interested in. When CurrentMemoryContext isNULL, the message is logged with ReportEventA(). This is similar to write_console(). Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] postgres.exe crashes with access violation onWindows while starting up
From
Michael Paquier
Date:
On Tue, Oct 31, 2017 at 6:59 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > From: pgsql-hackers-owner@postgresql.org >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier >> So you are basically ready to lose any message that could be pushed >> here if there is no memory context? That does not sound like a good >> trade-off to me. A static buffer does not look like the best idea >> either to not truncate message, so couldn't we envisage to just use >> malloc? pgwin32_message_to_UTF16() is called in two places in elog.c, >> and there is a full control on the error code paths. > > Thank you for reviewing a rare bug fix on Windows that most people wouldn't be interested in. Yeah, it may take a while until a committer gets interested I am afraid. See my bug about pg_basebackup on Windows with path names.. > When CurrentMemoryContext is NULL, the message is logged with ReportEventA(). This is similar to write_console(). My point is that as Postgres is running as a service, isn't it wrong to write a message to stderr as a fallback if the memory context is not set? You would lose a message. It seems to me that for an operation that can happen at a low-level like the postmaster startup, we should really use a low-level operation as well. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] postgres.exe crashes with access violationon Windows while starting up
From
"Tsunakawa, Takayuki"
Date:
From: Michael Paquier [mailto:michael.paquier@gmail.com] > On Tue, Oct 31, 2017 at 6:59 AM, Tsunakawa, Takayuki > <tsunakawa.takay@jp.fujitsu.com> wrote: > > When CurrentMemoryContext is NULL, the message is logged with > ReportEventA(). This is similar to write_console(). > > My point is that as Postgres is running as a service, isn't it wrong to > write a message to stderr as a fallback if the memory context is not set? > You would lose a message. It seems to me that for an operation that can > happen at a low-level like the postmaster startup, we should really use > a low-level operation as well. I'm sorry I may not have been clear. With this patch, write_eventlog() outputs the message to event log with ReportEventA()when CurrentMemoryContext is NULL. It doesn't write to stderr. So the message won't be lost. Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] postgres.exe crashes with access violation onWindows while starting up
From
Michael Paquier
Date:
On Wed, Nov 1, 2017 at 12:07 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > From: Michael Paquier [mailto:michael.paquier@gmail.com] >> On Tue, Oct 31, 2017 at 6:59 AM, Tsunakawa, Takayuki >> <tsunakawa.takay@jp.fujitsu.com> wrote: >> > When CurrentMemoryContext is NULL, the message is logged with >> ReportEventA(). This is similar to write_console(). >> >> My point is that as Postgres is running as a service, isn't it wrong to >> write a message to stderr as a fallback if the memory context is not set? >> You would lose a message. It seems to me that for an operation that can >> happen at a low-level like the postmaster startup, we should really use >> a low-level operation as well. > > I'm sorry I may not have been clear. With this patch, write_eventlog() outputs the message to event log with ReportEventA()when CurrentMemoryContext is NULL. It doesn't write to stderr. So the message won't be lost. Oh, yes. Sorry. I got confused a bit by write_eventlog(), which is already doing what your patch is adding for write_eventlog(). I am switching the patch as ready for committer, I definitely agree that you are taking the write approach here. I am also adding Noah to get some input on this issue, as he is the author and committer of 5f538ad0 which has improved the handling of non-ASCII characters in this code path, and more importantly has tweaked 43adc7a7 to handle properly transaction contexts in pgwin32_message_to_UTF16() which is where the palloc calls happen. I would be the one in the pool of committers who would most likely commit your patch. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] postgres.exe crashes with access violationon Windows while starting up
From
Noah Misch
Date:
On Sat, Oct 28, 2017 at 03:43:02PM -0700, Michael Paquier wrote: > couldn't we envisage to just use > malloc? pgwin32_message_to_UTF16() is called in two places in elog.c, > and there is a full control on the error code paths. Switching to malloc is feasible, but it wouldn't enable PostgreSQL to handle non-ASCII messages any earlier. Messages should be ASCII-only until the init_locale(LC_CTYPE) call initializes MessageEncoding. (Before that call, pgwin32_message_to_UTF16() assumes the message is UTF8-encoded. I've expanded the comments slightly. We easily comply with that restriction today.) On Fri, Nov 03, 2017 at 11:10:14AM +0000, Michael Paquier wrote: > I am > switching the patch as ready for committer, I definitely agree that > you are taking the write approach here. Committed both patches. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] postgres.exe crashes with access violation onWindows while starting up
From
Michael Paquier
Date:
On Mon, Nov 13, 2017 at 7:37 AM, Noah Misch <noah@leadboat.com> wrote: > On Fri, Nov 03, 2017 at 11:10:14AM +0000, Michael Paquier wrote: >> I am >> switching the patch as ready for committer, I definitely agree that >> you are taking the write approach here. s/write/right/. > Committed both patches. Thanks for double-checking, Noah. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers