Re: [HACKERS] [COMMITTERS] pgsql: Perform only one ReadControlFile() during startup. - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] [COMMITTERS] pgsql: Perform only one ReadControlFile() during startup.
Msg-id 14134.1505572349@sss.pgh.pa.us
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> Perform only one ReadControlFile() during startup.

This patch or something closely related to it has broken the postmaster's
ability to recover from a backend crash.  For example, after exercising
the backend crash Andreas just reported:

regression=# select from information_schema.user_mapping_options;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q

attempting to reconnect fails, because the postmaster isn't there.
It left a core file behind though, in which I find

Program terminated with signal 11, Segmentation fault.
#0  0x000000000088b792 in GetMemoryChunkContext (pointer=0x7fdb36fc3f00)
    at ../../../../src/include/utils/memutils.h:124
124             AssertArg(MemoryContextIsValid(context));
(gdb) bt
#0  0x000000000088b792 in GetMemoryChunkContext (pointer=0x7fdb36fc3f00)
    at ../../../../src/include/utils/memutils.h:124
#1  pfree (pointer=0x7fdb36fc3f00) at mcxt.c:951
#2  0x0000000000512843 in XLOGShmemInit () at xlog.c:4897
#3  0x0000000000737fd9 in CreateSharedMemoryAndSemaphores (
    makePrivate=0 '\000', port=5440) at ipci.c:220
#4  0x00000000006e4a78 in reset_shared () at postmaster.c:2516
#5  PostmasterStateMachine () at postmaster.c:3832
#6  0x00000000006e541d in reaper (postgres_signal_arg=<value optimized out>)
    at postmaster.c:3081
#7  <signal handler called>
#8  0x0000003b78ae1603 in __select_nocancel ()
    at ../sysdeps/unix/syscall-template.S:82
#9  0x00000000008a432a in pg_usleep (microsec=<value optimized out>)
    at pgsleep.c:56
#10 0x00000000006e75d7 in ServerLoop (argc=<value optimized out>,
    argv=<value optimized out>) at postmaster.c:1705
#11 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
    at postmaster.c:1364

It's dying at "pfree(localControlFile)".  localControlFile seems to
be pointing at a region of memory that's entirely zeroes; certainly
the data that it just moved into shared memory is all zeroes.
It looks like someone didn't think hard enough about when to reset
ControlFile to null.
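
For illustration only, here is a minimal standalone sketch of the kind of
failure I suspect. It is not the code in xlog.c: every name in it
(ControlData, Control, shared_segment, read_control, shmem_init,
reset_for_restart) is a hypothetical stand-in, and plain malloc/free take
the place of palloc/pfree. It assumes the global pointer first refers to a
private copy, is then repointed at shared memory, and is not reset before
shared memory is reinitialized after a crash.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the control file contents. */
typedef struct ControlData
{
    int         version;
} ControlData;

/* Stands in for the shared memory segment. */
ControlData shared_segment;

/* Process-global pointer: first a private copy, later shared memory. */
ControlData *Control = NULL;

/* Read the "control file" into a private heap copy. */
void
read_control(void)
{
    ControlData *copy = malloc(sizeof(ControlData));

    assert(copy != NULL);
    copy->version = 42;
    Control = copy;
}

/*
 * (Re)initialize shared memory.  A freshly created segment is zeroed.
 * Returns 0 on success, or -1 where the real code would have handed a
 * shared memory address to the allocator's free routine.
 */
int
shmem_init(void)
{
    ControlData *local = Control;

    memset(&shared_segment, 0, sizeof(shared_segment));
    Control = &shared_segment;

    if (local != NULL)
    {
        /* Stale pointer: "local" is really the old shared address. */
        if (local == &shared_segment)
            return -1;
        memcpy(Control, local, sizeof(ControlData));
        free(local);
    }
    return 0;
}

/* One possible remedy: forget the stale pointer and re-read the file. */
void
reset_for_restart(void)
{
    Control = NULL;
    read_control();
}
```

In this sketch, calling read_control() and then shmem_init() works the first
time. A second shmem_init() with no reset in between finds the "local" copy
aliasing the zeroed segment, which would match the all-zeroes data seen in
the core file. Calling reset_for_restart() first avoids that. Whether the
real fix should re-read the file or only clear the pointer is a separate
question.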
        regards, tom lane

