Thread: BUG #16094: Database entering recovery mode
The following bug has been logged on the website: Bug reference: 16094 Logged by: Mircea Pirv Email address: mircea@reva.tech PostgreSQL version: 12.0 Operating system: OSX Description: Hello, We've updated our database from 10.6 to 12.0 recently and we keep encountering an error which says that the database is entering recovery mode. Our setup has replication enabled, with a replication slot, and using a replication subscription. On our development machines replication is setup to use the same machine. The problem is, that with replication enabled, changing a column value using an update statement, causes the database to enter recovery mode. This happens only when replication is enabled. We could not find any pattern, like what kind of columns might be the culprit. It happens with all kinds of columns, regardless of the fact that we have constraints, triggers, defaults on the columns. With some columns it happens randomly, with some it happens all the time. In the logs the only error we see, is a segmentation fault, when replication tries to run. We tried with both a local server, and a dockerized version (12.0). It only reproduces in 12.0. 10.6 and 11.5 do not have this problem, we could not reproduce it. Any ideas on what may have changed in 12.0 to cause this behavior ? Thanks, Mircea
PG Bug reporting form <noreply@postgresql.org> writes: > We've updated our database from 10.6 to 12.0 recently and we keep > encountering an error which says that the database is entering recovery > mode. > ... > In the logs the only error we see, is a segmentation fault, when replication > tries to run. It should be possible to collect a stack trace for the segfault, which would greatly assist debugging this. Please see https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend regards, tom lane
This is all that I managed to retrieve using gdb
#0 0x00007f53d89eb7b7 in epoll_wait (epfd=3, events=0x55bc137a59c0, maxevents=1, maxevents@entry=<error reading variable: Cannot access memory at address 0x7ffee2d44afc>, timeout=-1, timeout@entry=<error reading variable: Cannot access memory at address 0x7ffee2d44ae8>) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
resultvar = 18446744073709551612
sc_ret = <optimized out>
#1 0x000055bc11790379 in WaitEventSetWaitBlock (nevents=<error reading variable: Cannot access memory at address 0x7ffee2d44afc>, occurred_events=<error reading variable: Cannot access memory at address 0x7ffee2d44ad8>, cur_timeout=<error reading variable: Cannot access memory at address 0x7ffee2d44ae8>, set=0x55bc137a5948) at ./build/../src/backend/storage/ipc/latch.c:1080
returned_events = 0
rc = <optimized out>
cur_event = <optimized out>
cur_epoll_event = <optimized out>
returned_events = <optimized out>
rc = <optimized out>
cur_event = <optimized out>
cur_epoll_event = <optimized out>
__func__ = <error reading variable __func__ (Cannot access memory at address 0x55bc11a5a510)>
__errno_location = <optimized out>
#2 WaitEventSetWait (set=0x55bc137a5948, timeout=<error reading variable: Cannot access memory at address 0x7ffee2d44ae0>, occurred_events=<error reading variable: Cannot access memory at address 0x7ffee2d44ad8>, nevents=<error reading variable: Cannot access memory at address 0x7ffee2d44afc>, wait_event_info=<optimized out>) at ./build/../src/backend/storage/ipc/latch.c:1032
rc = <optimized out>
returned_events = 0
start_time = <error reading variable start_time (Cannot access memory at address 0x7ffee2d44b00)>
cur_time = <error reading variable cur_time (Cannot access memory at address 0x7ffee2d44b10)>
cur_timeout = <error reading variable cur_timeout (Cannot access memory at address 0x7ffee2d44ae8)>
Backtrace stopped: Cannot access memory at address 0x7ffee2d44b78
resultvar = 18446744073709551612
sc_ret = <optimized out>
#1 0x000055bc11790379 in WaitEventSetWaitBlock (nevents=<error reading variable: Cannot access memory at address 0x7ffee2d44afc>, occurred_events=<error reading variable: Cannot access memory at address 0x7ffee2d44ad8>, cur_timeout=<error reading variable: Cannot access memory at address 0x7ffee2d44ae8>, set=0x55bc137a5948) at ./build/../src/backend/storage/ipc/latch.c:1080
returned_events = 0
rc = <optimized out>
cur_event = <optimized out>
cur_epoll_event = <optimized out>
returned_events = <optimized out>
rc = <optimized out>
cur_event = <optimized out>
cur_epoll_event = <optimized out>
__func__ = <error reading variable __func__ (Cannot access memory at address 0x55bc11a5a510)>
__errno_location = <optimized out>
#2 WaitEventSetWait (set=0x55bc137a5948, timeout=<error reading variable: Cannot access memory at address 0x7ffee2d44ae0>, occurred_events=<error reading variable: Cannot access memory at address 0x7ffee2d44ad8>, nevents=<error reading variable: Cannot access memory at address 0x7ffee2d44afc>, wait_event_info=<optimized out>) at ./build/../src/backend/storage/ipc/latch.c:1032
rc = <optimized out>
returned_events = 0
start_time = <error reading variable start_time (Cannot access memory at address 0x7ffee2d44b00)>
cur_time = <error reading variable cur_time (Cannot access memory at address 0x7ffee2d44b10)>
cur_timeout = <error reading variable cur_timeout (Cannot access memory at address 0x7ffee2d44ae8)>
Backtrace stopped: Cannot access memory at address 0x7ffee2d44b78
On Mon, Nov 4, 2019 at 5:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> We've updated our database from 10.6 to 12.0 recently and we keep
> encountering an error which says that the database is entering recovery
> mode.
> ...
> In the logs the only error we see, is a segmentation fault, when replication
> tries to run.
It should be possible to collect a stack trace for the segfault,
which would greatly assist debugging this. Please see
https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
regards, tom lane
On Wed, Nov 06, 2019 at 03:37:31PM +0200, Mircea Pirv wrote: > #2 WaitEventSetWait (set=0x55bc137a5948, timeout=<error reading variable: > Cannot access memory at address 0x7ffee2d44ae0>, occurred_events=<error > reading variable: Cannot access memory at address 0x7ffee2d44ad8>, > nevents=<error reading variable: Cannot access memory at address > 0x7ffee2d44afc>, wait_event_info=<optimized out>) at > ./build/../src/backend/storage/ipc/latch.c:1032 > rc = <optimized out> > returned_events = 0 > start_time = <error reading variable start_time (Cannot access > memory at address 0x7ffee2d44b00)> > cur_time = <error reading variable cur_time (Cannot access memory > at address 0x7ffee2d44b10)> > cur_timeout = <error reading variable cur_timeout (Cannot access > memory at address 0x7ffee2d44ae8)> > Backtrace stopped: Cannot access memory at address 0x7ffee2d44b78 Is this the only part of the stack you can get? WaitEventSetWait() gets called in three places, which are be-secure.c, syslogger.c and condition_variable.c. Could it be possible to see more of the actual callers here? -- Michael
Attachment
Hi, Thanks for the reply. That's all the bt full command prints out when the segmentation fault error occurs.
Thanks.
Mircea
On Thu, Nov 7, 2019 at 4:35 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Nov 06, 2019 at 03:37:31PM +0200, Mircea Pirv wrote:
> #2 WaitEventSetWait (set=0x55bc137a5948, timeout=<error reading variable:
> Cannot access memory at address 0x7ffee2d44ae0>, occurred_events=<error
> reading variable: Cannot access memory at address 0x7ffee2d44ad8>,
> nevents=<error reading variable: Cannot access memory at address
> 0x7ffee2d44afc>, wait_event_info=<optimized out>) at
> ./build/../src/backend/storage/ipc/latch.c:1032
> rc = <optimized out>
> returned_events = 0
> start_time = <error reading variable start_time (Cannot access
> memory at address 0x7ffee2d44b00)>
> cur_time = <error reading variable cur_time (Cannot access memory
> at address 0x7ffee2d44b10)>
> cur_timeout = <error reading variable cur_timeout (Cannot access
> memory at address 0x7ffee2d44ae8)>
> Backtrace stopped: Cannot access memory at address 0x7ffee2d44b78
Is this the only part of the stack you can get? WaitEventSetWait()
gets called in three places, which are be-secure.c, syslogger.c and
condition_variable.c. Could it be possible to see more of the actual
callers here?
--
Michael