BUG #8579: CoreDump of background writer process - Mailing list pgsql-bugs

From rgr@cslab.de
Subject BUG #8579: CoreDump of background writer process
Date
Msg-id E1Vdyst-000374-Fd@wrigleys.postgresql.org
Whole thread Raw
Responses Re: BUG #8579: CoreDump of background writer process  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: BUG #8579: CoreDump of background writer process  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      8579
Logged by:          Rene Gruen
Email address:      rgr@cslab.de
PostgreSQL version: 9.0.13
Operating system:   QNX 6.5
Description:

Greetings everybody.


We are using postgresql for round about 10 years in our software. Starting
with our update from version 8.3 to 9.0 we are receveing coredumps of the
server process.
This seems to happen while inserting different data into different tables
via COPY.
I do not add some example data / table-descriptions, because the error does
not ocure each time (also using the same data and the same table)


I have seen the core using ecpg (examples one and two) and using psql
(example 3).
Each time there is a message like "could not seek to block XXX in file YYY"
just before the server terminates.


We are using QNX 6.5 SP 1 and a port based on the pkgsrc.


I hope the attached data will help to locate the problem.


Example one (using a "normal" version):


Logfile:
---633794700-2013-11-01 00:09:58 CET:CONTEXT:  automatic analyze of table
"gltdb.glttab001.glt_dp_para_trend_1hour"
---1507368-2013-11-01 00:10:23 CET:LOG:  received SIGHUP, reloading
configuration files
---1519657-2013-11-01 00:10:23 CET:ERROR:  could not seek to block 149256 in
file "pg_tblspc/17639/PG_9.0_201008051/16387/28109.1": Interrupted function
call
---1519657-2013-11-01 00:10:23 CET:CONTEXT:  writing block 149256 of
relation pg_tblspc/17639/PG_9.0_201008051/16387/28109
---1507368-2013-11-01 00:11:58 CET:LOG:  background writer process (PID
1519657) was terminated by signal 11: segmentation violation
---1507368-2013-11-01 00:11:58 CET:LOG:  terminating any other active server
processes


contents of the dump-file:


#0  0x00000000 in ?? ()
#1  0x08205ef4 in BackgroundWriterMain ()
#2  0x080e2759 in AuxiliaryProcessMain ()
#3  0x0820bb87 in StartChildProcess ()
#4  0x0820deca in reaper ()
#5  <signal handler called>
#6  0xb033e412 in SignalWaitinfo () from /lib/libc.so.3
#7  0xb0371ef2 in _poll () from /lib/libc.so.3
#8  0xb0372253 in select () from /lib/libc.so.3
#9  0x0820f46b in PostmasterMain ()
#10 0x081b6bfb in main ()






Example two (using a debug-version):


Logfile:
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:31:46 CET:LOG:
duration: 65108.037 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000002'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:33:21 CET:LOG:
duration: 94909.477 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000003'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:35:24 CET:LOG:
duration: 122674.228 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000004'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:37:44 CET:LOG:
duration: 140464.505 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000005'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:40:06 CET:LOG:
duration: 141908.285 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000006'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:42:34 CET:LOG:
duration: 147574.417 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000007'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:44:46 CET:LOG:
duration: 131793.832 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000008'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:48:08 CET:LOG:
duration: 202173.063 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000009'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:51:25 CET:LOG:
duration: 196979.858 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000010'
---1478697-2013-11-05 16:53:23 CET:ERROR:  could not seek to block 267907 in
file "pg_tblspc/17639/PG_9.0_201008051/16387/28106.2": Interrupted function
call
---1478697-2013-11-05 16:53:23 CET:CONTEXT:  writing block 267907 of
relation pg_tblspc/17639/PG_9.0_201008051/16387/28106
---1466408-2013-11-05 16:54:52 CET:LOG:  background writer process (PID
1478697) was terminated by signal 11: segmentation violation
---1466408-2013-11-05 16:54:52 CET:LOG:  terminating any other active server
processes
---1184063623-2013-11-05 16:54:52 CET:WARNING:  terminating connection
because of crash of another server process
---1184063623-2013-11-05 16:54:52 CET:DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
---1184063623-2013-11-05 16:54:52 CET:HINT:  In a moment you should be able
to reconnect to the database and repeat your command.
GLTlogger[2777125]-gltdb-[local]-2908210-2013-11-05 16:54:52 CET:WARNING:
terminating connection because of crash of another server process
GLTlogger[2777125]-gltdb-[local]-2908210-2013-11-05 16:54:52 CET:DETAIL:
The postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
GLTlogger[2777125]-gltdb-[local]-2908210-2013-11-05 16:54:52 CET:HINT:  In a
moment you should be able to reconnect to the database and repeat your
command.
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:54:52 CET:WARNING:
terminating connection because of crash of another server process
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:54:52 CET:DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:54:52 CET:HINT:  In a
moment you should be able to reconnect to the database and repeat your
command.
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:54:52 CET:CONTEXT:
COPY glt_dp_para_trend_1min, line 117201: "65732    2012-12-19
13:45:00+01    23016900.005000    1    I"
---1466408-2013-11-05 16:54:53 CET:LOG:  all server processes terminated;
reinitializing
---1752227881-2013-11-05 16:54:53 CET:LOG:  database system was interrupted;
last known up at 2013-11-05 16:52:06 CET
---1752227881-2013-11-05 16:54:53 CET:LOG:  database system was not properly
shut down; automatic recovery in progress
---1752227881-2013-11-05 16:54:53 CET:LOG:  redo starts at 38/D300E054
[unknown]-gltdb-[local]-1752231980-2013-11-05 16:55:00 CET:FATAL:  the
database system is in recovery mode
---1752227881-2013-11-05 16:55:10 CET:LOG:  invalid magic number 0000 in log
file 56, segment 221, offset 11935744
---1752227881-2013-11-05 16:55:10 CET:LOG:  redo done at 38/DDB60ED4
---1752227881-2013-11-05 16:55:10 CET:LOG:  last completed transaction was
at log time 2013-11-05 16:53:27.692009+01
---1752391757-2013-11-05 16:55:13 CET:LOG:  autovacuum launcher started
---1466408-2013-11-05 16:55:13 CET:LOG:  database system is ready to accept
connections


coredump:


Program terminated with signal 11, Segmentation fault.
[New pid 1478697 tid 1]
#0  0x00000000 in ?? ()


(gdb) bt
#0  0x00000000 in ?? ()
#1  0x08205ef4 in BackgroundWriterMain () at bgwriter.c:329
#2  0x080e2759 in AuxiliaryProcessMain (argc=2, argv=0x8047460) at
bootstrap.c:417
#3  0x0820bb87 in StartChildProcess (type=<value optimized out>) at
postmaster.c:4458
#4  0x0820deca in reaper (postgres_signal_arg=18) at postmaster.c:2418
#5  <signal handler called>
#6  0xb033e412 in SignalWaitinfo () from /lib/libc.so.3
#7  0xb0371ef2 in _poll () from /lib/libc.so.3
#8  0xb0372253 in select () from /lib/libc.so.3
#9  0x0820f46b in PostmasterMain (argc=8, argv=0x84b5100) at
postmaster.c:1414
#10 0x081b6bfb in main (argc=8, argv=0x84b5100) at main.c:188


(gdb) up
#1  0x08205ef4 in BackgroundWriterMain () at bgwriter.c:329
329     bgwriter.c: No such file or directory.
        in bgwriter.c


329@bgwriter.c: MemoryContextResetAndDeleteChildren(bgwriter_context);


(gdb) print bgwriter_context
$1 = (MemoryContext) 0x84c0c00


(gdb) print *bgwriter_context
$3 = {type = T_Invalid, methods = 0x84c0d00, parent = 0x10, firstchild =
0x0, nextchild = 0x0, name = 0x0}
(gdb)






Example Tree (debug version):


psql-gltdb-[local]-29880389-2013-11-05 22:27:39 CET:LOG:  duration:
244502.586 ms  statement: COPY glttab018.pdbt_d4_contobject FROM STDIN;
psql-gltdb-[local]-30502981-2013-11-05 22:32:14 CET:LOG:  duration:
274090.058 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-31166533-2013-11-05 22:36:52 CET:LOG:  duration:
276984.615 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-31858757-2013-11-05 22:41:40 CET:LOG:  duration:
288205.898 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-32526405-2013-11-05 22:46:31 CET:LOG:  duration:
290317.574 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-33275974-2013-11-05 22:51:23 CET:LOG:  duration:
291812.346 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-33972293-2013-11-05 22:56:12 CET:LOG:  duration:
288272.888 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-34664517-2013-11-05 23:01:19 CET:LOG:  duration:
306371.118 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-35409990-2013-11-05 23:06:18 CET:LOG:  duration:
298632.302 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-36130886-2013-11-05 23:11:18 CET:LOG:  duration:
298665.298 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-36851782-2013-11-05 23:16:22 CET:LOG:  duration:
304064.471 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-37576773-2013-11-05 23:21:38 CET:LOG:  duration:
315617.703 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-38326341-2013-11-05 23:26:54 CET:LOG:  duration:
315162.772 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-39100485-2013-11-05 23:32:13 CET:LOG:  duration:
317958.345 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-39850053-2013-11-05 23:37:45 CET:LOG:  duration:
331930.207 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-40628293-2013-11-05 23:43:03 CET:LOG:  duration:
317776.373 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-41427014-2013-11-05 23:48:48 CET:LOG:  duration:
343849.383 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
---1634347-2013-11-05 23:50:58 CET:ERROR:  could not seek to block 13243 in
file "pg_tblspc/78600/PG_9.0_201008051/16387/79270": Interrupted function
call
---1634347-2013-11-05 23:50:58 CET:CONTEXT:  writing block 13243 of relation
pg_tblspc/78600/PG_9.0_201008051/16387/79270
psql-gltdb-[local]-42233926-2013-11-05 23:57:52 CET:LOG:  duration:
544057.747 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
---1613866-2013-11-05 23:59:21 CET:LOG:  background writer process (PID
1634347) was terminated by signal 11: segmentation violation
---1613866-2013-11-05 23:59:21 CET:LOG:  terminating any other active server
processes
psql-gltdb-[local]-43360325-2013-11-05 23:59:21 CET:WARNING:  terminating
connection because of crash of another server process
psql-gltdb-[local]-43360325-2013-11-05 23:59:21 CET:DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
psql-gltdb-[local]-43360325-2013-11-05 23:59:21 CET:HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
psql-gltdb-[local]-43360325-2013-11-05 23:59:21 CET:CONTEXT:  COPY
pdbt_d4_contparameter, line 82696:
"919360    14666    2    0    1    1    0    0    0    Kühlzeit    0    0    0    I    10001    5224    0    -1"
---1613866-2013-11-05 23:59:23 CET:LOG:  all server processes terminated;
reinitializing
[unknown]-gltdb-[local]-43606062-2013-11-05 23:59:24 CET:FATAL:  the
database system is in recovery mode
---43606059-2013-11-05 23:59:24 CET:LOG:  database system was interrupted;
last known up at 2013-11-05 23:50:10 CET
---1613866-2013-11-05 23:59:24 CET:LOG:  received SIGHUP, reloading
configuration files
---43606059-2013-11-05 23:59:25 CET:LOG:  could not remove cache file
"base/849886/pg_internal.init": Interrupted function call
---43606059-2013-11-05 23:59:26 CET:LOG:  database system was not properly
shut down; automatic recovery in progress
---43606059-2013-11-05 23:59:26 CET:LOG:  redo starts at 14/B300F554
---1613866-2013-11-05 23:59:27 CET:LOG:  received SIGHUP, reloading
configuration files
---1613866-2013-11-05 23:59:27 CET:LOG:  parameter "autovacuum" changed to
"on"
[unknown]-gltdb-[local]-43773970-2013-11-05 23:59:32 CET:FATAL:  the
database system is in recovery mode
[unknown]-gltdb-[local]-43810834-2013-11-05 23:59:33 CET:FATAL:  the
database system is in recovery mode
[unknown]-gltdb-[local]-44539922-2013-11-06 00:04:35 CET:FATAL:  the
database system is in recovery mode
---43606059-2013-11-06 00:07:24 CET:LOG:  could not open file
"pg_xlog/0000000100000014000000C1" (log file 20, segment 193): No such file
or directory
---43606059-2013-11-06 00:07:24 CET:LOG:  redo done at 14/C0FFFFB0
---43606059-2013-11-06 00:07:24 CET:LOG:  last completed transaction was at
log time 2013-11-05 23:57:52.607898+01
---44941357-2013-11-06 00:07:29 CET:LOG:  autovacuum launcher started
---1613866-2013-11-06 00:07:29 CET:LOG:  database system is ready to accept
connections
---1613866-2013-11-06 00:13:02 CET:LOG:  received SIGHUP, reloading
configuration files
vacuumdb-gltdb-[local]-45871174-2013-11-06 01:14:08 CET:WARNING:  relation
"pdbt_d4_contparameter" page 21075 is uninitialized --- fixing
vacuumdb-gltdb-[local]-45871174-2013-11-06 01:14:08 CET:WARNING:  relation
"pdbt_d4_contparameter" page 21076 is uninitialized --- fixing
vacuumdb-gltdb-[local]-45871174-2013-11-06 01:14:08 CET:WARNING:  relation
"pdbt_d4_contparameter" page 21077 is uninitialized --- fixing
vacuumdb-gltdb-[local]-45871174-2013-11-06 01:18:02 CET:LOG:  duration:
3896493.745 ms  statement: VACUUM (ANALYZE);




contents of the core file:


Program terminated with signal 11, Segmentation fault.
[New pid 1634347 tid 1]
#0  0x00000000 in ?? ()


(gdb) bt
#0  0x00000000 in ?? ()
#1  0x08205ef4 in BackgroundWriterMain () at bgwriter.c:329
#2  0x080e2759 in AuxiliaryProcessMain (argc=2, argv=0x8047460) at
bootstrap.c:417
#3  0x0820bb87 in StartChildProcess (type=<value optimized out>) at
postmaster.c:4458
#4  0x0820deca in reaper (postgres_signal_arg=18) at postmaster.c:2418
#5  <signal handler called>
#6  0xb033e412 in SignalWaitinfo () from /lib/libc.so.3
#7  0xb0371ef2 in _poll () from /lib/libc.so.3
#8  0xb0372253 in select () from /lib/libc.so.3
#9  0x0820f46b in PostmasterMain (argc=8, argv=0x84b5100) at
postmaster.c:1414
#10 0x081b6bfb in main (argc=8, argv=0x84b5100) at main.c:188


(gdb) up
#1  0x08205ef4 in BackgroundWriterMain () at bgwriter.c:329
329     bgwriter.c: No such file or directory.
        in bgwriter.c


(gdb) print *bgwriter_context
$1 = {type = T_Invalid, methods = 0x84c0d00, parent = 0x10, firstchild =
0x0, nextchild = 0x0, name = 0x0}

pgsql-bugs by date:

Previous
From: Robert Nix
Date:
Subject: Re: BUG #8578: loading a 33G (compressed) pg_dump into a fresh host and db instance crashes a postgresql process
Next
From: Peter Eisentraut
Date:
Subject: Re: Fwd: Problems with pg_stat_activity view