Thread: BUG #8579: CoreDump of background writer process

BUG #8579: CoreDump of background writer process

From
rgr@cslab.de
Date:
The following bug has been logged on the website:

Bug reference:      8579
Logged by:          Rene Gruen
Email address:      rgr@cslab.de
PostgreSQL version: 9.0.13
Operating system:   QNX 6.5
Description:

Greetings everybody.


We have been using PostgreSQL in our software for about 10 years. Since our
update from version 8.3 to 9.0 we have been receiving core dumps of the
server process.
This seems to happen while inserting different data into different tables
via COPY.
I am not attaching example data or table descriptions, because the error does
not occur every time (even with the same data and the same table).


I have seen the core dump occur while using ecpg (examples one and two) and
while using psql (example three).
Each time there is a message like "could not seek to block XXX in file YYY"
just before the server terminates.


We are using QNX 6.5 SP1 and a port based on pkgsrc.


I hope the attached data will help to locate the problem.


Example one (using a "normal" build):


Logfile:
---633794700-2013-11-01 00:09:58 CET:CONTEXT:  automatic analyze of table
"gltdb.glttab001.glt_dp_para_trend_1hour"
---1507368-2013-11-01 00:10:23 CET:LOG:  received SIGHUP, reloading
configuration files
---1519657-2013-11-01 00:10:23 CET:ERROR:  could not seek to block 149256 in
file "pg_tblspc/17639/PG_9.0_201008051/16387/28109.1": Interrupted function
call
---1519657-2013-11-01 00:10:23 CET:CONTEXT:  writing block 149256 of
relation pg_tblspc/17639/PG_9.0_201008051/16387/28109
---1507368-2013-11-01 00:11:58 CET:LOG:  background writer process (PID
1519657) was terminated by signal 11: segmentation violation
---1507368-2013-11-01 00:11:58 CET:LOG:  terminating any other active server
processes


contents of the dump-file:


#0  0x00000000 in ?? ()
#1  0x08205ef4 in BackgroundWriterMain ()
#2  0x080e2759 in AuxiliaryProcessMain ()
#3  0x0820bb87 in StartChildProcess ()
#4  0x0820deca in reaper ()
#5  <signal handler called>
#6  0xb033e412 in SignalWaitinfo () from /lib/libc.so.3
#7  0xb0371ef2 in _poll () from /lib/libc.so.3
#8  0xb0372253 in select () from /lib/libc.so.3
#9  0x0820f46b in PostmasterMain ()
#10 0x081b6bfb in main ()






Example two (using a debug build):


Logfile:
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:31:46 CET:LOG:
duration: 65108.037 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000002'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:33:21 CET:LOG:
duration: 94909.477 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000003'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:35:24 CET:LOG:
duration: 122674.228 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000004'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:37:44 CET:LOG:
duration: 140464.505 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000005'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:40:06 CET:LOG:
duration: 141908.285 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000006'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:42:34 CET:LOG:
duration: 147574.417 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000007'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:44:46 CET:LOG:
duration: 131793.832 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000008'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:48:08 CET:LOG:
duration: 202173.063 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000009'
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:51:25 CET:LOG:
duration: 196979.858 ms  execute copy_stmt: COPY glt_dp_para_trend_1min FROM
'/tmp/odbc5689466-65732-1-000010'
---1478697-2013-11-05 16:53:23 CET:ERROR:  could not seek to block 267907 in
file "pg_tblspc/17639/PG_9.0_201008051/16387/28106.2": Interrupted function
call
---1478697-2013-11-05 16:53:23 CET:CONTEXT:  writing block 267907 of
relation pg_tblspc/17639/PG_9.0_201008051/16387/28106
---1466408-2013-11-05 16:54:52 CET:LOG:  background writer process (PID
1478697) was terminated by signal 11: segmentation violation
---1466408-2013-11-05 16:54:52 CET:LOG:  terminating any other active server
processes
---1184063623-2013-11-05 16:54:52 CET:WARNING:  terminating connection
because of crash of another server process
---1184063623-2013-11-05 16:54:52 CET:DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
---1184063623-2013-11-05 16:54:52 CET:HINT:  In a moment you should be able
to reconnect to the database and repeat your command.
GLTlogger[2777125]-gltdb-[local]-2908210-2013-11-05 16:54:52 CET:WARNING:
terminating connection because of crash of another server process
GLTlogger[2777125]-gltdb-[local]-2908210-2013-11-05 16:54:52 CET:DETAIL:
The postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
GLTlogger[2777125]-gltdb-[local]-2908210-2013-11-05 16:54:52 CET:HINT:  In a
moment you should be able to reconnect to the database and repeat your
command.
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:54:52 CET:WARNING:
terminating connection because of crash of another server process
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:54:52 CET:DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:54:52 CET:HINT:  In a
moment you should be able to reconnect to the database and repeat your
command.
ODBChist[5689466]-gltdb-[local]-9003121-2013-11-05 16:54:52 CET:CONTEXT:
COPY glt_dp_para_trend_1min, line 117201: "65732    2012-12-19
13:45:00+01    23016900.005000    1    I"
---1466408-2013-11-05 16:54:53 CET:LOG:  all server processes terminated;
reinitializing
---1752227881-2013-11-05 16:54:53 CET:LOG:  database system was interrupted;
last known up at 2013-11-05 16:52:06 CET
---1752227881-2013-11-05 16:54:53 CET:LOG:  database system was not properly
shut down; automatic recovery in progress
---1752227881-2013-11-05 16:54:53 CET:LOG:  redo starts at 38/D300E054
[unknown]-gltdb-[local]-1752231980-2013-11-05 16:55:00 CET:FATAL:  the
database system is in recovery mode
---1752227881-2013-11-05 16:55:10 CET:LOG:  invalid magic number 0000 in log
file 56, segment 221, offset 11935744
---1752227881-2013-11-05 16:55:10 CET:LOG:  redo done at 38/DDB60ED4
---1752227881-2013-11-05 16:55:10 CET:LOG:  last completed transaction was
at log time 2013-11-05 16:53:27.692009+01
---1752391757-2013-11-05 16:55:13 CET:LOG:  autovacuum launcher started
---1466408-2013-11-05 16:55:13 CET:LOG:  database system is ready to accept
connections


coredump:


Program terminated with signal 11, Segmentation fault.
[New pid 1478697 tid 1]
#0  0x00000000 in ?? ()


(gdb) bt
#0  0x00000000 in ?? ()
#1  0x08205ef4 in BackgroundWriterMain () at bgwriter.c:329
#2  0x080e2759 in AuxiliaryProcessMain (argc=2, argv=0x8047460) at
bootstrap.c:417
#3  0x0820bb87 in StartChildProcess (type=<value optimized out>) at
postmaster.c:4458
#4  0x0820deca in reaper (postgres_signal_arg=18) at postmaster.c:2418
#5  <signal handler called>
#6  0xb033e412 in SignalWaitinfo () from /lib/libc.so.3
#7  0xb0371ef2 in _poll () from /lib/libc.so.3
#8  0xb0372253 in select () from /lib/libc.so.3
#9  0x0820f46b in PostmasterMain (argc=8, argv=0x84b5100) at
postmaster.c:1414
#10 0x081b6bfb in main (argc=8, argv=0x84b5100) at main.c:188


(gdb) up
#1  0x08205ef4 in BackgroundWriterMain () at bgwriter.c:329
329     bgwriter.c: No such file or directory.
        in bgwriter.c


329@bgwriter.c: MemoryContextResetAndDeleteChildren(bgwriter_context);


(gdb) print bgwriter_context
$1 = (MemoryContext) 0x84c0c00


(gdb) print *bgwriter_context
$3 = {type = T_Invalid, methods = 0x84c0d00, parent = 0x10, firstchild =
0x0, nextchild = 0x0, name = 0x0}
(gdb)






Example three (using a debug build):


psql-gltdb-[local]-29880389-2013-11-05 22:27:39 CET:LOG:  duration:
244502.586 ms  statement: COPY glttab018.pdbt_d4_contobject FROM STDIN;
psql-gltdb-[local]-30502981-2013-11-05 22:32:14 CET:LOG:  duration:
274090.058 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-31166533-2013-11-05 22:36:52 CET:LOG:  duration:
276984.615 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-31858757-2013-11-05 22:41:40 CET:LOG:  duration:
288205.898 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-32526405-2013-11-05 22:46:31 CET:LOG:  duration:
290317.574 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-33275974-2013-11-05 22:51:23 CET:LOG:  duration:
291812.346 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-33972293-2013-11-05 22:56:12 CET:LOG:  duration:
288272.888 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-34664517-2013-11-05 23:01:19 CET:LOG:  duration:
306371.118 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-35409990-2013-11-05 23:06:18 CET:LOG:  duration:
298632.302 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-36130886-2013-11-05 23:11:18 CET:LOG:  duration:
298665.298 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-36851782-2013-11-05 23:16:22 CET:LOG:  duration:
304064.471 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-37576773-2013-11-05 23:21:38 CET:LOG:  duration:
315617.703 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-38326341-2013-11-05 23:26:54 CET:LOG:  duration:
315162.772 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-39100485-2013-11-05 23:32:13 CET:LOG:  duration:
317958.345 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-39850053-2013-11-05 23:37:45 CET:LOG:  duration:
331930.207 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-40628293-2013-11-05 23:43:03 CET:LOG:  duration:
317776.373 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
psql-gltdb-[local]-41427014-2013-11-05 23:48:48 CET:LOG:  duration:
343849.383 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
---1634347-2013-11-05 23:50:58 CET:ERROR:  could not seek to block 13243 in
file "pg_tblspc/78600/PG_9.0_201008051/16387/79270": Interrupted function
call
---1634347-2013-11-05 23:50:58 CET:CONTEXT:  writing block 13243 of relation
pg_tblspc/78600/PG_9.0_201008051/16387/79270
psql-gltdb-[local]-42233926-2013-11-05 23:57:52 CET:LOG:  duration:
544057.747 ms  statement: COPY glttab018.pdbt_d4_contparameter FROM STDIN;
---1613866-2013-11-05 23:59:21 CET:LOG:  background writer process (PID
1634347) was terminated by signal 11: segmentation violation
---1613866-2013-11-05 23:59:21 CET:LOG:  terminating any other active server
processes
psql-gltdb-[local]-43360325-2013-11-05 23:59:21 CET:WARNING:  terminating
connection because of crash of another server process
psql-gltdb-[local]-43360325-2013-11-05 23:59:21 CET:DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
psql-gltdb-[local]-43360325-2013-11-05 23:59:21 CET:HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
psql-gltdb-[local]-43360325-2013-11-05 23:59:21 CET:CONTEXT:  COPY
pdbt_d4_contparameter, line 82696:
"919360    14666    2    0    1    1    0    0    0    Kühlzeit    0    0    0    I    10001    5224    0    -1"
---1613866-2013-11-05 23:59:23 CET:LOG:  all server processes terminated;
reinitializing
[unknown]-gltdb-[local]-43606062-2013-11-05 23:59:24 CET:FATAL:  the
database system is in recovery mode
---43606059-2013-11-05 23:59:24 CET:LOG:  database system was interrupted;
last known up at 2013-11-05 23:50:10 CET
---1613866-2013-11-05 23:59:24 CET:LOG:  received SIGHUP, reloading
configuration files
---43606059-2013-11-05 23:59:25 CET:LOG:  could not remove cache file
"base/849886/pg_internal.init": Interrupted function call
---43606059-2013-11-05 23:59:26 CET:LOG:  database system was not properly
shut down; automatic recovery in progress
---43606059-2013-11-05 23:59:26 CET:LOG:  redo starts at 14/B300F554
---1613866-2013-11-05 23:59:27 CET:LOG:  received SIGHUP, reloading
configuration files
---1613866-2013-11-05 23:59:27 CET:LOG:  parameter "autovacuum" changed to
"on"
[unknown]-gltdb-[local]-43773970-2013-11-05 23:59:32 CET:FATAL:  the
database system is in recovery mode
[unknown]-gltdb-[local]-43810834-2013-11-05 23:59:33 CET:FATAL:  the
database system is in recovery mode
[unknown]-gltdb-[local]-44539922-2013-11-06 00:04:35 CET:FATAL:  the
database system is in recovery mode
---43606059-2013-11-06 00:07:24 CET:LOG:  could not open file
"pg_xlog/0000000100000014000000C1" (log file 20, segment 193): No such file
or directory
---43606059-2013-11-06 00:07:24 CET:LOG:  redo done at 14/C0FFFFB0
---43606059-2013-11-06 00:07:24 CET:LOG:  last completed transaction was at
log time 2013-11-05 23:57:52.607898+01
---44941357-2013-11-06 00:07:29 CET:LOG:  autovacuum launcher started
---1613866-2013-11-06 00:07:29 CET:LOG:  database system is ready to accept
connections
---1613866-2013-11-06 00:13:02 CET:LOG:  received SIGHUP, reloading
configuration files
vacuumdb-gltdb-[local]-45871174-2013-11-06 01:14:08 CET:WARNING:  relation
"pdbt_d4_contparameter" page 21075 is uninitialized --- fixing
vacuumdb-gltdb-[local]-45871174-2013-11-06 01:14:08 CET:WARNING:  relation
"pdbt_d4_contparameter" page 21076 is uninitialized --- fixing
vacuumdb-gltdb-[local]-45871174-2013-11-06 01:14:08 CET:WARNING:  relation
"pdbt_d4_contparameter" page 21077 is uninitialized --- fixing
vacuumdb-gltdb-[local]-45871174-2013-11-06 01:18:02 CET:LOG:  duration:
3896493.745 ms  statement: VACUUM (ANALYZE);




contents of the core file:


Program terminated with signal 11, Segmentation fault.
[New pid 1634347 tid 1]
#0  0x00000000 in ?? ()


(gdb) bt
#0  0x00000000 in ?? ()
#1  0x08205ef4 in BackgroundWriterMain () at bgwriter.c:329
#2  0x080e2759 in AuxiliaryProcessMain (argc=2, argv=0x8047460) at
bootstrap.c:417
#3  0x0820bb87 in StartChildProcess (type=<value optimized out>) at
postmaster.c:4458
#4  0x0820deca in reaper (postgres_signal_arg=18) at postmaster.c:2418
#5  <signal handler called>
#6  0xb033e412 in SignalWaitinfo () from /lib/libc.so.3
#7  0xb0371ef2 in _poll () from /lib/libc.so.3
#8  0xb0372253 in select () from /lib/libc.so.3
#9  0x0820f46b in PostmasterMain (argc=8, argv=0x84b5100) at
postmaster.c:1414
#10 0x081b6bfb in main (argc=8, argv=0x84b5100) at main.c:188


(gdb) up
#1  0x08205ef4 in BackgroundWriterMain () at bgwriter.c:329
329     bgwriter.c: No such file or directory.
        in bgwriter.c


(gdb) print *bgwriter_context
$1 = {type = T_Invalid, methods = 0x84c0d00, parent = 0x10, firstchild =
0x0, nextchild = 0x0, name = 0x0}

Re: BUG #8579: CoreDump of background writer process

From
Tom Lane
Date:
rgr@cslab.de writes:
> [ bgwriter crashes here after reporting an error: ]
> (gdb) up
> #1  0x08205ef4 in BackgroundWriterMain () at bgwriter.c:329
> 329     bgwriter.c: No such file or directory.
>         in bgwriter.c
> 329@bgwriter.c: MemoryContextResetAndDeleteChildren(bgwriter_context);
>
> (gdb) print bgwriter_context
> $1 = (MemoryContext) 0x84c0c00
>
> (gdb) print *bgwriter_context
> $3 = {type = T_Invalid, methods = 0x84c0d00, parent = 0x10, firstchild =
> 0x0, nextchild = 0x0, name = 0x0}

The only theory I can think of for that is that the bgwriter_context
local variable wasn't correctly saved and restored by sigsetjmp ...
which IMO would be a compiler bug, but possibly a compiler weenie
would have a different opinion.  What compiler are you using, anyway,
and with what optimization options?

If this theory is correct, attaching "volatile" to the declaration of
bgwriter_context (line 202) ought to work around the problem.  Or
if that doesn't work, mark the variable static.
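
For illustration, here is a minimal self-contained sketch of the pattern under
discussion (not the actual bgwriter.c code; the DemoContext type and the
function names are invented for this example).  Declaring the local pointer
volatile forces it to live in memory rather than in a register, so its value
cannot be clobbered across the siglongjmp() even if the register save/restore
misbehaves:

#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* stand-in for PostgreSQL's MemoryContext, purely for demonstration */
typedef struct DemoContext
{
    const char *name;
} DemoContext;

static sigjmp_buf recovery_point;

static void
demo_main_loop(void)
{
    /* "volatile" here is the proposed one-line workaround */
    DemoContext * volatile cxt = malloc(sizeof(DemoContext));

    cxt->name = "bgwriter demo context";

    if (sigsetjmp(recovery_point, 1) != 0)
    {
        /*
         * Error-recovery path, reached via siglongjmp() (in PostgreSQL
         * this is what ereport(ERROR) ends up doing).  Without the
         * volatile qualifier the compiler may have cached cxt in a
         * register, and a faulty register restore would leave garbage
         * here -- matching the corrupted MemoryContext in the core dump.
         */
        printf("recovered, context is still \"%s\"\n", cxt->name);
        free(cxt);
        return;
    }

    /* simulate an error being raised somewhere in the main loop */
    siglongjmp(recovery_point, 1);
}

int
main(void)
{
    demo_main_loop();
    return 0;
}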

It'd also be interesting to know if this was the first error reported
by the bgwriter process, or if it successfully recovered from any
previous errors.  (Some trawling of your postmaster log should reveal
this.)  If the latter, it'd suggest there's something funny about
returning to the same sigsetjmp context multiple times; which would
perhaps be a bug worth filing with your OS vendor.
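
If it turns out the bgwriter had already recovered from earlier errors, a
small standalone test along these lines (independent of PostgreSQL; all names
are invented) could demonstrate the suspected problem to the OS vendor: it
siglongjmp()s back to the same sigsetjmp() point several times from a signal
handler and checks that a volatile local keeps its value each time:

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static sigjmp_buf jump_buf;

static void
on_sigusr1(int signo)
{
    (void) signo;
    /* jump back to the sigsetjmp() in main(), much as ereport(ERROR) does */
    siglongjmp(jump_buf, 1);
}

int
main(void)
{
    char * volatile marker = malloc(16);
    volatile int    bounces = 0;
    struct sigaction sa;

    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);

    /* savemask = 1, so SIGUSR1 is unblocked again after each jump */
    if (sigsetjmp(jump_buf, 1) != 0)
    {
        /* re-entered after a siglongjmp() out of the signal handler */
        bounces++;
        printf("bounce %d, marker is still %p\n", bounces, (void *) marker);
    }

    if (bounces < 5)
        raise(SIGUSR1);         /* triggers another round trip */

    free(marker);
    return 0;
}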

            regards, tom lane

Re: BUG #8579: CoreDump of background writer process

From
Alvaro Herrera
Date:
rgr@cslab.de wrote:

> We are using QNX 6.5 SP 1 and a port based on the pkgsrc.

I thought we no longer supported QNX ... in fact, I think we removed it
before the 8.2 release.  So what's this port you're using?  This
probably means a heavily patched source.

Author: Bruce Momjian <bruce@momjian.us>
Branch: master Release: REL8_2_BR [6f84b2da7] 2006-01-05 02:43:38 +0000

    Remove QNX file.

Author: Bruce Momjian <bruce@momjian.us>
Branch: master Release: REL8_2_BR [a1675649e] 2006-01-05 01:56:30 +0000

    Remove QNX port.


> ---1519657-2013-11-01 00:10:23 CET:ERROR:  could not seek to block 149256 in
> file "pg_tblspc/17639/PG_9.0_201008051/16387/28109.1": Interrupted function
> call
> ---1519657-2013-11-01 00:10:23 CET:CONTEXT:  writing block 149256 of
> relation pg_tblspc/17639/PG_9.0_201008051/16387/28109

This is also a bit funny.  Normally each segment is 1GB long, unless you
configure it otherwise; this means a segment has only 131072 blocks.
But you have at least 149256 blocks here.
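
If the block number in that message is the relation-level block number (as in
the stock md.c error message), then with the default build settings -- 8 kB
blocks and 1 GB (131072-block) segments, which is an assumption here -- it
maps into the ".1" segment file like this:

#include <stdio.h>

int
main(void)
{
    const unsigned int blcksz   = 8192;     /* default block size */
    const unsigned int relseg   = 131072;   /* blocks per 1 GB segment */
    const unsigned int blocknum = 149256;   /* from the error message */

    printf("segment file suffix : .%u\n", blocknum / relseg);   /* .1 */
    printf("block within segment: %u\n", blocknum % relseg);    /* 18184 */
    printf("seek offset in bytes: %llu\n",
           (unsigned long long) blcksz * (blocknum % relseg));  /* 148963328 */
    return 0;
}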

I wonder if the problem is mishandling of signals -- i.e. perhaps the
port is at fault, or maybe it was right in 8.3 but we changed something
that would affect the port, and it wasn't properly updated to match.


> ---1507368-2013-11-01 00:11:58 CET:LOG:  background writer process (PID
> 1519657) was terminated by signal 11: segmentation violation
> ---1507368-2013-11-01 00:11:58 CET:LOG:  terminating any other active server
> processes
>
>
> contents of the dump-file:
>
>
> #0  0x00000000 in ?? ()
> #1  0x08205ef4 in BackgroundWriterMain ()
> #2  0x080e2759 in AuxiliaryProcessMain ()

Not very useful, is it :-(

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #8579: CoreDump of background writer process

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I wonder if the problem is mishandling of signals -- i.e. perhaps the
> port is at fault, or maybe it was right in 8.3 but we changed something
> that would affect the port, and it wasn't properly updated to match.

The postmaster log looked like the problem was triggered by SIGHUP
arriving while the bgwriter was doing an lseek().  It's not usual
for seeks to be interruptable, though, unless maybe you're running
the database over NFS?  I tend to not trust that kind of arrangement
much, mainly because NFS exposes you to all sorts of poorly-tested
error recovery paths.  Like this one.
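
For context, lseek() on a local filesystem is not normally expected to fail
with EINTR at all, which is what makes the "Interrupted function call" from a
seek so suspicious.  The sketch below (not taken from the PostgreSQL sources;
the helper name is invented) shows the generic retry idiom that would be
needed if a platform really did return EINTR from seeks:

#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* hypothetical helper: retry an lseek() that fails with EINTR */
static off_t
lseek_eintr_retry(int fd, off_t offset, int whence)
{
    off_t pos;

    do
    {
        pos = lseek(fd, offset, whence);
    } while (pos == (off_t) -1 && errno == EINTR);

    return pos;     /* (off_t) -1 with errno set on a real failure */
}

int
main(void)
{
    /* example use: rewind stdin if it happens to be seekable */
    if (lseek_eintr_retry(0, 0, SEEK_SET) == (off_t) -1)
        perror("lseek");
    return 0;
}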

Anyway, in theory the bgwriter ought to be able to recover from
such an error.  Somehow the local state of BackgroundWriterMain
is getting messed up, though.

>> #0  0x00000000 in ?? ()
>> #1  0x08205ef4 in BackgroundWriterMain ()
>> #2  0x080e2759 in AuxiliaryProcessMain ()

> Not very useful, is it :-(

The OP did provide a stack trace with debug symbols, further down.

            regards, tom lane

Re: BUG #8579: CoreDump of background writer process

From
Rene Grün
Date:
alvherre@2ndquadrant.com wrote:


> I thought we no longer supported QNX ... in fact, I think we removed it
> before the 8.2 release.  So what's this port you're using?  This probably
> means a heavily patched source.

Yes, we had to add some patches, but not as many as you might expect. We were
running the 8.3 release with the same patches for a long time without these
kinds of troubles, and I did not have to add many more patches to get the new
release running and passing the regression tests.

Patched Files:

patch-configure
patch-configure.in
patch-contrib_Makefile
patch-src_Makefile.shlib
patch-src_backend_Makefile
patch-src_backend_access_common_reloptions.c
patch-src_backend_libpq_pqsignal.c
patch-src_backend_port_Makefile
patch-src_backend_port_ipc__test.c
patch-src_backend_port_sysv__shmem.c
patch-src_backend_postmaster_postmaster.c
patch-src_backend_replication_libpqwalreceiver_Makefile
patch-src_backend_snowball_Makefile
patch-src_backend_tcop_postgres.c
patch-src_backend_utils_init_postinit.c
patch-src_backend_utils_mb_conversion__procs_proc.mk
patch-src_backend_utils_misc_guc.c
patch-src_backend_utils_misc_ps__status.c
patch-src_bin_pg__dump_pg__backup.h
patch-src_bin_pg__dump_pg__backup__archiver.c
patch-src_bin_pg__dump_pg__backup__archiver.h
patch-src_bin_pg__dump_pg__backup__tar.c
patch-src_bin_pg__dump_pg__dump.c
patch-src_bin_pg__dump_pg__dump.h
patch-src_bin_pg__dump_pg__dumpall.c
patch-src_include_pg__config__manual.h
patch-src_include_regex_regguts.h
patch-src_include_storage_s__lock.h
patch-src_include_tcop_tcopprot.h
patch-src_interfaces_ecpg_ecpglib_misc.c
patch-src_interfaces_ecpg_include_ecpglib.h
patch-src_interfaces_libpq_pqsignal.c
patch-src_pl_plpgsql_src_Makefile
patch-src_port_isinf.c
patch-src_test_regress_pg__regress.c
patch-src_test_regress_resultmap
patch-src_timezone_localtime.c
patch-src_timezone_private.h
patch-src_timezone_strftime.c
patch-src_timezone_zic.c

Files to be added:

GNUmakefile.libpq
contrib_dbmirrorc_AddTrigger.sql
contrib_dbmirrorc_DBMirror.pl
contrib_dbmirrorc_Makefile
contrib_dbmirrorc_MirrorSetup.sql
contrib_dbmirrorc_README.dbmirror
contrib_dbmirrorc_clean__pending.pl
contrib_dbmirrorc_pending.c
contrib_dbmirrorc_slaveDatabase.conf
dynloader-ltdl.h
src_backend_port_dynloader_qnx6.c
src_backend_port_dynloader_qnx6.h
src_backend_port_qnx6_Makefile
src_backend_port_qnx6_shm.c
src_backend_port_qnx6_shm.h
src_include_port_qnx6.h
src_makefiles_Makefile.qnx6
src_template_qnx6



> This is also a bit funny.  Normally each segment is 1GB long, unless you
> configure it otherwise; this means a segment has only 131072 blocks.
> But you have at least 149256 blocks here.

These are the changes in our postgresql.conf; all other settings are commented out:

unix_socket_directory = '/bac/sys/data'
tcp_keepalives_idle = 120
max_files_per_process = 200
vacuum_cost_delay = 20ms
checkpoint_segments = 6
log_destination = 'stderr'
logging_collector = on
log_directory = '/bac/sys/log'
log_filename = 'postgresql.log.%a'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_min_duration_statement = 60000
log_hostname = on
log_line_prefix = '%a-%d-%h-%p-%t:'
autovacuum = on
search_path = 'public,syscore,systab000,gltcore,glttab001'
bytea_output = 'escape'
datestyle = 'iso, mdy'
lc_messages = 'POSIX'
lc_monetary = 'POSIX'
lc_numeric = 'POSIX'
lc_time = 'POSIX'
max_locks_per_transaction = 100

I hope this helps...

With best regards from Krefeld,

CS-Lab GmbH
on behalf of René Grün

E-mail: rgr@cslab.de
Phone:  +49 2151 72949-0
Fax:    +49 2151 72949-9
---
CS-Lab GmbH (Creativ Software Labor GmbH)
Römerstr. 15
D-47809 Krefeld
Managing Director: Dieter Schmitz
Register court Krefeld, HRB 12257, VAT ID: DE 263 834 180




Re: BUG #8579: CoreDump of background writer process

From
Rene Grün
Date:
Thank you very much for your help.

I will try to give more information.

We are using gcc 4.4.2 as provided by QNX; it is the default compiler for this system.
I compiled the port in a pkgsrc environment, so I can't tell you which optimization options were used.


There are no other matches for "background" in the logfiles.


We are using a normal QNX4 filesystem directly on the hard disk (AHCI).

The SIGHUP only appears in the first example.


From the QNX documentation for the lseek() function:

Classification:

lseek() is POSIX 1003.1

Safety:
Cancellation point    No
Interrupt handler     No
Signal handler        Yes
Thread                Yes




With best regards from Krefeld,

CS-Lab GmbH
on behalf of René Grün

E-mail: rgr@cslab.de
Phone:  +49 2151 72949-0
Fax:    +49 2151 72949-9
---
CS-Lab GmbH (Creativ Software Labor GmbH)
Römerstr. 15
D-47809 Krefeld
Managing Director: Dieter Schmitz
Register court Krefeld, HRB 12257, VAT ID: DE 263 834 180






BUG #8579: CoreDump of background writer process

From
Rene Grün
Date:
Are there any further ideas about what the problem might be?

regards, René Grün