Thread: simple query terminated by signal 11

simple query terminated by signal 11

From

"Thomas Chille"

Date:

19 June 2006, 17:06:26

Hi List,

I ran into an error while dumping a database.

After investigating, I found a table that may be corrupted, but I am not sure,
and I don't know how to repair it. Could it be a hard drive error?

Here are the logs:

# all fine: SELECT * FROM hst_sales_report WHERE id = 5078866

[6208 / 2006-06-19 18:46:17 CEST]LOG:  00000: connection received:
host=[local] port=
[6208 / 2006-06-19 18:46:17 CEST]LOCATION:  BackendRun, postmaster.c:2679
[6208 / 2006-06-19 18:46:17 CEST]LOG:  00000: connection authorized:
user=postgres database=backoffice_db
[6208 / 2006-06-19 18:46:17 CEST]LOCATION:  BackendRun, postmaster.c:2751
[6208 / 2006-06-19 18:46:17 CEST]LOG:  00000: statement: SELECT * FROM
hst_sales_report WHERE id = 5078866
[6208 / 2006-06-19 18:46:17 CEST]LOCATION:  pg_parse_query, postgres.c:526
[6208 / 2006-06-19 18:46:18 CEST]LOG:  00000: duration: 117.638 ms
[6208 / 2006-06-19 18:46:18 CEST]LOCATION:  exec_simple_query, postgres.c:1076
[6208 / 2006-06-19 18:46:18 CEST]LOG:  00000: disconnection: session
time: 0:00:00.12 user=postgres database=backoffice_db host=[local]
port=
[6208 / 2006-06-19 18:46:18 CEST]LOCATION:  log_disconnections, postgres.c:3447

# now the error: SELECT * FROM hst_sales_report WHERE id = 5078867

[6216 / 2006-06-19 18:46:23 CEST]LOG:  00000: connection received:
host=[local] port=
[6216 / 2006-06-19 18:46:23 CEST]LOCATION:  BackendRun, postmaster.c:2679
[6216 / 2006-06-19 18:46:23 CEST]LOG:  00000: connection authorized:
user=postgres database=backoffice_db
[6216 / 2006-06-19 18:46:23 CEST]LOCATION:  BackendRun, postmaster.c:2751
[6216 / 2006-06-19 18:46:23 CEST]LOG:  00000: statement: SELECT * FROM
hst_sales_report WHERE id = 5078867
[6216 / 2006-06-19 18:46:23 CEST]LOCATION:  pg_parse_query, postgres.c:526
[3762 / 2006-06-19 18:46:23 CEST]LOG:  00000: server process (PID
6216) was terminated by signal 11
[3762 / 2006-06-19 18:46:23 CEST]LOCATION:  LogChildExit, postmaster.c:2358
[3762 / 2006-06-19 18:46:23 CEST]LOG:  00000: terminating any other
active server processes
[3762 / 2006-06-19 18:46:23 CEST]LOCATION:  HandleChildCrash, postmaster.c:2251
[3985 / 2006-06-19 18:46:23 CEST]WARNING:  57P02: terminating
connection because of crash of another server process
[3985 / 2006-06-19 18:46:23 CEST]DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly
corrupted shared memory.
[3985 / 2006-06-19 18:46:23 CEST]HINT:  In a moment you should be able
to reconnect to the database and repeat your command.
[3985 / 2006-06-19 18:46:23 CEST]LOCATION:  quickdie, postgres.c:1945
[3762 / 2006-06-19 18:46:23 CEST]LOG:  00000: all server processes
terminated; reinitializing
[3762 / 2006-06-19 18:46:23 CEST]LOCATION:  reaper, postmaster.c:2150
[6217 / 2006-06-19 18:46:23 CEST]LOG:  00000: database system was
interrupted at 2006-06-19 18:42:49 CEST
[6217 / 2006-06-19 18:46:23 CEST]LOCATION:  StartupXLOG, xlog.c:4094
[6217 / 2006-06-19 18:46:23 CEST]LOG:  00000: checkpoint record is at
11/3E77AB1C
[6217 / 2006-06-19 18:46:23 CEST]LOCATION:  StartupXLOG, xlog.c:4163
[6217 / 2006-06-19 18:46:23 CEST]LOG:  00000: redo record is at
11/3E774940; undo record is at 0/0; shutdown FALSE
[6217 / 2006-06-19 18:46:23 CEST]LOCATION:  StartupXLOG, xlog.c:4191
[6217 / 2006-06-19 18:46:23 CEST]LOG:  00000: next transaction ID:
3899415; next OID: 46429694
[6217 / 2006-06-19 18:46:23 CEST]LOCATION:  StartupXLOG, xlog.c:4194
[6217 / 2006-06-19 18:46:23 CEST]LOG:  00000: database system was not
properly shut down; automatic recovery in progress
[6217 / 2006-06-19 18:46:23 CEST]LOCATION:  StartupXLOG, xlog.c:4250
[6217 / 2006-06-19 18:46:23 CEST]LOG:  00000: redo starts at 11/3E774940
[6217 / 2006-06-19 18:46:23 CEST]LOCATION:  StartupXLOG, xlog.c:4287
[6217 / 2006-06-19 18:46:23 CEST]LOG:  00000: record with zero length
at 11/3E77AD20
[6217 / 2006-06-19 18:46:23 CEST]LOCATION:  ReadRecord, xlog.c:2496
[6217 / 2006-06-19 18:46:23 CEST]LOG:  00000: redo done at 11/3E77ACF8
[6217 / 2006-06-19 18:46:23 CEST]LOCATION:  StartupXLOG, xlog.c:4345
[6217 / 2006-06-19 18:46:23 CEST]LOG:  00000: database system is ready
[6217 / 2006-06-19 18:46:23 CEST]LOCATION:  StartupXLOG, xlog.c:4557

Can anyone help me, please?

regards,
thomas!

Re: simple query terminated by signal 11

From

"Qingqing Zhou"

Date:

20 June 2006, 02:15:36

"Thomas Chille" <thomas.chille@gmail.com> wrote
> Hi List,
>
> i run in to an error while dumping a db.
>
> after investigating it, i found a possible corrupted table. but i am not sure.
> and i dont know how i can repair it? could it be a harddrive error?
>
>
> # now the error: SELECT * FROM hst_sales_report WHERE id = 5078867
>
> [6216 / 2006-06-19 18:46:23 CEST]LOG:  00000: connection received:
> host=[local] port=
> [6216 / 2006-06-19 18:46:23 CEST]LOCATION:  BackendRun, postmaster.c:2679
> [6216 / 2006-06-19 18:46:23 CEST]LOG:  00000: connection authorized:
> user=postgres database=backoffice_db
> [6216 / 2006-06-19 18:46:23 CEST]LOCATION:  BackendRun, postmaster.c:2751
> [6216 / 2006-06-19 18:46:23 CEST]LOG:  00000: statement: SELECT * FROM
> hst_sales_report WHERE id = 5078867
> [6216 / 2006-06-19 18:46:23 CEST]LOCATION:  pg_parse_query, postgres.c:526
> [3762 / 2006-06-19 18:46:23 CEST]LOG:  00000: server process (PID
> 6216) was terminated by signal 11
> [3762 / 2006-06-19 18:46:23 CEST]LOCATION:  LogChildExit, postmaster.c:2358
> [3762 / 2006-06-19 18:46:23 CEST]LOG:  00000: terminating any other
> active server processes
> [3762 / 2006-06-19 18:46:23 CEST]LOCATION:  HandleChildCrash, postmaster.c:2251
> [3985 / 2006-06-19 18:46:23 CEST]WARNING:  57P02: terminating
> connection because of crash of another server process
> [3985 / 2006-06-19 18:46:23 CEST]DETAIL:  The postmaster has commanded
> this server process to roll back the current transaction and exit,
> because another server process exited abnormally and possibly
> corrupted shared memory.

Which version are you using? In any case, barring a random hardware error, we
expect Postgres to detect and report the problem rather than silently dump
core. Can you collect the core dump and post it here?

Regards,
Qingqing

Re: simple query terminated by signal 11

From

"Thomas Chille"

Date:

20 June 2006, 12:06:26

Hi Qingqing,

thanks for your reply!

The PostgreSQL version is 8.0.4, running on a Debian-based Linux
server with kernel 2.6.11.2.

I have never dealt with a core dump before, but after setting "ulimit -c
1024" I got one.
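For reference, a minimal sketch of how core dumps could be enabled before starting the server. The limit is inherited by child processes, so it has to be set in the shell that launches the postmaster. A small limit can leave a truncated core, which is why the sketch asks for "unlimited". The PGDATA variable and the pg_ctl restart line are assumptions about the local setup, not something stated in this thread.

```shell
#!/bin/sh
# Sketch: run as the postgres user in the shell that will start the postmaster.
# The core size limit is inherited by every backend the postmaster forks.

# Lift the core size limit; print a note if the hard limit forbids it.
ulimit -c unlimited 2>/dev/null || echo "could not raise core limit (hard limit?)" >&2

# Show the limit now in effect ("unlimited" or a block count).
core_limit=$(ulimit -c)
echo "core file size limit: $core_limit"

# Assumption: PGDATA points at the cluster's data directory.
# Uncomment to restart the server so that it picks up the new limit.
# pg_ctl -D "$PGDATA" restart
```

The restart line is commented out on purpose, since restarting a production server is a decision for the administrator.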

I don't know how to post it, because it is 1.5 MB in size. I will try to
attach it gzipped.

I also could not install gdb on the affected system, so I examined
the core dump on another machine (Gentoo) with Postgres 8.0.4
and got the following output:

spoonpc01 ~ # gdb /usr/bin/postgres core
GNU gdb 6.4
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".


warning: core file may not match specified executable file.
(no debugging symbols found)
Core was generated by `postgres: postgres backoffice_db [local] SELECT        '
.
Program terminated with signal 11, Segmentation fault.
#0  0x080753c2 in DataFill ()
(gdb) where
#0  0x080753c2 in DataFill ()
#1  0xb74253d4 in ?? ()
#2  0x0000001d in ?? ()
#3  0x08356fa8 in ?? ()
#4  0x08379420 in ?? ()
#5  0x00000000 in ?? ()
(gdb)

What I can also say is that I can reproduce the error every time with
the same query.

Thanks in advance

Re: simple query terminated by signal 11

From

"Qingqing Zhou"

Date:

21 June 2006, 01:56:20

"Thomas Chille" <thomas@chille.de> wrote
>
> I don't know how to post it, because the size is 1,5 MB?! I try to
> attch it as gzip.
>

No, I meant the "bt" (backtrace) output from the core dump:

$ gdb <postgres_exe_path> -c <core_file_name>
(gdb) bt

> .
> Program terminated with signal 11, Segmentation fault.
> #0  0x080753c2 in DataFill ()
> (gdb) where
> #0  0x080753c2 in DataFill ()
> #1  0xb74253d4 in ?? ()
> #2  0x0000001d in ?? ()
> #3  0x08356fa8 in ?? ()
> #4  0x08379420 in ?? ()
> #5  0x00000000 in ?? ()
> (gdb)
>

Since it is repeatable on your machine, you can compile a new Postgres
build configured with "--enable-cassert" (enables assertions in the code) and
"--enable-debug" (compiles with debugging symbols). Then run it on
your data and "bt" the core dump.
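A rough sketch of those two steps, offered only as an illustration. The source directory, the install prefix, and the helper names build_debug_postgres and backtrace_core are all hypothetical. It also assumes the debug build is the same release as the server that wrote the data (8.0.4 here), so that it can run against the existing data directory.

```shell
#!/bin/sh
# Sketch only: both paths below are assumptions, not taken from the thread.
PG_SRC=${PG_SRC:-./postgresql-8.0.4}      # unpacked source tree (assumed location)
PG_PREFIX=${PG_PREFIX:-/tmp/pg-debug}     # separate install prefix (assumed)

# Configure, build and install a server with assertions and debug symbols.
# Runs in a subshell so the caller's working directory is left unchanged.
build_debug_postgres() {
    (
        cd "$PG_SRC" || exit 1
        ./configure --prefix="$PG_PREFIX" --enable-debug --enable-cassert &&
        make &&
        make install
    )
}

# Print a backtrace for the core file given as $1, without an interactive session.
backtrace_core() {
    cmds=$(mktemp) || return 1
    echo bt > "$cmds"                     # gdb command file containing just "bt"
    gdb -batch -x "$cmds" "$PG_PREFIX/bin/postgres" "$1"
    status=$?
    rm -f "$cmds"
    return $status
}
```

Usage would be along the lines of running build_debug_postgres once, starting the new binary on the data directory, repeating the failing query, and then calling backtrace_core with the path of the resulting core file. The core has to be examined with the same binary that produced it, otherwise gdb warns that the core file may not match the executable and the frames are unreliable.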

Regards,
Qingqing

Re: simple query terminated by signal 11

From

"Thomas Chille"

Date:

22 June 2006, 13:59:50

Thanks for your tips!

> Since it is repeatable in your machine, you can compile a new postgres
> version with "--enable-cassert" (enable assertions in code) and
> "--enable-debug"  (enable gcc debug support) configuration. Then run it on
> your data and "bt" the core dump.

I will try to find out the reason for that behavior.

For now I was able to drop the damaged table and restore it from an older
backup, so everything works fine again.
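A sketch of what such a single-table restore might look like. The thread does not say how the backup was made, so this assumes a custom-format archive (pg_dump -Fc). BACKUP_FILE is a hypothetical path and restore_table is a hypothetical helper name. Restoring with -t may not bring back indexes or constraints, so those would need checking afterwards.

```shell
#!/bin/sh
# Sketch only. BACKUP_FILE is a hypothetical path; the thread does not name the backup.
DB=backoffice_db
TABLE=hst_sales_report
BACKUP_FILE=${BACKUP_FILE:-/path/to/older_backup.dump}

# Drop the damaged table, then restore only that table from the archive.
# The restore is skipped if the DROP fails (for example because of dependent objects).
restore_table() {
    psql -d "$DB" -c "DROP TABLE $TABLE" &&
    pg_restore -d "$DB" -t "$TABLE" "$BACKUP_FILE"
}
```

Keeping a copy of the damaged data files before dropping the table would preserve the evidence for a later look at the cause.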

regards,
thomas!