[7.4.1] signal 11 while accesing DB - Mailing list pgsql-general

From Marcin
Subject [7.4.1] signal 11 while accesing DB
Date
Msg-id 20050124174606Z2660446-20169+489@kps3.test.onet.pl
Whole thread Raw
Responses Re: [7.4.1] signal 11 while accesing DB
Re: [7.4.1] signal 11 while accesing DB
List pgsql-general
Hi,
Yesterday morning one of our PostgreSQL engines stopped unexpectedly,
with the following log:

postgres[4192]: [5105-1] LOG:  server process (PID 26678) was terminated by signal 11
postgres[4192]: [5106-1] LOG:  terminating any other active server processes
postgres[4192]: [5107-1] LOG:  all server processes terminated; reinitializing
postgres[26717]: [5108-1] LOG:  database system was interrupted at 2005-01-23 07:23:37 CET
postgres[26717]: [5109-1] LOG:  checkpoint record is at 31B/FE2C9BA4
postgres[26717]: [5110-1] LOG:  redo record is at 31B/FE00A484; undo record is at 0/0; shutdown FALSE
postgres[26717]: [5111-1] LOG:  next transaction ID: 390435693; next OID: 678576128
postgres[26717]: [5112-1] LOG:  database system was not properly shut down; automatic recovery in progress
postgres[26717]: [5113-1] LOG:  redo starts at 31B/FE00A484
postgres[26717]: [5114-1] LOG:  record with zero length at 31B/FEFCE3A8
postgres[26717]: [5115-1] LOG:  redo done at 31B/FEFCE384
postgres[26717]: [5116-1] LOG:  removing transaction log file "0000031B000000FB"
postgres[26717]: [5117-1] LOG:  removing transaction log file "0000031B000000FC"
postgres[26717]: [5118-1] LOG:  removing transaction log file "0000031B000000FD"
postgres[26717]: [5119-1] LOG:  database system is ready

and then keep crashing as client connects to one, specified database:

postgres[4192]: [5108-1] LOG:  server process (PID 4501) was terminated by signal 11
postgres[4192]: [5109-1] LOG:  terminating any other active server processes
postgres[4192]: [5110-1] LOG:  all server processes terminated; reinitializing
postgres[4621]: [5111-1] LOG:  database system was interrupted at 2005-01-23 07:25:32 CET
postgres[4621]: [5112-1] LOG:  checkpoint record is at 31B/FEFCE3A8
postgres[4621]: [5113-1] LOG:  redo record is at 31B/FEFCE3A8; undo record is at 0/0; shutdown TRUE
postgres[4621]: [5114-1] LOG:  next transaction ID: 390439982; next OID: 678592512
postgres[4621]: [5115-1] LOG:  database system was not properly shut down; automatic recovery in progress
postgres[4621]: [5116-1] LOG:  record with zero length at 31B/FEFCE3E8
postgres[4621]: [5117-1] LOG:  redo is not required
postgres[4621]: [5118-1] LOG:  database system is ready
postgres[4192]: [5111-1] LOG:  server process (PID 16866) was terminated by signal 11

I find out two ways to crash the server:
1. Connecting to the one, specified database ("data")
2. Creating another database, and trying to access it

Other databases works find, I can add or remove records with no problem.

I copied the PGDATA to another machine for debugging and restored the PGDATA from night dump. Everything went smoothly.
On debugging machine with exactly the same version of PostgreSQL the
backend keep crashing:

[16350] DEBUG:  /usr/lib/postgresql/bin/postmaster: PostmasterMain: initial environ dump:
[16350] DEBUG:  -----------------------------------------
[16350] DEBUG:      PWD=/DEBUG/lib/postgres/data
[16350] DEBUG:      USER=postgres
[16350] DEBUG:      MAIL=/root/Mailbox
[16350] DEBUG:      LANG=pl_PL
[16350] DEBUG:      LOGNAME=root
[16350] DEBUG:      SHLVL=4
[16350] DEBUG:      SHELL=/bin/sh
[16350] DEBUG:      HOME=/var/lib/postgres
[16350] DEBUG:      TERM=linux
[16350] DEBUG:      PGDATA=/DEBUG/lib/postgres/data
[16350] DEBUG:      PATH=/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games
[16350] DEBUG:      LESSCHARSET=latin1
[16350] DEBUG:      SSH_TTY=/dev/pts/0
[16350] DEBUG:      _=/usr/lib/postgresql/bin/postmaster
[16350] DEBUG:  -----------------------------------------
[16350] DEBUG:  found "/usr/lib/postgresql/bin/postgres" using argv[0]
[16350] LOG:  could not create IPv6 socket: Rodzina adresów nie obsługiwana przez protokół
[16350] DEBUG:  invoking IpcMemoryCreate(size=31940608)
[16350] DEBUG:  found "/usr/lib/postgresql/bin/postmaster" using argv[0]
[16356] LOG:  database system was shut down at 2005-01-23 14:36:26 CET
[16356] LOG:  checkpoint record is at 31C/1F7C150
[16356] LOG:  redo record is at 31C/1F7C150; undo record is at 0/0; shutdown TRUE
[16356] LOG:  next transaction ID: 390466146; next OID: 679780352
[16356] LOG:  database system is ready
[16356] DEBUG:  proc_exit(0)
[16356] DEBUG:  shmem_exit(0)
[16356] DEBUG:  exit(0)
[16350] DEBUG:  reaping dead processes
[16350] DEBUG:  forked new backend, pid=16435 socket=8
[16435] DEBUG:  /usr/lib/postgresql/bin/postmaster child[16435]: starting with (
[16435] DEBUG:      postgres
[16435] DEBUG:      -v196608
[16435] DEBUG:      -p
[16435] DEBUG:      template1
[16435] DEBUG:  )
[16435] DEBUG:  InitPostgres
[16435] DEBUG:  StartTransactionCommand
[16435] DEBUG:  CommitTransactionCommand
[16435] DEBUG:  proc_exit(0)
[16435] DEBUG:  shmem_exit(0)
[16435] DEBUG:  exit(0)
[16350] DEBUG:  reaping dead processes
[16350] DEBUG:  child process (PID 16435) exited with exit code 0
[16350] DEBUG:  forked new backend, pid=16510 socket=8
[16510] DEBUG:  /usr/lib/postgresql/bin/postmaster child[16510]: starting with (
[16510] DEBUG:      postgres
[16510] DEBUG:      -v196608
[16510] DEBUG:      -p
[16510] DEBUG:      data
[16510] DEBUG:  )
[16510] DEBUG:  InitPostgres
[16350] DEBUG:  reaping dead processes
[16350] DEBUG:  child process (PID 16510) was terminated by signal 11
[16350] LOG:  server process (PID 16510) was terminated by signal 11
[16350] LOG:  terminating any other active server processes
[16350] LOG:  all server processes terminated; reinitializing
[16350] DEBUG:  shmem_exit(0)
[16350] DEBUG:  invoking IpcMemoryCreate(size=31940608)
[16516] LOG:  database system was interrupted at 2005-01-24 08:54:48 CET
[16516] LOG:  checkpoint record is at 31C/1F7C150
[16516] LOG:  redo record is at 31C/1F7C150; undo record is at 0/0; shutdown TRUE
[16516] LOG:  next transaction ID: 390466146; next OID: 679780352
[16516] LOG:  database system was not properly shut down; automatic recovery in progress
[16516] LOG:  record with zero length at 31C/1F7C190
[16516] LOG:  redo is not required
[16516] LOG:  database system is ready

the postgres process left the core, and the backtrace in gdb shows:

(gdb) bt
#0  0x081b4e79 in GetRawDatabaseInfo ()
#1  0x081b4865 in InitPostgres ()
#2  0x08154741 in PostgresMain ()
#3  0x0813437d in ClosePostmasterPorts ()
#4  0x08133c0f in ClosePostmasterPorts ()
#5  0x08132519 in PostmasterMain ()
#6  0x08131f50 in PostmasterMain ()
#7  0x081088df in main ()
#8  0x001ef14f in __libc_start_main () from /lib/libc.so.6

Some specs:
Main machine: Dual Xeon 2.8GHz, 2GB ECC RAM, two SCSI 73GBs 15K RPM.
Both machines are Debian Woody with 2.4 kernels, and own compilation of
PostgreSQL 7.4.1.
The server is running for nearly two years now, with PostgreSQL for all
the time, and as it's the first problem I encountered, I don't suspect
it's a hardware fault.

Thanks for your help,
--
Marcin

pgsql-general by date:

Previous
From: Michael Fuhr
Date:
Subject: Re: Ident authentication SOLVED
Next
From: Harald Fuchs
Date:
Subject: Re: on update / on delete performance of foreign keys