shared memory corruption - Mailing list pgsql-bugs

From Todd Nemanich
Subject shared memory corruption
Msg-id 3EC14F9A.2060403@twopunks.org
Responses Re: shared memory corruption  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
I filled out the template as asked. I'm not certain where the real bug
is with this, but if anyone has seen this, or can give some insight as
to where to look, I would appreciate it.

============================================================================
                         POSTGRESQL BUG REPORT TEMPLATE
============================================================================


Your name               :       Todd Nemanich
Your email address      :       todd@twopunks.org


System Configuration
---------------------
   Architecture (example: Intel Pentium)         :       4x Intel Xeon

   Operating System (example: Linux 2.0.26 ELF)  :       Linux 2.4.19

   PostgreSQL version (example: PostgreSQL-7.3.1):   PostgreSQL-7.3.1

   Compiler used (example:  gcc 2.95.2)          :       ? (PGDG 7.3.1 rpms on RH 7.3)


Please enter a FULL description of your problem:
------------------------------------------------
My PostgreSQL DB dropped into recovery mode but failed to restart.
Typically, 600-800 backends are running at any time.
Below are some excerpts from the postgres.log:

May 13 14:01:17 db3 postgres[2618]: [1] LOG:  server process (pid 14721) was terminated by signal 6
May 13 14:01:17 db3 postgres[2618]: [2] LOG:  terminating any other active server processes
May 13 14:01:17 db3 postgres[15044]: [1-1] WARNING:  Message from PostgreSQL backend:
May 13 14:01:17 db3 postgres[15044]: [1-2] ^IThe Postmaster has informed me that some other backend
May 13 14:01:17 db3 postgres[15044]: [1-3] ^Idied abnormally and possibly corrupted shared memory.
May 13 14:01:17 db3 postgres[15044]: [1-4] ^II have rolled back the current transaction and am
May 13 14:01:17 db3 postgres[15044]: [1-5] ^Igoing to terminate your database system connection and exit.
May 13 14:01:17 db3 postgres[15044]: [1-6] ^IPlease reconnect to the database system and repeat your query.
May 13 14:01:17 db3 postgres[15046]: [1-1] WARNING:  Message from PostgreSQL backend:
May 13 14:01:17 db3 postgres[15046]: [1-2] ^IThe Postmaster has informed me that some other backend
May 13 14:01:17 db3 postgres[15031]: [1-1] WARNING:  Message from PostgreSQL backend:
May 13 14:01:17 db3 postgres[15046]: [1-3] ^Idied abnormally and possibly corrupted shared memory.
May 13 14:01:17 db3 postgres[14650]: [1-1] WARNING:  Message from PostgreSQL backend:
May 13 14:01:17 db3 postgres[15046]: [1-4] ^II have rolled back the current transaction and am
May 13 14:01:17 db3 postgres[15042]: [1-1] WARNING:  Message from PostgreSQL backend:
May 13 14:01:17 db3 postgres[15032]: [1-1] WARNING:  Message from PostgreSQL backend:

<skip a couple thousand lines>

May 13 14:30:54 db3 postgres[2100]: [3] FATAL:  The database system is in recovery mode
May 13 14:30:54 db3 postgres[2132]: [3] FATAL:  The database system is in recovery mode
May 13 14:30:54 db3 postgres[2618]: [3] LOG:  fast shutdown request
May 13 14:30:54 db3 postgres[2618]: [4] LOG:  all server processes terminated; reinitializing shared memory and semaphores
May 13 14:30:54 db3 postgres[2139]: [5] FATAL:  The database system is shutting down
May 13 14:30:54 db3 postgres[2136]: [5] LOG:  database system was interrupted at 2003-05-13 14:00:10 EDT
May 13 14:30:54 db3 postgres[2138]: [5] FATAL:  The database system is shutting down
May 13 14:30:54 db3 postgres[2137]: [5] FATAL:  The database system is shutting down
May 13 14:30:54 db3 postgres[2140]: [5] FATAL:  The database system is shutting down
May 13 14:30:54 db3 postgres[2136]: [6] LOG:  checkpoint record is at ED/A7D7CD08
May 13 14:30:54 db3 postgres[2141]: [5] FATAL:  The database system is shutting down
May 13 14:30:54 db3 postgres[2136]: [7] LOG:  redo record is at ED/A7BBEF88; undo record is at 0/0; shutdown FALSE
May 13 14:30:54 db3 postgres[2136]: [8] LOG:  next transaction id: 754449278; next oid: 33734849
May 13 14:30:54 db3 postgres[2136]: [9] LOG:  database system was not properly shut down; automatic recovery in progress
May 13 14:30:54 db3 postgres[2136]: [10] LOG:  redo starts at ED/A7BBEF88
May 13 14:30:54 db3 postgres[2142]: [5] FATAL:  The database system is shutting down

<skip the shutdown messages>

May 13 14:31:22 db3 postgres[2816]: [5] FATAL:  The database system is shutting down
May 13 14:31:22 db3 postgres[2758]: [6] LOG:  recycled transaction log file 000000ED000000A9
May 13 14:31:22 db3 postgres[2758]: [7] LOG:  recycled transaction log file 000000ED000000AA
May 13 14:31:22 db3 postgres[2758]: [8] LOG:  recycled transaction log file 000000ED000000AB
May 13 14:31:22 db3 postgres[2758]: [9] LOG:  recycled transaction log file 000000ED000000A7
May 13 14:31:22 db3 postgres[2758]: [10] LOG:  recycled transaction log file 000000ED000000A8
May 13 14:31:22 db3 postgres[2758]: [11] LOG:  database system is shut down
May 13 14:31:36 db3 postgres[2877]: [1] FATAL:  The database system is starting up
May 13 14:31:36 db3 postgres[2876]: [1] LOG:  database system was shut down at 2003-05-13 14:31:22 EDT
May 13 14:31:36 db3 postgres[2876]: [2] LOG:  checkpoint record is at ED/ACB32480
May 13 14:31:36 db3 postgres[2876]: [3] LOG:  redo record is at ED/ACB32480; undo record is at 0/0; shutdown TRUE
May 13 14:31:36 db3 postgres[2876]: [4] LOG:  next transaction id: 754504089; next oid: 33734849
May 13 14:31:36 db3 postgres[2878]: [1] FATAL:  The database system is starting up
May 13 14:31:36 db3 postgres[2876]: [5] LOG:  database system is ready



Please describe a way to repeat the problem.   Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------
No idea. Any suggestions as to where to look for the cause would be
appreciated.




If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------
