Thread: postmaster crash

postmaster crash

From
"Michael Beckstette"
Date:
Hi,

from time to time my postmaster crashes with the below-mentioned error message.

I am using PostgreSQL 7.1.2 on sparc-sun-solaris2.5.1, compiled by GCC 2.95 on
a 4CPU/4GB UltraSparc II system.

The crash happened during a VACUUM ANALYZE, which I perform every 30 minutes.
At the time of the crash I had several open client connections that were
simultaneously trying to fill one table (table1) using 'COPY table1 FROM stdin.....'.

LOG:
DEBUG:  --Relation cluster_nodes--
DEBUG:  Pages 52: Changed 0, reaped 52, Empty 0, New 0; Tup 23: Vac 1914,
Keep/VTL 0/0, Crash 0, UnUsed 28, MinLen 120, MaxLen 372; Re-using: Free/Avail.
Space 411044/405288; EndEmpty/Avail. Pages 0/51. CPU 0.00s/0.00u sec.
DEBUG:  Index node_index: Pages 14; Tuples 23: Deleted 1914. CPU 0.00s/0.08u
sec.
DEBUG:  Rel cluster_nodes: Pages: 52 --> 1; Tuple(s) moved: 23. CPU 0.00s/0.05u
sec.
DEBUG:  Index node_index: Pages 14; Tuples 23: Deleted 23. CPU 0.00s/0.00u sec.
DEBUG:  MoveOfflineLogs: remove 0000000D000000A3
FATAL 2:  MoveOfflineLogs: cannot read xlog dir: Invalid argument
Server process (pid 11560) exited with status 512 at Mon Mar 25 22:00:53 2002
Terminating any active server processes...
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend  died abnormally
and possibly corrupted shared memory.
        I have rolled back the current transaction and am       going to
terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend  died abnormally
and possibly corrupted shared memory.
        I have rolled back the current transaction and am       going to
terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
'
'
'
'
'
'
Server processes were terminated at Mon Mar 25 22:00:53 2002
Reinitializing shared memory and semaphores
DEBUG:  database system was interrupted at 2002-03-25 22:00:53 MET
DEBUG:  CheckPoint record at (13, 2751970248)
DEBUG:  Redo record at (13, 2751936936); Undo record at (0, 0); Shutdown FALSE
DEBUG:  NextTransactionId: 254232066; NextOid: 52967926
DEBUG:  database system was not properly shut down; automatic recovery in
progress...
DEBUG:  redo starts at (13, 2751936936)
DEBUG:  ReadRecord: record with zero len at (13, 2751970312)
DEBUG:  redo done at (13, 2751970248)
DEBUG:  database system is in production state

Thanks for an almost :) excellent product.



Regards
Michael Beckstette

Re: postmaster crash

From
Tom Lane
Date:
"Michael Beckstette" <mbeckste@TechFak.Uni-Bielefeld.DE> writes:
> from time to time my postmaster crashes with the below-mentioned error message.

> DEBUG:  MoveOfflineLogs: remove 0000000D000000A3
> FATAL 2:  MoveOfflineLogs: cannot read xlog dir: Invalid argument

Hmm.  I cannot see a reason for the xlog directory to be unreadable,
especially not if it's readable most of the time.

The code that is reporting this failure is in MoveOfflineLogs() in
src/backend/access/transam/xlog.c.  You could maybe add some
additional debug logging there to try to understand what is going wrong
... but I sure don't see any reason for readdir() to fail, especially
not if it's succeeded on previous calls --- and your log indicates it's
succeeded at least once.
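
As a rough illustration of the kind of extra logging meant here, the sketch below scans a directory and reports the errno value whenever opendir() or readdir() fails. It is not the actual xlog.c code: the function name scan_dir_logged and its log-stream parameter are made up for this example, and only standard POSIX calls are used.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL source: count the entries in `path`,
 * logging the errno value if opendir() or readdir() fails.
 * Returns the number of entries read, or -1 on error.
 */
int
scan_dir_logged(const char *path, FILE *log)
{
    DIR *dir;
    struct dirent *entry;
    int count = 0;

    assert(path != NULL && log != NULL);

    dir = opendir(path);
    if (dir == NULL)
    {
        fprintf(log, "opendir(\"%s\") failed: errno %d (%s)\n",
                path, errno, strerror(errno));
        return -1;
    }

    for (;;)
    {
        /*
         * readdir() returns NULL both at end of directory and on error;
         * clearing errno before each call is how the two are told apart.
         */
        errno = 0;
        entry = readdir(dir);
        if (entry == NULL)
            break;
        count++;
    }

    if (errno != 0)
    {
        int saved_errno = errno;    /* later calls may overwrite errno */

        fprintf(log, "readdir(\"%s\") failed after %d entries: errno %d (%s)\n",
                path, count, saved_errno, strerror(saved_errno));
        closedir(dir);
        return -1;
    }

    closedir(dir);
    return count;
}
```

Calling scan_dir_logged(path, stderr) on the xlog directory would record how many entries were read before a failure, which might help show whether the "Invalid argument" error appears partway through a scan or on the very first readdir() call.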

Are you perhaps running Postgres over an NFS mount?  That's widely
considered unreliable.

            regards, tom lane

Re: postmaster crash

From
"Michael Beckstette"
Date:
Hi Tom,

thanks for your quick response.
Yes, the DB is on an NFS-mounted volume, but I have checked my logs: there are no
NFS error messages. As I said, this error occurs only from time to time (3
times in the last 6 months, I guess). The only thing I noticed each time (apart
from the same xlog dir error message) was that it happened while
transferring large amounts of data (10 clients simultaneously, each ca. 100MB, to
one single table using COPY).


Michael