Thread: FATAL 2: XLogWrite: write request 3/39000000 is past end of log 3/38FC2000 (fwd)

Hello,

I've recenly  been working with 7.2.1 and been having problems with
corrupting the DB when loading a new database with a Perl/DBI script doing
large numbers of inserts to a single table.  I was also seeing the problem
with 7.2

Environment - Linux RH7.2, Athlon, stock RH kernel.

After the back end process dies I try to reconnect and a simple query
returns:

database=# select count(*) from loadtable;
ERROR:  XLogFlush: request 800003/58ECB1A8 is not satisfied --- flushed
only to 3/86002234

Delving further, I've identified the initial error in the log.  Here's an
excerpt:

6449104 DEBUG:  XLogFlush: request 800003/168380F4; write 3/38FFDE04; flush 3/38FFDE04
6449105
6449106 FATAL 2:  XLogWrite: write request 3/39000000 is past end of log 3/38FC2000
6449107 DEBUG:  XLogFlush: request 3/1683CD1C; write 0/0; flush 0/0
6449108
6449109 DEBUG:  XLogFlush: request 3/168418C8; write 3/38FFDE04; flush 3/38FFDE04
6449110
6449111 DEBUG:  XLogFlush: request 3/168465C0; write 3/38FFDE04; flush 3/38FFDE04
6449112
6449113 DEBUG:  XLogFlush: request 3/1684CB2C; write 3/38FFDE04; flush 3/38FFDE04
6449114
6449115 DEBUG:  XLogFlush: request 3/16852EAC; write 3/38FFDE04; flush 3/38FFDE04
6449116
6449117 DEBUG:  XLogFlush: request 3/16857B8C; write 3/38FFDE04; flush 3/38FFDE04
6449118
6449119 DEBUG:  XLogFlush: request 3/38FFDF34; write 3/38FFDE04; flush 3/38FFDE04
6449120
6449121 FATAL 2:  XLogWrite: write request 3/38FFDE04 is past end of log 3/38FC2000
6449122 DEBUG:  INSERT @ 3/38FFE8D8: prev 3/38FFE89C; xprev 3/38FFE89C; xid 11559295: Btree - insert: node
16556/43072044;tid 2290/176 
6449123 DEBUG:  INSERT @ 3/38FFE91C: prev 3/38FFE8D8; xprev 3/38FFE8D8; xid 11559295: Btree - insert: node
16556/43072045;tid 3636/219 
6449124 DEBUG:  INSERT @ 3/38FFE958: prev 3/38FFE91C; xprev 3/38FFE91C; xid 11559295: Heap - insert: node
16556/43072041;tid 23894/22 
6449125 DEBUG:  INSERT @ 3/38FFE9F8: prev 3/38FFE958; xprev 3/38FFE958; xid 11559295: Btree - insert: node
16556/43072043;tid 3889/370 
6449126 DEBUG:  XLogFlush: request 3/38FFE0B8; write 3/38FFDE04; flush 3/38FFDE04
6449127
6449128 FATAL 2:  XLogWrite: write request 3/38FFDE04 is past end of log 3/38FC2000
6449129 DEBUG:  server process (pid 14180) exited with exit code 2
6449130 DEBUG:  terminating any other active server processes
6449131 NOTICE:  Message from PostgreSQL backend:
6449132         The Postmaster has informed me that some other backend
6449133         died abnormally and possibly corrupted shared memory.
6449134         I have rolled back the current transaction and am
6449135         going to terminate your database system connection and exit.
6449136         Please reconnect to the database system and repeat your query.
6449137 DEBUG:  pq_flush: send() failed: Bad file descriptor
6449138 NOTICE:  Message from PostgreSQL backend:
6449139         The Postmaster has informed me that some other backend
6449140         died abnormally and possibly corrupted shared memory.
6449141         I have rolled back the current transaction and am
6449142         going to terminate your database system connection and exit.
6449143         Please reconnect to the database system and repeat your query.
6449144 DEBUG:  all server processes terminated; reinitializing shared memory and semaphores
6449145 DEBUG:  database system was interrupted at 2002-04-16 10:28:11 PDT
6449146 DEBUG:  checkpoint record is at 3/2EE62918
6449147 DEBUG:  redo record is at 3/2A003150; undo record is at 0/0; shutdown FALSE
6449148 DEBUG:  next transaction id: 11559231; next oid: 43919301
6449149 DEBUG:  database system was not properly shut down; automatic recovery in progress
6449150 DEBUG:  redo starts at 3/2A003150
6449151 DEBUG:  REDO @ 3/2A003150; LSN 3/2A0051A8: prev 3/2A003114; xprev 3/2A003114; xid 11559230; bkpb 1: Btree -
insert:node 16556/43072044; tid 2108/92 

The numbers to the left are line numbers.  Line 6449106 is the first sign
of any trouble.  This usually occurs after about 2 million records are
loaded (of 4 million).  There are two btree indexes on this table.  The
system has plenty of disk space for the logs and database.

The Perl scripts I'm using have been heavily used/tested with 6.5 and 7.0.

Has anyone seen this behavior?  It looks like a blown pointer or counter
getting botched where WAL is attempting to write out of bounds.

Postgres WAL is new to me, so, did I miss something?

Thanks.
Matt Olson
Matt Olson <molson@oceanconsulting.com> writes:
> I've recenly  been working with 7.2.1 and been having problems with
> corrupting the DB when loading a new database with a Perl/DBI script doing
> large numbers of inserts to a single table.

This does look like a bug.  Can you provide a self-contained test case?
Or give someone access to your machine to dig into it?

            regards, tom lane