Thread: WAL Crash during index vacuuming 7.1beta4

WAL Crash during index vacuuming 7.1beta4

From
Giuseppe Tanzilli - CSF
Date:
Hi,
During the nightly vacuum, pgsql shut down and will not start any more.
The log is attached.

It seems the problem occurred while rebuilding an index.
Is there a way to force WAL to ignore indexes?
Can I delete it?

Thanks in advance.

-------------------------------------------------------
Giuseppe Tanzilli        g.tanzilli@gruppocsf.com
CSF Sistemi srl            phone ++39 0775 7771
Via del Ciavattino 
Anagni FR
Italy

/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 30842 exited with status 0
DEBUG:  Index BD_Prodotto_SuddMerc_key: Pages 14533; Tuples 247153: Deleted 246556. CPU 2.61s/4.93u sec.
DEBUG:  XLogWrite: new log file created - try to increase WAL_FILES
DEBUG:  XLogWrite: new log file created - try to increase WAL_FILES
DEBUG:  XLogWrite: new log file created - try to increase WAL_FILES
DEBUG:  XLogWrite: new log file created - try to increase WAL_FILES
DEBUG:  XLogWrite: new log file created - try to increase WAL_FILES
DEBUG:  XLogWrite: new log file created - try to increase WAL_FILES
DEBUG:  XLogWrite: new log file created - try to increase WAL_FILES
DEBUG:  XLogWrite: new log file created - try to increase WAL_FILES
DEBUG:  XLogWrite: new log file created - try to increase WAL_FILES
DEBUG:  XLogWrite: new log file created - try to increase WAL_FILES
DEBUG:  XLogWrite: new log file created - try to increase WAL_FILES

FATAL: s_lock(0x402a3535) at bufmgr.c:2072, stuck spinlock. Aborting.

FATAL: s_lock(0x402a3535) at bufmgr.c:2072, stuck spinlock. Aborting.
/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 30355 exited with status 6
Server process (pid 30355) exited with status 6 at Thu Feb  1 04:36:21 2001
Terminating any active server processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: sending SIGUSR1 to process 30843
/usr/local/pgsql/bin/postmaster: ServerLoop:        handling reading 5
/usr/local/pgsql/bin/postmaster: ServerLoop:        handling reading 5
The Data Base System is in recovery mode
/usr/local/pgsql/bin/postmaster: ServerLoop:        handling writing 5
/usr/local/pgsql/bin/postmaster: ServerLoop:        handling reading 5
/usr/local/pgsql/bin/postmaster: ServerLoop:        handling reading 5
The Data Base System is in recovery mode
/usr/local/pgsql/bin/postmaster: ServerLoop:        handling writing 5
/usr/local/pgsql/bin/postmaster: ServerLoop:        handling reading 5
/usr/local/pgsql/bin/postmaster: ServerLoop:        handling reading 5
The Data Base System is in recovery mode
/usr/local/pgsql/bin/postmaster: ServerLoop:        handling writing 5
DEBUG:  proc_exit(0)
DEBUG:  shmem_exit(0)
DEBUG:  exit(0)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 30843 exited with status 0
Server processes were terminated at Thu Feb  1 04:36:38 2001
Reinitializing shared memory and semaphores
invoking IpcMemoryCreate(size=68608000)
DEBUG:  starting up
DEBUG:  database system was interrupted at 2001-02-01 04:36:38
DEBUG:  CheckPoint record at (6, 4150292860)
DEBUG:  Redo record at (6, 4144414796); Undo record at (6, 3622728240); Shutdown FALSE
DEBUG:  NextTransactionId: 156659; NextOid: 7401689
DEBUG:  database system was not properly shut down; automatic recovery in progress...
DEBUG:  redo starts at (6, 4144414796)
NOTICE:  PageAddItem: tried overwrite of used ItemId
FATAL 2:  heap_update_redo: failed to add tuple
DEBUG:  proc_exit(2)
DEBUG:  shmem_exit(2)
DEBUG:  exit(2)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
Startup failed - abort
invoking IpcMemoryCreate(size=68608000)
FindExec: found "/usr/local/pgsql/bin/postmaster" using argv[0]
DEBUG:  starting up
DEBUG:  database system was interrupted being in recovery at 2001-02-01 04:36:38
        This propably means that some data blocks are corrupted
        and you will have to use last backup for recovery.
 
DEBUG:  CheckPoint record at (6, 4150292860)
DEBUG:  Redo record at (6, 4144414796); Undo record at (6, 3622728240); Shutdown FALSE
DEBUG:  NextTransactionId: 156659; NextOid: 7401689
DEBUG:  database system was not properly shut down; automatic recovery in progress...
DEBUG:  redo starts at (6, 4144414796)
NOTICE:  PageAddItem: tried overwrite of used ItemId
FATAL 2:  heap_update_redo: failed to add tuple
DEBUG:  proc_exit(2)
DEBUG:  shmem_exit(2)
DEBUG:  exit(2)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
Startup failed - abort
invoking IpcMemoryCreate(size=68608000)
FindExec: found "/usr/local/pgsql/bin/postmaster" using argv[0]
DEBUG:  starting up
DEBUG:  database system was interrupted being in recovery at 2001-02-01 09:47:40
        This propably means that some data blocks are corrupted
        and you will have to use last backup for recovery.
 
DEBUG:  CheckPoint record at (6, 4150292860)
DEBUG:  Redo record at (6, 4144414796); Undo record at (6, 3622728240); Shutdown FALSE
DEBUG:  NextTransactionId: 156659; NextOid: 7401689
DEBUG:  database system was not properly shut down; automatic recovery in progress...
DEBUG:  redo starts at (6, 4144414796)
NOTICE:  PageAddItem: tried overwrite of used ItemId
FATAL 2:  heap_update_redo: failed to add tuple
DEBUG:  proc_exit(2)
DEBUG:  shmem_exit(2)
DEBUG:  exit(2)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
Startup failed - abort

Re: WAL Crash during index vacuuming 7.1beta4

From
"Vadim Mikheev"
Date:
> During the nightly vacuum, pgsql shut down and will not start any more.
> The log is attached.
>
> It seems the problem occurred while rebuilding an index.
> Is there a way to force WAL to ignore indexes?

The problem was in redoing a tuple movement in the *table*, not in an index.

> Can I delete it?
>
...
>
> DEBUG:  redo starts at (6, 4144414796)
> NOTICE:  PageAddItem: tried overwrite of used ItemId
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> FATAL 2:  heap_update_redo: failed to add tuple

I think I've just fixed this problem: PageAddItem must not check the
itemid's flag in overwrite mode when the offset number == maxoff + 1.
I hope that Giuseppe will check the new code soon.
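
To illustrate the check described above, here is a minimal sketch in
Python. It is a simplified model, not the PostgreSQL source: the names
Page, ItemId, page_add_item and check_past_end are hypothetical, and the
stale flag left in the slot just past maxoff is an assumption made only
to show why inspecting that slot can make redo fail.

```python
from dataclasses import dataclass, field

LP_USED = 0x01  # illustrative "slot in use" flag bit


@dataclass
class ItemId:
    flags: int = 0
    length: int = 0


@dataclass
class Page:
    # Fixed-size slot array standing in for the page's item id array.
    # Slots beyond max_off are not valid and may hold stale data.
    slots: list = field(default_factory=lambda: [ItemId() for _ in range(8)])
    max_off: int = 0  # highest valid offset number (1-based)


def page_add_item(page, offset, length, check_past_end=False):
    """Add an item at a 1-based offset in overwrite mode.

    Returns the offset on success, or 0 on failure.
    check_past_end=True models the behaviour before the fix.
    """
    limit = page.max_off + 1
    if offset < 1 or offset > limit or offset > len(page.slots):
        return 0  # neither an existing slot nor the next free one
    # With the fix, the slot at max_off + 1 is never inspected: it lies
    # outside the valid array, so its flag carries no meaning.
    if offset < limit or check_past_end:
        slot = page.slots[offset - 1]
        if slot.flags & LP_USED or slot.length != 0:
            return 0  # corresponds to "tried overwrite of used ItemId"
    page.slots[offset - 1] = ItemId(flags=LP_USED, length=length)
    page.max_off = max(page.max_off, offset)
    return offset


if __name__ == "__main__":
    page = Page(max_off=2)
    page.slots[2] = ItemId(flags=LP_USED, length=40)  # stale data past max_off
    print(page_add_item(page, 3, 40, check_past_end=True))  # old check: 0
    print(page_add_item(page, 3, 40))                       # fixed check: 3
```

In this model the old check rejects the tuple because of the stale flag,
while the fixed check accepts offset maxoff + 1 and still refuses to
overwrite a slot that is genuinely in use.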

Thanks to Giuseppe for the help!

Vadim