Re: Hot Standby 0.2.1 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Hot Standby 0.2.1
Date
Msg-id 4AB9E547.9040602@enterprisedb.com
In response to Hot Standby 0.2.1  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
The logic in the lock manager to track the number of held
AccessExclusiveLocks (with ProcArrayIncrementNumHeldLocks and
ProcArrayDecrementNumHeldLocks) seems to be broken. I added an Assertion
into ProcArrayDecrementNumHeldLocks:

--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1401,6 +1401,7 @@ ProcArrayIncrementNumHeldLocks(PGPROC *proc)
 void
 ProcArrayDecrementNumHeldLocks(PGPROC *proc)
 {
+   Assert(proc->numHeldLocks > 0);
    proc->numHeldLocks--;
 }

This tripped the assertion:

postgres=# CREATE TABLE foo (id int4 primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"foo_pkey" for table "foo"
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
 

Making matters worse, the primary server refuses to start up after
that, tripping the assertion again in crash recovery:

$ bin/postmaster -D data
LOG:  database system was interrupted while in recovery at 2009-09-23
11:56:15 EEST
HINT:  This probably means that some data is corrupted and you will have
to use the last backup for recovery.
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  redo starts at 0/32000070
LOG:  REDO @ 0/32000070; LSN 0/320000AC: prev 0/32000020; xid 0; len 32:
Heap2 - clean: rel 1663/11562/1249; blk 32 remxid 4352
LOG:  consistent recovery state reached
LOG:  REDO @ 0/320000AC; LSN 0/320000CC: prev 0/32000070; xid 0; len 4:
XLOG - nextOid: 24600
LOG:  REDO @ 0/320000CC; LSN 0/320000F4: prev 0/320000AC; xid 0; len 12:
Storage - file create: base/11562/16408
LOG:  REDO @ 0/320000F4; LSN 0/3200011C: prev 0/320000CC; xid 4364; len
12: Relation - exclusive relation lock: xid 4364 db 11562 rel 16408
LOG:  REDO @ 0/3200011C; LSN 0/320001D8: prev 0/320000F4; xid 4364; len
159: Heap - insert: rel 1663/11562/1259; tid 5/4
...
LOG:  REDO @ 0/32004754; LSN 0/32004878: prev 0/320046A8; xid 4364; len
264: Transaction - commit: 2009-09-23 11:55:51.888398+03; 15 inval
msgs:catcache id38 catcache id37 catcache id38 catcache id37 catcache
id38 catcache id37 catcache id7 catcache id6 catcache id26 smgr relcache
smgr relcache smgr relcache
TRAP: FailedAssertion("!(proc->numHeldLocks > 0)", File: "procarray.c",
Line: 1404)
LOG:  startup process (PID 27430) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure

I'm sure that's just a simple bug somewhere, but it highlights that we
need to be careful to avoid putting any extra work into the normal recovery
path. Otherwise bugs in hot standby related code can cause crash
recovery to fail.
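
One way to picture that separation is a replay routine that only runs the standby-specific bookkeeping when a switch is on. This is a rough sketch under stated assumptions, not a description of how the patch is or should be structured: DemoRecoveryState, its fields, and demo_redo_record are all hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical recovery state; none of these names exist in PostgreSQL. */
typedef struct DemoRecoveryState
{
    bool standby_tracking_enabled;   /* assumed switch for hot standby work */
    int  records_replayed;           /* work every recovery has to do */
    int  standby_bookkeeping_calls;  /* extra work needed only for standby */
} DemoRecoveryState;

/* Replay one record, doing the extra bookkeeping only when enabled. */
void
demo_redo_record(DemoRecoveryState *state)
{
    state->records_replayed++;

    /* Plain crash recovery returns here and never runs the extra code. */
    if (!state->standby_tracking_enabled)
        return;

    state->standby_bookkeeping_calls++;
}
```

With the switch off, a bug in the bookkeeping branch cannot be reached during ordinary crash recovery, which is the property asked for above.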

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

