Re: Disaster! - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Disaster!
Date	January 23, 2004 17:21:42
Msg-id	4221.1074892864@sss.pgh.pa.us
In response to	Re: Disaster! (Martín Marqués<martin@bugs.unl.edu.ar>)
Responses	Re: Disaster!
List	pgsql-hackers

Martín Marqués <martin@bugs.unl.edu.ar> writes:
> Tom, could you give a small insight on what occurred here, why those
> 8k of zeros fixed it, and what is a "WAL replay"?

I think what happened is that there was insufficient space to write out
a new page of the clog (transaction commit) file.  This would result in
a database panic, which is fine --- you're not gonna get much done
anyway if you are down to zero free disk space.  However, after Chris
freed up space, the system needed to replay the WAL from the last
checkpoint to ensure consistency.  The WAL entries evidently included
references to transactions whose commit bits were in the unwritten page.
Now there would also be WAL entries recording those commits, so once the
replay was complete everything would be cool.  But the clog access code
evidently got confused by being asked to read a page that didn't exist
in the file.  I'm not sure yet how that sequence of events occurred,
which is why I asked Chris for a stack trace.

Adding a page of zeroes fixed it by eliminating the read error
condition.  It was okay to do so because zeroes is the correct initial
state for a clog page (all transactions in it "still in progress").
After WAL replay, any completed transactions would be updated in the page.
        regards, tom lane
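
The explanation above can be illustrated with a small simulation. This is only a sketch, not PostgreSQL code: it models the clog as a flat file of 8 kB pages, and it assumes two status bits per transaction with 00 meaning "in progress" and 01 meaning "committed". The names read_page, get_status, and set_status are hypothetical helpers invented for this illustration. The sketch shows that reading a page past the end of the file fails, that appending one page of zeroes removes the read error, and that every transaction in the new page then reads as "in progress" until a replayed commit record updates it.

```python
import io

PAGE_SIZE = 8192                      # the "8k" mentioned in the thread
BITS_PER_XACT = 2                     # assumption: two status bits per transaction
XACTS_PER_BYTE = 8 // BITS_PER_XACT   # four transactions per byte
XACTS_PER_PAGE = PAGE_SIZE * XACTS_PER_BYTE

IN_PROGRESS = 0b00                    # all-zero bits: the initial state
COMMITTED = 0b01                      # assumption: encoding chosen for this sketch


def read_page(f, page_no):
    """Read one full page; a short read models the missing-page error."""
    f.seek(page_no * PAGE_SIZE)
    data = f.read(PAGE_SIZE)
    if len(data) != PAGE_SIZE:
        raise OSError(f"page {page_no}: read {len(data)} of {PAGE_SIZE} bytes")
    return bytearray(data)


def get_status(page, slot):
    """Return the 2-bit status of the transaction at position `slot` in a page."""
    byte_no, index_in_byte = divmod(slot, XACTS_PER_BYTE)
    shift = index_in_byte * BITS_PER_XACT
    return (page[byte_no] >> shift) & 0b11


def set_status(page, slot, status):
    """Overwrite the 2-bit status of one transaction, leaving its neighbours alone."""
    byte_no, index_in_byte = divmod(slot, XACTS_PER_BYTE)
    shift = index_in_byte * BITS_PER_XACT
    page[byte_no] = (page[byte_no] & ~(0b11 << shift) & 0xFF) | (status << shift)


# A clog "file" holding only page 0; page 1 was never written (disk was full).
clog = io.BytesIO(bytes(PAGE_SIZE))

try:
    read_page(clog, 1)
except OSError as err:
    print("replay would fail here:", err)

# The manual fix: append one page of zeroes.
clog.seek(0, io.SEEK_END)
clog.write(bytes(PAGE_SIZE))

page = read_page(clog, 1)             # now succeeds
assert all(get_status(page, s) == IN_PROGRESS for s in range(XACTS_PER_PAGE))

# Replaying a commit record then marks that transaction committed.
set_status(page, 5, COMMITTED)
```

Zero-filling is safe in this model for the reason given in the message: an all-zero page makes no claim that any transaction finished, so it cannot contradict the commit records that replay applies afterwards.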
