Re: Proposed WAL changes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Proposed WAL changes
Msg-id 9601.983981365@sss.pgh.pa.us
In response to Re: Proposed WAL changes  ("Vadim Mikheev" <vmikheev@sectorbase.com>)
Responses Re: Proposed WAL changes  (ncm@zembu.com (Nathan Myers))
List pgsql-hackers
"Vadim Mikheev" <vmikheev@sectorbase.com> writes:
>> I have just sent to the pgsql-patches list a rather large set of
> Please send it to me directly - pgsql-patches' archive is dated by Feb -:(

Done under separate cover.

>> proposed diffs for the WAL code.  These changes:
>> 
>> * Store two past checkpoint locations, not just one, in pg_control.
>> On startup, we fall back to the older checkpoint if the newer one
>> is unreadable.  Also, a physical copy of the newest checkpoint record

> And what to do if older one is unreadable too?
> (Isn't it like using 2 x CRC32 instead of CRC64 ? -:))

Then you lose --- but two checkpoints give you twice the chance of
recovery (probably more, actually, since it's much more likely that
the previous checkpoint will have reached disk safely).
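
The fallback described above can be sketched roughly as follows. This is
only an illustration in Python, not the patch itself (the real code is C
inside the backend). The `ControlFile` layout, the CRC-prefixed record
format, and the helper names are all hypothetical stand-ins.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlFile:
    # Hypothetical stand-in for pg_control: two remembered checkpoint locations.
    checkpoint: int        # location of the newest checkpoint record
    prev_checkpoint: int   # location of the checkpoint before it


def make_record(payload: bytes) -> bytes:
    """Frame a payload with a CRC32 prefix (illustrative format only)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def read_checkpoint(wal: dict, location: int) -> Optional[bytes]:
    """Return the payload stored at `location`, or None if missing or corrupt."""
    record = wal.get(location)
    if record is None or len(record) < 4:
        return None
    crc, payload = int.from_bytes(record[:4], "big"), record[4:]
    return payload if zlib.crc32(payload) == crc else None


def choose_checkpoint(control: ControlFile, wal: dict) -> tuple:
    """Try the newest checkpoint first, then fall back to the older one."""
    for name, location in (("newest", control.checkpoint),
                           ("previous", control.prev_checkpoint)):
        payload = read_checkpoint(wal, location)
        if payload is not None:
            return name, payload
    # Both records are unreadable: "then you lose".
    raise RuntimeError("both checkpoint records are unreadable")
```

Here `wal` is just a dict mapping a location to raw record bytes, so a
damaged newest record can be simulated by overwriting one entry.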

> And what to do if pg_control was lost? (We already discussed that we
> should read all logs from newest to oldest ones to find checkpoint).

If you have valid WAL files and broken pg_control, then reading the WAL
files is a way to recover.  If you have valid pg_control and broken WAL
files, you have a big problem, but using pg_control to generate a new
empty WAL will at least let you get at your heap files.
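
As a rough illustration of the two cases above, the choice of recovery
path might look like the Python sketch below. The `Action` names, the
boolean inputs, and the `(kind, location)` record shape are assumptions
made for the example. The case where both pg_control and WAL are broken
is not discussed in this message and is included only for completeness.

```python
from enum import Enum


class Action(Enum):
    NORMAL_STARTUP = "replay WAL from the checkpoint named in pg_control"
    SCAN_WAL = "scan WAL files newest to oldest to find a checkpoint"
    RESET_WAL = "generate a new empty WAL from pg_control, then dump data"
    UNRECOVERABLE = "neither pg_control nor WAL is usable"  # assumed case


def recovery_action(control_ok: bool, wal_ok: bool) -> Action:
    """Pick a recovery path from the state of pg_control and the WAL files."""
    if control_ok and wal_ok:
        return Action.NORMAL_STARTUP
    if wal_ok:
        return Action.SCAN_WAL      # broken pg_control, valid WAL
    if control_ok:
        return Action.RESET_WAL     # valid pg_control, broken WAL
    return Action.UNRECOVERABLE


def find_latest_checkpoint(segments):
    """Scan segments newest first and return the latest checkpoint location.

    `segments` is ordered oldest to newest; each segment is a list of
    (kind, location) tuples in write order. Returns None if no checkpoint
    record is found.
    """
    for segment in reversed(segments):
        for kind, location in reversed(segment):
            if kind == "checkpoint":
                return location
    return None
```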

> And why to keep old log files with older checkpoint?

Not much point in remembering the older checkpoint location if the
associated WAL file is removed...

> Mmmm, how is recovery possible if the log was lost? All that could be done
> with the DB in the event of a corrupted/lost log is dumping data from tables
> *as is*, without any guarantee about consistency.

Exactly.  That is still better than not being able to dump the data at
all.

>> * Change XID allocation to work more like OID allocation, so that we
>> can flush XID alloc info to the log before there is any chance an XID
>> will appear in heap files.

> I didn't read your postings about this yet.

See later discussion --- Andreas convinced me that flushing NEXTXID
records to disk isn't really needed after all.  (I didn't take the flush
out of my patch yet, but will do so.)  I still want to leave the NEXTXID
records in there, though, because I think that XID and OID assignment
ought to work as nearly alike as possible.
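
One general way to log allocation info ahead of use is to write a record
covering a whole batch of IDs before handing any of them out. The Python
sketch below shows that generic scheme only. The batch size, the
`NEXTID` record name, and the list used as a log are illustrative
assumptions, not details taken from the patch.

```python
class BatchAllocator:
    """Hand out sequential IDs, logging a high-water mark one batch ahead."""

    def __init__(self, log, batch=8, start=1):
        self.log = log              # list standing in for the WAL
        self.batch = batch          # how many IDs each log record covers
        self.next_id = start
        self.logged_limit = start   # IDs below this are covered by a record

    def allocate(self):
        # Log a new limit before handing out any ID at or beyond the old one.
        if self.next_id >= self.logged_limit:
            self.logged_limit = self.next_id + self.batch
            self.log.append(("NEXTID", self.logged_limit))
        value = self.next_id
        self.next_id += 1
        return value


def restart_point(log, start=1):
    """After a crash, resume from the last logged limit so no ID is reused."""
    limits = [limit for kind, limit in log if kind == "NEXTID"]
    return limits[-1] if limits else start
```

Resuming from the logged limit may skip some unused IDs, but it never
hands out an ID that could already have been assigned before the crash.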

>> Before committing this stuff, I intend to prepare a contrib utility that
>> can be used to reset pg_control and pg_xlog.  This is mainly for
>> disaster recovery purposes, but as a side benefit it will allow people

> Once again, I would call this "disaster *dump* purposes" -:)
> After such operation DB shouldn't be used for anything but dump!

Fair enough.  But we need it.

        regards, tom lane

