Re: crash-safe visibility map, take three - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: crash-safe visibility map, take three
Msg-id 4CF4A8EC.2070408@enterprisedb.com
In response to crash-safe visibility map, take three  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 30.11.2010 06:57, Robert Haas wrote:
> I can't say I'm totally in love with any of these designs.  Anyone
> else have any ideas, or any opinions about which one is best?

Well, the design I've been pondering goes like this:

At vacuum:

1. Write an "intent" XLOG record listing a chunk of visibility map bits
that are not currently set, and that we are going to try to set. A chunk
of, say, 100 bits would be about right.

2. Scan the 100 heap pages as we currently do, setting the visibility 
map bits as we go.

3. After the scan, lock the visibility map page, check which of the bits
that we set in step 2 are still set (concurrent updates might've cleared
some), and write a final XLOG record listing the set bits. This step
isn't necessary for correctness, BTW, but without it you lose all the
set bits if you crash before the next checkpoint.
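
To make the three vacuum steps concrete, here is a minimal sketch as a toy in-memory model. It is not PostgreSQL code: the list-based visibility map, the list-based WAL, and the names vacuum_chunk, heap_all_visible and CHUNK are all hypothetical, and locking is only mentioned in comments.

```python
# Toy model only; every name here is hypothetical, not a PostgreSQL API.
CHUNK = 100  # chunk size suggested in step 1


def vacuum_chunk(vm, wal, heap_all_visible, first_page):
    """Run steps 1-3 for one chunk of heap pages starting at first_page."""
    pages = range(first_page, min(first_page + CHUNK, len(vm)))

    # Step 1: log the bits that are currently clear and that we may set.
    candidates = [p for p in pages if not vm[p]]
    wal.append(("intent", candidates))

    # Step 2: scan the heap pages, setting bits for all-visible pages.
    set_by_us = []
    for p in candidates:
        if heap_all_visible(p):
            vm[p] = True
            set_by_us.append(p)

    # Step 3: recheck which of our bits are still set (a real system
    # would hold the visibility map page lock here) and log those.
    still_set = [p for p in set_by_us if vm[p]]
    wal.append(("final", still_set))
    return still_set


# Usage: 8 heap pages, of which the even-numbered ones are all-visible.
vm = [False] * 8
wal = []
result = vacuum_chunk(vm, wal, lambda p: p % 2 == 0, 0)
```

In this single-threaded model nothing clears a bit between steps 2 and 3, so the final record lists exactly the bits set in step 2; with concurrent updates it could list fewer.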

At replay, when we see the intent XLOG record, clear all the bits listed 
in it. This ensures that if we crashed and some of the visibility map 
bits were flushed to disk but the corresponding changes to the heap 
pages were not, the bits are cleared. When we see the final XLOG record, 
we set the bits.
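
The replay rule can be sketched the same way. This is again a toy model under assumptions: the WAL is a list of (kind, pages) tuples, the on-disk visibility map is a list of booleans, and replay is a hypothetical name.

```python
# Toy model only; the record layout and function name are hypothetical.
def replay(vm, wal):
    """Apply intent/final records in WAL order to a visibility map."""
    for kind, pages in wal:
        if kind == "intent":
            # Bits may have reached disk without the matching heap
            # changes, so clear every bit the record lists.
            for p in pages:
                vm[p] = False
        elif kind == "final":
            # These bits were verified as set after the scan.
            for p in pages:
                vm[p] = True
    return vm


# Bits 1 and 2 were flushed to disk before the crash.
on_disk = [False, True, True, False]

# Crash before the final record was written: everything is cleared.
crashed = replay(list(on_disk), [("intent", [0, 1, 2, 3])])

# Final record made it to the WAL: only the bits it lists come back.
completed = replay(list(on_disk),
                   [("intent", [0, 1, 2, 3]), ("final", [1])])
```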

Some care is needed with checkpoints. Setting visibility map bits in
step 2 is safe because crash recovery will replay the intent XLOG record
and clear any incorrectly set bits. But if a checkpoint has happened
after the intent XLOG record was written, that no longer holds: recovery
starts at the checkpoint's redo pointer and never replays the older
intent record. This can be avoided by checking RedoRecPtr in step 2, and
writing a new intent XLOG record if it has changed since the last intent
XLOG record was written.
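
A rough sketch of that RedoRecPtr check follows, still as a toy model. The function scan_chunk and the callback read_redo_rec_ptr are hypothetical. Having the new intent record list only the pages not yet processed is an assumption of this sketch, since the description above doesn't say which bits the new record should cover.

```python
# Toy model only; names and the choice of re-logged pages are assumptions.
def scan_chunk(vm, wal, pages, read_redo_rec_ptr):
    """Set bits for pages, re-logging intent if a checkpoint intervenes."""
    # Remember the redo pointer in effect when the intent record is written.
    intent_redo = read_redo_rec_ptr()
    wal.append(("intent", list(pages)))

    for i, p in enumerate(pages):
        current = read_redo_rec_ptr()
        if current != intent_redo:
            # A checkpoint moved the redo pointer, so recovery would no
            # longer replay our intent record. Log a fresh one covering
            # the bits we have yet to set.
            wal.append(("intent", list(pages[i:])))
            intent_redo = current
        vm[p] = True


# Usage: the redo pointer moves from 100 to 200 just before page 2.
ptrs = iter([100, 100, 100, 200, 200])
vm = [False] * 4
wal = []
scan_chunk(vm, wal, [0, 1, 2, 3], lambda: next(ptrs))
```

Reading the pointer before writing the intent record errs on the safe side: if a checkpoint starts in between, the only cost is one redundant intent record.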

There's a small race condition in the way a visibility map bit is 
currently cleared. When a heap page is updated, it is locked, the update 
is WAL-logged, and the lock is released. The visibility map page is 
updated only after that. If the final vacuum XLOG record is written just 
after updating the heap page, but before the visibility map bit is 
cleared, replaying the final XLOG record will set a bit that should not 
have been set.
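
The interleaving can be written out as a small deterministic simulation. It is a toy model with hypothetical names, and it assumes that replaying the heap update record clears the visibility map bit; the final record still sets it again afterwards.

```python
# Toy model only; record names and replay behaviour are assumptions.
def simulate_race(page=7):
    vm = {page: True}          # vacuum set the bit in step 2
    heap_modified = {page: False}
    wal = []

    # Backend: lock heap page, update it, WAL-log the update, unlock.
    heap_modified[page] = True
    wal.append(("heap_update", page))

    # Vacuum step 3 runs in the window: the bit is still set, so the
    # final record includes it.
    wal.append(("final", [p for p in (page,) if vm[p]]))

    # Backend clears the visibility map bit only now.
    vm[page] = False

    # Crash, then replay the WAL in order.
    replayed = {page: False}
    for kind, arg in wal:
        if kind == "heap_update":
            replayed[arg] = False
        elif kind == "final":
            for p in arg:
                replayed[p] = True
    return vm, replayed, heap_modified


live_vm, replayed_vm, heap_modified = simulate_race()
```

Before the crash the bit was correctly clear, but after replay it is set even though the heap page was modified.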

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

