Home > mailing lists

Re: corrupt pages detected by enabling checksums - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: corrupt pages detected by enabling checksums
Date	May 1, 2013 17:02:30
Msg-id	CA+U5nMKwguCMugQEDzoyx__Ug3TUcRKA_MejkkoJLkc0hS78gg@mail.gmail.com Whole thread Raw
In response to	Re: corrupt pages detected by enabling checksums (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: corrupt pages detected by enabling checksums
List	pgsql-hackers

Tree view

On 1 May 2013 16:33, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, May 1, 2013 at 11:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> I was worried because SyncOneBuffer checks whether it needs writing
>>> without taking a content lock, so the exclusive lock doesn't help. That
>>> makes sense, because you don't want a checkpoint to have to get a
>>> content lock on every buffer in the buffer pool. But it also means we
>>> need to follow the rules laid out in transam/README and dirty the pages
>>> before writing WAL.
>>
>> On further review, I think you're right about this.  We'd have a
>> problem if the following sequence of events were to occur:
>>
>> 1. vacuum writes the WAL record, but does not yet mark the buffer
>> dirty, and then goes into the tank
>> 2. checkpointer decides where the checkpoint will begin
>> 3. buffer pool is scanned as part of the checkpoint process, observing
>> target buffer not to be dirty
>> 4. vacuum finally wakes up and marks the buffer dirty
>> 5. crash, after WAL is flushed but before buffer is written
>>
>> However, on looking at this a little more, shouldn't we be doing
>> log_heap_clean() *before* we do visibilitymap_set()?
>
> Hit send too soon.
>
> To finish that thought: right now the order of operations is...
>
> 1. Perform the cleanup operations on the buffer.
> 2. Set the visibility map bit.
> 3. Log the visibility map change.
> 4. Log the cleanup operations.
>
> It seems to me that it would be better to do (4) as close to (1) as
> possible.  Isn't it bad if the operations are replayed in an order
> that differs from the order in which they were performed - or if, say,
> the first WAL record were replayed but the second were not, for
> whatever reason?

I agree, but that was in the original coding wasn't it?

Why aren't we writing just one WAL record for this action? We use a
single WAL record in other places where we make changes to multiple
blocks with multiple full page writes, e.g. index block split. That
would make the action atomic and we'd just have this...

1. Perform the cleanup operations on the buffer.
2. Set the visibility map bit.
3. Log the cleanup operations and visibility map change.

which can then be replayed with correct sequence, locking etc.
and AFAICS would likely be faster also.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

From: Greg Smith
Date: 01 May 2013, 16:56:16
Subject: Re: [PATCH] add --throttle to pgbench (submission 3)

From: Stefan Kaltenbrunner
Date: 01 May 2013, 17:28:07
Subject: Re: Re: [BUGS] BUG #8128: pg_dump (>= 9.1) failed while dumping a scheme named "old" from PostgreSQL 8.4

Re: corrupt pages detected by enabling checksums - Mailing list pgsql-hackers

Previous

Next