Re: Bug in visibility map WAL-logging - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Bug in visibility map WAL-logging
Date
Msg-id 52CC573F.5080600@vmware.com
Whole thread Raw
In response to Re: Bug in visibility map WAL-logging  (Matheus de Oliveira <matioli.matheus@gmail.com>)
Responses Re: Bug in visibility map WAL-logging
Re: Bug in visibility map WAL-logging
Re: Bug in visibility map WAL-logging
List pgsql-hackers
On 01/07/2014 07:15 PM, Matheus de Oliveira wrote:
> Hi folks,
>
>
> On Fri, Dec 13, 2013 at 9:47 AM, Heikki Linnakangas <hlinnakangas@vmware.com
>> wrote:
>
>> lazy_vacuum_page() does this:
>>
>> 1. START_CRIT_SECTION()
>> 2. Remove dead tuples from the page, marking the itemid's unused.
>> 3. MarkBufferDirty
>> 4. if all remaining tuples on the page are visible to everyone, set the
>> all-visible flag on the page, and call visibilitymap_set() to set the VM
>> bit.
>> 5 visibilitymap_set() writes a WAL record about setting the bit in the
>> visibility map.
>> 6. write the WAL record of removing the dead tuples.
>> 7. END_CRIT_SECTION().
>>
>> See the problem? Setting the VM bit is WAL-logged first, before the
>> removal of the tuples. If you stop recovery between the two WAL records,
>> the page's VM bit in the VM map will be set, but the dead tuples are still
>> on the page.
>>
>> This bug was introduced in Feb 2013, by commit
>> fdf9e21196a6f58c6021c967dc5776a16190f295, so it's present in 9.3 and master.
>>
>> The fix seems quite straightforward, we just have to change the order of
>> those two records. I'll go commit that.
>>
>
>
> With a lot of help from Andres on IRC (thanks a lot man), I think (he
> thinks actually, =P ) I may ran into the issue this bug can cause. I'm
> sending this e-mail to (1) confirm if my issue is really caused by that bug
> and if that is the case, (2) let you guys know the problems it can cause.
>
> Scenario:
>
> Master: 9.3.1 (I know, trying to go to 9.3.2)
> Slave: 9.3.2
>
> The slave was running with hot_standby=off (because of other bug Andres is
> working on), but it stopped working with the following log lines:
>
> 2014-01-07 12:38:04 BRST [22668]: [7-1] user=,db= WARNING:  page 1076 of
> relation base/883916/151040222 is uninitialized
> 2014-01-07 12:38:04 BRST [22668]: [8-1] user=,db= CONTEXT:  xlog redo
> visible: rel 1663/883916/151040222; blk 1076
> 2014-01-07 12:38:04 BRST [22668]: [9-1] user=,db= PANIC:  WAL contains
> references to invalid pages
> 2014-01-07 12:38:04 BRST [22668]: [10-1] user=,db= CONTEXT:  xlog redo
> visible: rel 1663/883916/151040222; blk 1076
> 2014-01-07 12:38:04 BRST [22665]: [3-1] user=,db= LOG:  startup process
> (PID 22668) was terminated by signal 6: Aborted
>
> (Complete log at https://pgsql.privatepaste.com/343f3190fe).
>
> I used pg_xlogdump to (try to) find the block the error relates to:
>
> $ pg_xlogdump pg_xlog/00000001000002BC0000002B 00000001000002BC0000007F |
> grep -n -C5 '883916/151040222; blk 1076'
>
> 2088220 ... Heap2 ...  24/   192, ... lsn: 2BC/46AB8B20 ... desc: clean:
> rel 1663/883916/151040222; blk 1073 remxid 107409880
> 2088221 ... Heap2 ...  20/    52, ... lsn: 2BC/46AB8BE0 ... desc: visible:
> rel 1663/883916/151040222; blk 1074
> 2088222 ... Heap2 ...  24/   128, ... lsn: 2BC/46AB8C18 ... desc: clean:
> rel 1663/883916/151040222; blk 1074 remxid 107409880> ...

Hmm. The xlogdump indeed shows that the order of 'clean' and 'visible' 
is incorrect, but I don't immediately see how that could cause the 
PANIC. Why is the page uninitialized in the standby? If VACUUM is 
removing some dead tuples from it, it certainly should exist and be 
correctly initialized.

How did you set up the standby? Did you initialize it from an offline 
backup of the master's data directory, perhaps? The log shows that the 
startup took the the "crash recovery first, then start archive recovery" 
path, because there was no backup label file. In that mode, the standby 
assumes that the system is consistent after replaying all the WAL in 
pg_xlog, which is correct if you initialize from an offline backup or 
atomic filesystem snapshot, for example. But "WAL contains references to 
invalid pages" could also be a symptom of an inconsistent base backup, 
cause by incorrect backup procedure. In particular, I have to ask 
because I've seen it before: you didn't delete backup_label from the 
backup, did you?

- Heikki



pgsql-hackers by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: Re: How to reproduce serialization failure for a read only transaction.
Next
From: Kevin Grittner
Date:
Subject: Re: Re: How to reproduce serialization failure for a read only transaction.