Home > mailing lists

Re: [BUG] Archive recovery failure on 9.3+. - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: [BUG] Archive recovery failure on 9.3+.
Date	January 9, 2014 20:06:12
Msg-id	52CF0125.1070003@vmware.com Whole thread Raw
In response to	Re: [BUG] Archive recovery failure on 9.3+. (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses	Re: [BUG] Archive recovery failure on 9.3+.
List	pgsql-hackers

Tree view

On 01/09/2014 08:18 PM, Heikki Linnakangas wrote:
> On 12/12/2013 04:00 AM, Kyotaro HORIGUCHI wrote:
>> Hello, we happened to see server crash on archive recovery under
>> some condition.
>>
>> After TLI was incremented, there should be the case that the WAL
>> file for older timeline is archived but not for that of the same
>> segment id but for newer timeline. Archive recovery should fail
>> for the case with PANIC error like follows,
>>
>> | PANIC: record with zero length at 0/1820D40
>>
>> Replay script is attached. This issue occured for 9.4dev, 9.3.2,
>> and not for 9.2.6 and 9.1.11. The latter search pg_xlog for the
>> TLI before trying archive for older TLIs.
>>
>> This occurrs during fetching checkpoint redo record in archive
>> recovery.
>>
>>> if (checkPoint.redo < RecPtr)
>>> {
>>>     /* back up to find the record */
>>>     record = ReadRecord(xlogreader, checkPoint.redo, PANIC, false);
>
> Hmm. After running the script, pg_controldata says:
>
> Latest checkpoint location:           0/2000090
> Prior checkpoint location:            0/2000090
> Latest checkpoint's REDO location:    0/2000090
>
> The PANIC is actually coming from here:
>
>>     /*
>>      * Re-fetch the last valid or last applied record, so we can
>> identify the
>>      * exact endpoint of what we consider the valid portion of WAL.
>>      */
>>     record = ReadRecord(xlogreader, LastRec, PANIC, false);
>
> If recovery started from an online checkpoint, then I think you'd get a
> similar PANIC from the code you quoted.
>
>> And this is caused by that the segment file for older timeline in
>> archive directory is preferred to that for newer timeline in
>> pg_xlog.
>
> Yep.

Actually, why is the partially-filled 000000010000000000000002 file 
archived in the first place? Looking at the code, it's been like that 
forever, but it seems like a bad idea. If the original server is still 
up and running, and writing more data to that file, what will happen is 
that when the original server later tries to archive it, it will fail 
because the partial version of the file is already in the archive. Or 
worse, the partial version overwrites a previously archived more 
complete version.

This is the code that does that:

>     /*
>      * If we are establishing a new timeline, we have to copy data from the
>      * last WAL segment of the old timeline to create a starting WAL segment
>      * for the new timeline.
>      *
>      * Notify the archiver that the last WAL segment of the old timeline is
>      * ready to copy to archival storage. Otherwise, it is not archived for a
>      * while.
>      */
>     if (endTLI != ThisTimeLineID)
>     {
>         XLogFileCopy(endLogSegNo, endTLI, endLogSegNo);
>
>         if (XLogArchivingActive())
>         {
>             XLogFileName(xlogpath, endTLI, endLogSegNo);
>             XLogArchiveNotify(xlogpath);
>         }
>     }

So, the rationale is that otherwise it would take a long time until that 
segment is archived. To be precise, I don't think the segment with the 
old TLI would ever be archived without the above, but the same segment 
on the new timeline would, after it fills up.

Wouldn't it be better to not archive the old segment, and instead switch 
to a new segment after writing the end-of-recovery checkpoint, so that 
the segment on the new timeline is archived sooner?

- Heikki

pgsql-hackers by date:

From: Claudio Freire
Date: 09 January 2014, 19:50:40
Subject: Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

From: Tom Lane
Date: 09 January 2014, 20:16:22
Subject: Re: [BUG] Archive recovery failure on 9.3+.

Re: [BUG] Archive recovery failure on 9.3+. - Mailing list pgsql-hackers

Previous

Next