Re: Switching timeline over streaming replication - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Switching timeline over streaming replication
Date	November 21, 2012 18:06:29
Msg-id	50AD181E.3030803@vmware.com
In response to	Re: Switching timeline over streaming replication (Amit Kapila <amit.kapila@huawei.com>)
Responses	Re: Switching timeline over streaming replication
List	pgsql-hackers

On 20.11.2012 15:33, Amit Kapila wrote:
> Defect-2:
>      1. start primary A
>      2. start standby B following A
>      3. start cascade standby C following B.
>      4. Start another standby D following C.
>      5. Execute the following commands in the primary A.
>             create table tbl(f int);
>             insert into tbl values(generate_series(1,1000));
>      6. Promote standby B.
>      7. Execute the following commands in the primary B.
>             insert into tbl values(generate_series(1001,2000));
>             insert into tbl values(generate_series(2001,3000));
>
>      The following logs are observed on standby C:
>
>      LOG:  restarted WAL streaming at position 0/7000000 on tli 2
>      ERROR:  requested WAL segment 000000020000000000000007 has already been removed
>      LOG:  record with zero length at 0/7028190
>      LOG:  record with zero length at 0/7048540
>      LOG:  out-of-sequence timeline ID 1 (after 2) in log segment 000000020000000000000007, offset 0
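For reading the log above: with the default 16 MB segment size of this era, a WAL position written as xlogid/xrecoff lives in the segment file named by three 8-digit hex fields: the TLI, xlogid, and xrecoff divided by the segment size. The helper below is a hypothetical sketch (not a PostgreSQL function), assuming 16 MB segments; it shows that 0/7000000, 0/7028190 and 0/7048540 on timeline 2 all fall in 000000020000000000000007, so "offset 0" in the last message refers to the first page of that same file.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: the default 16 MB WAL segment size. */
#define WAL_SEG_SIZE 0x1000000u

/* Hypothetical helper: build the segment file name for a WAL position
 * written as "xlogid/xrecoff" on timeline tli. The name is three
 * 8-digit uppercase hex fields: TLI, log id, segment number. */
static void
wal_file_name(char *buf, size_t len, uint32_t tli,
              uint32_t xlogid, uint32_t xrecoff)
{
    snprintf(buf, len, "%08X%08X%08X",
             (unsigned) tli, (unsigned) xlogid,
             (unsigned) (xrecoff / WAL_SEG_SIZE));
}
```

For example, wal_file_name(name, sizeof name, 2, 0, 0x7028190) fills a 25-byte buffer with 000000020000000000000007.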

Hmm, this one is actually a pre-existing bug. There's a sanity check
that the sequence of timeline IDs that are seen in the XLOG page headers
doesn't go backwards. In other words, if the last XLOG page that was
read had timeline ID X, the next page must have a TLI >= X. The startup
process keeps track of the last seen timeline ID in lastPageTLI. In
standby_mode, when the startup process is reading from a pre-existing
file in pg_xlog (typically put there by streaming replication) and it
reaches the end of valid WAL (marked by an error in decoding it, i.e.
"record with zero length" in your case), it sleeps for five seconds and
retries. At retry, the WAL file is re-opened, and as part of sanity
checking it, the first page header in the file is validated.

Now, if there was a timeline change in the current WAL segment, and
we've already replayed past that point, lastPageTLI will already be set
to the new TLI, but the first page in the file still contains the old TLI.
When the file is re-opened and its first page is validated, you get the
error.
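To make the failure concrete, here is a hypothetical simulation (not PostgreSQL code) of one segment, modelled as an array of page-header TLIs, in which the timeline switches from 1 to 2 partway through. Replaying the whole segment passes the check, but re-opening the file re-validates only the first page, which still carries TLI 1 while the tracked value is already 2.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;

/* Hypothetical model: the TLIs in the page headers of one segment,
 * with the switch from timeline 1 to 2 in the middle. */
static const TimeLineID segment_pages[] = {1, 1, 2, 2};
#define NPAGES (sizeof(segment_pages) / sizeof(segment_pages[0]))

/* Pre-fix rule: compare each page against the last page seen. */
static bool
page_tli_ok(TimeLineID *last_page_tli, TimeLineID page_tli)
{
    if (page_tli < *last_page_tli)
        return false;
    *last_page_tli = page_tli;
    return true;
}

/* Replay every page of the segment; false on the first bad TLI. */
static bool
replay_segment(TimeLineID *last_page_tli)
{
    for (size_t i = 0; i < NPAGES; i++)
        if (!page_tli_ok(last_page_tli, segment_pages[i]))
            return false;
    return true;
}

/* After sleeping at the end of valid WAL, the file is re-opened and
 * only its first page header is validated again. */
static bool
reopen_segment(TimeLineID *last_page_tli)
{
    return page_tli_ok(last_page_tli, segment_pages[0]);
}
```

replay_segment() succeeds and leaves the tracked TLI at 2; reopen_segment() then fails on TLI 1 (after 2), the same shape as the reported error.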

The fix is quite straightforward: we should refrain from checking the
TLI when we re-open a WAL file. Or better yet, compare it against the
TLI we saw at the beginning of the last WAL segment, not the last WAL page.
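The "better yet" variant can be sketched under the same simplified model: besides the last page's TLI, the reader also remembers the TLI of the first page of the last segment it opened, and a segment's first page is compared against that instead. SegReaderState, last_seg_tli and first_page_of_segment are hypothetical names; this illustrates the rule only and is not the attached patch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TimeLineID;

/* Hypothetical reader state for the proposed rule. */
typedef struct
{
    TimeLineID last_page_tli; /* TLI of the last validated page */
    TimeLineID last_seg_tli;  /* TLI of the first page of the last segment opened */
} SegReaderState;

/* Validate a page header's TLI. The first page of a segment, which is
 * validated again whenever the file is re-opened, is compared against
 * the TLI seen at the beginning of the last segment; any other page is
 * compared against the last page. On failure the state is unchanged. */
static bool
validate_page_tli(SegReaderState *s, TimeLineID page_tli,
                  bool first_page_of_segment)
{
    TimeLineID floor_tli = first_page_of_segment ? s->last_seg_tli
                                                 : s->last_page_tli;

    if (page_tli < floor_tli)
        return false;
    if (first_page_of_segment)
        s->last_seg_tli = page_tli;
    s->last_page_tli = page_tli;
    return true;
}
```

With this rule, re-opening a segment whose pages go 1, 1, 2, 2 is accepted, while a segment that starts on an older timeline than the previous segment, or a page that goes backwards within a segment, is still rejected.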

I propose the attached patch (against 9.2) to fix that. This should be
backpatched to 9.0, where standby_mode was introduced. The code was the
same in 8.4, too, but AFAICS there was no problem there because 8.4
never tried to re-open the same WAL segment after replaying some of it.

- Heikki

Attachment

fix-segment-reread-after-tli-switch-1.patch
