Re: Too strict check when starting from a basebackup taken off a standby - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Too strict check when starting from a basebackup taken off a standby
Msg-id 20141218083001.GY5023@alap3.anarazel.de
In response to Re: Too strict check when starting from a basebackup taken off a standby  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On 2014-12-16 18:37:48 +0200, Heikki Linnakangas wrote:
> On 12/11/2014 04:21 PM, Marco Nenciarini wrote:
> >On 11/12/14 12:38, Andres Freund wrote:
> >>On December 11, 2014 9:56:09 AM CET, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> >>>On 12/11/2014 05:45 AM, Andres Freund wrote:
> >>>
> >>>Yeah. I was not able to reproduce this, but I'm clearly missing
> >>>something, since both you and Sergey have seen this happening. Can you
> >>>write a script to reproduce?
> >>
> >>Not right now, I only have my mobile... It's quite easy though. Create a
> >>pg_basebackup from a standby. Create a recovery.conf with a broken primary
> >>conninfo. Start. Shutdown. Fix conninfo. Start.
> >>
> >
> >Just tested it. These steps are not sufficient to reproduce the issue on
> >a test installation. I suppose that is because, on a small test datadir,
> >the checkpoint location and the redo location in pg_control are the same
> >as those present in the backup_label.
> >
> >To trigger this bug, at least one restartpoint must happen on the
> >standby between the start and the end of the backup.
> >
> >You can simulate it by issuing a checkpoint on the master, then a
> >checkpoint on the standby (to force a restartpoint), then copying the
> >pg_control from the standby.
> >
> >This way I've been able to reproduce it.
> 
> Ok, got it. I was able to reproduce this by using pg_basebackup
> --max-rate=1024, and issuing "CHECKPOINT" in the standby while the backup
> was running.

FWIW, I can reproduce it without any such hangups. I've just tested it
using my local scripts:
# create primary
$ reinit-pg-dev-master
$ run-pg-dev-master
# create first standby
$ reinit-pg-dev-master-standby
$ run-pg-dev-master-standby
# create 2nd standby
$ pg_basebackup -h /tmp/ -p 5441 -D /tmp/tree --write-recovery-conf
$ PGHOST=frakbar run-pg-dev-master-standby -D /tmp/tree
LOG:  creating missing WAL directory "pg_xlog/archive_status"
LOG:  entering standby mode
FATAL:  could not connect to the primary server: could not translate host name "frakbar" to address: Name or service not known

$ PGHOST=/tmp run-pg-dev-master-standby -D /tmp/tree
LOG:  started streaming WAL from primary at 0/2000000 on timeline 1
FATAL:  backup_label contains data inconsistent with control file
HINT:  This means that the backup is corrupted and you will have to use another backup for recovery.

After the fix I just pushed that sequence works.
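
For anyone without such local scripts, the sequence above could be approximated with stock tools roughly as follows. This is only a sketch under assumptions, not the scripts used above: the variable names, the port 5442, the sleep and the helper functions are all hypothetical, and it assumes a 9.x-era server where recovery.conf is still used. As noted upthread, on a small test cluster a restartpoint on the standby may have to happen while the backup runs for the failure to show up. By default the functions only print what they would do.

```shell
#!/bin/sh
# Sketch only: every default below is an assumption about the local setup.
SOCKDIR=${SOCKDIR:-/tmp}              # socket directory of the running standby
STANDBY_PORT=${STANDBY_PORT:-5441}    # port of the standby the backup is taken from
NEW_PORT=${NEW_PORT:-5442}            # hypothetical port for the new (2nd) standby
NEW_DATADIR=${NEW_DATADIR:-/tmp/tree} # data directory of the new standby
BAD_HOST=${BAD_HOST:-frakbar}         # unresolvable host, forces the first failure
DRY_RUN=${DRY_RUN:-1}                 # 1: only print commands, 0: really run them

# Print a command, and execute it unless this is a dry run.
run() {
    printf '%s\n' "+ $*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

# Emit a recovery.conf that streams from the given host.
recovery_conf() {
    printf "standby_mode = 'on'\n"
    printf "primary_conninfo = 'host=%s port=%s'\n" "$1" "$STANDBY_PORT"
}

# Install that recovery.conf into the new data directory.
install_conf() {
    printf '%s\n' "+ write $NEW_DATADIR/recovery.conf with host=$1"
    if [ "$DRY_RUN" = 0 ]; then
        recovery_conf "$1" > "$NEW_DATADIR/recovery.conf"
    fi
}

reproduce() {
    # Base backup taken off the standby.
    run pg_basebackup -h "$SOCKDIR" -p "$STANDBY_PORT" -D "$NEW_DATADIR"
    # First start with a broken conninfo, then shut down again.
    install_conf "$BAD_HOST"
    run pg_ctl -D "$NEW_DATADIR" -o "-p $NEW_PORT" -l "$NEW_DATADIR/startup.log" start
    run sleep 5
    run pg_ctl -D "$NEW_DATADIR" -m fast stop
    # Fix the conninfo and start again; this is the start that used to fail.
    install_conf "$SOCKDIR"
    run pg_ctl -D "$NEW_DATADIR" -o "-p $NEW_PORT" -l "$NEW_DATADIR/startup.log" start
}
```

After sourcing the file, `reproduce` prints the planned commands and `DRY_RUN=0 reproduce` actually runs them; the outcome of the second start then ends up in startup.log.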

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


