Re: Online base backup from the hot-standby - Mailing list pgsql-hackers

From Erik Rijkers
Subject Re: Online base backup from the hot-standby
Date
Msg-id 84ea59759fa18be26f1401c9afd37c49.squirrel@webmail.xs4all.nl
Whole thread Raw
In response to Re: Online base backup from the hot-standby  (Steve Singer <ssinger_pg@sympatico.ca>)
Responses Re: Online base backup from the hot-standby
List pgsql-hackers
On Fri, January 20, 2012 05:01, Steve Singer wrote:
> On 12-01-17 05:38 AM, Fujii Masao wrote:
>> On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao<masao.fujii@gmail.com>  wrote:
>>> The amount of code changes to allow pg_basebackup to make a backup from
>>> the standby seems to be small. So I ended up merging that changes and the
>>> infrastructure patch. WIP patch attached. But I'd happy to split the patch again
>>> if you want.
>> Attached is the updated version of the patch. I wrote the limitations of
>> standby-only backup in the document and changed the error messages.
>>
>
> Here is my review of this verison of the patch. I think this patch has
> been in every CF for 9.2 and I feel it is getting close to being
> committed.  The only issue of significants is a crash I encountered
> while testing, see below.
>
> I am fine with including the pg_basebackup changes in the patch it also
> makes testing some of the other changes possible.
>
>
> The documentation updates you have are good
>
> I don't see any issues looking at the code.
>
>
>
> Testing Review
> --------------------------------
>
> I encountered this on my first replica (the one based on the master).  I
> am not sure if it is related to this patch, it happened after the
> pg_basebackup against the replica finished.
>
> TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
> "twophase.c", Line: 1238)
> LOG:  startup process (PID 12222) was terminated by signal 6: Aborted
> LOG:  terminating any other active server processes
>
> A little earlier this postmaster had printed.
>
> LOG:  restored log file "00000001000000000000001F" from archive
> LOG:  restored log file "000000010000000000000020" from archive
> cp: cannot stat
> `/usr/local/pgsql92git/archive/000000010000000000000021': No such file
> or directory
> LOG:  unexpected pageaddr 0/19000000 in log file 0, segment 33, offset 0
> cp: cannot stat
> `/usr/local/pgsql92git/archive/000000010000000000000021': No such file
> or directory
>
>
> I have NOT been able to replicate this error  and I am not sure exactly
> what I had done in my testing prior to that point.
>

I'm not sure, but it does look like this is the "mystery" bug that I encountered repeatedly
already in 9.0devel; but I was never able to reproduce it reliably.  But I don't think it was ever
solved.
 http://archives.postgresql.org/pgsql-hackers/2010-03/msg00223.php

Erik Rijkers















>
> In another test run I had
>
> - set full page writes=off and did a checkpoint
> - Started the pg_basebackup
> - set full_page_writes=on and did a HUP + some database activity that
> might have forced a checkpoint.
>
> I got this message from pg_basebackup.
> ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438
> pg_basebackup: could not get WAL end position from server
>
> I point this out because the message is different than the normal "could
> not initiate base backup: FATAL:  WAL generated with
> full_page_writes=off" thatI normally see.    We might want to add a
> PQerrorMessage(conn)) to pg_basebackup to print the error details.
> Since this patch didn't actually change pg_basebackup I don't think your
> required to improve the error messages in it.  I am just mentioning this
> because it came up in testing.
>
>
> The rest of the tests I did involving changing full_page_writes
> with/without checkpoints and sighups and promoting the replica seemed to
> work as expected.
>




pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Online base backup from the hot-standby
Next
From: Simon Riggs
Date:
Subject: Re: WAL Restore process during recovery