Re: Online base backup from the hot-standby - Mailing list pgsql-hackers
From | Erik Rijkers |
---|---|
Subject | Re: Online base backup from the hot-standby |
Date | |
Msg-id | 84ea59759fa18be26f1401c9afd37c49.squirrel@webmail.xs4all.nl Whole thread Raw |
In response to | Re: Online base backup from the hot-standby (Steve Singer <ssinger_pg@sympatico.ca>) |
Responses |
Re: Online base backup from the hot-standby
|
List | pgsql-hackers |
On Fri, January 20, 2012 05:01, Steve Singer wrote: > On 12-01-17 05:38 AM, Fujii Masao wrote: >> On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao<masao.fujii@gmail.com> wrote: >>> The amount of code changes to allow pg_basebackup to make a backup from >>> the standby seems to be small. So I ended up merging that changes and the >>> infrastructure patch. WIP patch attached. But I'd happy to split the patch again >>> if you want. >> Attached is the updated version of the patch. I wrote the limitations of >> standby-only backup in the document and changed the error messages. >> > > Here is my review of this verison of the patch. I think this patch has > been in every CF for 9.2 and I feel it is getting close to being > committed. The only issue of significants is a crash I encountered > while testing, see below. > > I am fine with including the pg_basebackup changes in the patch it also > makes testing some of the other changes possible. > > > The documentation updates you have are good > > I don't see any issues looking at the code. > > > > Testing Review > -------------------------------- > > I encountered this on my first replica (the one based on the master). I > am not sure if it is related to this patch, it happened after the > pg_basebackup against the replica finished. > > TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: > "twophase.c", Line: 1238) > LOG: startup process (PID 12222) was terminated by signal 6: Aborted > LOG: terminating any other active server processes > > A little earlier this postmaster had printed. > > LOG: restored log file "00000001000000000000001F" from archive > LOG: restored log file "000000010000000000000020" from archive > cp: cannot stat > `/usr/local/pgsql92git/archive/000000010000000000000021': No such file > or directory > LOG: unexpected pageaddr 0/19000000 in log file 0, segment 33, offset 0 > cp: cannot stat > `/usr/local/pgsql92git/archive/000000010000000000000021': No such file > or directory > > > I have NOT been able to replicate this error and I am not sure exactly > what I had done in my testing prior to that point. > I'm not sure, but it does look like this is the "mystery" bug that I encountered repeatedly already in 9.0devel; but I was never able to reproduce it reliably. But I don't think it was ever solved. http://archives.postgresql.org/pgsql-hackers/2010-03/msg00223.php Erik Rijkers > > In another test run I had > > - set full page writes=off and did a checkpoint > - Started the pg_basebackup > - set full_page_writes=on and did a HUP + some database activity that > might have forced a checkpoint. > > I got this message from pg_basebackup. > ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438 > pg_basebackup: could not get WAL end position from server > > I point this out because the message is different than the normal "could > not initiate base backup: FATAL: WAL generated with > full_page_writes=off" thatI normally see. We might want to add a > PQerrorMessage(conn)) to pg_basebackup to print the error details. > Since this patch didn't actually change pg_basebackup I don't think your > required to improve the error messages in it. I am just mentioning this > because it came up in testing. > > > The rest of the tests I did involving changing full_page_writes > with/without checkpoints and sighups and promoting the replica seemed to > work as expected. >
pgsql-hackers by date: