Re: Online base backup from the hot-standby - Mailing list pgsql-hackers

From Steve Singer
Subject Re: Online base backup from the hot-standby
Date
Msg-id BLU0-SMTP598D7314D4D1468AA0C4238EF30@phx.gbl
In response to Re: Online base backup from the hot-standby  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Online base backup from the hot-standby
List pgsql-hackers
On 11-09-22 09:24 AM, Fujii Masao wrote:
> On Wed, Sep 21, 2011 at 11:50 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> 2011/9/13 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
>>> Update patch.
>>>
>>> Changes:
>>>  * set 'on' full_page_writes by user (in document)
>>>  * read "FROM: XX" in backup_label (in xlog.c)
>>>  * check status when pg_stop_backup is executed (in xlog.c)
>>
>> Thanks for updating the patch.
>>
>> Before reviewing the patch, to encourage people to comment and
>> review the patch, I explain what this patch provides:
>
> Attached is the updated version of the patch. I refactored the code, fixed
> some bugs, added lots of source code comments, improved the document,
> but didn't change the basic design. Please check this patch, and let's use
> this patch as the base if you agree with that.

I have looked at both Jun's patch from Sept 13 and Fujii's updates to the
patch.  I agree that Fujii's updated version should be used as the basis for
changes going forward.  My comments below refer to that version (unless
otherwise noted).

In backup.sgml, for the new section titled "Making a Base Backup during
Recovery", I would prefer to see some mention in the title that this
procedure is for standby servers, i.e. "Making a Base Backup from a Standby
Database".  Users who have set up a hot-standby database should be familiar
with the 'standby' terminology.  I agree that the "during recovery"
description is technically correct, but I'm not sure someone who is looking
through the manual for instructions on making a base backup from their
standby will realize this is the section they should read.

Around line 969, where you give an example of copying the control file, I
would be a bit clearer that this is an example command, i.e. (Copy the
pg_control file from the cluster directory to the global sub-directory of
the backup.  For example "cp $PGDATA/global/pg_control
/mnt/server/backupdir/global")

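For illustration, the example command above could be wrapped in a small shell function.  This is only a sketch: the function name copy_control_file and its two arguments are hypothetical and not part of the patch or its documentation; the source and target paths simply follow the cp example above.

```shell
# Sketch only: copy pg_control into the global/ sub-directory of a backup.
# copy_control_file and its arguments are illustrative names, not from the patch.
copy_control_file() {
    pgdata=$1      # cluster data directory, e.g. "$PGDATA"
    backupdir=$2   # root of the backup, e.g. /mnt/server/backupdir

    src="$pgdata/global/pg_control"
    if [ ! -f "$src" ]; then
        echo "pg_control not found at $src" >&2
        return 1
    fi

    # Make sure the target sub-directory exists before copying.
    mkdir -p "$backupdir/global" || return 1
    cp "$src" "$backupdir/global/pg_control"
}
```

It would be called as: copy_control_file "$PGDATA" /mnt/server/backupdir
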
Testing Notes
-----------------------------

I created a standby server from a base backup of another standby server.  On
this new standby server I then

1. Ran pg_start_backup('3'); and left the psql connection open
2. touch /tmp/3 -- my trigger_file

ssinger@ssinger-laptop:/usr/local/pgsql92git/bin$ LOG:  trigger file found: /tmp/3
FATAL:  terminating walreceiver process due to administrator command
LOG:  restored log file "000000010000000000000006" from archive
LOG:  record with zero length at 0/60002F0
LOG:  restored log file "000000010000000000000006" from archive
LOG:  redo done at 0/6000298
LOG:  restored log file "000000010000000000000006" from archive
PANIC:  record with zero length at 0/6000298
LOG:  startup process (PID 19011) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.

The new postmaster (the one trying to be promoted) dies.  This is somewhat
repeatable.

----

If a base backup is in progress on a recovery database and that recovery
database is promoted to master, then following the promotion (if you don't
restart the postmaster) I see

select pg_stop_backup();
ERROR:  database system status mismatches between pg_start_backup() and pg_stop_backup()

If you restart the postmaster this goes away.  When the postmaster leaves
recovery mode I think it should abort an existing base backup so
pg_stop_backup() will say no backup in progress, or give an error message on
pg_stop_backup() saying that the base backup won't be usable.  The above
error doesn't really tell the user why there is a mismatch.

---------

In my testing a few times I got into a situation where a standby server
coming from a recovery target took a while to finish recovery (this is on a
database with no activity).  Then when I tried promoting that server to
master I got

LOG:  trigger file found: /tmp/3
FATAL:  terminating walreceiver process due to administrator command
LOG:  restored log file "000000010000000000000009" from archive
LOG:  restored log file "000000010000000000000009" from archive
LOG:  redo done at 0/90000E8
LOG:  restored log file "000000010000000000000009" from archive
PANIC:  unexpected pageaddr 0/6000000 in log file 0, segment 9, offset 0
LOG:  startup process (PID 1804) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes

It is *possible* I mixed up the order of a step somewhere since my testing
isn't script based.  A standby server that 'looks' okay but can't actually
be promoted is dangerous.

This version of the patch (I was testing the Sept 22nd version) seems less
stable than how I remember the version from the July CF.  Maybe I'm just
testing it harder or maybe something has been broken.

> In the current patch, there is no safeguard for preventing users from
> taking backup during recovery when FPW is disabled. This is unsafe.
> Are you planning to implement such a safeguard?

I agree with Fujii that we need a way (on the recovery machine) to detect if
the master doesn't have FPW on.  The ideas up-thread on how to do this sound
good.

> Regards,
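On the full_page_writes point above: until a safeguard exists on the standby side, an operator could at least check the master's setting by hand before starting a backup from the standby.  The sketch below is an illustration under stated assumptions, not what the patch does.  The function name master_fpw_is_on, the PSQL override variable and the connection string are hypothetical; the only PostgreSQL piece it relies on is running SHOW full_page_writes through psql.  It reports only the value at the time of the check, so it does not replace the detection discussed up-thread.

```shell
# Sketch only: succeed if the master currently reports full_page_writes = on.
# master_fpw_is_on and the PSQL override are illustrative, not part of the patch.
master_fpw_is_on() {
    conninfo=$1   # connection string for the master (assumed, e.g. "host=master dbname=postgres")

    # -A -t: unaligned, tuples-only output, so the result is just "on" or "off".
    value=$("${PSQL:-psql}" -At -d "$conninfo" -c "SHOW full_page_writes") || return 2

    [ "$value" = "on" ]
}
```

A wrapper script might call it as: master_fpw_is_on "host=master dbname=postgres" || echo "full_page_writes is not on; do not back up from the standby" >&2
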
