Thread: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
"Chris"
Date:
The following bug has been logged online: Bug reference: 5602 Logged by: Chris Email address: christoph.springer@magicinternet.de PostgreSQL version: 9.0beta3 Operating system: Debian 5.0.3 Description: Recovering from Hot-Standby file backup leads to the currupted indexes Details: A Hot-Standby database is being backed up to a new machine like this: 0. pause WAL file deletion from Hot-Standby 1. backup pg_control 2. sync data and tablespace files 3. wait for checkout using pg_controldata and waiting for UNDO address change 4. copy all WAL files 5. recover the newly created database using WAL files The database recovers and starts normally, but some of the indexes are corrupt afterwards. I could not find any strange log record in logs for the recovered database. This is a definition of a index that is corrupt afterwards: CREATE INDEX idx_tbl_textcol1_textcol2 ON tbl USING btree (textcol1, textcol2) TABLESPACE data5_tbs; Thank you in advance, Chris
Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Tom Lane
Date:
"Chris" <christoph.springer@magicinternet.de> writes: > A Hot-Standby database is being backed up to a new machine like this: > 0. pause WAL file deletion from Hot-Standby > 1. backup pg_control > 2. sync data and tablespace files > 3. wait for checkout using pg_controldata and waiting for UNDO address > change > 4. copy all WAL files > 5. recover the newly created database using WAL files This process seems almost entirely unrelated to the documented way of doing it; I'm not surprised that you end up with some files not in sync. Please see pg_start_backup and friends. regards, tom lane
Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
valgog
Date:
> This process seems almost entirely unrelated to the documented way of > doing it; I'm not surprised that you end up with some files not in sync. > Please see pg_start_backup and friends. > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 regards, tom lane > Hallo Tom, It was done as documented in http://www.postgresql.org/docs/9.0/static/back= up-incremental-updated.html With my best regards, -- Valentine
Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Tom Lane
Date:
valgog <valgog@gmail.com> writes: >> This process seems almost entirely unrelated to the documented way of >> doing it; I'm not surprised that you end up with some files not in sync. >> Please see pg_start_backup and friends. > It was done as documented in http://www.postgresql.org/docs/9.0/static/backup-incremental-updated.html Oh, I missed that you were copying from a hot-standby slave rather than the master. Still, your procedure doesn't clearly match step 2, and that step is the weak point of the process --- the grandchild slave isn't consistent until it's replayed WAL far enough, but we don't have any automated support for verifying that. (I hope that's going to get improved in 9.1.) I suspect you allowed the grandchild to go live before it was really consistent. regards, tom lane
Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Simon Riggs
Date:
On Thu, 2010-08-05 at 11:28 -0700, valgog wrote: > > This process seems almost entirely unrelated to the documented way of > > doing it; I'm not surprised that you end up with some files not in sync. > > Please see pg_start_backup and friends. > It was done as documented in http://www.postgresql.org/docs/9.0/static/backup-incremental-updated.html The procedure used does differ from that documented. However, IMHO the procedure *documented* is *not* safe and could lead to corrupt indexes in the way described, since the last recovered point might be mid-way between two halves of an index split record, which will never be corrected during HS. What I find surprising is that the technique the OP describes should be safe, assuming step 5 waits for the correct point of consistency before attempting to run queries. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes: > On Thu, 2010-08-05 at 11:28 -0700, valgog wrote: >> It was done as documented in http://www.postgresql.org/docs/9.0/static/backup-incremental-updated.html > The procedure used does differ from that documented. However, IMHO the > procedure *documented* is *not* safe and could lead to corrupt indexes > in the way described, since the last recovered point might be mid-way > between two halves of an index split record, which will never be > corrected during HS. Hm, I was looking at that and thinking it seemed unsafe for entirely different reasons. But if you didn't write it, who did? regards, tom lane
Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Fujii Masao
Date:
On Fri, Aug 6, 2010 at 12:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hm, I was looking at that and thinking it seemed unsafe for entirely > different reasons. =A0But if you didn't write it, who did? Incrementally Updated Backups technique has been in the document since 8.2. In the development cycle of 9.0, I and Heikki appended some description to make the technique more robust; pg_control file should be backed up first and the backup end point should be checked before running query. If it's unsafe, I'm happy to modify it. Which part looks unsafe? Regards, --=20 Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Chris
Date:
Hi all, the procedure that waits for the checkpoint location change looks like:: function wait_for_checkpoint_location_change() { PRE_WAIT=$( pg_controldata $PGDATA | awk -F: '/Latest checkpoint location/ { print $2 }' ) log "Waiting for checkpoint" while true ; do sleep 5 if [[ "$PRE_WAIT" != "$( pg_controldata $PGDATA | awk -F: '/ Latest checkpoint location/ { print $2 }' )" ]] ; then log "Checkpoint." return fi done } The grandchild database log has the following records about the recovery process:: 2010-08-06 09:48:31.266 CEST,,,30649,,4c5bbe46.77b9,15,,2010-08-06 09:48:22 CEST,,0,LOG,00000,"restored log file ""000000010000005B000000DC"" from archive",,, ,,,,,,"" 2010-08-06 09:48:31.564 CEST,,,30649,,4c5bbe46.77b9,16,,2010-08-06 09:48:22 CEST,,0,LOG,00000,"consistent recovery state reached at 5B/ DD000000",,,,,,,,,"" 2010-08-06 09:48:31.751 CEST,,,30649,,4c5bbe46.77b9,17,,2010-08-06 09:48:22 CEST,,0,LOG,00000,"restored log file ""000000010000005B000000DD"" from archive",,, ,,,,,,"" Regards, -- Chris
Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Fujii Masao
Date:
On Fri, Aug 6, 2010 at 7:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > The procedure used does differ from that documented. However, IMHO the > procedure *documented* is *not* safe and could lead to corrupt indexes > in the way described, since the last recovered point might be mid-way > between two halves of an index split record, which will never be > corrected during HS. An index split record is replayed by two calls of rm_redo()? If not, we don't need to worry about the above since the last recovered point which pg_last_xlog_replay_location() returns is updated after every rm_redo(). Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Tom Lane
Date:
Fujii Masao <masao.fujii@gmail.com> writes: > On Fri, Aug 6, 2010 at 7:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> The procedure used does differ from that documented. However, IMHO the >> procedure *documented* is *not* safe and could lead to corrupt indexes >> in the way described, since the last recovered point might be mid-way >> between two halves of an index split record, which will never be >> corrected during HS. > An index split record is replayed by two calls of rm_redo()? If not, > we don't need to worry about the above since the last recovered point > which pg_last_xlog_replay_location() returns is updated after every > rm_redo(). Yeah, I thought that was bogus too. If we're following a live master, the second xlog record should be along shortly, and in any case queries will give the correct result in between. The problem is only interesting if the WAL series ends and we have to cons up the split completion by ourselves; but the logic to do that does exist. What was bothering me about the procedure is that it's not clear when the new slave has reached consistency, in the sense of having used WAL to clean up any out-of-sync conditions in the base backup it was started from. So you can't be sure when it's okay to begin treating it as a trustworthy backup or potential master. We track the minimum safe recovery point for normal PITR recovery cases, but that mechanism isn't available for slaves cloned according to this procedure. So the DBA is just flying blind as to whether the slave is trustworthy yet. I can't prove that that's what burnt the original complainant, but it fits the symptoms. regards, tom lane
Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Simon Riggs
Date:
On Thu, 2010-08-12 at 01:31 -0400, Tom Lane wrote: > So the DBA is > just flying blind as to whether the slave is trustworthy yet. I can't > prove that that's what burnt the original complainant, but it fits the > symptoms. The safest approach is to 1. run pg_start_backup() on master, remember LSN 2. copy backup_label from master to standby 3. wait for starting LSN to be applied on standby 4. run backup on standby 5. run pg_stop_backup() on master That ensures we don't run without full page writes during backup and that we have an explicit consistency point to work from. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Fujii Masao
Date:
On Thu, Aug 12, 2010 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > What was bothering me about the procedure is that it's not clear when > the new slave has reached consistency, in the sense of having used WAL > to clean up any out-of-sync conditions in the base backup it was started > from. =A0So you can't be sure when it's okay to begin treating it as a > trustworthy backup or potential master. =A0We track the minimum safe > recovery point for normal PITR recovery cases, but that mechanism isn't > available for slaves cloned according to this procedure. =A0So the DBA is > just flying blind as to whether the slave is trustworthy yet. =A0I can't > prove that that's what burnt the original complainant, but it fits the > symptoms. The step 2 of the procedure can ensure that new slave has reached consistency. No? Regards, --=20 Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Fujii Masao
Date:
On Thu, Aug 12, 2010 at 4:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > The safest approach is to > > 1. run pg_start_backup() on master, remember LSN > 2. copy backup_label from master to standby > 3. wait for starting LSN to be applied on standby ISTM we should wait for the latest checkpoint redo location to reach the starting location which pg_start_backup() returns. > That ensures we don't run without full page writes during backup and > that we have an explicit consistency point to work from. Yeah, the current documented procedure might not work when full_page_writes is off on the master. This should be documented. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Valentine Gogichashvili
Date:
Hi, Actually full_page_write being turned off on the master is probably a problem. -- Valentine On Thu, Aug 12, 2010 at 9:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Thu, Aug 12, 2010 at 4:18 PM, Simon Riggs <simon@2ndquadrant.com> > wrote: > > The safest approach is to > > > > 1. run pg_start_backup() on master, remember LSN > > 2. copy backup_label from master to standby > > 3. wait for starting LSN to be applied on standby > > ISTM we should wait for the latest checkpoint redo location to reach > the starting location which pg_start_backup() returns. > > > That ensures we don't run without full page writes during backup and > > that we have an explicit consistency point to work from. > > Yeah, the current documented procedure might not work when > full_page_writes is off on the master. This should be documented. > > Regards, > > -- > Fujii Masao > NIPPON TELEGRAPH AND TELEPHONE CORPORATION > NTT Open Source Software Center >
Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Fujii Masao
Date:
On Thu, Aug 12, 2010 at 5:33 PM, Valentine Gogichashvili <valgog@gmail.com> wrote: > Hi, > Actually full_page_write being turned off on the master is probably a > problem. Yep. As Simon suggests, we must run pg_start_backup on the master, to take a backup from the standby safely even if full_page_writes is disabled. So we should: (based on Simon's suggestion) 1. run pg_start_backup() on master. 2. copy backup_label from master to temporary area. copying backup_label directly to standby would generate another weakness (e.g., what if standby is restarted while backup_label exists in standby?), so backup_label should be copied to elsewhere than standby. 3. wait for "Latest checkpoint's REDO location" which pg_controldata on standby returns, to reach or exceed "START WAL LOCATION" in backup_label copied in the step 2. This would take long, but we can run checkpoint on standby to shorten waiting time. 4. run backup on standby 5. run pg_stop_backup() on master 6. copy backup_label from temporary are to backup Is this procedure still unsafe? I'm thinking that it's worth leaving the current documented procedure since it has less impact on the master, if we write clearly that the procedure is safe only when full_page_writes is enabled. Comments? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Tom Lane
Date:
Fujii Masao <masao.fujii@gmail.com> writes: > On Thu, Aug 12, 2010 at 5:33 PM, Valentine Gogichashvili > <valgog@gmail.com> wrote: >> Actually full_page_write being turned off on the master is probably a >> problem. > Yep. As Simon suggests, we must run pg_start_backup on the master, > to take a backup from the standby safely even if full_page_writes > is disabled. So we should: > (based on Simon's suggestion) > 1. run pg_start_backup() on master. > 2. copy backup_label from master to temporary area. > copying backup_label directly to standby would generate another > weakness (e.g., what if standby is restarted while backup_label > exists in standby?), so backup_label should be copied to elsewhere > than standby. > 3. wait for "Latest checkpoint's REDO location" which pg_controldata > on standby returns, to reach or exceed "START WAL LOCATION" in > backup_label copied in the step 2. This would take long, but we > can run checkpoint on standby to shorten waiting time. Hm, can you actually execute CHECKPOINT on a HS slave? Is it guaranteed to cause a restartpoint to be created? > 4. run backup on standby > 5. run pg_stop_backup() on master > 6. copy backup_label from temporary are to backup > Is this procedure still unsafe? This still isn't doing anything to address the problem I'm worried about, which is when does the copy actually reach consistency. The above procedure might guarantee that it eventually will reach consistency, but you don't know when it has. regards, tom lane
Re: Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
From
Fujii Masao
Date:
On Thu, Aug 12, 2010 at 11:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> (based on Simon's suggestion) >> 1. run pg_start_backup() on master. >> 2. copy backup_label from master to temporary area. >> =A0 =A0copying backup_label directly to standby would generate another >> =A0 =A0weakness (e.g., what if standby is restarted while backup_label >> =A0 =A0exists in standby?), so backup_label should be copied to elsewhere >> =A0 =A0than standby. >> 3. wait for "Latest checkpoint's REDO location" which pg_controldata >> =A0 =A0on standby returns, to reach or exceed "START WAL LOCATION" in >> =A0 =A0backup_label copied in the step 2. This would take long, but we >> =A0 =A0can run checkpoint on standby to shorten waiting time. > > Hm, can you actually execute CHECKPOINT on a HS slave? Yes. > =A0Is it guaranteed > to cause a restartpoint to be created? CHECKPOINT on a HS slave creates a restartpoint only when there is CHECKPOINT record which has already been replayed but has not created a restartpoint yet. Such a CHECKPOINT record is expected to exist after the step 2 because it's generated by pg_start_backup in the step1. So executing CHECKPOINT on a HS slave at the step 3 would almost create a restartpoint. But, in file-based log shipping case, it might take long to ship such a CHECKPOINT record. So we might need to execute pg_switch_xlog() on the master before executing CHECKPOINT on the slave. >> 4. run backup on standby >> 5. run pg_stop_backup() on master >> 6. copy backup_label from temporary are to backup > >> Is this procedure still unsafe? > > This still isn't doing anything to address the problem I'm worried > about, which is when does the copy actually reach consistency. =A0The > above procedure might guarantee that it eventually will reach > consistency, but you don't know when it has. Once new standby starting from the backup taken from another standby has reached the backup end location (i.e., it has read the XLOG_BACKUP_END record generated by pg_stop_backup in the step 5), we can think that the database has reach consistency. Since new standby doesn't accept connections from the client until that, we can ensure that the users will not access to inconsistent database. Regards, PS. I'll be unable to read hackers from Aug 13th to 20th because of a vacation. --=20 Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center