Thread: Online backup cause boot failure, anyone know why?
I want to create a database backup while PG is running, so I call pg_start_backup(''), scp the data to a backup directory, then call pg_stop_backup. Then I rebooted PG, and PG failed to boot with log messages like "unexpected pageaddr X/X in log file X, segment X, offset X" and "WAL ends before end time of backup dump". Then I checked the failing XLOG file and found that the error page contains a pageaddr 8K before where it should be, and the failing XLOG record is an ONLINE CHECKPOINT with 60 bytes in the former page and the other 4 bytes missing.

Has anyone met this before? Please help me!
--------------
Richard
2010-08-05

PS: I am using PG 8.3.7
------------------
Richard
2010-08-05
In reply to: Richard, 2010-08-05 21:19:27

On 08/05/2010 09:19 AM, Richard wrote:
> I want to create a database backup when PG is running, so I call pg_start_backup(''), scp the data to a backup directory, pg_stop_backup.
> Then I reboot PG, PG boot failed with log like "unexpected pageaddr X/X in log file X, segment X, offset X" "WAL ends before end time of backup dump".
> Then I check the failure XLOG file, found the error page contains a pageaddr 8K before it should be, and the failure XLOG record a ONLINE CHECKPOINT with 60 bytes in former page, the other 4 bytes missing.
>
> Any one met this before? Please help me!

This question really belongs on the pgsql-general list, not the -hackers list.

If all you copied was the data directory then you haven't done this right anyway. See <http://www.postgresql.org/docs/8.3/static/continuous-archiving.html#BACKUP-TIPS>

Why did you reboot postgres after taking your backup?

cheers

andrew

I rebooted PG because I found that the PG recovery end point is far away from the actual end point of the XLOG in the backup directory, so I wanted to test whether the original DB is OK.
Unfortunately, I got the same PG log on the original DB. I don't understand what you said. What am I missing?
------------------
Richard
2010-08-05
In reply to: Andrew Dunstan, 2010-08-05 21:40:13

"Richard" <husttripper@vip.sina.com> writes:
> PS : I am using PG 8.3.7

I believe there's a related bug fix in 8.3.8.

BTW, -hackers is not the place for this type of question.

regards, tom lane

On Thu, Aug 5, 2010 at 9:50 AM, Richard <husttripper@vip.sina.com> wrote:
> I reboot PG because I found PG recovery end point is far away from the actual end point of the XLOG on the backup directory, so I want to test if the original DB is OK.
> Unfortunately, I got the same PG log on the original DB. I don't understand what you said, I missing what?

The transaction logs archived during the backup?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Thanks for replying. But I could not find any relation between the RequestXLogSwitch function and the error I met.
For performance purposes, I changed the pg_start_backup checkpoint type from CHECKPOINT_WAIT to CHECKPOINT_IMMEDIATE. Does it matter?
------------------
Richard
2010-08-05
In reply to: Tom Lane, 2010-08-05 22:04:30

Oh sorry, I missed something. I turned off XLOG archiving in the code after pg_start_backup, so the pg_xlog directory contains all the XLOG files.
And for performance purposes, I changed the checkpoint type in pg_start_backup to CHECKPOINT_IMMEDIATE. Does it matter?
The PG log I mentioned above is the running error log, not the XLOG.
------------------
Richard
2010-08-05
In reply to: Robert Haas, 2010-08-05 22:07:45

On Thu, Aug 5, 2010 at 10:20 AM, Richard <husttripper@vip.sina.com> wrote:
> Oh sorry, I missed something. I turned off the XLOG archive in code after pg_start_backup so the pg_xlog directory contains all the xlog files.
> And for performance purpose, I change the checkpoint type in pg_start_backup to CHECKPOINT_IMMEDIATE, does it matter?
> The PG log I mentioned above is the running error log not the XLOG.

Well, it's pretty clear that you're missing some WAL; otherwise, you wouldn't be getting an error that says "WAL ends before end time of backup dump". It's hard to speculate as to whether that's a configuration problem or a result of your custom modifications to the source code, since you haven't provided many details about either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

"Richard" <husttripper@vip.sina.com> writes:
> For performance purpose, I change the pg_start_backup checkpoint type from CHECKPOINT_WAIT to CHECKPOINT_IMMEDIATE, does it matter?

Oh, so this isn't so much "8.3.7" as "randomly-hacked-up 8.3.7".

Yes, that'd break it, I believe. CHECKPOINT_IMMEDIATE doesn't imply waiting.

regards, tom lane

Thanks for your patience. I use XLogCtl->Insert.forcePageWrites as the XLOG recycling flag, so after pg_start_backup no more XLOG files will be recycled. And as I said above, I make a CHECKPOINT_IMMEDIATE checkpoint in pg_start_backup instead of CHECKPOINT_WAIT. That is all I did to the code.
I wonder whether the XLOG is corrupted, because the first error is "unexpected pageaddr %X/%X in log file %u, segment %u, offset %u". The error page's pageaddr contains an LSN 8K before where it should be, and when I compare the two pages, they are almost the same except for the last several bytes. So it should not be that some XLOG is missing; it could be that the XLOG file or buffer was corrupted.
------------------
Richard
2010-08-05
In reply to: Robert Haas, 2010-08-05 22:38:37

I am sorry, my English is poor.
I was confused by what you said.
What do you mean by saying "that'd break it"?
------------------
Richard
2010-08-05
In reply to: Tom Lane, 2010-08-05 22:44:50

On 05/08/10 17:56, Richard wrote:
> I am sorry, my English is poor.
> I was confused by what you said.
> What do you mean by saying "that'd break it"!

Replacing CHECKPOINT_WAIT with CHECKPOINT_IMMEDIATE broke it. Don't do that.

If you want to change the behavior of pg_start_backup() to perform the checkpoint immediately, change "CHECKPOINT_WAIT" to "CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE".

The usual work-around though is not to hack the source code, but to perform a manual CHECKPOINT just before calling pg_start_backup(). That makes the checkpoint performed by pg_start_backup() finish quickly.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

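The suggestion above rests on the checkpoint request flags being independent bits that are OR-ed together. A minimal sketch of that idea follows. The numeric values and the two helper functions are assumptions made up for illustration, not copied from the PostgreSQL source, so check the headers in your own source tree for the real definitions.

```c
#include <stdbool.h>

/*
 * Illustrative stand-ins for the two checkpoint request flags named in
 * this thread. The numeric values are ASSUMPTIONS for this sketch only;
 * all that matters here is that each flag occupies its own bit.
 */
#define CHECKPOINT_IMMEDIATE 0x0002 /* perform the checkpoint without delays */
#define CHECKPOINT_WAIT      0x0008 /* caller waits for the checkpoint to complete */

/* Hypothetical helper: does this request make the caller wait for completion? */
bool request_waits(int flags)
{
    return (flags & CHECKPOINT_WAIT) != 0;
}

/* Hypothetical helper: does this request ask for a checkpoint without delays? */
bool request_is_immediate(int flags)
{
    return (flags & CHECKPOINT_IMMEDIATE) != 0;
}
```

With these stand-in values, request_waits(CHECKPOINT_IMMEDIATE) is false, while request_waits(CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE) is true. That is the difference the replies point to: the combined request asks for a fast checkpoint and still makes the caller wait for it to finish, whereas CHECKPOINT_IMMEDIATE on its own drops the waiting.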
All jods are done by client code, not manually.
I still didn't understand what you said.
What broke what?
Thanks!
------------------
Richard
2010-08-05
In reply to: Heikki Linnakangas, 2010-08-05 23:21:54

2010/8/5 Richard <husttripper@vip.sina.com>:

> All jods are done by client code, not manually.

What is a jod?

> I still didn't understand what you said.
> What break what?

The fact that you replaced CHECKPOINT_WAIT with CHECKPOINT_IMMEDIATE is the cause of your problem. You "broke" the correctness of the system by doing so.

Nicolas

Sorry, wrong word, it should be job.
You mean the wrong type of checkpoint causes the XLOG file recovery to fail?
I was confused. The XLOG files seem corrupted; is that also caused by the checkpoint type? If so, how can it do this?
------------------
Richard
2010-08-05
In reply to: Nicolas Barbier, 2010-08-05 23:43:22

Let's be clear. If you change the postgres code and then things break I think you're pretty much on your own. We can accept some responsibility for helping you if you're running our code, but not if you're running our code which you have subsequently mangled. If you break things you get to fix them.

cheers

andrew

On 08/05/2010 10:20 AM, Richard wrote:
> Oh sorry, I missed something. I turned off the XLOG archive in code after pg_start_backup so the pg_xlog directory contains all the xlog files.
> And for performance purpose, I change the checkpoint type in pg_start_backup to CHECKPOINT_IMMEDIATE, does it matter?
> The PG log I mentioned above is the running error log not the XLOG.