Thread: Online backup cause boot failure, anyone know why?

Online backup cause boot failure, anyone know why?

From
"Richard"
Date:
I want to create a database backup while PG is running, so I call pg_start_backup(''), scp the data to a backup
directory, and then call pg_stop_backup().

Then I rebooted PG, and it failed to boot with log messages like "unexpected pageaddr X/X in log file X, segment X,
offset X" and "WAL ends before end time of backup dump".

Then I checked the failing XLOG file and found that the error page contains a pageaddr 8K before what it should be,
and the failing XLOG record is an ONLINE CHECKPOINT with 60 bytes in the previous page; the other 4 bytes are missing.

Has anyone met this before? Please help me!
--------------
Richard
2010-08-05



Re: Online backup cause boot failure, anyone know why?

From
"Richard"
Date:
PS : I am using PG 8.3.7

------------------                 
Richard
2010-08-05

-------------------------------------------------------------
From: Richard
Sent: 2010-08-05 21:19:27
To: pgsql-hackers
Cc:
Subject: Online backup cause boot failure, anyone know why?

Re: Online backup cause boot failure, anyone know why?

From
Andrew Dunstan
Date:

On 08/05/2010 09:19 AM, Richard wrote:
> I want to create a database backup while PG is running, so I call pg_start_backup(''), scp the data to a backup
> directory, and then call pg_stop_backup().
>
> Then I rebooted PG, and it failed to boot with log messages like "unexpected pageaddr X/X in log file X, segment X,
> offset X" and "WAL ends before end time of backup dump".
>
> Then I checked the failing XLOG file and found that the error page contains a pageaddr 8K before what it should be,
> and the failing XLOG record is an ONLINE CHECKPOINT with 60 bytes in the previous page; the other 4 bytes are missing.
>
> Has anyone met this before? Please help me!

This question really belongs on the pgsql-general list, not the -hackers 
list.

If all you copied was the data directory then you haven't done this 
right anyway. See 
<http://www.postgresql.org/docs/8.3/static/continuous-archiving.html#BACKUP-TIPS>

Why did you reboot postgres after taking your backup?

cheers

andrew


Re: Re: [HACKERS] Online backup cause boot failure, anyone know why?

From
"Richard"
Date:
I rebooted PG because I found that the PG recovery end point is far away from the actual end point of the XLOG in the
backup directory, so I wanted to test whether the original DB is OK.

Unfortunately, I got the same PG log on the original DB. I don't understand what you said. What am I missing?


------------------                 
Richard
2010-08-05

-------------------------------------------------------------
From: Andrew Dunstan
Sent: 2010-08-05 21:40:13
To: Richard
Cc: pgsql-hackers
Subject: Re: [HACKERS] Online backup cause boot failure, anyone know why?



Re: Online backup cause boot failure, anyone know why?

From
Tom Lane
Date:
"Richard" <husttripper@vip.sina.com> writes:
> PS : I am using PG 8.3.7

I believe there's a related bug fix in 8.3.8.

BTW, -hackers is not the place for this type of question.
        regards, tom lane


Re: Re: Re: [HACKERS] Online backup cause boot failure, anyone know why?

From
Robert Haas
Date:
On Thu, Aug 5, 2010 at 9:50 AM, Richard <husttripper@vip.sina.com> wrote:
> I rebooted PG because I found that the PG recovery end point is far away from the actual end point of the XLOG in the
> backup directory, so I wanted to test whether the original DB is OK.
> Unfortunately, I got the same PG log on the original DB. I don't understand what you said. What am I missing?

The transaction logs archived during the backup?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Online backup cause boot failure, anyone know why?

From
"Richard"
Date:
Thanks for replying.
But I could not find any relation between the RequestXLogSwitch function and the error I met.
For performance reasons, I changed the pg_start_backup checkpoint type from CHECKPOINT_WAIT to CHECKPOINT_IMMEDIATE.
Does that matter?
 

------------------                 
Richard
2010-08-05

-------------------------------------------------------------
From: Tom Lane
Sent: 2010-08-05 22:04:30
To: Richard
Cc: pgsql-hackers
Subject: Re: [HACKERS] Online backup cause boot failure, anyone know why?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Oh sorry, I missed something. I turned off XLOG archiving in the code after pg_start_backup, so the pg_xlog directory
contains all the XLOG files.

And for performance reasons, I changed the checkpoint type in pg_start_backup to CHECKPOINT_IMMEDIATE. Does that matter?
The PG log I mentioned above is the running server's error log, not the XLOG.

------------------                 
Richard
2010-08-05

-------------------------------------------------------------
From: Robert Haas
Sent: 2010-08-05 22:07:45
To: Richard
Cc: Andrew Dunstan; pgsql-hackers
Subject: Re: [HACKERS] Re: Re: [HACKERS] Online backup cause boot failure,anyone know why?

Re: Re: Re: [HACKERS] Online backup cause boot failure,anyone know why?

From
Robert Haas
Date:
On Thu, Aug 5, 2010 at 10:20 AM, Richard <husttripper@vip.sina.com> wrote:
> Oh sorry, I missed something. I turned off XLOG archiving in the code after pg_start_backup, so the pg_xlog directory
> contains all the XLOG files.
>
> And for performance reasons, I changed the checkpoint type in pg_start_backup to CHECKPOINT_IMMEDIATE. Does that
> matter?
> The PG log I mentioned above is the running server's error log, not the XLOG.

Well, it's pretty clear that you're missing some WAL; otherwise, you
wouldn't be getting an error that says "WAL ends before end time of
backup dump".  It's hard to speculate as to whether that's a
configuration problem or a result of your custom modifications to the
source code, since you haven't provided many details about either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Online backup cause boot failure, anyone know why?

From
Tom Lane
Date:
"Richard" <husttripper@vip.sina.com> writes:
> For performance reasons, I changed the pg_start_backup checkpoint type from CHECKPOINT_WAIT to CHECKPOINT_IMMEDIATE.
> Does that matter?

Oh, so this isn't so much "8.3.7" as "randomly-hacked-up 8.3.7".

Yes, that'd break it, I believe.  CHECKPOINT_IMMEDIATE doesn't imply
waiting.
        regards, tom lane


Thanks for your patience.
I use XLogCtl->Insert.forcePageWrites as the flag for XLOG recycling, so after pg_start_backup no more XLOG files will
be recycled. And as I said above, I request a CHECKPOINT_IMMEDIATE checkpoint in pg_start_backup instead of
CHECKPOINT_WAIT. That is all I changed in the code.

I wonder whether the XLOG is corrupted, because the first error is "unexpected pageaddr %X/%X in log file %u, segment
%u, offset %u". The page that raises the error contains an LSN 8K before what it should be, and when I compare the two
pages, they are almost the same except for the last several bytes.
So I do not think some XLOG is missing; maybe the XLOG file or buffer was corrupted.

------------------                 
Richard
2010-08-05
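
A rough model of the check behind the "unexpected pageaddr" message may help here. Each WAL page header records the address the page is meant to occupy, and recovery compares that with the position it is actually reading. The sketch below is a simplified Python illustration, not the PostgreSQL source: the names PageAddr, expected_pageaddr and check_page are hypothetical, and the 8K page size and 16MB segment size are assumed build defaults. It only shows how a page carrying an address 8K too low would trip the check; it does not explain why such a page ended up in the file.

```python
from dataclasses import dataclass

XLOG_BLCKSZ = 8192                # assumption: default WAL page size (8K)
XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumption: default WAL segment size (16MB)


@dataclass(frozen=True)
class PageAddr:
    """Hypothetical stand-in for a WAL position: log file id plus byte offset."""
    xlogid: int
    xrecoff: int


def expected_pageaddr(log_id: int, seg: int, offset: int) -> PageAddr:
    """Address a page should carry if it really belongs at (log_id, seg, offset)."""
    return PageAddr(log_id, seg * XLOG_SEG_SIZE + offset)


def check_page(stored: PageAddr, log_id: int, seg: int, offset: int) -> str:
    """Return 'ok', or a message shaped like the one quoted in this thread."""
    if stored == expected_pageaddr(log_id, seg, offset):
        return "ok"
    return ("unexpected pageaddr %X/%X in log file %u, segment %u, offset %u"
            % (stored.xlogid, stored.xrecoff, log_id, seg, offset))


if __name__ == "__main__":
    offset = 2 * XLOG_BLCKSZ
    good = expected_pageaddr(0, 3, offset)
    # A page whose header still holds the address of the page 8K earlier.
    stale = PageAddr(good.xlogid, good.xrecoff - XLOG_BLCKSZ)
    print(check_page(good, 0, 3, offset))
    print(check_page(stale, 0, 3, offset))
```

In this model the reader stops at the first page whose stored address does not match, which is consistent with recovery ending earlier than the end of the files on disk.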

-------------------------------------------------------------
From: Robert Haas
Sent: 2010-08-05 22:38:37
To: Richard
Cc: Andrew Dunstan; pgsql-hackers
Subject: Re: Re: [HACKERS] Re: Re: [HACKERS] Online backup cause boot failure,anyone know why?

Re: Online backup cause boot failure, anyone know why?

From
"Richard"
Date:
I am sorry, my English is poor.
I was confused by what you said.
What do you mean by "that'd break it"?

------------------                 
Richard
2010-08-05

-------------------------------------------------------------
From: Tom Lane
Sent: 2010-08-05 22:44:50
To: Richard
Cc: pgsql-hackers
Subject: Re: [HACKERS] Online backup cause boot failure, anyone know why?

Re: Online backup cause boot failure, anyone know why?

From
Heikki Linnakangas
Date:
On 05/08/10 17:56, Richard wrote:
> I am sorry, my English is poor.
> I was confused by what you said.
> What do you mean by saying   "that'd break it"!

Replacing CHECKPOINT_WAIT with CHECKPOINT_IMMEDIATE broke it. Don't do that.

If you want to change the behavior of pg_start_backup() to perform the
checkpoint immediately, change "CHECKPOINT_WAIT" to "CHECKPOINT_WAIT |
CHECKPOINT_IMMEDIATE".

The usual work-around though is not to hack the source code, but perform
a manual CHECKPOINT just before calling pg_start_backup(). That makes
the checkpoint performed by pg_start_backup() finish quickly.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com
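
The point about combining the flags can be modelled as independent bits: one bit controls how fast the checkpoint runs, the other controls whether the caller waits for it to finish, so swapping one for the other silently drops the waiting. The sketch below uses Python purely for illustration; the class CheckpointFlags, the function describe, and the bit values are assumptions made for this example and are not copied from the PostgreSQL headers.

```python
from enum import IntFlag


class CheckpointFlags(IntFlag):
    """Illustrative stand-ins for the C request flags; bit values are arbitrary."""
    IMMEDIATE = 0x02  # run the checkpoint at full speed instead of spreading it out
    WAIT = 0x08       # the caller blocks until the checkpoint has completed


def describe(flags: CheckpointFlags) -> str:
    """Summarise what a request with these flags asks for."""
    speed = "immediate" if flags & CheckpointFlags.IMMEDIATE else "spread out"
    if flags & CheckpointFlags.WAIT:
        waiting = "caller waits for completion"
    else:
        waiting = "caller returns before completion"
    return f"{speed}, {waiting}"


if __name__ == "__main__":
    # Unmodified request: slow but safe for the caller.
    print(describe(CheckpointFlags.WAIT))
    # The replacement made in the modified build: fast, but nothing waits.
    print(describe(CheckpointFlags.IMMEDIATE))
    # The suggested combination: fast, and the caller still waits.
    print(describe(CheckpointFlags.WAIT | CheckpointFlags.IMMEDIATE))
```

Only the third request gives both properties, which matches the advice to OR the flags together rather than replace one with the other.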


Re: Online backup cause boot failure, anyone know why?

From
"Richard"
Date:
All jods are done by client code, not manually. 
I still didn't understand what you said.
What breaks what?
Thanks!

------------------                 
Richard
2010-08-05

-------------------------------------------------------------
From: Heikki Linnakangas
Sent: 2010-08-05 23:21:54
To: Richard
Cc: Tom Lane; pgsql-hackers
Subject: Re: [HACKERS] Online backup cause boot failure, anyone know why?

Re: Online backup cause boot failure, anyone know why?

From
Nicolas Barbier
Date:
2010/8/5 Richard <husttripper@vip.sina.com>:

> All jods are done by client code, not manually.

What is a jod?

> I still did't not understand what you said.
> What break what?

The fact that you replaced CHECKPOINT_WAIT with CHECKPOINT_IMMEDIATE
is the cause of your problem. You "broke" the correctness of the
system by doing so.

Nicolas


Re: Re: [HACKERS] Online backup cause boot failure, anyone know why?

From
"Richard"
Date:
Sorry, wrong word; it should be "job".
Do you mean that the wrong type of checkpoint causes XLOG file recovery to fail?
I am confused: the XLOG files seem corrupted. Is that also caused by the checkpoint type? If so, how can it do this?

------------------                 
Richard
2010-08-05

-------------------------------------------------------------
From: Nicolas Barbier
Sent: 2010-08-05 23:43:22
To: Richard
Cc: Heikki Linnakangas; Tom Lane; pgsql-hackers
Subject: Re: [HACKERS] Online backup cause boot failure, anyone know why?

Re: Re: Re: [HACKERS] Online backup cause boot failure,anyone know why?

From
Andrew Dunstan
Date:
Let's be clear. If you change the postgres code and then things break I 
think you're pretty much on your own. We can accept some responsibility 
for helping you if you're running our code, but not if you're running 
our code which you have subsequently mangled. If you break things you 
get to fix them.

cheers

andrew

On 08/05/2010 10:20 AM, Richard wrote:
> Oh sorry, I missed something. I turned off XLOG archiving in the code after pg_start_backup, so the pg_xlog directory
> contains all the XLOG files.
>
> And for performance reasons, I changed the checkpoint type in pg_start_backup to CHECKPOINT_IMMEDIATE. Does that
> matter?
> The PG log I mentioned above is the running server's error log, not the XLOG.
>
> ------------------
> Richard
> 2010-08-05
>
> -------------------------------------------------------------
> From: Robert Haas
> Sent: 2010-08-05 22:07:45
> To: Richard
> Cc: Andrew Dunstan; pgsql-hackers
> Subject: Re: [HACKERS] Re: Re: [HACKERS] Online backup cause boot failure,anyone know why?
>
> On Thu, Aug 5, 2010 at 9:50 AM, Richard <husttripper@vip.sina.com> wrote:
>> I rebooted PG because I found that the PG recovery end point is far away from the actual end point of the XLOG in
>> the backup directory, so I wanted to test whether the original DB is OK.
>>
>> Unfortunately, I got the same PG log on the original DB. I don't understand what you said. What am I missing?
> The transaction logs archived during the backup?
>