Thread: Fatal Error during PITR Recovery

Fatal Error during PITR Recovery

From
"Gnanakumar"
Date:
Hi,

My production is running PostgreSQL v8.2.3 on CentOS release 5.2 (Final).

As part of our routine, we just wanted to make sure and practice once in a
while, whether PITR recovery process is performed without fail.  When I
started the recovery process, after sometime, I see the following error in
the serverlog.  This error was thrown after processing WAL archives for
almost 2 and half hour. I couldn't able to understand what is wrong by
reading this error statements, since it is cryptic technically.  Can experts
help me in this regard?  How do I proceed further from here?

I do have enough space in both my data/ and /mnt/pitr/walarchive/
directories. I don't think this is because of scarcity in disk space.

My base backup date: Aug03, 2009

recovery.conf
===========
restore_command='cp /mnt/pitr/walarchive/%f "%p"'
recovery_target_time='2010-06-08 09:10'
recovery_target_inclusive='true'

ERROR
======
[]:2010-06-09 04:43:27 EDTLOG:  restored log file "00000001000000130000003D"
from archive
[]:2010-06-09 04:43:29 EDTLOG:  restored log file "00000001000000130000003E"
from archive
[]:2010-06-09 04:43:30 EDTLOG:  restored log file "00000001000000130000003F"
from archive
[]:2010-06-09 04:44:59 EDTLOG:  could not fsync segment 0 of relation
1663/169462/252056: No such file or directory
[]:2010-06-09 04:44:59 EDTCONTEXT:  xlog redo checkpoint: redo 13/3FE33EB0;
undo 0/0; tli 1; xid 0/331456443; oid 256783; multi 337; offset 676; online
[]:2010-06-09 04:44:59 EDTFATAL:  storage sync failed on magnetic disk: No
such file or directory
[]:2010-06-09 04:44:59 EDTCONTEXT:  xlog redo checkpoint: redo 13/3FE33EB0;
undo 0/0; tli 1; xid 0/331456443; oid 256783; multi 337; offset 676; online
[]:2010-06-09 04:44:59 EDTLOG:  startup process (PID 22855) exited with exit
code 1
[]:2010-06-09 04:44:59 EDTLOG:  aborting startup due to startup process
failure
[]:2010-06-09 04:44:59 EDTLOG:  logger shutting down

Regards,
Gnanam


Re: Fatal Error during PITR Recovery

From
Tom Lane
Date:
"Gnanakumar" <gnanam@zoniac.com> writes:
> My production is running PostgreSQL v8.2.3 on CentOS release 5.2 (Final).

> As part of our routine, we just wanted to make sure and practice once in a
> while, whether PITR recovery process is performed without fail.  When I
> started the recovery process, after sometime, I see the following error in
> the serverlog.

> []:2010-06-09 04:44:59 EDTLOG:  could not fsync segment 0 of relation
> 1663/169462/252056: No such file or directory
> []:2010-06-09 04:44:59 EDTCONTEXT:  xlog redo checkpoint: redo 13/3FE33EB0;
> undo 0/0; tli 1; xid 0/331456443; oid 256783; multi 337; offset 676; online
> []:2010-06-09 04:44:59 EDTFATAL:  storage sync failed on magnetic disk: No
> such file or directory

This is probably a bug we fixed in 8.2.4:

2007-04-12 11:04  tgl

    * src/backend/commands/dbcommands.c (REL8_2_STABLE): Cancel pending
    fsync requests during WAL replay of DROP DATABASE, per bug report
    from David Darville.  Back-patch as far as 8.1, which may or may
    not have the problem but it seems a safe change anyway.

You really ought to be running a less ancient minor release of PG --- 8.2.x
is up to 8.2.17.  Your OS sounds a bit long in the tooth as well.

            regards, tom lane

Re: Fatal Error during PITR Recovery

From
"Gnanakumar"
Date:
Hi Tom,

Thanks for your comment.

After posting this question here, I did a quick search in Google, and I'm
finding other kind of reasons also, like a hardware problem, here:
http://www.mail-archive.com/pgsql-general@postgresql.org/msg69085.html

But in my case, I don't think that this could be a hardware problem, because
my database server is running in Amazon EC2 cloud.  So the chances of
hardware failure is least expected.  As you said, this is more likely of a
bug in v8.2.3.

I'm finding another solution also here, to do 'pg_resetxlog', even though
this problem is reported on during database startup after an abrupt
shutdown.  I believe that 'pg_resetxlog' will work only when the database is
in normal mode and not in recovery mode.  Is my understanding correct?
Please comment on this.
http://stackoverflow.com/questions/598200/how-do-i-fix-postgres-so-it-will-s
tart-after-an-abrupt-shutdown

If I would like to upgrade to the latest minor version in 8.2.x series, that
is v8.2.17, how do I upgrade this without doing dump/restore?
Links/documentation on this are appreciated.

Regards,
Gnanam

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, June 09, 2010 9:27 PM
To: gnanam@zoniac.com
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Fatal Error during PITR Recovery

"Gnanakumar" <gnanam@zoniac.com> writes:
> My production is running PostgreSQL v8.2.3 on CentOS release 5.2 (Final).

> As part of our routine, we just wanted to make sure and practice once in a
> while, whether PITR recovery process is performed without fail.  When I
> started the recovery process, after sometime, I see the following error in
> the serverlog.

> []:2010-06-09 04:44:59 EDTLOG:  could not fsync segment 0 of relation
> 1663/169462/252056: No such file or directory
> []:2010-06-09 04:44:59 EDTCONTEXT:  xlog redo checkpoint: redo
13/3FE33EB0;
> undo 0/0; tli 1; xid 0/331456443; oid 256783; multi 337; offset 676;
online
> []:2010-06-09 04:44:59 EDTFATAL:  storage sync failed on magnetic disk: No
> such file or directory

This is probably a bug we fixed in 8.2.4:

2007-04-12 11:04  tgl

    * src/backend/commands/dbcommands.c (REL8_2_STABLE): Cancel pending
    fsync requests during WAL replay of DROP DATABASE, per bug report
    from David Darville.  Back-patch as far as 8.1, which may or may
    not have the problem but it seems a safe change anyway.

You really ought to be running a less ancient minor release of PG --- 8.2.x
is up to 8.2.17.  Your OS sounds a bit long in the tooth as well.

            regards, tom lane


Re: Fatal Error during PITR Recovery

From
"Kevin Grittner"
Date:
"Gnanakumar"  wrote:

> If I would like to upgrade to the latest minor version in 8.2.x
> series, that is v8.2.17, how do I upgrade this [from 8.2.3 on
> CentOS] without doing dump/restore?  Links/documentation on this
> are appreciated.

From the PostgreSQL home page there is a link to:

http://www.postgresql.org/docs/8.2/static/release-8-2-17.html

Which contains a migration section, saying this:

| A dump/restore is not required for those running 8.2.X. However, if
| you are upgrading from a version earlier than 8.2.14, see the
| release notes for 8.2.14.

The migration section of:

http://www.postgresql.org/docs/8.2/static/release-8-2-14.html

says:

| A dump/restore is not required for those running 8.2.X. However, if
| you have any hash indexes on interval columns, you must REINDEX
| them after updating to 8.2.14. Also, if you are upgrading from a
| version earlier than 8.2.11, see the release notes for 8.2.11.

http://www.postgresql.org/docs/8.2/static/release-8-2-11.html

says:

| A dump/restore is not required for those running 8.2.X. However, if
| you are upgrading from a version earlier than 8.2.7, see the
| release notes for 8.2.7. Also, if you were running a previous 8.2.X
| release, it is recommended to REINDEX all GiST indexes after the
| upgrade.

http://www.postgresql.org/docs/8.2/static/release-8-2-7.html

says:

| A dump/restore is not required for those running 8.2.X. However,
| you might need to REINDEX indexes on textual columns after
| updating, if you are affected by the Windows locale issue described
| below.

Since you're not running Windows, that last one does not affect you;
so, after stopping the server under 8.2.3 and starting it with
8.2.17, you need to rebuild any GiST indexes and rebuild any hash
indexes on interval columns.

-Kevin

Re: Fatal Error during PITR Recovery

From
"Gnanakumar"
Date:
Hi Kevin,

Thank you very much for that valuable suggestion.

When the migration section says "A dump/restore is not required ...", what
is the correct method of upgrading/installing to 8.2.17?

Let me make the question/idea, which I'm thinking in my minding of
upgrading, clear here:

1. Rename '/usr/local/pgsql/data' to '/usr/local/pgsql/data.old'
2. Install v8.2.17 to its default from the source (Because installing from
source is comfortable for me)
3. After installation, just remove '/usr/local/pgsql/data' and rename
'/usr/local/pgsql/data.old' to '/usr/local/pgsql/data'
4. Start server.  It should have become v8.2.17

Please share your idea/comment  on this.

Regards,
Gnanam

-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: Thursday, June 10, 2010 5:52 PM
To: tgl@sss.pgh.pa.us; gnanam@zoniac.com
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Fatal Error during PITR Recovery

"Gnanakumar"  wrote:

> If I would like to upgrade to the latest minor version in 8.2.x
> series, that is v8.2.17, how do I upgrade this [from 8.2.3 on
> CentOS] without doing dump/restore?  Links/documentation on this
> are appreciated.

From the PostgreSQL home page there is a link to:

http://www.postgresql.org/docs/8.2/static/release-8-2-17.html

Which contains a migration section, saying this:

| A dump/restore is not required for those running 8.2.X. However, if
| you are upgrading from a version earlier than 8.2.14, see the
| release notes for 8.2.14.

The migration section of:

http://www.postgresql.org/docs/8.2/static/release-8-2-14.html

says:

| A dump/restore is not required for those running 8.2.X. However, if
| you have any hash indexes on interval columns, you must REINDEX
| them after updating to 8.2.14. Also, if you are upgrading from a
| version earlier than 8.2.11, see the release notes for 8.2.11.

http://www.postgresql.org/docs/8.2/static/release-8-2-11.html

says:

| A dump/restore is not required for those running 8.2.X. However, if
| you are upgrading from a version earlier than 8.2.7, see the
| release notes for 8.2.7. Also, if you were running a previous 8.2.X
| release, it is recommended to REINDEX all GiST indexes after the
| upgrade.

http://www.postgresql.org/docs/8.2/static/release-8-2-7.html

says:

| A dump/restore is not required for those running 8.2.X. However,
| you might need to REINDEX indexes on textual columns after
| updating, if you are affected by the Windows locale issue described
| below.

Since you're not running Windows, that last one does not affect you;
so, after stopping the server under 8.2.3 and starting it with
8.2.17, you need to rebuild any GiST indexes and rebuild any hash
indexes on interval columns.

-Kevin


Re: Fatal Error during PITR Recovery

From
"Kevin Grittner"
Date:
Gnanakumar" <gnanam@zoniac.com> wrote:

> what is the correct method of upgrading/installing to 8.2.17?
>
> Let me make the question/idea, which I'm thinking in my minding of
> upgrading, clear here:
>
> 1. Rename '/usr/local/pgsql/data' to '/usr/local/pgsql/data.old'
> 2. Install v8.2.17 to its default from the source (Because
> installing from source is comfortable for me)
> 3. After installation, just remove '/usr/local/pgsql/data' and
> rename '/usr/local/pgsql/data.old' to '/usr/local/pgsql/data'
> 4. Start server.  It should have become v8.2.17

There are so many ways to do this, particularly since you are
comfortable installing from source, that I'll just tell you how we
do it and let you adjust as needed for your shop.

1. We build from source with a version-specific prefix (for example:
/usr/local/pgsql-8.4.4).  We always point to a particular executable
in our service scripts, but create a symbolic link to the one we
want as the default client; the bin directory of that symlink is put
onto our path.  We do like to initdb a test cluster and confirm that
our custom functions all got created properly and seem to be working
before moving on.

2. Once the new minor version is installed, we point the service
script to it for the database we're about to upgrade.

3. We always make sure we're backed up, but since we use PITR
techniques with archiving always active, that's no big deal.

4. We do a service stop followed by a service start (*not* a
restart!) to start using the new version.

5. We do any post-upgrade steps, like any needed REINDEX.

I'm not sure why you needed that rename dodge that you mentioned,
but you know your environment better than I do.

-Kevin

Re: Fatal Error during PITR Recovery

From
"Gnanakumar"
Date:
Hi Kevin,

Thanks for sharing on your way of upgrading.

Regards,
Gnanam

-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: Thursday, June 10, 2010 7:16 PM
To: tgl@sss.pgh.pa.us; gnanam@zoniac.com
Cc: pgsql-admin@postgresql.org
Subject: RE: [ADMIN] Fatal Error during PITR Recovery

Gnanakumar" <gnanam@zoniac.com> wrote:

> what is the correct method of upgrading/installing to 8.2.17?
>
> Let me make the question/idea, which I'm thinking in my minding of
> upgrading, clear here:
>
> 1. Rename '/usr/local/pgsql/data' to '/usr/local/pgsql/data.old'
> 2. Install v8.2.17 to its default from the source (Because
> installing from source is comfortable for me)
> 3. After installation, just remove '/usr/local/pgsql/data' and
> rename '/usr/local/pgsql/data.old' to '/usr/local/pgsql/data'
> 4. Start server.  It should have become v8.2.17

There are so many ways to do this, particularly since you are
comfortable installing from source, that I'll just tell you how we
do it and let you adjust as needed for your shop.

1. We build from source with a version-specific prefix (for example:
/usr/local/pgsql-8.4.4).  We always point to a particular executable
in our service scripts, but create a symbolic link to the one we
want as the default client; the bin directory of that symlink is put
onto our path.  We do like to initdb a test cluster and confirm that
our custom functions all got created properly and seem to be working
before moving on.

2. Once the new minor version is installed, we point the service
script to it for the database we're about to upgrade.

3. We always make sure we're backed up, but since we use PITR
techniques with archiving always active, that's no big deal.

4. We do a service stop followed by a service start (*not* a
restart!) to start using the new version.

5. We do any post-upgrade steps, like any needed REINDEX.

I'm not sure why you needed that rename dodge that you mentioned,
but you know your environment better than I do.

-Kevin