Thread: recovery via base + WAL replay failure

recovery via base + WAL replay failure

From
Rob Adams
Date:
I'm trying to demonstrate recovery using the "continuous archiving"
backup technique.  I'm using 8.3 on Windows.

I made a base backup while the postgres was running using the following
batch file:

--------------------------
psql -d test_database -U user_name -c "SELECT pg_start_backup('test');"

7za a -tzip "C:\backup\base.zip" "C:\Program Files\PostgreSQL\8.3\data"

psql -d test_database -U user_name -c "SELECT pg_stop_backup();"
--------------------------

No indication of any errors.  However, I am unable to recover.  Here are
my steps:

1. Stop service, replace the data directory w/ the base backup.
2. Create recovery.conf with this setting:
    restore_command = 'copy C:\backup\%f "%p"'
3. Try to start service

After about 90 seconds, the dos prompt displays:
   "(postgres 8.3) service could not be started"
   "The service did not report an error"

Does anyone know what I am probably doing wrong?

Thanks,
--Rob Adams

Re: recovery via base + WAL replay failure

From
Magnus Hagander
Date:
Rob Adams wrote:
> I'm trying to demonstrate recovery using the "continuous archiving"
> backup technique.  I'm using 8.3 on Windows.
>
> I made a base backup while the postgres was running using the following
> batch file:
>
> --------------------------
> psql -d test_database -U user_name -c "SELECT pg_start_backup('test');"
>
> 7za a -tzip "C:\backup\base.zip" "C:\Program Files\PostgreSQL\8.3\data"
>
> psql -d test_database -U user_name -c "SELECT pg_stop_backup();"
> --------------------------
>
> No indication of any errors.  However, I am unable to recover.  Here are
> my steps:
>
> 1. Stop service, replace the data directory w/ the base backup.
> 2. Create recovery.conf with this setting:
>    restore_command = 'copy C:\backup\%f "%p"'
> 3. Try to start service
>
> After about 90 seconds, the dos prompt displays:
>   "(postgres 8.3) service could not be started"
>   "The service did not report an error"
>
> Does anyone know what I am probably doing wrong?

You'll need to look at the PostgreSQL logs to see what they say.

But I'm curious about the 7za step. It gives no errors or warnings?
IIRC, you need to use VSS on Windows to avoid sharing violations when
opening the files unless a specific open mode is used. It could be that
7za uses that by default, but it's also possible tha tit's not actually
backing up all files...

//Magnus

Re: recovery via base + WAL replay failure

From
Greg Smith
Date:
On Sun, 3 Aug 2008, Rob Adams wrote:

> I made a base backup while the postgres was running using the following batch
> file:
> psql -d test_database -U user_name -c "SELECT pg_start_backup('test');"

What did you have archive_command set to?  That needs to dump the WAL
files generated while the backup is going on somewhere that gets copied
over after the main copy is done, and you need the last of them referenced
by the backup copied over before you can use that backup.  Steps (1) and
(5) of
http://www.postgresql.org/docs/current/static/continuous-archiving.html
are the hard parts here and I don't see that you're addressing them so
far, and that will keep the copy from starting if all the files aren't
there.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: recovery via base + WAL replay failure

From
Lennin Caro
Date:
what error show the log file?


--- On Mon, 8/4/08, Greg Smith <gsmith@gregsmith.com> wrote:

> From: Greg Smith <gsmith@gregsmith.com>
> Subject: Re: [GENERAL] recovery via base + WAL replay failure
> To: "Rob Adams" <robfadams@cox.net>
> Cc: "postgres general" <pgsql-general@postgresql.org>
> Date: Monday, August 4, 2008, 5:58 PM
> On Sun, 3 Aug 2008, Rob Adams wrote:
>
> > I made a base backup while the postgres was running
> using the following batch
> > file:
> > psql -d test_database -U user_name -c "SELECT
> pg_start_backup('test');"
>
> What did you have archive_command set to?  That needs to
> dump the WAL
> files generated while the backup is going on somewhere that
> gets copied
> over after the main copy is done, and you need the last of
> them referenced
> by the backup copied over before you can use that backup.
> Steps (1) and
> (5) of
> http://www.postgresql.org/docs/current/static/continuous-archiving.html
>
> are the hard parts here and I don't see that you're
> addressing them so
> far, and that will keep the copy from starting if all the
> files aren't
> there.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com
> Baltimore, MD
>
> --
> Sent via pgsql-general mailing list
> (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general





Re: recovery via base + WAL replay failure

From
Rob Adams
Date:
The WAL file archiving appears to be working correctly.  These are the
settings I'm using for archiving the WAL files:

  archive_mode = on
  archive_command = 'copy %p C:\backup\%f /A'
  archive_timeout = 15s


Thanks again,
--Rob

Greg Smith wrote:
> On Sun, 3 Aug 2008, Rob Adams wrote:
>
>> I made a base backup while the postgres was running using the
>> following batch file:
>> psql -d test_database -U user_name -c "SELECT pg_start_backup('test');"
>
> What did you have archive_command set to?  That needs to dump the WAL
> files generated while the backup is going on somewhere that gets copied
> over after the main copy is done, and you need the last of them
> referenced by the backup copied over before you can use that backup.
> Steps (1) and (5) of
> http://www.postgresql.org/docs/current/static/continuous-archiving.html
> are the hard parts here and I don't see that you're addressing them so
> far, and that will keep the copy from starting if all the files aren't
> there.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>

Re: recovery via base + WAL replay failure

From
Rob Adams
Date:
There is nothing in the log file (in pg_log dir) with regard to this.
Should I set any particular parameter in the postgresql.conf file to log
information about a failed startup?  I have not altered or uncommented
any lines in the "ERROR REPORTING AND LOGGING" section of the conf file.

Thanks again,
--Rob Adams

Lennin Caro wrote:
> what error show the log file?
>
>
> --- On Mon, 8/4/08, Greg Smith <gsmith@gregsmith.com> wrote:
>
>> From: Greg Smith <gsmith@gregsmith.com>
>> Subject: Re: [GENERAL] recovery via base + WAL replay failure
>> To: "Rob Adams" <robfadams@cox.net>
>> Cc: "postgres general" <pgsql-general@postgresql.org>
>> Date: Monday, August 4, 2008, 5:58 PM
>> On Sun, 3 Aug 2008, Rob Adams wrote:
>>
>>> I made a base backup while the postgres was running
>> using the following batch
>>> file:
>>> psql -d test_database -U user_name -c "SELECT
>> pg_start_backup('test');"
>>
>> What did you have archive_command set to?  That needs to
>> dump the WAL
>> files generated while the backup is going on somewhere that
>> gets copied
>> over after the main copy is done, and you need the last of
>> them referenced
>> by the backup copied over before you can use that backup.
>> Steps (1) and
>> (5) of
>> http://www.postgresql.org/docs/current/static/continuous-archiving.html
>>
>> are the hard parts here and I don't see that you're
>> addressing them so
>> far, and that will keep the copy from starting if all the
>> files aren't
>> there.
>>
>> --
>> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com
>> Baltimore, MD
>>
>> --
>> Sent via pgsql-general mailing list
>> (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>
>
>
>

Re: recovery via base + WAL replay failure

From
Rob Adams
Date:
I found error log entries in the Windows Event Viewer:

----------------
2008-08-01 23:57:55 GMT FATAL:  could not remove old lock file
"postmaster.pid": Permission denied

2008-08-01 23:57:55 GMT HINT:  The file seems accidentally left over,
but it could not be removed. Please remove the file by hand and try again.
----------------

However, there is no postmaster.pid file in the data directory.  (I
can't find one anywhere else, either.)

Any ideas?


Thanks,
--Rob Adams

Lennin Caro wrote:
> what error show the log file?
>
>
> --- On Mon, 8/4/08, Greg Smith <gsmith@gregsmith.com> wrote:
>
>> From: Greg Smith <gsmith@gregsmith.com>
>> Subject: Re: [GENERAL] recovery via base + WAL replay failure
>> To: "Rob Adams" <robfadams@cox.net>
>> Cc: "postgres general" <pgsql-general@postgresql.org>
>> Date: Monday, August 4, 2008, 5:58 PM
>> On Sun, 3 Aug 2008, Rob Adams wrote:
>>
>>> I made a base backup while the postgres was running
>> using the following batch
>>> file:
>>> psql -d test_database -U user_name -c "SELECT
>> pg_start_backup('test');"
>>
>> What did you have archive_command set to?  That needs to
>> dump the WAL
>> files generated while the backup is going on somewhere that
>> gets copied
>> over after the main copy is done, and you need the last of
>> them referenced
>> by the backup copied over before you can use that backup.
>> Steps (1) and
>> (5) of
>> http://www.postgresql.org/docs/current/static/continuous-archiving.html
>>
>> are the hard parts here and I don't see that you're
>> addressing them so
>> far, and that will keep the copy from starting if all the
>> files aren't
>> there.
>>
>> --
>> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com
>> Baltimore, MD
>>
>> --
>> Sent via pgsql-general mailing list
>> (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>
>
>
>

Re: recovery via base + WAL replay failure

From
Rob Adams
Date:
Finally figured out what was wrong. The data folder had incorrect
permissions after unzipping the base backup. For me, the solution was
unchecking the "Inherit from parent the permission entries that apply to
child objects" option in the Advanced Security Settings dialog for the
data folder & giving the postgres user full control.

Nothing appeared in the log when the database failed to startup b/c the
permissions were wrong. However, an application error did get reported
to Windows which I found using the Event Viewer.

--Rob