Thread: Re: Serious problem: media recovery fails after system or PostgreSQL crash

Re: Serious problem: media recovery fails after system or PostgreSQL crash

From: "Kevin Grittner"
MauMau wrote:

> [Problem]
> I'm using PostgreSQL 9.1.6 on Linux. I encountered a serious
> problem: media recovery failed with the following message:
> 
> FATAL: archive file "000000010000008000000028" has wrong size:
> 7340032 instead of 16777216
> 
> I'm using the normal cp command to archive WAL files. That is:
> 
>  archive_command = '/path/to/my_script.sh "%p" "/backup/archive_log/%f"'
> 
> <<my_script.sh>>
> --------------------------------------------------
> #!/bin/sh
> some processing...
> cp "$1" "$2"
> other processing...
> --------------------------------------------------
> 
> 
> The media recovery was triggered by power failure. The disk drive
> that stored $PGDATA failed after a power failure. So I replaced
> the failed disk, and performed media recovery by creating
> recovery.conf and running pg_ctl start. However, pg_ctl failed
> with the above error message.

If you are attempting a PITR-style recovery and you want to include
WAL entries from the partially-copied file, pad a copy of it with
NUL bytes to the expected length.

-Kevin
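
The padding step suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name pad_wal_copy is hypothetical, the segment size is taken to be the default 16777216 bytes shown in the error message, and `head -c` plus /dev/zero are assumed to be available (they are on typical Linux systems). It works on a copy and leaves the archived file untouched.

```shell
#!/bin/sh
# Assumption: default WAL segment size of 16 MB, matching the size in the error message.
WAL_SEGMENT_SIZE=16777216

# pad_wal_copy SRC DST   (hypothetical helper, not part of PostgreSQL)
# Copies SRC to DST, then appends NUL bytes to DST until it is WAL_SEGMENT_SIZE long.
pad_wal_copy() {
    src=$1
    dst=$2

    # Work on a copy so the original archived file is never modified.
    cp "$src" "$dst" || return 1

    # Current size in bytes; the arithmetic expansion strips any padding wc may print.
    cur=$(( $(wc -c < "$dst") ))

    if [ "$cur" -gt "$WAL_SEGMENT_SIZE" ]; then
        echo "pad_wal_copy: $dst is larger than a WAL segment" >&2
        return 1
    fi

    missing=$(( WAL_SEGMENT_SIZE - cur ))
    if [ "$missing" -gt 0 ]; then
        # /dev/zero supplies NUL bytes; head -c limits them to the missing count.
        head -c "$missing" /dev/zero >> "$dst" || return 1
    fi
}
```

For the file in the report, this would be called with the truncated 000000010000008000000028 as SRC and a path in a separate directory (any location of your choosing) as DST, and the padded copy would then be used for the restore. Padding only makes the file the expected size; it cannot bring back WAL records that were never copied.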



Re: Serious problem: media recovery fails after system or PostgreSQL crash

From: "MauMau"
From: "Kevin Grittner" <kgrittn@mail.com>
> If you are attempting a PITR-style recovery and you want to include
> WAL entries from the partially-copied file, pad a copy of it with
> NUL bytes to the expected length.

I'm afraid this is unacceptably difficult, or almost impossible, for many PG
users.  How would you do the following?

1. Identify the file type (WAL segment, backup history file, timeline 
history file) and its expected size in the archive_command script. 
archive_command has to handle these three types of files.  Embedding file 
name logic (e.g. a WAL segment is named 000000010000000200000003) in
archive_command is a bad idea, because the file naming might change in a
future PG release.

2. Append NUL bytes to the file in the archive_command shell script or batch
file.  In particular, I have no idea how to do this on Windows, and I have
some PG systems running on Windows.  This would compromise the ease of use
of PostgreSQL.

So I believe PG should handle the problem, not the archive_command.

Regards
MauMau
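
To make item 1 concrete, the file name logic it describes might look roughly like the sketch below. The helper name classify_archive_name is hypothetical, and the patterns are an assumption based on the naming used by the 9.1-era releases discussed here (24 hex digits for a WAL segment, a ".backup" suffix for a backup history file, a ".history" suffix for a timeline history file). This is exactly the kind of hard-coded naming the message warns could break in a later release.

```shell
#!/bin/sh
# classify_archive_name NAME   (hypothetical helper, not part of PostgreSQL)
# Prints one of: wal_segment, backup_history, timeline_history, unknown.
# The patterns below are assumptions about current file naming and may not
# hold for other PostgreSQL versions.
classify_archive_name() {
    # Only the file name matters, so strip any leading directory.
    name=$(basename "$1")

    if printf '%s\n' "$name" | grep -Eq '^[0-9A-F]{24}$'; then
        # e.g. 000000010000000200000003
        echo wal_segment
    elif printf '%s\n' "$name" | grep -Eq '^[0-9A-F]{24}\.[0-9A-F]{8}\.backup$'; then
        # e.g. 000000010000000200000003.00000020.backup
        echo backup_history
    elif printf '%s\n' "$name" | grep -Eq '^[0-9A-F]{8}\.history$'; then
        # e.g. 00000002.history
        echo timeline_history
    else
        echo unknown
    fi
}
```

Only a file classified as wal_segment has a fixed expected size; the other two types are small text files of varying length, so a size check or padding step would have to skip them.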