Thread: WAL archiving is stuck on an old file that was deleted -- how to get it going again? (8.4.2)

Hi.  Need some help getting WAL log archiving going, please.

PostgreSQL 8.4.2

archive_command = '/usr/local/bin/rsync -e /usr/bin/ssh %p
postgres@remoteserver:directory/%f </dev/null'

I am able to login to "remoteserver" as user postgres using key-based
authentication (trust relationship exists).

However WAL archiving is failing with errors like:

rsync: link_stat
"/opt/PostgreSQL/8.4/data/pg_xlog/00000001000000350000006E" failed: No
such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1039) [sender=3.0.6]
2010-01-06 17:15:13 PSTLOG:  archive command failed with exit code 23
2010-01-06 17:15:13 PSTDETAIL:  The failed archive command was:
/usr/local/bin/rsync -e /usr/bin/ssh pg_xlog/00000001000000350000006E
postgres@remoteserver:directory/00000001000000350000006E </dev/null
2010-01-06 17:15:13 PSTWARNING:  transaction log file
"00000001000000350000006E" could not be archived: too many failures

/opt/PostgreSQL/8.4/data/pg_xlog/ exists, but there is no
"00000001000000350000006E" there.

I do have a cron job that cleans files older than 2 days out of the
pg_xlog directory; and the key-based login
was broken for a few weeks (I just noticed).

How do I get Postgres to stop trying to rsync
00000001000000350000006E, and to do rsync all the WAL files that ARE
there?

I've already tried restarting the database but it remembers about
00000001000000350000006E.

Thanks,
Aleksey

Aleksey Tsalolikhin escribió:

> I do have a cron job that cleans files older than 2 days out of the
> pg_xlog directory;

Bad, bad idea.  Get rid of that.  Perfect way to corrupt your system.
Postgres removes pg_xlog files automatically when they are no longer
necessary.  If it doesn't remove them, something is happening and you
need to fix *that*.  Deleting files by hand only works around the
wasted-disk-space symptom in a bad way.

> and the key-based login was broken for a few weeks (I just noticed).

I suggest you get some monitoring in place to prevent this from being a
problem in the future.

> How do I get Postgres to stop trying to rsync
> 00000001000000350000006E, and to do rsync all the WAL files that ARE
> there?

You're screwed.  You need to get a new base backup; all the files
you have archived previous to 00000001000000350000006E are useless.

You can get out of the problem by creating a dummy file with that name
in pg_xlog, but keep in mind that the archive is now completely useless
and unrecoverable.

Sorry.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

On Thu, Jan 7, 2010 at 11:29 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Aleksey Tsalolikhin escribió:
>
>> I do have a cron job that cleans files older than 2 days out of the
>> pg_xlog directory;
>
> Bad, bad idea.  Get rid of that.  Perfect way to corrupt your system.
> Postgres removes pg_xlog files automatically when they are no longer
> necessary.  If it doesn't remove them, something is happening and you
> need to fix *that*.  Deleting files by hand only works around the
> wasted-disk-space symptom in a bad way.

Completely agreed.

>> How do I get Postgres to stop trying to rsync
>> 00000001000000350000006E, and to do rsync all the WAL files that ARE
>> there?
>
> You're screwed.  You need to get a new base backup; all the files
> you have archived previous to 00000001000000350000006E are useless.
>
> You can get out of the problem by creating a dummy file with that name
> in pg_xlog, but keep in mind that the archive is now completely useless
> and unrecoverable.

Or remove pg_xlog/archive_status/00000001000000350000006E.ready instead
of creating a dummy file. Postgres tries to archive the WAL files whose
.ready file exists in archive_status directory.

And, note that you must get out of the archiving problem *before* making
a new base backup because pg_stop_backup() waits until the last WAL file
filled during backup has been archived. Otherwise, pg_stop_backup() would
get stuck.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center