Thread: WAL archiving is stuck on an old file that was deleted -- how to get it going again? (8.4.2)
WAL archiving is stuck on an old file that was deleted -- how to get it going again? (8.4.2)
From
Aleksey Tsalolikhin
Date:
Hi. Need some help getting WAL log archiving going, please. PostgreSQL 8.4.2 archive_command = '/usr/local/bin/rsync -e /usr/bin/ssh %p postgres@remoteserver:directory/%f </dev/null' I am able to login to "remoteserver" as user postgres using key-based authentication (trust relationship exists). However WAL archiving is failing with errors like: rsync: link_stat "/opt/PostgreSQL/8.4/data/pg_xlog/00000001000000350000006E" failed: No such file or directory (2) rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6] 2010-01-06 17:15:13 PSTLOG: archive command failed with exit code 23 2010-01-06 17:15:13 PSTDETAIL: The failed archive command was: /usr/local/bin/rsync -e /usr/bin/ssh pg_xlog/00000001000000350000006E postgres@remoteserver:directory/00000001000000350000006E </dev/null 2010-01-06 17:15:13 PSTWARNING: transaction log file "00000001000000350000006E" could not be archived: too many failures /opt/PostgreSQL/8.4/data/pg_xlog/ exists, but there is no "00000001000000350000006E" there. I do have a cron job that cleans files older than 2 days out of the pg_xlog directory; and the key-based login was broken for a few weeks (I just noticed). How do I get Postgres to stop trying to rsync 00000001000000350000006E, and to do rsync all the WAL files that ARE there? I've already tried restarting the database but it remembers about 00000001000000350000006E. Thanks, Aleksey
Re: WAL archiving is stuck on an old file that was deleted -- how to get it going again? (8.4.2)
From
Alvaro Herrera
Date:
Aleksey Tsalolikhin escribió: > I do have a cron job that cleans files older than 2 days out of the > pg_xlog directory; Bad, bad idea. Get rid of that. Perfect way to corrupt your system. Postgres removes pg_xlog files automatically when they are no longer necessary. If it doesn't remove them, something is happening and you need to fix *that*. Deleting files by hand only works around the wasted-disk-space symptom in a bad way. > and the key-based login was broken for a few weeks (I just noticed). I suggest you get some monitoring in place to prevent this from being a problem in the future. > How do I get Postgres to stop trying to rsync > 00000001000000350000006E, and to do rsync all the WAL files that ARE > there? You're screwed. You need to get a new base backup; all the files you have archived previous to 00000001000000350000006E are useless. You can get out of the problem by creating a dummy file with that name in pg_xlog, but keep in mind that the archive is now completely useless and unrecoverable. Sorry. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: WAL archiving is stuck on an old file that was deleted -- how to get it going again? (8.4.2)
From
Fujii Masao
Date:
On Thu, Jan 7, 2010 at 11:29 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Aleksey Tsalolikhin escribió: > >> I do have a cron job that cleans files older than 2 days out of the >> pg_xlog directory; > > Bad, bad idea. Get rid of that. Perfect way to corrupt your system. > Postgres removes pg_xlog files automatically when they are no longer > necessary. If it doesn't remove them, something is happening and you > need to fix *that*. Deleting files by hand only works around the > wasted-disk-space symptom in a bad way. Completely agreed. >> How do I get Postgres to stop trying to rsync >> 00000001000000350000006E, and to do rsync all the WAL files that ARE >> there? > > You're screwed. You need to get a new base backup; all the files > you have archived previous to 00000001000000350000006E are useless. > > You can get out of the problem by creating a dummy file with that name > in pg_xlog, but keep in mind that the archive is now completely useless > and unrecoverable. Or remove pg_xlog/archive_status/00000001000000350000006E.ready instead of creating a dummy file. Postgres tries to archive the WAL files whose .ready file exists in archive_status directory. And, note that you must get out of the archiving problem *before* making a new base backup because pg_stop_backup() waits until the last WAL file filled during backup has been archived. Otherwise, pg_stop_backup() would get stuck. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center