Thread: pg_xlog not cleaned up
Hello, I don't know why, but the XXXXXXXX.backup file created by pg_stop_backup() in my pg_xlog never gets deleted, even though it was successfully archived (a good copy is stored in the archive area, no errors are raised). The directory pg_xlog/archive_status contains a matching XXXXXXXX.backup.done file, but that one never gets removed as well. As a consequence, all further WAL archiving attempts fail as archive_command continuously retries to archive the (already archived) XXXXXXXX.backup file. I'd appreciate any hints on what might be going on! Chris -- Christian Kastner PGP Key: AE90E13f
What PostgreSQL version are you using? I know the *.backup file removal was added in 8.1: * Remove old "*.backup" files when we do pg_stop_backup() (Bruce) This prevents a large number of "*.backup" files from existing in "/pg_xlog". but the existance of the files should not cause problems. --------------------------------------------------------------------------- Christian Kastner wrote: > Hello, > > I don't know why, but the XXXXXXXX.backup file created by > pg_stop_backup() in my pg_xlog never gets deleted, even though it was > successfully archived (a good copy is stored in the archive area, no > errors are raised). > The directory pg_xlog/archive_status contains a matching > XXXXXXXX.backup.done file, but that one never gets removed as well. As a > consequence, all further WAL archiving attempts fail as archive_command > continuously retries to archive the (already archived) XXXXXXXX.backup file. > > I'd appreciate any hints on what might be going on! > > Chris > > -- > Christian Kastner > PGP Key: AE90E13f > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > What PostgreSQL version are you using? I know the *.backup file removal > was added in 8.1: > > * Remove old "*.backup" files when we do pg_stop_backup() (Bruce) > This prevents a large number of "*.backup" files from existing in > "/pg_xlog". > > but the existance of the files should not cause problems. It's 8.1.4, running on FC 5. It does not cause any active problems, but the WAL archiving fails until the *.backup and *.done files are removed manually, after which all the "ready" WAL segments are archived away normally. The script I supply to archive_command just mounts a windows share, performs a cp (similar to the example in the PITR doc) and unmounts the share again. The online backup script basically just calls pg_start_backup(), tar-gzips $PGDATA, then calls pg_stop_backup(). During that process, a *.backup file is copied to the archive area (the windows share) using the script supplied to archive_command. After the online backup script runs, all subsequent attempts to run archive_command fail because the first thing it tries to archive away is the still-existing *.backup file. This fails because a copy already exists in the archive area and the script refuses to overwrite the existing file. I thought it might be network-related (by the added delay), but simply running pg_start_backup() and pg_stop_backup() - without actually doing anything in between - produced the same result: I get a *.backup and *.done file which aren't removed. This only applies to *.backup / *.done pairs. I watched WAL segments go from *.ready to *.done, after which the *.done files were removed from pg_xlog/archive_status. Best Regards, Chris -- Christian Kastner PGP Key: AE90E13f
Christian Kastner wrote: > Bruce Momjian wrote: > > What PostgreSQL version are you using? I know the *.backup file removal > > was added in 8.1: > > > > * Remove old "*.backup" files when we do pg_stop_backup() (Bruce) > > This prevents a large number of "*.backup" files from existing in > > "/pg_xlog". > > > > but the existance of the files should not cause problems. > > It's 8.1.4, running on FC 5. > > It does not cause any active problems, but the WAL archiving fails until > the *.backup and *.done files are removed manually, after which all the > "ready" WAL segments are archived away normally. > > The script I supply to archive_command just mounts a windows share, > performs a cp (similar to the example in the PITR doc) and unmounts the > share again. Try removing the mount/unmount --- just keep the share mounted all the time and see if it works. I think the archive system believes it can do concurrent archive commands as long as different file names are used. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Mon, Jul 31, 2006 at 10:18:31PM +0200, Christian Kastner wrote: > After the online backup script runs, all subsequent attempts to run > archive_command fail because the first thing it tries to archive away is > the still-existing *.backup file. This fails because a copy already > exists in the archive area and the script refuses to overwrite the > existing file. I'm seeing this too. In case more info will help Those Who Know Things: My archive_command is a simple copy to local storage, pretty much right lifted out of the manual: archive_command = 'test ! -f /var/lib/postgresql/8.1/pitr/current/wal/%f && cp +%p /var/lib/postgresql/8.1/pitr/current/wal/%f' Version/Platform: Postgres 8.1.3-4, i386, Debian testing.
On Mon, Jul 31, 2006 at 10:18:31PM +0200, Christian Kastner wrote: > After the online backup script runs, all subsequent attempts to run > archive_command fail because the first thing it tries to archive away is > the still-existing *.backup file. This fails because a copy already > exists in the archive area and the script refuses to overwrite the > existing file. I'm seeing this too. I thought I was doing something wrong. In case more info will help Those Who Know Things: My archive_command is a simple copy to the same storage device, pretty much right lifted out of the manual: archive_command = 'test ! -f /var/lib/postgresql/8.1/pitr/current/wal/%f && cp +%p /var/lib/postgresql/8.1/pitr/current/wal/%f' Version: Postgres 8.1.3-4, i386, Debian testing.
On Mon, 2006-07-31 at 22:18 +0200, Christian Kastner wrote: > Bruce Momjian wrote: > > What PostgreSQL version are you using? I know the *.backup file removal > > was added in 8.1: > > > > * Remove old "*.backup" files when we do pg_stop_backup() (Bruce) > > This prevents a large number of "*.backup" files from existing in > > "/pg_xlog". > > > > but the existance of the files should not cause problems. > > It's 8.1.4, running on FC 5. > > It does not cause any active problems, but the WAL archiving fails until > the *.backup and *.done files are removed manually, after which all the > "ready" WAL segments are archived away normally. This is a known problem fixed on June 22 and the fix has been applied to 8.1 branch as well as what will become 8.2. The next point release for 8.1 should have that included. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote: > On Mon, 2006-07-31 at 22:18 +0200, Christian Kastner wrote: >> Bruce Momjian wrote: >>> What PostgreSQL version are you using? I know the *.backup file removal >>> was added in 8.1: >>> >>> * Remove old "*.backup" files when we do pg_stop_backup() (Bruce) >>> This prevents a large number of "*.backup" files from existing in >>> "/pg_xlog". >>> >>> but the existance of the files should not cause problems. >> It's 8.1.4, running on FC 5. >> >> It does not cause any active problems, but the WAL archiving fails until >> the *.backup and *.done files are removed manually, after which all the >> "ready" WAL segments are archived away normally. > > This is a known problem fixed on June 22 and the fix has been applied to > 8.1 branch as well as what will become 8.2. > > The next point release for 8.1 should have that included. In that case, I can work around this easily until the next release. Thank you for clearing this up! Chris -- Christian Kastner PGP Key: AE90E13f
Wayne Conrad <wconrad@yagni.com> writes: > On Mon, Jul 31, 2006 at 10:18:31PM +0200, Christian Kastner wrote: >> After the online backup script runs, all subsequent attempts to run >> archive_command fail because the first thing it tries to archive away is >> the still-existing *.backup file. This fails because a copy already >> exists in the archive area and the script refuses to overwrite the >> existing file. > I'm seeing this too. I thought I was doing something wrong. I'm not entirely convinced by Simon's theory that this is already fixed, because there are some points that you didn't mention. First off, the thing isn't *supposed* to remove the latest .backup file; that one, and its .done file, should stick around until your next successful pg_stop_backup call. pg_stop_backup removes all but the just-created .backup file (if they have .done files, showing they've been archived). No other operation touches them at all. Second, the bug Simon mentions consists in trying to create the .ready file for the new .backup file twice during pg_stop_backup. If the archiver is fast enough to have pushed out the .backup file already, this would result in there being both a .ready and a .done file for the .backup file. If that's what you see, then manually removing the .ready file should clear the problem --- I wouldn't recommend manually removing either .done or .backup. If that's *not* what you see, then we have another problem to deal with. Please report back. regards, tom lane
Tom Lane wrote: > Wayne Conrad <wconrad@yagni.com> writes: >> On Mon, Jul 31, 2006 at 10:18:31PM +0200, Christian Kastner wrote: >>> After the online backup script runs, all subsequent attempts to run >>> archive_command fail because the first thing it tries to archive away is >>> the still-existing *.backup file. This fails because a copy already >>> exists in the archive area and the script refuses to overwrite the >>> existing file. > >> I'm seeing this too. I thought I was doing something wrong. > > I'm not entirely convinced by Simon's theory that this is already fixed, > because there are some points that you didn't mention. > > First off, the thing isn't *supposed* to remove the latest .backup file; > that one, and its .done file, should stick around until your next > successful pg_stop_backup call. pg_stop_backup removes all but the > just-created .backup file (if they have .done files, showing they've > been archived). No other operation touches them at all. That's not what I'm seeing. The following log messages are repeated every few minutes after a successful backup run: LOG: Archivbefehl >/usr/local/bin/archive_wal.sh pg_xlog/000000010000000200000009.0018C208.backup 000000010000000200000009.0018C208.backup< fehlgeschlagen: Fehlercode 256 File already exists ... LOG: Archivbefehl >/usr/local/bin/archive_wal.sh pg_xlog/000000010000000200000009.0018C208.backup 000000010000000200000009.0018C208.backup< fehlgeschlagen: Fehlercode 256 WARNUNG: Transaktionslogdatei >000000010000000200000009.0018C208.backup< konnte nicht archiviert werden: zu viele Fehler (CK: too many errors) 0018C208.backup is the file created by pg_stop_backup (which is immediately stored in my archive area), archive_wal.sh is the script I supply to archive_command. This script refuses to overwrite existing files, as recommended by the manual. I assume the above messages are generated by "normal" archive_command runs, which fail because the .backup file has already been archived. Further WAL archiving fails until I delete the .backup file. Hope this info helps! Chris -- Christian Kastner PGP Key: AE90E13F
Christian Kastner <techmail@kvr.at> writes: > That's not what I'm seeing. The following log messages are repeated > every few minutes after a successful backup run: > LOG: Archivbefehl >/usr/local/bin/archive_wal.sh > pg_xlog/000000010000000200000009.0018C208.backup > 000000010000000200000009.0018C208.backup< fehlgeschlagen: Fehlercode 256 > File already exists > ... Yes, but are there .ready and/or .done files for that .backup file in the archive_status subdirectory? regards, tom lane
Tom Lane wrote: > Christian Kastner <techmail@kvr.at> writes: >> That's not what I'm seeing. The following log messages are repeated >> every few minutes after a successful backup run: > >> LOG: Archivbefehl >/usr/local/bin/archive_wal.sh >> pg_xlog/000000010000000200000009.0018C208.backup >> 000000010000000200000009.0018C208.backup< fehlgeschlagen: Fehlercode 256 >> File already exists >> ... > > Yes, but are there .ready and/or .done files for that .backup file in the > archive_status subdirectory? > > regards, tom lane Yes, there is a corresponding .done file in archive_status. The problem is quite easy to reproduce (IIRC calling pg_start_backup and pg_stop_backup without actually doing anything in between is enough). I will post detailed reproduction steps as well as pg_xlog activity when I get back to the office on Monday. Best regards, Chris -- Christian Kastner PGP Key: AE90E13F