Thread: pg_xlog not cleaned up

pg_xlog not cleaned up

From
Christian Kastner
Date:
Hello,

I don't know why, but the XXXXXXXX.backup file created by
pg_stop_backup() in my pg_xlog never gets deleted, even though it was
successfully archived (a good copy is stored in the archive area, no
errors are raised).
The directory pg_xlog/archive_status contains a matching
XXXXXXXX.backup.done file, but that one never gets removed as well. As a
consequence, all further WAL archiving attempts fail as archive_command
continuously retries to archive the (already archived) XXXXXXXX.backup file.

I'd appreciate any hints on what might be going on!

Chris

--
Christian Kastner
PGP Key: AE90E13f

Re: pg_xlog not cleaned up

From
Bruce Momjian
Date:
What PostgreSQL version are you using?  I know the *.backup file removal
was added in 8.1:

         * Remove old "*.backup" files when we do pg_stop_backup() (Bruce)
       This prevents a large number of "*.backup" files from existing in
    "/pg_xlog".

but the existance of the files should not cause problems.

---------------------------------------------------------------------------

Christian Kastner wrote:
> Hello,
>
> I don't know why, but the XXXXXXXX.backup file created by
> pg_stop_backup() in my pg_xlog never gets deleted, even though it was
> successfully archived (a good copy is stored in the archive area, no
> errors are raised).
> The directory pg_xlog/archive_status contains a matching
> XXXXXXXX.backup.done file, but that one never gets removed as well. As a
> consequence, all further WAL archiving attempts fail as archive_command
> continuously retries to archive the (already archived) XXXXXXXX.backup file.
>
> I'd appreciate any hints on what might be going on!
>
> Chris
>
> --
> Christian Kastner
> PGP Key: AE90E13f
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: pg_xlog not cleaned up

From
Christian Kastner
Date:
Bruce Momjian wrote:
> What PostgreSQL version are you using?  I know the *.backup file removal
> was added in 8.1:
>
>          * Remove old "*.backup" files when we do pg_stop_backup() (Bruce)
>        This prevents a large number of "*.backup" files from existing in
>     "/pg_xlog".
>
> but the existance of the files should not cause problems.

It's 8.1.4, running on FC 5.

It does not cause any active problems, but the WAL archiving fails until
the *.backup and *.done files are removed manually, after which all the
"ready" WAL segments are archived away normally.

The script I supply to archive_command just mounts a windows share,
performs a cp (similar to the example in the PITR doc) and unmounts the
share again.

The online backup script basically just calls pg_start_backup(),
tar-gzips $PGDATA, then calls pg_stop_backup(). During that process, a
*.backup file is copied to the archive area (the windows share) using
the script supplied to archive_command.

After the online backup script runs, all subsequent attempts to run
archive_command fail because the first thing it tries to archive away is
 the still-existing *.backup file. This fails because a copy already
exists in the archive area and the script refuses to overwrite the
existing file.

I thought it might be network-related (by the added delay), but simply
running pg_start_backup() and pg_stop_backup() - without actually doing
anything in between - produced the same result: I get a *.backup and
*.done file which aren't removed.

This only applies to *.backup / *.done pairs. I watched WAL segments go
from *.ready to *.done, after which the *.done files were removed from
pg_xlog/archive_status.

Best Regards,
Chris

--
Christian Kastner
PGP Key: AE90E13f

Re: pg_xlog not cleaned up

From
Bruce Momjian
Date:
Christian Kastner wrote:
> Bruce Momjian wrote:
> > What PostgreSQL version are you using?  I know the *.backup file removal
> > was added in 8.1:
> >
> >          * Remove old "*.backup" files when we do pg_stop_backup() (Bruce)
> >        This prevents a large number of "*.backup" files from existing in
> >     "/pg_xlog".
> >
> > but the existance of the files should not cause problems.
>
> It's 8.1.4, running on FC 5.
>
> It does not cause any active problems, but the WAL archiving fails until
> the *.backup and *.done files are removed manually, after which all the
> "ready" WAL segments are archived away normally.
>
> The script I supply to archive_command just mounts a windows share,
> performs a cp (similar to the example in the PITR doc) and unmounts the
> share again.

Try removing the mount/unmount --- just keep the share mounted all the
time and see if it works.  I think the archive system believes it can do
concurrent archive commands as long as different file names are used.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: pg_xlog not cleaned up

From
Wayne Conrad
Date:
On Mon, Jul 31, 2006 at 10:18:31PM +0200, Christian Kastner wrote:
> After the online backup script runs, all subsequent attempts to run
> archive_command fail because the first thing it tries to archive away is
>  the still-existing *.backup file. This fails because a copy already
> exists in the archive area and the script refuses to overwrite the
> existing file.

I'm seeing this too.

In case more info will help Those Who Know Things:

My archive_command is a simple copy to local storage, pretty much
right lifted out of the manual:

archive_command = 'test ! -f /var/lib/postgresql/8.1/pitr/current/wal/%f && cp
+%p /var/lib/postgresql/8.1/pitr/current/wal/%f'

Version/Platform: Postgres 8.1.3-4, i386, Debian testing.

Re: pg_xlog not cleaned up

From
Wayne Conrad
Date:
On Mon, Jul 31, 2006 at 10:18:31PM +0200, Christian Kastner wrote:
> After the online backup script runs, all subsequent attempts to run
> archive_command fail because the first thing it tries to archive away is
>  the still-existing *.backup file. This fails because a copy already
> exists in the archive area and the script refuses to overwrite the
> existing file.

I'm seeing this too.  I thought I was doing something wrong.

In case more info will help Those Who Know Things:

My archive_command is a simple copy to the same storage device, pretty
much right lifted out of the manual:

archive_command = 'test ! -f /var/lib/postgresql/8.1/pitr/current/wal/%f && cp
+%p /var/lib/postgresql/8.1/pitr/current/wal/%f'

Version: Postgres 8.1.3-4, i386, Debian testing.

Re: pg_xlog not cleaned up

From
Simon Riggs
Date:
On Mon, 2006-07-31 at 22:18 +0200, Christian Kastner wrote:
> Bruce Momjian wrote:
> > What PostgreSQL version are you using?  I know the *.backup file removal
> > was added in 8.1:
> >
> >          * Remove old "*.backup" files when we do pg_stop_backup() (Bruce)
> >        This prevents a large number of "*.backup" files from existing in
> >     "/pg_xlog".
> >
> > but the existance of the files should not cause problems.
>
> It's 8.1.4, running on FC 5.
>
> It does not cause any active problems, but the WAL archiving fails until
> the *.backup and *.done files are removed manually, after which all the
> "ready" WAL segments are archived away normally.

This is a known problem fixed on June 22 and the fix has been applied to
8.1 branch as well as what will become 8.2.

The next point release for 8.1 should have that included.

--
  Simon Riggs
  EnterpriseDB          http://www.enterprisedb.com


Re: pg_xlog not cleaned up

From
Christian Kastner
Date:
Simon Riggs wrote:
> On Mon, 2006-07-31 at 22:18 +0200, Christian Kastner wrote:
>> Bruce Momjian wrote:
>>> What PostgreSQL version are you using?  I know the *.backup file removal
>>> was added in 8.1:
>>>
>>>          * Remove old "*.backup" files when we do pg_stop_backup() (Bruce)
>>>        This prevents a large number of "*.backup" files from existing in
>>>     "/pg_xlog".
>>>
>>> but the existance of the files should not cause problems.
>> It's 8.1.4, running on FC 5.
>>
>> It does not cause any active problems, but the WAL archiving fails until
>> the *.backup and *.done files are removed manually, after which all the
>> "ready" WAL segments are archived away normally.
>
> This is a known problem fixed on June 22 and the fix has been applied to
> 8.1 branch as well as what will become 8.2.
>
> The next point release for 8.1 should have that included.

In that case, I can work around this easily until the next release.

Thank you for clearing this up!
Chris

--
Christian Kastner
PGP Key: AE90E13f

Re: pg_xlog not cleaned up

From
Tom Lane
Date:
Wayne Conrad <wconrad@yagni.com> writes:
> On Mon, Jul 31, 2006 at 10:18:31PM +0200, Christian Kastner wrote:
>> After the online backup script runs, all subsequent attempts to run
>> archive_command fail because the first thing it tries to archive away is
>> the still-existing *.backup file. This fails because a copy already
>> exists in the archive area and the script refuses to overwrite the
>> existing file.

> I'm seeing this too.  I thought I was doing something wrong.

I'm not entirely convinced by Simon's theory that this is already fixed,
because there are some points that you didn't mention.

First off, the thing isn't *supposed* to remove the latest .backup file;
that one, and its .done file, should stick around until your next
successful pg_stop_backup call.  pg_stop_backup removes all but the
just-created .backup file (if they have .done files, showing they've
been archived).  No other operation touches them at all.

Second, the bug Simon mentions consists in trying to create the .ready
file for the new .backup file twice during pg_stop_backup.  If the
archiver is fast enough to have pushed out the .backup file already,
this would result in there being both a .ready and a .done file for the
.backup file.  If that's what you see, then manually removing the .ready
file should clear the problem --- I wouldn't recommend manually removing
either .done or .backup.  If that's *not* what you see, then we have
another problem to deal with.  Please report back.

            regards, tom lane

Re: pg_xlog not cleaned up

From
Christian Kastner
Date:
Tom Lane wrote:
> Wayne Conrad <wconrad@yagni.com> writes:
>> On Mon, Jul 31, 2006 at 10:18:31PM +0200, Christian Kastner wrote:
>>> After the online backup script runs, all subsequent attempts to run
>>> archive_command fail because the first thing it tries to archive away is
>>> the still-existing *.backup file. This fails because a copy already
>>> exists in the archive area and the script refuses to overwrite the
>>> existing file.
>
>> I'm seeing this too.  I thought I was doing something wrong.
>
> I'm not entirely convinced by Simon's theory that this is already fixed,
> because there are some points that you didn't mention.
>
> First off, the thing isn't *supposed* to remove the latest .backup file;
> that one, and its .done file, should stick around until your next
> successful pg_stop_backup call.  pg_stop_backup removes all but the
> just-created .backup file (if they have .done files, showing they've
> been archived).  No other operation touches them at all.

That's not what I'm seeing. The following log messages are repeated
every few minutes after a successful backup run:

LOG:  Archivbefehl >/usr/local/bin/archive_wal.sh
pg_xlog/000000010000000200000009.0018C208.backup
000000010000000200000009.0018C208.backup< fehlgeschlagen: Fehlercode 256
File already exists
...
LOG:  Archivbefehl >/usr/local/bin/archive_wal.sh
pg_xlog/000000010000000200000009.0018C208.backup
000000010000000200000009.0018C208.backup< fehlgeschlagen: Fehlercode 256
WARNUNG:  Transaktionslogdatei
>000000010000000200000009.0018C208.backup< konnte nicht archiviert
werden: zu viele Fehler (CK: too many errors)

0018C208.backup is the file created by pg_stop_backup (which is
immediately stored in my archive area), archive_wal.sh is the script I
supply to archive_command. This script refuses to overwrite existing
files, as recommended by the manual.

I assume the above messages are generated by "normal" archive_command
runs, which fail because the .backup file has already been archived.
Further WAL archiving fails until I delete the .backup file.

Hope this info helps!
Chris

--
Christian Kastner
PGP Key: AE90E13F

Re: pg_xlog not cleaned up

From
Tom Lane
Date:
Christian Kastner <techmail@kvr.at> writes:
> That's not what I'm seeing. The following log messages are repeated
> every few minutes after a successful backup run:

> LOG:  Archivbefehl >/usr/local/bin/archive_wal.sh
> pg_xlog/000000010000000200000009.0018C208.backup
> 000000010000000200000009.0018C208.backup< fehlgeschlagen: Fehlercode 256
> File already exists
> ...

Yes, but are there .ready and/or .done files for that .backup file in the
archive_status subdirectory?

            regards, tom lane

Re: pg_xlog not cleaned up

From
Christian Kastner
Date:
Tom Lane wrote:
> Christian Kastner <techmail@kvr.at> writes:
>> That's not what I'm seeing. The following log messages are repeated
>> every few minutes after a successful backup run:
>
>> LOG:  Archivbefehl >/usr/local/bin/archive_wal.sh
>> pg_xlog/000000010000000200000009.0018C208.backup
>> 000000010000000200000009.0018C208.backup< fehlgeschlagen: Fehlercode 256
>> File already exists
>> ...
>
> Yes, but are there .ready and/or .done files for that .backup file in the
> archive_status subdirectory?
>
>             regards, tom lane

Yes, there is a corresponding .done file in archive_status.

The problem is quite easy to reproduce (IIRC calling pg_start_backup and
pg_stop_backup without actually doing anything in between is enough). I
will post detailed reproduction steps as well as pg_xlog activity when I
get back to the office on Monday.

Best regards,
Chris

--
Christian Kastner
PGP Key: AE90E13F