BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving. - Mailing list pgsql-bugs

From Luke Koops
Subject BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving.
Date
Msg-id 200909050352.n853qTEH071667@wwwmaster.postgresql.org
Responses Re: BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving.  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-bugs
The following bug has been logged online:

Bug reference:      5038
Logged by:          Luke Koops
Email address:      luke.koops@entrust.com
PostgreSQL version: 8.3.7
Operating system:   Windows 2003 Server Enterprise Edition
Description:        WAL file is pending deletion in pg_xlog folder, this
interferes with WAL archiving.
Details:

On my system, one of the WAL files is pending deletion.  The handle is being
held by one of the postgres backend processes, but that is another potential
bug.

At first, the unlink worked, and the .ready and .done files were deleted.
But the WAL file still shows up in the pg_xlog directory listing.

Note: the WAL file did get archived properly.  There was no error reported
at the time.

When it comes time to recycle the log files, RemoveOldXLogFiles() calls
ReadDir() to get the list of files, then it calls XLogArchiveCheckDone()
which, if it cannot find a .done or a .ready file, calls
XLogArchiveNotify().  XLogArchiveNotify() creates the .ready file again.
This causes the archiver to call the archive command on the old WAL file
that is pending deletion.  The copy command will fail and all subsequent
archive attempts will keep trying to copy the old WAL file that is pending
deletion.
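
The sequence described above can be modelled with a small in-memory sketch.
This is only a toy model of the behaviour as reported, not PostgreSQL code:
the XlogDir class, its fields, and the helper names are all hypothetical, and
the segment names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class XlogDir:
    """Toy model of pg_xlog. Every name here is illustrative, not PostgreSQL API."""
    listed: set = field(default_factory=set)    # names a directory scan returns
    openable: set = field(default_factory=set)  # names that can really be opened
    status: set = field(default_factory=set)    # status entries such as "X.ready"


def archive_check_done(d, seg):
    """Model of the check described in the report: True if a .done entry
    exists; otherwise make sure a .ready entry exists and return False."""
    if seg + ".done" in d.status:
        return True
    if seg + ".ready" not in d.status:
        d.status.add(seg + ".ready")  # the re-notification step
    return False


def archiver_pass(d):
    """Archive .ready segments oldest first, stopping at the first failure."""
    archived = []
    for name in sorted(n for n in d.status if n.endswith(".ready")):
        seg = name[: -len(".ready")]
        if seg not in d.openable:
            break  # the copy fails, so this segment is retried on the next pass
        d.status.remove(name)
        d.status.add(seg + ".done")
        archived.append(seg)
    return archived


if __name__ == "__main__":
    ghost = "000000010000000000000001"  # still listed, but pending deletion
    newer = "000000010000000000000002"  # a healthy segment waiting for archiving
    d = XlogDir(listed={ghost, newer}, openable={newer}, status={newer + ".ready"})
    for seg in sorted(d.listed):
        archive_check_done(d, seg)      # recreates the .ready entry for the ghost
    print(archiver_pass(d))             # nothing is archived: the ghost blocks the queue
```

In this model the healthy segment never gets archived, because the recreated
.ready entry for the older, unopenable segment is always tried first.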

At this point, none of the WAL files will get shipped and the pg_xlog folder
will start filling up.

Before calling XLogArchiveCheckDone(), RemoveOldXLogFiles() makes a number
of tests to make sure the name is for a legitimate XLOG.  This would be a
good time to make sure the file is real, not pending deletion.  That would
prevent the creation of the .ready file and WAL archiving would continue to
work.

It might be a good idea to log something at the DEBUG level if a directory
entry is encountered that matches the naming conventions but is not a real
file.
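
A rough sketch of the suggested check and DEBUG message follows.  It is
written in Python for brevity rather than as a patch, and the function names
and logger name are hypothetical.  It assumes that a file pending deletion
cannot be opened even though it still appears in the directory listing; that
assumption has not been verified here against Windows 2003.

```python
import logging
import os

log = logging.getLogger("xlog_cleanup_sketch")  # hypothetical logger name


def can_open(path):
    """Return True only if the entry can actually be opened for reading."""
    try:
        with open(path, "rb"):
            return True
    except OSError:
        # Covers missing files, permission errors, and directories.
        return False


def real_segments(xlog_dir, names):
    """Keep only the directory entries that are worth passing on to the
    archive-status check; log and skip the rest."""
    real = []
    for name in names:
        if can_open(os.path.join(xlog_dir, name)):
            real.append(name)
        else:
            log.debug("skipping %s: listed in %s but cannot be opened",
                      name, xlog_dir)
    return real
```

Trying to open the file, rather than only looking at its directory entry, is
a deliberate choice in this sketch: the directory entry is exactly the thing
that is still visible when the problem occurs.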

You could probably reproduce this behaviour by changing the permissions on a
WAL file, although you wouldn't be able to test a fix in the same way.

I have not reliably reproduced the WAL file handle "leak" in the postgres
backend.  I believe it may be related to statements timing out.  My system
currently has statement_timeout=1min, but that will be removed.  I will
report the "leak" when I have a better handle (no pun) on the situation.

-Luke
