Re: BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving. - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving.
Msg-id 4AA4D0A7.7040204@enterprisedb.com
In response to BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving.  ("Luke Koops" <luke.koops@entrust.com>)
Responses Re: BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Luke Koops wrote:
> The following bug has been logged online:
>
> Bug reference:      5038
> Logged by:          Luke Koops
> Email address:      luke.koops@entrust.com
> PostgreSQL version: 8.3.7
> Operating system:   Windows 2003 Server Enterprise Edition
> Description:        WAL file is pending deletion in pg_xlog folder, this
> interferes with WAL archiving.
> Details:
>
> On my system, one of the WAL files is pending deletion.  The handle is being
> held by one of the postgres backend processes, but that is another potential
> bug.

Hmm. Under normal Unix filesystem semantics, that doesn't matter much
since the file can still be unlinked. It will still consume space, but
that's not a big issue. On Windows, however, the open handle keeps the
file locked, so it can't be deleted.
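
To illustrate the Unix side of that, here is a minimal sketch in plain POSIX C (not PostgreSQL code). The function name unlink_while_open and the scratch path passed to it are made up for the example. It removes a file's name while a descriptor is still open and then keeps using the descriptor:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo helper: create a scratch file, unlink it while the
 * descriptor is still open, then write and read through the descriptor.
 * Returns 1 if the name is gone but the descriptor still works,
 * 0 if not, -1 if the setup (open or unlink) failed.
 */
int unlink_while_open(const char *path)
{
    char        buf[4] = {0};
    struct stat st;
    int         ok;
    int         fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    /* On POSIX systems this succeeds even though fd is still open. */
    if (unlink(path) != 0)
    {
        close(fd);
        return -1;
    }

    ok = stat(path, &st) != 0              /* directory entry is gone */
        && write(fd, "wal", 3) == 3        /* file is still writable */
        && lseek(fd, 0, SEEK_SET) == 0
        && read(fd, buf, 3) == 3
        && memcmp(buf, "wal", 3) == 0;     /* and still readable */

    close(fd);      /* the disk space is released only at this point */
    return ok ? 1 : 0;
}
```

Calling unlink_while_open("scratch.tmp") should return 1 on a POSIX filesystem. The same sequence is what fails on Windows, where the open handle keeps the file from going away.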

If I'm reading the code correctly, when a backend opens a WAL file to
write to it, the file stays open until the backend needs to write to
another WAL file. If the backend writes to a file only once and then
stops generating WAL records (i.e. it executes only read-only queries),
the file is kept open indefinitely.

Perhaps we should try to close the old WAL file sooner. It's easy to
check if the current open log segment is old and close it if so, but I'm
not sure what the check should be hooked into.

> At first, the unlink worked, and the .ready and .done files were deleted.
> But the WAL file still shows up in the pg_xlog directory listing.

If the file didn't go away, it seems like the unlink didn't work. We
don't check the return code in RemoveOldXlogFiles(); I suspect that
we're getting EBUSY in that scenario. We should check for that if we're
going to delete the .ready and .done files. Patch attached, but I
haven't tested it. I don't have a Windows environment at hand, but I'll
try to find one.
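
The ordering I have in mind, sketched as standalone POSIX C rather than as the patch itself: the status files are removed only once the segment unlink has succeeded. remove_segment_checked() and its status_prefix parameter are hypothetical names for illustration, and this version reports the failure through its return value instead of going through ereport():

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove a segment file and, only if that succeeds,
 * its "<status_prefix>.ready" and "<status_prefix>.done" status files.
 * Returns 0 on success, or the errno value from the failed segment unlink.
 */
int remove_segment_checked(const char *seg_path, const char *status_prefix)
{
    char status[1024];

    if (unlink(seg_path) != 0)
    {
        int err = errno;    /* save before any other call can change it */

        fprintf(stderr, "could not remove \"%s\": %s\n",
                seg_path, strerror(err));
        return err;         /* leave the status files in place */
    }

    /* The segment is gone, so its status files can go too. */
    snprintf(status, sizeof(status), "%s.ready", status_prefix);
    unlink(status);         /* may not exist; failure is ignored here */
    snprintf(status, sizeof(status), "%s.done", status_prefix);
    unlink(status);
    return 0;
}
```

With this ordering, a segment that cannot be removed keeps its status files, so the directory listing and the archive status stay consistent with each other.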

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9b64578..f54dd3b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3106,7 +3106,12 @@ RemoveOldXlogFiles(uint32 log, uint32 seg, XLogRecPtr endptr)
                     ereport(DEBUG2,
                             (errmsg("removing transaction log file \"%s\"",
                                     xlde->d_name)));
-                    unlink(path);
+
+                    if (unlink(path) != 0)
+                        ereport(ERROR,
+                                (errcode_for_file_access(),
+                                 errmsg("could not remove old transaction log file \"%s\": %m",
+                                        path)));
                     CheckpointStats.ckpt_segs_removed++;
                 }

