Re: BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving. - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving.
Date
Msg-id 4AA7FAE7.5040707@enterprisedb.com
In response to Re: BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving.  (Luke Koops <luke.koops@entrust.com>)
Responses Re: BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Luke Koops wrote:
> For those of you who are still looking at this, I tried to reproduce the
> issue by holding one of the WAL files open with another program (just
> opened it with the cygwin build of less.exe for windows).  That didn't do
> the trick.  It prevented unlink or rename from working at all.  I wrote a
> program (open.exe) that opens the file using pgwin32_open() and passed in
> the same parameters that postgres uses when opening a WAL file.  That
> allowed the file to be renamed.  And, when deleted, the open file went
> into the pending deletion state.

Yeah, it's the FILE_SHARE_DELETE flag that allows the deletion.

> I used open.exe to hold onto a WAL file that was going to be recycled.
> The recycling worked, but what is going to happen down the road when the
> handle is released, leaving a gap in the WAL file sequence.  Or if it is
> not released, when a backend tries to open the WAL file and does not have
> access to it?

When the file is recycled, I believe we're fine. The file is not
deleted, only renamed, so it won't be deleted when open.exe closes it.
No gap in WAL sequence is created.

> When open.exe was holding onto a WAL file that was going to be deleted,
> the deletion worked.  The file went into the deletion pending state.  The
> archive status for the WAL file went through the .ready ==> .done ==> {no
> status file} ==> .ready sequence.  At that point Postgres repeatedly tries
> to archive the WAL file.


> I reported earlier that I believe postgres leaked the file handle to the
> WAL file.  I don't believe that is the case.  We have a process that only
> checks data in the database for integrity.  It is only reading.  I think
> it opened the WAL file initially, perhaps the backend had some maintenance
> work to do when that session started and had to write something to the
> WAL and then never moved on to a new one.
>
> Now that I can reproduce the pending deletion case, I'm working on code to
> detect it reliably and, hopefully, efficiently.

I got hold of a Windows virtual machine as well, and could reproduce the
issue. It was a bit tricky to coerce the file to be deleted instead of
recycled, but setting "max_advance = 0" in RemoveOldXlogFiles() finally
did the trick.

I googled around, and saw some discussion that suggests that when a file
is in "pending deletion" state, it's implemented by setting a
"delete-on-close" flag on the existing file handle. The upshot of that
is that if you pull the power plug, the file won't be deleted after all.

One option is to rename the file before deleting it. For all practical
purposes, that's the same as if the file no longer exists. Seems like
the simplest solution to me.
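
The rename-before-delete idea can be sketched in portable C roughly as
follows. This is only an illustration under stated assumptions, not the
PostgreSQL code: the helper names rename_then_remove() and file_exists()
and the ".deleted" suffix are hypothetical, and it uses the standard
rename() and remove() calls rather than whatever wrappers the server
would use on Windows.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not PostgreSQL code): move a file out of the way
 * under a ".deleted" name (suffix chosen for illustration), then try to
 * remove it.
 *
 * Returns 0 if the original name is free afterwards, -1 if the temporary
 * name could not be built or the rename failed.
 */
int
rename_then_remove(const char *path)
{
    char tmppath[1024];
    int  n;

    assert(path != NULL);

    /* Build the temporary name; refuse paths that would be truncated. */
    n = snprintf(tmppath, sizeof(tmppath), "%s.deleted", path);
    if (n < 0 || (size_t) n >= sizeof(tmppath))
        return -1;

    /* Step 1: rename, so nothing is visible under the original name. */
    if (rename(path, tmppath) != 0)
        return -1;

    /*
     * Step 2: delete. If another process still holds the file open, the
     * removal may only take effect later (or fail), but the original name
     * is already free, so the result is deliberately ignored here.
     */
    (void) remove(tmppath);
    return 0;
}

/* Hypothetical helper: returns 1 if the file can be opened for reading. */
int
file_exists(const char *path)
{
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}
```

With something like this, a scan of the directory for WAL segment names
would no longer see the old name once the rename succeeds, regardless of
when the underlying file actually disappears.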

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
