Re: BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving. - Mailing list pgsql-bugs
From | Luke Koops |
---|---|
Subject | Re: BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving. |
Date | |
Msg-id | A3144629B5AC714A8BF27806EBFA7057514623F2@sottexch7.corp.ad.entrust.com |
In response to | Re: BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving. (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
List | pgsql-bugs |
I picked up the patch and verified both fixes on 8.3.7.

In one test, handles to two different WAL files were being held by two
different backends. The WAL files were renamed to .deleted after I forced
an xlog switch. Eventually the .deleted files disappeared. In one case the
backend exited. In the other, the backend moved on to the latest WAL file.

In another test, I opened a WAL file so that it could not be renamed or
deleted. The appropriate error was logged and the .done file remained. The
error is logged quite frequently. When I released the WAL file, it was soon
deleted.

If you get into a case where the rename works but the unlink fails (I don't
see how this could happen in real life, except possibly for a race condition
with AV software), you will have a situation where there is a .done file
that does not match any WAL logs, and you will have a .deleted file that
won't get cleaned up.

I couldn't reproduce this, so I faked it by adding a .done file back into
the archive_status folder after it was deleted. The orphaned .done file
doesn't cause any trouble. It doesn't get cleaned up, it doesn't generate
any log messages, and it doesn't interfere with WAL file recycling or
removal (unlike the trouble that is caused by orphaned .ready files).

The patch looks good.

Thank-you,
-Luke

> -----Original Message-----
> From: Heikki Linnakangas [mailto:heikki.linnakangas@enterprisedb.com]
> Sent: Thursday, September 10, 2009 5:44 AM
> Cc: Tom Lane; Luke Koops; pgsql-bugs@postgresql.org
> Subject: Re: [BUGS] BUG #5038: WAL file is pending deletion
> in pg_xlog folder, this interferes with WAL archiving.
>
> Heikki Linnakangas wrote:
> > Tom Lane wrote:
> >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> >>> No, it's a backend that's holding the file open, with
> >>> FILE_SHARE_DELETE.
> >> If that's the only case we care about covering, then rename might be
> >> enough. I was just wondering what it would take to solve the more
> >> general problem of something holding it open with the wrong flags at
> >> the time we want to get rid of it.
> >
> > Yes, that's a separate problem, and I think we should address that too.
> > That's what I thought was going on in OP's case at first, the patch I
> > posted in my first reply should address that.
> >
> > I'll try to reproduce that case too, and verify that the patch fixes it.
>
> Ok, I've committed a patch along those lines. The file is now
> renamed before unlinking (on Windows), and the return code of
> rename() and unlink() is checked, so that we don't delete the .done
> file if the WAL file deletion failed. This fixes both scenarios,
> the one OP reported with another backend keeping the file
> open, and the one where a different process keeps a file open
> without FILE_SHARE_DELETE.
>
> I considered making failure to rename or delete a WARNING
> instead of ERROR, so that RemoveOldXLogFiles() would still
> clean up any other old WAL files. However, when a file is
> recycled, we throw an error anyway if the rename fails in
> InstallXLogFileSegment(), so it doesn't seem like it would
> buy us much.
>
> BTW, it seems that errno is not set on Windows when rename
> fails, but we still try to print the OS error message in
> InstallXLogFileSegment().
> When I tested the case where another process is keeping the
> file locked, for example, I got this:
>
> ERROR: could not rename file
> "pg_xlog/000000010000000100000073" to
> "pg_xlog/000000010000000100000092" (initialization of log
> file 1, segment 146): No such file or directory
>
> even though the file clearly exists, it's just locked. I'm
> not sure where errno is coming from in that case, and if we
> should do something about that, but that exceeds my appetite
> for fixing Windows issues right now.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
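
For readers following the thread, the ordering described in the quoted message (rename the WAL file, unlink it, and remove the .done file only if both steps succeeded) can be sketched roughly as below. This is an illustration in Python, not the committed C code: the function name remove_old_wal_segment and its parameters are hypothetical, and where the real patch raises an ERROR, this sketch simply returns False.

```python
import os


def remove_old_wal_segment(wal_path, done_path):
    """Hypothetical sketch of the rename-then-unlink ordering.

    Returns True only if the WAL file is gone; the .done status file
    is left alone whenever the rename or the unlink fails.
    """
    doomed_path = wal_path + ".deleted"

    # Step 1: rename first, so the original name is freed even if some
    # process still holds the file open.
    try:
        os.rename(wal_path, doomed_path)
    except OSError:
        return False  # keep the .done file; the caller can retry later

    # Step 2: unlink the renamed file.
    try:
        os.unlink(doomed_path)
    except OSError:
        return False  # a .deleted file lingers; .done is still kept

    # Step 3: only now is it safe to drop the archive status file.
    try:
        os.unlink(done_path)
    except FileNotFoundError:
        pass  # already gone, nothing to clean up
    return True
```

With this ordering, a locked WAL file leaves its .done file in place, which matches the behaviour reported in the second test above.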