pgsql: Don't error out if recycling or removing an old WAL segment fails - Mailing list pgsql-committers

From heikki@postgresql.org (Heikki Linnakangas)
Subject pgsql: Don't error out if recycling or removing an old WAL segment fails
Date
Msg-id 20090913183217.9E30D753FB7@cvs.postgresql.org
List pgsql-committers
Log Message:
-----------
Don't error out if recycling or removing an old WAL segment fails at the end
of a checkpoint. At that point the checkpoint has already been written to WAL,
so all data is safe, and we'll retry removing the WAL segment at the next
checkpoint. But if such a failure persists, we won't be able to remove any
other old WAL segments either, and will eventually run out of disk space.
It's better to treat the failure as non-fatal, move on to clean up any other
WAL segments, and continue with any other end-of-checkpoint cleanup.
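
The non-fatal cleanup described above can be sketched as follows. This is a
minimal, portable illustration, not the actual xlog.c code:
remove_old_segments() is a hypothetical helper that uses the standard C
remove() in place of the real recycle/unlink logic. A failure on one file is
reported as a warning and counted, and the loop moves on to the remaining
files. Anything left behind would simply be retried on the next call, much as
the segment is retried at the next checkpoint.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from xlog.c): try to delete each old WAL segment
 * file, treating a failure as a warning rather than an error, so that one
 * locked file does not prevent cleanup of the others.
 *
 * Returns the number of files that could not be removed.
 */
int remove_old_segments(const char *paths[], int npaths)
{
    int failures = 0;

    assert(paths != NULL || npaths == 0);

    for (int i = 0; i < npaths; i++)
    {
        if (remove(paths[i]) != 0)
        {
            /* Non-fatal: report the problem and move on to the next file. */
            fprintf(stderr, "WARNING: could not remove old WAL segment \"%s\": %s\n",
                    paths[i], strerror(errno));
            failures++;
        }
    }
    return failures;
}
```

With one missing file and one existing file, the call returns 1, prints one
warning to stderr, and still removes the existing file.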

We don't normally expect any such failures, but on Windows they can happen
with some anti-virus or backup software that locks files without the
FILE_SHARE_DELETE flag.

Also, the loop in pgrename() to retry when the file is locked was broken. If a
file is locked on Windows, you get ERROR_SHARE_VIOLATION, not
ERROR_ACCESS_DENIED, at least on modern versions. Fix that, although I left
the check for ERROR_ACCESS_DENIED in there as well (presumably it was correct
in some environment), and added ERROR_LOCK_VIOLATION to be consistent with
similar checks in pgwin32_open(). Reduce the timeout on the loop from 30s to
10s, on the grounds that since it's been broken, we've effectively had a
timeout of 0s and no-one has complained, so a smaller timeout is actually
closer to the old behavior. A longer timeout would mean that if recycling a
WAL file fails because it's locked for some reason, InstallXLogFileSegment()
will hold ControlFileLock for longer, potentially blocking other backends, so
a long timeout isn't totally harmless.
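
The retry logic can be sketched portably as below. This is not the actual
dirmod.c code. The fs_status values are hypothetical stand-ins for the Win32
error codes named above (the Win32 constant for a sharing violation is spelled
ERROR_SHARING_VIOLATION), and simulated_rename_attempt() is a hypothetical
stand-in for the real rename call. Only the three lock-related errors are
retried, and anything else fails immediately. With max_retries = 100 and a
100 ms pause, a persistently locked file is given up on after roughly 10
seconds. That bound is what limits how long a caller such as
InstallXLogFileSegment() could end up holding ControlFileLock.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical error classes standing in for the Win32 error codes. */
typedef enum
{
    FS_OK = 0,
    FS_ACCESS_DENIED,     /* stands in for ERROR_ACCESS_DENIED */
    FS_SHARING_VIOLATION, /* stands in for ERROR_SHARING_VIOLATION */
    FS_LOCK_VIOLATION,    /* stands in for ERROR_LOCK_VIOLATION */
    FS_OTHER_ERROR
} fs_status;

typedef fs_status (*rename_attempt_fn)(void *arg);

/* Only errors meaning "someone else has the file locked" are retried. */
static bool is_lock_error(fs_status s)
{
    return s == FS_ACCESS_DENIED || s == FS_SHARING_VIOLATION ||
           s == FS_LOCK_VIOLATION;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/*
 * Call attempt() until it succeeds, fails with a non-lock error, or has been
 * retried max_retries times, pausing pause_ms between attempts.  Returns the
 * status of the last attempt.
 */
fs_status retry_while_locked(rename_attempt_fn attempt, void *arg,
                             int max_retries, long pause_ms)
{
    int retries = 0;
    fs_status s;

    assert(max_retries >= 0 && pause_ms >= 0);

    while ((s = attempt(arg)) != FS_OK)
    {
        if (!is_lock_error(s))
            return s;           /* not a locking problem: fail fast */
        if (++retries > max_retries)
            return s;           /* still locked after the timeout: give up */
        sleep_ms(pause_ms);
    }
    return FS_OK;
}

/* Hypothetical stand-in for the real rename: "locked" for the first N calls. */
struct simulated_rename
{
    int remaining_failures;   /* attempts that still report lock_error */
    fs_status lock_error;     /* status returned while "locked" */
    int calls;                /* total attempts made */
};

fs_status simulated_rename_attempt(void *arg)
{
    struct simulated_rename *r = arg;

    r->calls++;
    if (r->remaining_failures > 0)
    {
        r->remaining_failures--;
        return r->lock_error;
    }
    return FS_OK;
}
```

Called as retry_while_locked(simulated_rename_attempt, &r, 100, 100), a file
that stays locked is attempted 101 times with 100 pauses of 100 ms in between,
i.e. about 10 s in total. A shorter timeout makes it more likely that a briefly
locked file is given up on, but it reduces the time spent waiting while other
backends may be blocked.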

While we're at it, set errno correctly in pgrename().

Backpatch to 8.2, which is the oldest version supported on Windows. The xlog.c
changes would make sense on other platforms, and thus on older versions as
well, but since there are no such locking issues on other platforms, it's not
worth it.

Tags:
----
REL8_4_STABLE

Modified Files:
--------------
    pgsql/src/backend/access/transam:
        xlog.c (r1.345.2.4 -> r1.345.2.5)
        (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c?r1=1.345.2.4&r2=1.345.2.5)
    pgsql/src/port:
        dirmod.c (r1.58 -> r1.58.2.1)
        (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/port/dirmod.c?r1=1.58&r2=1.58.2.1)
