Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CA+hUKG+CwBUbdMSPxN5+z3y3zoknYr_7TyrWSL4RcPGACbaB7Q@mail.gmail.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Asim R P <apraveen@pivotal.io>)
Responses Re: POC: Cleaning up orphaned files using undo logs
List pgsql-hackers
On Mon, Jun 10, 2019 at 5:35 AM Asim R P <apraveen@pivotal.io> wrote:
> My understanding is smgr pendingDeletes infrastructure will be replaced by these patches.  I still see
CommitTransaction()calling smgrDoPendingDeletes() in the latest patch set.  Am I missing something?
 

Hi Asim,

Thanks for looking at the patch.

The pendingDeletes list is used both for files that should be deleted
if we commit and files that should be deleted if we abort.  This patch
deals only with the abort case, using the undo log instead of
pendingDeletes.  That is the file leak scenario that has an
arbitrarily wide window controlled by the user and is probably the
source of almost all cases that you hear of of disks filling up with
orphaned junk AFAICS.

There could in theory be a persistent stuff-to-do-if-we-commit system
exactly unlike undo logs (records to be discarded on abort, executed
on commit).  I haven't thought much about how it'd work, but Andres
did suggest something like that for another purpose just the other
day, and although it's hard to think of a name for it, it doesn't seem
crazy as long as it doesn't add overheads when you're not using it.
Without such a mechanism, you can probably leak files belonging to
tables that you have dropped in a committed transaction, if you die in
CommitTransaction() after it has called RecordTransactionCommit() but
before it reaches smgrDoPendingDeletes(), and even then probably only
if there is super well-timed checkpoint so that you recover without
replaying the drop.  I'm not try to tackle that today.

BTW, there is yet another kind of deferred unlinking going on.  In
SyncPostCheckpoint() (formerly known as mdpostckpt()) we defer the
last bit of the job until after the next checkpoint.  At that point we
only expect the first segment to exist and we expect it to be empty.
That's a mechanism introduced by commit 6cc4451b5c47 to make sure that
we don't reuse relfilenode numbers too soon in some crash scenarios.
That means there is another very narrow window there to leak a file
(though these ones are empty): you die after the checkpoint is logged
but before SyncPostCheckpoint() is run, or even after that but before
the operating system has flushed the directory.

-- 
Thomas Munro
https://enterprisedb.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: heapam_index_build_range_scan's anyvisible
Next
From: Alvaro Herrera
Date:
Subject: Re: Should we warn against using too many partitions?