Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Andres Freund
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id 20190820164423.7ztjy4lnfwfi2iqn@alap3.anarazel.de
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: POC: Cleaning up orphaned files using undo logs
Re: POC: Cleaning up orphaned files using undo logs
List pgsql-hackers
Hi,

On 2019-08-20 21:02:18 +1200, Thomas Munro wrote:
> Aside from code changes based on review (and I have more to come of
> those), the attached experimental patchset (also at
> https://github.com/EnterpriseDB/zheap/tree/undo) has a new protocol
> that, I hope, allows for better concurrency, reliability and
> readability, and removes a bunch of TODO notes about questionable
> interlocking.  However, I'm not quite done figuring out if the bufmgr
> interaction is right and will be manageable on the undoaccess side, so
> I'm hoping to get some feedback, not asking for anyone to rebase on
> top of it yet.
> 
> Previously, there were two LWLocks used to make sure that all
> discarding was prevented while anyone was reading or writing data in
> any part of an undo log, and (probably more problematically) vice
> versa.  Here's a new approach that removes that blocking:
> 
> 1.  Anyone is allowed to try to read or write data at any UndoRecPtr
> that has been allocated, through the buffer pool (though you'd usually
> want to check it with UndoRecPtrIsDiscarded() first, and only rely on
> the system I'm describing to deal with races).
> 
> 2.  ReadBuffer() might return InvalidBuffer.  This can happen for a
> cache miss, if the smgrread implementation wants to indicate that the
> buffer has been discarded/truncated and that is expected (md.c won't
> ever do that, but undofile.c can).

Hm. This gives me a bit of a stomach ache. It somehow feels like a weird
form of signalling.  Can't quite put my finger on why it makes me feel
queasy.


> 3.  UndoLogDiscard() uses DiscardBuffer() to invalidate any currently
> unpinned buffers, and marks as BM_DISCARDED any that happen to be
> pinned right now, so they can't be immediately invalidated.  Such
> buffers are never written back and are eligible for reuse on the next
> clock sweep, even if they're written into by a backend that managed to
> do that when we were trying to discard.

Hm. When is it legitimate for a backend to write into such a buffer? I
guess that's about updating the previous transaction's next pointer? Or
progress info?


> 5.  Separating begin from discard allows the WAL logging for
> UndoLogDiscard() to do filesystem actions before logging, and other
> effects after logging, which have several nice properties if you work
> through the various crash scenarios.

Hm. ISTM we always need to log before doing some filesystem operation
(see also my recent complaint Robert and I are discussing at the bottom
of [1]). It's just that we can have a separate stage afterwards?

[1] https://www.postgresql.org/message-id/CA%2BTgmoZc5JVYORsGYs8YnkSxUC%3DcLQF1Z%2BfcpH2TTKvqkS7MFg%40mail.gmail.com



> So now I'd like to get feedback on the sanity of this scheme.  I'm not
> saying it doesn't have bugs right now -- I've been trying to figure
> out good ways to test it and I'm not quite there yet -- but the
> concept.  One observation I have is that there were already code paths
> in undoaccess.c that can tolerate InvalidBuffer in recovery, due to
> the potentially different discard timing for DO vs REDO.  I think
> that's a point in favour of this scheme, but I can see that it's
> inconvenient to have to deal with InvalidBuffer whenever you read.

FWIW, I'm far from convinced that those are currently quite right. See
discussion pointed to above.



> I pulled in the latest code from undoprocessing as of today, and I
> might be a bit confused about "Defect and enhancement in multi-log
> support" some of which I have squashed into the  make undolog patch.
> BTW undoprocessing builds with initialized variable warnings in xact.c
> on clang today.

I've complained about that existance of that commit multiple times
now. So far without any comments.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: Global temporary tables
Next
From: Pavel Stehule
Date:
Subject: Re: Global temporary tables