Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CAA4eK1+H6opxK1YENP5DqggRhobO78GwTPUK41cESypYDpKu2w@mail.gmail.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, May 13, 2019 at 11:36 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sun, May 12, 2019 at 2:15 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > I have removed some of the globals and also improved some comments.
>
> I don't like the discard_lock very much.  Perhaps it's OK, but I hope
> that there are better alternatives.  One problem with Thomas Munro
> pointed out to me in off-list discussion is that the discard_lock has
> to be held by anyone reading undo even if the undo they are reading
> and the undo that the discard worker wants to discard are in
> completely different parts of the undo log.  Somebody could be trying
> to read an undo page written 1 second ago while the discard worker is
> trying to discard an undo page written to the same undo log 1 hour
> ago.  Those things need not block each other, but with this design
> they will.
>

Yeah, this doesn't appear to be a good way to deal with the problem.

>  Another problem is that we end up holding it across an
> I/O; there's precedent for that, but it's not particularly good
> precedent.  Let's see if we can do better.
>
> But then I had what I think may be a better idea.
>

+1.  I also think the below idea is better than the previous one.

>  Let's add a new
> ReadBufferMode that suppresses the actual I/O; if the buffer is not
> already present in shared_buffers, it allocates a buffer but returns
> it without doing any I/O, so the caller must be prepared for BM_VALID
> to be unset.  I don't know what to call this, so I'll call it
> RBM_ALLOCATE (leaving room for possible future variants like
> RBM_ALLOCATE_AND_LOCK).  Then, the protocol for reading an undo buffer
> would go like this:
>
> 1. Read the buffer with RBM_ALLOCATE, thus acquiring a pin on the
> relevant buffer.
> 2. Check whether the buffer precedes the discard horizon for that undo
> log stored in shared memory.
> 3. If so, use the ForgetBuffer() code we have in the zheap branch to
> deallocate the buffer and stop here.  The undo is not available to be
> read, whether it's still physically present or not.
> 4. Otherwise, if the buffer is not valid, call ReadBufferExtended
> again, or some new function, to make it so.  Remember to release all
> of our pins.
>
> The protocol for discarding an undo buffer would go like this:
>
> 1. Advance the discard horizon in shared memory.
> 2. Take a cleanup lock on each buffer that ought to be discarded.
> Remember the dirty ones and forget the others.
> 3. WAL-log the discard operation.
> 4. Revisit the dirty buffers we remembered in step 2 and forget them.
>
> The idea is that, once we've advanced the discard horizon in shared
> memory, any readers that come along later are responsible for making
> sure that they never do I/O on any older undo.  They may create some
> invalid buffers in shared memory, but they'll hopefully also get rid
> of them if they do, and if they error out for some reason before doing
> so, that buffer should age out naturally.  So, the discard worker just
> needs to worry about buffers that already exist.  Once it's taken a
> cleanup lock on each buffer, it knows that there are no I/O operations
> and in fact no buffer usage of any kind still in progress from before
> it moved the in-memory discard horizon.  Anyone new that comes along
> will clean up after themselves.  We postpone forgetting dirty buffers
> until after we've successfully WAL-logged the discard, in case we fail
> to do so.
>

I have spent some time thinking over this and couldn't see any problem
with this.  So, +1 for trying this out on the lines of what you have
described above.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Hubert Zhang
Date:
Subject: Replace hashtable growEnable flag
Next
From: Kuntal Ghosh
Date:
Subject: Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY