Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: POC: Cleaning up orphaned files using undo logs |
Date | |
Msg-id | CAA4eK1+H6opxK1YENP5DqggRhobO78GwTPUK41cESypYDpKu2w@mail.gmail.com Whole thread Raw |
In response to | Re: POC: Cleaning up orphaned files using undo logs (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Mon, May 13, 2019 at 11:36 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Sun, May 12, 2019 at 2:15 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I have removed some of the globals and also improved some comments. > > I don't like the discard_lock very much. Perhaps it's OK, but I hope > that there are better alternatives. One problem with Thomas Munro > pointed out to me in off-list discussion is that the discard_lock has > to be held by anyone reading undo even if the undo they are reading > and the undo that the discard worker wants to discard are in > completely different parts of the undo log. Somebody could be trying > to read an undo page written 1 second ago while the discard worker is > trying to discard an undo page written to the same undo log 1 hour > ago. Those things need not block each other, but with this design > they will. > Yeah, this doesn't appear to be a good way to deal with the problem. > Another problem is that we end up holding it across an > I/O; there's precedent for that, but it's not particularly good > precedent. Let's see if we can do better. > > But then I had what I think may be a better idea. > +1. I also think the below idea is better than the previous one. > Let's add a new > ReadBufferMode that suppresses the actual I/O; if the buffer is not > already present in shared_buffers, it allocates a buffer but returns > it without doing any I/O, so the caller must be prepared for BM_VALID > to be unset. I don't know what to call this, so I'll call it > RBM_ALLOCATE (leaving room for possible future variants like > RBM_ALLOCATE_AND_LOCK). Then, the protocol for reading an undo buffer > would go like this: > > 1. Read the buffer with RBM_ALLOCATE, thus acquiring a pin on the > relevant buffer. > 2. Check whether the buffer precedes the discard horizon for that undo > log stored in shared memory. > 3. If so, use the ForgetBuffer() code we have in the zheap branch to > deallocate the buffer and stop here. The undo is not available to be > read, whether it's still physically present or not. > 4. Otherwise, if the buffer is not valid, call ReadBufferExtended > again, or some new function, to make it so. Remember to release all > of our pins. > > The protocol for discarding an undo buffer would go like this: > > 1. Advance the discard horizon in shared memory. > 2. Take a cleanup lock on each buffer that ought to be discarded. > Remember the dirty ones and forget the others. > 3. WAL-log the discard operation. > 4. Revisit the dirty buffers we remembered in step 2 and forget them. > > The idea is that, once we've advanced the discard horizon in shared > memory, any readers that come along later are responsible for making > sure that they never do I/O on any older undo. They may create some > invalid buffers in shared memory, but they'll hopefully also get rid > of them if they do, and if they error out for some reason before doing > so, that buffer should age out naturally. So, the discard worker just > needs to worry about buffers that already exist. Once it's taken a > cleanup lock on each buffer, it knows that there are no I/O operations > and in fact no buffer usage of any kind still in progress from before > it moved the in-memory discard horizon. Anyone new that comes along > will clean up after themselves. We postpone forgetting dirty buffers > until after we've successfully WAL-logged the discard, in case we fail > to do so. > I have spent some time thinking over this and couldn't see any problem with this. So, +1 for trying this out on the lines of what you have described above. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: