Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: POC: Cleaning up orphaned files using undo logs
Msg-id: CA+hUKGJL4X1em70rxN1d_EC3rxiVhVd1woHviydW=Hr2PeGBpg@mail.gmail.com
In response to: Re: POC: Cleaning up orphaned files using undo logs (Antonin Houska <ah@cybertec.at>)
Responses: Re: POC: Cleaning up orphaned files using undo logs
List: pgsql-hackers
On Thu, Nov 12, 2020 at 10:15 PM Antonin Houska <ah@cybertec.at> wrote:
> Thomas Munro <thomas.munro@gmail.com> wrote:
> > Thanks. We decided to redesign a couple of aspects of the undo
> > storage and record layers that this patch was intended to demonstrate,
> > and work on that is underway. More on that soon.
>
> As my boss expressed in his recent blog post, we'd like to contribute to the
> zheap development, and a couple of developers from other companies are
> interested in this as well. Amit Kapila suggested that the "cleanup of
> orphaned files" feature is a good starting point for getting the code into PG
> core, so I've spent some time on it and tried to rebase the patch set.

Hi Antonin,

I saw that -- great news! -- and have been meaning to write for a while. I
think I am nearly ready to talk about it again. I agree 100% that it's worth
trying to do something much simpler than a new access manager, and this was
the simplest useful feature solving a real-world-problem-that-people-actually-have
that we could come up with (based on an idea from Robert). I think it needs a
convincing explanation of why there is no scenario in which the relfilenode is
recycled for a new, unlucky table before the rollback is executed, which might
depend on details you are still working on or changing (scenarios where you
execute undo twice because you forgot you already did it).

> In fact what I did is not mere rebasing against the current master branch -
> I've also (besides various bug fixes) done some design changes.
>
> Incorporated the new Undo Record Set (URS) infrastructure
> ---------------------------------------------------------
>
> This is also pointed out in [0].
>
> I started from [1] and tried to implement some missing parts (e.g. proper
> closing of the URSs after a crash), introduced an UNDO_DEBUG preprocessor
> macro which makes the undo log segments very small, and fixed some bugs that
> the small segments exposed.

Cool! Getting up to speed on all these made-up concepts like URS, and getting
all these pieces assembled, rebased, and up and running is already quite
something, let alone adding missing parts and debugging.

> The most significant change I've made was the removal of the undo requests
> from the checkpoint. I could not find any particular bug / race condition
> related to including the requests in the checkpoint, but I concluded that
> it's easier to reason about consistency and checkpoint timings if we scan
> the undo log on restart (after recovery has finished) and create the
> requests from scratch.

Interesting. I guess that would be closer to textbook three-phase ARIES.

> [2] shows where I ended up before I started to rebase this patch set.
>
> No background undo
> ------------------
>
> Reduced complexity of the patch seems to be the priority at the moment. Amit
> suggested that cleanup of an orphaned relation file is simple enough to be
> done in the foreground, and I agree.
>
> The "undo worker" is still there, but it only processes undo requests after a
> server restart, because relation data can only be changed in a transaction -
> it seems cleaner to launch a background worker for this than to hack the
> startup process.

I suppose the simplest useful system would be one that does the work at
startup before allowing connections, and also in regular backends, and panics
if a backend ever exits while it has pending undo (panic = "goto crash
recovery"). Then you don't have to deal with undo workers running at the same
time as regular sessions, which might run into trouble reacquiring locks (for
an AM, I mean), or with OIDs being recycled across multiple checkpoints, or
with undo work that gets deferred until the next restart of the server.
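To sketch what I mean (this is only an illustration, not code from any of the
patch sets; BackendHasPendingUndo() is a made-up helper name here):

/*
 * Rough sketch: a before_shmem_exit callback that escalates "backend
 * exiting with pending undo" to a PANIC, so that crash recovery
 * rediscovers and executes the undo at the next startup.
 * BackendHasPendingUndo() is a hypothetical helper, not an existing
 * function.
 */
#include "postgres.h"
#include "storage/ipc.h"

extern bool BackendHasPendingUndo(void);	/* hypothetical */

static void
undo_pending_exit_check(int code, Datum arg)
{
	if (BackendHasPendingUndo())
		elog(PANIC, "backend exiting with pending undo, forcing crash recovery");
}

/* registered once per backend, e.g. early in backend startup */
void
RegisterUndoExitCheck(void)
{
	before_shmem_exit(undo_pending_exit_check, (Datum) 0);
}

The point is just that a backend dying with unexecuted undo turns into
ordinary crash recovery, so startup remains the only other place that has to
know how to find and execute pending undo.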
> Since the concept of undo requests is closely related to the undo worker, I
> removed undorequest.c too. The new (much simpler) undo worker gets the
> information on incomplete / aborted transactions from the undo log, as
> mentioned above.
>
> SMGR enhancement
> ----------------
>
> I used the 0001 patch from [3] rather than [4], although it's more invasive,
> because I noticed somewhere in the discussion that there should be no
> reserved database OID for the undo log. (InvalidOid cannot be used because
> it's already in use for shared catalogs.)

I gave up thinking about the colour of the BufferTag shed and went back to
magic database 9, mainly because there seemed to be more pressing matters. I
don't even think it's that crazy to store this type of system-wide data in
pseudo databases, and I know of other systems that do similar sorts of things
without blinking...
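To illustrate the pseudo-database idea, roughly (again just a sketch, not
taken from the patches: UndoDbOid and the use of the undo log number as
relNode are assumptions made up for this example):

/*
 * Sketch: stamp undo pages with a reserved pseudo-database OID so they can
 * use the ordinary buffer pool machinery without colliding with real
 * databases or with shared catalogs (dbNode = 0).
 */
#include "postgres.h"
#include "catalog/pg_tablespace_d.h"
#include "storage/buf_internals.h"

#define UndoDbOid 9				/* reserved pseudo-database OID (assumed) */

static void
init_undo_buffer_tag(BufferTag *tag, Oid undo_log_number, BlockNumber blkno)
{
	RelFileNode rnode;

	rnode.spcNode = DEFAULTTABLESPACE_OID;
	rnode.dbNode = UndoDbOid;		/* pseudo database for undo data */
	rnode.relNode = undo_log_number;	/* assumed mapping for illustration */

	INIT_BUFFERTAG(*tag, rnode, MAIN_FORKNUM, blkno);
}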
> Following are a few areas which are not implemented yet because more
> discussion is needed there:

Hmm. I'm thinking about these questions.