Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: POC: Cleaning up orphaned files using undo logs
Msg-id: CA+Tgmob_PGNVoj8=jNC2mc6K1crtUxdn=zFoX2Xx63G0uxAzjg@mail.gmail.com
In response to: Re: POC: Cleaning up orphaned files using undo logs (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Mon, Aug 5, 2019 at 12:42 PM Andres Freund <andres@anarazel.de> wrote:
> A good move in the right direction, imo.

I spent some more time thinking about this and talking to Thomas about it, and I'd like to propose a somewhat more aggressive restructuring, with the aim of getting a cleaner separation between the layers of this patch set.

Right now, the undo log storage stuff knows nothing about the contents of an undo log, whereas the undo interface layer knows everything about the contents of an undo log. In particular, it knows that it's a series of records, and that those records are grouped into transactions, and it knows both the format of the individual records and also the details of how transaction headers work. Nothing can use the undo log storage system except for the undo interface layer, because the undo interface layer assumes that all the data in the undo storage system conforms to the record/recordset format which it defines.

However, there are a few warts: while the undo log storage patch doesn't know anything about the contents of undo logs, it does know that transaction boundaries matter, and it signals to the undo interface layer whether a transaction header should be inserted for a new record. That's a strange thing for the storage layer to be doing. Also, in addition to three persistence levels, it knows about a fourth undo log category for "special" data for multixact or TPD-like things. That's another wart.

Suppose that we instead invent a new layer which sits on top of the undo log storage layer. This layer manages what I'm going to call GHOBs: growable hunks of bytes. (This is probably not the best name, but I thought of it in 5 seconds during a complex technical conversation, so bear with me.) The GHOB layer supports open/close/grow/write/overwrite operations.
Conceptually, you open a GHOB with an initial size and a persistence level, and then you can subsequently grow it, unless you fill up the undo log, in which case you can't grow it any more; when you're done, you close it. Opening and closing a GHOB are operations that only make in-memory state changes. Opening a GHOB finds a place where you could write the initial amount of data you specify, but it doesn't actually write any data or change any persistent state yet, except for making sure that nobody else can grab that space as long as you have the GHOB open. Closing a GHOB tells the system that you're not going to grow the object any more, which means some other GHOB can be placed immediately after the last data you wrote. Growing a GHOB doesn't do anything persistent either; it just tests whether there would be room to write those bytes. So, the only operations that make actual persistent changes are write and overwrite. These operations just copy data into shared buffers and mark them dirty, but they are set up so that you can integrate this with whatever WAL-logging you're doing for those operations, so that you can make the same writes happen at redo time.

Then, on top of the GHOB layer, you have separate submodules for different kinds of GHOBs. Most importantly, you have a transaction-GHOB manager, which opens a GHOB per persistence level the first time somebody wants to write to it and closes those GHOBs at end-of-xact. AMs push records into the transaction-GHOB manager, and it pushes them into GHOBs on the other side. Then you can also have a multi-GHOB manager, which would replace what Thomas now has as a separate undo log category. The undo-log-storage layer wouldn't have any fixed limit on the number of GHOBs that could be open at the same time; it would just be the sum of whatever the individual GHOB-type managers can open.
It would be important to keep the number of simultaneously open GHOBs fairly small, since there's not an unlimited supply of undo logs, but that doesn't seem like a problem for any of the uses we currently have in mind.

Each GHOB would begin with a magic number identifying the GHOB type, and would have callbacks for everything else, like "how big is this GHOB?" and "is it discardable?".

I'm not totally sure I've thought through all of the problems here, but it seems like this might help us fix some of the aforementioned layering inversions. The undo log storage system only knows about storage: it doesn't have to help with things like transaction boundaries any more, and it continues to be indifferent to the actual contents of the storage. At the GHOB layer, we know that we've got chunks of storage which are the unit of undo discard, and we know that they start with a magic number identifying the type, but we don't know whether they are internally broken into records or, if so, how those records are organized. The individual GHOB managers do know that stuff; for example, the transaction-GHOB manager would know that AMs insert undo records and how those records are compressed and so forth.

One thing that feels good about this system is that you could actually write something like the test_undo module that Thomas had in an older patch set. He threw it away because it doesn't play nicely with the way the undorecord/undoaccess stuff works: that stuff thinks that all undo records have to be in the format it knows about, and if they're not, it will barf. With this design, test_undo could define its own kind of GHOB that keeps stuff until it's explicitly told to throw it away, and that'd be fine for 'make check' (but probably not for 'make installcheck').

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company