Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Robert Haas
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CA+Tgmob_PGNVoj8=jNC2mc6K1crtUxdn=zFoX2Xx63G0uxAzjg@mail.gmail.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Mon, Aug 5, 2019 at 12:42 PM Andres Freund <andres@anarazel.de> wrote:
> A good move in the right direction, imo.

I spent some more time thinking about this and talking to Thomas about
it and I'd like to propose a somewhat more aggressive restructuring
proposal, with the aim of getting a cleaner separation between layers
of this patch set.

Right now, the undo log storage stuff knows nothing about the contents
of an undo log, whereas the undo interface storage knows everything
about the contents of an undo log. In particular, it knows that it's a
series of records, and those records are grouped into transactions,
and it knows both the format of the individual records and also the
details of how transaction headers work. Nothing can use the undo log
storage system except for the undo interface layer, because the undo
interface layer assumes that all the data in the undo storage system
conforms to the record/recordset format which it defines. However,
there are a few warts: while the undo log storage patch doesn't know
anything about the contents of undo logs, it does know that that
transaction boundaries matter, and it signals to the undo interface
layer whether a transaction header should be inserted for a new
record.  That's a strange thing for the storage layer to be doing.
Also, in addition to three persistence levels, it knows about a fourth
undo log category for "special" data for multixact or TPD-like things.
That's another wart.

Suppose that we instead invent a new layer which sits on top of the
undo log storage layer.  This layer manages what I'm going to call
GHOBs, growable hunks of bytes.  (This is probably not the best name,
but I thought of it in 5 seconds during a complex technical
conversation, so bear with me.)  The GHOB layer supports
open/close/grow/write/overwrite operations.  Conceptually, you open a
GHOB with an initial size and a persistence level, and then you can
subsequently grow it unless you fill up the undo log in which case you
can't grow it any more; when you're done, you close it.  Opening and
closing a GHOB are operations that only make in-memory state changes.
Opening a GHOB finds a place where you could write the initial amount
of data you specify, but it doesn't actually write any data or change
any persistent state yet, except for making sure that nobody else can
grab that space as long as you have the GHOB open.  Closing a GHOB
tells the system that you're not going to grow the object any more,
which means some other GHOB can be placed immediately after the last
data you wrote.  Growing a GHOB doesn't do anything persistent either;
it just tests whether there would be room to write those bytes.  So,
the only operations that make actual persistent changes are write and
overwrite.  These operations just copy data into shared buffers and
mark them dirty, but they are set up so that you can integrate this
with whatever WAL-logging your doing for those operations, so that you
can make the same writes happen at redo time.

Then, on top of the GHOB layer, you have separate submodules for
different kinds of GHOBs.  Most importantly, you have a
transaction-GHOB manager, which opens a GHOB per persistence level the
first time somebody wants to write to it and closes those GHOBs at
end-of-xact.  AMs push records into the transaction-GHOB manager, and
it pushes them into GHOBs on the other side. Then you can also have a
multi-GHOB manager, which would replace what Thomas now has as a
separate undo log category.  The undo-log-storage layer wouldn't have
any fixed limit on the number of GHOBs that could be open at the same
time; it would just be the sum of whatever the individual GHOB type
managers can open. It would be important to keep that number fairly
small since there's not an unlimited supply of undo logs, but that
doesn't seem like a problem for any of the uses we currently have in
mind.  Each GHOB would begin with a magic number identifying the GHOB
type, and would have callbacks for everything else, like "how big is
this GHOB?" and "is it discardable?".

I'm not totally sure I've thought through all of the problems here,
but it seems like this might help us fix some of the aforementioned
layering inversions. The undo log storage system only knows about
storage: it doesn't have to help with things like transaction
boundaries any more, and it continues to be indifferent to the actual
contents of the storage.  At the GHOB layer, we know that we've got
chunks of storage which are the unit of undo discard, and we know that
they start with a magic number that identifies the type, but it
doesn't know whether they are internally broken into records or, if
so, how those records are organized. The individual GHOB managers do
know that stuff; for example, the transaction-GHOB manager would know
that AMs insert undo records and how those records are compressed and
so forth.  One thing that feels good about this system is that you
could actually write something like the test_undo module that Thomas
had in an older patch set.  He threw it away because it doesn't play
nice with the way the undorecord/undoaccess stuff works: that stuff
thinks that all undo records have to be in the format that it knows
about, and if they're not, it will barf. With this, test_undo could
define its own kind of GHOB that keeps stuff until it's explicitly
told to throw it away, and that'd be fine for 'make check' (but not
'make installcheck', probably).

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: An out-of-date comment in nodeIndexonlyscan.c
Next
From: Alexander Korotkov
Date:
Subject: Re: Psql patch to show access methods info