Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
| From | Robert Haas |
|---|---|
| Subject | Re: POC: Cleaning up orphaned files using undo logs |
| Date | |
| Msg-id | CA+TgmoZD2YcWV1E+kLShbnSG7UUvybDkAyFm9K2GA9g-ai0Ajg@mail.gmail.com Whole thread Raw |
| In response to | Re: POC: Cleaning up orphaned files using undo logs (Robert Haas <robertmhaas@gmail.com>) |
| Responses |
Re: POC: Cleaning up orphaned files using undo logs
Re: POC: Cleaning up orphaned files using undo logs |
| List | pgsql-hackers |
On Tue, Jun 18, 2019 at 2:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jun 18, 2019 at 7:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > [ new patches ]
>
> I tried writing some code [ to use these patches ].
I spent some more time experimenting with this patch set today and I
think that the UndoFetchRecord interface is far too zheap-centric. I
expected that I would be able to do this:
UnpackedUndoRecord *uur = UndoFetchRecord(urp);
// do stuff with uur
UndoRecordRelease(uur);
But I can't, because the UndoFetchRecord API requires me to pass not
only an undo record but also a block number, an offset number, an XID,
and a callback. I think I could get the effect that I want by
defining a callback that always returns true. Then I could do:
UndoRecPtr junk;
UnpackedUndoRecord *uur = UndoFetchRecord(urp, InvalidBlockNumber,
InvalidOffsetNumber, &junk, always_returns_true);
// do stuff with uur
UndoRecordRelease(uur);
That seems ridiculously baroque. I think the most common thing that
an AM will want to do with an UndoRecPtr is look up that exact record;
that is, for example, what zedstore will want to do. However, even if
some AMs, like zheap, want to search backward through a chain of
records, there's no real reason to suppose that all of them will want
to search by block number + offset. They might want to search by some
bit of data buried in the payload, for example.
I think the basic question here is whether we really need anything
more complicated than:
extern UnpackedUndoRecord *UndoFetchRecord(UndoRecPtr urp);
I mean, if you had that, the caller can implement looping easily
enough, and insert any test they want:
for (;;)
{
UnpackedUndoRecord *uur = UndoFetchRecord(urp);
if (i like this one)
break;
urp = uur->uur_blkprev; // should be renamed, since zedstore +
probably others will have tuple chains not block chains
UndoRecordRelease(uur);
}
The question in my mind is whether there's some performance advantage
of having the undo layer manage the looping rather than the caller do
it. If there is, then there's a lot of zheap code that ought to be
changed to use it, because it's just using the same satisfies-callback
everywhere. If there's not, we should just simplify the undo record
lookup along the lines mentioned above and put all the looping into
the callers. zheap could provide a wrapper around UndoFetchRecord
that does a search by block and offset, so that we don't have to
repeat that logic in multiple places.
BTW, an actually generic iterator interface would probably look more like this:
typedef bool (*SatisfyUndoRecordCallback)(void *callback_data,
UnpackedUndoRecord *uur);
extern UnpackedUndoRecord *UndoFetchRecord(UndoRecPtr urp, UndoRecPtr
*found, SatisfyUndoRecordCallback callback, void *callback_data);
Now we're not assuming anything about what parts of the record the
callback wants to examine. It can do whatever it likes. Typically
with this sort of interface a caller will define a file-private struct
that is known to both the callback and the caller of UndoFetchRecord,
but not elsewhere.
If we decide we need an iterator within the undo machinery itself,
then I think it should look like the above, and I think it should
accept NULL for found, callback, and callback_data, so that somebody
who wants to just look up a record, full stop, can do just:
UnpackedUndoRecord *uur = UndoFetchRecord(urp, NULL, NULL, NULL);
which seems tolerable.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
pgsql-hackers by date: