Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: POC: Cleaning up orphaned files using undo logs |
Date | |
Msg-id | CA+TgmoZD2YcWV1E+kLShbnSG7UUvybDkAyFm9K2GA9g-ai0Ajg@mail.gmail.com Whole thread Raw |
In response to | Re: POC: Cleaning up orphaned files using undo logs (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: POC: Cleaning up orphaned files using undo logs
Re: POC: Cleaning up orphaned files using undo logs |
List | pgsql-hackers |
On Tue, Jun 18, 2019 at 2:07 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jun 18, 2019 at 7:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > [ new patches ] > > I tried writing some code [ to use these patches ]. I spent some more time experimenting with this patch set today and I think that the UndoFetchRecord interface is far too zheap-centric. I expected that I would be able to do this: UnpackedUndoRecord *uur = UndoFetchRecord(urp); // do stuff with uur UndoRecordRelease(uur); But I can't, because the UndoFetchRecord API requires me to pass not only an undo record but also a block number, an offset number, an XID, and a callback. I think I could get the effect that I want by defining a callback that always returns true. Then I could do: UndoRecPtr junk; UnpackedUndoRecord *uur = UndoFetchRecord(urp, InvalidBlockNumber, InvalidOffsetNumber, &junk, always_returns_true); // do stuff with uur UndoRecordRelease(uur); That seems ridiculously baroque. I think the most common thing that an AM will want to do with an UndoRecPtr is look up that exact record; that is, for example, what zedstore will want to do. However, even if some AMs, like zheap, want to search backward through a chain of records, there's no real reason to suppose that all of them will want to search by block number + offset. They might want to search by some bit of data buried in the payload, for example. I think the basic question here is whether we really need anything more complicated than: extern UnpackedUndoRecord *UndoFetchRecord(UndoRecPtr urp); I mean, if you had that, the caller can implement looping easily enough, and insert any test they want: for (;;) { UnpackedUndoRecord *uur = UndoFetchRecord(urp); if (i like this one) break; urp = uur->uur_blkprev; // should be renamed, since zedstore + probably others will have tuple chains not block chains UndoRecordRelease(uur); } The question in my mind is whether there's some performance advantage of having the undo layer manage the looping rather than the caller do it. If there is, then there's a lot of zheap code that ought to be changed to use it, because it's just using the same satisfies-callback everywhere. If there's not, we should just simplify the undo record lookup along the lines mentioned above and put all the looping into the callers. zheap could provide a wrapper around UndoFetchRecord that does a search by block and offset, so that we don't have to repeat that logic in multiple places. BTW, an actually generic iterator interface would probably look more like this: typedef bool (*SatisfyUndoRecordCallback)(void *callback_data, UnpackedUndoRecord *uur); extern UnpackedUndoRecord *UndoFetchRecord(UndoRecPtr urp, UndoRecPtr *found, SatisfyUndoRecordCallback callback, void *callback_data); Now we're not assuming anything about what parts of the record the callback wants to examine. It can do whatever it likes. Typically with this sort of interface a caller will define a file-private struct that is known to both the callback and the caller of UndoFetchRecord, but not elsewhere. If we decide we need an iterator within the undo machinery itself, then I think it should look like the above, and I think it should accept NULL for found, callback, and callback_data, so that somebody who wants to just look up a record, full stop, can do just: UnpackedUndoRecord *uur = UndoFetchRecord(urp, NULL, NULL, NULL); which seems tolerable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: