Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: POC: Cleaning up orphaned files using undo logs |
Date | |
Msg-id | CAFiTN-sCsaWtiAHzXAbQS0x6YkVLTEGj15jHEJSc6qu4S7jEyw@mail.gmail.com Whole thread Raw |
In response to | Re: POC: Cleaning up orphaned files using undo logs (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: POC: Cleaning up orphaned files using undo logs
|
List | pgsql-hackers |
On Tue, Jul 30, 2019 at 12:21 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Tue, Jul 30, 2019 at 5:03 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I think this idea is good for the DO time but during REDO time it will > > not work as we will not have the transaction state. Having said that > > the current idea of keeping in the global variable will also not work > > during REDO time because the WAL from the different transaction can be > > interleaved. There are few ideas to handle this issue > > > > 1. At DO time keep in TopTransactionState as you suggested and during > > recovery time read from the first complete record on the page. > > 2. Just to keep the code uniform always read from the first complete > > record of the page. > > It contains another > suggestion for that problem you just mentioned (and also me pointing > out what you just pointed out, since I wrote it earlier) though I'm > not sure if it's better than your options above. Thanks, Thomas for your review, Currently, I am replying to the problem which both of us has identified and found a different set of solutions. I will go through other comments soon and work on those. > +UndoRecPtr > +PrepareUndoInsert(UndoRecordInsertContext *context, > + UnpackedUndoRecord *urec, > + Oid dbid) > +{ > ... > + /* Fetch compression info for the transaction. */ > + compression_info = GetTopTransactionUndoCompressionInfo(category); > > How can this work correctly in recovery? [Edit: it doesn't, as you > just pointed out] > > > One data structure that could perhaps hold this would be > UndoLogTableEntry (the per-backend cache, indexed by undo log number, > with pretty fast lookups; used for things like > UndoLogNumberGetCategory()). As long as you never want to have > inter-transaction compression, that should have the right scope to > give recovery per-undo log tracking. If you ever wanted to do > compression between transactions too, maybe UndoLogSlot could work, > but that'd have more complications. I think this could be a good idea. I had thought of keeping in the slot as my 3rd option but later I removed it thinking that we need to expose the compression field to the undo log layer. I think keeping in the UndoLogTableEntry is a better idea than keeping in the slot. But, I still have the same problem that we need to expose undo record-level fields to undo log layer to compute the cache entry size. OTOH, If we decide to get from the first record of the page (as I mentioned up thread) then I don't think there is any performance issue because we are inserting on the same page. But, for doing that we need to unpack the complete undo record (guaranteed to be on one page). And, UnpackUndoData will internally unpack the payload data as well which is not required in our case unless we change UnpackUndoData such that it unpacks only what the caller wants (one input parameter will do). I am not sure out of these two which idea is better? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: