Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From: Dilip Kumar
Subject: Re: POC: Cleaning up orphaned files using undo logs
Msg-id: CAFiTN-vDrXuL6tHK1f_V9PAXp2+EFRpPtxCG_DRx08PZXAPkyw@mail.gmail.com
In response to: Re: POC: Cleaning up orphaned files using undo logs (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Tue, Aug 6, 2019 at 1:26 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-08-05 11:29:34 -0700, Andres Freund wrote:
> > Need to do something else for a bit. More later.
>
> Here we go.

Thanks for the review. I will work on these. Currently, I need
suggestions on some of the review comments.

> > +/*
> > + * Compute the header size of the undo record.
> > + */
> > +Size
> > +UndoRecordHeaderSize(uint16 uur_info)
> > +{
> > +	Size		size;
> > +
> > +	/* Add fixed header size. */
> > +	size = SizeOfUndoRecordHeader;
> > +
> > +	/* Add size of transaction header if present. */
> > +	if ((uur_info & UREC_INFO_TRANSACTION) != 0)
> > +		size += SizeOfUndoRecordTransaction;
> > +
> > +	/* Add size of rmid if present. */
> > +	if ((uur_info & UREC_INFO_RMID) != 0)
> > +		size += sizeof(RmgrId);
> > +
> > +	/* Add size of reloid if present. */
> > +	if ((uur_info & UREC_INFO_RELOID) != 0)
> > +		size += sizeof(Oid);
> > +
>
> There's numerous blocks with one if for each type, and the body copied
> basically the same for each alternative. That doesn't seem like a
> reasonable approach to me. Means that many places need to be adjusted
> when we invariably add another type, and seems likely to lead to bugs
> over time.

I agree that we repeat this pattern in a couple of functions, each
taking a different action: in this function we compute the size, while
in some other function we copy the fields. I am not sure what the best
way to handle it would be. One approach is to write a single function
that handles all these cases, with the caller indicating which action
to take. Basically, it would look like this:

Function(uur_info, action)
{
	if ((uur_info & UREC_INFO_TRANSACTION) != 0)
	{
		/* if action is "compute header size" */
		size += SizeOfUndoRecordTransaction;
		/* else if action is "copy to dest" */
		dest = src;
		...
	}
	/* Repeat for the other types. */
}

But, IMHO, such a function could leave the reader confused about what
exactly it is trying to achieve. If anyone has a better idea, please
suggest. A rough sketch of what I have in mind follows.
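To make this concrete, here is a minimal sketch of the single-walker
idea, in the spirit of the pseudocode above. Everything here
(UndoFieldAction, undo_record_walk_fields, HANDLE_FIELD, and the uur_*
member names) is invented for illustration and is not the patch's
actual API:

#include <string.h>

typedef enum UndoFieldAction
{
	UNDO_FIELD_SIZE,			/* only accumulate the size */
	UNDO_FIELD_COPY				/* also copy each field into dest */
} UndoFieldAction;

/*
 * Walk the optional fields selected by uur_info exactly once.  In
 * UNDO_FIELD_SIZE mode this just sums up their sizes; in
 * UNDO_FIELD_COPY mode it additionally serializes each field into
 * dest at its computed offset.  The field list lives in one place, so
 * adding a new optional field means touching only this function.
 */
static Size
undo_record_walk_fields(uint16 uur_info, UndoFieldAction action,
						char *dest, UnpackedUndoRecord *uur)
{
	Size		size = SizeOfUndoRecordHeader;

#define HANDLE_FIELD(mask, src, len) \
	do { \
		if ((uur_info & (mask)) != 0) \
		{ \
			if (action == UNDO_FIELD_COPY) \
				memcpy(dest + size, (src), (len)); \
			size += (len); \
		} \
	} while (0)

	/* Member names below are guesses, for illustration only. */
	HANDLE_FIELD(UREC_INFO_TRANSACTION, &uur->uur_txn,
				 SizeOfUndoRecordTransaction);
	HANDLE_FIELD(UREC_INFO_RMID, &uur->uur_rmid, sizeof(RmgrId));
	HANDLE_FIELD(UREC_INFO_RELOID, &uur->uur_reloid, sizeof(Oid));
	/* ... remaining optional fields ... */

#undef HANDLE_FIELD

	return size;
}

With that, UndoRecordHeaderSize() reduces to
undo_record_walk_fields(uur_info, UNDO_FIELD_SIZE, NULL, uur), and the
copy path reuses the same field list with UNDO_FIELD_COPY.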
> > +/*
> > + * Insert the undo record into the input page from the unpack undo context.
> > + *
> > + * Caller can call this function multiple times until desired stage is reached.
> > + * This will write the undo record into the page.
> > + */
> > +void
> > +InsertUndoData(UndoPackContext *ucontext, Page page, int starting_byte)
> > +{
> > +	char	   *writeptr = (char *) page + starting_byte;
> > +	char	   *endptr = (char *) page + BLCKSZ;
> > +
> > +	switch (ucontext->stage)
> > +	{
> > +		case UNDO_PACK_STAGE_HEADER:
> > +			/* Insert undo record header. */
> > +			if (!InsertUndoBytes((char *) &ucontext->urec_hd,
> > +								 SizeOfUndoRecordHeader, &writeptr, endptr,
> > +								 &ucontext->already_processed,
> > +								 &ucontext->partial_bytes))
> > +				return;
> > +			ucontext->stage = UNDO_PACK_STAGE_TRANSACTION;
> > +			/* fall through */
>
> I don't understand. The only purpose of this is that we can partially
> write a packed-but-not-actually-packed record onto a bunch of pages? And
> for that we have an endless chain of copy and pasted code calling
> InsertUndoBytes()? Copying data into shared buffers in tiny increments?
>
> If we need to do this, what is the whole packed record format good for?
> Except for adding a bunch of functions with 10++ ifs and nearly
> identical code?
>
> Copying data is expensive. Copying data in tiny increments is more
> expensive. Copying data in tiny increments, with a bunch of branches, is
> even more expensive. Copying data in tiny increments, with a bunch of
> branches, is even more expensive, especially when it's shared memory.
> Copying data in tiny increments, with a bunch of branches, is even more
> expensive, especially when it's shared memory, especially when all that
> shared memory is locked at once.

My idea is, instead of keeping all these fields duplicated in the
context, to allocate a single memory segment equal to the expected
record size (maybe keeping the payload data separate). Then, based on
uur_info, pack all the fields of UnpackedUndoRecord into that memory
segment. After that, InsertUndoData needs just one call to
InsertUndoBytes to copy the complete header in one shot, and another
call to copy the payload data. Does this sound reasonable to you?
Something along the lines of the sketch below.
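A minimal sketch of that single-copy approach, reusing the
undo_record_walk_fields() sketch from above. The context members
(packed_hdr, packed_hdr_size) and the UNDO_PACK_STAGE_PAYLOAD stage
name are invented for illustration; only UndoRecordHeaderSize(),
urec_hd, and InsertUndoBytes() come from the quoted patch:

/*
 * Serialize the fixed header plus every optional field selected by
 * uur_info into one contiguous, backend-local buffer.  Done once when
 * the record is prepared, before any buffer locks are taken.
 */
static void
PackUndoRecordHeader(UndoPackContext *ucontext, UnpackedUndoRecord *uur)
{
	Size		size = UndoRecordHeaderSize(uur->uur_info);
	char	   *buf = palloc(size);

	/* Fixed header first, at offset 0. */
	memcpy(buf, &ucontext->urec_hd, SizeOfUndoRecordHeader);

	/* Optional fields follow; the walker starts past the fixed header. */
	undo_record_walk_fields(uur->uur_info, UNDO_FIELD_COPY, buf, uur);

	ucontext->packed_hdr = buf;
	ucontext->packed_hdr_size = size;
}

/*
 * The header stage of InsertUndoData() then collapses to a single
 * InsertUndoBytes() call, however many optional fields are present:
 */
	case UNDO_PACK_STAGE_HEADER:
		if (!InsertUndoBytes(ucontext->packed_hdr,
							 ucontext->packed_hdr_size,
							 &writeptr, endptr,
							 &ucontext->already_processed,
							 &ucontext->partial_bytes))
			return;
		ucontext->stage = UNDO_PACK_STAGE_PAYLOAD;
		/* fall through */

Assuming InsertUndoBytes() resumes from partial_bytes on the next call
(as the quoted code suggests), a header split across a page boundary
still works; it just no longer re-enters one stage per field.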
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com