Thread: Merge compact/non compact commits, make aborts dynamically sized
Hi, Right now wal_level=logical implies that the compact commit record format isn't used and similarly 2pc commits also include the non compact format of commits. In the course of the 'replication identifier' patch submitted to the current commitfest I added more information to the non compact commit record if a xinfo flag was present. I optionally adding data is a better course than the rigorous split between compact/non compact commits. In the attached patch I've merged compact/noncompact code, made aborts use similar logic to avoid including useless bytes and used both for the 2pc equivalents. To avoid using more space in the compact case the 'xinfo' field indicating the presence of further data is only included when a byte in the xl_info flag is set (similar to what heap rmgr does). That means that transactions without subtransactions and other things are four bytes smaller than before, but ones with a subtransaction but no other complications 4 byte larger. database info, nsubxacts, nrels, nmsgs et al are also only included if required. I think that's a overall win, even disregarding wal_level = logical. When compact commits were discussed in http://archives.postgresql.org/message-id/235610.92468.qm%40web29004.mail.ird.yahoo.com such a scheme was discussed, but then discarded. I think that was a mistake; the information in commit records is likely to be growing and having to decide between the compact/non compact form instead on a more granular level makes decisions harder. The information about two forms of commits already has creeped into too many places... I generally like how the patch looks, the relevant code actually looks clearer to me afterwards - especially RecordTransactionCommit() being noticeably shorter, decode.c having to care about less and that twophase.c knows a less about xact.c type records seems good. There's one bit that I'm not so sure about though: To avoid duplication I've added Parse(Commit/Abort)Record(), but unfortunately that has to be available both in front and backend code - so it's currently living in xactdesc.c. I think we can live with that, but it's certainly not pretty. I'm not going to rebase the replication identifier patch over this for now, seems premature until some feedback is in. Comments? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 02/20/2015 05:21 PM, Andres Freund wrote: > In the attached patch I've merged compact/noncompact code, made aborts > use similar logic to avoid including useless bytes and used both for the > 2pc equivalents. +1 for this approach in general. > To avoid using more space in the compact case the 'xinfo' field > indicating the presence of further data is only included when a byte in > the xl_info flag is set (similar to what heap rmgr does). That means > that transactions without subtransactions and other things are four > bytes smaller than before, but ones with a subtransaction but no other > complications 4 byte larger. database info, nsubxacts, nrels, nmsgs et > al are also only included if required. I think that's a overall win, > even disregarding wal_level = logical. xinfo is a bit underdocumented now. The comment in xl_xact_commit should mention that it's a uint32. But actually, why is it 32 bits wide? Only 6 bits are used, so it would in a single byte. I guess that's because of alignment: now that it's 32 bits wide, that ensures that all the other structs are 4 byte aligned. That should be documented. Alternatively, store them unaligned to save the few bytes and use memcpy. > There's one bit that I'm not so sure about though: To avoid duplication > I've added Parse(Commit/Abort)Record(), but unfortunately that has to be > available both in front and backend code - so it's currently living in > xactdesc.c. I think we can live with that, but it's certainly not > pretty. Yeah, that's ugly. Why does frontend code need that? The old format isn't exactly trivial for frontend code to decode either. Regarding XactEmitCommitRecord and XactEmitAbortRecord, I wonder if you could pass an xl_xact_parsed/abort_commit struct to them, instead of the individual fields? You could then also avoid the static variables inside it, passing pointers to that struct to XLogRegisterData() instead. - Heikki
On 24 February 2015 at 18:51, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 02/20/2015 05:21 PM, Andres Freund wrote: >> >> In the attached patch I've merged compact/noncompact code, made aborts >> use similar logic to avoid including useless bytes and used both for the >> 2pc equivalents. > > > +1 for this approach in general. +1 also The compact commit record idea wasn't useful enough to stick with, especially in the light of logical rep, possible future extensions to the commit record and the prevalence of subtransactions. XactEmitCommitRecord and XactEmitAbortRecord are somewhat hard to understand. More comments or clearer design please. The record format itself needs more explanation than I see in the patch. Might be worth documenting when and how those fields would be populated also (my bad). It might be useful to have the optional field descriptions be named with a _opt suffix to make it clearer. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services
Hi, On 2015-02-24 20:51:42 +0200, Heikki Linnakangas wrote: > On 02/20/2015 05:21 PM, Andres Freund wrote: > >To avoid using more space in the compact case the 'xinfo' field > >indicating the presence of further data is only included when a byte in > >the xl_info flag is set (similar to what heap rmgr does). That means > >that transactions without subtransactions and other things are four > >bytes smaller than before, but ones with a subtransaction but no other > >complications 4 byte larger. database info, nsubxacts, nrels, nmsgs et > >al are also only included if required. I think that's a overall win, > >even disregarding wal_level = logical. > > xinfo is a bit underdocumented now. Yea, the whole thing needs a bit more comments. I really wanted some feedback before sinking even more time into this... > The comment in xl_xact_commit should mention that it's a uint32. But > actually, why is it 32 bits wide? The first reason is that it's essentially just the old 'xinfo' field moved to a different place ;). I think I'll add a xl_xact_flags or so, and document it there. > Only 6 bits are used, so it would in a single byte. I guess that's > because of alignment: now that it's 32 bits wide, that ensures that > all the other structs are 4 byte aligned. That should be > documented. Alternatively, store them unaligned to save the few bytes > and use memcpy. But yes, I didn't think about changing it because of alignment. Given that commit and abort records can be fairly large, it seems better to waste three bytes in a few scenarios (not the minimal one!) than allocate another large chunk of memory for the copied data. Even in the current code there's some rather underdocumented assumptions about alignment; namely that RelFileNode+, TransactionId+ always result in a acceptable alignment for the data after them. That happens to be the case, but is worth a comment and/or a MAXALIGN. > >There's one bit that I'm not so sure about though: To avoid duplication > >I've added Parse(Commit/Abort)Record(), but unfortunately that has to be > >available both in front and backend code - so it's currently living in > >xactdesc.c. I think we can live with that, but it's certainly not > >pretty. > > Yeah, that's ugly. Why does frontend code need that? The old format > isn't exactly trivial for frontend code to decode either. pg_xlogdump outputs subxacts and such; I don't forsee other usages. Sure, we could copy the code around, but I think that's worse than having it in xactdesc.c. Needs a comment explaining why it's there if I haven't added one already. > Regarding XactEmitCommitRecord and XactEmitAbortRecord, I wonder if you > could pass an xl_xact_parsed/abort_commit struct to them, instead of the > individual fields? You could then also avoid the static variables inside it, > passing pointers to that struct to XLogRegisterData() instead. Hm, that's an idea. And rename it to xaxt_commit/abort_data? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Feb 25, 2015 at 8:10 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-02-24 20:51:42 +0200, Heikki Linnakangas wrote: >> On 02/20/2015 05:21 PM, Andres Freund wrote: >> >There's one bit that I'm not so sure about though: To avoid duplication >> >I've added Parse(Commit/Abort)Record(), but unfortunately that has to be >> >available both in front and backend code - so it's currently living in >> >xactdesc.c. I think we can live with that, but it's certainly not >> >pretty. >> >> Yeah, that's ugly. Why does frontend code need that? The old format >> isn't exactly trivial for frontend code to decode either. > > pg_xlogdump outputs subxacts and such; I don't forsee other > usages. Sure, we could copy the code around, but I think that's worse > than having it in xactdesc.c. Needs a comment explaining why it's there > if I haven't added one already. FWIW, I think they would live better in frontend code for client applications. That's a nice patch. +1 for merging them. Here are a couple of comments: +/* Parse the WAL format of a xact abort into a easier to understand format. */ +void +ParseCommitRecord(uint8 info, xl_xact_commit *xlrec, xl_xact_parsed_commit *parsed) I think that you mean here of "an xact commit", not abort. + * Emit, but don't insert, a abort record. s/a abort/an abort/ XactEmitAbortRecord has some problems with tabs replaced by 4 spaces at a couple of places. +/* free opcode 0x70 */ + +#define XLOG_XACT_OPMASK 0x70 There is a contradiction here, 0x70 is not free. Regards, -- Michael
On 2015-02-27 16:26:08 +0900, Michael Paquier wrote: > On Wed, Feb 25, 2015 at 8:10 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2015-02-24 20:51:42 +0200, Heikki Linnakangas wrote: > >> On 02/20/2015 05:21 PM, Andres Freund wrote: > >> >There's one bit that I'm not so sure about though: To avoid duplication > >> >I've added Parse(Commit/Abort)Record(), but unfortunately that has to be > >> >available both in front and backend code - so it's currently living in > >> >xactdesc.c. I think we can live with that, but it's certainly not > >> >pretty. > >> > >> Yeah, that's ugly. Why does frontend code need that? The old format > >> isn't exactly trivial for frontend code to decode either. > > > > pg_xlogdump outputs subxacts and such; I don't forsee other > > usages. Sure, we could copy the code around, but I think that's worse > > than having it in xactdesc.c. Needs a comment explaining why it's there > > if I haven't added one already. > > FWIW, I think they would live better in frontend code for client applications. What do you mean with that? You mean you'd rather see a copy in pg_xlogdump somewhere? How would you trigger that being used instead of the normal description routine? > +/* free opcode 0x70 */ > + > +#define XLOG_XACT_OPMASK 0x70 > There is a contradiction here, 0x70 is not free. The above is a mask, not an opcode - so I don't think it's a contraction. I'll expand on the commentary on the next version. Thanks for having a look! Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Feb 27, 2015 at 6:49 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-02-27 16:26:08 +0900, Michael Paquier wrote: >> On Wed, Feb 25, 2015 at 8:10 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> > On 2015-02-24 20:51:42 +0200, Heikki Linnakangas wrote: >> >> On 02/20/2015 05:21 PM, Andres Freund wrote: >> >> >There's one bit that I'm not so sure about though: To avoid duplication >> >> >I've added Parse(Commit/Abort)Record(), but unfortunately that has to be >> >> >available both in front and backend code - so it's currently living in >> >> >xactdesc.c. I think we can live with that, but it's certainly not >> >> >pretty. >> >> >> >> Yeah, that's ugly. Why does frontend code need that? The old format >> >> isn't exactly trivial for frontend code to decode either. >> > >> > pg_xlogdump outputs subxacts and such; I don't forsee other >> > usages. Sure, we could copy the code around, but I think that's worse >> > than having it in xactdesc.c. Needs a comment explaining why it's there >> > if I haven't added one already. >> >> FWIW, I think they would live better in frontend code for client applications. > > What do you mean with that? You mean you'd rather see a copy in > pg_xlogdump somewhere? How would you trigger that being used instead of > the normal description routine? No, no. I meant that it is good the way your patch does it in xactdesc.c, where both frontend and backend can reach it. -- Michael
On Fri, Feb 27, 2015 at 7:10 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > No, no. I meant that it is good the way your patch does it in > xactdesc.c, where both frontend and backend can reach it. Agreed, that seems much better than duplicating it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015-02-25 12:10:42 +0100, Andres Freund wrote: > On 2015-02-24 20:51:42 +0200, Heikki Linnakangas wrote: > > Regarding XactEmitCommitRecord and XactEmitAbortRecord, I wonder if you > > could pass an xl_xact_parsed/abort_commit struct to them, instead of the > > individual fields? You could then also avoid the static variables inside it, > > passing pointers to that struct to XLogRegisterData() instead. > > Hm, that's an idea. And rename it to xaxt_commit/abort_data? So, I just tried this, and it doesn't really seem to come out as a net positive. More code at the callsites and more complex code in the *Emit* routines. It's impossible to use [FLEXIBLE_ARRAY_MEMBER] employing datatypes in xact_commit_data while emitting because there obviously are several chunks needing it. And avoid using it would make for a slightly less clear format. So unless you feel strongly about it, don't think so, I'll keep the statics, even if they're not particularly pretty. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 03/02/2015 06:51 PM, Andres Freund wrote: > On 2015-02-25 12:10:42 +0100, Andres Freund wrote: >> On 2015-02-24 20:51:42 +0200, Heikki Linnakangas wrote: >>> Regarding XactEmitCommitRecord and XactEmitAbortRecord, I wonder if you >>> could pass an xl_xact_parsed/abort_commit struct to them, instead of the >>> individual fields? You could then also avoid the static variables inside it, >>> passing pointers to that struct to XLogRegisterData() instead. >> >> Hm, that's an idea. And rename it to xaxt_commit/abort_data? > > So, I just tried this, and it doesn't really seem to come out as a net > positive. More code at the callsites and more complex code in the *Emit* > routines. It's impossible to use [FLEXIBLE_ARRAY_MEMBER] employing > datatypes in xact_commit_data while emitting because there obviously are > several chunks needing it. And avoid using it would make for a slightly > less clear format. > > So unless you feel strongly about it, don't think so, I'll keep the > statics, even if they're not particularly pretty. Come to think of it, it would be cleaner anyway to move the XLogBeginInsert() and XLogInsert() calls inside XactEmitCommitRecord. Then those structs don't need to be static either. - Heikki
On 2015-03-02 19:11:15 +0200, Heikki Linnakangas wrote: > Come to think of it, it would be cleaner anyway to move the > XLogBeginInsert() and XLogInsert() calls inside XactEmitCommitRecord. Then > those structs don't need to be static either. That was my first thought as well - but it doesn't easily work because of the way commit/abort records are embedded into twophase.c. I couldn't come with a simple way to change that, and anythign non simple imo defeats the purpose. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 03/02/2015 07:14 PM, Andres Freund wrote: > On 2015-03-02 19:11:15 +0200, Heikki Linnakangas wrote: >> Come to think of it, it would be cleaner anyway to move the >> XLogBeginInsert() and XLogInsert() calls inside XactEmitCommitRecord. Then >> those structs don't need to be static either. > > That was my first thought as well - but it doesn't easily work because > of the way commit/abort records are embedded into twophase.c. I > couldn't come with a simple way to change that, and anythign non simple > imo defeats the purpose. Pass the prepared XID as yet another argument to XactEmitCommitRecord, and have XactEmitCommitRecord emit the xl_xact_commit_prepared part of the record too. It might even make sense to handle the prepared XID like all the other optional fields and add an xinfo flag for it. - Heikki
On 2015-03-02 19:23:56 +0200, Heikki Linnakangas wrote: > On 03/02/2015 07:14 PM, Andres Freund wrote: > >On 2015-03-02 19:11:15 +0200, Heikki Linnakangas wrote: > >>Come to think of it, it would be cleaner anyway to move the > >>XLogBeginInsert() and XLogInsert() calls inside XactEmitCommitRecord. Then > >>those structs don't need to be static either. > > > >That was my first thought as well - but it doesn't easily work because > >of the way commit/abort records are embedded into twophase.c. I > >couldn't come with a simple way to change that, and anythign non simple > >imo defeats the purpose. > > Pass the prepared XID as yet another argument to XactEmitCommitRecord, and > have XactEmitCommitRecord emit the xl_xact_commit_prepared part of the > record too. It might even make sense to handle the prepared XID like all the > other optional fields and add an xinfo flag for it. That's what I mean with "non simple". Not a fan of teaching xact.c even more about twophase's dealings than it already knows. We'd have to either teach XactEmit* to insert different types of records pased a parameter or switch in xact_redo() based on the availability of the separate twophase xid via xinfo. Doesn't strike me as an improvement at all. I'd rather live with some statics in an isolated location - it's not as if we're ever going to allow two concurrent xlog inserts to happen at the same time. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2015-03-02 18:45:18 +0100, Andres Freund wrote: > On 2015-03-02 19:23:56 +0200, Heikki Linnakangas wrote: > > Pass the prepared XID as yet another argument to XactEmitCommitRecord, and > > have XactEmitCommitRecord emit the xl_xact_commit_prepared part of the > > record too. It might even make sense to handle the prepared XID like all the > > other optional fields and add an xinfo flag for it. > > That's what I mean with "non simple". Not a fan of teaching xact.c even > more about twophase's dealings than it already knows. I made an effort to show how horrible it would look like. Turns out it's not that bad ;). Far from pretty, especially in xlog.c, but acceptable. I think treating them more similar actually might open the road for reducing the difference further at some point. The attached patch does pretty much what you suggested above and tries to address all the point previously made. I plan to push this fairly soon; unless somebody has further input. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2015-03-15 16:30:16 +0100, Andres Freund wrote: > The attached patch does pretty much what you suggested above and tries > to address all the point previously made. I plan to push this fairly > soon; unless somebody has further input. Pushed, after fixing two typos in decode.c I introduced while "cleaning up". This might not yet be perfect, but there seems little point in waiting further. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services