Re: Replication identifiers, take 4 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Replication identifiers, take 4 |
Date | |
Msg-id | 20150417085451.GJ2361@alap3.anarazel.de |
In response to | Re: Replication identifiers, take 4 (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses | fix xlogdump percentage display (was Re: Replication identifiers, take 4); Re: Replication identifiers, take 4; Re: Replication identifiers, take 4 |
List | pgsql-hackers |
On 2015-04-12 22:02:38 +0300, Heikki Linnakangas wrote:
> This needs to be weighed against removing the padding bytes
> altogether.

Hrmpf. Says the person who used a lot of padding, without much discussion, for the WAL-level infrastructure that makes pg_rewind more maintainable. And you deemed it perfectly OK to use those bytes up to avoid *increasing* the WAL size with the *additional data* (which so far nothing but pg_rewind needs in that way), even though they could perfectly well have been used to shrink the WAL below its current size. And those changes are *far*, *far* harder to back out or refactor than this one (which is pretty localized and thus easily changed); to the point that I think it's infeasible to do so...

If you want to shrink the WAL size, send in a patch independently. Not as a way to block somebody else implementing something.

> I'm surprised there's such a big difference between the "extern" and
> "padding" versions above. At a quick approximation, storing the ID as a
> separate "fragment", along with XLogRecordDataHeaderShort and
> XLogRecordDataHeaderLong, should add one byte of overhead plus the ID
> itself. So that would be 3 extra bytes for 2-byte identifiers, or 5 bytes
> for 4-byte identifiers. Does that mean that the average record length is
> only about 30 bytes?

Yes, nearly.
That's xlogdump --stats=record from the above scenario, with replication identifiers used and reusing the padding:

    Type                         N       (%)   Record size       (%)   FPI size       (%)   Combined size       (%)
    ----                         -       ---   -----------       ---   --------       ---   -------------       ---
    Transaction/COMMIT       50003  ( 16.89)       2600496  ( 23.38)          0  (  -nan)         2600496  ( 23.38)
    CLOG/ZEROPAGE                1  (  0.00)            28  (  0.00)          0  (  -nan)              28  (  0.00)
    Standby/RUNNING_XACTS        5  (  0.00)           248  (  0.00)          0  (  -nan)             248  (  0.00)
    Heap2/CLEAN              46034  ( 15.55)       1473088  ( 13.24)          0  (  -nan)         1473088  ( 13.24)
    Heap2/VISIBLE                2  (  0.00)            56  (  0.00)          0  (  -nan)              56  (  0.00)
    Heap/INSERT              49682  ( 16.78)       1341414  ( 12.06)          0  (  -nan)         1341414  ( 12.06)
    Heap/HOT_UPDATE         150013  ( 50.67)       5700494  ( 51.24)          0  (  -nan)         5700494  ( 51.24)
    Heap/INPLACE                 5  (  0.00)           130  (  0.00)          0  (  -nan)             130  (  0.00)
    Heap/INSERT+INIT           318  (  0.11)          8586  (  0.08)          0  (  -nan)            8586  (  0.08)
    Btree/VACUUM                 2  (  0.00)            56  (  0.00)          0  (  -nan)              56  (  0.00)
                           -------                --------             --------                  --------
    Total                   296065                11124596 [100.00%]          0  [0.00%]         11124596 [100%]

(The FPI percentage display above is arguably borked. Interesting.)

So the average record size is ~37.5 bytes (11124596 / 296065 ≈ 37.57), including the increased commit record size due to the origin information (which is the part that grows in the version that reuses the padding). This *most definitely* isn't representative of every workload. But it *is* *a* common type of workload.

Note that --stats will *not* show the size difference in xlog records when adding data as an additional chunk vs. padding, as it uses XLogRecGetDataLen() to compute the record length... That confused me for a while.

> That doesn't sound right, 30 bytes is very little.

Well, it's mostly HOT_UPDATEs and INSERTs into non-indexed tables, so that's not too surprising. Obviously that'd look different with FPIs enabled.

> Perhaps the size
> of the records created by pgbench happens to cross an 8-byte alignment
> boundary at that point, making a big difference. In another workload,
> there might be no difference at all, due to alignment.

Right.
> Also, you don't need to tag every record type with the replication ID. All
> indexam records can skip it, for starters, since logical decoding doesn't
> care about them. That should remove a fair amount of bloat.

Yes. I mentioned that. It's additional complexity because now the decision has to be made at each xlog insertion callsite, which makes refactoring this into a different representation a bit harder. I don't think it will make that much of a difference in the above workload (just CLEAN will be smaller); but it clearly might in others.

I've attached a rebased patch that adds the decision about origin logging to the relevant XLogInsert() callsites for "external" 2-byte identifiers, and removes the padding-reusing version in the interest of moving forward.

I still don't see a point in using 4-byte identifiers atm; given the above numbers, that just seems like a waste for unrealistic use cases (>2^16 nodes). It's just two lines to change if we feel the need in the future.

Working on fixing the issue with WAL logging of deletions and rearranging docs as Petr suggested. Not sure if the latter will really look good, but I guess we'll see ;)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services