Re: XLog size reductions: smaller XLRec block header for PG17 - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: XLog size reductions: smaller XLRec block header for PG17 |
Date | |
Msg-id | CAEze2Wid5x5i8Z_bj6p+AZkoUU=6V8QbBmoTtKJeKM8sse2-PQ@mail.gmail.com |
In response to | Re: XLog size reductions: smaller XLRec block header for PG17 (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses | Re: XLog size reductions: smaller XLRec block header for PG17 |
List | pgsql-hackers |
On Thu, 18 May 2023 at 18:22, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 18/05/2023 17:59, Matthias van de Meent wrote:
> Perhaps we should introduce a few generic inline functions to do varint
> encoding. That could be useful in many places, while this scheme is very
> tailored for XLogRecordBlockHeader.

I'm not sure about the reusability of such code, as not all varint
encodings are made equal. Here, I chose to store the size of the field
in leftover bits of another field, so the value and its size are stored
separately. But in other cases, such as UTF8's code point encoding, each
byte has a carry bit indicating whether the value has more bytes to go.
In yet other cases, such as sqlite's integer encoding, the value is
stored in a single byte, unless that byte contains a sentinel value that
indicates the number of bytes that the value continues into.

What I'm trying to say is that there is no perfect encoding that is
better than all others, and I picked what I thought worked best in this
specific case. I think it is reasonable to expect that varint-encoding
of e.g. blockNo or RelFileNode into the WAL record could want to choose
a different method than the one I've chosen for the block data length.

> We could replace XLogRecordDataHeaderShort and XLogRecordDataHeaderLong
> with this too. With just one XLogRecordDataHeader, with a
> variable-length length field.

Yes, that could be used too. But that's not part of the patch right now,
and I have no plans yet to implement it for this patch.

> > [0] https://wiki.postgresql.org/wiki/Updating_the_WAL_infrastructure
>
> Good ideas here. Eliminating the two padding bytes from XLogRecord in
> particular seems like a pure win.

It requires code churn and probably increases complexity, but apart from
that I think 'pure win' is accurate, yes.

> > PS. Benchmark results on my system (5950x with other light tasks
> > running) don't show an obviously negative effect in a 10-minute run
> > with these arbitrary pgbench settings on a fresh cluster with default
> > configuration:
> >
> > ./pg_install/bin/pgbench postgres -j 2 -c 6 -T 600 -M prepared
> > [...]
> > master: tps = 375
> > patched: tps = 381
>
> That was probably not CPU limited, so that any overhead in generating
> the WAL would not show up. Try PGOPTIONS="-csynchronous_commit=off" and
> pgbench -N option. And make sure the scale is large enough that there is
> no lock contention. Also would be good to measure the overhead in
> replaying the WAL.

With assertions now disabled, and the following configuration:

synchronous_commit = off
fsync = off
full_page_writes = off
checkpoint_timeout = 1d
autovacuum = off

I get

master: tps = 3500.815859
patched: tps = 3535.188054

With autovacuum enabled it worked similarly well, within 1% of these
results.

> How much space saving does this yield?

No meaningful savings in the pgbench workload, mostly because MAXALIGN
rounding of xlog record lengths is currently not favorable in the
pgbench workload. But record sizes have dropped by 1 or 2 bytes in
several cases, as can be seen at the bottom of this mail.

Kind regards,

Matthias van de Meent
Neon, Inc.

The data: record type, then average record length (average aligned
length in parentheses) for both master and patched, and the average
per-record savings with this patch.

| record type   | master avg      | patched avg     | delta | delta   |
|               | (aligned avg)   | (aligned avg)   |       | aligned |
|---------------|-----------------|-----------------|-------|---------|
| BT/DEDUP      | 64.00 (64.00)   | 63.00 (64.00)   | -1    | 0       |
| BT/INS_LEAF   | 81.41 (81.41)   | 80.41 (81.41)   | -1    | 0       |
| CLOG/0PG      | 30.00 (32.00)   | 30.00 (32.00)   | 0     | 0       |
| HEAP/DEL      | 54.00 (56.00)   | 52.00 (56.00)   | -2    | 0       |
| HEAP/HOT_UPD  | 72.02 (72.19)   | 71.02 (72.19)   | 0     | 0       |
| HEAP/INS      | 79.00 (80.00)   | 78.00 (80.00)   | -1    | 0       |
| HEAP/INS+INIT | 79.00 (80.00)   | 78.00 (80.00)   | -1    | 0       |
| HEAP/LOCK     | 54.00 (56.00)   | 52.00 (56.00) * | -2    | 0       |
| HEAP2/MUL_INS | 85.00 (88.00)   | 84.00 (88.00) * | -1    | 0       |
| HEAP2/PRUNE   | 65.17 (68.19)   | 64.17 (68.19)   | -1    | 0       |
| STDBY/R_XACTS | 52.76 (56.00)   | 52.21 (56.00)   | -0.5  | 0       |
| TX/COMMIT     | 34.00 (40.00)   | 34.00 (40.00)   | 0     | 0       |
| XLOG/CHCKPT_O | 114.00 (120.00) | 114.00 (120.00) | 0     | 0       |
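
To illustrate why the aligned columns above do not move, here is a minimal sketch of the rounding involved. It assumes an 8-byte maximum alignment, as is typical on 64-bit platforms; sketch_maxalign, sketch_aligned_savings and SKETCH_MAXIMUM_ALIGNOF are hypothetical names made up for this example and are not PostgreSQL's own MAXALIGN macro.

```c
#include <stddef.h>

/* Assumption: 8-byte maximum alignment, as on most 64-bit platforms. */
#define SKETCH_MAXIMUM_ALIGNOF 8

/*
 * Round len up to the next multiple of SKETCH_MAXIMUM_ALIGNOF.
 * The mask trick relies on the alignment being a power of two.
 */
static size_t
sketch_maxalign(size_t len)
{
	const size_t mask = (size_t) SKETCH_MAXIMUM_ALIGNOF - 1;

	return (len + mask) & ~mask;
}

/*
 * Bytes saved once both record lengths have been aligned.
 * Expects master_len >= patched_len.
 */
static size_t
sketch_aligned_savings(size_t master_len, size_t patched_len)
{
	return sketch_maxalign(master_len) - sketch_maxalign(patched_len);
}
```

With the HEAP/DEL lengths from the table, sketch_aligned_savings(54, 52) is 0, because both lengths round up to 56. Under this assumption a saving only shows up when the unaligned length crosses an 8-byte boundary, for example going from 65 to 64 bytes.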