Re: XLog size reductions: smaller XLRec block header for PG17 - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: XLog size reductions: smaller XLRec block header for PG17
Date
Msg-id CAEze2Wid5x5i8Z_bj6p+AZkoUU=6V8QbBmoTtKJeKM8sse2-PQ@mail.gmail.com
In response to Re: XLog size reductions: smaller XLRec block header for PG17  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: XLog size reductions: smaller XLRec block header for PG17
List pgsql-hackers
On Thu, 18 May 2023 at 18:22, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 18/05/2023 17:59, Matthias van de Meent wrote:
> Perhaps we should introduce a few generic inline functions to do varint
> encoding. That could be useful in many places, while this scheme is very
> tailored for XLogRecordBlockHeader.

I'm not sure about the reusability of such code, as not all varint
encodings are created equal:

Here, I chose to store the size of the field in leftover bits of
another field, so the field and its size are stored separately.
But in other cases, such as UTF8's code point encoding, the leading
bits of each byte mark whether it starts or continues a multi-byte value.
In yet other cases, such as sqlite's integer encoding, the value
is stored in a single byte, unless that byte contains a sentinel value
that indicates the number of bytes that the value continues into.

What I'm trying to say is that there is no perfect encoding that is
better than all others, and I picked what I thought worked best in
this specific case. I think it is reasonable to expect that
varint-encoding of e.g. blockNo or RelFileNode into the WAL record
could want to choose a different method than the method I've chosen
for the block data length.
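
To illustrate the difference, here is a minimal sketch of two of these
styles: a per-byte continuation bit (LEB128-style, used here as a stand-in
for the per-byte schemes) and a size class stored in spare bits of a
separate flags byte. This is not code from the patch. All function names,
the 2-bit size-class layout, and the little-endian payload order are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Style A: each byte holds 7 value bits plus a "more bytes follow" bit. */
static size_t
encode_continuation(uint32_t value, uint8_t *out)
{
    size_t      n = 0;

    do
    {
        uint8_t     byte = value & 0x7F;

        value >>= 7;
        if (value != 0)
            byte |= 0x80;       /* continuation bit: more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;                   /* 1..5 bytes for a 32-bit value */
}

static size_t
decode_continuation(const uint8_t *in, uint32_t *value)
{
    uint32_t    result = 0;
    size_t      n = 0;
    int         shift = 0;
    uint8_t     byte;

    do
    {
        byte = in[n++];
        result |= (uint32_t) (byte & 0x7F) << shift;
        shift += 7;
    } while ((byte & 0x80) != 0 && n < 5);
    *value = result;
    return n;
}

/*
 * Style B: the payload width is recorded in two spare bits of a flags
 * byte that lives elsewhere (hypothetical layout: 0, 1, 2 or 4 bytes).
 */
#define SIZE_CLASS_MASK 0x03

static size_t
encode_sized(uint32_t value, uint8_t *flags, uint8_t *out)
{
    static const size_t widths[4] = {0, 1, 2, 4};
    uint8_t     cls;

    if (value == 0)
        cls = 0;
    else if (value <= UINT8_MAX)
        cls = 1;
    else if (value <= UINT16_MAX)
        cls = 2;
    else
        cls = 3;

    assert((*flags & SIZE_CLASS_MASK) == 0);    /* spare bits must be unused */
    *flags |= cls;
    for (size_t i = 0; i < widths[cls]; i++)
        out[i] = (uint8_t) (value >> (8 * i));  /* little-endian payload */
    return widths[cls];
}

static size_t
decode_sized(uint8_t flags, const uint8_t *in, uint32_t *value)
{
    static const size_t widths[4] = {0, 1, 2, 4};
    size_t      n = widths[flags & SIZE_CLASS_MASK];
    uint32_t    result = 0;

    for (size_t i = 0; i < n; i++)
        result |= (uint32_t) in[i] << (8 * i);
    *value = result;
    return n;
}
```

In this sketch both styles need 2 payload bytes for the value 300, but
style B stores a zero length in no payload bytes at all, at the cost of
needing spare bits in a neighbouring field.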

> We could replace XLogRecordDataHeaderShort and XLogRecordDataHeaderLong
> with this too. With just one XLogRecordDataHeader, with a
> variable-length length field.

Yes, that could be used too. But that's not part of the patch right
now, and I've not yet planned on implementing that for this patch.

> > [0] https://wiki.postgresql.org/wiki/Updating_the_WAL_infrastructure
>
> Good ideas here. Eliminating the two padding bytes from XLogRecord in
> particular seems like a pure win.

It requires code churn and probably increases complexity, but apart
from that I think 'pure win' is accurate, yes.

> > PS. Benchmark results on my system (5950x with other light tasks
> > running) don't show an obviously negative effect in a 10-minute run
> > with these arbitrary pgbench settings on a fresh cluster with default
> > configuration:
> >
> > ./pg_install/bin/pgbench postgres -j 2 -c 6 -T 600 -M prepared
> > [...]
> > master: tps = 375
> > patched: tps = 381
>
> That was probably not CPU limited, so that any overhead in generating
> the WAL would not show up. Try PGOPTIONS="-csynchronous_commit=off" and
> pgbench -N option. And make sure the scale is large enough that there is
> no lock contention. Also would be good to measure the overhead in
> replaying the WAL.

With assertions now disabled, and with the following configuration:

synchronous_commit = off
fsync = off
full_page_writes = off
checkpoint_timeout = 1d
autovacuum = off

I get:
master: tps = 3500.815859
patched: tps = 3535.188054

With autovacuum enabled, it worked similarly well: within 1% of these results.

> How much space saving does this yield?

No meaningful savings in the pgbench workload, mostly because xlog
record lengths are MAXALIGNed and the alignment boundaries are not
favorable for the record sizes in this workload. But record sizes have
dropped by 1 or 2 bytes in several cases, as can be seen at the bottom
of this mail.
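
As a worked example of why a 1 or 2 byte reduction can vanish after
alignment, the sketch below mirrors the round-up that MAXALIGN performs,
assuming an 8-byte maximum alignment (typical on 64-bit builds, but an
assumption here). The helper names are hypothetical; the 54 and 52 byte
figures are the HEAP/DEL averages from the table at the bottom of this mail.

```c
#include <stddef.h>

/* Assumption: 8-byte maximum alignment, as on common 64-bit platforms. */
#define SKETCH_MAX_ALIGN 8

/* Round len up to the next multiple of SKETCH_MAX_ALIGN. */
static inline size_t
sketch_maxalign(size_t len)
{
    return (len + (SKETCH_MAX_ALIGN - 1)) & ~((size_t) (SKETCH_MAX_ALIGN - 1));
}

/* Change in aligned size when a record shrinks from 'before' to 'after'. */
static inline long
sketch_aligned_delta(size_t before, size_t after)
{
    return (long) sketch_maxalign(after) - (long) sketch_maxalign(before);
}
```

Under that assumption, a record going from 54 to 52 bytes still occupies
56 aligned bytes, so the saving only becomes visible when a record crosses
an 8-byte boundary (for example 65 to 64 bytes).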

Kind regards,

Matthias van de Meent
Neon, Inc.

The data: Record type, then record length averages (average aligned
length between parens) for both master and patched, and the average
per-record savings with this patch.

| record type   | master avg      | patched avg     | delta | delta   |
|               | (aligned avg)   | (aligned avg)   |       | aligned |
|---------------|-----------------|-----------------|-------|---------|
| BT/DEDUP      | 64.00 (64.00)   | 63.00 (64.00)   |    -1 |       0 |
| BT/INS_LEAF   | 81.41 (81.41)   | 80.41 (81.41)   |    -1 |       0 |
| CLOG/0PG      | 30.00 (32.00)   | 30.00 (32.00)   |     0 |       0 |
| HEAP/DEL      | 54.00 (56.00)   | 52.00 (56.00)   |    -2 |       0 |
| HEAP/HOT_UPD  | 72.02 (72.19)   | 71.02 (72.19)   |    -1 |       0 |
| HEAP/INS      | 79.00 (80.00)   | 78.00 (80.00)   |    -1 |       0 |
| HEAP/INS+INIT | 79.00 (80.00)   | 78.00 (80.00)   |    -1 |       0 |
| HEAP/LOCK     | 54.00 (56.00)   | 52.00 (56.00) * |    -2 |       0 |
| HEAP2/MUL_INS | 85.00 (88.00)   | 84.00 (88.00) * |    -1 |       0 |
| HEAP2/PRUNE   | 65.17 (68.19)   | 64.17 (68.19)   |    -1 |       0 |
| STDBY/R_XACTS | 52.76 (56.00)   | 52.21 (56.00)   |  -0.5 |       0 |
| TX/COMMIT     | 34.00 (40.00)   | 34.00 (40.00)   |     0 |       0 |
| XLOG/CHCKPT_O | 114.00 (120.00) | 114.00 (120.00) |     0 |       0 |


