Reducing size of WAL record headers - Mailing list pgsql-hackers

From Simon Riggs
Subject Reducing size of WAL record headers
Date
Msg-id CA+U5nMJKvGhBF0Zwvg0-fuLisXf+Okue7_9fxAShwmq2UBM0KA@mail.gmail.com
List pgsql-hackers
Overall, the WAL record is MAXALIGN'd, so with 8-byte alignment we
waste 4 bytes per record. Put another way, if we could reduce the
record header by 4 bytes, we would actually save 8 bytes per
record. So looking for ways to do that seems like a good idea.
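
To make the alignment arithmetic concrete, here is a minimal sketch. It is
not PostgreSQL source: ALIGNOF_MAX and max_align() are hypothetical
stand-ins for the real MAXALIGN machinery, and the 28-byte unpadded header
size is an assumption inferred from the 4 wasted bytes mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed maximum alignment of 8 bytes, as on typical 64-bit platforms. */
#define ALIGNOF_MAX ((size_t) 8)

/* Round len up to the next multiple of ALIGNOF_MAX (hypothetical helper). */
static size_t
max_align(size_t len)
{
    /* The mask trick below is only valid for power-of-two alignments. */
    assert((ALIGNOF_MAX & (ALIGNOF_MAX - 1)) == 0);
    return (len + ALIGNOF_MAX - 1) & ~(ALIGNOF_MAX - 1);
}

/* Bytes actually saved per record when the unpadded header shrinks. */
static size_t
aligned_saving(size_t old_len, size_t new_len)
{
    assert(new_len <= old_len);
    return max_align(old_len) - max_align(new_len);
}
```

Under these assumptions max_align(28) is 32 and max_align(24) is 24, so
aligned_saving(28, 24) is 8: trimming 4 bytes from the header saves 8 bytes
on disk.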

The WAL record header starts with xl_tot_len, a 4 byte field. There is
also another field, xl_len. The difference is that xl_tot_len includes
the header, xl_len and any backup blocks. Since the header is fixed,
the only time xl_tot_len != SizeOfXLogRecord + xl_len is when we have
backup blocks.

We can re-arrange the record layout so that we remove xl_tot_len and
add a new 4 byte field, xl_bkpblock_len, after the record header. It
would exist only when the record has backup blocks and would be
maxaligned, so it occupies 8 bytes. This saves 8 bytes on every record
that doesn't have backup blocks, and records with backup blocks stay
the same size as now.

The only problem is that we currently allow WAL records to be written
so that the header wraps across pages. This allows us to save space in
WAL when we have between 5 and 32 bytes spare at the end of a page. To
reduce the header size by 8 bytes we would need to ensure that the
whole header, which would now be 24 or 32 bytes, is all on one page.
My math tells me that would waste on average 12 bytes per page because
of the end-of-page wastage, but would gain 8 bytes per record when we
don't have backup blocks. My thinking is that the end of page loss
would be much reduced on average when we had backup blocks, so we
could ignore that case.
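
The no-wrap rule can be sketched as a small helper. This is an
illustration only: page_end_waste() is a hypothetical name, and it assumes
that when the header does not fit in the remaining space, that space is
simply skipped and the record starts on the next page.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes left unused at the end of a page when a header of hdr_len bytes
 * is not allowed to wrap. If the header fits, nothing is wasted; if it
 * does not, all of the remaining free space is skipped.
 */
static size_t
page_end_waste(size_t free_bytes, size_t hdr_len)
{
    assert(hdr_len > 0);
    return (free_bytes < hdr_len) ? free_bytes : 0;
}
```

With a 24-byte header, page_end_waste(16, 24) is 16, while
page_end_waste(24, 24) and page_end_waste(40, 24) are both 0.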

Assuming typically 100 records per page when we have no backup blocks,
this is a considerable upside. We would make gains on any page with 3
or more WAL records on it, so low downside even in worst cases. That
seems like a great break-even point for optimisation.
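
The trade-off can be written down as a one-line calculation. This is only
a sketch: net_saving_per_page() is a hypothetical helper, and the inputs
are the estimates from this message (8 bytes gained per record, about 12
bytes lost per page), not measurements.

```c
#include <assert.h>

/*
 * Net bytes saved on one page: the per-record header saving times the
 * number of records, minus the space lost at the end of the page.
 */
static long
net_saving_per_page(long records, long saving_per_record, long waste)
{
    assert(records >= 0 && saving_per_record >= 0 && waste >= 0);
    return records * saving_per_record - waste;
}
```

For 100 records per page, net_saving_per_page(100, 8, 12) gives 788 bytes;
for 3 records per page, net_saving_per_page(3, 8, 12) still gives 12.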

Since we've changed the WAL format already this release, another
change seems OK. More to the point, we can remove backup blocks in the
common case without changing WAL format, so this might be the last
time we have the chance to make this change.

Forcing the XLogRecord header to be all on one page makes the format
more robust and simplifies the code that copes with header wrapping.

The format changes would mean that it's still possible to work out the
length of the WAL record precisely
= SizeOfXLogRecord + (HasBkpBlocks ? SizeOf(uint32) : 0) + xl_len
and so the length would still be protected by the WAL record CRC.
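
As a sketch, the length formula could look like this in C. The names
SKETCH_SIZE_OF_XLOG_RECORD and sketch_record_len() are hypothetical, the
24-byte header size is the figure assumed earlier in this message, and the
function implements the expression exactly as written above, nothing more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed size of the reduced fixed header (24 bytes, per this proposal). */
#define SKETCH_SIZE_OF_XLOG_RECORD ((uint32_t) 24)

/*
 * Length as given by the formula above: fixed header, plus the optional
 * 4-byte backup-block length field, plus xl_len bytes of rmgr data.
 */
static uint32_t
sketch_record_len(uint32_t xl_len, bool has_bkp_blocks)
{
    uint32_t extra = has_bkp_blocks ? (uint32_t) sizeof(uint32_t) : 0;

    /* Guard against uint32 overflow in the sum. */
    assert(xl_len <= UINT32_MAX - SKETCH_SIZE_OF_XLOG_RECORD - extra);
    return SKETCH_SIZE_OF_XLOG_RECORD + extra + xl_len;
}
```

For example, sketch_record_len(100, false) is 124 and
sketch_record_len(100, true) is 128.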

Thoughts?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


