Quick-and-dirty compression for WAL backup blocks - Mailing list pgsql-hackers

From Tom Lane
Subject Quick-and-dirty compression for WAL backup blocks
Date
Msg-id 23210.1117571180@sss.pgh.pa.us
Responses Re: Quick-and-dirty compression for WAL backup blocks  (Simon Riggs <simon@2ndquadrant.com>)
Re: Quick-and-dirty compression for WAL backup blocks  (Junji TERAMOTO <teramoto.junji@lab.ntt.co.jp>)
List pgsql-hackers
It seems we are more or less agreed that 32-bit CRC ought to be enough
for WAL; and we also need to make a change to ensure that backup blocks
are positively linked to their parent WAL record, as I noted earlier
today.  So as long as we have to mess with the WAL record format, I was
wondering what else we could get done in the same change.

The TODO item that comes to mind immediately is "Compress WAL entries".
The TODO.detail file for that has a whole lot of ideas of various
(mostly high) levels of complexity, but one thing we could do fairly
trivially is to try to compress the page images that are dumped into WAL
to protect against partial-write problems.  After reviewing the old
discussion I still like the proposal I made:

> ... make the WAL writing logic aware of the layout
> of buffer pages --- specifically, to know that our pages generally
> contain an uninteresting "hole" in the middle, and not write the hole.
> Optimistically this might reduce the WAL data volume by something
> approaching 50%; though pessimistically (if most pages are near full)
> it wouldn't help much.

A more concrete version of this is: examine the page to see if the
pd_lower field is between SizeOfPageHeaderData and BLCKSZ, and if so
whether there is a run of consecutive zero bytes beginning at the
pd_lower position.  Omit any such bytes from what is written to WAL.
(This definition ensures that nothing goes wrong if the page does not
follow the normal page layout conventions: the transformation is
lossless no matter what, since we can always reconstruct the exact page
contents.)  The overhead needed is only 2 bytes to show the number of
bytes removed.
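
The rule above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the actual WAL code: the names demo_pack_page, demo_unpack_page and demo_hole_length are hypothetical, the page is treated as a plain byte array, and the header size and the offset of pd_lower are stand-in constants rather than values taken from the real page header definition. The sketch writes a 2-byte count of omitted bytes followed by the page minus the zero run that starts at pd_lower; restoring reads pd_lower back from the saved header, which always precedes the hole.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values for illustration only; not taken from PostgreSQL headers. */
#define DEMO_BLCKSZ        8192   /* page size */
#define DEMO_HEADER_SIZE   24     /* stand-in for SizeOfPageHeaderData */
#define DEMO_PD_LOWER_OFF  12     /* hypothetical byte offset of pd_lower */

/* Read the 16-bit pd_lower field out of a raw page image. */
static uint16_t
demo_get_pd_lower(const unsigned char *page)
{
    uint16_t    v;

    memcpy(&v, page + DEMO_PD_LOWER_OFF, sizeof(v));
    return v;
}

/*
 * Length of the run of zero bytes starting at pd_lower, or 0 if pd_lower
 * is not a plausible offset.  A result of 0 means "write the whole page".
 */
static uint16_t
demo_hole_length(const unsigned char *page)
{
    uint16_t    lower = demo_get_pd_lower(page);
    size_t      n = 0;

    if (lower < DEMO_HEADER_SIZE || lower >= DEMO_BLCKSZ)
        return 0;
    while (lower + n < DEMO_BLCKSZ && page[lower + n] == 0)
        n++;
    return (uint16_t) n;
}

/*
 * Pack a page into out (which must hold DEMO_BLCKSZ + 2 bytes): a 2-byte
 * hole length, then the page with the hole left out.  Returns bytes written.
 */
static size_t
demo_pack_page(const unsigned char *page, unsigned char *out)
{
    uint16_t    hole = demo_hole_length(page);
    uint16_t    lower = demo_get_pd_lower(page);
    size_t      tail;

    memcpy(out, &hole, sizeof(hole));
    out += sizeof(hole);
    if (hole == 0)
    {
        memcpy(out, page, DEMO_BLCKSZ);
        return sizeof(hole) + DEMO_BLCKSZ;
    }
    tail = DEMO_BLCKSZ - lower - hole;
    memcpy(out, page, lower);                       /* bytes before the hole */
    memcpy(out + lower, page + lower + hole, tail); /* bytes after the hole */
    return sizeof(hole) + lower + tail;
}

/* Rebuild the exact original page from a packed image. */
static void
demo_unpack_page(const unsigned char *in, unsigned char *page)
{
    uint16_t    hole;
    uint16_t    lower;

    memcpy(&hole, in, sizeof(hole));
    in += sizeof(hole);
    if (hole == 0)
    {
        memcpy(page, in, DEMO_BLCKSZ);
        return;
    }
    /* pd_lower lies in the header, which is stored ahead of the hole. */
    memcpy(&lower, in + DEMO_PD_LOWER_OFF, sizeof(lower));
    memcpy(page, in, lower);
    memset(page + lower, 0, hole);
    memcpy(page + lower + hole, in + lower, DEMO_BLCKSZ - lower - hole);
}
```

Because the hole is defined purely by the bytes actually found on the page, a page that does not follow the usual layout either yields a shorter run or no run at all, and the round trip is still exact.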

The other alternatives that were suggested included running the page
contents through the same compressor used for TOAST, and implementing
a general-purpose run-length compressor that could get rid of runs of
zeroes anywhere on the page.  However, considering that the compression
work has to be done while holding WALInsertLock, it seems to me there
is a strong premium on speed.  I think that lets out the TOAST
compressor, which isn't amazingly speedy.  (Another objection to the
TOAST compressor is that it certainly won't win on already-compressed
toasted data.)  A run-length compressor would be reasonably quick but
I think that the omit-the-middle-hole approach gets most of the possible
win with even less work.  In particular, I think it can be proven that
omit-the-hole will actually require less CPU than now, since counting
zero bytes should be strictly faster than CRC'ing bytes, and we'll be
able to save the CRC work on whatever bytes we omit.

Any objections?
        regards, tom lane

