
From Michael Paquier
Subject Re: [REVIEW] Re: Compression of full-page-writes
Msg-id CAB7nPqRF-Tdr_LWHaOfc1MdMUpmU+1cLH6vGPKC1PDseSO8aZA@mail.gmail.com
In response to Re: [REVIEW] Re: Compression of full-page-writes  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [REVIEW] Re: Compression of full-page-writes  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers


On Tue, Dec 16, 2014 at 8:35 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Dec 16, 2014 at 3:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Dec 13, 2014 at 9:36 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> Something to be aware of btw is that this patch introduces an
>>> additional 8 bytes per block image in WAL as it contains additional
>>> information to control the compression. In this case this is the
>>> uint16 compress_len present in XLogRecordBlockImageHeader. In the case
>>> of the measurements done, knowing that 63638 FPWs have been written,
>>> there is a difference of a bit less than 500k in WAL between HEAD and
>>> "FPW off" in favor of HEAD. The gain with compression is welcome,
>>> still for the default there is a small price to track down if a block
>>> is compressed or not. This patch still takes advantage of it by not
>>> compressing the hole present in page and reducing CPU work a bit.
>>
>> That sounds like a pretty serious problem to me.
> OK. If that's such a problem, I'll switch back to the version using
> 1 bit in the block header to identify whether a block is compressed or
> not. This way, when the switch is off, the record length will be the
> same as in HEAD.
And here are attached fresh patches reducing the WAL record size to what it is in HEAD when the compression switch is off. Looking at the logic in xlogrecord.h, the block header stores the hole length and hole offset. I changed that a bit: the first uint16 now stores the length of the block data, either raw (with hole) or compressed. The second uint16 is used to store the hole offset, the same as in HEAD when the compression switch is off. When compression is on, a special value 0xFFFF is saved there instead (actually, only setting the 16th bit would be enough...). Note that this forces the hole to be filled with zeros and always compresses BLCKSZ worth of data.
Those patches pass make check-world, even WAL replay on standbys.
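
To make that change concrete, here is a minimal sketch of the block image header layout described above (a sketch only; the macro name and exact field names are illustrative, not taken from the patch):

#include <stdint.h>

typedef uint16_t uint16;	/* stand-in for PostgreSQL's uint16 */

/* illustrative name for the sentinel stored in hole_offset */
#define BKPIMAGE_COMPRESSED	0xFFFF

/*
 * First uint16: length of the block data written to WAL, raw or
 * compressed.  Second uint16: hole offset, as in HEAD, or 0xFFFF when
 * the image is compressed, in which case the hole was zero-filled and
 * BLCKSZ bytes were handed to the compressor.
 */
typedef struct XLogRecordBlockImageHeader
{
	uint16		length;			/* raw or compressed data length */
	uint16		hole_offset;	/* hole offset, or BKPIMAGE_COMPRESSED */
} XLogRecordBlockImageHeader;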

I have also done measurements using this patch set; the following things can be noticed:
- When the compression switch is off, the same quantity of WAL as HEAD is produced.
- pglz is very bad at compressing the page hole. I mean, really bad. Have a look at the user CPU, particularly when pages are mostly empty, and you'll understand... Other compression algorithms would do better here. Tests are done with various values of fillfactor: at 10, roughly 80% of the page is empty after the update; at 50, the page is more or less completely full.
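
To illustrate where that CPU goes, here is a rough sketch of how the block image is built in the two modes (under the assumptions above, not code from the patch; compress_page() is a hypothetical placeholder for pglz):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

/* hypothetical placeholder for pglz; returns the "compressed" length */
static int
compress_page(const char *src, int srclen, char *dest)
{
	memcpy(dest, src, srclen);		/* no real compression here */
	return srclen;
}

/*
 * Build the block image to write to WAL.  With compression off, the
 * hole is skipped as in HEAD.  With compression on, the hole is filled
 * with zeros and the full BLCKSZ is compressed, which is where pglz
 * burns user CPU when the page is mostly empty.
 */
static int
build_block_image(const char *page, uint16_t hole_offset,
				  uint16_t hole_length, bool compress, char *dest)
{
	if (!compress)
	{
		memcpy(dest, page, hole_offset);
		memcpy(dest + hole_offset,
			   page + hole_offset + hole_length,
			   BLCKSZ - (hole_offset + hole_length));
		return BLCKSZ - hole_length;
	}
	else
	{
		char		scratch[BLCKSZ];

		memcpy(scratch, page, BLCKSZ);
		memset(scratch + hole_offset, 0, hole_length);
		return compress_page(scratch, BLCKSZ, dest);
	}
}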

Here are the results, with 6 test cases:
- FPW on + 2 bytes: compression switch on, using 2 additional bytes in the block header, resulting in longer WAL records (8 more bytes per block) but lower CPU usage, as page holes are not compressed by pglz.
- FPW off + 2 bytes: same as previous, but with the compression switch set to off.
- FPW on + 0 bytes: compression switch on, using the same block header size as HEAD, at the cost of compressing page holes filled with zeros.
- FPW off + 0 bytes: same as previous, but with the compression switch set to off.
- HEAD: unpatched master (except for a hack to calculate user and system CPU).
- Record: record-level compression, with the compression lower bound set at 0.

=# select test || ', ffactor ' || ffactor, pg_size_pretty(post_update - pre_update), user_diff, system_diff from results;
           ?column?            | pg_size_pretty | user_diff | system_diff
-------------------------------+----------------+-----------+-------------
 FPW on + 2 bytes, ffactor 50  | 582 MB         | 42.391894 |    0.807444
 FPW on + 2 bytes, ffactor 20  | 229 MB         | 14.330304 |    0.729626
 FPW on + 2 bytes, ffactor 10  | 117 MB         |  7.335442 |    0.570996
 FPW off + 2 bytes, ffactor 50 | 746 MB         | 25.330391 |    1.248503
 FPW off + 2 bytes, ffactor 20 | 293 MB         | 10.537475 |    0.755448
 FPW off + 2 bytes, ffactor 10 | 148 MB         |  5.762775 |    0.763761
 FPW on + 0 bytes, ffactor 50  | 585 MB         | 54.115496 |    0.924891
 FPW on + 0 bytes, ffactor 20  | 234 MB         | 26.270404 |    0.755862
 FPW on + 0 bytes, ffactor 10  | 122 MB         | 19.540131 |    0.800981
 FPW off + 0 bytes, ffactor 50 | 746 MB         | 25.102241 |    1.110677
 FPW off + 0 bytes, ffactor 20 | 293 MB         |  9.889374 |    0.749884
 FPW off + 0 bytes, ffactor 10 | 148 MB         |  5.286767 |    0.682746
 HEAD, ffactor 50              | 746 MB         | 25.181729 |    1.133433
 HEAD, ffactor 20              | 293 MB         |  9.962242 |    0.765970
 HEAD, ffactor 10              | 148 MB         |  5.693426 |    0.775371
 Record, ffactor 50            | 582 MB         | 54.904374 |    0.678204
 Record, ffactor 20            | 229 MB         | 19.798268 |    0.807220
 Record, ffactor 10            | 116 MB         |  9.401877 |    0.668454
(18 rows)
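
For reference, the user_diff/system_diff columns come from the CPU-measurement hack mentioned above (the actual code is in the attachment); purely as an illustration, such a measurement could look roughly like this, snapshotting getrusage() around the workload:

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

static double
tv_seconds(struct timeval tv)
{
	return tv.tv_sec + tv.tv_usec / 1000000.0;
}

int
main(void)
{
	struct rusage before, after;

	getrusage(RUSAGE_SELF, &before);
	/* ... run the UPDATE workload here ... */
	getrusage(RUSAGE_SELF, &after);

	printf("user_diff=%f system_diff=%f\n",
		   tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime),
		   tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime));
	return 0;
}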

Also attached are the results of the measurements and the test case used.
Regards,
--
Michael
