Re: [REVIEW] Re: Compression of full-page-writes - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [REVIEW] Re: Compression of full-page-writes
Date
Msg-id CAB7nPqQOOzd5FLVkg-SN1cFf5Pi2ky3LTQecoBtS2Ws+jq=A2Q@mail.gmail.com
In response to Re: [REVIEW] Re: Compression of full-page-writes  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [REVIEW] Re: Compression of full-page-writes  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers


On Fri, Dec 26, 2014 at 4:16 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Fri, Dec 26, 2014 at 3:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> pglz_compress() and pglz_decompress() still use PGLZ_Header, so the frontend
>> which uses those functions needs to handle PGLZ_Header. But it basically should
>> be handled via the varlena macros. That is, the frontend still seems to need to
>> understand the varlena datatype. I think we should avoid that. Thought?
> Hm, yes it may be wiser to remove it and make the data passed to pglz
> for varlena 8 bytes shorter..

OK, here is the result of this work, made of 3 patches.

The first two patches move the pglz stuff to src/common and make it a frontend-usable facility entirely independent of varlena and its related metadata.
- Patch 1 is a simple move of pglz to src/common, with PGLZ_Header still present. There is nothing amazing here; it is the broken version that was reverted in 966115c.
- The real work comes with patch 2, which removes PGLZ_Header and changes the pglz compression and decompression APIs so that they no longer carry TOAST metadata; that metadata is now localized in tuptoaster.c. Note that this patch preserves the on-disk format (tested with pg_upgrade from 9.4 to a patched HEAD server). With this patch, the compression and decompression APIs simply operate from a source to a destination:
extern int32 pglz_compress(const char *source, int32 slen, char *dest,
                          const PGLZ_Strategy *strategy);
extern int32 pglz_decompress(const char *source, char *dest,
                          int32 compressed_size, int32 raw_size);
The return value of those functions is the number of bytes written to the destination buffer, or 0 if the operation failed. This also aims to make the backend more pluggable. Patch 2 is kept separate from patch 1 (they could be merged) to facilitate review of the changes that make pglz an entirely independent facility.
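To illustrate the calling contract described above (bytes written on success, 0 on failure), here is a minimal sketch. It uses a trivial run-length stand-in rather than real pglz, drops the PGLZ_Strategy argument, and adds an explicit destination capacity for safety, so it only mirrors the shape of the API, not its implementation:

```c
#include <assert.h>
#include <string.h>

/*
 * Stand-in compressor mirroring the contract of the new pglz API:
 * compress slen bytes from source into dest, returning the number of
 * bytes written, or 0 on failure (here: dest too small).
 */
static int
toy_compress(const char *source, int slen, char *dest, int dcap)
{
    int     di = 0;
    int     i = 0;

    while (i < slen)
    {
        char    c = source[i];
        int     run = 1;

        while (i + run < slen && source[i + run] == c && run < 255)
            run++;
        if (di + 2 > dcap)
            return 0;           /* would overflow dest: report failure */
        dest[di++] = (char) run;
        dest[di++] = c;
        i += run;
    }
    return di;                  /* bytes written to dest */
}

/*
 * Matching decompressor: the caller supplies both the compressed size
 * and the expected raw size, as with the new pglz_decompress signature.
 */
static int
toy_decompress(const char *source, char *dest,
               int compressed_size, int raw_size)
{
    int     si = 0;
    int     di = 0;

    while (si + 2 <= compressed_size)
    {
        int     run = (unsigned char) source[si++];
        char    c = source[si++];

        if (di + run > raw_size)
            return 0;           /* corrupt input: report failure */
        memset(dest + di, c, run);
        di += run;
    }
    return di;                  /* bytes written, equals raw_size on success */
}
```

The point is only that a caller checks the returned byte count (treating 0 as failure) instead of peeking at any varlena or PGLZ_Header metadata.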

Patch 3 is the FPW compression itself, adapted to those changes. Note that as PGLZ_Header contained the raw size of the compressed data and no longer exists, it is necessary to store the raw length of the block image directly in the block image header, using 2 additional bytes. Those 2 bytes are used only when wal_compression is set to true, thanks to a boolean flag, so with wal_compression disabled the WAL record length is exactly the same as on HEAD and there is no penalty in the default case. As in the previous patches, the block image is compressed without its hole.
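The two mechanics involved here (removing the page hole before compression, and keeping the raw length in 2 header bytes so decompression knows the target size) can be sketched as below. The function and buffer names are hypothetical, not the patch's actual structs:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/*
 * Copy a page into dest with its hole removed: the data before the
 * hole and the data after it end up back to back.  The return value
 * is the raw length that would be fed to the compressor.
 */
static int
remove_hole(const char *page, char *dest,
            uint16_t hole_offset, uint16_t hole_length)
{
    memcpy(dest, page, hole_offset);
    memcpy(dest + hole_offset,
           page + hole_offset + hole_length,
           BLCKSZ - (hole_offset + hole_length));
    return BLCKSZ - hole_length;
}

/*
 * Store/load the 2-byte raw length in the block image header.  A raw
 * length is at most BLCKSZ, so it fits in a uint16_t; memcpy is used
 * to sidestep alignment concerns.
 */
static void
set_raw_length(char *hdr, uint16_t raw_len)
{
    memcpy(hdr, &raw_len, sizeof(uint16_t));
}

static uint16_t
get_raw_length(const char *hdr)
{
    uint16_t    raw_len;

    memcpy(&raw_len, hdr, sizeof(uint16_t));
    return raw_len;
}
```

At replay, the stored raw length gives pglz_decompress its raw_size argument, and the hole is reinserted (zero-filled) to rebuild the full page.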

To finish, here are some results using the same test as in the message below, with the getrusage hack to get the user and system CPU difference for a single backend execution:
http://www.postgresql.org/message-id/CAB7nPqSc97o-UE5paxfMUKWcxE_JioyxO1M4A0pMnmYqAnec2g@mail.gmail.com
Just as a reminder, this test generates a fixed number of FPWs on a single backend, with fsync and autovacuum disabled, using several values of fillfactor to see the effect of page holes.

  test   | ffactor | user_diff | system_diff | pg_size_pretty
---------+---------+-----------+-------------+----------------
 FPW on  |      50 | 48.823907 |    0.737649 | 582 MB
 FPW on  |      20 | 16.135000 |    0.764682 | 229 MB
 FPW on  |      10 |  8.521099 |    0.751947 | 116 MB
 FPW off |      50 | 29.722793 |    1.045577 | 746 MB
 FPW off |      20 | 12.673375 |    0.905422 | 293 MB
 FPW off |      10 |  6.723120 |    0.779936 | 148 MB
 HEAD    |      50 | 30.763136 |    1.129822 | 746 MB
 HEAD    |      20 | 13.340823 |    0.893365 | 293 MB
 HEAD    |      10 |  7.267311 |    0.909057 | 148 MB
(9 rows)

Results are similar to what was measured previously (still, it doesn't hurt to check again): roughly, the CPU cost is balanced by the reduction in WAL volume. There is 0 bytes of difference in terms of WAL record length between HEAD and this patch when wal_compression = off.

Patches, as well as the test script and the results are attached.
Regards,
--
Michael