Re: [REVIEW] Re: Compression of full-page-writes - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [REVIEW] Re: Compression of full-page-writes
Date
Msg-id 20150216115536.GG20205@awork2.anarazel.de
Whole thread Raw
In response to Re: [REVIEW] Re: Compression of full-page-writes  ("Syed, Rahila" <Rahila.Syed@nttdata.com>)
Responses Re: [REVIEW] Re: Compression of full-page-writes  ("Syed, Rahila" <Rahila.Syed@nttdata.com>)
Re: [REVIEW] Re: Compression of full-page-writes  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On 2015-02-16 11:30:20 +0000, Syed, Rahila wrote:
> - * As a trivial form of data compression, the XLOG code is aware that
> - * PG data pages usually contain an unused "hole" in the middle, which
> - * contains only zero bytes.  If hole_length > 0 then we have removed
> - * such a "hole" from the stored data (and it's not counted in the
> - * XLOG record's CRC, either).  Hence, the amount of block data actually
> - * present is BLCKSZ - hole_length bytes.
> + * Block images are able to do several types of compression:
> + * - When wal_compression is off, as a trivial form of compression, the
> + * XLOG code is aware that PG data pages usually contain an unused "hole"
> + * in the middle, which contains only zero bytes.  If length < BLCKSZ
> + * then we have removed such a "hole" from the stored data (and it is
> + * not counted in the XLOG record's CRC, either).  Hence, the amount
> + * of block data actually present is "length" bytes.  The hole "offset"
> + * on page is defined using "hole_offset".
> + * - When wal_compression is on, block images are compressed using a
> + * compression algorithm without their hole to improve compression
> + * process of the page. "length" corresponds in this case to the length
> + * of the compressed block. "hole_offset" is the hole offset of the page,
> + * and the length of the uncompressed block is defined by "raw_length",
> + * whose data is included in the record only when compression is enabled
> + * and "with_hole" is set to true, see below.
> + *
> + * "is_compressed" is used to identify if a given block image is compressed
> + * or not. Maximum page size allowed on the system being 32k, the hole
> + * offset cannot be more than 15-bit long so the last free bit is used to
> + * store the compression state of block image. If the maximum page size
> + * allowed is increased to a value higher than that, we should consider
> + * increasing this structure size as well, but this would increase the
> + * length of block header in WAL records with alignment.
> + *
> + * "with_hole" is used to identify the presence of a hole in a block image.
> + * As the length of a block cannot be more than 15-bit long, the extra bit in
> + * the length field is used for this identification purpose. If the block image
> + * has no hole, it is ensured that the raw size of a compressed block image is
> + * equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo
> + * are not necessary.
>   */
>  typedef struct XLogRecordBlockImageHeader
>  {
> -    uint16        hole_offset;    /* number of bytes before "hole" */
> -    uint16        hole_length;    /* number of bytes in "hole" */
> +    uint16    length:15,            /* length of block data in record */
> +            with_hole:1;        /* status of hole in the block */
> +
> +    uint16    hole_offset:15,    /* number of bytes before "hole" */
> +        is_compressed:1;    /* compression status of image */
> +
> +    /* Followed by the data related to compression if block is compressed */
>  } XLogRecordBlockImageHeader;

Yikes, this is ugly.

I think we should change the xlog format so that the block_id (which
currently is XLR_BLOCK_ID_DATA_SHORT/LONG or a actual block id) isn't
the block id but something like XLR_CHUNK_ID. Which is used as is for
XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to to
XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED,
XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the
block id following the chunk id.

Yes, that'll increase the amount of data for a backup block by 1 byte,
but I think that's worth it. I'm pretty sure we will be happy about the
added extensibility pretty soon.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [REVIEW] Re: Compression of full-page-writes
Next
From: Andres Freund
Date:
Subject: Re: [REVIEW] Re: Compression of full-page-writes