Re: [REVIEW] Re: Compression of full-page-writes - Mailing list pgsql-hackers

From	Rahila Syed
Subject	Re: [REVIEW] Re: Compression of full-page-writes
Date	February 23, 2015 08:28:14
Msg-id	CAH2L28shN4m65HYR9Khz=cGj0OC1O95gqRU_bi3WxxHJMHM6bA@mail.gmail.com
In response to	Re: [REVIEW] Re: Compression of full-page-writes ("Syed, Rahila" <Rahila.Syed@nttdata.com>)
Responses	Re: [REVIEW] Re: Compression of full-page-writes
List	pgsql-hackers

Hello,

Attached is a patch with the following changes.

As suggested upthread, the block ID in the xlog structs has been replaced by a chunk ID.

The chunk ID distinguishes between the different types of xlog record fragments, for example:
XLR_CHUNK_ID_DATA_SHORT
XLR_CHUNK_ID_DATA_LONG
XLR_CHUNK_BKP_COMPRESSED
XLR_CHUNK_BKP_WITH_HOLE

In block references, the block ID follows the chunk ID and retains its existing function.

This approach adds 1 byte to each block reference in an xlog record. It separates the ID that identifies the type of an xlog record fragment from the actual block ID, which identifies a block reference within the record.
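A minimal sketch of the two-byte prefix described above, not taken from the attached patch. The data chunk values mirror XLR_BLOCK_ID_DATA_SHORT/LONG in PostgreSQL's xlogrecord.h, the BKP values are the ones proposed downthread, and BlockRefPrefix, chunk_is_main_data and read_block_ref_prefix are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Data chunk IDs: assumed equal to XLR_BLOCK_ID_DATA_SHORT/LONG. */
#define XLR_CHUNK_ID_DATA_SHORT   255
#define XLR_CHUNK_ID_DATA_LONG    254
/* Backup block chunk IDs: values as proposed in the thread. */
#define XLR_CHUNK_BKP_COMPRESSED  0x01
#define XLR_CHUNK_BKP_WITH_HOLE   0x02

/* Hypothetical two-byte prefix of a block reference. */
typedef struct BlockRefPrefix
{
    uint8_t chunk_id;   /* type of the xlog record fragment */
    uint8_t block_id;   /* which block reference this is */
} BlockRefPrefix;

/* True if the chunk ID marks the main data fragment of the record. */
static int
chunk_is_main_data(uint8_t chunk_id)
{
    return chunk_id == XLR_CHUNK_ID_DATA_SHORT ||
           chunk_id == XLR_CHUNK_ID_DATA_LONG;
}

/* Read the chunk ID and the block ID that follows it. */
static BlockRefPrefix
read_block_ref_prefix(const uint8_t *p)
{
    BlockRefPrefix prefix;

    prefix.chunk_id = p[0];
    prefix.block_id = p[1];
    return prefix;
}
```

The extra byte per block reference is the chunk_id member; block_id keeps the role it had before.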

The WAL volumes for each scenario are as follows:

Scenario               WAL
FPW compression on     121.652 MB
FPW compression off    148.998 MB
HEAD                   148.764 MB

Compression remains nearly the same as before. There is a small difference in WAL volume between HEAD and HEAD + patch with compression off (148.764 MB vs. 148.998 MB). This difference corresponds to the 1-byte increase for each block reference in an xlog record.
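As a rough check of these numbers, the arithmetic can be sketched as below. The function names are hypothetical, and the last function assumes that MB means 2^20 bytes and that the whole difference between HEAD and the patch with compression off comes from the extra chunk ID byte; neither is stated in the measurements.

```c
/* WAL volumes in MB, copied from the measurements above. */
static const double wal_fpw_on_mb  = 121.652;
static const double wal_fpw_off_mb = 148.998;
static const double wal_head_mb    = 148.764;

/* Percentage of WAL saved by compression, relative to patch + compression off. */
static double
compression_saving_pct(void)
{
    return 100.0 * (wal_fpw_off_mb - wal_fpw_on_mb) / wal_fpw_off_mb;
}

/* Extra WAL in MB produced by the patch with compression off, relative to HEAD. */
static double
chunk_id_overhead_mb(void)
{
    return wal_fpw_off_mb - wal_head_mb;
}

/*
 * Number of block references implied by the overhead, assuming 1 extra
 * byte per block reference and 1 MB = 1024 * 1024 bytes (assumptions).
 */
static double
implied_block_refs(void)
{
    return chunk_id_overhead_mb() * 1024.0 * 1024.0;
}
```

With these inputs the saving is about 18.4% and the overhead is 0.234 MB, which under the stated assumptions would correspond to roughly 245,000 block references.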

Thank you,

Rahila Syed

On Wed, Feb 18, 2015 at 7:53 PM, Syed, Rahila <Rahila.Syed@nttdata.com> wrote:

Hello,

>I think we should change the xlog format so that the block_id (which currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't the block id but something like XLR_CHUNK_ID. Which is used as is for XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED, XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the block id following the chunk id.

>Yes, that'll increase the amount of data for a backup block by 1 byte, but I think that's worth it. I'm pretty sure we will be happy about the added extensibility pretty soon.

To clarify my understanding of the above change:

Instead of a block id to reference the different fragments of an xlog record, a single-byte field "chunk_id" should be used. chunk_id will be the same as XLR_BLOCK_ID_DATA_SHORT/LONG for main data fragments.
But for block references, it will take the following values in order to store information about the backup blocks:
#define XLR_CHUNK_BKP_COMPRESSED 0x01
#define XLR_CHUNK_BKP_WITH_HOLE 0x02
...

The new xlog format should look as follows:

Fixed-size header (XLogRecord struct)
Chunk_id (add a field before the id field in the XLogRecordBlockHeader struct)
XLogRecordBlockHeader
Chunk_id
XLogRecordBlockHeader
...
...
Chunk_id (rename the id field of the XLogRecordDataHeader struct)
XLogRecordDataHeader[Short|Long]
block data
block data
...
main data
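The header ordering above can be sketched as a walk over the header area of a record. This is a simplified illustration, not the patch's decoder: count_block_refs and SKETCH_BLOCK_HDR_REST are hypothetical, and each block header is reduced to a fixed 4 bytes after the chunk ID (block id, fork_flags, 2-byte data_length), whereas the real header carries more fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed equal to XLR_BLOCK_ID_DATA_SHORT/LONG. */
#define SKETCH_CHUNK_ID_DATA_SHORT 255
#define SKETCH_CHUNK_ID_DATA_LONG  254

/* Simplified bytes following chunk_id in a block header (assumption). */
#define SKETCH_BLOCK_HDR_REST 4

/*
 * Walk the headers that follow the fixed-size XLogRecord header.
 * Returns the number of block references seen before the main data
 * header (or before the end of the buffer), or -1 if a block header
 * is truncated.
 */
static int
count_block_refs(const uint8_t *buf, size_t len)
{
    size_t pos = 0;
    int    nblocks = 0;

    while (pos < len)
    {
        uint8_t chunk_id = buf[pos++];

        /* The main data header comes last in the header list. */
        if (chunk_id == SKETCH_CHUNK_ID_DATA_SHORT ||
            chunk_id == SKETCH_CHUNK_ID_DATA_LONG)
            return nblocks;

        /* Otherwise this is a block reference: skip the rest of its header. */
        if (len - pos < SKETCH_BLOCK_HDR_REST)
            return -1;
        pos += SKETCH_BLOCK_HDR_REST;
        nblocks++;
    }
    return nblocks;
}
```

Because every header starts with a chunk ID, the reader can tell the fragment type from the first byte and only then interpret the block id that follows.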

I will post a patch based on this.

Thank you,
Rahila Syed

-----Original Message-----
From: Andres Freund [mailto:andres@2ndquadrant.com]
Sent: Monday, February 16, 2015 5:26 PM
To: Syed, Rahila
Cc: Michael Paquier; Fujii Masao; PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On 2015-02-16 11:30:20 +0000, Syed, Rahila wrote:
> - * As a trivial form of data compression, the XLOG code is aware that
> - * PG data pages usually contain an unused "hole" in the middle, which
> - * contains only zero bytes. If hole_length > 0 then we have removed
> - * such a "hole" from the stored data (and it's not counted in the
> - * XLOG record's CRC, either). Hence, the amount of block data actually
> - * present is BLCKSZ - hole_length bytes.
> + * Block images are able to do several types of compression:
> + * - When wal_compression is off, as a trivial form of compression, the
> + * XLOG code is aware that PG data pages usually contain an unused "hole"
> + * in the middle, which contains only zero bytes. If length < BLCKSZ
> + * then we have removed such a "hole" from the stored data (and it is
> + * not counted in the XLOG record's CRC, either). Hence, the amount
> + * of block data actually present is "length" bytes. The hole "offset"
> + * on page is defined using "hole_offset".
> + * - When wal_compression is on, block images are compressed using a
> + * compression algorithm without their hole to improve compression
> + * process of the page. "length" corresponds in this case to the length
> + * of the compressed block. "hole_offset" is the hole offset of the page,
> + * and the length of the uncompressed block is defined by "raw_length",
> + * whose data is included in the record only when compression is enabled
> + * and "with_hole" is set to true, see below.
> + *
> + * "is_compressed" is used to identify if a given block image is compressed
> + * or not. Maximum page size allowed on the system being 32k, the hole
> + * offset cannot be more than 15-bit long so the last free bit is used to
> + * store the compression state of block image. If the maximum page size
> + * allowed is increased to a value higher than that, we should consider
> + * increasing this structure size as well, but this would increase the
> + * length of block header in WAL records with alignment.
> + *
> + * "with_hole" is used to identify the presence of a hole in a block image.
> + * As the length of a block cannot be more than 15-bit long, the extra bit in
> + * the length field is used for this identification purpose. If the block image
> + * has no hole, it is ensured that the raw size of a compressed block image is
> + * equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo
> + * are not necessary.
> */
> typedef struct XLogRecordBlockImageHeader {
> - uint16 hole_offset; /* number of bytes before "hole" */
> - uint16 hole_length; /* number of bytes in "hole" */
> + uint16 length:15, /* length of block data in record */
> + with_hole:1; /* status of hole in the block */
> +
> + uint16 hole_offset:15, /* number of bytes before "hole" */
> + is_compressed:1; /* compression status of image */
> +
> + /* Followed by the data related to compression if block is compressed */
> } XLogRecordBlockImageHeader;

Yikes, this is ugly.

I think we should change the xlog format so that the block_id (which currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't the block id but something like XLR_CHUNK_ID. Which is used as is for XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED, XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the block id following the chunk id.

Yes, that'll increase the amount of data for a backup block by 1 byte, but I think that's worth it. I'm pretty sure we will be happy about the added extensibility pretty soon.
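For reference, the packing used by the quoted XLogRecordBlockImageHeader (a 15-bit value plus a 1-bit flag in each uint16) can be written with explicit masks. This is only a sketch of that idea: the SKETCH_ macros and the pack15, unpack_value and unpack_flag helpers are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

#define SKETCH_VALUE_MASK 0x7FFFu   /* low 15 bits: length or hole_offset */
#define SKETCH_FLAG_BIT   0x8000u   /* high bit: with_hole or is_compressed */

/* Pack a 15-bit value and a boolean flag into one 16-bit word. */
static uint16_t
pack15(uint16_t value, int flag)
{
    return (uint16_t) ((value & SKETCH_VALUE_MASK) |
                       (flag ? SKETCH_FLAG_BIT : 0u));
}

/* Extract the 15-bit value. */
static uint16_t
unpack_value(uint16_t packed)
{
    return (uint16_t) (packed & SKETCH_VALUE_MASK);
}

/* Extract the flag as 0 or 1. */
static int
unpack_flag(uint16_t packed)
{
    return (packed & SKETCH_FLAG_BIT) != 0;
}
```

Moving the flags into a separate chunk ID byte, as proposed here, leaves the full 16 bits of each field available and makes it possible to add further chunk types later, at the cost of 1 byte per backup block.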

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Support-compression-for-full-page-writes-in-WAL_v20.patch
