Home > mailing lists

Re: [REVIEW] Re: Compression of full-page-writes - Mailing list pgsql-hackers

From	Rahila Syed
Subject	Re: [REVIEW] Re: Compression of full-page-writes
Date	August 16, 2014 12:51:32
Msg-id	CAH2L28sjkOR=Wmnybvz+0mTQR_KvR85riD4T1CYmsJzkaMzp1w@mail.gmail.com Whole thread Raw
In response to	Re: [REVIEW] Re: Compression of full-page-writes (Andres Freund <andres@2ndquadrant.com>)
Responses	Re: [REVIEW] Re: Compression of full-page-writes
List	pgsql-hackers

Tree view

>So, you're compressing backup blocks one by one. I wonder if that's the
>right idea and if we shouldn't instead compress all of them in one run to
>increase the compression ratio

Please find attached patch for compression of all blocks of a record together .

Following are the measurement results:

Benchmark:

Scale : 16
Command :java JR /home/postgres/jdbcrunner-1.2/scripts/tpcc.js -sleepTime 550,250,250,200,200

Warmup time : 1 sec
Measurement time : 900 sec
Number of tx types : 5
Number of agents : 16
Connection pool size : 16
Statement cache size : 40
Auto commit : false

Checkpoint segments:1024
Checkpoint timeout:5 mins

Compression Multiple Blocks in one run Single Block in one run

Bytes saved 0 0

OFF WAL generated 1265150984(~1265MB) 1264771760(~1265MB)

% Compression NA NA

Bytes saved 215215079 (~215MB) 285675622 (~286MB)

LZ4 WAL generated 125118783(~1251MB) 1329031918(~1329MB)

% Compression 17.2 % 21.49 %

Bytes saved 203705959 (~204MB) 271009408 (~271MB)

Snappy WAL generated 1254505415(~1254MB) 1329628352(~1330MB)

% Compression 16.23 % 20.38%

Bytes saved 155910177(~156MB) 182804997(~182MB)

pglz WAL generated 1259773129(~1260MB) 1286670317(~1287MB)

% Compression 12.37% 14.21%

As per measurement results of this benchmark, compression of multiple blocks didn't improve compression ratio over compression of single block.

LZ4 outperforms Snappy and pglz in terms of compression ratio.

Thank you,

On Fri, Jul 11, 2014 at 12:00 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-07-04 19:27:10 +0530, Rahila Syed wrote:
> + /* Allocates memory for compressed backup blocks according to the compression
> + * algorithm used.Once per session at the time of insertion of first XLOG
> + * record.
> + * This memory stays till the end of session. OOM is handled by making the
> + * code proceed without FPW compression*/
> + static char *compressed_pages[XLR_MAX_BKP_BLOCKS];
> + static bool compressed_pages_allocated = false;
> + if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF &&
> + compressed_pages_allocated!= true)
> + {
> + size_t buffer_size = VARHDRSZ;
> + int j;
> + if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_SNAPPY)
> + buffer_size += snappy_max_compressed_length(BLCKSZ);
> + else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_LZ4)
> + buffer_size += LZ4_compressBound(BLCKSZ);
> + else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_PGLZ)
> + buffer_size += PGLZ_MAX_OUTPUT(BLCKSZ);
> + for (j = 0; j < XLR_MAX_BKP_BLOCKS; j++)
> + { compressed_pages[j] = (char *) malloc(buffer_size);
> + if(compressed_pages[j] == NULL)
> + {
> + compress_backup_block=BACKUP_BLOCK_COMPRESSION_OFF;
> + break;
> + }
> + }
> + compressed_pages_allocated = true;
> + }

Why not do this in InitXLOGAccess() or similar?

> /*
> * Make additional rdata chain entries for the backup blocks, so that we
> * don't need to special-case them in the write loop. This modifies the
> @@ -1015,11 +1048,32 @@ begin:;
> rdt->next = &(dtbuf_rdt2[i]);
> rdt = rdt->next;
>
> + if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF)
> + {
> + /* Compress the backup block before including it in rdata chain */
> + rdt->data = CompressBackupBlock(page, BLCKSZ - bkpb->hole_length,
> + compressed_pages[i], &(rdt->len));
> + if (rdt->data != NULL)
> + {
> + /*
> + * write_len is the length of compressed block and its varlena
> + * header
> + */
> + write_len += rdt->len;
> + bkpb->hole_length = BLCKSZ - rdt->len;
> + /*Adding information about compression in the backup block header*/
> + bkpb->block_compression=compress_backup_block;
> + rdt->next = NULL;
> + continue;
> + }
> + }
> +

So, you're compressing backup blocks one by one. I wonder if that's the
right idea and if we shouldn't instead compress all of them in one run to
increase the compression ratio.

> +/*
> * Get a pointer to the right location in the WAL buffer containing the
> * given XLogRecPtr.
> *
> @@ -4061,6 +4174,50 @@ RestoreBackupBlockContents(XLogRecPtr lsn, BkpBlock bkpb, char *blk,
> {
> memcpy((char *) page, blk, BLCKSZ);
> }
> + /* Decompress if backup block is compressed*/
> + else if (VARATT_IS_COMPRESSED((struct varlena *) blk)
> + && bkpb.block_compression!=BACKUP_BLOCK_COMPRESSION_OFF)
> + {
> + if (bkpb.block_compression == BACKUP_BLOCK_COMPRESSION_SNAPPY)
> + {
> + int ret;
> + size_t compressed_length = VARSIZE((struct varlena *) blk) - VARHDRSZ;
> + char *compressed_data = (char *)VARDATA((struct varlena *) blk);
> + size_t s_uncompressed_length;
> +
> + ret = snappy_uncompressed_length(compressed_data,
> + compressed_length,
> + &s_uncompressed_length);
> + if (!ret)
> + elog(ERROR, "snappy: failed to determine compression length");
> + if (BLCKSZ != s_uncompressed_length)
> + elog(ERROR, "snappy: compression size mismatch %d != %zu",
> + BLCKSZ, s_uncompressed_length);
> +
> + ret = snappy_uncompress(compressed_data,
> + compressed_length,
> + page);
> + if (ret != 0)
> + elog(ERROR, "snappy: decompression failed: %d", ret);
> + }
> + else if (bkpb.block_compression == BACKUP_BLOCK_COMPRESSION_LZ4)
> + {
> + int ret;
> + size_t compressed_length = VARSIZE((struct varlena *) blk) - VARHDRSZ;
> + char *compressed_data = (char *)VARDATA((struct varlena *) blk);
> + ret = LZ4_decompress_fast(compressed_data, page,
> + BLCKSZ);
> + if (ret != compressed_length)
> + elog(ERROR, "lz4: decompression size mismatch: %d vs %zu", ret,
> + compressed_length);
> + }
> + else if (bkpb.block_compression == BACKUP_BLOCK_COMPRESSION_PGLZ)
> + {
> + pglz_decompress((PGLZ_Header *) blk, (char *) page);
> + }
> + else
> + elog(ERROR, "Wrong value for compress_backup_block GUC");
> + }
> else
> {
> memcpy((char *) page, blk, bkpb.hole_offset);

So why aren't we compressing the hole here instead of compressing the
parts that the current logic deems to be filled with important information?

> /*
> * Options for enum values stored in other modules
> */
> @@ -3498,6 +3512,16 @@ static struct config_enum ConfigureNamesEnum[] =
> NULL, NULL, NULL
> },
>
> + {
> + {"compress_backup_block", PGC_SIGHUP, WAL_SETTINGS,
> + gettext_noop("Compress backup block in WAL using specified compression algorithm."),
> + NULL
> + },
> + &compress_backup_block,
> + BACKUP_BLOCK_COMPRESSION_OFF, backup_block_compression_options,
> + NULL, NULL, NULL
> + },
> +

This should be named 'compress_full_page_writes' or so, even if a
temporary guc. There's the 'full_page_writes' guc and I see little
reaason to deviate from its name.

Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

From: Noah Misch
Date: 16 August 2014, 10:28:59
Subject: Re: Proposal to add a QNX 6.5 port to PostgreSQL

From: Fabien COELHO
Date: 16 August 2014, 15:23:53
Subject: Re: pgbench --tuple-size option

Re: [REVIEW] Re: Compression of full-page-writes - Mailing list pgsql-hackers

Attachment

Previous

Next