Re: [REVIEW] Re: Compression of full-page-writes - Mailing list pgsql-hackers

From:           Rahila Syed
Subject:        Re: [REVIEW] Re: Compression of full-page-writes
Msg-id:         CAH2L28sjkOR=Wmnybvz+0mTQR_KvR85riD4T1CYmsJzkaMzp1w@mail.gmail.com
In response to: Re: [REVIEW] Re: Compression of full-page-writes (Andres Freund <andres@2ndquadrant.com>)
Responses:      Re: [REVIEW] Re: Compression of full-page-writes
List:           pgsql-hackers
> right idea and if we shouldn't instead compress all of them in one run to
> increase the compression ratio
Benchmark:
Scale: 16
Command: java JR /home/postgres/jdbcrunner-1.2/scripts/tpcc.js -sleepTime 550,250,250,200,200
Warmup time: 1 sec
Measurement time: 900 sec
Number of tx types: 5
Number of agents: 16
Connection pool size: 16
Statement cache size: 40
Auto commit: false
Checkpoint segments: 1024
Checkpoint timeout: 5 mins
Compression   Metric          Multiple Blocks in one run   Single Block in one run
OFF           Bytes saved     0                            0
              WAL generated   1265150984 (~1265MB)         1264771760 (~1265MB)
              % Compression   NA                           NA
LZ4           Bytes saved     215215079 (~215MB)           285675622 (~286MB)
              WAL generated   125118783 (~1251MB)          1329031918 (~1329MB)
              % Compression   17.2%                        21.49%
Snappy        Bytes saved     203705959 (~204MB)           271009408 (~271MB)
              WAL generated   1254505415 (~1254MB)         1329628352 (~1330MB)
              % Compression   16.23%                       20.38%
pglz          Bytes saved     155910177 (~156MB)           182804997 (~182MB)
              WAL generated   1259773129 (~1260MB)         1286670317 (~1287MB)
              % Compression   12.37%                       14.21%
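(For reference, the % Compression figures above are consistent with bytes saved divided by WAL generated; e.g. for LZ4 with single-block compression, 285675622 / 1329031918 ≈ 21.49%.)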
As per the measurement results of this benchmark, compressing multiple blocks in one run did not improve the compression ratio over compressing each block separately.
LZ4 outperforms Snappy and pglz in terms of compression ratio.
Thank you,
On 2014-07-04 19:27:10 +0530, Rahila Syed wrote:
> +    /*
> +     * Allocates memory for compressed backup blocks according to the
> +     * compression algorithm used, once per session at the time of
> +     * insertion of the first XLOG record. This memory stays till the
> +     * end of the session. OOM is handled by making the code proceed
> +     * without FPW compression.
> +     */
> +    static char *compressed_pages[XLR_MAX_BKP_BLOCKS];
> +    static bool compressed_pages_allocated = false;
> +
> +    if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF &&
> +        compressed_pages_allocated != true)
> +    {
> +        size_t buffer_size = VARHDRSZ;
> +        int j;
> +
> +        if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_SNAPPY)
> +            buffer_size += snappy_max_compressed_length(BLCKSZ);
> +        else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_LZ4)
> +            buffer_size += LZ4_compressBound(BLCKSZ);
> +        else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_PGLZ)
> +            buffer_size += PGLZ_MAX_OUTPUT(BLCKSZ);
> +
> +        for (j = 0; j < XLR_MAX_BKP_BLOCKS; j++)
> +        {
> +            if (compressed_pages[j] == NULL)
> +            {
> +                compressed_pages[j] = (char *) malloc(buffer_size);
> +                if (compressed_pages[j] == NULL)
> +                {
> +                    compress_backup_block = BACKUP_BLOCK_COMPRESSION_OFF;
> +                    break;
> +                }
> +            }
> +        }
> +        compressed_pages_allocated = true;
> +    }
Why not do this in InitXLOGAccess() or similar?
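For illustration, a minimal sketch of that suggestion (the placement in InitXLOGAccess(), the file-scope statics, and the CompressedPageBufferSize() helper wrapping the per-algorithm worst-case sizing shown above are assumptions, not part of the patch):

    void
    InitXLOGAccess(void)
    {
        /* ... existing InitXLOGAccess() work ... */

        if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF &&
            !compressed_pages_allocated)
        {
            /* hypothetical helper: worst-case compressed size + VARHDRSZ */
            size_t buffer_size = CompressedPageBufferSize();
            int j;

            for (j = 0; j < XLR_MAX_BKP_BLOCKS; j++)
            {
                compressed_pages[j] = (char *) malloc(buffer_size);
                if (compressed_pages[j] == NULL)
                {
                    /* OOM: fall back to uncompressed full-page writes */
                    compress_backup_block = BACKUP_BLOCK_COMPRESSION_OFF;
                    break;
                }
            }
            compressed_pages_allocated = true;
        }
    }

This would take the branch and the allocated-flag check out of the XLogInsert() hot path entirely.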
> /*
> * Make additional rdata chain entries for the backup blocks, so that we
> * don't need to special-case them in the write loop. This modifies the
> @@ -1015,11 +1048,32 @@ begin:;
> rdt->next = &(dtbuf_rdt2[i]);
> rdt = rdt->next;
>
> +        if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF)
> +        {
> +            /* Compress the backup block before including it in rdata chain */
> +            rdt->data = CompressBackupBlock(page, BLCKSZ - bkpb->hole_length,
> +                                            compressed_pages[i], &(rdt->len));
> +            if (rdt->data != NULL)
> +            {
> +                /*
> +                 * write_len is the length of compressed block and its varlena
> +                 * header
> +                 */
> +                write_len += rdt->len;
> +                bkpb->hole_length = BLCKSZ - rdt->len;
> +                /* Add information about compression in the backup block header */
> +                bkpb->block_compression = compress_backup_block;
> +                rdt->next = NULL;
> +                continue;
> +            }
> +        }
> +
So, you're compressing backup blocks one by one. I wonder if that's the
right idea and if we shouldn't instead compress all of them in one run to
increase the compression ratio.
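A rough sketch of that alternative (illustrative only: the function, its callers, and the scratch buffer are hypothetical; CompressBackupBlock() is the patch's helper, applied here to the concatenation of all blocks):

    /*
     * Compress all backup blocks of a record in one run, so the compressor
     * can exploit redundancy across blocks. "dest" must be sized for the
     * worst case of npages blocks, not a single one.
     */
    static bool
    CompressAllBackupBlocks(char **pages, int npages,
                            char *dest, uint32 *dest_len)
    {
        static char scratch[XLR_MAX_BKP_BLOCKS * BLCKSZ];
        uint32      total_len = 0;
        int         i;

        /* Concatenate the block images into one contiguous buffer */
        for (i = 0; i < npages; i++)
        {
            memcpy(scratch + total_len, pages[i], BLCKSZ);
            total_len += BLCKSZ;
        }

        /* One compression call per record instead of one per block */
        return CompressBackupBlock(scratch, total_len, dest, dest_len) != NULL;
    }

As the benchmark above shows, though, per-block compression actually yielded the better ratio for this workload.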
> +/*
> * Get a pointer to the right location in the WAL buffer containing the
> * given XLogRecPtr.
> *
> @@ -4061,6 +4174,50 @@ RestoreBackupBlockContents(XLogRecPtr lsn, BkpBlock bkpb, char *blk,
> {
>         memcpy((char *) page, blk, BLCKSZ);
>     }
> +    /* Decompress if backup block is compressed */
> +    else if (VARATT_IS_COMPRESSED((struct varlena *) blk)
> +             && bkpb.block_compression != BACKUP_BLOCK_COMPRESSION_OFF)
> +    {
> +        if (bkpb.block_compression == BACKUP_BLOCK_COMPRESSION_SNAPPY)
> +        {
> +            int ret;
> +            size_t compressed_length = VARSIZE((struct varlena *) blk) - VARHDRSZ;
> +            char *compressed_data = (char *) VARDATA((struct varlena *) blk);
> +            size_t s_uncompressed_length;
> +
> +            ret = snappy_uncompressed_length(compressed_data,
> +                                             compressed_length,
> +                                             &s_uncompressed_length);
> +            if (!ret)
> +                elog(ERROR, "snappy: failed to determine compression length");
> +            if (BLCKSZ != s_uncompressed_length)
> +                elog(ERROR, "snappy: compression size mismatch %d != %zu",
> +                     BLCKSZ, s_uncompressed_length);
> +
> +            ret = snappy_uncompress(compressed_data,
> +                                    compressed_length,
> +                                    page);
> +            if (ret != 0)
> +                elog(ERROR, "snappy: decompression failed: %d", ret);
> +        }
> +        else if (bkpb.block_compression == BACKUP_BLOCK_COMPRESSION_LZ4)
> +        {
> +            int ret;
> +            size_t compressed_length = VARSIZE((struct varlena *) blk) - VARHDRSZ;
> +            char *compressed_data = (char *) VARDATA((struct varlena *) blk);
> +
> +            ret = LZ4_decompress_fast(compressed_data, page,
> +                                      BLCKSZ);
> +            if (ret != compressed_length)
> +                elog(ERROR, "lz4: decompression size mismatch: %d vs %zu", ret,
> +                     compressed_length);
> +        }
> +        else if (bkpb.block_compression == BACKUP_BLOCK_COMPRESSION_PGLZ)
> +        {
> +            pglz_decompress((PGLZ_Header *) blk, (char *) page);
> +        }
> +        else
> +            elog(ERROR, "Wrong value for compress_backup_block GUC");
> +    }
>     else
>     {
>         memcpy((char *) page, blk, bkpb.hole_offset);
So why aren't we compressing the hole here instead of compressing the
parts that the current logic deems to be filled with important information?
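For context, a sketch of hole-aware handling (purely illustrative; "work" is a hypothetical BLCKSZ-sized scratch buffer, and the two fragments belong to the insert and restore paths respectively):

    /* At insert time: hand the compressor only the non-hole spans */
    memcpy(work, page, bkpb->hole_offset);
    memcpy(work + bkpb->hole_offset,
           page + (bkpb->hole_offset + bkpb->hole_length),
           BLCKSZ - (bkpb->hole_offset + bkpb->hole_length));
    /* ... compress "work", whose length is BLCKSZ - hole_length ... */

    /* At restore time, after decompressing into "work": re-zero the hole,
     * mirroring the existing uncompressed-with-hole branch */
    memcpy(page, work, bkpb.hole_offset);
    memset(page + bkpb.hole_offset, 0, bkpb.hole_length);
    memcpy(page + (bkpb.hole_offset + bkpb.hole_length),
           work + bkpb.hole_offset,
           BLCKSZ - (bkpb.hole_offset + bkpb.hole_length));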
> /*
> * Options for enum values stored in other modules
> */
> @@ -3498,6 +3512,16 @@ static struct config_enum ConfigureNamesEnum[] =
> NULL, NULL, NULL
> },
>
> +    {
> +        {"compress_backup_block", PGC_SIGHUP, WAL_SETTINGS,
> +            gettext_noop("Compress backup block in WAL using specified compression algorithm."),
> +            NULL
> +        },
> +        &compress_backup_block,
> +        BACKUP_BLOCK_COMPRESSION_OFF, backup_block_compression_options,
> +        NULL, NULL, NULL
> +    },
> +
This should be named 'compress_full_page_writes' or so, even if a
temporary GUC. There's the 'full_page_writes' GUC and I see little
reason to deviate from its name.
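Concretely, the rename would only touch the entry above (a sketch; whether the backing C variable is renamed as well is left open):

    {
        {"compress_full_page_writes", PGC_SIGHUP, WAL_SETTINGS,
            gettext_noop("Compress full page writes in WAL using specified compression algorithm."),
            NULL
        },
        &compress_backup_block,
        BACKUP_BLOCK_COMPRESSION_OFF, backup_block_compression_options,
        NULL, NULL, NULL
    },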
Greetings,
Andres Freund