Re: Different compression methods for FPI - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Different compression methods for FPI |
Date | |
Msg-id | YKIelANwse+k2JxT@paquier.xyz Whole thread Raw |
In response to | Re: Different compression methods for FPI (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: Different compression methods for FPI
|
List | pgsql-hackers |
On Sun, Mar 21, 2021 at 02:30:04PM -0500, Justin Pryzby wrote: > rebased to keep cfbot happy. This will run with default=zlib. I have been looking at bit at this patch set, and I think that we should do something here for 15~. First, I think that we should make more tests and pick up one compression method that could be used as an alternative to pglz, not three among zlib, LZ4 or zstd. Looking at the past archives, we could do more tests like this one: https://www.postgresql.org/message-id/CAB7nPqSc97o-UE5paxfMUKWcxE_JioyxO1M4A0pMnmYqAnec2g@mail.gmail.com The I/O should not be the bottleneck here, so on top of disabling fsync we could put the whole data folder on a ramdisk and compare the whole. The patch set does not apply, by the way, so it needs a rebase. Based on my past experiences, I'd see LZ4 as a good choice in terms of performance, speed and compression ratio, and on top of that we now have the possibility to build PG with --with-lz4 for TOAST compression so ./configure is already tweaked for it. For this patch, this is going to require a bit more in terms of library linking as the block decompression is done in xlogreader.c, so that's one thing to worry about. #define BKPIMAGE_IS_COMPRESSED 0x02 /* page image is compressed */ #define BKPIMAGE_APPLY 0x04 /* page image should be restored during * replay */ +#define BKPIMAGE_COMPRESS_METHOD1 0x08 /* bits to encode compression method */ +#define BKPIMAGE_COMPRESS_METHOD2 0x10 /* 0=pglz; 1=zlib; */ The interface used in xlogrecord.h is confusing to me with BKPIMAGE_IS_COMPRESSED, followed by extra bits set for the compression method. Wouldn't it be cleaner to have a set of BKPIMAGE_COMPRESS_XXX (XXX={lz4,zlib,etc.})? There is no even need to steal one bit for some kind of BKPIMAGE_COMPRESS_NONE as we know that the page is compressed if we know it has a method assigned, so with pglz, the default, and one extra method we need only two bits here. + { + {"wal_compression_method", PGC_SIGHUP, WAL_SETTINGS, + gettext_noop("Set the method used to compress full page images in the WAL."), + NULL + }, + &wal_compression_method, + WAL_COMPRESSION_PGLZ, wal_compression_options, + NULL, NULL, NULL + }, The interface is not quite right to me. I think that we should just change wal_compression to become an enum, with extra values for pglz and the new method. "on" would be a synonym for "pglz". +/* This is a mapping indexed by wal_compression */ +// XXX: maybe this is better done as a GUC hook to assign the 1) method; and 2) level +struct walcompression walmethods[] = { + {"pglz", WAL_COMPRESSION_PGLZ}, + {"zlib", WAL_COMPRESSION_ZLIB}, +}; Don't think you need a hook here, but zlib, or any other method which is not supported by the build, had better not be listed instead. This ensures that the value cannot be assigned if the binaries don't support that. + {"wal_compression_method", PGC_SIGHUP, WAL_SETTINGS, + gettext_noop("Set the method used to compress full page images in the WAL."), + NULL + }, + &wal_compression_method, + WAL_COMPRESSION_PGLZ, wal_compression_options, + NULL, NULL, NULL Any reason to not make that user-settable? If you merge that with wal_compression, that's not an issue. The patch set is a gathering of various things, and not only things associated to the compression method used for FPIs. my $node = get_new_node('primary'); -$node->init(allows_streaming => 1); +$node->init(); $node->start; What is the point of that in patch 0002? > Subject: [PATCH 03/12] Make sure published XIDs are persistent Patch 0003 looks unrelated to this thread. > Subject: [PATCH 04/12] wal_compression_method: default to zlib.. Patch 0004 could happen, however there are no reasons given why this is adapted. Changing the default is not going to happen for the time release where this feature is added, anyway. + default: + report_invalid_record(record, "image at %X/%X is compressed with unsupported codec, block %d (%d/%s)", + (uint32) (record->ReadRecPtr >> 32), + (uint32) record->ReadRecPtr, + block_id, + compression_method, + wal_compression_name(compression_method)); + return false; In xlogreader.c, the error message is helpful this way. However, we would not know which compression method failed if there is a decompression failure for a method supported by the build restoring this block. That would be good to add. I think that what we actually need for this thread are patches 0001, 0005 and 0006 merged together to study first the performance we have with each one of the compression methods proposed, and then let's just pick one. Reading around, zstd and zlib compresse more but take longer. LZ4 is faster than the others, but can compress less. With limited bandwidth, less data makes sense, and my guess is that most users care most about the speed of recovery if we can afford speed with an acceptable compression ratio. -- Michael
Attachment
pgsql-hackers by date: