Thread: Re: Different compression methods for FPI

Re: Different compression methods for FPI

From
Justin Pryzby
Date:
rebased to keep cfbot happy.  This will run with default=zlib.

On Mon, Mar 15, 2021 at 01:09:18PM -0500, Justin Pryzby wrote:
> On Sun, Mar 14, 2021 at 07:31:35PM -0500, Justin Pryzby wrote:
> > On Sat, Mar 13, 2021 at 08:48:33PM +0500, Andrey Borodin wrote:
> > > > On 13 March 2021, at 06:28, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > Updated patch with a minor fix to configure.ac to avoid warnings on OSX.
> > > > And 2ndary patches from another thread to allow passing recovery tests.
> > > > Renamed to WAL_COMPRESSION_*
> > > > Split LZ4 support to a separate patch and support zstd.  These show that
> > > > changes needed for a new compression method have been minimized, although not
> > > > yet moved to a separate, abstracted compression/decompression function.
> > > 
> > > Thanks! Awesome work!
> > > 
> > > > These two patches are a prerequisite for this patch to progress:
> > > > * Run 011_crash_recovery.pl with wal_level=minimal
> > > > * Make sure published XIDs are persistent
> > > > 
> > > > I don't know if anyone will consider this patch for v14 - if not, it should be
> > > > set to v15 and revisit in a month.  
> > > 
> > > I want to note that fixes for 011_crash_recovery.pl are not strictly necessary for this patch set.
> > > The problem in the tests arises if we turn on wal_compression, entirely independently of the wal compression method.
> > > We turn on wal_compression in this test only for CI purposes.
> > 
> > I rearranged the patches to reflect this.
> > Change zlib and zstd to level=1.
> > Add support for negative "zstd fast" levels.
> > Use correct length accounting for "hole" in LZ4 and ZSTD.
> > Fixed Solution.pm for zstd on windows.
> > Switch to zstd by default (for CI).
> > Add docs.
> 
> Changes:
> - Allocate buffer sufficient to accommodate any supported compression method;
> - Use existing info flags argument rather than adding another byte for storing
>   the compression method; this seems to be what was anticipated by commit
>   57aa5b2bb and what Michael objected to.
> 
> I think the existing structures are ugly, so maybe this suggests using a GUC
> assign hook to support arbitrary compression level, and maybe other options.

Attachment

Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Sun, Mar 21, 2021 at 02:30:04PM -0500, Justin Pryzby wrote:
> rebased to keep cfbot happy.  This will run with default=zlib.

I have been looking at bit at this patch set, and I think that we
should do something here for 15~.

First, I think that we should make more tests and pick up one
compression method that could be used as an alternative to pglz, not
three among zlib, LZ4 or zstd.  Looking at the past archives, we could
do more tests like this one:
https://www.postgresql.org/message-id/CAB7nPqSc97o-UE5paxfMUKWcxE_JioyxO1M4A0pMnmYqAnec2g@mail.gmail.com

The I/O should not be the bottleneck here, so on top of disabling
fsync we could put the whole data folder on a ramdisk and compare the
whole.  The patch set does not apply, by the way, so it needs a
rebase.

Based on my past experiences, I'd see LZ4 as a good choice in terms of
performance, speed and compression ratio, and on top of that we now
have the possibility to build PG with --with-lz4 for TOAST
compression so ./configure is already tweaked for it.  For this patch,
this is going to require a bit more in terms of library linking as the
block decompression is done in xlogreader.c, so that's one thing to
worry about.

 #define BKPIMAGE_IS_COMPRESSED     0x02    /* page image is  compressed */
 #define BKPIMAGE_APPLY     0x04    /* page image should be restored during
                                      * replay */
+#define BKPIMAGE_COMPRESS_METHOD1  0x08    /* bits to encode compression method */
+#define BKPIMAGE_COMPRESS_METHOD2  0x10    /* 0=pglz; 1=zlib; */

The interface used in xlogrecord.h is confusing to me with
BKPIMAGE_IS_COMPRESSED, followed by extra bits set for the compression
method.  Wouldn't it be cleaner to have a set of BKPIMAGE_COMPRESS_XXX
(XXX={lz4,zlib,etc.})?  There is not even a need to steal one bit for
some kind of BKPIMAGE_COMPRESS_NONE, as we know that the page is
compressed if it has a method assigned, so with pglz, the
default, and one extra method we need only two bits here.
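
Roughly (bit values, macro names and method numbering here are invented for illustration, not taken from the patch), that would give something like:

#define BKPIMAGE_COMPRESS_METHOD1   0x08    /* two bits reserved for the */
#define BKPIMAGE_COMPRESS_METHOD2   0x10    /* compression method        */
#define BKPIMAGE_COMPRESS_MASK \
    (BKPIMAGE_COMPRESS_METHOD1 | BKPIMAGE_COMPRESS_METHOD2)

/* 0 = not compressed, 1 = pglz, 2 = lz4, 3 = zlib (illustrative numbering) */
#define BKPIMAGE_GET_COMPRESSION(info) \
    (((info) & BKPIMAGE_COMPRESS_MASK) >> 3)
#define BKPIMAGE_SET_COMPRESSION(info, method) \
    ((info) | (((method) << 3) & BKPIMAGE_COMPRESS_MASK))
#define BKPIMAGE_IS_COMPRESSED(info) \
    (((info) & BKPIMAGE_COMPRESS_MASK) != 0)

xlogreader.c would then switch on BKPIMAGE_GET_COMPRESSION() instead of testing a single "is compressed" flag.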

+   {
+       {"wal_compression_method", PGC_SIGHUP, WAL_SETTINGS,
+           gettext_noop("Set the method used to compress full page images in the WAL."),
+           NULL
+       },
+       &wal_compression_method,
+       WAL_COMPRESSION_PGLZ, wal_compression_options,
+       NULL, NULL, NULL
+   },
The interface is not quite right to me.  I think that we should just
change wal_compression to become an enum, with extra values for pglz
and the new method.  "on" would be a synonym for "pglz".

+/* This is a mapping indexed by wal_compression */
+// XXX: maybe this is better done as a GUC hook to assign the 1) method; and 2) level
+struct walcompression walmethods[] = {
+   {"pglz",    WAL_COMPRESSION_PGLZ},
+   {"zlib",    WAL_COMPRESSION_ZLIB},
+};
Don't think you need a hook here, but zlib, or any other method which
is not supported by the build, had better not be listed instead.  This
ensures that the value cannot be assigned if the binaries don't
support that.
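
A minimal sketch of that shape (the enum and option names are assumptions, reusing struct config_enum_entry from utils/guc.h, not the patch's actual code): methods not supported by the build simply drop out of the table at compile time, and "on"/"off" remain as backward-compatible synonyms.

typedef enum WalCompression
{
    WAL_COMPRESSION_NONE = 0,
    WAL_COMPRESSION_PGLZ,
    WAL_COMPRESSION_LZ4,
    WAL_COMPRESSION_ZLIB
} WalCompression;

static const struct config_enum_entry wal_compression_options[] = {
    {"pglz", WAL_COMPRESSION_PGLZ, false},
#ifdef USE_LZ4
    {"lz4", WAL_COMPRESSION_LZ4, false},
#endif
#ifdef HAVE_LIBZ
    {"zlib", WAL_COMPRESSION_ZLIB, false},
#endif
    {"on", WAL_COMPRESSION_PGLZ, false},    /* backward-compatible synonym */
    {"off", WAL_COMPRESSION_NONE, false},
    {NULL, 0, false}
};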

+       {"wal_compression_method", PGC_SIGHUP, WAL_SETTINGS,
+           gettext_noop("Set the method used to compress full page images in the WAL."),
+           NULL
+       },
+       &wal_compression_method,
+       WAL_COMPRESSION_PGLZ, wal_compression_options,
+       NULL, NULL, NULL
Any reason to not make that user-settable?  If you merge that with
wal_compression, that's not an issue.

The patch set is a gathering of various things, and not only things
associated to the compression method used for FPIs.

 my $node = get_new_node('primary');
-$node->init(allows_streaming => 1);
+$node->init();
 $node->start;
What is the point of that in patch 0002?

> Subject: [PATCH 03/12] Make sure published XIDs are persistent

Patch 0003 looks unrelated to this thread.

> Subject: [PATCH 04/12] wal_compression_method: default to zlib..

Patch 0004 could happen, however no reasons are given for why this
is adopted.  Changing the default is not going to happen in the
release where this feature is added, anyway.

+       default:
+           report_invalid_record(record, "image at %X/%X is compressed with unsupported codec, block %d (%d/%s)",
+                                 (uint32) (record->ReadRecPtr >> 32),
+                                 (uint32) record->ReadRecPtr,
+                                 block_id,
+                                 compression_method,
+                                 wal_compression_name(compression_method));
+           return false;
In xlogreader.c, the error message is helpful this way.  However, we
would not know which compression method failed if there is a
decompression failure for a method supported by the build restoring
this block.  That would be good to add.
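
For instance (a fragment-style sketch only, reusing the patch's wal_compression_name() in a hypothetical failure branch; decomp_result is an invented variable name), the decompression-failure path could name the method the same way:

        if (decomp_result < 0)      /* hypothetical decompression failure */
        {
            report_invalid_record(record,
                                  "could not restore image at %X/%X compressed with %s, block %d",
                                  (uint32) (record->ReadRecPtr >> 32),
                                  (uint32) record->ReadRecPtr,
                                  wal_compression_name(compression_method),
                                  block_id);
            return false;
        }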

I think that what we actually need for this thread are patches 0001,
0005 and 0006 merged together to study first the performance we have
with each one of the compression methods proposed, and then let's just
pick one.  Reading around, zstd and zlib compress more but take
longer.  LZ4 is faster than the others, but can compress less.
With limited bandwidth, less data makes sense, and my guess is that
most users care most about the speed of recovery if we can afford
speed with an acceptable compression ratio.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Justin Pryzby
Date:
On Mon, May 17, 2021 at 04:44:11PM +0900, Michael Paquier wrote:
> On Sun, Mar 21, 2021 at 02:30:04PM -0500, Justin Pryzby wrote:
> 
> For this patch, this is going to require a bit more in terms of library
> linking as the block decompression is done in xlogreader.c, so that's one
> thing to worry about.

I'm not sure what you mean here ?

> +   {
> +       {"wal_compression_method", PGC_SIGHUP, WAL_SETTINGS,
> +           gettext_noop("Set the method used to compress full page images in the WAL."),
> +           NULL
> +       },
> +       &wal_compression_method,
> +       WAL_COMPRESSION_PGLZ, wal_compression_options,
> +       NULL, NULL, NULL
> +   },
> The interface is not quite right to me.  I think that we should just
> change wal_compression to become an enum, with extra values for pglz
> and the new method.  "on" would be a synonym for "pglz".

Andrey gave a reason in March:

| I hope one day we will compress all WAL, not just FPIs. Advanced archive management tools already do so, why not compress it in walwriter?

| When this will be implemented, we could have wal_compression = {off, fpi, all}.

> +/* This is a mapping indexed by wal_compression */
> +// XXX: maybe this is better done as a GUC hook to assign the 1)
> method; and 2) level
> +struct walcompression walmethods[] = {
> +   {"pglz",    WAL_COMPRESSION_PGLZ},
> +   {"zlib",    WAL_COMPRESSION_ZLIB},
> +};
> Don't think you need a hook here, but zlib, or any other method which
> is not supported by the build, had better not be listed instead.  This
> ensures that the value cannot be assigned if the binaries don't
> support that.

I think you're confusing the walmethods struct (which is unconditional) with
wal_compression_options, which is conditional.

> The patch set is a gathering of various things, and not only things
> associated to the compression method used for FPIs.
> What is the point of that in patch 0002?
> > Subject: [PATCH 03/12] Make sure published XIDs are persistent
> Patch 0003 looks unrelated to this thread.

..for the reason that I gave:

| And 2ndary patches from another thread to allow passing recovery tests.
|These two patches are a prerequisite for this patch to progress:
| * Run 011_crash_recovery.pl with wal_level=minimal
| * Make sure published XIDs are persistent

> > Subject: [PATCH 04/12] wal_compression_method: default to zlib..
> 
> Patch 0004 could happen, however no reasons are given for why this
> is adopted.  Changing the default is not going to happen in the
> release where this feature is added, anyway.

From the commit message:
| this is meant to exercise the CIs, and not meant to be merged

> +       default:
> +           report_invalid_record(record, "image at %X/%X is compressed with unsupported codec, block %d (%d/%s)",
> +                                 (uint32) (record->ReadRecPtr >> 32),
> +                                 (uint32) record->ReadRecPtr,
> +                                 block_id,
> +                                 compression_method,
> +                                 wal_compression_name(compression_method));
> +           return false;
> In xlogreader.c, the error message is helpful this way.  However, we
> would not know which compression method failed if there is a
> decompression failure for a method supported by the build restoring
> this block.  That would be good to add.

I don't understand you here - that's what wal_compression_name is for?
2021-05-18 21:38:04.324 CDT [26984] FATAL:  unknown compression method requested: 2(lz4)

> I think that what we actually need for this thread are patches 0001,
> 0005 and 0006 merged together to study first the performance we have
> with each one of the compression methods proposed, and then let's just
> pick one.  Reading around, zstd and zlib compress more but take
> longer.  LZ4 is faster than the others, but can compress less.
> With limited bandwidth, less data makes sense, and my guess is that
> most users care most about the speed of recovery if we can afford
> speed with an acceptable compression ratio.

I don't see why we'd add a GUC for configuring compression but not include
the 30 lines of code needed to support a 3rd method that is already used by the
core server.

-- 
Justin

Attachment

Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Tue, May 18, 2021 at 10:06:18PM -0500, Justin Pryzby wrote:
> On Mon, May 17, 2021 at 04:44:11PM +0900, Michael Paquier wrote:
>> On Sun, Mar 21, 2021 at 02:30:04PM -0500, Justin Pryzby wrote:
>>
>> For this patch, this is going to require a bit more in terms of library
>> linking as the block decompression is done in xlogreader.c, so that's one
>> thing to worry about.
>
> I'm not sure what you mean here ?

I was wondering if anything else was needed in terms of linking here.

>> Don't think you need a hook here, but zlib, or any other method which
>> is not supported by the build, had better not be listed instead.  This
>> ensures that the value cannot be assigned if the binaries don't
>> support that.
>
> I think you're confusing the walmethods struct (which is unconditional) with
> wal_compression_options, which is conditional.

Indeed I was.  For the record, walmethods is not necessary either as a
new structure.  You could just store a list of strings in xlogreader.c
and make a note to keep the entries in a given order.  That's simpler
as well.

>> The patch set is a gathering of various things, and not only things
>> associated to the compression method used for FPIs.
>> What is the point of that in patch 0002?
>>> Subject: [PATCH 03/12] Make sure published XIDs are persistent
>> Patch 0003 looks unrelated to this thread.
>
> ..for the reason that I gave:
>
> | And 2ndary patches from another thread to allow passing recovery tests.
> |These two patches are a prerequisite for this patch to progress:
> | * Run 011_crash_recovery.pl with wal_level=minimal
> | * Make sure published XIDs are persistent

I still don't understand why XID consistency has anything to do with
the compression of FPIs.  There is nothing preventing the testing of
compression of FPIs, and please note this argument:
https://www.postgresql.org/message-id/BEF3B1E0-0B31-4F05-8E0A-F681CB918626@yandex-team.ru

For example, I can just revert from my tree 0002 and 0003, and still
perform tests of the various compression methods.  I do agree that we
are going to need to do something about this problem, but let's drop
this stuff from the set of patches of this thread and just discuss
them where they are needed.


>> +   {
>> +       {"wal_compression_method", PGC_SIGHUP, WAL_SETTINGS,
>> +           gettext_noop("Set the method used to compress full page images in the WAL."),
>> +           NULL
>> +       },
>> +       &wal_compression_method,
>> +       WAL_COMPRESSION_PGLZ, wal_compression_options,
>> +       NULL, NULL, NULL
>> +   },
>> The interface is not quite right to me.  I think that we should just
>> change wal_compression to become an enum, with extra values for pglz
>> and the new method.  "on" would be a synonym for "pglz".
>
> Andrey gave a reason in March:
>
> | I hope one day we will compress all WAL, not just FPIs. Advanced archive management tools already do so, why not compress it in walwriter?
> | When this will be implemented, we could have wal_compression = {off, fpi, all}.
>
> [...]

> I don't see why we'd add a GUC for configuring compression but not include
> the 30 lines of code needed to support a 3rd method that is already used by the
> core server.

Because that makes things confusing.  We have no idea if we'll ever
reach a point or even if it makes sense to have compression applied to
multiple parts of WAL.  So, for now, let's just use one single GUC and
be done with it.  Your argument is not tied to what's proposed on this
thread either, and could go the other way around.  If we were to
invent more compression concepts in WAL records, we could as well just
go with a new GUC that lists the parts of the WAL where compression
needs to be applied.  I'd say to keep it to a minimum for now, that's
an interface less confusing than what's proposed here.

And you have not replaced BKPIMAGE_IS_COMPRESSED by a PGLZ-equivalent,
so your patch set is eating more bits for BKPIMAGE_* than it needs
to.

By the way, it would be really useful for the user to print in
pg_waldump -b the type of compression used :)

A last point, and I think that this should be part of a study of the
choice to be made for an extra compression method: there is no discussion
yet about the level of compression applied, which is something that
can be applied to zstd, lz4 or even zlib.  Perhaps there is an
argument for a different GUC controlling that, so more benchmarking
is a first step needed for this thread to move on.  Benchmarking can
happen with what's currently posted, of course.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Justin Pryzby
Date:
On Wed, May 19, 2021 at 06:31:15PM +0900, Michael Paquier wrote:
> I still don't understand why XID consistency has anything to do with
> the compression of FPIs.  There is nothing preventing the testing of
> compression of FPIs, and please note this argument:
> https://www.postgresql.org/message-id/BEF3B1E0-0B31-4F05-8E0A-F681CB918626@yandex-team.ru
> 
> For example, I can just revert from my tree 0002 and 0003, and still
> perform tests of the various compression methods.  I do agree that we
> are going to need to do something about this problem, but let's drop
> this stuff from the set of patches of this thread and just discuss
> them where they are needed.

They are needed here - that they're included is deliberate.  Revert this and
then the tests fail.  "Make sure published XIDs are persistent"

time make -C src/test/recovery check
#   Failed test 'new xid after restart is greater'

> And you have not replaced BKPIMAGE_IS_COMPRESSED by a PGLZ-equivalent,
> so your patch set is eating more bits for BKPIMAGE_* than it needs

The goal is to support 2+ "methods" (including "none"), which takes 2 bits, so
may as well support 3 methods.

- uncompressed
- pglz
- lz4
- zlib or zstd or ??

This version:
0) repurposes the pre-existing GUC as an enum;
1) saves a bit (until zstd is included);
2) shows the compression in pg_waldump;

To support different compression levels, I think I'd change from an enum to
string and an assign hook, which sets a pair of ints.
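
As a standalone sketch of what such a hook could parse (the function name and the default level are invented for illustration): a value like "zlib-2" or "zstd--5" is split at the first dash into a method name and an integer level.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static bool
parse_wal_compression(const char *value, char *method, size_t methodsz,
                      int *level)
{
    const char *dash = strchr(value, '-');
    size_t      namelen = dash ? (size_t) (dash - value) : strlen(value);

    if (namelen == 0 || namelen >= methodsz)
        return false;
    memcpy(method, value, namelen);
    method[namelen] = '\0';

    if (dash == NULL)
        *level = 1;             /* assumed default level */
    else
    {
        char   *end;
        long    l = strtol(dash + 1, &end, 10);

        if (end == dash + 1 || *end != '\0')
            return false;
        *level = (int) l;       /* negative values allowed for "zstd fast" */
    }
    return true;
}

int
main(void)
{
    const char *tests[] = {"pglz", "lz4", "zlib-2", "zstd--5"};
    char        method[16];
    int         level;

    for (int i = 0; i < 4; i++)
        if (parse_wal_compression(tests[i], method, sizeof(method), &level))
            printf("%s -> method=%s level=%d\n", tests[i], method, level);
    return 0;
}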

-- 
Justin

Attachment

Re: Different compression methods for FPI

From
Dilip Kumar
Date:
On Tue, May 25, 2021 at 10:14 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

Some comments.

+#define BKPIMAGE_COMPRESS_METHOD1 0x04 /* bits to encode compression method */
+#define BKPIMAGE_COMPRESS_METHOD2 0x08 /* 0=none, 1=pglz, 2=zlib */

Instead of using METHOD1, METHOD2, can we use the direct method names?
That will look cleaner.

+ unsigned long len_l = COMPRESS_BUFSIZE;
+ int ret;
+ ret = compress2((Bytef*)dest, &len_l, (Bytef*)source, orig_len, 1);

compress2((Bytef*)dest -> compress2((Bytef *) dest

diff --git a/src/test/recovery/t/011_crash_recovery.pl
b/src/test/recovery/t/011_crash_recovery.pl
index a26e99500b..2e7e3db639 100644
--- a/src/test/recovery/t/011_crash_recovery.pl
+++ b/src/test/recovery/t/011_crash_recovery.pl
@@ -14,7 +14,7 @@ use Config;
 plan tests => 3;

 my $node = get_new_node('primary');
-$node->init(allows_streaming => 1);
+$node->init();
 $node->start;

How is this change relevant?

+#ifdef USE_LZ4
+ case WAL_COMPRESSION_LZ4:
+ len = LZ4_compress_fast(source, dest, orig_len, COMPRESS_BUFSIZE, 1);
+ if (len == 0)
+ len = -1;
+ break;
+#endif

If we are passing acceleration as 1, then we can directly use
LZ4_compress_default, it is the same as LZ4_compress_fast with
acceleration=1.
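
A standalone check of that equivalence against liblz4 (payload and buffer sizes are arbitrary): LZ4_compress_default() behaves like LZ4_compress_fast() with acceleration 1, and both report failure by returning 0.

#include <stdio.h>
#include <string.h>
#include <lz4.h>

int
main(void)
{
    const char  src[] = "a small, repetitive payload payload payload payload payload";
    char        d1[LZ4_COMPRESSBOUND(sizeof(src))];
    char        d2[LZ4_COMPRESSBOUND(sizeof(src))];
    int         l1,
                l2;

    l1 = LZ4_compress_fast(src, d1, sizeof(src), sizeof(d1), 1);
    l2 = LZ4_compress_default(src, d2, sizeof(src), sizeof(d2));

    /* both return the compressed size, or 0 if the output did not fit */
    if (l1 <= 0 || l2 <= 0)
        return 1;

    printf("fast(accel=1): %d bytes, default: %d bytes, identical: %s\n",
           l1, l2, (l1 == l2 && memcmp(d1, d2, l1) == 0) ? "yes" : "no");
    return 0;
}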


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Mon, May 24, 2021 at 11:44:45PM -0500, Justin Pryzby wrote:
> The goal is to support 2+ "methods" (including "none"), which takes 2 bits, so
> may as well support 3 methods.
>
> - uncompressed
> - pglz
> - lz4
> - zlib or zstd or ??

Let's make a proper study of all that and make a choice, the only
thing I am rather sure of is that pglz is bad compared to all the
others.  There is no point to argue as long as we don't know if any of
those candidates are suited for the job.

> This version:
> 0) repurposes the pre-existing GUC as an enum;
> 1) saves a bit (until zstd is included);
> 2) shows the compression in pg_waldump;
>
> To support different compression levels, I think I'd change from an enum to
> string and an assign hook, which sets a pair of ints.

Hmm.  I am not really sure what you mean here, but let's keep that
in mind until we get more performance numbers.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Andrey Borodin
Date:

> On 25 May 2021, at 12:26, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, May 24, 2021 at 11:44:45PM -0500, Justin Pryzby wrote:
>> The goal is to support 2+ "methods" (including "none"), which takes 2 bits, so
>> may as well support 3 methods.
>>
>> - uncompressed
>> - pglz
>> - lz4
>> - zlib or zstd or ??
>
> Let's make a proper study of all that and make a choice, the only
> thing I am rather sure of is that pglz is bad compared to all the
> others.  There is no point to argue as long as we don't know if any of
> those candidates are suited for the job.

There are a lot of open studies like [0,1].
In short, LZ4 is the fastest codec. Zstd gives better compression at the cost of more CPU usage [2]. Zlib is not tied to Facebook; however, it's slower and has a smaller compression ratio than Zstd. Zstd is actively developed.

There is Google's Brotli codec. It is comparable to Zstd.

What kind of deeper study do we want to do?
Would it make sense to run our own benchmarks?
Or, perhaps, review other codecs?

In my opinion, anything that is sent over the network or written to block storage deserves Zstd-5 compression. But mileage may vary.

Thanks!

Best regards, Andrey Borodin.


[0]
https://indico.cern.ch/event/695984/contributions/2872933/attachments/1590457/2516802/ZSTD_and_ZLIB_Updates_-_January_20186.pdf
[1] https://facebook.github.io/zstd/
[2] Zstd gives significantly better compression at the cost of a little more CPU usage. But they both have points on the Pareto frontier.


Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Mon, May 31, 2021 at 12:33:44PM +0500, Andrey Borodin wrote:
> Would it make sense to run our own benchmarks?

Yes, I think that it could be a good idea to run some custom-made
benchmarks as that could mean different bottlenecks found when it
comes to PG.

There are a couple of factors that matter here:
- Is the algo available across a maximum of platforms?  ZLIB and LZ4
are everywhere and popular, for one.  And we already plug with them in
the builds.  No idea about the others but I can see quickly that Zstd
has support across many systems, and has a compatible license.
- Speed and CPU usage.  We should worry about that for CPU-bounded
environments.
- Compression ratio, which is just monitoring the difference in WAL.
- Effect of the level of compression perhaps?
- Use a fixed amount of WAL generated, meaning a set of repeatable SQL
queries, with one backend, no benchmarks like pgbench.
- Avoid any I/O bottleneck, so run tests on a tmpfs or ramfs.
- Avoid any extra WAL interference, like checkpoints, no autovacuum
running in parallel.

It is not easy to draw a straight line here, but one could easily say
that an algo that reduces a FPI by 90% costing two times more CPU
cycles is worse than something doing only a 70%~75% compression for
two times less CPU cycles if environments are easily constrained on
CPU.

As mentioned upthread, I'd recommend designing tests like this one, or
just reuse this one:
https://www.postgresql.org/message-id/CAB7nPqSc97o-UE5paxfMUKWcxE_JioyxO1M4A0pMnmYqAnec2g@mail.gmail.com

In terms of CPU usage, we should also monitor the user and system
times of the backend, and compare the various situations.  See patch
0003 posted here that we used for wal_compression:
https://www.postgresql.org/message-id/CAB7nPqRC20=mKgu6d2st-e11_QqqbreZg-=SF+_UYsmvwNu42g@mail.gmail.com

This just uses getrusage() to get more stats.
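
For reference, a rough standalone equivalent of what that 0003 patch does (helper name invented here): snapshot getrusage(RUSAGE_SELF) before and after the workload and report the user/system CPU deltas.

#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* invented helper: seconds of CPU time as a double */
static double
tv_seconds(struct timeval tv)
{
    return tv.tv_sec + tv.tv_usec / 1000000.0;
}

int
main(void)
{
    struct rusage before,
                after;

    getrusage(RUSAGE_SELF, &before);

    /* ... run the WAL-generating workload here ... */

    getrusage(RUSAGE_SELF, &after);
    printf("user_diff=%.6f sys_diff=%.6f\n",
           tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime),
           tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime));
    return 0;
}
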
--
Michael

Attachment

Re: Different compression methods for FPI

From
Justin Pryzby
Date:
On Tue, Jun 01, 2021 at 11:06:53AM +0900, Michael Paquier wrote:
> - Speed and CPU usage.  We should worry about that for CPU-bounded
> environments.
> - Compression ratio, which is just monitoring the difference in WAL.
> - Effect of the level of compression perhaps?
> - Use a fixed amount of WAL generated, meaning a set of repeatable SQL
> queries, with one backend, no benchmarks like pgbench.
> - Avoid any I/O bottleneck, so run tests on a tmpfs or ramfs.
> - Avoid any extra WAL interference, like checkpoints, no autovacuum
> running in parallel.

I think it's more nuanced than just finding the algorithm with the least CPU
use.  The GUC is PGC_USERSET, and it's possible that a data-loading process
might want to use zlib for better compress ratio, but an interactive OLTP
process might want to use lz4 or no compression for better responsiveness.

Reducing WAL volume during loading can be important - at one site, their SAN
was too slow to keep up during their period of heaviest loading, the
checkpointer fell behind, WAL couldn't be recycled as normal, and the (local)
WAL filesystem overflowed, and then the oversized WAL then needed to be
replayed, to the slow SAN.  A large fraction of their WAL is FPI, and
compression now made this a non-issue.  We'd happily incur 2x more CPU cost if
WAL were 25% smaller.

We're not proposing to enable it by default, so the threshold doesn't have to
be "no performance regression" relative to no compression.  The feature should
provide a faster alternative to PGLZ, and also a method with better compression
ratio to improve the case of heavy WAL writes, by reducing I/O from FPI.

In a CPU-bound environment, one would just disable WAL compression, or use LZ4
if it's cheap enough.  In the IO bound case, someone might enable zlib or zstd
compression.

I found this old thread about btree performance with wal compression (+Peter,
+Andres).

https://www.postgresql.org/message-id/flat/540584F2-A554-40C1-8F59-87AF8D623BB7%40yandex-team.ru#94c0dcaa34e3170992749f6fdc8db35c

And the differences are pretty dramatic, so I ran a single test on my PC:

CREATE TABLE t AS SELECT generate_series(1,999999)a; VACUUM t;
SET wal_compression= off;
\set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal'); begin; CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal;

Time: 1639.375 ms (00:01.639)
wal_bytes        | 20357193

pglz writes ~half as much, but takes twice as long as uncompressed:
|Time: 3362.912 ms (00:03.363)
|wal_bytes        | 11644224

zlib writes ~4x less than uncompressed, and is still much faster than pglz:
|Time: 2167.474 ms (00:02.167)
|wal_bytes        | 5611653

lz4 is as fast as uncompressed, and writes a bit more than pglz:
|Time: 1612.874 ms (00:01.613)
|wal_bytes        | 12397123

zstd(6) is slower than lz4, but compresses better than anything but zlib.
|Time: 1808.881 ms (00:01.809)
|wal_bytes        | 6395993

In this patch series, I added compression information to the errcontext from
xlog_block_info(), and allow specifying compression levels like zlib-2.  I'll
rearrange that commit earlier if we decide that's desirable to include.

Attachment

Re: Different compression methods for FPI

From
Andrey Borodin
Date:
Thanks for benchmarks, Justin!


> On 14 June 2021, at 06:24, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> The GUC is PGC_USERSET
Oh, wow, that's neat. I did not realize that we can tune this for each individual client connection. Cool!

> pglz writes ~half as much, but takes twice as long as uncompressed:
> |Time: 3362.912 ms (00:03.363)
> |wal_bytes        | 11644224
>
> zlib writes ~4x less than uncompressed, and is still much faster than pglz:
> |Time: 2167.474 ms (00:02.167)
> |wal_bytes        | 5611653
>
> lz4 is as fast as uncompressed, and writes a bit more than pglz:
> |Time: 1612.874 ms (00:01.613)
> |wal_bytes        | 12397123
>
> zstd(6) is slower than lz4, but compresses better than anything but zlib.
> |Time: 1808.881 ms (00:01.809)
> |wal_bytes        | 6395993

I was wrong about zlib: it has its point on the Pareto frontier. At least for this test.

Thanks!

Best regards, Andrey Borodin.


Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Sun, Jun 13, 2021 at 08:24:12PM -0500, Justin Pryzby wrote:
> I think it's more nuanced than just finding the algorithm with the least CPU
> use.  The GUC is PGC_USERSET, and it's possible that a data-loading process
> might want to use zlib for better compress ratio, but an interactive OLTP
> process might want to use lz4 or no compression for better responsiveness.

It seems to me that this should be a PGC_SUSET, at least?  We've had
our share of problems with assumptions behind data leaks depending on
data compressibility (see ssl_compression and the kind).

> In this patch series, I added compression information to the errcontext from
> xlog_block_info(), and allow specifying compression levels like zlib-2.  I'll
> rearrange that commit earlier if we decide that's desirable to include.

The compression level may be better if specified with a different
GUC.  That's less parsing to have within the GUC machinery.

So, how does the compression level influence those numbers?  The level
of compression used by LZ4 here is the fastest-CPU/least-compression,
same for zlib and zstd?  Could we get some data with getrusage()?  It
seems to me that if we can get the same amount of compression and CPU
usage just by tweaking the compression level, there is no need to
support more than one extra compression algorithm, easing the life of
packagers and users.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Andres Freund
Date:
Hi,

On 2021-06-15 09:50:41 +0900, Michael Paquier wrote:
> On Sun, Jun 13, 2021 at 08:24:12PM -0500, Justin Pryzby wrote:
> > I think it's more nuanced than just finding the algorithm with the least CPU
> > use.  The GUC is PGC_USERSET, and it's possible that a data-loading process
> > might want to use zlib for better compress ratio, but an interactive OLTP
> > process might want to use lz4 or no compression for better responsiveness.
> 
> It seems to me that this should be a PGC_SUSET, at least?  We've had
> our share of problems with assumptions behind data leaks depending on
> data compressibility (see ssl_compression and the kind).

-1.

Currently wal_compression has too much overhead for some workloads, but
not for others. Disallowing normal users to set it would break cases
where it's set for users that can tolerate the perf impact (which I have
done at least). And the scenarios where it can leak information that
couldn't otherwise be leaked already don't seem all that realistic?

Greetings,

Andres Freund



Re: Different compression methods for FPI

From
Justin Pryzby
Date:
On Tue, Jun 15, 2021 at 09:50:41AM +0900, Michael Paquier wrote:
> On Sun, Jun 13, 2021 at 08:24:12PM -0500, Justin Pryzby wrote:
> > I think it's more nuanced than just finding the algorithm with the least CPU
> > use.  The GUC is PGC_USERSET, and it's possible that a data-loading process
> > might want to use zlib for better compress ratio, but an interactive OLTP
> > process might want to use lz4 or no compression for better responsiveness.
> 
> It seems to me that this should be a PGC_SUSET, at least?  We've had
> our share of problems with assumptions behind data leaks depending on
> data compressibility (see ssl_compression and the kind).

It's USERSET following your own suggestion (which is a good suggestion):

On Mon, May 17, 2021 at 04:44:11PM +0900, Michael Paquier wrote:
> +       {"wal_compression_method", PGC_SIGHUP, WAL_SETTINGS,
> +           gettext_noop("Set the method used to compress full page images in the WAL."),
> +           NULL
> +       },
> +       &wal_compression_method,
> +       WAL_COMPRESSION_PGLZ, wal_compression_options,
> +       NULL, NULL, NULL
> Any reason to not make that user-settable?  If you merge that with
> wal_compression, that's not an issue.

I don't see how restricting it to superusers would mitigate the hazard at all:
If the local admin enables wal compression, then every user's data will be
compressed, and the degree of compression indicates a bit about their data,
no matter whether it's pglz or lz4.

It's probably true without compression, too - the fraction of FPW might reveal
their usage patterns.

> > In this patch series, I added compression information to the errcontext from
> > xlog_block_info(), and allow specifying compression levels like zlib-2.  I'll
> > rearrange that commit earlier if we decide that's desirable to include.
> 
> The compression level may be better if specified with a different
> GUC.  That's less parsing to have within the GUC machinery.

I'm not sure about that - then there's an interdependency between GUCs.
If zlib range is 1..9, and zstd is -50..10, then you may have to set the
compression level first, to avoid an error.  I believe there's a previous
discussion about inter-dependent GUCs, and maybe a commit fixing a problem they
caused.  But I cannot find it offhand.

> seems to me that if we can get the same amount of compression and CPU
> usage just by tweaking the compression level, there is no need to
> support more than one extra compression algorithm, easing the life of
> packagers and users.

I don't think it eases it for packagers, since I anticipate the initial patch
would support {none/pglz/lz4/zlib}.  I anticipate that zstd may not be in pg15.

The goal of the patch is to give options, and the overhead of adding both zlib
and lz4 is low.  zlib gives good compression at some CPU cost and may be
preferable for (some) DWs, and lz4 is almost certainly better (than pglz) for
OLTPs.

-- 
Justin



Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Mon, Jun 14, 2021 at 08:42:08PM -0500, Justin Pryzby wrote:
> On Tue, Jun 15, 2021 at 09:50:41AM +0900, Michael Paquier wrote:
>> +       {"wal_compression_method", PGC_SIGHUP, WAL_SETTINGS,
>> +           gettext_noop("Set the method used to compress full page images in the WAL."),
>> +           NULL
>> +       },
>> +       &wal_compression_method,
>> +       WAL_COMPRESSION_PGLZ, wal_compression_options,
>> +       NULL, NULL, NULL
>> Any reason to not make that user-settable?  If you merge that with
>> wal_compression, that's not an issue.

Hmm, yeah.  This can be read as using PGC_USERSET.  With the second
part of my sentence, I think that I imply to use PGC_SUSET and be
consistent with wal_compression, but I don't recall my mood from one
month ago :)  Sorry for the confusion.

> I don't see how restricting it to superusers would mitigate the hazard at all:
> If the local admin enables wal compression, then every user's data will be
> compressed, and the degree of compression indicates a bit about their data,
> no matter whether it's pglz or lz4.

I would vote for having some consistency with wal_compression.
Perhaps we could even revisit c2e5f4d, but I'd rather not do that.

>> The compression level may be better if specified with a different
>> GUC.  That's less parsing to have within the GUC machinery.
>
> I'm not sure about that - then there's an interdependency between GUCs.
> If zlib range is 1..9, and zstd is -50..10, then you may have to set the
> compression level first, to avoid an error.  I believe there's a previous
> discussion about inter-dependent GUCs, and maybe a commit fixing a problem they
> caused.  But I cannot find it offhand.

You cannot do cross-checks for GUCs in their assign hooks or even rely
on the order of those parameters, but you can do that in some backend
code paths.  A recent discussion on the matter is for example what led
to 79dfa8a for the GUCs controlling the min/max SSL protocols
allowed.
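
As a fragment-style sketch of that approach (the GUC names and the zlib range here are placeholders, not from the patch), the combination would be checked in the code path that consumes it rather than in the hooks:

    /* hypothetical check at the point where the settings are used */
    if (wal_compression_method == WAL_COMPRESSION_ZLIB &&
        (wal_compression_level < 1 || wal_compression_level > 9))
        ereport(ERROR,
                (errcode(ERRCODE_CONFIG_FILE_ERROR),
                 errmsg("wal_compression_level %d is out of range for zlib",
                        wal_compression_level)));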

>> seems to me that if we can get the same amount of compression and CPU
>> usage just by tweaking the compression level, there is no need to
>> support more than one extra compression algorithm, easing the life of
>> packagers and users.
>
> I don't think it eases it for packagers, since I anticipate the initial patch
> would support {none/pglz/lz4/zlib}.  I anticipate that zstd may not be in pg15.

Yes, without zstd we have all the infra to track the dependencies.

> The goal of the patch is to give options, and the overhead of adding both zlib
> and lz4 is low.  zlib gives good compression at some CPU cost and may be
> preferable for (some) DWs, and lz4 is almost certainly better (than pglz) for
> OLTPs.

Anything will be better than pglz.  I am rather confident in that.

What I am wondering is if we need to eat more bits than necessary for
the WAL record format, because we will end up supporting it until the
end of times.  We may have twenty years from now a better solution
than what has been integrated, and we may not care much about 1 extra
byte for a WAL record at this point, or perhaps we will.  From what I
hear here, there are many cases that we may care about depending on
how much CPU one is ready to pay in order to get more compression,
knowing that there are no magic solutions for something that's cheap
in CPU with a very good compression ratio or we could just go with
that.  So it seems to me that there is still an argument for adding
only one new compression method with a good range of levels, able to
support the range of cases we'd care about:
- High compression ratio but high CPU cost.
- Low compression ratio but low CPU cost.

So we could also take a decision here based on the range of
(compression,CPU) an algorithm is able to cover.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Justin Pryzby
Date:
On Tue, Jun 15, 2021 at 11:39:24AM +0900, Michael Paquier wrote:
> On Mon, Jun 14, 2021 at 08:42:08PM -0500, Justin Pryzby wrote:
>>>> It's USERSET following your own suggestion (which is a good suggestion):
> >> +       {"wal_compression_method", PGC_SIGHUP, WAL_SETTINGS,
> >> +           gettext_noop("Set the method used to compress full page images in the WAL."),
> >> +           NULL
> >> +       },
> >> +       &wal_compression_method,
> >> +       WAL_COMPRESSION_PGLZ, wal_compression_options,
> >> +       NULL, NULL, NULL
> >> Any reason to not make that user-settable?  If you merge that with
> >> wal_compression, that's not an issue.
> 
> Hmm, yeah.  This can be read as using PGC_USERSET.  With the second
> part of my sentence, I think that I imply to use PGC_SUSET and be
> consistent with wal_compression, but I don't recall my mood from one
> month ago :)  Sorry for the confusion.

Hold on - we're all confused (and I'm to blame).  The patch is changing the
existing wal_compression GUC, rather than adding wal_compression_method.
It's still SUSET, but in earlier messages, I called it USERSET, twice.

> You cannot do cross-checks for GUCs in their assign hooks or even rely
> on the order of those parameters, but you can do that in some backend
> code paths.  A recent discussion on the matter is for example what led
> to 79dfa8a for the GUCs controlling the min/max SSL protocols
> allowed.

Thank you - this is what I was remembering.

> > The goal of the patch is to give options, and the overhead of adding both zlib
> and lz4 is low.  zlib gives good compression at some CPU cost and may be
> > preferable for (some) DWs, and lz4 is almost certainly better (than pglz) for
> > OLTPs.
> 
> Anything will be better than pglz.  I am rather confident in that.
> 
> What I am wondering is if we need to eat more bits than necessary for
> the WAL record format, because we will end up supporting it until the
> end of times.

Why ?  This is WAL, not table data.  WAL depends on the major version, so
I think wal_compression could provide a different set of compression methods at
every major release?

Actually, I was just thinking that default yes/no/on/off stuff maybe should be
defined to mean "lz4" rather than meaning pglz for "backwards compatibility".

> hear here, there are many cases that we may care about depending on
> how much CPU one is ready to pay in order to get more compression,
> knowing that there are no magic solutions for something that's cheap
> in CPU with a very good compression ratio or we could just go with
> that.  So it seems to me that there is still an argument for adding
> only one new compression method with a good range of levels, able to
> support the range of cases we'd care about:
> - High compression ratio but high CPU cost.
> - Low compression ratio but low CPU cost.

I think zlib is too expensive and lz4 doesn't get enough compression,
so neither supports both cases.  In a sample of 1, zlib-1 is ~35% slower than
lz4 and writes half as much.

I think zstd could support both cases; however, I still see it as this patch's
job to provide options, rather than to definitively conclude which compression
algorithm is going to work best for everyone's use data and application.

-- 
Justin



Re: Different compression methods for FPI

From
Heikki Linnakangas
Date:
On 15/06/2021 06:42, Justin Pryzby wrote:
> On Tue, Jun 15, 2021 at 11:39:24AM +0900, Michael Paquier wrote:
>> On Mon, Jun 14, 2021 at 08:42:08PM -0500, Justin Pryzby wrote:
>>>>> It's USERSET following your own suggestion (which is a good suggestion):
>>>> +       {"wal_compression_method", PGC_SIGHUP, WAL_SETTINGS,
>>>> +           gettext_noop("Set the method used to compress full page images in the WAL."),
>>>> +           NULL
>>>> +       },
>>>> +       &wal_compression_method,
>>>> +       WAL_COMPRESSION_PGLZ, wal_compression_options,
>>>> +       NULL, NULL, NULL
>>>> Any reason to not make that user-settable?  If you merge that with
>>>> wal_compression, that's not an issue.
>>
>> Hmm, yeah.  This can be read as using PGC_USERSET.  With the second
>> part of my sentence, I think that I imply to use PGC_SUSET and be
>> consistent with wal_compression, but I don't recall my mood from one
>> month ago :)  Sorry for the confusion.
> 
> Hold on - we're all confused (and I'm to blame).  The patch is changing the
> existing wal_compression GUC, rather than adding wal_compression_method.
> It's still SUSET, but in earlier messages, I called it USERSET, twice.

See prior discussion on the security aspect: 
https://www.postgresql.org/message-id/55269915.1000309%40iki.fi. Adding 
different compression algorithms doesn't change anything from a security 
point of view AFAICS.

>>> The goal of the patch is to give options, and the overhead of adding both zlib
>>> and lz4 is low.  zlib gives good compression at some CPUcost and may be
>>> preferable for (some) DWs, and lz4 is almost certainly better (than pglz) for
>>> OLTPs.
>>
>> Anything will be better than pglz.  I am rather confident in that.
>> What I am wondering is if we need to eat more bits than necessary for
>> the WAL record format, because we will end up supporting it until the
>> end of times.
> 
> Why ?  This is WAL, not table data.  WAL depends on the major version, so
> I think wal_compression could provide a different set of compression methods at
> every major release?
> 
> Actually, I was just thinking that default yes/no/on/off stuff maybe should be
> defined to mean "lz4" rather than meaning pglz for "backwards compatibility".

+1

- Heikki



Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Tue, Jun 15, 2021 at 08:08:54AM +0300, Heikki Linnakangas wrote:
> On 15/06/2021 06:42, Justin Pryzby wrote:
>> Why ?  This is WAL, not table data.  WAL depends on the major version, so
>> I think wal_compression could provide a different set of compression methods at
>> every major release?

That may end up being annoying to the user, but perhaps you are
right that we could have more flexibility here.  That does not
change the fact that we'd better choose something wisely, able to
stick around for a couple of years at least, rather than revisiting
this choice every year.

>> Actually, I was just thinking that default yes/no/on/off stuff maybe should be
>> defined to mean "lz4" rather than meaning pglz for "backwards compatibility".
>
> +1

I am not sure that we have any reasons to be that aggressive about
this one either, and this would mean that wal_compression=on implies a
different method depending on the build options.  I would just stick
with the past, careful practice that we have to use a default
backward-compatible value as default, while being able to use a new
option.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Peter Eisentraut
Date:
On 15.06.21 07:37, Michael Paquier wrote:
>>> Actually, I was just thinking that default yes/no/on/off stuff maybe should be
>>> defined to mean "lz4" rather than meaning pglz for "backwards compatibility".
>> +1
> I am not sure that we have any reasons to be that aggressive about
> this one either, and this would mean that wal_compression=on implies a
> different method depending on the build options.  I would just stick
> with the past, careful practice that we have to use a default
> backward-compatible value as default, while being able to use a new
> option.

If we think this new thing is strictly better than the old thing, then 
why not make it the default.  What would be the gain from sticking to 
the old default?

The point that the default would depend on build options is a valid one.
I'm not sure whether it's important enough by itself.



Re: Different compression methods for FPI

From
Justin Pryzby
Date:
On Tue, Jun 15, 2021 at 07:53:26AM +0200, Peter Eisentraut wrote:
> On 15.06.21 07:37, Michael Paquier wrote:
> > > > Actually, I was just thinking that default yes/no/on/off stuff maybe should be
> > > > defined to mean "lz4" rather than meaning pglz for "backwards compatibility".
> > > +1
> > I am not sure that we have any reasons to be that aggressive about
> > this one either, and this would mean that wal_compression=on implies a
> > different method depending on the build options.  I would just stick
> > with the past, careful practice that we have to use a default
> > backward-compatible value as default, while being able to use a new
> > option.

You're right, I hadn't thought this through all the way.
There's precedent if the default is non-static (wal_sync_method).

But I think yes/on/true/1 should be a compatibility alias for a static thing,
and then the only option is pglz.

Of course, changing the default to LZ4 is still a possibility.

> If we think this new thing is strictly better than the old thing, then why
> not make it the default.  What would be the gain from sticking to the old
> default?
> 
> The point that the default would depend on build options is a valid one.
> I'm not sure whether it's important enough by itself.



Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Tue, Jun 15, 2021 at 11:14:56AM -0500, Justin Pryzby wrote:
> You're right, I hadn't thought this through all the way.
> There's precedent if the default is non-static (wal_sync_method).
>
> But I think yes/on/true/1 should be a compatibility alias for a static thing,
> and then the only option is pglz.
>
> Of course, changing the default to LZ4 is still a possibility.

We have not reached yet a conclusion with the way we are going to
parameterize all that, so let's adapt depending on the feedback.  For
now, I am really interested in this patch set, so I am going to run
some tests of my own and test more around the compression levels we
have at our disposal with the proposed algos.

What I'd like us to finish with here is one new algorithm, able
to cover a large range of cases as mentioned upthread, from
low-CPU/low-compression to high-CPU/high-compression.  It does not
seem like a good idea to be stuck with an algo that only specializes
in one or the other, for example.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Wed, Jun 16, 2021 at 09:39:57AM +0900, Michael Paquier wrote:
> What I'd like us to finish with here is one new algorithm, able
> to cover a large range of cases as mentioned upthread, from
> low-CPU/low-compression to high-CPU/high-compression.  It does not
> seem like a good idea to be stuck with an algo that only specializes
> in one or the other, for example.

So, I have been playing with that.  And the first thing I have done
before running any benchmark was checking the logic of the patch, which
I have heavily cleaned up.  This is still WIP (see the
various XXX), and it still includes all the compression methods we are
discussing here, but it allows controlling the compression level and
it is in a much better shape.  So that will help.

Attached are two patches, the WIP version I have simplified (there
were many things I found confusing, from the set of header
dependencies added across the code to unnecessary code, the set of
patches in the series as mentioned upthread, etc.) that I have used
for the benchmarks.  The second patch is a tweak to grab getrusage()
stats for the lifetime of a backend.

The benchmark I have used is rather simple, as follows, with a
value of shared_buffers large enough to fit all the pages of the
relation.  I then just mounted the instance on a tmpfs while
adapting wal_compression* for each test.  This gives a fixed amount of
FPWs generated, large enough to reduce any noise and still show
any difference:
#!/bin/bash
psql <<EOF
-- Change your conf here
SET wal_compression = zstd;
SET wal_compression_level = 20;
SELECT pg_backend_pid();
DROP TABLE IF EXISTS aa, results;
CREATE TABLE aa (a int);
CREATE TABLE results (phase text, position pg_lsn);
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
ALTER TABLE aa SET (FILLFACTOR = 50);
INSERT INTO results VALUES ('pre-insert', pg_current_wal_lsn());
INSERT INTO aa VALUES (generate_series(1,7000000)); -- 484MB
SELECT pg_size_pretty(pg_relation_size('aa'::regclass));
SELECT pg_prewarm('aa'::regclass);
CHECKPOINT;
INSERT INTO results VALUES ('pre-update', pg_current_wal_lsn());
UPDATE aa SET a = 7000000 + a;
CHECKPOINT;
INSERT INTO results VALUES ('post-update', pg_current_wal_lsn());
SELECT * FROM results;
EOF

The set of results, with various compression levels used gives me the
following (see also compression_results.sql attached):
       wal_compression        | user_diff  | sys_diff | rel_size |  fpi_size
------------------------------+------------+----------+----------+----------
 lz4 level=1                  |  24.219464 | 0.427996 | 429 MB   | 574 MB
 lz4 level=65535 (speed mode) |  24.154747 | 0.524067 | 429 MB   | 727 MB
 off                          |  24.069323 | 0.635622 | 429 MB   | 727 MB
 pglz                         |  36.123642 | 0.451949 | 429 MB   | 566 MB
 zlib level=1 (default)       |  27.454397 |  2.25989 | 429 MB   | 527 MB
 zlib level=9                 |  31.962234 | 2.160444 | 429 MB   | 527 MB
 zstd level=0                 |  24.766077 |  0.67174 | 429 MB   | 523 MB
 zstd level=20                | 114.429589 | 0.495938 | 429 MB   | 520 MB
 zstd level=3 (default)       |  25.218323 | 0.475974 | 429 MB   | 523 MB
(9 rows)

There are a couple of things that stand out here:
- zlib has a much higher user CPU time than zstd and lz4, so we could
just let this one go.
- Everything is better than pglz, that does not sound as a surprise.
- The level does not really influence the compression reached
-- lz4 aims at being fast, so its default is actually the best
compression it can do.  Using a much higher acceleration level reduces
the effects of compression to zero.
-- zstd has a high CPU consumption at high level (level > 20 is
classified as ultra, I have not tested that), without helping much
with the amount of data compressed.

It seems to me that this would leave LZ4 or zstd as obvious choices,
and that we don't really need to care about the compression level, so
let's just stick with the defaults without any extra GUCs.  Among the
remaining two I would be tempted to choose LZ4.  That's consistent
with what toast can use now.  And, even if it is a bit worse than pglz
in terms of compression in this case, it shows a CPU usage close to
the "off" case, which is nice (sys_diff for lz4 with level=1 is a
bit suspicious by the way).  zstd has merits as well at default
level.

At the end I am not surprised by this result: LZ4 is designed to be
faster, while zstd compresses more and eats more CPU.  Modern
compression algos are nice.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Andrey Borodin
Date:

> On 16 June 2021, at 12:18, Michael Paquier <michael@paquier.xyz> wrote:
>  Among the
> remaining two I would be tempted to choose LZ4.  That's consistent
> with what toast can use now.

I agree that allowing just lz4 is already a huge step ahead.
But I'd suggest supporting zstd as well. Currently we only compress 8Kb chunks, so zstd has no room to fully unfold its potential.
In WAL-G we observed ~3x improvement in network utilisation when switching from lz4 to zstd in WAL archive compression.

BTW we could get rid of the whole hole-in-a-page thing if we set lz4 as the default. This could simplify FPI code.

Thanks!

Best regards, Andrey Borodin.


Re: Different compression methods for FPI

From
Heikki Linnakangas
Date:
On 16/06/2021 11:17, Andrey Borodin wrote:
>> On 16 June 2021, at 12:18, Michael Paquier <michael@paquier.xyz> wrote:
>>   Among the
>> remaining two I would be tempted to choose LZ4.  That's consistent
>> with what toast can use now.
> 
> I agree that allowing just lz4 is already a huge step ahead.
> But I'd suggest supporting zstd as well. Currently we only compress 8Kb chunks, so zstd has no room to fully unfold its potential.
> In WAL-G we observed ~3x improvement in network utilisation when switching from lz4 to zstd in WAL archive compression.

Hmm, do we currently compress each block in a WAL record separately, for 
records that contain multiple full-page images? That could make a big 
difference e.g. for GiST index build that WAL-logs 32 pages in each 
record. If it helps the compression, we should probably start 
WAL-logging b-tree index build in larger batches, too.

- Heikki



Re: Different compression methods for FPI

From
Andrey Borodin
Date:

> On 16 June 2021, at 13:49, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Hmm, do we currently compress each block in a WAL record separately, for records that contain multiple full-page images? That could make a big difference e.g. for GiST index build that WAL-logs 32 pages in each record. If it helps the compression, we should probably start WAL-logging b-tree index build in larger batches, too.

Here's a PoC for this [0]. But benchmark results are HW-dependent.

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/41822E78-48EE-41AE-A89B-3CB76FF53980%40yandex-team.ru


Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Wed, Jun 16, 2021 at 11:49:51AM +0300, Heikki Linnakangas wrote:
> Hmm, do we currently compress each block in a WAL record separately, for
> records that contain multiple full-page images? That could make a big
> difference e.g. for GiST index build that WAL-logs 32 pages in each record.
> If it helps the compression, we should probably start WAL-logging b-tree
> index build in larger batches, too.

Each block is compressed alone, see XLogCompressBackupBlock() in
XLogRecordAssemble() where we loop through each block.  Compressing a
group of blocks would not be difficult (the refactoring may be
trickier than it looks), but I am wondering how we should treat the
case where we end up not compressing a group of blocks, as there is
a safety fallback that avoids a failure if a block cannot be
compressed.  Should we move back to the compression of individual
blocks or just log all those pages uncompressed without their holes?
I really don't expect a group of blocks to not be compressed, just
being a bit paranoid here about the fallback we'd better have.
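
Sketched in pseudo-patch form (compress_one_block() and store_image() are invented stand-ins for XLogCompressBackupBlock() and the record-assembly code, not real functions), the current per-block scheme with its fallback looks roughly like:

    for (block_id = 0; block_id <= max_block_id; block_id++)
    {
        int     clen = compress_one_block(blocks[block_id].page,
                                          blocks[block_id].len,
                                          cbuf, sizeof(cbuf));

        if (clen > 0 && clen < blocks[block_id].len)
        {
            /* worth it: store the compressed image, note the method used */
            store_image(&blocks[block_id], cbuf, clen, wal_compression);
        }
        else
        {
            /* fallback: store the page verbatim (hole removed), never fail */
            store_image(&blocks[block_id], blocks[block_id].page,
                        blocks[block_id].len, WAL_COMPRESSION_NONE);
        }
    }
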
--
Michael

Attachment

Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Wed, Jun 16, 2021 at 01:17:26PM +0500, Andrey Borodin wrote:
> I agree that allowing just lz4 is already a huge step ahead.

Yeah, I am tempted to just add LZ4 as a first step as the patch
footprint would be minimal, and we could come back to zstd once we
have more feedback from the field, if that's necessary.  As said
upthread, we have more flexibility with WAL than for the relation
data.

> But I'd suggest supporting zstd as well. Currently we only compress
> 8Kb chunks, so zstd has no room to fully unfold its potential.
> In WAL-G we observed ~3x improvement in network utilisation when
> switching from lz4 to zstd in WAL archive compression.

You mean full segments here, right?  This does not need to be in core,
except if we want to add more compression options to pg_receivewal and
its friends?  That would be a nice addition, that said.

> BTW we could get rid of the whole hole-in-a-page thing if we set
> lz4 as the default. This could simplify the FPI code.

Why would we do that?  We still need to support pglz as fallback if a
platform does not have LZ4, no?
--
Michael

Attachment

Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Thu, Jun 17, 2021 at 10:19:47AM +0900, Michael Paquier wrote:
> Yeah, I am tempted to just add LZ4 as a first step as the patch
> footprint would be minimal, and we could come back to zstd once we
> have more feedback from the field, if that's necessary.  As said
> upthread, we have more flexibility with WAL than for the relation
> data.

I have worked more on that today and finished with two patches:
- 0001 is the minimal patch to add support for LZ4.  This is in a
rather committable shape.  I noticed that we checked for an incorrect
error code in the compression and decompression paths as LZ4 APIs can
return a negative result.  There were also some extra bugs I spotted.
Its size is satisfying for what it does, and there is MSVC support
out-of-the-box:
 12 files changed, 176 insertions(+), 48 deletions(-)
- 0002 is the extra code needed to add ZSTD and do the same.  This still
requires support for MSVC and I have not checked the internals of ZSTD
to see if we do the compress/decompress calls the right way.
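
For the archives, the error-code point mentioned for 0001 boils down to something like this (a sketch with illustrative variable names, not the exact hunks): LZ4_compress_default() reports failure by returning zero or less, while LZ4_decompress_safe() returns a negative value on error.

    int     len;
    int     decomp_len;

    /* compression path: LZ4_compress_default() returns 0 (or less) on failure */
    len = LZ4_compress_default(source, dest, orig_len, dest_cap);
    if (len <= 0)
        return false;           /* fall back to the uncompressed image */

    /* decompression path: LZ4_decompress_safe() returns a negative value on error */
    decomp_len = LZ4_decompress_safe(src, dest, compressed_len, BLCKSZ);
    if (decomp_len < 0)
        return false;           /* report the block image as invalid */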

While on it, I am going to switch my buildfarm animal to use LZ4 for
toast..  Just saying.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Andrey Borodin
Date:

> On 17 Jun 2021, at 06:19, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jun 16, 2021 at 01:17:26PM +0500, Andrey Borodin wrote:
>> I agree that allowing just lz4 is already a huge step ahead.
>
> Yeah, I am tempted to just add LZ4 as a first step as the patch
> footprint would be minimal, and we could come back to zstd once we
> have more feedback from the field, if that's necessary.  As said
> upthread, we have more flexibility with WAL than for the relation
> data.
>
>> But I'd suggest supporting zstd as well. Currently we only compress
>> 8Kb chunks, and zstd has no runway to fully unwrap its potential.
>> In WAL-G we observed a ~3x improvement in network utilisation when
>> we switched from lz4 to zstd in WAL archive compression.
>
> You mean full segments here, right?  This does not need to be in core,
> except if we want to add more compression options to pg_receivewal and
> its friends?  That would be a nice addition, that said.
Konstantin, Daniil and Justin are working on compressing libpq [0]. That would make walsender compress WAL automatically.
And we (at least Dan and I) are inclined to work on compressing on-disk WAL as soon as libpq compression is committed.

Zstd is much better at compressing long data sequences than lz4. I'm sure we will need such a codec in core one day.


>> BTW we could get rid of the whole hole-in-a-page thing if we set
>> lz4 as the default. This could simplify the FPI code.
>
> Why would we do that?  We still need to support pglz as fallback if a
> platform does not have LZ4, no?
Because compressing a sequence of zeroes is cheap, even for pglz. But we still need to support 'no compression at all', and this mode benefits from hole-in-a-page a lot, as long as we send and store WAL uncompressed, of course.

Thanks!


Best regards, Andrey Borodin.

[0]https://www.postgresql.org/message-id/flat/aad16e41-b3f9-e89d-fa57-fb4c694bec25%40postgrespro.ru





Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Thu, Jun 17, 2021 at 11:45:37AM +0500, Andrey Borodin wrote:
> Konstantin, Daniil and Justin are working on compressing libpq
> [0]. That would make walsender compress WAL automatically.
> And we (at least I and Dan) are inclined to work on compressing
> on-disk WAL as soon as libpq compression will be committed.

What's the relationship between libpq and WAL?  Just the addition of
zstd in the existing dependency chain?

> Zstd is much better at compressing long data sequences than lz4.

That's my impression.

> I'm sure we will need such codec in core one day.
>
> Because compressing a sequence of zeroes is cheap, even for pglz. But
> we still need to support 'no compression at all', and this mode benefits
> from hole-in-a-page a lot, as long as we send and store WAL uncompressed,
> of course.

I am not sure, but surely this will come up in future discussions as a
separate problem.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Thu, Jun 17, 2021 at 03:44:26PM +0900, Michael Paquier wrote:
> I have worked more on that today and finished with two patches:
> - 0001 is the minimal patch to add support for LZ4.  This is in a
> rather committable shape.  I noticed that we checked for an incorrect
> error code in the compression and decompression paths as LZ4 APIs can
> return a negative result.  There were also some extra bugs I spotted.
> Its size is satisfying for what it does, and there is MSVC support
> out-of-the-box:
>  12 files changed, 176 insertions(+), 48 deletions(-)
> - 0002 is the extra code needed to add ZSTD and do the same.  This still
> requires support for MSVC and I have not checked the internals of ZSTD
> to see if we do the compress/decompress calls the right way.
>
> While on it, I am going to switch my buildfarm animal to use LZ4 for
> toast..  Just saying.

And I forgot to attach these.  (Thanks Andrey!)
--
Michael

Attachment

Re: Different compression methods for FPI

From
Heikki Linnakangas
Date:
On 17/06/2021 04:12, Michael Paquier wrote:
> On Wed, Jun 16, 2021 at 11:49:51AM +0300, Heikki Linnakangas wrote:
>> Hmm, do we currently compress each block in a WAL record separately, for
>> records that contain multiple full-page images? That could make a big
>> difference e.g. for GiST index build that WAL-logs 32 pages in each record.
>> If it helps the compression, we should probably start WAL-logging b-tree
>> index build in larger batches, too.
> 
> Each block is compressed alone, see XLogCompressBackupBlock() in
> XLogRecordAssemble() where we loop through each block.  Compressing a
> group of blocks would not be difficult (the refactoring may be
> trickier than it looks) but I am wondering how we should treat the
> case where we finish by not compressing a group of blocks as there is
> a safety fallback to not enforce a failure if a block cannot be
> compressed.  Should we move back to the compression of individual
> blocks or just log all those pages uncompressed without their holes?

Just log all the pages uncompressed in that case. If you didn't save any 
bytes by compressing the pages together, surely compressing them one by 
one would be even worse.

> I really don't expect a group of blocks to not be compressed, just
> being a bit paranoid here about the fallback we'd better have.

Yeah, there will inevitably be some common bytes in the page and tuple 
headers, if you compress multiple pages together. But I don't think the 
fallback is that important anyway. Even in the worst case, the 
compressed image of something uncompressible should not be much larger 
than the original.
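
(Both libraries expose that worst case explicitly, so a fallback buffer can be sized up front; assuming the standard APIs, roughly:)

    /* worst-case compressed sizes for an incompressible input of size src_len */
    int     lz4_bound  = LZ4_compressBound(src_len);   /* src_len + src_len/255 + 16 */
    size_t  zstd_bound = ZSTD_compressBound(src_len);  /* only slightly larger than src_len */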

- Heikki



Re: Different compression methods for FPI

From
Andrey Borodin
Date:

> On 17 Jun 2021, at 11:57, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jun 17, 2021 at 11:45:37AM +0500, Andrey Borodin wrote:
>> Konstantin, Daniil and Justin are working on compressing libpq
>> [0]. That would make walsender compress WAL automatically.
>> And we (at least I and Dan) are inclined to work on compressing
>> on-disk WAL as soon as libpq compression will be committed.
>
> What's the relationship between libpq and WAL?
walsender transmits WAL over the regular protocol. Compressing libpq leads to a huge decrease in cross-datacenter traffic for HA clusters.
In fact, that's the reason why Daniil is working on libpq compression [0]. But that's a matter for another thread.

Best regards, Andrey Borodin.

[0]
https://www.postgresql.org/message-id/flat/161609580905.28624.5304095609680400810.pgcf%40coridan.postgresql.org#be6bc3ba77ff8a293b1816f4841c59ef


Re: Different compression methods for FPI

From
Andrey Borodin
Date:

> On 17 Jun 2021, at 11:44, Michael Paquier <michael@paquier.xyz> wrote:
>
> I have worked more on that today and finished with two patches

I have some small questions.

1.
+            report_invalid_record(record, "image at %X/%X compressed with %s not supported, block %d",
+                                  (uint32) (record->ReadRecPtr >> 32),
+                                  (uint32) record->ReadRecPtr,
+                                  "lz4",
+                                  block_id);
Can we point out to the user that the problem is in the build? Also, maybe %s can be inlined to lz4 in this case.

2.
> const char *method = "???";
Maybe we can use something like "unknown" for unknown compression methods? Or is that too long a string for waldump output?

3. Can we exclude lz4 from config if it's not supported by the build?

Thanks!

Best regards, Andrey Borodin.




Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Sun, Jun 20, 2021 at 11:15:08PM +0500, Andrey Borodin wrote:
> I have some small questions.
>
> 1.
> +            report_invalid_record(record, "image at %X/%X compressed with %s not supported, block %d",
> +                                  (uint32) (record->ReadRecPtr >> 32),
> +                                  (uint32) record->ReadRecPtr,
> +                                  "lz4",
> +                                  block_id);
> Can we point out to the user that the problem is in the build?

What about the following error then?  Say:
"image at %X/%X compressed with LZ4 not supported by build, block
%d".

> Also, maybe %s can be inlined to lz4 in this case.

I think that's a remnant of the zstd part of the patch set, where I
wanted to have only one translatable message.  Sure, I can align lz4
with the message.

> 2.
> > const char *method = "???";
> Maybe we can use something like "unknown" for unknown compression
>> methods? Or is that too long a string for waldump output?

A couple of extra bytes for pg_waldump will not matter much.  Using
"unknown" is fine by me.

> 3. Can we exclude lz4 from config if it's not supported by the build?

Enforcing the absence of this value at GUC level is enough IMO:
+static const struct config_enum_entry wal_compression_options[] = {
+   {"pglz", WAL_COMPRESSION_PGLZ, false},
+#ifdef USE_LZ4
+   {"lz4", WAL_COMPRESSION_LZ4, false},
+#endif
[...]
+typedef enum WalCompression
+{
+   WAL_COMPRESSION_NONE = 0,
+   WAL_COMPRESSION_PGLZ,
+   WAL_COMPRESSION_LZ4
+} WalCompression;

It is not possible to set the GUC to that value, but it is still listed in
the enum so we can track it.  That's the same thing as
default_toast_compression with its ToastCompressionId and its
default_toast_compression_options.
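
For reference, the matching entry in guc.c would presumably look something like the following (a sketch; the description string is made up here), and on a build without --with-lz4 the "lz4" keyword is simply absent from wal_compression_options, so SET wal_compression = lz4 is rejected with the usual "invalid value for parameter" error:

/* sketch of the matching entry in the config_enum table of guc.c */
{
    {"wal_compression", PGC_SUSET, WAL_SETTINGS,
     gettext_noop("Compresses full-page writes written in WAL file with specified method."),
     NULL},
    &wal_compression,
    WAL_COMPRESSION_NONE, wal_compression_options,
    NULL, NULL, NULL
},
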
--
Michael

Attachment

Re: Different compression methods for FPI

From
Justin Pryzby
Date:
On Tue, Jun 22, 2021 at 09:11:26AM +0900, Michael Paquier wrote:
> On Sun, Jun 20, 2021 at 11:15:08PM +0500, Andrey Borodin wrote:
> > I have some small questions.
> > 
> > 1.
> > +            report_invalid_record(record, "image at %X/%X compressed with %s not supported, block %d",
> > +                                  (uint32) (record->ReadRecPtr >> 32),
> > +                                  (uint32) record->ReadRecPtr,
> > +                                  "lz4",
> > +                                  block_id);
> > Can we point out to the user that the problem is in the build?
> 
> What about the following error then?  Say:
> "image at %X/%X compressed with LZ4 not supported by build, block
> %d".

The two similar, existing messages are:

+#define NO_LZ4_SUPPORT() \
+       ereport(ERROR, \
+                       (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
+                        errmsg("unsupported LZ4 compression method"), \
+                        errdetail("This functionality requires the server to be built with lz4 support."), \
+                        errhint("You need to rebuild PostgreSQL using --with-lz4.")))

src/bin/pg_dump/pg_backup_archiver.c:                           fatal("cannot restore from compressed archive (compression not supported in this installation)");
src/bin/pg_dump/pg_backup_archiver.c:           pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available");
src/bin/pg_dump/pg_dump.c:              pg_log_warning("requested compression not available in this installation -- archive will be uncompressed");

-- 
Justin



Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Mon, Jun 21, 2021 at 07:19:27PM -0500, Justin Pryzby wrote:
> The two similar, existing messages are:
>
> +#define NO_LZ4_SUPPORT() \
> +       ereport(ERROR, \
> +                       (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
> +                        errmsg("unsupported LZ4 compression method"), \
> +                        errdetail("This functionality requires the server to be built with lz4 support."), \
> +                        errhint("You need to rebuild PostgreSQL using --with-lz4.")))
>
> src/bin/pg_dump/pg_backup_archiver.c:                           fatal("cannot restore from compressed archive (compression not supported in this installation)");
> src/bin/pg_dump/pg_backup_archiver.c:           pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available");
> src/bin/pg_dump/pg_dump.c:              pg_log_warning("requested compression not available in this installation -- archive will be uncompressed");

The difference between the first message and the rest is that the
backend has much more room in terms of error verbosity, while
xlogreader.c also needs to worry about the frontend.  In this case, we
need to worry about the block involved and its LSN.  Perhaps you have
a suggestion?
--
Michael

Attachment

Re: Different compression methods for FPI

From
Justin Pryzby
Date:
On Tue, May 25, 2021 at 12:05:19PM +0530, Dilip Kumar wrote:
> +++ b/src/test/recovery/t/011_crash_recovery.pl
> @@ -14,7 +14,7 @@ use Config;
>  plan tests => 3;
> 
>  my $node = get_new_node('primary');
> -$node->init(allows_streaming => 1);
> +$node->init();
>  $node->start;
> 
> How is this change relevant?

It's necessary for the tests to pass - see the prior discussions.
Revert them and the tests fail.

time make -C src/test/recovery check
#   Failed test 'new xid after restart is greater'

@Michael: I assume that if you merge this patch, you'd set your animals to use
wal_compression=lz4, and then they would fail the recovery tests.  So the
patches that you say are unrelated still seem to me to be a prerequisite.

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Subject: [PATCH v8 2/9] Run 011_crash_recovery.pl with wal_level=minimal

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Subject: [PATCH v8 3/9] Make sure published XIDs are persistent

+/* compression methods supported */
+#define BKPIMAGE_COMPRESS_PGLZ 0x04
+#define BKPIMAGE_COMPRESS_ZLIB 0x08
+#define BKPIMAGE_COMPRESS_LZ4  0x10
+#define BKPIMAGE_COMPRESS_ZSTD 0x20
+#define        BKPIMAGE_IS_COMPRESSED(info) \
+       ((info & (BKPIMAGE_COMPRESS_PGLZ | BKPIMAGE_COMPRESS_ZLIB | \
+                         BKPIMAGE_COMPRESS_LZ4 | BKPIMAGE_COMPRESS_ZSTD)) != 0)

You encouraged saving bits here, so I'm surprised to see that your patches
use one bit per compression method: 2 bits to support no/pglz/lz4, 3 to add
zstd, and the previous patch used 4 bits to also support zlib.

There are spare bits available for that, but now there can be an inconsistency
if two bits are set.  Also, 2 bits could support 4 methods (including "no").
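
For comparison, a two-bit field would look roughly like this (purely illustrative, not what the posted patches do):

/* hypothetical: store the method in two bits of bimg_info instead */
#define BKPIMAGE_COMPRESS_MASK      0x0C    /* bits 2-3 */
#define BKPIMAGE_COMPRESS_NONE      0x00
#define BKPIMAGE_COMPRESS_PGLZ      0x04
#define BKPIMAGE_COMPRESS_LZ4       0x08
#define BKPIMAGE_COMPRESS_ZSTD      0x0C
#define BKPIMAGE_COMPRESSION(info)      ((info) & BKPIMAGE_COMPRESS_MASK)
#define BKPIMAGE_IS_COMPRESSED(info)    (BKPIMAGE_COMPRESSION(info) != BKPIMAGE_COMPRESS_NONE)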

-- 
Justin



Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Mon, Jun 21, 2021 at 10:13:58PM -0500, Justin Pryzby wrote:
> @Michael: I assume that if you merge this patch, you'd set your animals to use
> wal_compression=lz4, and then they would fail the recovery tests.

Yes, I'd like to do that on my animal dangomushi.

> So the patches that you say are unrelated still seem to me to be a
> prerequisite.

I may be missing something, of course, but I still don't see why
that's necessary?  We don't get any test failures on HEAD by switching
wal_compression to on, no?  That's easy enough to test with a two-line
change that changes the default.

> +/* compression methods supported */
> +#define BKPIMAGE_COMPRESS_PGLZ 0x04
> +#define BKPIMAGE_COMPRESS_ZLIB 0x08
> +#define BKPIMAGE_COMPRESS_LZ4  0x10
> +#define BKPIMAGE_COMPRESS_ZSTD 0x20
> +#define        BKPIMAGE_IS_COMPRESSED(info) \
> +       ((info & (BKPIMAGE_COMPRESS_PGLZ | BKPIMAGE_COMPRESS_ZLIB | \
> +                         BKPIMAGE_COMPRESS_LZ4 | BKPIMAGE_COMPRESS_ZSTD)) != 0)
>
> You encouraged saving bits here, so I'm surprised to see that your patches
> use one bit per compression method: 2 bits to support no/pglz/lz4, 3 to add
> zstd, and the previous patch used 4 bits to also support zlib.

Yeah, I know.  I have just finished with that to get something
readable for the sake of the tests.  As you say, the point is moot
if we just add one new method, anyway, as we need just one new bit.
And that's what I would like to do for v15 with LZ4 as the resulting
patch is simple.  It would be an idea to discuss more compression
methods here once we hear more from users when this is released in the
field, re-considering at this point if more is necessary or not.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Justin Pryzby
Date:
On Tue, Jun 22, 2021 at 12:53:46PM +0900, Michael Paquier wrote:
> > So the patches that you say are unrelated still seem to me to be a
> > prerequisite.
> 
> I may be missing something, of course, but I still don't see why
> that's necessary?  We don't get any test failures on HEAD by switching
> wal_compression to on, no?  That's easy enough to test with a two-line
> change that changes the default.

Curious.  I found that before a4d75c86bf, there was an issue without the
"extra" patches.

|commit a4d75c86bf15220df22de0a92c819ecef9db3849
|Author: Tomas Vondra <tomas.vondra@postgresql.org>
|Date:   Fri Mar 26 23:22:01 2021 +0100
|
|    Extended statistics on expressions

I have no idea why that patch changes the behavior, but before a4d7, this patch
series failed like:

|$ time time make -C src/test/recovery check
...
|#   Failed test 'new xid after restart is greater'
|#   at t/011_crash_recovery.pl line 53.
|#     '539'
|#         >
|#     '539'
|
|#   Failed test 'xid is aborted after crash'
|#   at t/011_crash_recovery.pl line 57.
|#          got: 'committed'
|#     expected: 'aborted'
|# Looks like you failed 2 tests of 3.
|t/011_crash_recovery.pl .............. Dubious, test returned 2 (wstat 512, 0x200)
|Failed 2/3 subtests 

I checked that my most recent WAL compression patch applied on top of
a4d75c86bf works ok without the "extra" patches but fails when applied to
a4d75c86bf~1.

I think Andrey has been saying that since this already fails with PGLZ wal
compression, we could consider this to be a pre-existing problem.  I'm not
thrilled with that interpretation, but it's not wrong.

-- 
Justin



Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Sat, Jun 26, 2021 at 06:11:26PM -0500, Justin Pryzby wrote:
> Curious.  I found that before a4d75c86bf, there was an issue without the
> "extra" patches.

Is this issue different than the XID problem not matching when using
wal_level = minimal in test 011_crash_recovery.pl?  I am not sure to
understand if you are

> I have no idea why that patch changes the behavior, but before a4d7, this patch
> series failed like:

Not seeing the link here.  011_crash_recovery.pl has nothing to do
with extended statistics, normally.

> I think Andrey has been saying that since this already fails with PGLZ wal
> compression, we could consider this to be a pre-existing problem.  I'm not
> thrilled with that interpretation, but it's not wrong.

Removing "allows_streaming => 1" in 011_crash_recovery.pl is enough to
make the test fail on HEAD.  And the test fails equally with or
without any changes related to wal_compression, so adding or removing
options to wal_compression is not going to change anything with that.
There is simply no relationship I can spot, though I may of course be
missing an argument here.  Let's just discuss this recovery issue where
it should be discussed (these are patches 0002 and 0003 in the patch
v9 sent upthread):
https://www.postgresql.org/message-id/20210308.173242.463790587797836129.horikyota.ntt%40gmail.com
--
Michael

Attachment

Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Mon, Jun 28, 2021 at 04:36:42PM +0900, Michael Paquier wrote:
> Is this issue different than the XID problem not matching when using
> wal_level = minimal in test 011_crash_recovery.pl?  I am not sure to
> understand if you are

(This paragraph has been cut in half)
referring to a different problem or not.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Andrey Borodin
Date:

> On 28 Jun 2021, at 12:36, Michael Paquier <michael@paquier.xyz> wrote:
>
> Removing "allows_streaming => 1" in 011_crash_recovery.pl is enough to
> make the test fail on HEAD.  And the test fails equally without or
> without any changes related to wal_compression, so adding or removing
> options to wal_compression is not going to change anything with that.
> There is simply no relationship I can spot, though I may be missing of
> course an argument here.
There is no relationship at all. That test 011_crash_recovery.pl is failing depending on random fluctuations in WAL size. Currently, it does not affect this thread in any way (except for the confusion I made by importing a patch with a fix from another thread, sorry).


> Let's just discuss this recovery issue where
> it should be discussed (these are patches 0002 and 0003 in the patch
> v9 sent upthread):
> https://www.postgresql.org/message-id/20210308.173242.463790587797836129.horikyota.ntt%40gmail.com
+1.

Best regards, Andrey Borodin.


Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Tue, Jun 22, 2021 at 09:11:26AM +0900, Michael Paquier wrote:
> What about the following error then?  Say:
> "image at %X/%X compressed with LZ4 not supported by build, block
> %d".
>
>> Also, maybe %s can be inlined to lz4 in this case.
>
> I think that's a remnant of the zstd part of the patch set, where I
> wanted to have only one translatable message.  Sure, I can align lz4
> with the message.
>
>> 2.
>> > const char *method = "???";
>> Maybe we can use something like "unknown" for unknown compression
>> methods? Or is that too long a string for waldump output?
>
> A couple of extra bytes for pg_waldump will not matter much.  Using
> "unknown" is fine by me.

I have kept 1. as-is for translatability, and included 2.

Now that v15 is open for business, I have looked again at this stuff
this morning and committed the LZ4 part after some adjustments:
- Avoid the use of default in the switches used for the compression,
so that the compiler warns when introducing a new value in the enum
used for wal_compression.
- Switched to LZ4_compress_default().  LZ4_compress_fast() could be
used with LZ4_ACCELERATION_DEFAULT, but toast_compression.c uses the
former.

I got check-world tested with wal_compression = {lz4,pglz,off}, and
did some manual checks, including some stuff without --with-lz4.
Attached is the rest of the patch set for zstd, for reference, rebased
with the changes that have been committed.  This still requires proper
support in the MSVC scripts.

My buildfarm machine has been changed to use wal_compression = lz4,
while on it for HEAD runs.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Andrey Borodin
Date:

> On 29 Jun 2021, at 08:41, Michael Paquier <michael@paquier.xyz> wrote:
>
> Now that v15 is open for business, I have looked again at this stuff
> this morning and committed the LZ4 part

That's great, thanks Michael!

Best regards, Andrey Borodin.




Re: Different compression methods for FPI

From
Justin Pryzby
Date:
On Sun, Jun 13, 2021 at 08:24:12PM -0500, Justin Pryzby wrote:
> In this patch series, I added compression information to the errcontext from
> xlog_block_info(), and allow specifying compression levels like zlib-2.  I'll
> rearrange that commit earlier if we decide that's desirable to include.

4035cd5d4 added wal_compression=lz4 and
e9537321a added wal_compression=zstd

Since 4035cd5d4, pg_waldump has shown compression info (and walinspect
shows the same since 2258e76f9).

But if you try to replay WAL on a server which wasn't compiled with
support for the requisite compression methods, it's a bit crummy that it
doesn't include in the error message the reason *why* it failed to
restore the image, if that's caused by the missing compression method.

This hunk was from my patch in June, 2021, but wasn't included in the
final patches.  This would include the compression algorithm info: 1)
when failing to apply WAL in rm_redo_error_callback(); and 2) in
wal_debug.

< 2022-08-31 21:37:53.325 CDT  >FATAL:  failed to restore block image
< 2022-08-31 21:37:53.325 CDT  >CONTEXT:  WAL redo at 1201/1B931F50 for XLOG/FPI_FOR_HINT: ; blkref #0: rel 1663/16888/164320567, blk 8186 FPW, compression method: zstd

In addition to cases where someone re/compiles postgres locally, I guess this
would also improve the situation for PITR and standbys, which might reasonably
be run on a different OS, with different OS packages, or with postgres compiled
separately.

> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index 17eeff0720..1ccc51575a 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -10470,7 +10470,17 @@ xlog_block_info(StringInfo buf, XLogReaderState *record)
>                               rnode.spcNode, rnode.dbNode, rnode.relNode,
>                               blk);
>          if (XLogRecHasBlockImage(record, block_id))
> -            appendStringInfoString(buf, " FPW");
> +        {
> +            int compression =
> +                BKPIMAGE_IS_COMPRESSED(record->blocks[block_id].bimg_info) ?
> +                BKPIMAGE_COMPRESSION(record->blocks[block_id].bimg_info) : -1;
> +            if (compression == -1)
> +                appendStringInfoString(buf, " FPW");
> +            else
> +                appendStringInfo(buf, " FPW, compression method %d/%s",
> +                    compression, wal_compression_name(compression));
> +        }
> +


Attachment

Re: Different compression methods for FPI

From
Justin Pryzby
Date:
On Fri, Sep 02, 2022 at 06:55:11AM -0500, Justin Pryzby wrote:
> On Sun, Jun 13, 2021 at 08:24:12PM -0500, Justin Pryzby wrote:
> > In this patch series, I added compression information to the errcontext from
> > xlog_block_info(), and allow specifying compression levels like zlib-2.  I'll
> > rearrange that commit earlier if we decide that's desirable to include.
> 
> 4035cd5d4 added wal_compression=lz4 and
> e9537321a added wal_compression=zstd
> 
> Since 4035cd5d4, pg_waldump has shown compression info (and walinspect
> shows the same since 2258e76f9).
> 
> But if you try to replay WAL on a server which wasn't compiled with
> support for the requisite compression methods, it's a bit crummy that it
> doesn't include in the error message the reason *why* it failed to
> restore the image, if that's caused by the missing compression method.

That's also hitting an elog().

2022-09-04 15:56:29.916 CDT startup[2625] FATAL:  XX000: failed to restore block image
2022-09-04 15:56:29.916 CDT startup[2625] DETAIL:  image at 0/1D11CB8 compressed with zstd not supported by build, block 0
2022-09-04 15:56:29.916 CDT startup[2625] CONTEXT:  WAL redo at 0/1D11CB8 for Heap/DELETE: off 50 flags 0x00 KEYS_UPDATED; blkref #0: rel 1663/16384/2610, blk 4 FPW
2022-09-04 15:56:29.916 CDT startup[2625] LOCATION:  XLogReadBufferForRedoExtended, xlogutils.c:396

(gdb) bt
#0  report_invalid_record (state=0x555555e33ff0, fmt=0x555555b1c1a8 "image at %X/%X compressed with %s not supported by build, block %d") at xlogreader.c:74
#1  0x00005555556beeec in RestoreBlockImage (record=record@entry=0x555555e33ff0, block_id=block_id@entry=0 '\000', page=page@entry=0x7fffee9bdc00 "") at xlogreader.c:2078
#2  0x00005555556c5d39 in XLogReadBufferForRedoExtended (record=record@entry=0x555555e33ff0, block_id=block_id@entry=0 '\000', mode=mode@entry=RBM_NORMAL, get_cleanup_lock=get_cleanup_lock@entry=false, buf=buf@entry=0x7fffffffd760) at xlogutils.c:395
#3  0x00005555556c5e4a in XLogReadBufferForRedo (record=record@entry=0x555555e33ff0, block_id=block_id@entry=0 '\000', buf=buf@entry=0x7fffffffd760) at xlogutils.c:320
#4  0x000055555565bd5b in heap_xlog_delete (record=0x555555e33ff0) at heapam.c:9032
#5  0x00005555556622b7 in heap_redo (record=<optimized out>) at heapam.c:9836
#6  0x00005555556c15ed in ApplyWalRecord (xlogreader=0x555555e33ff0, record=record@entry=0x7fffee2b6820, replayTLI=replayTLI@entry=0x7fffffffd870) at ../../../../src/include/access/xlog_internal.h:379
#7  0x00005555556c4c30 in PerformWalRecovery () at xlogrecovery.c:1725
#8  0x00005555556b7f23 in StartupXLOG () at xlog.c:5291
#9  0x0000555555ac4491 in InitPostgres (in_dbname=in_dbname@entry=0x555555e09390 "postgres", dboid=dboid@entry=0, username=username@entry=0x555555dedda0 "pryzbyj", useroid=useroid@entry=0, load_session_libraries=<optimized out>, override_allow_connections=override_allow_connections@entry=false, out_dbname=0x0) at postinit.c:731
#10 0x000055555598471f in PostgresMain (dbname=0x555555e09390 "postgres", username=username@entry=0x555555dedda0 "pryzbyj") at postgres.c:4085
#11 0x00005555559851b0 in PostgresSingleUserMain (argc=5, argv=0x555555de7530, username=0x555555dedda0 "pryzbyj") at postgres.c:3986
#12 0x0000555555840533 in main (argc=5, argv=0x555555de7530) at main.c:194

I guess it should be promoted to an ereport(), since it's now a
user-facing error rather than an internal one.

$ ./tmp_install.without-zstd/usr/local/pgsql/bin/postgres -D ./src/test/regress/tmp_check/data
2022-09-04 15:28:37.446 CDT postmaster[29964] LOG:  starting PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
2022-09-04 15:28:37.446 CDT postmaster[29964] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2022-09-04 15:28:37.528 CDT postmaster[29964] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-09-04 15:28:37.587 CDT startup[29972] LOG:  database system was interrupted while in recovery at 2022-09-04 15:27:44 CDT
2022-09-04 15:28:37.587 CDT startup[29972] HINT:  This probably means that some data is corrupted and you will have to use the last backup for recovery.
2022-09-04 15:28:38.010 CDT startup[29972] LOG:  database system was not properly shut down; automatic recovery in progress
2022-09-04 15:28:38.038 CDT startup[29972] LOG:  redo starts at 0/1D118C0
2022-09-04 15:28:38.039 CDT startup[29972] LOG:  could not stat file "pg_tblspc/16502": No such file or directory
2022-09-04 15:28:38.039 CDT startup[29972] CONTEXT:  WAL redo at 0/1D11970 for Tablespace/DROP: 16502
2022-09-04 15:28:38.039 CDT startup[29972] FATAL:  failed to restore block image
2022-09-04 15:28:38.039 CDT startup[29972] DETAIL:  image at 0/1D11CB8 compressed with zstd not supported by build, block 0
2022-09-04 15:28:38.039 CDT startup[29972] CONTEXT:  WAL redo at 0/1D11CB8 for Heap/DELETE: off 50 flags 0x00 KEYS_UPDATED; blkref #0: rel 1663/16384/2610, blk 4 FPW

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 0cda22597fe..01c7454bcc7 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -393,7 +393,11 @@ XLogReadBufferForRedoExtended(XLogReaderState *record,
                                       prefetch_buffer);
         page = BufferGetPage(*buf);
         if (!RestoreBlockImage(record, block_id, page))
-            elog(ERROR, "failed to restore block image");
+            ereport(ERROR,
+                    errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                    errmsg("failed to restore block image"),
+                    errdetail("%s", record->errormsg_buf));
+
 
         /*
          * The page may be uninitialized. If so, we can't set the LSN because



Re: Different compression methods for FPI

From
Matthias van de Meent
Date:
Hi,

I have a small question for those involved:

Context: I'm trying to get the smallest BKPIMAGE size possible
regardless of CPU cost, which means I'm trying multiple compressions
to get the smallest data possible with the options available. This
means ignoring the wal_compression GUC in XLogCompressBackupBlock and
brute-forcing the smallest compression available, potentially layering
compression methods, and returning the bimg_info compression flags
that will be stored in XLogCompressBackupBlock and used to decompress
the block's data.

Is there a prescribed order of compression algorithms to apply when
(de)compressing full page images in WAL, when the bimg_info has more
than one BKPIMAGE_COMPRESS_*-flags set? That is, when I want to check
the compression of the block image with both ZSTD and LZ4, which order
is the ordering indicated by bimg_info = (COMPRESS_LZ4 |
COMPRESS_ZSTD)?

Kind regards,

Matthias van de Meent



Re: Different compression methods for FPI

From
Justin Pryzby
Date:
On Mon, Sep 05, 2022 at 02:45:57PM +0200, Matthias van de Meent wrote:
> Hi,
> 
> I have a small question for those involved:

I suggest using "reply all".

> Context: I'm trying to get the smallest BKPIMAGE size possible
> regardless of CPU cost, which means I'm trying multiple compressions
> to get the smallest data possible with the options available. This
> means ignoring the wal_compression GUC in XLogCompressBackupBlock and
> brute-forcing the smallest compression available, potentially layering
> compression methods, and returning the bimg_info compression flags
> that will be stored in XLogCompressBackupBlock and used to decompress
> the block's data.

I think once you apply one compression method/algorithm, you shouldn't
expect other "layered" methods to be able to compress it at all.  I
think you'll get better compression by using a higher compression level
in zstd (or zlib) than with any combination of methods.  A patch for
configurable compression level was here:

https://postgr.es/m/20220222231948.GJ9008@telsasoft.com
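
(For illustration, plumbing a level into the zstd branch of XLogCompressBackupBlock() would be essentially a one-liner; wal_compression_level below is a made-up name for the sketch:)

    size_t      len;

    /* higher levels trade CPU for a smaller image */
    len = ZSTD_compress(dest, dest_cap, source, orig_len,
                        wal_compression_level);    /* hypothetical GUC */
    if (ZSTD_isError(len))
        return false;               /* fall back to the uncompressed image */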

> Is there a prescribed order of compression algorithms to apply when
> (de)compressing full page images in WAL, when the bimg_info has more
> than one BKPIMAGE_COMPRESS_*-flags set? That is, when I want to check
> the compression of the block image with both ZSTD and LZ4, which order
> is the ordering indicated by bimg_info = (COMPRESS_LZ4 |
> COMPRESS_ZSTD)?

There's currently a separate bit for each method, but it's not supported
to "stack" them (see also the "Save bits" patch, above).

This came up before when Greg asked about it.
https://www.postgresql.org/message-id/20210622031358.GF29179@telsasoft.com
https://www.postgresql.org/message-id/20220131222800.GY23027@telsasoft.com

-- 
Justin



Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Sun, Sep 04, 2022 at 07:23:20PM -0500, Justin Pryzby wrote:
> That's also hitting an elog().
>
> 2022-09-04 15:56:29.916 CDT startup[2625] FATAL:  XX000: failed to restore block image
> 2022-09-04 15:56:29.916 CDT startup[2625] DETAIL:  image at 0/1D11CB8 compressed with zstd not supported by build, block 0
> 2022-09-04 15:56:29.916 CDT startup[2625] CONTEXT:  WAL redo at 0/1D11CB8 for Heap/DELETE: off 50 flags 0x00 KEYS_UPDATED; blkref #0: rel 1663/16384/2610, blk 4 FPW
> 2022-09-04 15:56:29.916 CDT startup[2625] LOCATION:  XLogReadBufferForRedoExtended, xlogutils.c:396
>
> I guess it should be promoted to an ereport(), since it's now a
> user-facing error rather than an internal one.
>
> diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
> index 0cda22597fe..01c7454bcc7 100644
> --- a/src/backend/access/transam/xlogutils.c
> +++ b/src/backend/access/transam/xlogutils.c
> @@ -393,7 +393,11 @@ XLogReadBufferForRedoExtended(XLogReaderState *record,
>                                        prefetch_buffer);
>          page = BufferGetPage(*buf);
>          if (!RestoreBlockImage(record, block_id, page))
> -            elog(ERROR, "failed to restore block image");
> +            ereport(ERROR,
> +                    errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                    errmsg("failed to restore block image"),
> +                    errdetail("%s", record->errormsg_buf));
> +

Yes, you are right here.  elog()'s should never be used for things
that could be triggered by the user, even if this one depends on the
build options.  I think that the error message ought to be updated as
"could not restore block image" instead, to be more in line with the
project policy.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Tue, Sep 06, 2022 at 03:47:05PM +0900, Michael Paquier wrote:
> On Sun, Sep 04, 2022 at 07:23:20PM -0500, Justin Pryzby wrote:
>>          page = BufferGetPage(*buf);
>>          if (!RestoreBlockImage(record, block_id, page))
>> -            elog(ERROR, "failed to restore block image");
>> +            ereport(ERROR,
>> +                    errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> +                    errmsg("failed to restore block image"),
>> +                    errdetail("%s", record->errormsg_buf));
>> +
>
> Yes, you are right here.  elog()'s should never be used for things
> that could be triggered by the user, even if this one depends on the
> build options.  I think that the error message ought to be updated as
> "could not restore block image" instead, to be more in line with the
> project policy.

In the end, I have not taken the approach of using errdetail() for this
problem, as errormsg_buf is designed to include an error string.  So I
have finished by refining the error messages generated in
RestoreBlockImage(), consuming them with an ERRCODE_INTERNAL_ERROR.
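
The consuming side then becomes something along these lines (a sketch of the idea, not the exact committed hunk):

        if (!RestoreBlockImage(record, block_id, page))
            ereport(ERROR,
                    (errcode(ERRCODE_INTERNAL_ERROR),
                     errmsg_internal("%s", record->errormsg_buf)));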

This approach addresses a second issue, actually, because we have
never provided any context when there are inconsistencies in the
decoded record for max_block_id, has_image or in_use when restoring a
block image.  This one is older than v15, but we have not received
complaints about that for ~14 as far as I know, so I would leave this
change for HEAD and REL_15_STABLE.
--
Michael

Attachment

Re: Different compression methods for FPI

From
Justin Pryzby
Date:
On Wed, Sep 07, 2022 at 03:29:08PM +0900, Michael Paquier wrote:
> At the end, I have not taken the approach to use errdetail() for this
> problem as errormsg_buf is designed to include an error string.  So, I
> have finished by refining the error messages generated in
> RestoreBlockImage(), consuming them with an ERRCODE_INTERNAL_ERROR.

The cases involving max_block_id, has_image and in_use are still "can't
happen" cases, which used to hit elog(), and INTERNAL_ERROR sounds right
for them.

But that's also what'll happen when attempting to replay WAL using a postgres
build which doesn't support the necessary compression method.  I ran into this
while compiling postgres locally while reporting the recovery_prefetch bug,
when I failed to compile --with-zstd.  Note that basebackup does:

src/backend/backup/basebackup_zstd.c-   ereport(ERROR,
src/backend/backup/basebackup_zstd.c-                   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
src/backend/backup/basebackup_zstd.c:                    errmsg("zstd compression is not supported by this build")));
src/backend/backup/basebackup_zstd.c-   return NULL;                            /* keep compiler quiet */

-- 
Justin



Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Wed, Sep 07, 2022 at 03:57:29AM -0500, Justin Pryzby wrote:
> But that's also what'll happen when attempting to replay WAL using a postgres
> build which doesn't support the necessary compression method.  I ran into this
> while compiling postgres locally while reporting the recovery_prefetch bug,
> when I failed to compile --with-zstd.

Well, I am aware of that as that's how I have tested my change.  I
think that the case you are mentioning is really different than this
change, though.  The case you are mentioning gets triggered with the
server-side compression of pg_basebackup with a client application,
while the case of a block image restored can only happen when using
inconsistent build options between a primary and a standby (at least
in core, for the code paths touched by this patch).  Before posting my
previous patch, I have considered a few options:
- Extend errormsg_buf with an error code, but the frontend does not
care about that.
- Make RestoreBlockImage() a backend-only routine and provide a better
error control without filling in errormsg_buf, but I'd like to believe
that this code is useful for some frontend code even if core does not
use it yet in a frontend context.
- Change the signature of RestoreBlockImage() to return an enum with
at least a tri state instead of a boolean.  For this one I could not
convince myself that this is worth the complexity, as we are talking
about inconsistent build options between nodes doing physical
replication, and the error message is the useful piece to know what's
happening (frontends are only going to consume the error message
anyway).
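
For the record, the third option would have looked something like the following (sketch only, never written out):

typedef enum RestoreBlockImageResult
{
    RESTORE_OK,                     /* image copied/decompressed into the page */
    RESTORE_DECOMPRESS_FAILURE,     /* corrupt data, or method not built in */
    RESTORE_INVALID_RECORD          /* inconsistent decoded record state */
} RestoreBlockImageResult;

extern RestoreBlockImageResult RestoreBlockImage(XLogReaderState *record,
                                                 uint8 block_id, char *page);
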
--
Michael

Attachment

Re: Different compression methods for FPI

From
Michael Paquier
Date:
On Wed, Sep 07, 2022 at 07:02:07PM +0900, Michael Paquier wrote:
> Before posting my previous patch, I have considered a few options:
> - Extend errormsg_buf with an error code, but the frontend does not
> care about that.
> - Make RestoreBlockImage() a backend-only routine and provide a better
> error control without filling in errormsg_buf, but I'd like to believe
> that this code is useful for some frontend code even if core does not
> use it yet in a frontend context.
> - Change the signature of RestoreBlockImage() to return an enum with
> at least a tri state instead of a boolean.  For this one I could not
> convince myself that this is worth the complexity, as we are talking
> about inconsistent build options between nodes doing physical
> replication, and the error message is the useful piece to know what's
> happening (frontends are only going to consume the error message
> anyway).

After a second look, I was not feeling enthusiastic about adding more
complications in this code path for this case, so I have finished by
applying my previous patch to address this open item.

I am wondering if there is a use-case for backpatching something like
that to older versions though?  FPW compression is available since 9.5
but the code has never consumed the error message produced by
RestoreBlockImage() when pglz fails to decompress an image, and there
is equally no exact information provided when the status data of the
block in the record is incorrect, even if recovery provides some
context associated to the record being replayed.
--
Michael

Attachment