Re: Different compression methods for FPI - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Different compression methods for FPI
Msg-id YLWWPaq/KnVS24J4@paquier.xyz
In response to Re: Different compression methods for FPI  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: Different compression methods for FPI
List pgsql-hackers
On Mon, May 31, 2021 at 12:33:44PM +0500, Andrey Borodin wrote:
> Would it make sense to run our own benchmarks?

Yes, I think that it could be a good idea to run some custom-made
benchmarks, as they could reveal different bottlenecks when it comes
to PG.

There are a couple of factors that matter here:
- Is the algo available on as many platforms as possible?  ZLIB and LZ4
are everywhere and popular, for one, and we already link with them in
the builds.  No idea about the others, but a quick look shows that Zstd
has support across many systems, and has a compatible license.
- Speed and CPU usage.  We should worry about that for CPU-bound
environments.
- Compression ratio, which just means measuring the difference in the
amount of WAL generated.
- Effect of the level of compression perhaps?
- Use a fixed amount of WAL generated, meaning a set of repeatable SQL
queries, with one backend, no benchmarks like pgbench.
- Avoid any I/O bottleneck, so run tests on a tmpfs or ramfs.
- Avoid any extra WAL interference, like checkpoints or autovacuum
running in parallel.
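
As a rough illustration of the ratio and CPU factors above, here is a
minimal sketch that compresses 8kB buffers with zlib at several levels
and reports the compressed/original ratio and the CPU time spent.  It
only uses Python's standard library, so LZ4 and Zstd are not covered,
and make_page() is a hypothetical stand-in for a full-page image, not
real WAL data.  It does not exercise any PostgreSQL code path.

```python
import random
import time
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size


def make_page(seed):
    """Hypothetical stand-in for an FPI: half repetitive, half random bytes."""
    rng = random.Random(seed)
    half = PAGE_SIZE // 2
    repetitive = (b"row-%06d|" % seed) * (half // 11 + 1)
    noisy = bytes(rng.getrandbits(8) for _ in range(half))
    return repetitive[:half] + noisy


def bench_level(pages, level):
    """Return (compressed bytes / original bytes, CPU seconds) for one level."""
    start = time.process_time()
    compressed = sum(len(zlib.compress(p, level)) for p in pages)
    cpu = time.process_time() - start
    return compressed / sum(len(p) for p in pages), cpu


pages = [make_page(i) for i in range(200)]
for level in (1, 6, 9):
    ratio, cpu = bench_level(pages, level)
    print(f"zlib level {level}: ratio={ratio:.3f} cpu={cpu:.4f}s")
```

The numbers depend entirely on the synthetic data, so they say nothing
about real workloads; the point is only the shape of the measurement.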

It is not easy to draw a clear line here, but one could easily say
that an algo that reduces a FPI by 90% while costing two times more
CPU cycles is worse than something achieving only 70%~75% compression
for half the CPU cycles, if environments are easily constrained on
CPU.
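
To put rough numbers on that comparison, here is a small sketch.  The
metric (fraction of the FPI removed per unit of relative CPU cost) and
the baseline of 1.0 CPU unit are assumptions made for illustration,
not an agreed-upon way to rank algorithms.

```python
def reduction_per_cpu(fpi_reduction, relative_cpu):
    """Fraction of FPI bytes removed per unit of relative CPU cost."""
    if relative_cpu <= 0:
        raise ValueError("relative_cpu must be positive")
    return fpi_reduction / relative_cpu


# Assumed baseline: 1.0 CPU unit.  "Two times more" is 2.0, "half" is 0.5.
heavy = reduction_per_cpu(0.90, 2.0)
light_low = reduction_per_cpu(0.70, 0.5)
light_high = reduction_per_cpu(0.75, 0.5)
print(f"heavy={heavy:.2f} light={light_low:.2f}..{light_high:.2f}")
```

Under these assumptions the lighter algorithm removes roughly three
times more bytes per CPU unit, which only matters when CPU rather than
I/O is the constraint.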

As mentioned upthread, I'd recommend designing tests like this one, or
just reusing this one:
https://www.postgresql.org/message-id/CAB7nPqSc97o-UE5paxfMUKWcxE_JioyxO1M4A0pMnmYqAnec2g@mail.gmail.com

In terms of CPU usage, we should also monitor the user and system
times of the backend, and compare the various situations.  See patch
0003 posted here that we used for wal_compression:
https://www.postgresql.org/message-id/CAB7nPqRC20=mKgu6d2st-e11_QqqbreZg-=SF+_UYsmvwNu42g@mail.gmail.com

This just uses getrusage() to get more stats.
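
For readers unfamiliar with it, here is a minimal sketch of the
getrusage() approach, using Python's resource module (Unix only)
rather than the C code of the patch.  The measure() helper and the
sample workload are hypothetical; the patch itself instruments the
backend, which this does not reproduce.

```python
import resource
import zlib


def cpu_times():
    """User and system CPU seconds consumed so far by this process."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_utime, ru.ru_stime


def measure(workload):
    """Run workload() and return the (user, system) CPU seconds it used."""
    user0, sys0 = cpu_times()
    workload()
    user1, sys1 = cpu_times()
    return user1 - user0, sys1 - sys0


def sample_workload():
    # Hypothetical stand-in for backend work: compress 8kB buffers.
    page = bytes(range(256)) * 32
    for _ in range(2000):
        zlib.compress(page)


user, system = measure(sample_workload)
print(f"user={user:.4f}s system={system:.4f}s")
```

Comparing the user and system deltas with compression on and off, for
the same set of queries, gives the kind of numbers discussed above.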
--
Michael
