
From: Julien Tachoires
Subject: Re: Compress ReorderBuffer spill files using LZ4
Date:
Msg-id: CAFEQCbFZXLV46PHmrcoBVMmv2=jn_WG7yrtaxrXRwWfkCpgzGA@mail.gmail.com
In response to: Compress ReorderBuffer spill files using LZ4 (Julien Tachoires <julmon@gmail.com>)
List: pgsql-hackers

Hi Tomas,

On Mon, Sep 23, 2024 at 18:13, Tomas Vondra <tomas@vondra.me> wrote:
>
> Hi,
>
> I've spent a bit more time on this, mostly running tests to get a better
> idea of the practical benefits.

Thank you for your code review and testing!

> Firstly, I think there's a bug in ReorderBufferCompress() - it's legal
> for pglz_compress() to return -1. This can happen if the data is not
> compressible, and would not fit into the output buffer. The code can't
> just do elog(ERROR) in this case, it needs to handle that by storing the
> raw data. The attached fixup patch makes this work for me - I'm not
> claiming this is the best way to handle this, but it works.
>
> FWIW I find it strange the tests included in the patch did not trigger
> this. That probably means the tests are not quite sufficient.
>
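Right, pglz_compress() returning -1 simply means the data could not be
compressed into the output buffer, so the fallback is to keep the raw
bytes. Something along these lines (an untested sketch; the variable,
field and constant names are made up and may not match your fixup
patch):

    int32       clen;

    clen = pglz_compress(rawdata, rawlen, dest, PGLZ_strategy_default);
    if (clen < 0)
    {
        /* Not compressible: store the raw data instead. */
        memcpy(dest, rawdata, rawlen);
        ondisk->compression = SPILL_COMPRESSION_NONE;   /* hypothetical */
        ondisk->size = rawlen;
    }
    else
    {
        ondisk->compression = SPILL_COMPRESSION_PGLZ;   /* hypothetical */
        ondisk->size = clen;
    }
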
>
> Now, to the testing. Attached are two scripts, testing different cases:
>
> test-columns.sh - Table with a variable number of 'float8' columns.
>
> test-toast.sh - Table with a single text column.
>
> The script always sets up a publication/subscription on two instances,
> generates a certain amount of data (~1GB for columns, ~3.2GB for TOAST),
> waits for it to be replicated to the replica, and measures how much data
> was spilled to disk with the different compression methods (off, pglz
> and lz4). There are a couple more metrics, but those are irrelevant here.

It would be interesting to run the same tests with zstd: in my early
testing I found that zstd was able to provide a better compression
ratio than lz4, but it seemed to use more CPU resources and to be slower.

> For the "column" test, it looks like this (this is in MB):
>
>     rows    columns    distribution    off     pglz    lz4
>   ========================================================
>   100000       1000    compressible    778      20       9
>                              random    778     778      16
>   --------------------------------------------------------
>   1000000       100    compressible    916     116      62
>                              random    916     916      67
>
> It's very clear that for the "compressible" data (which just copies the
> same value into all columns), both pglz and lz4 can significantly reduce
> the amount of data. For 1000 columns it's 780MB -> 20MB/9MB, for 100
> columns it's a bit less efficient, but still good.
>
> For the "random" data (where every column gets a random value, but rows
> are copied), it's a very different story - pglz does not help at all,
> while lz4 still massively reduces the amount of spilled data.
>
> I think the explanation is very simple - for pglz, we compress each row
> on its own, there's no concept of streaming/context. If a row is
> compressible, it works fine, but when the row gets random, pglz can't
> compress it at all. For lz4, this does not matter, because with the
> streaming mode it still sees that rows are just repeated, and so can
> compress them efficiently.

That's correct.
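
With LZ4's streaming API, each call can reference data from the
previous calls that is still present in the ring buffer, so repeated
rows compress to almost nothing even when a single row is not
compressible by itself. Roughly like this (plain liblz4 usage, just a
sketch, not the patch code; rows are assumed to be at most 8kB here):

    #include <string.h>
    #include <lz4.h>

    #define ROW_MAX    8192
    #define RING_SIZE  65536

    static void
    compress_rows(const char **rows, const int *rowlens, int nrows)
    {
        LZ4_stream_t *stream = LZ4_createStream();
        char    ring[RING_SIZE];                /* holds recent raw input */
        char    out[LZ4_COMPRESSBOUND(ROW_MAX)];
        int     ring_off = 0;

        for (int i = 0; i < nrows; i++)
        {
            char   *src = ring + ring_off;
            int     clen;

            memcpy(src, rows[i], rowlens[i]);

            /*
             * The stream remembers earlier input kept in the ring buffer,
             * so a row identical to a previous one compresses to a
             * handful of bytes.
             */
            clen = LZ4_compress_fast_continue(stream, src, out,
                                              rowlens[i], (int) sizeof(out), 1);

            /* ... write the clen bytes of "out" to the spill file ... */

            ring_off += rowlens[i];
            if (ring_off + ROW_MAX > RING_SIZE)
                ring_off = 0;               /* wrap around */
        }

        LZ4_freeStream(stream);
    }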

> For TOAST test, the results look like this:
>
>   distribution     repeats        toast       off    pglz     lz4
>   ===============================================================
>   compressible       10000        lz4          14       2       1
>                                   pglz         40       4       3
>                       1000        lz4          32      16       9
>                                   pglz         54      17      10
>         ---------------------------------------------------------
>         random       10000        lz4        3305    3305    3157
>                                   pglz       3305    3305    3157
>                       1000        lz4        3166    3162    1580
>                                   pglz       3334    3326    1745
>        ----------------------------------------------------------
>        random2       10000        lz4        3305    3305    3157
>                                   pglz       3305    3305    3158
>                       1000        lz4        3160    3156    3010
>                                   pglz       3334    3326    3172
>
> The "repeats" value means how long the string is - it's the number of
> "md5" hashes added to the string. The number of rows is calculated to
> keep the total amount of data the same. The "toast" column tracks what
> compression was used for TOAST, I was wondering if it matters.
>
> This time there are three data distributions - compressible means that
> each TOAST value is nicely compressible, "random" means each value is
> random (not compressible), but the rows are just copy of the same value
> (so on the whole there's a lot of redundancy). And "random2" means each
> row is random and unique (so not compressible at all).
>
> The table shows that with compressible TOAST values, compressing the
> spill file is rather useless. The reason is that ReorderBufferCompress
> is handling raw TOAST data, which is already compressed. Yes, it may
> further reduce the amount of data, but it's negligible when compared to
> the original amount of data.
>
> For the random cases, the spill compression is rather pointless. Yes,
> lz4 can reduce it to 1/2 for the shorter strings, but other than that
> it's not very useful.

It's still interesting to confirm that already-compressed data and
random data cannot be significantly compressed.

> For a while I was thinking this approach is flawed, because it only sees
> and compresses changes one by one, and that seeing a batch of changes
> would improve this (e.g. we'd see the copied rows). But I realized lz4
> already does that (in the streaming mode at least), and yet it does not
> help very much. Presumably that depends on how large the context is. If
> the random string is long enough, it won't help.
>
> So maybe this approach is fine, and doing the compression at a lower
> layer (for the whole file) would not really improve this. Even then
> we'd only see a limited amount of data.
>
> Maybe the right answer to this is that compression does not help cases
> where most of the replicated data is TOAST, and that it can help cases
> with wide (and redundant) rows, or repeated rows. And that lz4 is a
> clearly superior choice. (This also raises the question if we want to
> support REORDER_BUFFER_STRAT_LZ4_REGULAR. I haven't looked into this,
> but doesn't that behave more like pglz, i.e. no context?)

I'm working on a new version of this patch set that will include the
changes you suggested in your review. About using the LZ4 regular API:
the goal was to use it when we cannot use the streaming API because the
raw data is larger than the LZ4 ring buffer. But I'm going to drop that
in the new version, because I plan to take an approach similar to what
we do in astreamer_lz4.c: using frames, not blocks. The LZ4 frame API
looks very similar to ZSTD's streaming API.
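
For reference, the frame API shape I have in mind looks roughly like
this (generic lz4frame.h usage, simplified; error handling is omitted
and the caller is assumed to have sized "dst" generously, i.e.
LZ4F_compressBound() plus room for the frame header):

    #include <lz4frame.h>

    static size_t
    compress_frame(const char *src, size_t srclen, char *dst, size_t dstcap)
    {
        LZ4F_cctx  *cctx;
        size_t      pos = 0;

        /* Real code must check LZ4F_isError() on every call below. */
        LZ4F_createCompressionContext(&cctx, LZ4F_VERSION);

        pos += LZ4F_compressBegin(cctx, dst + pos, dstcap - pos, NULL);
        pos += LZ4F_compressUpdate(cctx, dst + pos, dstcap - pos,
                                   src, srclen, NULL);
        pos += LZ4F_compressEnd(cctx, dst + pos, dstcap - pos, NULL);

        LZ4F_freeCompressionContext(cctx);
        return pos;                 /* total bytes written to dst */
    }

A nice property is that the frame format is self-describing, so the
reader does not need any out-of-band metadata to decompress it, which
is indeed close to what ZSTD's streaming API offers.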

> FWIW when doing these tests, it made me realize how useful it would be
> to track both the "raw" and "spilled" amounts. That is before/after
> compression. It'd make calculating compression ratio much easier.

Yes, that's why I tried to "fix" the spill_bytes counter.
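
Something along these lines in the serialization path (sketch only; the
raw-bytes counter and the variable names are hypothetical):

    /* account for both sides of the compression */
    rb->spillRawBytes += raw_size;      /* size before compression */
    rb->spillBytes += ondisk_size;      /* size actually written to disk */

With both counters, the compression ratio is simply
spillRawBytes / spillBytes.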

Regards,

JT


