Re: Compress ReorderBuffer spill files using LZ4 - Mailing list pgsql-hackers

From:           Julien Tachoires
Subject:        Re: Compress ReorderBuffer spill files using LZ4
Msg-id:         CAFEQCbFZXLV46PHmrcoBVMmv2=jn_WG7yrtaxrXRwWfkCpgzGA@mail.gmail.com
In response to: Compress ReorderBuffer spill files using LZ4 (Julien Tachoires <julmon@gmail.com>)
Responses:      Re: Compress ReorderBuffer spill files using LZ4
List:           pgsql-hackers
Hi Tomas,

On Mon, Sep 23, 2024 at 18:13, Tomas Vondra <tomas@vondra.me> wrote:
>
> Hi,
>
> I've spent a bit more time on this, mostly running tests to get a better
> idea of the practical benefits.

Thank you for your code review and testing!

> Firstly, I think there's a bug in ReorderBufferCompress() - it's legal
> for pglz_compress() to return -1. This can happen if the data is not
> compressible, and would not fit into the output buffer. The code can't
> just do elog(ERROR) in this case, it needs to handle that by storing the
> raw data. The attached fixup patch makes this work for me - I'm not
> claiming this is the best way to handle this, but it works.
>
> FWIW I find it strange the tests included in the patch did not trigger
> this. That probably means the tests are not quite sufficient.
>
>
> Now, to the testing. Attached are two scripts, testing different cases:
>
> test-columns.sh - Table with a variable number of 'float8' columns.
>
> test-toast.sh - Table with a single text column.
>
> The script always sets up a publication/subscription on two instances,
> generates a certain amount of data (~1GB for columns, ~3.2GB for TOAST),
> waits for it to be replicated to the replica, and measures how much data
> was spilled to disk with the different compression methods (off, pglz
> and lz4). There are a couple more metrics, but those are irrelevant here.

It would be interesting to run the same tests with zstd: in my early
testing I found that zstd was able to provide a better compression ratio
than lz4, but it seemed to use more CPU resources and to be slower.
> For the "column" test, it looks like this (this is in MB):
>
>     rows      columns   distribution   off    pglz   lz4
>     ========================================================
>     100000    1000      compressible   778    20     9
>                         random         778    778    16
>     --------------------------------------------------------
>     1000000   100       compressible   916    116    62
>                         random         916    916    67
>
> It's very clear that for the "compressible" data (which just copies the
> same value into all columns), both pglz and lz4 can significantly reduce
> the amount of data. For 1000 columns it's 780MB -> 20MB/9MB, for 100
> columns it's a bit less efficient, but still good.
>
> For the "random" data (where every column gets a random value, but rows
> are copied), it's a very different story - pglz does not help at all,
> while lz4 still massively reduces the amount of spilled data.
>
> I think the explanation is very simple - for pglz, we compress each row
> on its own, there's no concept of streaming/context. If a row is
> compressible, it works fine, but when the row gets random, pglz can't
> compress it at all. For lz4, this does not matter, because with the
> streaming mode it still sees that rows are just repeated, and so can
> compress them efficiently.

That's correct.

> For the TOAST test, the results look like this:
>
>     distribution   repeats   toast   off     pglz    lz4
>     ===============================================================
>     compressible   10000     lz4     14      2       1
>                              pglz    40      4       3
>                    1000      lz4     32      16      9
>                              pglz    54      17      10
>     ---------------------------------------------------------
>     random         10000     lz4     3305    3305    3157
>                              pglz    3305    3305    3157
>                    1000      lz4     3166    3162    1580
>                              pglz    3334    3326    1745
>     ----------------------------------------------------------
>     random2        10000     lz4     3305    3305    3157
>                              pglz    3305    3305    3158
>                    1000      lz4     3160    3156    3010
>                              pglz    3334    3326    3172
>
> The "repeats" value means how long the string is - it's the number of
> "md5" hashes added to the string. The number of rows is calculated to
> keep the total amount of data the same.
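The per-row vs. streaming distinction above can be reproduced with a
quick experiment. This sketch uses Python's zlib as a stand-in for a
streaming compressor (an illustration only; the actual patch is C using
pglz and liblz4): a random row is incompressible on its own, but a
compressor whose context spans previous rows finds the repeats.

```python
import random
import zlib

random.seed(42)
row = bytes(random.randrange(256) for _ in range(512))   # one random row
rows = [row] * 200                                       # ...repeated 200x

# Per-row compression, no shared context (the pglz situation): a random
# row cannot be compressed at all, so nothing is saved.
per_row_total = sum(len(zlib.compress(r)) for r in rows)

# Streaming compression (the lz4 streaming-mode situation): the
# compressor's window spans previous rows, so the repeats are found.
c = zlib.compressobj()
streamed_total = len(c.compress(b"".join(rows)) + c.flush())

print(per_row_total, streamed_total)
```

The per-row total ends up slightly larger than the raw data (compressor
overhead with zero savings), while the streamed total is a small
fraction of it, matching the "random" rows in the column test above.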
> The "toast" column tracks what compression was used for TOAST; I was
> wondering if it matters.
>
> This time there are three data distributions - "compressible" means that
> each TOAST value is nicely compressible, "random" means each value is
> random (not compressible), but the rows are just copies of the same value
> (so on the whole there's a lot of redundancy). And "random2" means each
> row is random and unique (so not compressible at all).
>
> The table shows that with compressible TOAST values, compressing the
> spill file is rather useless. The reason is that ReorderBufferCompress
> is handling raw TOAST data, which is already compressed. Yes, it may
> further reduce the amount of data, but it's negligible when compared to
> the original amount of data.
>
> For the random cases, the spill compression is rather pointless. Yes,
> lz4 can reduce it to 1/2 for the shorter strings, but other than that
> it's not very useful.

It's still interesting to confirm that already-compressed or random data
cannot be significantly compressed further.

> For a while I was thinking this approach was flawed, because it only
> sees and compresses changes one by one, and that seeing a batch of
> changes would improve this (e.g. we'd see the copied rows). But I
> realized lz4 already does that (in the streaming mode at least), and yet
> it does not help very much. Presumably that depends on how large the
> context is. If the random string is long enough, it won't help.
>
> So maybe this approach is fine, and doing the compression at a lower
> layer (for the whole file) would not really improve this. Even then
> we'd only see a limited amount of data.
>
> Maybe the right answer to this is that compression does not help cases
> where most of the replicated data is TOAST, and that it can help cases
> with wide (and redundant) rows, or repeated rows. And that lz4 is a
> clearly superior choice. (This also raises the question if we want to
> support REORDER_BUFFER_STRAT_LZ4_REGULAR.
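The point about already-compressed TOAST values can be checked with a
tiny experiment, again using zlib as a stand-in for the TOAST inline
compression (an assumption for illustration only): the first pass
shrinks the value dramatically, and a second pass over the compressed
bytes, which is all the spill-file compression gets to see, gains
essentially nothing.

```python
import zlib

raw = b"0123456789abcdef" * 1024    # 16 KB, nicely compressible value
toasted = zlib.compress(raw)        # what TOAST compression already did

# Spill-file compression then operates on the already-compressed bytes,
# which look random to the compressor:
spilled = zlib.compress(toasted)

print(len(raw), len(toasted), len(spilled))
```

Any further reduction (or growth) on the second pass is negligible
relative to the original value size, which is why compressing spilled
TOAST data barely moves the totals in the table above.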
> I haven't looked into this,
> but doesn't that behave more like pglz, i.e. no context?)

I'm working on a new version of this patch set that will include the
changes you suggested in your review.

About using the LZ4 regular API: the goal was to use it when we cannot
use the streaming API because the raw data is larger than the LZ4 ring
buffer. But this is something I'm going to drop in the new version,
because I'm planning to take a similar approach to the one we use in
astreamer_lz4.c: using frames, not blocks. The LZ4 frame API looks very
similar to ZSTD's streaming API.

> FWIW when doing these tests, it made me realize how useful it would be
> to track both the "raw" and "spilled" amounts. That is before/after
> compression. It'd make calculating the compression ratio much easier.

Yes, that's why I tried to "fix" the spill_bytes counter.

Regards,

JT