Re: Compress ReorderBuffer spill files using LZ4 - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Compress ReorderBuffer spill files using LZ4
Msg-id: bd1c42f5-7167-41eb-a473-b49b77ad9363@vondra.me
In response to: Re: Compress ReorderBuffer spill files using LZ4 (Julien Tachoires <julmon@gmail.com>)
List: pgsql-hackers
On 9/23/24 21:58, Julien Tachoires wrote:
> Hi Tomas,
>
> On Mon, Sept 23, 2024 at 18:13, Tomas Vondra <tomas@vondra.me> wrote:
>>
>> Hi,
>>
>> I've spent a bit more time on this, mostly running tests to get a better
>> idea of the practical benefits.
>
> Thank you for your code review and testing!
>
>> Firstly, I think there's a bug in ReorderBufferCompress() - it's legal
>> for pglz_compress() to return -1. This can happen if the data is not
>> compressible and would not fit into the output buffer. The code can't
>> just do elog(ERROR) in this case; it needs to handle that by storing the
>> raw data. The attached fixup patch makes this work for me - I'm not
>> claiming this is the best way to handle it, but it works.
>>
>> FWIW I find it strange that the tests included in the patch did not
>> trigger this. That probably means the tests are not quite sufficient.
>>
>>
>> Now, to the testing. Attached are two scripts, testing different cases:
>>
>> test-columns.sh - Table with a variable number of 'float8' columns.
>>
>> test-toast.sh - Table with a single text column.
>>
>> The script always sets up a publication/subscription on two instances,
>> generates a certain amount of data (~1GB for columns, ~3.2GB for TOAST),
>> waits for it to be replicated to the replica, and measures how much data
>> was spilled to disk with the different compression methods (off, pglz
>> and lz4). There are a couple more metrics, but those are irrelevant here.
>
> It would be interesting to run the same tests with zstd: in my early
> testing I found that zstd was able to provide a better compression
> ratio than lz4, but seemed to use more CPU resources/is slower.
>

Oh, I completely forgot about zstd. I don't think it'd substantially
change the conclusions, though. It might compress better/worse for some
cases, but the overall behavior would remain the same. I can't test this
right now, the test machine is busy with some other stuff.
But it should not be difficult to update the test scripts I attached and
get results yourself. There are a couple of hard-coded paths that need to
be updated, of course.

>> For the "column" test, it looks like this (this is in MB):
>>
>>   rows      columns   distribution    off    pglz   lz4
>>   ========================================================
>>   100000    1000      compressible    778    20     9
>>                       random          778    778    16
>>   --------------------------------------------------------
>>   1000000   100       compressible    916    116    62
>>                       random          916    916    67
>>
>> It's very clear that for the "compressible" data (which just copies the
>> same value into all columns), both pglz and lz4 can significantly reduce
>> the amount of data. For 1000 columns it's 780MB -> 20MB/9MB; for 100
>> columns it's a bit less efficient, but still good.
>>
>> For the "random" data (where every column gets a random value, but rows
>> are copied), it's a very different story - pglz does not help at all,
>> while lz4 still massively reduces the amount of spilled data.
>>
>> I think the explanation is very simple - for pglz, we compress each row
>> on its own; there's no concept of streaming/context. If a row is
>> compressible, it works fine, but when the row gets random, pglz can't
>> compress it at all. For lz4 this does not matter, because in streaming
>> mode it still sees that rows are just repeated, and so can compress
>> them efficiently.
>
> That's correct.
>
>> For the TOAST test, the results look like this:
>>
>>   distribution   repeats   toast   off     pglz    lz4
>>   ===============================================================
>>   compressible   10000     lz4     14      2       1
>>                            pglz    40      4       3
>>                  1000      lz4     32      16      9
>>                            pglz    54      17      10
>>   ---------------------------------------------------------
>>   random         10000     lz4     3305    3305    3157
>>                            pglz    3305    3305    3157
>>                  1000      lz4     3166    3162    1580
>>                            pglz    3334    3326    1745
>>   ----------------------------------------------------------
>>   random2        10000     lz4     3305    3305    3157
>>                            pglz    3305    3305    3158
>>                  1000      lz4     3160    3156    3010
>>                            pglz    3334    3326    3172
>>
>> The "repeats" value means how long the string is - it's the number of
>> "md5" hashes added to the string. The number of rows is calculated to
>> keep the total amount of data the same. The "toast" column tracks what
>> compression was used for TOAST; I was wondering if it matters.
>>
>> This time there are three data distributions - "compressible" means
>> each TOAST value is nicely compressible, "random" means each value is
>> random (not compressible) but the rows are just copies of the same
>> value (so on the whole there's a lot of redundancy), and "random2"
>> means each row is random and unique (so not compressible at all).
>>
>> The table shows that with compressible TOAST values, compressing the
>> spill file is rather useless. The reason is that ReorderBufferCompress
>> is handling raw TOAST data, which is already compressed. Yes, it may
>> further reduce the amount of data, but it's negligible when compared to
>> the original amount of data.
>>
>> For the random cases, the spill compression is rather pointless. Yes,
>> lz4 can reduce it to 1/2 for the shorter strings, but other than that
>> it's not very useful.
>
> It's still interesting to confirm that data already compressed or
> random data cannot be significantly compressed.
>
>> For a while I was thinking this approach is flawed, because it only
>> sees and compresses changes one by one, and that seeing a batch of
>> changes would improve this (e.g. we'd see the copied rows). But I
>> realized lz4 already does that (in the streaming mode at least), and
>> yet it does not help very much. Presumably that depends on how large
>> the context is. If the random string is long enough, it won't help.
>>
>> So maybe this approach is fine, and doing the compression at a lower
>> layer (for the whole file) would not really improve this. Even then
>> we'd only see a limited amount of data.
>>
>> Maybe the right answer to this is that compression does not help cases
>> where most of the replicated data is TOAST, and that it can help cases
>> with wide (and redundant) rows, or repeated rows. And that lz4 is a
>> clearly superior choice. (This also raises the question of whether we
>> want to support REORDER_BUFFER_STRAT_LZ4_REGULAR. I haven't looked into
>> this, but doesn't that behave more like pglz, i.e. no context?)
>
> I'm working on a new version of this patch set that will include the
> changes you suggested in your review. About using the LZ4 regular API,
> the goal was to use it when we cannot use the streaming API because the
> raw data is larger than the LZ4 ring buffer. But this is something I'm
> going to delete in the new version, because I'm planning to use a
> similar approach as we do in astreamer_lz4.c: using frames, not blocks.
> The LZ4 frame API looks very similar to ZSTD's streaming API.
>
>> FWIW when doing these tests, it made me realize how useful it would be
>> to track both the "raw" and "spilled" amounts. That is, before/after
>> compression. It'd make calculating the compression ratio much easier.
>
> Yes, that's why I tried to "fix" the spill_bytes counter.
>

But I think the 'fixed' counter only tracks the data after the new
compression, right?
I'm suggesting having two counters - one for the "raw" data (before
compression) and one for the "compressed" data (after compression).

regards

--
Tomas Vondra