Thread: pglz performance
Hi hackers! I was reviewing Paul Ramsey's TOAST patch[0] and noticed that there is big room for improvement in the performance of pglz compression and decompression. Together with Vladimir we started to investigate ways to speed up byte copying and eventually created a test suite[1] to measure the performance of compression and decompression. This is an extension with a single function test_pglz() which runs tests for different:
1. Data payloads
2. Compression implementations
3. Decompression implementations

Currently we test mostly decompression improvements against two WALs and one data file taken from a pgbench-generated database. Any suggestions on more relevant data payloads are very welcome.

My laptop tests show that our decompression implementation[2] can be from 15% to 50% faster. I've also noted that compression is extremely slow, ~30 times slower than decompression. I believe we can do something about it. We focus only on speeding up the existing codec, without considering other compression algorithms.

Any comments are much appreciated. The most important questions are:
1. What are relevant data sets?
2. What are relevant CPUs? I only have Xeon-based servers and a few laptops/desktops with Intel CPUs.
3. If compression is 30 times slower, should we rather focus on compression instead of decompression?

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/CANP8%2BjKcGj-JYzEawS%2BCUZnfeGKq4T5LswcswMP4GUHeZEP1ag%40mail.gmail.com
[1] https://github.com/x4m/test_pglz
[2] https://www.postgresql.org/message-id/C2D8E5D5-3E83-469B-8751-1C7877C2A5F2%40yandex-team.ru
On Mon, May 13, 2019 at 07:45:59AM +0500, Andrey Borodin wrote:
> I was reviewing Paul Ramsey's TOAST patch[0] and noticed that there is big room for improvement in the performance of pglz compression and decompression.

Yes, I believe so too. pglz is a huge CPU consumer when it comes to compression compared to more modern algos like lz4.

> Together with Vladimir we started to investigate ways to speed up byte copying and eventually created a test suite[1] to measure the performance of compression and decompression. This is an extension with a single function test_pglz() which runs tests for different:
> 1. Data payloads
> 2. Compression implementations
> 3. Decompression implementations

Cool. I have something rather similar in my wallet of plugins:
https://github.com/michaelpq/pg_plugins/tree/master/compress_test
This is something I worked on mainly for FPW compression in WAL.

> Currently we test mostly decompression improvements against two WALs and one data file taken from a pgbench-generated database. Any suggestions on more relevant data payloads are very welcome.

Text strings made of random data and of variable length? For any test of this kind I think it is good to focus on the performance of the low-level calls, even going as far as a simple C wrapper on top of the pglz APIs, so as to test only the performance and avoid extra PG-related overhead like palloc() which can be a barrier. Focusing on strings of lengths from 1kB up to 16kB may be a good idea for sizes, and it is important to keep the same uncompressed strings for performance comparison.

> My laptop tests show that our decompression implementation[2] can be from 15% to 50% faster. Also I've noted that compression is extremely slow, ~30 times slower than decompression. I believe we can do something about it.

That's nice.

> We focus only on boosting the existing codec without any consideration of other compression algorithms.

There is this as well. A couple of algorithms have a license compatible with Postgres, but it may be simpler to just improve pglz. A 10%~20% improvement is something worth doing.

> Most important questions are:
> 1. What are relevant data sets?
> 2. What are relevant CPUs? I only have Xeon-based servers and a few laptops/desktops with Intel CPUs.
> 3. If compression is 30 times slower, should we rather focus on compression instead of decompression?

Decompression can matter a lot for mostly-read workloads, and compression can become a bottleneck for heavy-insert loads, so improving compression and decompression should be treated as two separate problems, not as two linked problems. Any improvement in one or the other, or even both, is nice to have.
--
Michael
Attachment
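[Editor's note: a minimal standalone wrapper along the lines Michael suggests could look like the sketch below. It is only an illustration, assuming a build where src/common/pg_lzcompress.c is linked in and that pglz_compress() keeps its historical (source, slen, dest, strategy) signature; it is not part of either test suite.]

/*
 * Sketch of a low-level pglz compression benchmark: no executor, no palloc,
 * just the raw call timed with clock_gettime().  Assumes pg_lzcompress.c is
 * linked in and that PGLZ_MAX_OUTPUT() and PGLZ_strategy_default are
 * available from common/pg_lzcompress.h.
 */
#include "postgres_fe.h"
#include "common/pg_lzcompress.h"

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int
main(void)
{
	const int	slen = 8192;	/* one 8kB payload */
	const int	iters = 10000;
	char	   *src = malloc(slen);
	char	   *dst = malloc(PGLZ_MAX_OUTPUT(slen));
	struct timespec t0, t1;
	double		ns;

	memset(src, 'a', slen);		/* trivially compressible payload */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < iters; i++)
		(void) pglz_compress(src, slen, dst, PGLZ_strategy_default);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
	printf("compression: %.3f ns per input byte\n",
		   ns / ((double) iters * slen));

	free(src);
	free(dst);
	return 0;
}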
> On 13 May 2019, at 12:14, Michael Paquier <michael@paquier.xyz> wrote:
>
>> Currently we test mostly decompression improvements against two WALs and one data file taken from a pgbench-generated database. Any suggestions on more relevant data payloads are very welcome.
>
> Text strings made of random data and of variable length?

Like a text corpus?

> For any test of this kind I think it is good to focus on the performance of the low-level calls, even going as far as a simple C wrapper on top of the pglz APIs, so as to test only the performance and avoid extra PG-related overhead like palloc() which can be a barrier.

Our test_pglz extension measures only the time of the actual compression work: it does a warmup run, and all allocations happen before measurement.

> Focusing on strings of lengths from 1kB up to 16kB may be a good idea for sizes, and it is important to keep the same uncompressed strings for performance comparison.

We intentionally avoid generated data and keep the test files committed in the git repo. We also check that the decompressed data matches the source of the compression. All tests are run 5 times. We use a PG extension only for simplicity of deploying the benchmarks to our PG clusters.

Here are some test results. Currently we test on 4 payloads:
1. WAL from cluster initialization
2. two WALs from pgbench -i -s 10
3. a data file taken from pgbench -i -s 10

We use these decompressors:
1. pglz_decompress_vanilla - taken from the PG source code
2. pglz_decompress_hacked - uses sliced memcpy to imitate byte-by-byte pglz decompression
3. pglz_decompress_hacked4, pglz_decompress_hacked8, pglz_decompress_hackedX - use memcpy only if the match is no less than X bytes. We need to determine the best X if this approach is used.
(A simplified sketch of the sliced-memcpy idea follows the results below.)

I used three platforms:
1. Server: XEON E5-2660 SM/SYS1027RN3RF/10S2.5/1U/2P (2*INTEL XEON E5-2660/16*DDR3ECCREG/10*SAS-2.5), Ubuntu 14, PG 9.6
2. Desktop: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz, Ubuntu 18, PG 12devel
3. Laptop: MB Pro 15 2015, 2.2 GHz Core i7 (I7-4770HQ), macOS, PG 12devel

Owners of AMD and ARM devices are welcome.
Server results (less is better):
NOTICE: 00000: Time to decompress one byte in ns:
NOTICE: 00000: Payload 000000010000000000000001
NOTICE: 00000: Decompressor pglz_decompress_hacked result 0.647235
NOTICE: 00000: Decompressor pglz_decompress_hacked4 result 0.671029
NOTICE: 00000: Decompressor pglz_decompress_hacked8 result 0.699949
NOTICE: 00000: Decompressor pglz_decompress_hacked16 result 0.739586
NOTICE: 00000: Decompressor pglz_decompress_hacked32 result 0.787926
NOTICE: 00000: Decompressor pglz_decompress_vanilla result 1.147282
NOTICE: 00000: Payload 000000010000000000000006
NOTICE: 00000: Decompressor pglz_decompress_hacked result 0.201774
NOTICE: 00000: Decompressor pglz_decompress_hacked4 result 0.211859
NOTICE: 00000: Decompressor pglz_decompress_hacked8 result 0.212610
NOTICE: 00000: Decompressor pglz_decompress_hacked16 result 0.214601
NOTICE: 00000: Decompressor pglz_decompress_hacked32 result 0.221813
NOTICE: 00000: Decompressor pglz_decompress_vanilla result 0.706005
NOTICE: 00000: Payload 000000010000000000000008
NOTICE: 00000: Decompressor pglz_decompress_hacked result 1.370132
NOTICE: 00000: Decompressor pglz_decompress_hacked4 result 1.388991
NOTICE: 00000: Decompressor pglz_decompress_hacked8 result 1.388502
NOTICE: 00000: Decompressor pglz_decompress_hacked16 result 1.529455
NOTICE: 00000: Decompressor pglz_decompress_hacked32 result 1.520813
NOTICE: 00000: Decompressor pglz_decompress_vanilla result 1.433527
NOTICE: 00000: Payload 16398
NOTICE: 00000: Decompressor pglz_decompress_hacked result 0.606943
NOTICE: 00000: Decompressor pglz_decompress_hacked4 result 0.623044
NOTICE: 00000: Decompressor pglz_decompress_hacked8 result 0.624118
NOTICE: 00000: Decompressor pglz_decompress_hacked16 result 0.620987
NOTICE: 00000: Decompressor pglz_decompress_hacked32 result 0.621183
NOTICE: 00000: Decompressor pglz_decompress_vanilla result 1.365318

Comment: pglz_decompress_hacked is unconditionally optimal. In most cases it is 2x better than the current implementation; on 000000010000000000000008 it is only marginally better. pglz_decompress_hacked8 is a few percent worse than pglz_decompress_hacked.
Desktop results:
NOTICE: Time to decompress one byte in ns:
NOTICE: Payload 000000010000000000000001
NOTICE: Decompressor pglz_decompress_hacked result 0.396454
NOTICE: Decompressor pglz_decompress_hacked4 result 0.429249
NOTICE: Decompressor pglz_decompress_hacked8 result 0.436413
NOTICE: Decompressor pglz_decompress_hacked16 result 0.478077
NOTICE: Decompressor pglz_decompress_hacked32 result 0.491488
NOTICE: Decompressor pglz_decompress_vanilla result 0.695527
NOTICE: Payload 000000010000000000000006
NOTICE: Decompressor pglz_decompress_hacked result 0.110710
NOTICE: Decompressor pglz_decompress_hacked4 result 0.115669
NOTICE: Decompressor pglz_decompress_hacked8 result 0.127637
NOTICE: Decompressor pglz_decompress_hacked16 result 0.120544
NOTICE: Decompressor pglz_decompress_hacked32 result 0.117981
NOTICE: Decompressor pglz_decompress_vanilla result 0.399446
NOTICE: Payload 000000010000000000000008
NOTICE: Decompressor pglz_decompress_hacked result 0.647402
NOTICE: Decompressor pglz_decompress_hacked4 result 0.691891
NOTICE: Decompressor pglz_decompress_hacked8 result 0.693834
NOTICE: Decompressor pglz_decompress_hacked16 result 0.776815
NOTICE: Decompressor pglz_decompress_hacked32 result 0.777960
NOTICE: Decompressor pglz_decompress_vanilla result 0.721192
NOTICE: Payload 16398
NOTICE: Decompressor pglz_decompress_hacked result 0.337654
NOTICE: Decompressor pglz_decompress_hacked4 result 0.355452
NOTICE: Decompressor pglz_decompress_hacked8 result 0.351224
NOTICE: Decompressor pglz_decompress_hacked16 result 0.362548
NOTICE: Decompressor pglz_decompress_hacked32 result 0.356456
NOTICE: Decompressor pglz_decompress_vanilla result 0.837042

Comment: identical to the Server results.

Laptop results:
NOTICE: Time to decompress one byte in ns:
NOTICE: Payload 000000010000000000000001
NOTICE: Decompressor pglz_decompress_hacked result 0.661469
NOTICE: Decompressor pglz_decompress_hacked4 result 0.638366
NOTICE: Decompressor pglz_decompress_hacked8 result 0.664377
NOTICE: Decompressor pglz_decompress_hacked16 result 0.696135
NOTICE: Decompressor pglz_decompress_hacked32 result 0.634825
NOTICE: Decompressor pglz_decompress_vanilla result 0.676560
NOTICE: Payload 000000010000000000000006
NOTICE: Decompressor pglz_decompress_hacked result 0.213921
NOTICE: Decompressor pglz_decompress_hacked4 result 0.224864
NOTICE: Decompressor pglz_decompress_hacked8 result 0.229394
NOTICE: Decompressor pglz_decompress_hacked16 result 0.218141
NOTICE: Decompressor pglz_decompress_hacked32 result 0.220954
NOTICE: Decompressor pglz_decompress_vanilla result 0.242412
NOTICE: Payload 000000010000000000000008
NOTICE: Decompressor pglz_decompress_hacked result 1.053417
NOTICE: Decompressor pglz_decompress_hacked4 result 1.063704
NOTICE: Decompressor pglz_decompress_hacked8 result 1.007211
NOTICE: Decompressor pglz_decompress_hacked16 result 1.145089
NOTICE: Decompressor pglz_decompress_hacked32 result 1.079702
NOTICE: Decompressor pglz_decompress_vanilla result 1.051557
NOTICE: Payload 16398
NOTICE: Decompressor pglz_decompress_hacked result 0.251690
NOTICE: Decompressor pglz_decompress_hacked4 result 0.268125
NOTICE: Decompressor pglz_decompress_hacked8 result 0.269248
NOTICE: Decompressor pglz_decompress_hacked16 result 0.277880
NOTICE: Decompressor pglz_decompress_hacked32 result 0.270290
NOTICE: Decompressor pglz_decompress_vanilla result 0.705652

Comment: decompression time on WAL segments is statistically indistinguishable between the hacked and original versions. Hacked decompression of the data file is 2x faster.
We are going to try these tests on Cascade Lake processors too.

Best regards, Andrey Borodin.
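[Editor's note: for readers who have not opened the patch, the central idea behind the "hacked" decompressors above is to replace the byte-at-a-time copy of an LZ match with bulk memcpy calls that stay safe when the match overlaps its own output. The following is a simplified sketch of that idea, not the actual patch code.]

/*
 * Simplified sketch of the "sliced memcpy" match copy.  In LZ decompression
 * a match may overlap its own output (off < len), so a single memcpy over
 * the whole match is not safe.  Copying in chunks of at most 'off' bytes
 * keeps every memcpy non-overlapping while still moving data in bulk.  The
 * hackedX variants above additionally fall back to a plain byte loop when
 * len < X, to avoid memcpy overhead on very short matches.
 */
#include <string.h>

static void
copy_match(unsigned char *dp, int off, int len)
{
	const unsigned char *sp = dp - off;	/* match source inside the output */

	while (len > 0)
	{
		int		chunk = (len < off) ? len : off;

		memcpy(dp, sp, chunk);	/* source lies entirely before dp */
		dp += chunk;
		sp += chunk;
		len -= chunk;
	}
}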
> On 15 May 2019, at 15:06, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Owners of AMD and ARM devices are welcome.

Yandex hardware RND guys gave me an ARM server and a Power9 server. They are looking for AMD and some new Intel boxes.

Meanwhile I made some enhancements to the test suite:
1. I've added a Shakespeare payload: a concatenation of the works of this prominent poet.
2. For each payload we compute a "sliced time": the time to decompress the payload if it were sliced into 2Kb or 8Kb pieces.
3. For each decompressor we compute a "score": the sum of the times to decompress each payload, each payload sliced by 2Kb, and each payload sliced by 8Kb, repeated 5 times.

I've attached the full test logs; meanwhile here are the results for the different platforms.

Intel Server
NOTICE: 00000: Decompressor pglz_decompress_hacked result 10.346763
NOTICE: 00000: Decompressor pglz_decompress_hacked8 result 11.192078
NOTICE: 00000: Decompressor pglz_decompress_hacked16 result 11.957727
NOTICE: 00000: Decompressor pglz_decompress_vanilla result 14.262256

ARM Server
NOTICE: Decompressor pglz_decompress_hacked result 12.966668
NOTICE: Decompressor pglz_decompress_hacked8 result 13.004935
NOTICE: Decompressor pglz_decompress_hacked16 result 13.043015
NOTICE: Decompressor pglz_decompress_vanilla result 18.239242

Power9 Server
NOTICE: Decompressor pglz_decompress_hacked result 10.992974
NOTICE: Decompressor pglz_decompress_hacked8 result 11.747443
NOTICE: Decompressor pglz_decompress_hacked16 result 11.026342
NOTICE: Decompressor pglz_decompress_vanilla result 16.375315

Intel laptop
NOTICE: Decompressor pglz_decompress_hacked result 9.445808
NOTICE: Decompressor pglz_decompress_hacked8 result 9.105360
NOTICE: Decompressor pglz_decompress_hacked16 result 9.621833
NOTICE: Decompressor pglz_decompress_vanilla result 10.661968

From these results pglz_decompress_hacked looks best.

Best regards, Andrey Borodin.
Attachment
On Thu, May 16, 2019 at 10:13:22PM +0500, Andrey Borodin wrote:
> Meanwhile I made some enhancements to the test suite:
>
> Intel Server
> NOTICE: 00000: Decompressor pglz_decompress_hacked result 10.346763
> NOTICE: 00000: Decompressor pglz_decompress_hacked8 result 11.192078
> NOTICE: 00000: Decompressor pglz_decompress_hacked16 result 11.957727
> NOTICE: 00000: Decompressor pglz_decompress_vanilla result 14.262256
>
> ARM Server
> NOTICE: Decompressor pglz_decompress_hacked result 12.966668
> NOTICE: Decompressor pglz_decompress_hacked8 result 13.004935
> NOTICE: Decompressor pglz_decompress_hacked16 result 13.043015
> NOTICE: Decompressor pglz_decompress_vanilla result 18.239242
>
> Power9 Server
> NOTICE: Decompressor pglz_decompress_hacked result 10.992974
> NOTICE: Decompressor pglz_decompress_hacked8 result 11.747443
> NOTICE: Decompressor pglz_decompress_hacked16 result 11.026342
> NOTICE: Decompressor pglz_decompress_vanilla result 16.375315
>
> Intel laptop
> NOTICE: Decompressor pglz_decompress_hacked result 9.445808
> NOTICE: Decompressor pglz_decompress_hacked8 result 9.105360
> NOTICE: Decompressor pglz_decompress_hacked16 result 9.621833
> NOTICE: Decompressor pglz_decompress_vanilla result 10.661968
>
> From these results pglz_decompress_hacked looks best.

That's nice. From the numbers you are presenting here, all of them are much better than the original, and there is not much difference between any of the patched versions. Having a 20%~30% improvement with a patch is very nice.

After that comes the simplicity and the future maintainability of what is proposed. I am not much into accepting a patch which has a 1%~2% impact on some hardware and makes pglz much more complex and harder to understand. But I am really eager to see a patch with at least a 10% improvement which remains simple, even more so if it simplifies the logic used in pglz.
--
Michael
Attachment
> On 17 May 2019, at 6:44, Michael Paquier <michael@paquier.xyz> wrote:
>
> That's nice.
>
> From the numbers you are presenting here, all of them are much better than the original, and there is not much difference between any of the patched versions. Having a 20%~30% improvement with a patch is very nice.
>
> After that comes the simplicity and the future maintainability of what is proposed. I am not much into accepting a patch which has a 1%~2% impact on some hardware and makes pglz much more complex and harder to understand. But I am really eager to see a patch with at least a 10% improvement which remains simple, even more so if it simplifies the logic used in pglz.

Here are patches for both winning versions. I'll place them on the CF.

My gut feeling is that pglz_decompress_hacked8 should be better, but on most architectures the benchmarks show the opposite.

Best regards, Andrey Borodin.
Attachment
On 16. 05. 19 19:13, Andrey Borodin wrote:
>
>> On 15 May 2019, at 15:06, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>>
>> Owners of AMD and ARM devices are welcome.

I've tested according to the instructions at the test repo https://github.com/x4m/test_pglz

test_pglz is at a97f63b and postgres at 6ba500.

Hardware is a desktop AMD Ryzen 5 2600, 32GB RAM.

Decompressor score (summ of all times):
NOTICE: Decompressor pglz_decompress_hacked result 6.988909
NOTICE: Decompressor pglz_decompress_hacked8 result 7.562619
NOTICE: Decompressor pglz_decompress_hacked16 result 8.316957
NOTICE: Decompressor pglz_decompress_vanilla result 10.725826

Attached is the full test run, if needed.

Kind regards, Gasper

> Yandex hardware RND guys gave me an ARM server and a Power9 server. They are looking for AMD and some new Intel boxes.
>
> Meanwhile I made some enhancements to the test suite:
> 1. I've added a Shakespeare payload: a concatenation of the works of this prominent poet.
> 2. For each payload we compute a "sliced time": the time to decompress the payload if it were sliced into 2Kb or 8Kb pieces.
> 3. For each decompressor we compute a "score": the sum of the times to decompress each payload, each payload sliced by 2Kb, and each payload sliced by 8Kb, repeated 5 times.
>
> I've attached the full test logs; meanwhile here are the results for the different platforms.
>
> Intel Server
> NOTICE: 00000: Decompressor pglz_decompress_hacked result 10.346763
> NOTICE: 00000: Decompressor pglz_decompress_hacked8 result 11.192078
> NOTICE: 00000: Decompressor pglz_decompress_hacked16 result 11.957727
> NOTICE: 00000: Decompressor pglz_decompress_vanilla result 14.262256
>
> ARM Server
> NOTICE: Decompressor pglz_decompress_hacked result 12.966668
> NOTICE: Decompressor pglz_decompress_hacked8 result 13.004935
> NOTICE: Decompressor pglz_decompress_hacked16 result 13.043015
> NOTICE: Decompressor pglz_decompress_vanilla result 18.239242
>
> Power9 Server
> NOTICE: Decompressor pglz_decompress_hacked result 10.992974
> NOTICE: Decompressor pglz_decompress_hacked8 result 11.747443
> NOTICE: Decompressor pglz_decompress_hacked16 result 11.026342
> NOTICE: Decompressor pglz_decompress_vanilla result 16.375315
>
> Intel laptop
> NOTICE: Decompressor pglz_decompress_hacked result 9.445808
> NOTICE: Decompressor pglz_decompress_hacked8 result 9.105360
> NOTICE: Decompressor pglz_decompress_hacked16 result 9.621833
> NOTICE: Decompressor pglz_decompress_vanilla result 10.661968
>
> From these results pglz_decompress_hacked looks best.
>
> Best regards, Andrey Borodin.
Attachment
> On 17 May 2019, at 18:40, Gasper Zejn <zejn@owca.info> wrote:
>
> I've tested according to the instructions at the test repo https://github.com/x4m/test_pglz
>
> test_pglz is at a97f63b and postgres at 6ba500.
>
> Hardware is a desktop AMD Ryzen 5 2600, 32GB RAM.
>
> Decompressor score (summ of all times):
>
> NOTICE: Decompressor pglz_decompress_hacked result 6.988909
> NOTICE: Decompressor pglz_decompress_hacked8 result 7.562619
> NOTICE: Decompressor pglz_decompress_hacked16 result 8.316957
> NOTICE: Decompressor pglz_decompress_vanilla result 10.725826

Thanks, Gasper! Basically we observe the same 0.65x time reduction here. It is very good that we have independent scores.

I'm still somewhat unsure that the score is fair: on payload 000000010000000000000008 vanilla decompression is sometimes slower than hacked by only a few percent, and this is especially visible on AMD. The degradation for 000000010000000000000008 sliced by 8Kb reaches 10%. I think this is because 000000010000000000000008 has the highest entropy: it is almost random and matches are very short, but present.

000000010000000000000008 Entropy = 4.360546 bits per byte.
000000010000000000000006 Entropy = 1.450059 bits per byte.
000000010000000000000001 Entropy = 2.944235 bits per byte.
shakespeare.txt Entropy = 3.603659 bits per byte.
16398 Entropy = 1.897640 bits per byte.

Best regards, Andrey Borodin.
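[Editor's note: entropy figures like those above can be reproduced with a few lines of C. The following is only a sketch of the usual Shannon-entropy calculation, not necessarily the tool that produced those numbers; compile with -lm.]

/* Shannon entropy of a file, in bits per byte. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
	FILE	   *f;
	unsigned long counts[256] = {0};
	unsigned long total = 0;
	double		h = 0.0;
	int			c;

	if (argc != 2 || (f = fopen(argv[1], "rb")) == NULL)
	{
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	while ((c = fgetc(f)) != EOF)
	{
		counts[(unsigned char) c]++;
		total++;
	}
	fclose(f);

	for (int b = 0; b < 256; b++)
	{
		if (counts[b] == 0)
			continue;
		double p = (double) counts[b] / (double) total;
		h -= p * log2(p);
	}

	printf("%s Entropy = %f bits per byte\n", argv[1], h);
	return 0;
}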
Hi hackers!
I am a student participating in GSoC 2019. I am looking forward to working with you all and learning from you.
My project aims to provide the ability to de-TOAST a fully TOAST'd and compressed field using an iterator. For more details, please take a look at my proposal[0]. Any suggestions or comments about my immature ideas would be much appreciated :)
I've implemented the first step of the project: segment-wise pglz compression, which provides the ability to get a subset of the raw data without decompressing the entire field.
I've also done some tests[1] for the compressor. The test results are as follows:
NOTICE: Test summary:
NOTICE: Payload 000000010000000000000001
NOTICE: Decompressor name | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked | 23.747444 | 0.578344 | 0.159809
NOTICE: pglz_decompress_hacked8 | 23.764193 | 0.677800 | 0.159809
NOTICE: pglz_decompress_hacked16 | 23.740351 | 0.704730 | 0.159809
NOTICE: pglz_decompress_vanilla | 23.797917 | 1.227868 | 0.159809
NOTICE: pglz_decompress_hacked_seg | 12.261808 | 0.625634 | 0.184952
Comment: Compression speed increased by nearly 100% while the compression ratio dropped by 15%.
NOTICE: Payload 000000010000000000000001 sliced by 2Kb
NOTICE: pglz_decompress_hacked | 12.616956 | 0.621223 | 0.156953
NOTICE: pglz_decompress_hacked8 | 12.583685 | 0.756741 | 0.156953
NOTICE: pglz_decompress_hacked16 | 12.512636 | 0.774980 | 0.156953
NOTICE: pglz_decompress_vanilla | 12.493062 | 1.262820 | 0.156953
NOTICE: pglz_decompress_hacked_seg | 11.986554 | 0.622654 | 0.159590
NOTICE: Payload 000000010000000000000001 sliced by 4Kb
NOTICE: pglz_decompress_hacked | 15.514469 | 0.565565 | 0.154213
NOTICE: pglz_decompress_hacked8 | 15.529144 | 0.699675 | 0.154213
NOTICE: pglz_decompress_hacked16 | 15.514040 | 0.721145 | 0.154213
NOTICE: pglz_decompress_vanilla | 15.558958 | 1.237237 | 0.154213
NOTICE: pglz_decompress_hacked_seg | 14.650309 | 0.563228 | 0.153652
NOTICE: Payload 000000010000000000000006
NOTICE: Decompressor name | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked | 8.610177 | 0.153577 | 0.052294
NOTICE: pglz_decompress_hacked8 | 8.566785 | 0.168002 | 0.052294
NOTICE: pglz_decompress_hacked16 | 8.643126 | 0.167537 | 0.052294
NOTICE: pglz_decompress_vanilla | 8.574498 | 0.930738 | 0.052294
NOTICE: pglz_decompress_hacked_seg | 7.394731 | 0.171924 | 0.056081
NOTICE: Payload 000000010000000000000006 sliced by 2Kb
NOTICE: pglz_decompress_hacked | 6.724060 | 0.295043 | 0.065541
NOTICE: pglz_decompress_hacked8 | 6.623018 | 0.318527 | 0.065541
NOTICE: pglz_decompress_hacked16 | 6.898034 | 0.318360 | 0.065541
NOTICE: pglz_decompress_vanilla | 6.712711 | 1.045430 | 0.065541
NOTICE: pglz_decompress_hacked_seg | 6.630743 | 0.302589 | 0.068471
NOTICE: Payload 000000010000000000000006 sliced by 4Kb
NOTICE: pglz_decompress_hacked | 6.624067 | 0.220942 | 0.058865
NOTICE: pglz_decompress_hacked8 | 6.659424 | 0.240183 | 0.058865
NOTICE: pglz_decompress_hacked16 | 6.763864 | 0.240564 | 0.058865
NOTICE: pglz_decompress_vanilla | 6.743574 | 0.985348 | 0.058865
NOTICE: pglz_decompress_hacked_seg | 6.613123 | 0.227582 | 0.060330
NOTICE: Payload 000000010000000000000008
NOTICE: Decompressor name | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked | 52.425957 | 1.050544 | 0.498941
NOTICE: pglz_decompress_hacked8 | 52.204561 | 1.261592 | 0.498941
NOTICE: pglz_decompress_hacked16 | 52.328491 | 1.466751 | 0.498941
NOTICE: pglz_decompress_vanilla | 52.465308 | 1.341271 | 0.498941
NOTICE: pglz_decompress_hacked_seg | 31.896341 | 1.113260 | 0.600998
NOTICE: Payload 000000010000000000000008 sliced by 2Kb
NOTICE: pglz_decompress_hacked | 30.620611 | 0.768542 | 0.351941
NOTICE: pglz_decompress_hacked8 | 30.557334 | 0.907421 | 0.351941
NOTICE: pglz_decompress_hacked16 | 32.064903 | 1.208913 | 0.351941
NOTICE: pglz_decompress_vanilla | 30.489886 | 1.014197 | 0.351941
NOTICE: pglz_decompress_hacked_seg | 27.145243 | 0.774193 | 0.352868
NOTICE: Payload 000000010000000000000008 sliced by 4Kb
NOTICE: pglz_decompress_hacked | 36.567903 | 1.054633 | 0.514047
NOTICE: pglz_decompress_hacked8 | 36.459124 | 1.267731 | 0.514047
NOTICE: pglz_decompress_hacked16 | 36.791718 | 1.479650 | 0.514047
NOTICE: pglz_decompress_vanilla | 36.241913 | 1.303136 | 0.514047
NOTICE: pglz_decompress_hacked_seg | 31.526327 | 1.059926 | 0.526875
NOTICE: Payload 16398
NOTICE: Decompressor name | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked | 9.508625 | 0.435190 | 0.071816
NOTICE: pglz_decompress_hacked8 | 9.546987 | 0.473871 | 0.071816
NOTICE: pglz_decompress_hacked16 | 9.534496 | 0.471662 | 0.071816
NOTICE: pglz_decompress_vanilla | 9.559053 | 1.352561 | 0.071816
NOTICE: pglz_decompress_hacked_seg | 8.479486 | 0.441536 | 0.073232
NOTICE: Payload 16398 sliced by 2Kb
NOTICE: pglz_decompress_hacked | 6.808167 | 0.326570 | 0.082775
NOTICE: pglz_decompress_hacked8 | 6.790743 | 0.361720 | 0.082775
NOTICE: pglz_decompress_hacked16 | 6.886097 | 0.364549 | 0.082775
NOTICE: pglz_decompress_vanilla | 6.918429 | 1.191265 | 0.082775
NOTICE: pglz_decompress_hacked_seg | 6.752811 | 0.340805 | 0.085705
NOTICE: Payload 16398 sliced by 4Kb
NOTICE: pglz_decompress_hacked | 7.244472 | 0.261872 | 0.076860
NOTICE: pglz_decompress_hacked8 | 7.290275 | 0.295988 | 0.076860
NOTICE: pglz_decompress_hacked16 | 7.340706 | 0.294683 | 0.076860
NOTICE: pglz_decompress_vanilla | 7.429289 | 1.151645 | 0.076860
NOTICE: pglz_decompress_hacked_seg | 7.054166 | 0.267896 | 0.078325
NOTICE: Payload shakespeare.txt
NOTICE: Decompressor name | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked | 25.998753 | 1.345542 | 0.281363
NOTICE: pglz_decompress_hacked8 | 26.121630 | 1.917667 | 0.281363
NOTICE: pglz_decompress_hacked16 | 26.139312 | 2.101329 | 0.281363
NOTICE: pglz_decompress_vanilla | 26.155571 | 2.082123 | 0.281363
NOTICE: pglz_decompress_hacked_seg | 16.792089 | 1.951269 | 0.436558
Comment: In this case, the compression ratio has dropped dramatically.
NOTICE: Payload shakespeare.txt sliced by 2Kb
NOTICE: pglz_decompress_hacked | 14.992793 | 1.923663 | 0.436270
NOTICE: pglz_decompress_hacked8 | 14.982428 | 2.695319 | 0.436270
NOTICE: pglz_decompress_hacked16 | 15.211803 | 2.846615 | 0.436270
NOTICE: pglz_decompress_vanilla | 15.113214 | 2.580098 | 0.436270
NOTICE: pglz_decompress_hacked_seg | 15.120852 | 1.922596 | 0.439199
NOTICE: Payload shakespeare.txt sliced by 4Kb
NOTICE: pglz_decompress_hacked | 18.083400 | 1.687598 | 0.366936
NOTICE: pglz_decompress_hacked8 | 18.185038 | 2.395928 | 0.366936
NOTICE: pglz_decompress_hacked16 | 18.096120 | 2.554812 | 0.366936
NOTICE: pglz_decompress_vanilla | 18.435380 | 2.329129 | 0.366936
NOTICE: pglz_decompress_hacked_seg | 18.103267 | 1.705517 | 0.368400
NOTICE:
Decompressor score (summ of all times):
NOTICE: Decompressor pglz_decompress_hacked result 11.288848
NOTICE: Decompressor pglz_decompress_hacked8 result 14.438165
NOTICE: Decompressor pglz_decompress_hacked16 result 15.716280
NOTICE: Decompressor pglz_decompress_vanilla result 21.034867
NOTICE: Decompressor pglz_decompress_hacked_seg result 12.090609
NOTICE:
compressor score (summ of all times):
NOTICE: compressor pglz_compress_vanilla result 276.776671
NOTICE: compressor pglz_compress_hacked_seg result 222.407850
There are some questions now:
1. The compression algorithm is not compatible with the original compression algorithm now.
2. If the idea works, we need to test more data; what kind of data is more appropriate?
Any comments are much appreciated.

Best regards, Binguo Bao.
> On 18 May 2019, at 11:44, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>

Hi! Here's a rebased version of the patches.

Best regards, Andrey Borodin.
Attachment
> On 13 May 2019, at 12:14, Michael Paquier <michael@paquier.xyz> wrote:
>
> Decompression can matter a lot for mostly-read workloads, and compression can become a bottleneck for heavy-insert loads, so improving compression and decompression should be treated as two separate problems, not as two linked problems. Any improvement in one or the other, or even both, is nice to have.

Here's a patch hacked by Vladimir for compression.

Key differences (as far as I can see; maybe Vladimir will post a more complete list of optimizations):
1. Use functions instead of macro-functions: not surprisingly, they are easier to optimize and impose fewer constraints on the compiler.
2. More compact hash table: use indexes instead of pointers.
3. More robust segment comparison: like memcmp, but it returns the index of the first differing byte (a sketch of such a helper follows below).

On a weighted mix of different data (same as for compression), the overall speedup is x1.43 on my machine.

The current implementation is integrated into the test_pglz suite for benchmarking purposes[0].

Best regards, Andrey Borodin.

[0] https://github.com/x4m/test_pglz
Attachment
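[Editor's note: point 3 above roughly corresponds to a helper like the one below. This is an illustrative sketch, not the code from the attached patch (which likely compares word-at-a-time); the point is that returning the length of the common prefix gives the match finder the match length directly, with no second pass.]

#include <stddef.h>

/*
 * Like memcmp(), but instead of a sign it returns the number of leading
 * bytes that are equal, i.e. the index of the first differing byte
 * (or 'limit' if the two ranges are equal up to the limit).
 */
static size_t
common_prefix_len(const unsigned char *a, const unsigned char *b, size_t limit)
{
	size_t		i = 0;

	while (i < limit && a[i] == b[i])
		i++;
	return i;
}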
On 27.06.2019 21:33, Andrey Borodin wrote:
> Here's a patch hacked by Vladimir for compression.
>
> Key differences (as far as I can see; maybe Vladimir will post a more complete list of optimizations):
> 1. Use functions instead of macro-functions: not surprisingly, they are easier to optimize and impose fewer constraints on the compiler.
> 2. More compact hash table: use indexes instead of pointers.
> 3. More robust segment comparison: like memcmp, but it returns the index of the first differing byte.
>
> On a weighted mix of different data (same as for compression), the overall speedup is x1.43 on my machine.
>
> The current implementation is integrated into the test_pglz suite for benchmarking purposes[0].
>
> [0] https://github.com/x4m/test_pglz

It took me some time to understand that your memcpy optimization is correct ;)
I have tested different ways of optimizing this fragment of code, but failed to outperform your implementation!
Results on my computer are similar to yours:

Decompressor score (summ of all times):
NOTICE: Decompressor pglz_decompress_hacked result 6.627355
NOTICE: Decompressor pglz_decompress_hacked_unrolled result 7.497114
NOTICE: Decompressor pglz_decompress_hacked8 result 7.412944
NOTICE: Decompressor pglz_decompress_hacked16 result 7.792978
NOTICE: Decompressor pglz_decompress_vanilla result 10.652603

Compressor score (summ of all times):
NOTICE: Compressor pglz_compress_vanilla result 116.970005
NOTICE: Compressor pglz_compress_hacked result 89.706105

But ... below are the results for lz4:

Decompressor score (summ of all times):
NOTICE: Decompressor lz4_decompress result 3.660066
Compressor score (summ of all times):
NOTICE: Compressor lz4_compress result 10.288594

There is a 2x advantage in decompression speed and a 10x advantage in compression speed. So maybe instead of "hacking" the pglz algorithm we should better switch to lz4?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
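[Editor's note: for anyone who wants to reproduce the lz4 side outside of test_pglz, the one-shot liblz4 API is all that is needed. A minimal sketch follows; it assumes the liblz4 development headers are installed (link with -llz4) and is not the code used in the test suite.]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lz4.h>

int
main(void)
{
	const char *src = "some repetitive payload payload payload payload payload";
	int			slen = (int) strlen(src) + 1;
	int			bound = LZ4_compressBound(slen);
	char	   *comp = malloc(bound);
	char	   *back = malloc(slen);

	/* one-shot compression and decompression round trip */
	int			clen = LZ4_compress_default(src, comp, slen, bound);
	int			dlen = LZ4_decompress_safe(comp, back, clen, slen);

	printf("raw %d bytes -> compressed %d bytes -> decompressed %d bytes\n",
		   slen, clen, dlen);

	free(comp);
	free(back);
	return (clen > 0 && dlen == slen) ? 0 : 1;
}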
On Fri, Aug 02, 2019 at 04:45:43PM +0300, Konstantin Knizhnik wrote:
> It took me some time to understand that your memcpy optimization is correct ;)
> I have tested different ways of optimizing this fragment of code, but failed to outperform your implementation!
> Results on my computer are similar to yours:
>
> Decompressor score (summ of all times):
> NOTICE: Decompressor pglz_decompress_hacked result 6.627355
> NOTICE: Decompressor pglz_decompress_hacked_unrolled result 7.497114
> NOTICE: Decompressor pglz_decompress_hacked8 result 7.412944
> NOTICE: Decompressor pglz_decompress_hacked16 result 7.792978
> NOTICE: Decompressor pglz_decompress_vanilla result 10.652603
>
> Compressor score (summ of all times):
> NOTICE: Compressor pglz_compress_vanilla result 116.970005
> NOTICE: Compressor pglz_compress_hacked result 89.706105
>
> But ... below are the results for lz4:
>
> Decompressor score (summ of all times):
> NOTICE: Decompressor lz4_decompress result 3.660066
> Compressor score (summ of all times):
> NOTICE: Compressor lz4_compress result 10.288594
>
> There is a 2x advantage in decompression speed and a 10x advantage in compression speed. So maybe instead of "hacking" the pglz algorithm we should better switch to lz4?

I think we should just bite the bullet and add an initdb option to pick the compression algorithm. That's been discussed repeatedly, but we never ended up actually doing it. See for example [1].

If there's anyone willing to put some effort into getting this feature over the line, I'm willing to do reviews & commit. It's a seemingly small change with rather insane potential impact.

But even if we end up doing that, it still makes sense to optimize the hell out of pglz, because existing systems will still use it (pg_upgrade can't switch from one compression algorithm to another).

regards

[1] https://www.postgresql.org/message-id/flat/55341569.1090107%402ndquadrant.com

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks for looking into this!

> On 2 Aug 2019, at 19:43, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Fri, Aug 02, 2019 at 04:45:43PM +0300, Konstantin Knizhnik wrote:
>>
>> It took me some time to understand that your memcpy optimization is correct ;)

Seems that the comments are not explanatory enough... I will try to fix that.

>> I have tested different ways of optimizing this fragment of code, but failed to outperform your implementation!

JFYI we tried optimizations with memcpy of constant size (optimized into assembly instead of a call), unrolling the literal loop, and some others. None of these worked better.

>> But ... below are the results for lz4:
>>
>> Decompressor score (summ of all times):
>> NOTICE: Decompressor lz4_decompress result 3.660066
>> Compressor score (summ of all times):
>> NOTICE: Compressor lz4_compress result 10.288594
>>
>> There is a 2x advantage in decompression speed and a 10x advantage in compression speed. So maybe instead of "hacking" the pglz algorithm we should better switch to lz4?
>
> I think we should just bite the bullet and add an initdb option to pick the compression algorithm. That's been discussed repeatedly, but we never ended up actually doing it. See for example [1].
>
> If there's anyone willing to put some effort into getting this feature over the line, I'm willing to do reviews & commit. It's a seemingly small change with rather insane potential impact.
>
> But even if we end up doing that, it still makes sense to optimize the hell out of pglz, because existing systems will still use it (pg_upgrade can't switch from one compression algorithm to another).

We have some kind of "roadmap" for "extensible pglz". We plan to provide an implementation at the November CF.

Currently, pglz starts with an empty cache map: there are no prior 4k bytes before the start. We can add an imaginary prefix to any data with common substrings: this will improve the compression ratio. It is hard to decide on a training data set for this "common prefix", so we want to produce an extension with an aggregate function which produces some "adapted common prefix" from the user's data. Then we can "reserve" a few negative bytes for "decompression commands". Such a command can instruct the database on which common prefix to use. A system command can also say "invoke decompression from an extension".

Thus, the user will be able to train database compression on his data and substitute pglz compression with a custom compression method seamlessly.

This would make a hard-chosen compression unneeded, but seems overly hacky. On the other hand, there would be no need to have lz4, zstd, brotli, lzma and others in core. Why not provide e.g. "time series compression"? Or "DNA compression"? Whatever gun the user wants for his foot.

Best regards, Andrey Borodin.
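[Editor's note: the "imaginary common prefix" idea maps directly onto the preset-dictionary support in lz4's streaming API. The following is a rough sketch of what that looks like; it assumes liblz4, and the helper names are made up for illustration.]

#include <lz4.h>

/*
 * Compress 'src' as if 'dict' immediately preceded it -- lz4's equivalent of
 * seeding the pglz history with an imaginary common prefix.  Decompression
 * must be given the same dictionary.
 */
static int
compress_with_prefix(const char *dict, int dictlen,
					 const char *src, int slen,
					 char *dst, int dstcap)
{
	LZ4_stream_t *stream = LZ4_createStream();
	int			clen;

	LZ4_loadDict(stream, dict, dictlen);
	clen = LZ4_compress_fast_continue(stream, src, dst, slen, dstcap, 1);
	LZ4_freeStream(stream);
	return clen;
}

static int
decompress_with_prefix(const char *dict, int dictlen,
					   const char *src, int clen,
					   char *dst, int dstcap)
{
	return LZ4_decompress_safe_usingDict(src, dst, clen, dstcap, dict, dictlen);
}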
Hi,

On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
> We have some kind of "roadmap" for "extensible pglz". We plan to provide an implementation at the November CF.

I don't understand why it's a good idea to improve the compression side of pglz. There are plenty of other people who have spent a lot of time developing better compression algorithms.

> Currently, pglz starts with an empty cache map: there are no prior 4k bytes before the start. We can add an imaginary prefix to any data with common substrings: this will improve the compression ratio. It is hard to decide on a training data set for this "common prefix", so we want to produce an extension with an aggregate function which produces some "adapted common prefix" from the user's data. Then we can "reserve" a few negative bytes for "decompression commands". Such a command can instruct the database on which common prefix to use. A system command can also say "invoke decompression from an extension".
>
> Thus, the user will be able to train database compression on his data and substitute pglz compression with a custom compression method seamlessly.
>
> This would make a hard-chosen compression unneeded, but seems overly hacky. On the other hand, there would be no need to have lz4, zstd, brotli, lzma and others in core. Why not provide e.g. "time series compression"? Or "DNA compression"? Whatever gun the user wants for his foot.

I think this is way too complicated, and will not provide particularly much benefit for the majority of users.

In fact, I'll argue that we should flat out reject any such patch until we have at least one decent default compression algorithm in core. You're trying to work around a poor compression algorithm with a complicated dictionary improvement that requires user interaction, will only work in a relatively small subset of cases, and will very often increase compression times.

Greetings,

Andres Freund
On Fri, Aug 02, 2019 at 09:39:48AM -0700, Andres Freund wrote:
> On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
>> We have some kind of "roadmap" for "extensible pglz". We plan to provide an implementation at the November CF.
>
> I don't understand why it's a good idea to improve the compression side of pglz. There are plenty of other people who have spent a lot of time developing better compression algorithms.

Isn't it beneficial for existing systems, which will be stuck with pglz even if we end up adding other algorithms?

>> Currently, pglz starts with an empty cache map: there are no prior 4k bytes before the start. We can add an imaginary prefix to any data with common substrings: this will improve the compression ratio. It is hard to decide on a training data set for this "common prefix", so we want to produce an extension with an aggregate function which produces some "adapted common prefix" from the user's data. Then we can "reserve" a few negative bytes for "decompression commands". Such a command can instruct the database on which common prefix to use. A system command can also say "invoke decompression from an extension".
>>
>> Thus, the user will be able to train database compression on his data and substitute pglz compression with a custom compression method seamlessly.
>>
>> This would make a hard-chosen compression unneeded, but seems overly hacky. On the other hand, there would be no need to have lz4, zstd, brotli, lzma and others in core. Why not provide e.g. "time series compression"? Or "DNA compression"? Whatever gun the user wants for his foot.
>
> I think this is way too complicated, and will not provide particularly much benefit for the majority of users.

I agree with this. I do see value in the feature, but probably not as a drop-in replacement for the default compression algorithm. I'd compare it to the "custom compression methods" patch that was submitted some time ago.

> In fact, I'll argue that we should flat out reject any such patch until we have at least one decent default compression algorithm in core. You're trying to work around a poor compression algorithm with a complicated dictionary improvement that requires user interaction, will only work in a relatively small subset of cases, and will very often increase compression times.

I wouldn't be so strict, I guess. But I do agree that an algorithm which requires additional steps (training, ...) is unlikely to be a good candidate for the default instance compression algorithm.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2019-08-02 19:00:39 +0200, Tomas Vondra wrote:
> On Fri, Aug 02, 2019 at 09:39:48AM -0700, Andres Freund wrote:
>> On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
>>> We have some kind of "roadmap" for "extensible pglz". We plan to provide an implementation at the November CF.
>>
>> I don't understand why it's a good idea to improve the compression side of pglz. There are plenty of other people who have spent a lot of time developing better compression algorithms.
>
> Isn't it beneficial for existing systems, which will be stuck with pglz even if we end up adding other algorithms?

Why would they be stuck continuing to *compress* with pglz? As we fully retoast on write anyway, we can just gradually switch over to the better algorithm. Decompression speed is another story, of course.

Here's what I had a few years back:
https://www.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de
see also
https://www.postgresql.org/message-id/20130605150144.GD28067%40alap2.anarazel.de

I think we should refresh something like that patch, and:
- make the compression algorithm GUC an enum, rename
- add --with-system-lz4
- obviously refresh the copy of lz4
- drop snappy

Greetings,

Andres Freund
On Fri, Aug 02, 2019 at 10:12:58AM -0700, Andres Freund wrote:
> On 2019-08-02 19:00:39 +0200, Tomas Vondra wrote:
>> Isn't it beneficial for existing systems, which will be stuck with pglz even if we end up adding other algorithms?
>
> Why would they be stuck continuing to *compress* with pglz? As we fully retoast on write anyway, we can just gradually switch over to the better algorithm. Decompression speed is another story, of course.

Hmmm, I don't remember the details of those patches, so I didn't realize it allows incremental recompression. If that's possible, it would mean existing systems can start using it. Which is good.

Another question is whether we'd actually want to include the code in core directly, or use system libraries (and whether some packagers might decide to disable that, for whatever reason).

But yeah, I agree you may have a point about optimizing pglz compression.

> Here's what I had a few years back:
>
> https://www.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de
> see also
> https://www.postgresql.org/message-id/20130605150144.GD28067%40alap2.anarazel.de
>
> I think we should refresh something like that patch, and:
> - make the compression algorithm GUC an enum, rename
> - add --with-system-lz4
> - obviously refresh the copy of lz4
> - drop snappy

That's a reasonable plan, I guess.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2019-08-02 19:52:39 +0200, Tomas Vondra wrote:
> Hmmm, I don't remember the details of those patches, so I didn't realize it allows incremental recompression. If that's possible, it would mean existing systems can start using it. Which is good.

That depends on what you mean by "incremental"? A single toasted datum can only have one compression type, because we only update them all in one anyway. But different datums can be compressed differently.

> Another question is whether we'd actually want to include the code in core directly, or use system libraries (and whether some packagers might decide to disable that, for whatever reason).

I'd personally say we should have an included version, and a --with-system-... flag that uses the system one.

Greetings,

Andres Freund
On Fri, Aug 02, 2019 at 11:20:03AM -0700, Andres Freund wrote:
> On 2019-08-02 19:52:39 +0200, Tomas Vondra wrote:
>> Hmmm, I don't remember the details of those patches, so I didn't realize it allows incremental recompression. If that's possible, it would mean existing systems can start using it. Which is good.
>
> That depends on what you mean by "incremental"? A single toasted datum can only have one compression type, because we only update them all in one anyway. But different datums can be compressed differently.

I meant different toast values using different compression algorithms, sorry for the confusion.

>> Another question is whether we'd actually want to include the code in core directly, or use system libraries (and whether some packagers might decide to disable that, for whatever reason).
>
> I'd personally say we should have an included version, and a --with-system-... flag that uses the system one.

OK. I'd say to require a system library, but that's a minor detail.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 02.08.2019 21:20, Andres Freund wrote:
>> Another question is whether we'd actually want to include the code in
>> core directly, or use system libraries (and if some packagers might
>> decide to disable that, for whatever reason).
>
> I'd personally say we should have an included version, and a
> --with-system-... flag that uses the system one.

+1
Hi,

On 02/08/2019 21:48, Tomas Vondra wrote:
> On Fri, Aug 02, 2019 at 11:20:03AM -0700, Andres Freund wrote:
>>
>>> Another question is whether we'd actually want to include the code in core directly, or use system libraries (and whether some packagers might decide to disable that, for whatever reason).
>>
>> I'd personally say we should have an included version, and a --with-system-... flag that uses the system one.
>
> OK. I'd say to require a system library, but that's a minor detail.

Same here.

Just so that we don't idly talk, what do you think about the attached? It:
- adds a new GUC compression_algorithm with possible values of pglz (default) and lz4 (if lz4 is compiled in), requires SIGHUP
- adds a --with-lz4 configure option (default yes, so the configure option is actually --without-lz4) that enables lz4, using the system library
- uses compression_algorithm for both TOAST and WAL compression (if on)
- supports slicing for lz4 as well (pglz was already supported)
- supports reading old TOAST values
- adds a 1-byte header to the compressed data where we currently store the algorithm kind, which leaves us with 254 more to add :) (that's extra overhead compared to the current state)
- changes the rawsize in the TOAST header to 31 bits via bit packing
- uses the extra bit to differentiate between old and new format
- supports reading from a table which has different rows stored with different algorithms (so that the GUC itself can be freely changed)

Simple docs and a TAP test are included.

I did some basic performance testing (it's not really my thing though, so I would appreciate it if somebody did more). I get about a 7x perf improvement on data load with lz4 compared to pglz on my dataset, but strangely only a tiny decompression improvement. Perhaps more importantly, I also did before-patch and after-patch tests with pglz, and the performance difference with my data set was <1%.

Note that this will just link against lz4, it does not add lz4 into the PostgreSQL code-base.

The issues I know of:
- the pg_decompress function really ought to throw an error in the default branch, but that file is also used in the front-end, so I am not sure how to do that
- the TAP test probably does not work with all possible configurations (but that's why it needs to be set in PG_TEST_EXTRA, like for example ssl)
- we don't really have any automated test for reading the old TOAST format, and I have no idea how to do that
- I expect my changes to configure.in are not the greatest, as I have pretty much zero experience with autoconf

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
> On 2 Aug 2019, at 21:39, Andres Freund <andres@anarazel.de> wrote: > > On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote: >> We have some kind of "roadmap" of "extensible pglz". We plan to provide an implementation at the November CF. > > I don't understand why it's a good idea to improve the compression side > of pglz. There's plenty of other people that spent a lot of time developing > better compression algorithms. Improving the compression side of pglz has two different projects: 1. Faster compression with less code and same compression ratio (patch in this thread). 2. Better compression ratio with at least the same compression speed of uncompressed values. Why do I want to do a patch for 2? Because it's interesting. Will 1 or 2 be reviewed or committed? I have no idea. Will many users benefit from 1 or 2? Yes, clearly. Unless we force everyone to stop compressing with pglz. >> Currently, pglz starts with an empty cache map: there are no prior 4k bytes before the start. We can add an imaginary prefix to any data with common substrings: this will enhance the compression ratio. >> It is hard to decide on a training data set for this "common prefix". So we want to produce an extension with an aggregate function which produces some "adapted common prefix" from the user's data. >> Then we can "reserve" a few negative bytes for "decompression commands". This command can instruct the database on which common prefix to use. >> But a system command can also say "invoke decompression from an extension". >> >> Thus, the user will be able to train database compression on his data and substitute pglz compression with a custom compression method seamlessly. >> >> This will make hard-chosen compression unneeded, but seems overly hacky. But there will be no need to have lz4, zstd, brotli, lzma and others in core. Why not provide e.g. "time series compression"? Or "DNA compression"? Whatever gun the user wants for his foot. > > I think this is way too complicated, and will provide not particularly > much benefit for the majority of users. > > In fact, I'll argue that we should flat out reject any such patch until > we have at least one decent default compression algorithm in > core. You're trying to work around a poor compression algorithm with > complicated dictionary improvement OK. The idea of something plugged into pglz seemed odd even to me. But it looks like it restarted the lz4 discussion :) > , that requires user interaction, and > only will work in a relatively small subset of the cases, and will very often increase compression times. No, of course: if the "common prefix" implementation increases compression times I will not even post a patch. BTW, lz4 also supports "common prefix", let's do that too? Here's a link to the Zstd dictionary builder; it is compatible with lz4: https://github.com/facebook/zstd#the-case-for-small-data-compression We actually have small datums. > On 4 Aug 2019, at 5:41, Petr Jelinek <petr@2ndquadrant.com> wrote: > > Just so that we don't idly talk, what do you think about the attached?
> It: > - adds new GUC compression_algorithm with possible values of pglz (default) and lz4 (if lz4 is compiled in), requires SIGHUP > - adds --with-lz4 configure option (default yes, so the configure option is actually --without-lz4) that enables the lz4, it's using system library > - uses the compression_algorithm for both TOAST and WAL compression (if on) > - supports slicing for lz4 as well (pglz was already supported) > - supports reading old TOAST values > - adds 1 byte header to the compressed data where we currently store the algorithm kind, that leaves us with 254 more to add :) (that's an extra overhead compared to the current state) > - changes the rawsize in TOAST header to 31 bits via bit packing > - uses the extra bit to differentiate between old and new format > - supports reading from table which has different rows stored with different algorithm (so that the GUC itself can be freely changed) That's cool. I suggest defaulting to lz4 if it is available. You cannot start a cluster that has used lz4 even once with non-lz4 binaries. Do we plan for the possibility of a compression algorithm as an extension? Or will all algorithms be packed into that byte in core? What about lz4 "common prefix"? System or user-defined. If lz4 is compiled in we can even offer in-system training, just make sure that trained prefixes will make their way to standbys. Best regards, Andrey Borodin.
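Since lz4's "common prefix" support keeps coming up, here is a small self-contained illustration of lz4's dictionary API, unrelated to the attached patch. It assumes liblz4 is available (build with -llz4) and that writer and reader agree on the same dictionary; the dictionary contents below are just made-up sample data.

#include <stdio.h>
#include <lz4.h>

int
main(void)
{
	const char	dict[] = "{\"timestamp\":\"2019-08-04T00:00:00Z\",\"level\":\"info\",";
	const char	src[] = "{\"timestamp\":\"2019-08-04T12:34:56Z\",\"level\":\"info\",\"msg\":\"hello\"}";
	char		compressed[256];
	char		restored[256];
	LZ4_stream_t *stream = LZ4_createStream();
	int			clen;
	int			dlen;

	/* Seed the compressor's history with the shared dictionary. */
	LZ4_loadDict(stream, dict, (int) sizeof(dict) - 1);

	clen = LZ4_compress_fast_continue(stream, src, compressed,
									  (int) sizeof(src) - 1,
									  (int) sizeof(compressed), 1);

	/* The decompressor must be given the very same dictionary. */
	dlen = LZ4_decompress_safe_usingDict(compressed, restored, clen,
										 (int) sizeof(restored),
										 dict, (int) sizeof(dict) - 1);

	printf("raw %d -> compressed %d -> restored %d bytes\n",
		   (int) sizeof(src) - 1, clen, dlen);
	LZ4_freeStream(stream);
	return 0;
}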
On Sun, Aug 04, 2019 at 02:41:24AM +0200, Petr Jelinek wrote: >Hi, > >On 02/08/2019 21:48, Tomas Vondra wrote: >>On Fri, Aug 02, 2019 at 11:20:03AM -0700, Andres Freund wrote: >> >>> >>>>Another question is whether we'd actually want to include the code in >>>>core directly, or use system libraries (and if some packagers might >>>>decide to disable that, for whatever reason). >>> >>>I'd personally say we should have an included version, and a >>>--with-system-... flag that uses the system one. >>> >> >>OK. I'd say to require a system library, but that's a minor detail. >> > >Same here. > >Just so that we don't idly talk, what do you think about the attached? >It: >- adds new GUC compression_algorithm with possible values of pglz >(default) and lz4 (if lz4 is compiled in), requires SIGHUP >- adds --with-lz4 configure option (default yes, so the configure >option is actually --without-lz4) that enables the lz4, it's using >system library >- uses the compression_algorithm for both TOAST and WAL compression (if on) >- supports slicing for lz4 as well (pglz was already supported) >- supports reading old TOAST values >- adds 1 byte header to the compressed data where we currently store >the algorithm kind, that leaves us with 254 more to add :) (that's an >extra overhead compared to the current state) >- changes the rawsize in TOAST header to 31 bits via bit packing >- uses the extra bit to differentiate between old and new format >- supports reading from table which has different rows stored with >different algorithm (so that the GUC itself can be freely changed) Cool. >Simple docs and a TAP test included. > >I did some basic performance testing (it's not really my thing though, >so I would appreciate if somebody did more). >I get about 7x perf improvement on data load with lz4 compared to pglz >on my dataset but strangely only tiny decompression improvement. >Perhaps more importantly I also did before patch and after patch tests >with pglz and the performance difference with my data set was <1%. > >Note that this will just link against lz4, it does not add lz4 into >PostgreSQL code-base. WFM, although I think Andres wanted to do both (link against system and add lz4 code as a fallback). I think the question is what happens when you run with lz4 for a while, and then switch to binaries without lz4 support. Or when you try to replicate from lz4-enabled instance to an instance without it. Especially for physical replication, but I suppose it may affect logical replication with binary protocol?

A very brief review:

1) I wonder what "runstatedir" is about.

2) This seems rather suspicious, and obviously the comment is now entirely bogus:

/* Check that off_t can represent 2**63 - 1 correctly.
   We can't simply define LARGE_OFF_T to be 9223372036854775807,
   since some C++ compilers masquerading as C compilers
   incorrectly reject 9223372036854775807.
*/
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))

3) I can't really build without lz4:

config.status: linking src/makefiles/Makefile.linux to src/Makefile.port
pg_lzcompress.c: In function ‘pg_compress_bound’:
pg_lzcompress.c:892:22: error: ‘SIZEOF_PG_COMPRESS_HEADER’ undeclared (first use in this function)
  return slen + 4 + SIZEOF_PG_COMPRESS_HEADER;
                    ^~~~~~~~~~~~~~~~~~~~~~~~~
pg_lzcompress.c:892:22: note: each undeclared identifier is reported only once for each function it appears in

4) I did a simple test with physical replication, with lz4 enabled on both sides (well, can't build without lz4 anyway, per previous point). It immediately failed like this:

FATAL: failed to restore block image
CONTEXT: WAL redo at 0/5000A40 for Btree/INSERT_LEAF: off 138
LOG: startup process (PID 15937) exited with exit code 1

This is a simple UPDATE on a trivial table:

create table t (a int primary key);
insert into t select i from generate_series(1,1000) s(i);
update t set a = a - 100000 where random () < 0.1;

with some checkpoints to force FPW (and wal_compression=on, of course). I haven't tried `make check-world` but I suppose some of the TAP tests should fail because of this. And if not, we need to improve coverage.

5) I wonder why compression_algorithm is defined as PGC_SIGHUP. Why not allow users to set it per session? I suppose we might have a separate option for WAL compression_algorithm.

6) It seems a bit strange that pg_compress/pg_decompress are now defined in pglz_compress.{c,h}. Maybe we should invent src/common/compress.c?

regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 04/08/2019 11:57, Andrey Borodin wrote: >> On 2 Aug 2019, at 21:39, Andres Freund <andres@anarazel.de> wrote: >> >> On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote: >>> We have some kind of "roadmap" of "extensible pglz". We plan to provide an implementation at the November CF. >> >> I don't understand why it's a good idea to improve the compression side >> of pglz. There's plenty of other people that spent a lot of time developing >> better compression algorithms. > Improving the compression side of pglz has two different projects: > 1. Faster compression with less code and same compression ratio (patch in this thread). > 2. Better compression ratio with at least the same compression speed of uncompressed values. > Why do I want to do a patch for 2? Because it's interesting. > Will 1 or 2 be reviewed or committed? I have no idea. > Will many users benefit from 1 or 2? Yes, clearly. Unless we force everyone to stop compressing with pglz. FWIW I agree. >> Just so that we don't idly talk, what do you think about the attached? >> It: >> - adds new GUC compression_algorithm with possible values of pglz (default) and lz4 (if lz4 is compiled in), requires SIGHUP >> - adds --with-lz4 configure option (default yes, so the configure option is actually --without-lz4) that enables the lz4, it's using system library >> - uses the compression_algorithm for both TOAST and WAL compression (if on) >> - supports slicing for lz4 as well (pglz was already supported) >> - supports reading old TOAST values >> - adds 1 byte header to the compressed data where we currently store the algorithm kind, that leaves us with 254 more to add :) (that's an extra overhead compared to the current state) >> - changes the rawsize in TOAST header to 31 bits via bit packing >> - uses the extra bit to differentiate between old and new format >> - supports reading from table which has different rows stored with different algorithm (so that the GUC itself can be freely changed) > That's cool. I suggest defaulting to lz4 if it is available. You cannot start a cluster that has used lz4 even once with non-lz4 binaries. > Do we plan for the possibility of a compression algorithm as an extension? Or will all algorithms be packed into that byte in core? What I wrote does not expect extensions providing new compression. We'd have to somehow reserve compression ids for specific extensions and that seems like a lot of extra complexity for little benefit. I don't see much benefit in having more than say 3 generic compressors (I could imagine adding zstd). If you are thinking about data type specific compression then I think this is the wrong layer. > What about lz4 "common prefix"? System or user-defined. If lz4 is compiled in we can even offer in-system training, just make sure that trained prefixes will make their way to standbys. I definitely don't plan to work on common prefix. But I don't see why that could not be added later. -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
Hi, On 04/08/2019 13:52, Tomas Vondra wrote: > On Sun, Aug 04, 2019 at 02:41:24AM +0200, Petr Jelinek wrote: >> Hi, >> >> On 02/08/2019 21:48, Tomas Vondra wrote: >>> On Fri, Aug 02, 2019 at 11:20:03AM -0700, Andres Freund wrote: >>> >>>> >>>>> Another question is whether we'd actually want to include the code in >>>>> core directly, or use system libraries (and if some packagers might >>>>> decide to disable that, for whatever reason). >>>> >>>> I'd personally say we should have an included version, and a >>>> --with-system-... flag that uses the system one. >>>> >>> >>> OK. I'd say to require a system library, but that's a minor detail. >>> >> >> Same here. >> >> Just so that we don't idly talk, what do you think about the attached? >> It: >> - adds new GUC compression_algorithm with possible values of pglz >> (default) and lz4 (if lz4 is compiled in), requires SIGHUP >> - adds --with-lz4 configure option (default yes, so the configure >> option is actually --without-lz4) that enables the lz4, it's using >> system library >> - uses the compression_algorithm for both TOAST and WAL compression >> (if on) >> - supports slicing for lz4 as well (pglz was already supported) >> - supports reading old TOAST values >> - adds 1 byte header to the compressed data where we currently store >> the algorithm kind, that leaves us with 254 more to add :) (that's an >> extra overhead compared to the current state) >> - changes the rawsize in TOAST header to 31 bits via bit packing >> - uses the extra bit to differentiate between old and new format >> - supports reading from table which has different rows stored with >> different algorithm (so that the GUC itself can be freely changed) >> > > Cool. > >> Simple docs and a TAP test included. >> >> I did some basic performance testing (it's not really my thing though, >> so I would appreciate if somebody did more). >> I get about 7x perf improvement on data load with lz4 compared to pglz >> on my dataset but strangely only tiny decompression improvement. >> Perhaps more importantly I also did before patch and after patch tests >> with pglz and the performance difference with my data set was <1%. >> >> Note that this will just link against lz4, it does not add lz4 into >> PostgreSQL code-base. >> > > WFM, although I think Andres wanted to do both (link against system and > add lz4 code as a fallback). I think the question is what happens when > you run with lz4 for a while, and then switch to binaries without lz4 > support. Or when you try to replicate from lz4-enabled instance to an > instance without it. Especially for physical replication, but I suppose > it may affect logical replication with binary protocol? > I generally prefer having system library, we don't include for example ICU either. > > A very brief review: > > > 1) I wonder what "runstatedir" is about. > No idea, that stuff is generated by autoconf from configure.in. > > 2) This seems rather suspicious, and obviously the comment is now > entirely bogus: > > /* Check that off_t can represent 2**63 - 1 correctly. > We can't simply define LARGE_OFF_T to be 9223372036854775807, > since some C++ compilers masquerading as C compilers > incorrectly reject 9223372036854775807. */ > -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) > +#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) > << 31)) > Same as above. TBH I am not sure why we even include configure in git repo given that different autoconf versions can build different outputs. 
> > 3) I can't really build without lz4: > > config.status: linking src/makefiles/Makefile.linux to src/Makefile.port > pg_lzcompress.c: In function ‘pg_compress_bound’: > pg_lzcompress.c:892:22: error: ‘SIZEOF_PG_COMPRESS_HEADER’ undeclared > (first use in this function) > return slen + 4 + SIZEOF_PG_COMPRESS_HEADER; > ^~~~~~~~~~~~~~~~~~~~~~~~~ > pg_lzcompress.c:892:22: note: each undeclared identifier is reported > only once for each function it appears in Okay, that's just a problem of SIZEOF_PG_COMPRESS_HEADER being defined inside the HAVE_LZ4 ifdef while it should be defined above the ifdef. > > 4) I did a simple test with physical replication, with lz4 enabled on > both sides (well, can't build without lz4 anyway, per previous point). > It immediately failed like this: > > FATAL: failed to restore block image > CONTEXT: WAL redo at 0/5000A40 for Btree/INSERT_LEAF: off 138 > LOG: startup process (PID 15937) exited with exit code 1 > > This is a simple UPDATE on a trivial table: > > create table t (a int primary key); > insert into t select i from generate_series(1,1000) s(i); > update t set a = a - 100000 where random () < 0.1; > > with some checkpoints to force FPW (and wal_compression=on, of course). > > I haven't tried `make check-world` but I suppose some of the TAP tests > should fail because of this. And if not, we need to improve coverage. FWIW I did run check-world without problems, will have to look into this. > > 5) I wonder why compression_algorithm is defined as PGC_SIGHUP. Why not > allow users to set it per session? I suppose we might have a separate > option for WAL compression_algorithm. Yeah I was thinking we might want to change wal_compression to an enum as well. Although that complicates the code quite a bit (the caller has to decide the algorithm instead of the compression system doing it). > > 6) It seems a bit strange that pg_compress/pg_decompress are now defined > in pglz_compress.{c,h}. Maybe we should invent src/common/compress.c? It does not seem worth it to invent a new module for like 20 lines of wrapper code. -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
On Sun, Aug 04, 2019 at 05:53:26PM +0200, Petr Jelinek wrote: > > ... > >> >>4) I did a simple test with physical replication, with lz4 enabled on >>both sides (well, can't build without lz4 anyway, per previous point). >>It immediately failed like this: >> >>FATAL: failed to restore block image >>CONTEXT: WAL redo at 0/5000A40 for Btree/INSERT_LEAF: off 138 >>LOG: startup process (PID 15937) exited with exit code 1 >> >>This is a simple UPDATE on a trivial table: >> >>create table t (a int primary key); >>insert into t select i from generate_series(1,1000) s(i); >>update t set a = a - 100000 where random () < 0.1; >> >>with some checkpoints to force FPW (and wal_compression=on, of course). >> >>I haven't tried `make check-world` but I suppose some of the TAP tests >>should fail because of this. And if not, we need to improve coverage. >> > >FWIW I did run check-world without problems, will have to look into this. Not sure if we bother to set wal_compression=on for check-world (I don't think we do, but I may be missing something), so maybe check-world does not really test wal compression. IMO the issue is that RestoreBlockImage() still calls pglz_decompress directly, instead of going through pg_decompress(). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2019-08-04 02:41:24 +0200, Petr Jelinek wrote: > Same here. > > Just so that we don't idly talk, what do you think about the attached? Cool! > It: > - adds new GUC compression_algorithm with possible values of pglz (default) > and lz4 (if lz4 is compiled in), requires SIGHUP As Tomas remarked, I think it shouldn't be SIGHUP but USERSET. And I think lz4 should be preferred, if available. I could see us using a list style guc, so we could set it to lz4, pglz, and the first available one would be used. > - adds 1 byte header to the compressed data where we currently store the > algorithm kind, that leaves us with 254 more to add :) (that's an extra > overhead compared to the current state) Hm. Why do we need an additional byte? IIRC my patch added that only for the case we would run out of space for compression formats without extending any sizes? > - changes the rawsize in TOAST header to 31 bits via bit packing > - uses the extra bit to differentiate between old and new format Hm. Wouldn't it be easier to just use a different vartag for this? > - I expect my changes to configure.in are not the greatest as I have > pretty much zero experience with autoconf FWIW the configure output changes are likely because you used a modified version of autoconf. Unfortunately debian/ubuntu ship one with vendor patches. Greetings, Andres Freund
Hi, On 04/08/2019 21:20, Andres Freund wrote: > On 2019-08-04 02:41:24 +0200, Petr Jelinek wrote: >> Same here. >> >> Just so that we don't idly talk, what do you think about the attached? > > Cool! > >> It: >> - adds new GUC compression_algorithm with possible values of pglz (default) >> and lz4 (if lz4 is compiled in), requires SIGHUP > > As Tomas remarked, I think it shouldn't be SIGHUP but USERSET. And I > think lz4 should be preferred, if available. I could see us using a > list style guc, so we could set it to lz4, pglz, and the first available > one would be used. > Sounds reasonable. >> - adds 1 byte header to the compressed data where we currently store the >> algorithm kind, that leaves us with 254 more to add :) (that's an extra >> overhead compared to the current state) > > Hm. Why do we need an additional byte? IIRC my patch added that only > for the case we would run out of space for compression formats without > extending any sizes? > Yeah your patch worked differently (I didn't actually use any code from it). The main reason why I add the byte is that I am storing the algorithm in the compressed value itself, not in varlena header. I was mainly trying to not have every caller care about storing and loading the compression algorithm. I also can't say I particularly like that hack in your patch. However if we'd want to have separate GUCs for TOAST and WAL then we'll have to do that anyway so maybe it does not matter anymore (we can't use similar hack there AFAICS though). > >> - changes the rawsize in TOAST header to 31 bits via bit packing >> - uses the extra bit to differentiate between old and new format > > Hm. Wouldn't it be easier to just use a different vartag for this? > That would only work for external TOAST pointers right? The compressed varlena can also be stored inline and potentially in index tuple. > >> - I expect my changes to configure.in are not the greatest as I don't have >> pretty much zero experience with autoconf > > FWIW the configure output changes are likely because you used a modified > version of autoconf. Unfortunately debian/ubuntu ship one with vendor > patches. > Yeah, Ubuntu here, that explains. -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
Hi, On 2019-08-04 17:53:26 +0200, Petr Jelinek wrote: > > 5) I wonder why compression_algorithm is defined as PGC_SIGHUP. Why not > > allow users to set it per session? I suppose we might have a separate > > option for WAL compression_algorithm. > > > > Yeah I was thinking we might want to change wal_compression to enum as well. > Although that complicates the code quite a bit (the caller has to decide > algorithm instead compression system doing it). Isn't that basically required anyway? The WAL record will need to carry information about the type of compression used, independent of PGC_SIGHUP/PGC_USERSET, unless you want to make it an initdb option or something super heavyweight like that. We could just have the wal_compression assign hook set the compression callback and a compression type integer or such if we want to avoid doing that kind of thing at runtime. Greetings, Andres Freund
Hi, On 05/08/2019 00:15, Andres Freund wrote: > Hi, > > On 2019-08-04 17:53:26 +0200, Petr Jelinek wrote: >>> 5) I wonder why compression_algorithm is defined as PGC_SIGHUP. Why not >>> to allow users to set it per session? I suppose we might have a separate >>> option for WAL compression_algorithm. >>> >> >> Yeah I was thinking we might want to change wal_compression to enum as well. >> Although that complicates the code quite a bit (the caller has to decide >> algorithm instead compression system doing it). > > Isn't that basically required anyway? The WAL record will need to carry > information about the type of compression used, independent of > PGC_SIGHUP/PGC_USERSET, unless you want to make it an initdb option or > something super heavyweight like that. > It carries that information inside the compressed value, like I said in the other reply, that's why the extra byte. -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
On Fri, Aug 02, 2019 at 07:52:39PM +0200, Tomas Vondra wrote: > On Fri, Aug 02, 2019 at 10:12:58AM -0700, Andres Freund wrote: >> Why would they be stuck continuing to *compress* with pglz? As we >> fully retoast on write anyway we can just gradually switch over to the >> better algorithm. Decompression speed is another story, of course. > > Hmmm, I don't remember the details of those patches so I didn't realize > it allows incremental recompression. If that's possible, that would mean > existing systems can start using it. Which is good. It may become a problem on some platforms though (Windows?), so patches to improve either the compression or decompression of pglz are not that crazy, as pglz is still likely going to be used, and for read-mostly workloads switching to a new algo may not be worth the extra cost, so it is not like we are going to drop it completely either. My take, still the same as upthread, is that it mostly depends on the amount of complexity each patch introduces compared to the performance gain. > Another question is whether we'd actually want to include the code in > core directly, or use system libraries (and if some packagers might > decide to disable that, for whatever reason). Linking to system libraries would make our maintenance much easier, and when it comes to having a copy of something else in the tree we would be stuck with more maintenance around it. These tend to rot easily. After that comes the case where the compression algo is present in the binary on one server but not another, in which case we have an automatic ERROR in case of a mismatching algo, or FATAL for decompression of FPWs at recovery when wal_compression is used. -- Michael
Hi, On 2019-08-05 00:19:28 +0200, Petr Jelinek wrote: > It carries that information inside the compressed value, like I said in the > other reply, that's why the extra byte. I'm not convinced that that is a good plan - imo the reference to the compressed data should carry that information. I.e. in the case of toast, at least toast pointers should hold enough information to determine the compression algorithm. And in the case of WAL, the WAL record should contain that. Consider e.g. adding support for slice fetching of datums compressed with some algorithm - we should be able to determine whether that's possible without fetching the datum (so we can take a non-exceptional path for datums compressed otherwise). Similarly, for WAL, we should be able to detect whether an incompatible compression format is used, without having to invoke a generic compression routine that then fails in some way. Or adding compression reporting for WAL to xlogdump. I also don't particularly like baking in the assumption that we don't support tuples larger than 1GB in further places. To me it seems likely that we're going to have to fix that, and it's hard enough already... I know that my patch did that too... For external datums I suggest encoding the compression method as a distinct VARTAG_ONDISK_COMPRESSED, and then have that include the compression method as a field. For in-line compressed values (so VARATT_IS_4B_C), doing something roughly like you did, indicating the type of metadata following using the high bit sounds reasonable. But I think I'd make it so that if the high bit is set, the struct is instead entirely different, keeping a full 4-byte length, and including the compression type header inside the struct. Perhaps I'd combine the compression type with the high-bit-set part? So when the high bit is set, it'd be something like

{
	int32		vl_len_;		/* varlena header (do not touch directly!) */

	/*
	 * Actually only 7 bits, the high bit determines whether this
	 * is the old compression header (if unset), or this type of header
	 * (if set).
	 */
	uint8		type;

	/*
	 * Stored as uint8[4], to avoid unnecessary alignment padding.
	 */
	uint8		length[4];

	char		va_data[FLEXIBLE_ARRAY_MEMBER];
}

I think it's worth spending some effort trying to get this right - we'll be bound by design choices for a while. It's kinda annoying that toast datums aren't better designed. Creating them from scratch, I'd make it something like:

1) variable-width integer describing the "physical length", so that tuple deforming can quickly determine length - all the ifs necessary to determine lengths are a bottleneck. I'd probably just use a base-128 (7 bits per byte) encoding: if the high bit is set, there's a following length byte.

2) type of toasted datum, probably also in a variable width encoding, starting at 1 byte. Not that I think it's likely we'd overrun 256 types - but it's cheap enough to just declare the high bit as a length extension bit.

These are always stored unaligned. So there's no need to deal with padding bytes having to be zero to determine whether we're dealing with a 1-byte datum etc. Then, type dependent:

For In-line uncompressed datums
3a) alignment padding, amount determined by 2) above, i.e. we'd just have different types for different amounts of alignment. Probably using some heuristic to use unaligned when either dealing with data that doesn't need alignment, or when the datum is fairly small, so copying to get the data as unaligned won't be a significant penalty.
4a) data

For in-line compressed datums
3b) compression metadata {varint rawsize, varint compression algorithm}
4b) unaligned compressed data - there's no benefit in keeping it aligned

For External toast for uncompressed data:
3d) {toastrelid, valueid, varint rawsize}

For External toast for compressed data:
3e) {valueid, toastrelid, varint compression_algorithm, varint rawsize, varint extsize}

That'd make it a lot more extensible, easier to understand, faster to decode in a lot of cases, and remove a lot of arbitrary limits. Yes, it'd increase the header size for small datums to two bytes, but I think that'd be more than bought back by the other improvements. Greetings, Andres Freund
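For what it's worth, the variable-width "physical length" described in 1) above is the classic 7-bits-per-byte scheme. A minimal standalone sketch (purely illustrative, not taken from any patch) could look like this:

#include <stddef.h>
#include <stdint.h>

/* Encode 7 bits per byte; a set high bit means "another byte follows". */
static size_t
varint_encode(uint64_t value, uint8_t *out)
{
	size_t		n = 0;

	do
	{
		uint8_t		b = value & 0x7F;

		value >>= 7;
		if (value)
			b |= 0x80;			/* more bytes follow */
		out[n++] = b;
	} while (value);

	return n;					/* a single byte for values below 128 */
}

static size_t
varint_decode(const uint8_t *in, uint64_t *value)
{
	uint64_t	v = 0;
	int			shift = 0;
	size_t		n = 0;

	for (;;)
	{
		uint8_t		b = in[n++];

		v |= (uint64_t) (b & 0x7F) << shift;
		if ((b & 0x80) == 0)
			break;
		shift += 7;
	}
	*value = v;
	return n;
}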
Hi, On 2019-08-05 16:04:46 +0900, Michael Paquier wrote: > On Fri, Aug 02, 2019 at 07:52:39PM +0200, Tomas Vondra wrote: > > On Fri, Aug 02, 2019 at 10:12:58AM -0700, Andres Freund wrote: > >> Why would they be stuck continuing to *compress* with pglz? As we > >> fully retoast on write anyway we can just gradually switch over to the > >> better algorithm. Decompression speed is another story, of course. > > > > Hmmm, I don't remember the details of those patches so I didn't realize > > it allows incremental recompression. If that's possible, that would mean > > existing systems can start using it. Which is good. > > It may become a problem on some platforms though (Windows?), so > patches to improve either the compression or decompression of pglz are > not that crazy, as pglz is still likely going to be used, and for > read-mostly workloads switching to a new algo may not be worth the extra cost, so > it is not like we are going to drop it completely either. What's the platform dependency that you're thinking of? And how's compression speed relevant to "read mostly"? Switching would just happen whenever tuple fields are changed. And it'll not have an additional cost, because all it does is reduce the cost of a toast write that'd otherwise have happened with pglz. > Linking to system libraries would make our maintenance much easier, > and when it comes to having a copy of something else in the tree we > would be stuck with more maintenance around it. These tend to rot > easily. I don't think it's really our experience that they "rot easily". > After that comes the case where the compression algo is present > in the binary on one server but not another, in which case we have an > automatic ERROR in case of a mismatching algo, or FATAL for > decompression of FPWs at recovery when wal_compression is used. Huh? That's a failure case that only exists if you don't include it in the tree (with the option to use an out-of-tree lib)? Greetings, Andres Freund
On 2019-06-24 10:44, Andrey Borodin wrote: >> 18 мая 2019 г., в 11:44, Andrey Borodin <x4mmm@yandex-team.ru> написал(а): >> > Hi! > Here's rebased version of patches. > > Best regards, Andrey Borodin. I think this is the most recent patch for the CF entry <https://commitfest.postgresql.org/24/2119/>. What about the two patches? Which one is better? Have you also considered using memmove() to deal with the overlap issue? Benchmarks have been posted in this thread. Where is the benchmarking tool? Should we include that in the source somehow? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, Peter! Thanks for looking into this. > On 4 Sep 2019, at 14:09, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2019-06-24 10:44, Andrey Borodin wrote: >>> On 18 May 2019, at 11:44, Andrey Borodin <x4mmm@yandex-team.ru> wrote: >>> >> Hi! >> Here's rebased version of patches. >> >> Best regards, Andrey Borodin. > > I think this is the most recent patch for the CF entry > <https://commitfest.postgresql.org/24/2119/>. > > What about the two patches? Which one is better? In our observations pglz_decompress_hacked.patch is best on most of the tested platforms. The difference is that pglz_decompress_hacked8.patch will not apply the optimization if the decompressed match is not greater than 8 bytes. This optimization was suggested by Tom, that's why we benchmarked it specifically. > Have you also considered using memmove() to deal with the overlap issue? Yes, memmove() resolves the ambiguity of copying overlapping regions in a way that is not compatible with pglz. In the proposed patch we never copy overlapping regions. > Benchmarks have been posted in this thread. Where is the benchmarking > tool? Should we include that in the source somehow? The benchmarking tool is here [0]. Well, the code of the benchmarking tool does not adhere to our standards in some places, so we did not consider its inclusion in core. However, the most questionable part of the benchmarking is the choice of test data. It's about 100MB of useless WALs, a data file and valuable Shakespeare writings. Best regards, Andrey Borodin. [0] https://github.com/x4m/test_pglz
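To make the point about memmove() concrete, here is a tiny standalone illustration (not pglz code): when the match offset is smaller than the match length, pglz semantics require re-reading bytes that were written earlier in the very same copy, which a forward byte-by-byte loop produces but memmove() deliberately does not.

#include <stdio.h>
#include <string.h>

int
main(void)
{
	char		a[16] = "Xabcdefghijklmn";
	char		b[16] = "Xabcdefghijklmn";
	int			i;

	/* pglz-style match: copy 8 bytes from offset 1 behind the write position */
	for (i = 0; i < 8; i++)
		a[1 + i] = a[i];		/* run-length expansion: "XXXXXXXXX..." */

	/* memmove copies the original region as if through a temporary buffer */
	memmove(b + 1, b, 8);		/* yields "XXabcdefg..." instead */

	printf("byte loop: %.9s\nmemmove:   %.9s\n", a, b);
	return 0;
}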
On 2019-09-04 11:22, Andrey Borodin wrote: >> What about the two patches? Which one is better? > On our observations pglz_decompress_hacked.patch is best for most of tested platforms. > Difference is that pglz_decompress_hacked8.patch will not appply optimization if decompressed match is not greater than8 bytes. This optimization was suggested by Tom, that's why we benchmarked it specifically. The patches attached to the message I was replying to are named 0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch 0001-Use-memcpy-in-pglz-decompression.patch Are those the same ones? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 4 Sep 2019, at 17:40, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2019-09-04 11:22, Andrey Borodin wrote: >>> What about the two patches? Which one is better? >> In our observations pglz_decompress_hacked.patch is best on most of the tested platforms. >> The difference is that pglz_decompress_hacked8.patch will not apply the optimization if the decompressed match is not greater than 8 bytes. This optimization was suggested by Tom, that's why we benchmarked it specifically. > > The patches attached to the message I was replying to are named > > 0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch > 0001-Use-memcpy-in-pglz-decompression.patch > > Are those the same ones? Yes. Sorry for this confusion. The only difference in 0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch is that it falls back to the byte loop if len is <= 8. Best regards, Andrey Borodin.
I just noticed we had two CF items pointing to this thread, https://commitfest.postgresql.org/24/2119/ https://commitfest.postgresql.org/24/2180/ so I marked the newer one as withdrawn. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Sep 4, 2019 at 12:22 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > Hi, Peter! Thanks for looking into this. > > > On 4 Sep 2019, at 14:09, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > > > On 2019-06-24 10:44, Andrey Borodin wrote: > >>> On 18 May 2019, at 11:44, Andrey Borodin <x4mmm@yandex-team.ru> wrote: > >>> > >> Hi! > >> Here's rebased version of patches. > >> > >> Best regards, Andrey Borodin. > > > > I think this is the most recent patch for the CF entry > > <https://commitfest.postgresql.org/24/2119/>. > > > > What about the two patches? Which one is better? > In our observations pglz_decompress_hacked.patch is best on most of the tested platforms. > The difference is that pglz_decompress_hacked8.patch will not apply the optimization if the decompressed match is not greater than 8 bytes. This optimization was suggested by Tom, that's why we benchmarked it specifically. > > > Have you also considered using memmove() to deal with the overlap issue? > Yes, memmove() resolves the ambiguity of copying overlapping regions in a way that is not compatible with pglz. In the proposed patch we never copy overlapping regions. > > > Benchmarks have been posted in this thread. Where is the benchmarking > > tool? Should we include that in the source somehow? > > The benchmarking tool is here [0]. Well, the code of the benchmarking tool does not adhere to our standards in some places, so we did not consider its inclusion in core. > However, the most questionable part of the benchmarking is the choice of test data. It's about 100MB of useless WALs, a data file and valuable Shakespeare writings. Why not use the 'Silesia compression corpus' (http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia), which is used by lzbench (https://github.com/inikep/lzbench)? Teodor and I remember that testing on non-English texts could be very important. > > Best regards, Andrey Borodin. > > > [0] https://github.com/x4m/test_pglz > > > -- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Hi, On 05/08/2019 09:26, Andres Freund wrote: > Hi, > > On 2019-08-05 00:19:28 +0200, Petr Jelinek wrote: >> It carries that information inside the compressed value, like I said in the >> other reply, that's why the extra byte. > > I'm not convinced that that is a good plan - imo the reference to the > compressed data should carry that information. > > I.e. in the case of toast, at least toast pointers should hold enough > information to determine the compression algorithm. And in the case of > WAL, the WAL record should contain that. Point taken. > > For external datums I suggest encoding the compression method as a > distinct VARTAG_ONDISK_COMPRESSED, and then have that include the > compression method as a field. So the new reads/writes will use this and reads of the old format won't change? Sounds fine. > > For in-line compressed values (so VARATT_IS_4B_C), doing something > roughly like you did, indicating the type of metadata following using > the high bit sounds reasonable. But I think I'd make it so that if the > high bit is set, the struct is instead entirely different, keeping a full > 4-byte length, and including the compression type header inside the > struct. Perhaps I'd combine the compression type with the high-bit-set > part? So when the high bit is set, it'd be something like > > { > int32 vl_len_; /* varlena header (do not touch directly!) */ > > /* > * Actually only 7 bits, the high bit determines whether this > * is the old compression header (if unset), or this type of header > * (if set). > */ > uint8 type; > > /* > * Stored as uint8[4], to avoid unnecessary alignment padding. > */ > uint8 length[4]; > > char va_data[FLEXIBLE_ARRAY_MEMBER]; > } Won't this break backward compatibility on big-endian (if I understand correctly what you are trying to achieve here)? > I think it's worth spending some effort trying to get this right - we'll > be bound by design choices for a while. Sure, however I am not in the business of redesigning TOAST from scratch right now, even if I do agree that the current header is far from ideal. -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
On 2019-09-04 14:45, Andrey Borodin wrote: >> On 2019-09-04 11:22, Andrey Borodin wrote: >>>> What about the two patches? Which one is better? >>> In our observations pglz_decompress_hacked.patch is best on most of the tested platforms. >>> The difference is that pglz_decompress_hacked8.patch will not apply the optimization if the decompressed match is not greater than 8 bytes. This optimization was suggested by Tom, that's why we benchmarked it specifically. >> >> The patches attached to the message I was replying to are named >> >> 0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch >> 0001-Use-memcpy-in-pglz-decompression.patch >> >> Are those the same ones? > > Yes. Sorry for this confusion. > > The only difference in 0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch is that it falls back to the byte loop if len is <= 8. After reviewing this thread and more testing, I think 0001-Use-memcpy-in-pglz-decompression.patch appears to be a solid win and we should move ahead with it. I don't, however, fully understand the code changes, and I think this could use more and better comments. In particular, I wonder about

    off *= 2;

This is new logic that isn't explained anywhere. This whole function is commented a bit strangely. It begins with "Otherwise", but there is nothing before it. And what does "from OUTPUT to OUTPUT" mean? There is no "output" variable. We should make this match the code better. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Oleg, Peter, thanks for looking into this! I hope to benchmark decompression on the Silesian corpus soon. PFA v2 with better comments. > On 27 Sep 2019, at 14:41, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > After reviewing this thread and more testing, I think > 0001-Use-memcpy-in-pglz-decompression.patch appears to be a solid win > and we should move ahead with it. > > I don't, however, fully understand the code changes, and I think this > could use more and better comments. In particular, I wonder about > > off *= 2; I've changed this to off += off; > > This is new logic that isn't explained anywhere. > > This whole function is commented a bit strangely. It begins with > "Otherwise", but there is nothing before it. And what does "from OUTPUT > to OUTPUT" mean? There is no "output" variable. We should make this > match the code better. I've added a small example to illustrate what is going on. Thanks! -- Andrey Borodin Open source RDBMS development team leader Yandex.Cloud
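For readers following along, here is a standalone sketch of what that loop achieves (an illustration of the technique, not the patch code itself): each memcpy copies a chunk that does not overlap its source, and after every round twice as many bytes behind the write position already contain the repeating pattern, so the copy distance can double until one final memcpy handles the tail.

#include <string.h>

/*
 * Copy a match of "len" bytes whose source starts "off" bytes behind the
 * write position dp.  When off < len the regions overlap, so we duplicate
 * the pattern in non-overlapping chunks, doubling the distance each time.
 */
static void
copy_match(char *dp, int off, int len)
{
	while (off < len)
	{
		/*
		 * Source [dp - off, dp) and destination [dp, dp + off) are adjacent
		 * but do not overlap, so plain memcpy is safe.  Afterwards 2 * off
		 * bytes behind the new dp hold the repeating pattern.
		 */
		memcpy(dp, dp - off, off);
		dp += off;
		len -= off;
		off += off;				/* i.e. off *= 2 */
	}

	/* Remaining bytes: the source is now far enough behind for one memcpy. */
	memcpy(dp, dp - off, len);
}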
> On 28 Sep 2019, at 10:29, Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > I hope to benchmark decompression on the Silesian corpus soon. I've done it. And the results are quite controversial. The dataset adds 12 payloads to our 5. The payloads have relatively high entropy. In many cases pglz cannot compress them at all, so decompression is a no-op, the data is stored as is.

Decompressor pglz_decompress_hacked result 48.281747
Decompressor pglz_decompress_hacked8 result 33.868779
Decompressor pglz_decompress_vanilla result 42.510165

Tested on Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz

With the Silesian corpus pglz_decompress_hacked is actually decreasing performance on high-entropy data. Meanwhile pglz_decompress_hacked8 is still faster than the usual pglz_decompress. In spite of these benchmarks, I think that pglz_decompress_hacked8 is the safer option. I've updated the test suite [0] and anyone interested can verify the benchmarks. -- Andrey Borodin Open source RDBMS development team leader Yandex.Cloud [0] https://github.com/x4m/test_pglz
> On 21 Oct 2019, at 14:09, Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > With the Silesian corpus pglz_decompress_hacked is actually decreasing performance on high-entropy data. > Meanwhile pglz_decompress_hacked8 is still faster than the usual pglz_decompress. > In spite of these benchmarks, I think that pglz_decompress_hacked8 is the safer option. Here's v3 which takes into account recent benchmarks with the Silesian Corpus and has better comments. Thanks! -- Andrey Borodin Open source RDBMS development team leader Yandex.Cloud
On 2019-10-25 07:05, Andrey Borodin wrote: >> On 21 Oct 2019, at 14:09, Andrey Borodin <x4mmm@yandex-team.ru> wrote: >> >> With the Silesian corpus pglz_decompress_hacked is actually decreasing performance on high-entropy data. >> Meanwhile pglz_decompress_hacked8 is still faster than the usual pglz_decompress. >> In spite of these benchmarks, I think that pglz_decompress_hacked8 is the safer option. > > Here's v3 which takes into account recent benchmarks with the Silesian Corpus and has better comments. Your message from 21 October appears to say that this change makes the performance worse. So I don't know how to proceed with this. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Nov-01, Peter Eisentraut wrote: > On 2019-10-25 07:05, Andrey Borodin wrote: > > > On 21 Oct 2019, at 14:09, Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > > > > > With the Silesian corpus pglz_decompress_hacked is actually decreasing performance on high-entropy data. > > > Meanwhile pglz_decompress_hacked8 is still faster than the usual pglz_decompress. > > > In spite of these benchmarks, I think that pglz_decompress_hacked8 is the safer option. > > > > Here's v3 which takes into account recent benchmarks with the Silesian Corpus and has better comments. > > Your message from 21 October appears to say that this change makes the > performance worse. So I don't know how to proceed with this. As I understand that report, in these results "less is better", so the hacked8 variant shows better performance (33.8) than the current code (42.5). The "hacked" variant shows worse performance (48.2) than the current code. The "in spite" phrase seems to have been a mistake. I am surprised that there is so much variability in the performance numbers, though, based on such small tweaks of the code. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Nov 01, 2019 at 12:48:28PM -0300, Alvaro Herrera wrote: >On 2019-Nov-01, Peter Eisentraut wrote: > >> On 2019-10-25 07:05, Andrey Borodin wrote: >> > > On 21 Oct 2019, at 14:09, Andrey Borodin <x4mmm@yandex-team.ru> wrote: >> > > >> > > With the Silesian corpus pglz_decompress_hacked is actually decreasing performance on high-entropy data. >> > > Meanwhile pglz_decompress_hacked8 is still faster than the usual pglz_decompress. >> > > In spite of these benchmarks, I think that pglz_decompress_hacked8 is the safer option. >> > >> > Here's v3 which takes into account recent benchmarks with the Silesian Corpus and has better comments. >> >> Your message from 21 October appears to say that this change makes the >> performance worse. So I don't know how to proceed with this. > >As I understand that report, in these results "less is better", so the >hacked8 variant shows better performance (33.8) than the current code (42.5). >The "hacked" variant shows worse performance (48.2) than the current >code. The "in spite" phrase seems to have been a mistake. > >I am surprised that there is so much variability in the performance >numbers, though, based on such small tweaks of the code. > I'd try running the benchmarks to verify the numbers, and maybe do some additional tests, but it's not clear to me which patches I should use. I think the last patches with 'hacked' and 'hacked8' in the name are a couple of months old, and the recent posts attach just a single patch. Andrey, can you post current versions of both patches? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 1 Nov 2019, at 18:48, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2019-Nov-01, Peter Eisentraut wrote: > >> On 2019-10-25 07:05, Andrey Borodin wrote: >>>> On 21 Oct 2019, at 14:09, Andrey Borodin <x4mmm@yandex-team.ru> wrote: >>>> >>>> With the Silesian corpus pglz_decompress_hacked is actually decreasing performance on high-entropy data. >>>> Meanwhile pglz_decompress_hacked8 is still faster than the usual pglz_decompress. >>>> In spite of these benchmarks, I think that pglz_decompress_hacked8 is the safer option. >>> >>> Here's v3 which takes into account recent benchmarks with the Silesian Corpus and has better comments. >> >> Your message from 21 October appears to say that this change makes the >> performance worse. So I don't know how to proceed with this. > > As I understand that report, in these results "less is better", so the > hacked8 variant shows better performance (33.8) than the current code (42.5). > The "hacked" variant shows worse performance (48.2) than the current > code. This is correct. Thanks, Álvaro. > The "in spite" phrase seems to have been a mistake. Yes. Sorry, I actually thought that "in spite" was the opposite of "despite" and meant "in view of". > I am surprised that there is so much variability in the performance > numbers, though, based on such small tweaks of the code. The Silesian corpus is very different from WALs and PG data files. Data files are rich in long sequences of the same byte. These sequences are long, and thus unrolled very effectively by the memcpy method. But the Silesian corpus is rich in short matches of a few bytes. > On 1 Nov 2019, at 19:59, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > I'd try running the benchmarks to verify the numbers, and maybe do some > additional tests, but it's not clear to me which patches I should use. Cool, thanks! > I think the last patches with 'hacked' and 'hacked8' in the name are a > couple of months old, and the recent posts attach just a single patch. > Andrey, can you post current versions of both patches? PFA two patches:
v4-0001-Use-memcpy-in-pglz-decompression.patch (known as 'hacked' in the test_pglz extension)
v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch (known as 'hacked8')
Best regards, Andrey Borodin.
Hello Andrey, On 2019-11-02 12:30, Andrey Borodin wrote: >> On 1 Nov 2019, at 18:48, Alvaro Herrera <alvherre@2ndquadrant.com> >> wrote: > PFA two patches: > v4-0001-Use-memcpy-in-pglz-decompression.patch (known as 'hacked' in > the test_pglz extension) > v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch (known > as 'hacked8') Looking at the patches, it seems only the case of a match is changed. But when we observe a literal byte, this is copied byte-by-byte with:

else
{
    /*
     * An unset control bit means LITERAL BYTE. So we just
     * copy one from INPUT to OUTPUT.
     */
    *dp++ = *sp++;
}

Maybe we can optimize this, too. For instance, you could just increase a counter:

else
{
    /*
     * An unset control bit means LITERAL BYTE. We count
     * these and copy them later.
     */
    literal_bytes++;
}

and in the case of:

if (ctrl & 1)
{
    /* First copy all the literal bytes */
    if (literal_bytes > 0)
    {
        memcpy(dp, sp, literal_bytes);
        sp += literal_bytes;
        dp += literal_bytes;
        literal_bytes = 0;
    }

(Code untested!) The same would need to be done at the very end, if the input ends without any new CTRL byte. Whether that gains us anything depends on how common literal bytes are. It might be that highly compressible input has almost none, while input that is a mix of incompressible strings and compressible ones might have longer stretches. One example would be something like a SHA-256 hash that is repeated twice. The first instance would be incompressible, the second one would be just a copy. This might not happen that often in practical inputs, though. I wonder if you agree and what would happen if you try this variant on your corpus tests. Best regards, Tels
Hi Tels! Thanks for your interest in fast decompression. > On 3 Nov 2019, at 12:24, Tels <nospam-pg-abuse@bloodgate.com> wrote: > > I wonder if you agree and what would happen if you try this variant on your corpus tests. I've tried some different optimizations for literals, for example loop unrolling [0] and bulk-copying of literals. These approaches were bringing some performance improvement, but with noise. Statistically they were somewhere better, somewhere worse, a net win, but that "net win" depends on what we consider important data and important platforms. The proposed patch clearly makes decompression faster on any dataset and platform. I believe improving pglz further is viable, but optimizations like a common data prefix seem more promising to me. Also, I think we actually need real codecs like lz4, zstd and brotli instead of reinventing our own wheel. If you have some spare time, pull requests to test_pglz are welcome; let's benchmark more micro-optimizations, it brings a lot of fun :) -- Andrey Borodin Open source RDBMS development team leader Yandex.Cloud [0] https://github.com/x4m/test_pglz/blob/master/pg_lzcompress_hacked.c#L166
On 2019-11-01 16:48, Alvaro Herrera wrote: > As I understand that report, in these results "less is better", so the > hacked8 variant shows better performance (33.8) than current (42.5). > The "hacked" variant shows worse performance (48.2) that the current > code. Which appears to be the opposite of, or at least inconsistent with, results earlier in the thread. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-11-01 17:59, Tomas Vondra wrote: > I'd try running the benchmarks to verify the numbers, and maybe do some > additional tests, but it's not clear to me which patches should I use. > > I think the last patches with 'hacked' and 'hacked8' in the name are a > couple of months old, and the recent posts attach just a single patch. > Andrey, can you post current versions of both patches? OK, waiting on some independent verification of benchmark numbers. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Nov 06, 2019 at 09:04:25AM +0100, Peter Eisentraut wrote: > OK, waiting on some independent verification of benchmark numbers. Still waiting for these after 19 days, so the patch has been marked as returned with feedback. -- Michael
> On 25 Nov 2019, at 13:03, Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Nov 06, 2019 at 09:04:25AM +0100, Peter Eisentraut wrote: >> OK, waiting on some independent verification of benchmark numbers. > > Still waiting for these after 19 days, so the patch has been marked as > returned with feedback. I think the status Needs Review describes what is going on better. It's not like anything is awaited from my side. Thanks. Best regards, Andrey Borodin.
On Mon, Nov 25, 2019 at 01:21:27PM +0500, Andrey Borodin wrote: > I think status Needs Review describes what is going on better. It's > not like something is awaited from my side. Indeed. You are right so I have moved the patch instead, with "Needs review". The patch status was actually incorrect in the CF app, as it was marked as waiting on author. @Tomas: updated versions of the patches have been sent by Andrey. -- Michael
On Mon, Nov 25, 2019 at 05:29:40PM +0900, Michael Paquier wrote: >On Mon, Nov 25, 2019 at 01:21:27PM +0500, Andrey Borodin wrote: >> I think the status Needs Review describes what is going on better. It's >> not like anything is awaited from my side. > >Indeed. You are right so I have moved the patch instead, with "Needs >review". The patch status was actually incorrect in the CF app, as it >was marked as waiting on author. > >@Tomas: updated versions of the patches have been sent by Andrey. I've done benchmarks on the two last patches, using the data sets from the test_pglz repository [1], but using three simple queries:

1) prefix - first 100 bytes of the value

   SELECT length(substr(value, 0, 100)) FROM t

2) infix - 100 bytes from the middle

   SELECT length(substr(value, test_length/2, 100)) FROM t

3) suffix - last 100 bytes

   SELECT length(substr(value, test_length - 100, 100)) FROM t

See the two attached scripts, implementing this benchmark. The test itself did 60-second pgbench runs (single client) measuring tps on two different machines.

patch 1: v4-0001-Use-memcpy-in-pglz-decompression.patch
patch 2: v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch

The results (compared to master) from the first machine (i5-2500k CPU) look like this:

                                     patch 1             |          patch 2
 dataset                    prefix   infix   suffix      |  prefix   infix   suffix
 ----------------------------------------------------------------------------------
 000000010000000000000001      99%    134%     161%      |    100%    126%     152%
 000000010000000000000006      99%    260%     287%      |    100%    257%     279%
 000000010000000000000008     100%    100%     100%      |    100%     95%      91%
 16398                        100%    168%     221%      |    100%    159%     215%
 shakespeare.txt              100%    138%     141%      |    100%    116%     117%
 mr                            99%    120%     128%      |    100%    107%     108%
 dickens                      100%    129%     132%      |    100%    100%     100%
 mozilla                      100%    119%     120%      |    100%    102%     104%
 nci                          100%    149%     141%      |    100%    143%     135%
 ooffice                       99%    121%     123%      |    100%     97%      98%
 osdb                         100%     99%      99%      |    100%    100%      99%
 reymont                       99%    130%     132%      |    100%    106%     107%
 samba                        100%    126%     132%      |    100%    105%     111%
 sao                          100%    100%      99%      |    100%    100%     100%
 webster                      100%    127%     127%      |    100%    106%     106%
 x-ray                         99%     99%      99%      |    100%    100%     100%
 xml                          100%    144%     144%      |    100%    130%     128%

and on the other one (xeon e5-2620v4) they look like this:

                                     patch 1             |          patch 2
 dataset                    prefix   infix   suffix      |  prefix   infix   suffix
 ----------------------------------------------------------------------------------
 000000010000000000000001      98%    147%     170%      |     98%    132%     159%
 000000010000000000000006     100%    340%     314%      |     98%    334%     355%
 000000010000000000000008      99%    100%     105%      |     99%     99%     101%
 16398                        101%    153%     205%      |     99%    148%     201%
 shakespeare.txt              100%    147%     149%      |     99%    117%     118%
 mr                           100%    131%     139%      |     99%    112%     108%
 dickens                      100%    143%     143%      |     99%    103%     102%
 mozilla                      100%    122%     122%      |     99%    105%     106%
 nci                          100%    151%     135%      |    100%    135%     125%
 ooffice                       99%    127%     129%      |     98%    101%     102%
 osdb                         102%    100%     101%      |    102%    100%      99%
 reymont                      101%    142%     143%      |    100%    108%     108%
 samba                        100%    132%     136%      |     99%    109%     112%
 sao                           99%    101%     100%      |     99%    100%     100%
 webster                      100%    132%     129%      |    100%    106%     106%
 x-ray                         99%    101%     100%      |     90%    101%     101%
 xml                          100%    147%     148%      |    100%    127%     125%

In general, I think the results for both patches seem clearly a win, but maybe patch 1 is a bit better, especially on the newer (xeon) CPU. So I'd probably go with that one. [1] https://github.com/x4m/test_pglz regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2019-11-26 10:43, Tomas Vondra wrote: > In general, I think the results for both patches seem clearly a win, but > maybe patch 1 is bit better, especially on the newer (xeon) CPU. So I'd > probably go with that one. Patch 1 is also the simpler patch, so it seems clearly preferable. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 26, 2019 at 08:17:13PM +0100, Peter Eisentraut wrote:
>On 2019-11-26 10:43, Tomas Vondra wrote:
>>In general, I think the results for both patches seem clearly a win, but
>>maybe patch 1 is bit better, especially on the newer (xeon) CPU. So I'd
>>probably go with that one.
>
>Patch 1 is also the simpler patch, so it seems clearly preferable.
>

Yeah, although the difference is minimal. We could probably construct a benchmark where #2 wins, but I think these queries are fairly realistic. So I'd just go with #1.

Code-wise I think the patches are mostly fine, although the comments might need some proof-reading.

1) I wasn't really sure what a "nibble" is, but maybe it's just me and it's a well-known term.

2) First byte use lower -> First byte uses lower

3) nibble contain upper -> nibble contains upper

4) to preven possible uncertanity -> to prevent possible uncertainty

5) I think we should briefly explain why memmove would be incompatible with pglz, it's not quite clear to me.

6) I'm pretty sure the comment in the 'while (off < len)' branch will be badly mangled by pgindent.

7) The last change moving "copy" to the next line seems unnecessary.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 26, 2019 at 09:05:59PM +0100, Tomas Vondra wrote: > Yeah, although the difference is minimal. We could probably construct a > benchmark where #2 wins, but I think these queries are fairly realistic. > So I'd just go with #1. Nice results. Using your benchmarks it indeed looks like patch 1 is a winner here. > Code-wise I think the patches are mostly fine, although the comments > might need some proof-reading. > > 1) I wasn't really sure what a "nibble" is, but maybe it's just me and > it's a well-known term. > > 2) First byte use lower -> First byte uses lower > > 3) nibble contain upper -> nibble contains upper > > 4) to preven possible uncertanity -> to prevent possible uncertainty > > 5) I think we should briefly explain why memmove would be incompatible > with pglz, it's not quite clear to me. > > 6) I'm pretty sure the comment in the 'while (off < len)' branch will be > badly mangled by pgindent. > > 7) The last change moving "copy" to the next line seems unnecessary. Patch 1 has a typo as well here: + * When offset is smaller than lengh - source and s/lengh/length/ Okay, if we are reaching a conclusion here, Tomas or Peter, are you planning to finish brushing the patch and potentially commit it? -- Michael
Attachment
On Wed, Nov 27, 2019 at 05:01:47PM +0900, Michael Paquier wrote: >On Tue, Nov 26, 2019 at 09:05:59PM +0100, Tomas Vondra wrote: >> Yeah, although the difference is minimal. We could probably construct a >> benchmark where #2 wins, but I think these queries are fairly realistic. >> So I'd just go with #1. > >Nice results. Using your benchmarks it indeed looks like patch 1 is a >winner here. > >> Code-wise I think the patches are mostly fine, although the comments >> might need some proof-reading. >> >> 1) I wasn't really sure what a "nibble" is, but maybe it's just me and >> it's a well-known term. >> >> 2) First byte use lower -> First byte uses lower >> >> 3) nibble contain upper -> nibble contains upper >> >> 4) to preven possible uncertanity -> to prevent possible uncertainty >> >> 5) I think we should briefly explain why memmove would be incompatible >> with pglz, it's not quite clear to me. >> >> 6) I'm pretty sure the comment in the 'while (off < len)' branch will be >> badly mangled by pgindent. >> >> 7) The last change moving "copy" to the next line seems unnecessary. > >Patch 1 has a typo as well here: >+ * When offset is smaller than lengh - source and >s/lengh/length/ > >Okay, if we are reaching a conclusion here, Tomas or Peter, are you >planning to finish brushing the patch and potentially commit it? I'd like some feedback from Andrey regarding the results and memmove comment, and then I'll polish and push. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Tomas! Thanks for benchmarking this! > 26 нояб. 2019 г., в 14:43, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а): > > On Mon, Nov 25, 2019 at 05:29:40PM +0900, Michael Paquier wrote: >> On Mon, Nov 25, 2019 at 01:21:27PM +0500, Andrey Borodin wrote: >>> I think status Needs Review describes what is going on better. It's >>> not like something is awaited from my side. >> >> Indeed. You are right so I have moved the patch instead, with "Needs >> review". The patch status was actually incorrect in the CF app, as it >> was marked as waiting on author. >> >> @Tomas: updated versions of the patches have been sent by Andrey. > > I've done benchmarks on the two last patches, using the data sets from > test_pglz repository [1], but using three simple queries: > > 1) prefix - first 100 bytes of the value > > SELECT length(substr(value, 0, 100)) FROM t > > 2) infix - 100 bytes from the middle > > SELECT length(substr(value, test_length/2, 100)) FROM t > > 3) suffix - last 100 bytes > > SELECT length(substr(value, test_length - 100, 100)) FROM t > > See the two attached scripts, implementing this benchmark. The test > itself did a 60-second pgbench runs (single client) measuring tps on two > different machines. > > patch 1: v4-0001-Use-memcpy-in-pglz-decompression.patch > patch 2: v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch > > The results (compared to master) from the first machine (i5-2500k CPU) > look like this: > > patch 1 | patch 2 > dataset prefix infix suffix | prefix infix suffix > ------------------------------------------------------------------------- > 000000010000000000000001 99% 134% 161% | 100% 126% 152% > 000000010000000000000006 99% 260% 287% | 100% 257% 279% > 000000010000000000000008 100% 100% 100% | 100% 95% 91% > 16398 100% 168% 221% | 100% 159% 215% > shakespeare.txt 100% 138% 141% | 100% 116% 117% > mr 99% 120% 128% | 100% 107% 108% > dickens 100% 129% 132% | 100% 100% 100% > mozilla 100% 119% 120% | 100% 102% 104% > nci 100% 149% 141% | 100% 143% 135% > ooffice 99% 121% 123% | 100% 97% 98% > osdb 100% 99% 99% | 100% 100% 99% > reymont 99% 130% 132% | 100% 106% 107% > samba 100% 126% 132% | 100% 105% 111% > sao 100% 100% 99% | 100% 100% 100% > webster 100% 127% 127% | 100% 106% 106% > x-ray 99% 99% 99% | 100% 100% 100% > xml 100% 144% 144% | 100% 130% 128% > > and on the other one (xeon e5-2620v4) looks like this: > > patch 1 | patch 2 > dataset prefix infix suffix | prefix infix suffix > ------------------------------------------------------------------------ > 000000010000000000000001 98% 147% 170% | 98% 132% 159% > 000000010000000000000006 100% 340% 314% | 98% 334% 355% > 000000010000000000000008 99% 100% 105% | 99% 99% 101% > 16398 101% 153% 205% | 99% 148% 201% > shakespeare.txt 100% 147% 149% | 99% 117% 118% > mr 100% 131% 139% | 99% 112% 108% > dickens 100% 143% 143% | 99% 103% 102% > mozilla 100% 122% 122% | 99% 105% 106% > nci 100% 151% 135% | 100% 135% 125% > ooffice 99% 127% 129% | 98% 101% 102% > osdb 102% 100% 101% | 102% 100% 99% > reymont 101% 142% 143% | 100% 108% 108% > samba 100% 132% 136% | 99% 109% 112% > sao 99% 101% 100% | 99% 100% 100% > webster 100% 132% 129% | 100% 106% 106% > x-ray 99% 101% 100% | 90% 101% 101% > xml 100% 147% 148% | 100% 127% 125% > > In general, I think the results for both patches seem clearly a win, but > maybe patch 1 is bit better, especially on the newer (xeon) CPU. So I'd > probably go with that one. From my POV there are two interesting new points in your benchmarks: 1. 
They are more or less end-to-end benchmarks with the whole system involved.
2. They provide per-payload breakdown.

Prefix experiment is mostly related to reading from page cache and not directly connected with decompression. It's a bit strange that we observe 1% degradation in certain experiments, but I believe it's noise.

Infix and Suffix results are correlated. We observe no impact of the patch on compressed data.

test_pglz also includes slicing by 2Kb and 8Kb. This was done to imitate toasting. But as far as I understand, in your test the data payload will be inserted into the toast table too, won't it? If so, I agree that patch 1 looks like a better option.

> On 27 Nov 2019, at 1:05, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> Code-wise I think the patches are mostly fine, although the comments might need some proof-reading.
>
> 1) I wasn't really sure what a "nibble" is, but maybe it's just me and it's a well-known term.

I took the word from the pg_lzcompress.c comments:
 * The offset is in the upper nibble of T1 and in T2.
 * The length is in the lower nibble of T1.

>
> 2) First byte use lower -> First byte uses lower
>
> 3) nibble contain upper -> nibble contains upper
>
> 4) to preven possible uncertanity -> to prevent possible uncertainty
>
> 5) I think we should briefly explain why memmove would be incompatible with pglz, it's not quite clear to me.

Here's the example:
+ * Consider input: 112341234123412341234
+ * At byte 5 here ^ we have match with length 16 and
+ * offset 4. 11234M(len=16, off=4)

If we simply memmove() these 16 bytes we will produce 112341234XXXXXXXXXXXX, where the series of X is 12 undefined bytes that were at bytes [6:18].

>
> 6) I'm pretty sure the comment in the 'while (off < len)' branch will be badly mangled by pgindent.

I think I can just write it without line limit and then run pgindent. Will try to do it this evening. Also, I will try to write more about memmove.

>
> 7) The last change moving "copy" to the next line seems unnecessary.

Oh, looks like I had been rewording this comment, and eventually came to the same text. Yes, this change is absolutely unnecessary.

Thanks!

Best regards, Andrey Borodin.
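To make the example above concrete, here is a small standalone C sketch of the copy strategy being discussed (expand_match() is an illustrative helper, not the patch's actual code): a single memmove()/memcpy() of all 16 bytes would read 12 bytes that have not been produced yet, while copying non-overlapping chunks of doubling size only ever reads bytes that already exist.

#include <stdio.h>
#include <string.h>

/*
 * Illustrative sketch: expand a back-reference of length "len" located
 * "off" bytes behind the current output position "dp".
 */
static void
expand_match(unsigned char *dp, int off, int len)
{
    if (off >= len)
    {
        /* Source and destination do not overlap: one plain memcpy is safe. */
        memcpy(dp, dp - off, len);
        return;
    }

    /*
     * Overlapping (self-referencing) match: later source bytes are produced
     * by this very copy, so copy non-overlapping chunks of doubling size.
     * Byte-by-byte copying would also be correct, just slower.
     */
    while (off < len)
    {
        memcpy(dp, dp - off, off);
        len -= off;
        dp += off;
        off += off;
    }
    memcpy(dp, dp - off, len);
}

int
main(void)
{
    /* The example from the thread: output "11234", then a match with len=16, off=4. */
    unsigned char buf[32] = "11234";

    expand_match(buf + 5, 4, 16);
    buf[21] = '\0';
    printf("%s\n", buf);        /* prints 112341234123412341234 */
    return 0;
}

Compiled on its own, this prints 112341234123412341234, i.e. the expected expansion of the repeating pattern.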
On Wed, Nov 27, 2019 at 05:47:25PM +0500, Andrey Borodin wrote: >Hi Tomas! > >Thanks for benchmarking this! > >> 26 нояб. 2019 г., в 14:43, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а): >> >> On Mon, Nov 25, 2019 at 05:29:40PM +0900, Michael Paquier wrote: >>> On Mon, Nov 25, 2019 at 01:21:27PM +0500, Andrey Borodin wrote: >>>> I think status Needs Review describes what is going on better. It's >>>> not like something is awaited from my side. >>> >>> Indeed. You are right so I have moved the patch instead, with "Needs >>> review". The patch status was actually incorrect in the CF app, as it >>> was marked as waiting on author. >>> >>> @Tomas: updated versions of the patches have been sent by Andrey. >> >> I've done benchmarks on the two last patches, using the data sets from >> test_pglz repository [1], but using three simple queries: >> >> 1) prefix - first 100 bytes of the value >> >> SELECT length(substr(value, 0, 100)) FROM t >> >> 2) infix - 100 bytes from the middle >> >> SELECT length(substr(value, test_length/2, 100)) FROM t >> >> 3) suffix - last 100 bytes >> >> SELECT length(substr(value, test_length - 100, 100)) FROM t >> >> See the two attached scripts, implementing this benchmark. The test >> itself did a 60-second pgbench runs (single client) measuring tps on two >> different machines. >> >> patch 1: v4-0001-Use-memcpy-in-pglz-decompression.patch >> patch 2: v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch >> >> The results (compared to master) from the first machine (i5-2500k CPU) >> look like this: >> >> patch 1 | patch 2 >> dataset prefix infix suffix | prefix infix suffix >> ------------------------------------------------------------------------- >> 000000010000000000000001 99% 134% 161% | 100% 126% 152% >> 000000010000000000000006 99% 260% 287% | 100% 257% 279% >> 000000010000000000000008 100% 100% 100% | 100% 95% 91% >> 16398 100% 168% 221% | 100% 159% 215% >> shakespeare.txt 100% 138% 141% | 100% 116% 117% >> mr 99% 120% 128% | 100% 107% 108% >> dickens 100% 129% 132% | 100% 100% 100% >> mozilla 100% 119% 120% | 100% 102% 104% >> nci 100% 149% 141% | 100% 143% 135% >> ooffice 99% 121% 123% | 100% 97% 98% >> osdb 100% 99% 99% | 100% 100% 99% >> reymont 99% 130% 132% | 100% 106% 107% >> samba 100% 126% 132% | 100% 105% 111% >> sao 100% 100% 99% | 100% 100% 100% >> webster 100% 127% 127% | 100% 106% 106% >> x-ray 99% 99% 99% | 100% 100% 100% >> xml 100% 144% 144% | 100% 130% 128% >> >> and on the other one (xeon e5-2620v4) looks like this: >> >> patch 1 | patch 2 >> dataset prefix infix suffix | prefix infix suffix >> ------------------------------------------------------------------------ >> 000000010000000000000001 98% 147% 170% | 98% 132% 159% >> 000000010000000000000006 100% 340% 314% | 98% 334% 355% >> 000000010000000000000008 99% 100% 105% | 99% 99% 101% >> 16398 101% 153% 205% | 99% 148% 201% >> shakespeare.txt 100% 147% 149% | 99% 117% 118% >> mr 100% 131% 139% | 99% 112% 108% >> dickens 100% 143% 143% | 99% 103% 102% >> mozilla 100% 122% 122% | 99% 105% 106% >> nci 100% 151% 135% | 100% 135% 125% >> ooffice 99% 127% 129% | 98% 101% 102% >> osdb 102% 100% 101% | 102% 100% 99% >> reymont 101% 142% 143% | 100% 108% 108% >> samba 100% 132% 136% | 99% 109% 112% >> sao 99% 101% 100% | 99% 100% 100% >> webster 100% 132% 129% | 100% 106% 106% >> x-ray 99% 101% 100% | 90% 101% 101% >> xml 100% 147% 148% | 100% 127% 125% >> >> In general, I think the results for both patches seem clearly a win, but >> maybe patch 1 is bit better, especially on the 
newer (xeon) CPU. So I'd >> probably go with that one. > > > >From my POV there are two interesting new points in your benchmarks: >1. They are more or lesss end-to-end benchmarks with whole system involved. >2. They provide per-payload breakdown > Yes. I was considering using the test_pglz extension first, but in the end I decided an end-to-end test is easier to do and more relevant. >Prefix experiment is mostly related to reading from page cache and not >directly connected with decompression. It's a bit strange that we >observe 1% degradation in certain experiments, but I believe it's a >noise. > Yes, I agree it's probably noise - it's not always a degradation, there are cases where it actually improves by ~1%. Perhaps more runs would even this out, or maybe it's due to different bin layout or something. I should have some results from a test with a longer (10-minute) run soon, but I don't think this is a massive issue. >Infix and Suffix results are correlated. We observe no impact of the >patch on compressed data. > TBH I have not looked at which data sets are compressible etc. so I can't really comment on this. FWIW the reason why I did the prefix/infix/suffix is primarily that I was involved in some recent patches tweaking TOAST slicing, so I wanted to see if this happens to negatively affect it somehow. And it does not. >test_pglz also includes slicing by 2Kb and 8Kb. This was done to >imitate toasting. But as far as I understand, in your test data payload >will be inserted into toast table too, won't it? If so, I agree that >patch 1 looks like a better option. > Yes, the tests simply do whatever PostgreSQL would do when loading and storing this data, including TOASTing. >> 27 нояб. 2019 г., в 1:05, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а): >> >> Code-wise I think the patches are mostly fine, although the comments >> might need some proof-reading. >> >> 1) I wasn't really sure what a "nibble" is, but maybe it's just me and >> it's a well-known term. >I've took the word from pg_lzcompress.c comments > * The offset is in the upper nibble of T1 and in T2. > * The length is in the lower nibble of T1. Aha, good. I haven't noticed that word before, so I assumed it's introduced by those patches. And the first thing I thought of was "nibbles" video game [1]. Which obviously left me a bit puzzled ;-) But it seems to be a well-known term, I just never heard it before. [1] https://en.wikipedia.org/wiki/Nibbles_(video_game) >> >> 2) First byte use lower -> First byte uses lower >> >> 3) nibble contain upper -> nibble contains upper >> >> 4) to preven possible uncertanity -> to prevent possible uncertainty >> >> 5) I think we should briefly explain why memmove would be incompatible >> with pglz, it's not quite clear to me. >Here's the example >+ * Consider input: 112341234123412341234 >+ * At byte 5 here ^ we have match with length 16 and >+ * offset 4. 11234M(len=16, off=4) > >If we simply memmove() this 16 bytes we will produce >112341234XXXXXXXXXXXX, where series of X is 12 undefined bytes, that >were at bytes [6:18]. > OK, thanks. >> >> 6) I'm pretty sure the comment in the 'while (off < len)' branch will be >> badly mangled by pgindent. > >I think I can just write it without line limit and then run pgindent. >Will try to do it this evening. Also, I will try to write more about >memmove. > >> >> 7) The last change moving "copy" to the next line seems unnecessary. 
> >Oh, looks like I had been rewording this comment, and eventually came >to the same text..Yes, this change is absolutely unnecessary. > >Thanks! > Good. I'll wait for an updated version of the patch and then try to get it pushed by the end of the CF. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
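Since the term came up twice above: a nibble is simply a half-byte (4 bits). Based on the pg_lzcompress.c format comment quoted earlier, decoding a pglz match tag looks roughly like the sketch below (decode_match_tag() is an illustrative function, not part of the backend code):

/*
 * Illustrative decoding of a pglz match tag:
 *   T1: upper nibble = offset bits 8..11, lower nibble = length - 3
 *   T2: offset bits 0..7
 *   T3: optional extra length byte, present when the base length decodes to 18
 */
static const unsigned char *
decode_match_tag(const unsigned char *sp, int *off, int *len)
{
    *len = (sp[0] & 0x0f) + 3;              /* length from the lower nibble */
    *off = ((sp[0] & 0xf0) << 4) | sp[1];   /* offset from the upper nibble and T2 */
    sp += 2;

    if (*len == 18)                         /* extended length follows in T3 */
        *len += *sp++;

    return sp;                              /* points past the consumed tag */
}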
> On 27 Nov 2019, at 20:28, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > >>> >>> 6) I'm pretty sure the comment in the 'while (off < len)' branch will be >>> badly mangled by pgindent. >> >> I think I can just write it without line limit and then run pgindent. >> Will try to do it this evening. Also, I will try to write more about >> memmove. Well, yes, I could not make pgindent format some parts of that comment, so I gave up and left only simple text. >> >>> >>> 7) The last change moving "copy" to the next line seems unnecessary. >> >> Oh, looks like I had been rewording this comment, and eventually came >> to the same text..Yes, this change is absolutely unnecessary. >> >> Thanks! >> > > Good. I'll wait for an updated version of the patch and then try to get > it pushed by the end of the CF. PFA v5. Thanks! Best regards, Andrey Borodin.
Attachment
On 2019-Nov-27, Andrey Borodin wrote: > > > > On 27 Nov 2019, at 20:28, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > > >>> > >>> 6) I'm pretty sure the comment in the 'while (off < len)' branch will be > >>> badly mangled by pgindent. > >> > >> I think I can just write it without line limit and then run pgindent. > >> Will try to do it this evening. Also, I will try to write more about > >> memmove. > Well, yes, I could not make pgindent format some parts of that > comment, gave up and left only simple text. Please don't. The way to keep pgindent from messing with the comment is to surround it with dashes, /*--------- and end it with *------- */ Just make sure that you use well-aligned lines shorter than 80 characters, for cleanliness; but whatever you do, if you use the dashes then pgindent won't touch it. (I think the closing dash line is not necessary, but it looks better for things to be symmetrical.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
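For illustration, a comment protected this way would look roughly like the following (the contents are made up, only the dash framing matters):

/*----------
 * Example of a dash-protected comment block that pgindent leaves alone:
 *
 *    input:  1 1 2 3 4 . . . . . . . . . . . . . . . .
 *    match:  length = 16, offset = 4
 *
 * Without the dashes right after the comment opener, pgindent would
 * reflow the aligned lines above.
 *----------
 */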
On Wed, Nov 27, 2019 at 04:28:18PM +0100, Tomas Vondra wrote: > Yes. I was considering using the test_pglz extension first, but in the > end I decided an end-to-end test is easier to do and more relevant. I actually got something in this area in one of my trees: https://github.com/michaelpq/pg_plugins/tree/master/compress_test > Good. I'll wait for an updated version of the patch and then try to get > it pushed by the end of the CF. Sounds like a plan. Thanks. -- Michael
Attachment
On Wed, Nov 27, 2019 at 11:27:49PM +0500, Andrey Borodin wrote: > > >> On 27 Nov 2019, at 20:28, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >>>> >>>> 6) I'm pretty sure the comment in the 'while (off < len)' branch will be >>>> badly mangled by pgindent. >>> >>> I think I can just write it without line limit and then run pgindent. >>> Will try to do it this evening. Also, I will try to write more about >>> memmove. >Well, yes, I could not make pgindent format some parts of that comment, gave up and left only simple text. >>> >>>> >>>> 7) The last change moving "copy" to the next line seems unnecessary. >>> >>> Oh, looks like I had been rewording this comment, and eventually came >>> to the same text..Yes, this change is absolutely unnecessary. >>> >>> Thanks! >>> >> >> Good. I'll wait for an updated version of the patch and then try to get >> it pushed by the end of the CF. > OK, pushed, with some minor cosmetic tweaks on the comments (essentially using the formatting trick pointed out by Alvaro), and removing one unnecessary change in pglz_maximum_compressed_size. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 29 Nov 2019, at 3:43, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > OK, pushed, with some minor cosmetic tweaks on the comments (essentially using the formatting trick pointed out by Alvaro), and removing one unnecessary change in pglz_maximum_compressed_size. Cool, thanks! > pglz_maximum_compressed_size It was an artifact of pgindent. Best regards, Andrey Borodin.
On Fri, Nov 29, 2019 at 10:18:18AM +0500, Andrey Borodin wrote: >> On 29 Nov 2019, at 3:43, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> OK, pushed, with some minor cosmetic tweaks on the comments (essentially >> using the formatting trick pointed out by Alvaro), and removing one >> unnecessary change in pglz_maximum_compressed_size. > > Cool, thanks! Yippee. Thanks. -- Michael
Attachment
Hi Petr! > On 4 Aug 2019, at 05:41, Petr Jelinek <petr@2ndquadrant.com> wrote: > > Just so that we don't idly talk, what do you think about the attached? > It: > - adds new GUC compression_algorithm with possible values of pglz (default) and lz4 (if lz4 is compiled in), requires SIGHUP > - adds --with-lz4 configure option (default yes, so the configure option is actually --without-lz4) that enables the lz4, it's using system library Do you plan to work on lz4 aiming at 13 or 14? Maybe let's register it on CF 2020-07? Best regards, Andrey Borodin.
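For context, the one-shot API of the system liblz4 that such a patch would build on looks roughly like this (a standalone sketch, unrelated to Petr's actual patch; build with something like "cc lz4_demo.c -llz4"):

#include <stdio.h>
#include <lz4.h>

int
main(void)
{
    const char  src[] = "pglz vs lz4 pglz vs lz4 pglz vs lz4 pglz vs lz4";
    char        compressed[128];
    char        restored[128];
    int         clen;
    int         dlen;

    /* One-shot compression into a caller-provided buffer. */
    clen = LZ4_compress_default(src, compressed, (int) sizeof(src), (int) sizeof(compressed));
    if (clen <= 0)
        return 1;

    /* Safe decompression: bounds of both input and output are checked. */
    dlen = LZ4_decompress_safe(compressed, restored, clen, (int) sizeof(restored));
    if (dlen != (int) sizeof(src))
        return 1;

    printf("%d -> %d -> %d bytes\n", (int) sizeof(src), clen, dlen);
    return 0;
}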