Re: pglz performance - Mailing list pgsql-hackers

From Binguo Bao
Subject Re: pglz performance
Date
Msg-id CAL-OGkuVCjsHfCE0wa9sz0CsMZvk53jYCW8a1Wi0TTagFgLsDQ@mail.gmail.com
In response to pglz performance  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers
Hi hackers!
I am a student participating in GSoC 2019. I look forward to working with you all and learning from you.
My project aims to provide the ability to de-TOAST a fully TOASTed and compressed field using an iterator.
For more details, please take a look at my proposal[0]. Any suggestions or comments on my still-immature ideas would be much appreciated :)

I've implemented the first step of the project: segmented pglz compression, which makes it possible to retrieve a subset of the raw data without decompressing the entire field.
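
To illustrate the general idea of segmented compression (not the actual patch), here is a minimal Python sketch. It assumes fixed-size segments that are compressed independently, and it uses zlib as a stand-in for pglz. The names SEGMENT_SIZE, compress_segments and read_slice are hypothetical and exist only for this example.

```python
import zlib
from typing import List

SEGMENT_SIZE = 2048  # hypothetical segment length in bytes, chosen for illustration


def compress_segments(raw: bytes, seg_size: int = SEGMENT_SIZE) -> List[bytes]:
    """Compress each fixed-size segment of raw independently (zlib stands in for pglz)."""
    return [zlib.compress(raw[i:i + seg_size]) for i in range(0, len(raw), seg_size)]


def read_slice(segments: List[bytes], offset: int, length: int,
               seg_size: int = SEGMENT_SIZE) -> bytes:
    """Return raw[offset:offset + length], decompressing only the overlapping segments."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if length <= 0:
        return b""
    first = offset // seg_size                  # first segment touched by the slice
    last = (offset + length - 1) // seg_size    # last segment touched by the slice
    chunk = b"".join(zlib.decompress(s) for s in segments[first:last + 1])
    start = offset - first * seg_size           # slice position inside the decoded chunk
    return chunk[start:start + length]
```

For example, with 2048-byte segments, read_slice(segments, 3000, 500) decodes only the second segment. Because each segment is compressed on its own in this sketch, matches cannot refer back to data in earlier segments, which is one plausible reason for a worse compression ratio on unsliced payloads.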
I've also run some tests[1] on the compressor. The results are as follows:
NOTICE:  Test summary:
NOTICE:  Payload 000000010000000000000001
NOTICE:       Decompressor name      |    Compression time (ns/bit)  | Decompression time (ns/bit) | ratio  
NOTICE:   pglz_decompress_hacked     |            23.747444           |          0.578344         | 0.159809
NOTICE:   pglz_decompress_hacked8    |            23.764193           |          0.677800         | 0.159809
NOTICE:   pglz_decompress_hacked16   |            23.740351           |          0.704730         | 0.159809
NOTICE:   pglz_decompress_vanilla    |            23.797917           |          1.227868         | 0.159809
NOTICE:   pglz_decompress_hacked_seg |            12.261808           |          0.625634         | 0.184952

Comment: Compression is nearly twice as fast (23.80 -> 12.26 ns/bit), but the compressed output is about 16% larger (ratio 0.159809 -> 0.184952).

NOTICE:  Payload 000000010000000000000001 sliced by 2Kb
NOTICE:   pglz_decompress_hacked     |            12.616956           |          0.621223         | 0.156953
NOTICE:   pglz_decompress_hacked8    |            12.583685           |          0.756741         | 0.156953
NOTICE:   pglz_decompress_hacked16   |            12.512636           |          0.774980         | 0.156953
NOTICE:   pglz_decompress_vanilla    |            12.493062           |          1.262820         | 0.156953
NOTICE:   pglz_decompress_hacked_seg |            11.986554           |          0.622654         | 0.159590
NOTICE:  Payload 000000010000000000000001 sliced by 4Kb
NOTICE:   pglz_decompress_hacked     |            15.514469           |          0.565565         | 0.154213
NOTICE:   pglz_decompress_hacked8    |            15.529144           |          0.699675         | 0.154213
NOTICE:   pglz_decompress_hacked16   |            15.514040           |          0.721145         | 0.154213
NOTICE:   pglz_decompress_vanilla    |            15.558958           |          1.237237         | 0.154213
NOTICE:   pglz_decompress_hacked_seg |            14.650309           |          0.563228         | 0.153652
NOTICE:  Payload 000000010000000000000006
NOTICE:       Decompressor name      |  Compression time (ns/bit)  | Decompression time (ns/bit) | ratio  
NOTICE:   pglz_decompress_hacked     |            8.610177           |          0.153577         | 0.052294
NOTICE:   pglz_decompress_hacked8    |            8.566785           |          0.168002         | 0.052294
NOTICE:   pglz_decompress_hacked16   |            8.643126           |          0.167537         | 0.052294
NOTICE:   pglz_decompress_vanilla    |            8.574498           |          0.930738         | 0.052294
NOTICE:   pglz_decompress_hacked_seg |            7.394731           |          0.171924         | 0.056081
NOTICE:  Payload 000000010000000000000006 sliced by 2Kb
NOTICE:   pglz_decompress_hacked     |            6.724060           |          0.295043         | 0.065541
NOTICE:   pglz_decompress_hacked8    |            6.623018           |          0.318527         | 0.065541
NOTICE:   pglz_decompress_hacked16   |            6.898034           |          0.318360         | 0.065541
NOTICE:   pglz_decompress_vanilla    |            6.712711           |          1.045430         | 0.065541
NOTICE:   pglz_decompress_hacked_seg |            6.630743           |          0.302589         | 0.068471
NOTICE:  Payload 000000010000000000000006 sliced by 4Kb
NOTICE:   pglz_decompress_hacked     |            6.624067           |          0.220942         | 0.058865
NOTICE:   pglz_decompress_hacked8    |            6.659424           |          0.240183         | 0.058865
NOTICE:   pglz_decompress_hacked16   |            6.763864           |          0.240564         | 0.058865
NOTICE:   pglz_decompress_vanilla    |            6.743574           |          0.985348         | 0.058865
NOTICE:   pglz_decompress_hacked_seg |            6.613123           |          0.227582         | 0.060330
NOTICE:  Payload 000000010000000000000008
NOTICE:       Decompressor name      |  Compression time (ns/bit)  | Decompression time (ns/bit) | ratio  
NOTICE:   pglz_decompress_hacked     |            52.425957           |          1.050544         | 0.498941
NOTICE:   pglz_decompress_hacked8    |            52.204561           |          1.261592         | 0.498941
NOTICE:   pglz_decompress_hacked16   |            52.328491           |          1.466751         | 0.498941
NOTICE:   pglz_decompress_vanilla    |            52.465308           |          1.341271         | 0.498941
NOTICE:   pglz_decompress_hacked_seg |            31.896341           |          1.113260         | 0.600998
NOTICE:  Payload 000000010000000000000008 sliced by 2Kb
NOTICE:   pglz_decompress_hacked     |            30.620611           |          0.768542         | 0.351941
NOTICE:   pglz_decompress_hacked8    |            30.557334           |          0.907421         | 0.351941
NOTICE:   pglz_decompress_hacked16   |            32.064903           |          1.208913         | 0.351941
NOTICE:   pglz_decompress_vanilla    |            30.489886           |          1.014197         | 0.351941
NOTICE:   pglz_decompress_hacked_seg |            27.145243           |          0.774193         | 0.352868
NOTICE:  Payload 000000010000000000000008 sliced by 4Kb
NOTICE:   pglz_decompress_hacked     |            36.567903           |          1.054633         | 0.514047
NOTICE:   pglz_decompress_hacked8    |            36.459124           |          1.267731         | 0.514047
NOTICE:   pglz_decompress_hacked16   |            36.791718           |          1.479650         | 0.514047
NOTICE:   pglz_decompress_vanilla    |            36.241913           |          1.303136         | 0.514047
NOTICE:   pglz_decompress_hacked_seg |            31.526327           |          1.059926         | 0.526875
NOTICE:  Payload 16398
NOTICE:       Decompressor name      |  Compression time (ns/bit)  | Decompression time (ns/bit) | ratio  
NOTICE:   pglz_decompress_hacked     |            9.508625           |          0.435190         | 0.071816
NOTICE:   pglz_decompress_hacked8    |            9.546987           |          0.473871         | 0.071816
NOTICE:   pglz_decompress_hacked16   |            9.534496           |          0.471662         | 0.071816
NOTICE:   pglz_decompress_vanilla    |            9.559053           |          1.352561         | 0.071816
NOTICE:   pglz_decompress_hacked_seg |            8.479486           |          0.441536         | 0.073232
NOTICE:  Payload 16398 sliced by 2Kb
NOTICE:   pglz_decompress_hacked     |            6.808167           |          0.326570         | 0.082775
NOTICE:   pglz_decompress_hacked8    |            6.790743           |          0.361720         | 0.082775
NOTICE:   pglz_decompress_hacked16   |            6.886097           |          0.364549         | 0.082775
NOTICE:   pglz_decompress_vanilla    |            6.918429           |          1.191265         | 0.082775
NOTICE:   pglz_decompress_hacked_seg |            6.752811           |          0.340805         | 0.085705
NOTICE:  Payload 16398 sliced by 4Kb
NOTICE:   pglz_decompress_hacked     |            7.244472           |          0.261872         | 0.076860
NOTICE:   pglz_decompress_hacked8    |            7.290275           |          0.295988         | 0.076860
NOTICE:   pglz_decompress_hacked16   |            7.340706           |          0.294683         | 0.076860
NOTICE:   pglz_decompress_vanilla    |            7.429289           |          1.151645         | 0.076860
NOTICE:   pglz_decompress_hacked_seg |            7.054166           |          0.267896         | 0.078325
NOTICE:  Payload shakespeare.txt
NOTICE:       Decompressor name      |  Compression time (ns/bit)  | Decompression time (ns/bit) | ratio  
NOTICE:   pglz_decompress_hacked     |            25.998753           |          1.345542         | 0.281363
NOTICE:   pglz_decompress_hacked8    |            26.121630           |          1.917667         | 0.281363
NOTICE:   pglz_decompress_hacked16   |            26.139312           |          2.101329         | 0.281363
NOTICE:   pglz_decompress_vanilla    |            26.155571           |          2.082123         | 0.281363
NOTICE:   pglz_decompress_hacked_seg |            16.792089           |          1.951269         | 0.436558

Comment: In this case, the compression ratio gets dramatically worse (0.281363 -> 0.436558).
 
NOTICE:  Payload shakespeare.txt sliced by 2Kb
NOTICE:   pglz_decompress_hacked     |            14.992793           |          1.923663         | 0.436270
NOTICE:   pglz_decompress_hacked8    |            14.982428           |          2.695319         | 0.436270
NOTICE:   pglz_decompress_hacked16   |            15.211803           |          2.846615         | 0.436270
NOTICE:   pglz_decompress_vanilla    |            15.113214           |          2.580098         | 0.436270
NOTICE:   pglz_decompress_hacked_seg |            15.120852           |          1.922596         | 0.439199
NOTICE:  Payload shakespeare.txt sliced by 4Kb
NOTICE:   pglz_decompress_hacked     |            18.083400           |          1.687598         | 0.366936
NOTICE:   pglz_decompress_hacked8    |            18.185038           |          2.395928         | 0.366936
NOTICE:   pglz_decompress_hacked16   |            18.096120           |          2.554812         | 0.366936
NOTICE:   pglz_decompress_vanilla    |            18.435380           |          2.329129         | 0.366936
NOTICE:   pglz_decompress_hacked_seg |            18.103267           |          1.705517         | 0.368400
NOTICE:  

Decompressor score (summ of all times):
NOTICE:  Decompressor pglz_decompress_hacked     result 11.288848
NOTICE:  Decompressor pglz_decompress_hacked8    result 14.438165
NOTICE:  Decompressor pglz_decompress_hacked16   result 15.716280
NOTICE:  Decompressor pglz_decompress_vanilla    result 21.034867
NOTICE:  Decompressor pglz_decompress_hacked_seg result 12.090609
NOTICE:  

compressor score (summ of all times):
NOTICE:  compressor pglz_compress_vanilla result 276.776671
NOTICE:  compressor pglz_compress_hacked_seg result 222.407850
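
To put the summed scores in relative terms, the short Python sketch below computes the percentage change of the segmented variant against each baseline. The numbers are copied from the totals above; the helper relative_change is a hypothetical name used only here. Lower times are better, so a negative change means the segmented variant is faster.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change of candidate relative to baseline (negative means less time)."""
    return (candidate - baseline) / baseline


# Totals copied from the score summaries above (sum of all times).
compress_vs_vanilla = relative_change(276.776671, 222.407850)
decompress_vs_vanilla = relative_change(21.034867, 12.090609)
decompress_vs_hacked = relative_change(11.288848, 12.090609)

print(f"compression, hacked_seg vs vanilla:   {compress_vs_vanilla:+.1%}")
print(f"decompression, hacked_seg vs vanilla: {decompress_vs_vanilla:+.1%}")
print(f"decompression, hacked_seg vs hacked:  {decompress_vs_hacked:+.1%}")
```

By this measure the segmented compressor spends roughly a fifth less total time than the vanilla compressor, and its decompressor is a few percent slower than pglz_decompress_hacked while still well ahead of the vanilla decompressor.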

There are some open issues and questions:
1. The new compression algorithm is currently not compatible with the original one.
2. If the idea works, we need to test on more data. What kind of data would be most appropriate?
Any comments are much appreciated.
