Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
| From | Konstantin Knizhnik |
|---|---|
| Subject | Re: [HACKERS] Custom compression methods |
| Date | |
| Msg-id | 03c376ed-839f-35f4-5f03-35b21b47e9a2@postgrespro.ru |
| In response to | Re: [HACKERS] Custom compression methods (Alexander Korotkov <a.korotkov@postgrespro.ru>) |
| Responses | Re: [HACKERS] Custom compression methods |
| List | pgsql-hackers |
On 23.04.2018 18:32, Alexander Korotkov wrote:
>>> But that's the main goal of this patch: let somebody implement their own compression algorithm which best fits a particular dataset.
>>
>> Hmmm... Frankly speaking, I don't believe in this "somebody".
>> From my point of view the main value of this patch is that it allows replacing the pglz algorithm with a more efficient one, for example zstd.
>> On some data sets zstd provides a more than 10 times better compression ratio and at the same time is faster than pglz.
>
> Not exactly. If we want to replace pglz with a more efficient algorithm, then we should just replace pglz with the better algorithm. Pluggable compression methods are definitely not worth it just for replacing pglz with zstd.
As far as I understand, it is not possible for many reasons (portability, patents, ...) to replace pglz with zstd.
I think that even replacing pglz with zlib (which is much worse than zstd) would not be accepted by the community.
So from my point of view the main advantage of custom compression methods is to replace the built-in pglz compression with a more advanced one.
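Just to make this concrete, below is a minimal sketch of what such a replacement boils down to on the libzstd side (assuming libzstd is installed; the sample buffer and compression level are arbitrary, and all the PostgreSQL glue around varlena headers is omitted):

```c
/* Minimal sketch: round-trip a buffer through libzstd.
 * Assumes libzstd is available; build with: cc zstd_demo.c -lzstd */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>

int
main(void)
{
	const char *raw = "{\"id\": 1, \"payload\": \"some repetitive value value value\"}";
	size_t		raw_size = strlen(raw) + 1;

	/* worst-case size of the compressed output */
	size_t		bound = ZSTD_compressBound(raw_size);
	char	   *compressed = malloc(bound);
	char	   *restored = malloc(raw_size);

	/* level 3 is the zstd default; pglz has no comparable knob at all */
	size_t		csize = ZSTD_compress(compressed, bound, raw, raw_size, 3);

	if (ZSTD_isError(csize))
	{
		fprintf(stderr, "compression failed: %s\n", ZSTD_getErrorName(csize));
		return 1;
	}

	size_t		dsize = ZSTD_decompress(restored, raw_size, compressed, csize);

	if (ZSTD_isError(dsize) || dsize != raw_size)
	{
		fprintf(stderr, "decompression failed\n");
		return 1;
	}

	printf("raw: %zu bytes, compressed: %zu bytes\n", raw_size, csize);
	free(compressed);
	free(restored);
	return 0;
}
```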
> Some blob-like datatypes might not be long enough to let generic compression algorithms like zlib or zstd train a dictionary. For example, MySQL successfully utilizes column-level dictionaries for JSON [1]. Also, JSON(B) might utilize some compression which lets the user extract particular attributes without decompressing the whole document.
Well, I am not an expert in compression.
But I will be very surprised if somebody shows me a real example with a large enough compressed data buffer (>2kb) where some specialized algorithm provides a significantly better compression ratio than an advanced universal compression algorithm.
Also, maybe I missed something, but the current compression API doesn't support partial extraction (extracting a particular attribute or range).
If we really need it, then it should be expressed in the custom compressor API. But I am not sure how frequently it will be needed.
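If we did want to express it, I imagine the handler could simply expose an optional slice-decompression callback. A purely hypothetical sketch (none of these names are taken from the patch; int32 stands for the usual PostgreSQL typedef):

```c
/*
 * Purely hypothetical sketch of how partial extraction could be expressed
 * in a custom compressor API; none of these names come from the patch.
 */
typedef struct CompressionRoutine
{
	/* compress 'srcsize' bytes from 'src' into 'dst', return compressed size */
	int32		(*compress) (const char *src, int32 srcsize, char *dst);

	/* decompress the whole datum into 'dst' of size 'rawsize' */
	int32		(*decompress) (const char *src, int32 srcsize,
							   char *dst, int32 rawsize);

	/*
	 * Optional: decompress only bytes [offset, offset + length) of the raw
	 * data.  Left NULL when the method cannot decompress a slice, in which
	 * case the caller falls back to full decompression.
	 */
	int32		(*decompress_slice) (const char *src, int32 srcsize,
									 char *dst, int32 offset, int32 length);
} CompressionRoutine;
```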
Large values are split into 2kb TOAST chunks, so with compression a single chunk can hold about 4-8kb of raw data. IMHO storing larger JSON objects is a database design flaw.
And taking into account that for JSONB we also need to extract the header (so at least two chunks), the advantages of partial JSONB decompression become even less clear.
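Rough arithmetic behind that remark (a back-of-the-envelope sketch: 1996 bytes only approximates the per-chunk TOAST payload with the default 8kb block size, and the 4:1 compression ratio is an assumption):

```c
/* Back-of-the-envelope: how many ~2kb TOAST chunks a compressed value needs.
 * 1996 approximates the per-chunk payload with the default 8kb block size;
 * the 4:1 compression ratio is an arbitrary assumption. */
#include <stdio.h>

int
main(void)
{
	const int	chunk = 1996;
	int			raw_sizes[] = {4096, 8192, 65536, 1048576};

	for (int i = 0; i < 4; i++)
	{
		int			compressed = raw_sizes[i] / 4;
		int			nchunks = (compressed + chunk - 1) / chunk;

		printf("raw %7d b -> compressed ~%6d b -> %d chunk(s)\n",
			   raw_sizes[i], compressed, nchunks);
	}
	return 0;
}
```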
>> I do not think that assigning the default compression method through a GUC is such a bad idea.
>
> It's probably not so bad, but it's a different story. Unrelated to this patch, I think.
Maybe. But in any case, there are several directions where compression can be used:
- custom compression algorithms
- libpq compression
- page-level compression
...
and they should all somehow finally be "married" with each other.
>> Sorry, I am really looking at this patch from a different angle.
>
> I think streaming compression is a completely different story. Client-server traffic compression is not just a server feature; it must also be supported on the client side. And I really doubt it should be pluggable.
>
> In my opinion, you propose good things like compression of WAL with a better algorithm and compression of client-server traffic. But I think those features are unrelated to this patch and should be considered separately; they are not features which should be added to this patch. Regarding this patch, the points you provide seem more like criticism of the general idea.
>
> I think the problem of this patch is that it lacks a good example. It would be nice if Ildus implemented simple compression with a column-defined dictionary (like [1] does), and showed its efficiency on real-life examples which can't be matched by generic compression methods (like zlib or zstd). That would be a good answer to the criticism you provide.
>
> Links
> ------
>
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
And this is why I have some doubts about the general idea.
Postgres allows defining custom types, access methods, ...
But do you know any production system using some special data types or custom indexes which are not included in the standard Postgres distribution
or in popular extensions (like PostGIS)?
IMHO end users do not have the skills and time to create their own compression algorithms. And without knowledge of the specifics of a particular data set,
it is very hard to implement something more efficient than a universal compression library.
But if you think that this is not the right place and time to discuss it, I do not insist.
In any case, I think it would be useful to provide some more examples of custom compression API usage.
From my point of view the most useful would be an integration with zstd.
But if it is possible to find some example of a data-specific compression algorithm which shows better results than universal compression,
it would be even more impressive.
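For what it's worth, even the column-level dictionary idea from [1] can be prototyped with libzstd's own dictionary API. The sketch below trains a dictionary on many short, similarly-structured JSON-like values and compresses one of them with and without it (assuming libzstd with the ZDICT interface; the sample data is synthetic):

```c
/* Sketch: train a zstd dictionary on many short JSON-like values and use it
 * for compression -- roughly what a column-level dictionary would do.
 * Assumes libzstd; build with: cc dict_demo.c -lzstd
 * The sample data is synthetic; dictionary training needs a large number
 * of samples to succeed. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>
#include <zdict.h>

#define NSAMPLES 2000

int
main(void)
{
	static char samples[NSAMPLES][128];
	size_t		sizes[NSAMPLES];
	size_t		total = 0;

	/* many short, similarly-structured documents: too small for generic
	 * compression to shine, ideal for a shared dictionary */
	for (int i = 0; i < NSAMPLES; i++)
	{
		sizes[i] = snprintf(samples[i], sizeof(samples[i]),
							"{\"id\": %d, \"status\": \"active\", \"country\": \"RU\"}", i);
		total += sizes[i];
	}

	/* ZDICT wants the samples concatenated in one flat buffer */
	char	   *flat = malloc(total);
	size_t		off = 0;

	for (int i = 0; i < NSAMPLES; i++)
	{
		memcpy(flat + off, samples[i], sizes[i]);
		off += sizes[i];
	}

	char		dict[4096];
	size_t		dict_size = ZDICT_trainFromBuffer(dict, sizeof(dict),
												  flat, sizes, NSAMPLES);

	if (ZDICT_isError(dict_size))
	{
		fprintf(stderr, "training failed: %s\n", ZDICT_getErrorName(dict_size));
		return 1;
	}

	/* compress one short value with and without the dictionary */
	ZSTD_CCtx  *cctx = ZSTD_createCCtx();
	char		out[256];
	size_t		plain = ZSTD_compress(out, sizeof(out), samples[0], sizes[0], 3);
	size_t		with_dict = ZSTD_compress_usingDict(cctx, out, sizeof(out),
													samples[0], sizes[0],
													dict, dict_size, 3);

	printf("raw %zu b, zstd %zu b, zstd+dict %zu b\n",
		   sizes[0], plain, with_dict);

	ZSTD_freeCCtx(cctx);
	free(flat);
	return 0;
}
```

Whether such a dictionary actually beats plain zstd on real short values is exactly the kind of measurement I would like to see.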
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company