Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Chris Travers
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id CAN-RpxDFZUduYmOociaYHsnh3vw2E8wjsQ+Ht1xRvemp3H=6WA@mail.gmail.com
In response to Re: [HACKERS] Custom compression methods  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [HACKERS] Custom compression methods
List pgsql-hackers


On Mon, Mar 18, 2019 at 11:09 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


On 3/15/19 12:52 PM, Ildus Kurbangaliev wrote:
> On Fri, 15 Mar 2019 14:07:14 +0400
> David Steele <david@pgmasters.net> wrote:
>
>> On 3/7/19 11:50 AM, Alexander Korotkov wrote:
>>> On Thu, Mar 7, 2019 at 10:43 AM David Steele <david@pgmasters.net
>>> <mailto:david@pgmasters.net>> wrote:
>>>
>>>     On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote:
>>>
>>>      > there are another set of patches.
>>>      > Only rebased to current master.
>>>      >
>>>      > Also I will change status on commitfest to 'Needs review'.
>>>
>>>     This patch has seen periodic rebases but no code review that I
>>> can see since last January 2018.
>>>
>>>     As Andres noted in [1], I think that we need to decide if this
>>> is a feature that we want rather than just continuing to push it
>>> from CF to CF.
>>>
>>>
>>> Yes.  I took a look at code of this patch.  I think it's in pretty
>>> good shape.  But high level review/discussion is required.
>>
>> OK, but I think this patch can only be pushed one more time, maximum,
>> before it should be rejected.
>>
>> Regards,
>
> Hi,
> in my opinion this patch is usually skipped not because it is not
> needed, but because of its size. It is not hard to maintain it until
> committers have time for it or I get an actual response that
> nobody is going to commit it.
>

That may be one of the reasons, yes. But there are other reasons, which
I think may be playing a bigger role.

There's one practical issue with how the patch is structured - the docs
and tests are in separate patches towards the end of the patch series,
which makes it impossible to commit the preceding parts. This needs to
change. Otherwise the patch size kills the patch as a whole.

But there's a more important cost/benefit issue, I think. When I look at
patches as a committer, I naturally have to weigh how much time I spend
on getting it in (and then dealing with fallout from bugs etc) vs. what
I get in return (measured in benefits for community, users). This patch
is pretty large and complex, so the "costs" are quite high, while the
benefit from the patch itself is the ability to pick between pg_lz and
zlib. Which is not great, and so people tend to pick other patches.

Now, I understand there are a lot of potential benefits further down the
line, like column-level compression (which I think is the main goal
here). But that's not included in the patch, so the gains are somewhat
far in the future.

I am not going to discuss whether any particular committer should pick this up, but I do want to describe an important use case we have at Adjust for this sort of patch.

We find the PostgreSQL compression strategy inadequate for at least one of our large deployments (a large debug log spanning 10PB+).  Our current solution is to set storage so that PostgreSQL does not compress, and then run on ZFS to get compression speedups on spinning disks.

But running PostgreSQL on ZFS has some annoying costs because we have copy-on-write on copy-on-write, and when you add file fragmentation... I would really like to be able to get away from having to use ZFS as the underlying filesystem.  While we have good write throughput, read throughput is not as good as I would like.

An approach that would give us better row-level compression would allow us to ditch the COW-filesystem-under-PostgreSQL approach.

So I think the benefits are actually quite high, particularly for those dealing with volume/variety problems where things like JSONB might be a go-to solution.  Similarly, I could totally see systems which handle large amounts of specialized text having extensions for dealing with it.

But hey, I think there are committers working for postgrespro, who might
have the motivation to get this over the line. Of course, assuming that
there are no serious objections to having this functionality or how it's
implemented ... But I don't think that was the case.

While I am not currently able to speak to questions of how it is implemented, I can say with very little doubt that we would almost certainly use this functionality if it were there, and I could see plenty of other cases where this would be a very appropriate direction for other projects as well.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin
