Thread: Toast compression method options

Toast compression method options

From
Dilip Kumar
Date:
We have already pushed the configurable lz4 toast compression code[1].
In the custom compression thread, we were already having the patch to
support the compression method options[2].  But the design for the
base patches was heavily modified before commit but I never rebased
this patch based on the new design.  Now, I have rebased this patch so
that we don't lose track and we can work on this for v15.  This is
still a WIP patch.

For v15 I will work on improving the code and I will also work on
analyzing the usage of compression method options (compression
speed/ratio).

[1] https://www.postgresql.org/message-id/E1lNKw9-0008DT-1L%40gemulon.postgresql.org
[2] https://www.postgresql.org/message-id/CAFiTN-s7fno8pGwfK7jwSf7uNaVhPZ38C3LAcF%2B%3DWHu7jNvy7g%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Toast compression method options

From
Dilip Kumar
Date:
On Mon, May 3, 2021 at 6:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> We have already pushed the configurable lz4 toast compression code[1].
> In the custom compression thread, we were already having the patch to
> support the compression method options[2].  But the design for the
> base patches was heavily modified before commit but I never rebased
> this patch based on the new design.  Now, I have rebased this patch so
> that we don't lose track and we can work on this for v15.  This is
> still a WIP patch.
>
> For v15 I will work on improving the code and I will also work on
> analyzing the usage of compression method options (compression
> speed/ratio).
>
> [1] https://www.postgresql.org/message-id/E1lNKw9-0008DT-1L%40gemulon.postgresql.org
> [2] https://www.postgresql.org/message-id/CAFiTN-s7fno8pGwfK7jwSf7uNaVhPZ38C3LAcF%2B%3DWHu7jNvy7g%40mail.gmail.com
>

I have fixed some comments offlist reported by Justin.  Apart from
that, I have also improved documentation and test case.  Stil it has a
lot of cleanup to be done but I am not planning to do that
immediately.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Toast compression method options

From
Michael Paquier
Date:
On Thu, May 06, 2021 at 07:23:48PM +0530, Dilip Kumar wrote:
> I have fixed some comments offlist reported by Justin.  Apart from
> that, I have also improved documentation and test case.  Stil it has a
> lot of cleanup to be done but I am not planning to do that
> immediately.

I was testing the various compression algos we touch in core, and I am
not really convinced that we need more code to control that.  First,
pglz is living as-is in the backend code for a very long time and no
one has expressed an interest in controlling the compression strategy
used AFAIK.  On top of that, LZ4 outclasses it easily, so if there is
a need to worry about performance with the TOAST data, users could
just move to use LZ4.

+       if (strcmp(def->defname, "acceleration") == 0)
+       {
+           int32 acceleration =
+               pg_atoi(defGetString(def), sizeof(acceleration), 0);
+
+           if (acceleration < INT32_MIN || acceleration > INT32_MAX)
+               ereport(ERROR,
+                   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                    errmsg("unexpected value for lz4 compression acceleration: \"%s\"",
+                               defGetString(def)),
+                    errhint("expected value between INT32_MIN and
INT32_MAX")
+                   ));

Then comes the part with LZ4 and its acceleration.  The default
compression level used by LZ4 compresses data the most, and it is
already pretty cheap in CPU.  Do you have cases where this would be
useful?  Increasing the acceleration reduces the compression to be
close to zero, but if one cares about the compression cost, he/she
could fall back to the external storage for basically the same
effect.  Is there really a use-case for something in-between?
--
Michael

Attachment

Re: Toast compression method options

From
Dilip Kumar
Date:
On Fri, Jun 18, 2021 at 12:13 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, May 06, 2021 at 07:23:48PM +0530, Dilip Kumar wrote:
> > I have fixed some comments offlist reported by Justin.  Apart from
> > that, I have also improved documentation and test case.  Stil it has a
> > lot of cleanup to be done but I am not planning to do that
> > immediately.
>
> I was testing the various compression algos we touch in core, and I am
> not really convinced that we need more code to control that.  First,
> pglz is living as-is in the backend code for a very long time and no
> one has expressed an interest in controlling the compression strategy
> used AFAIK.  On top of that, LZ4 outclasses it easily, so if there is
> a need to worry about performance with the TOAST data, users could
> just move to use LZ4.
>
> +       if (strcmp(def->defname, "acceleration") == 0)
> +       {
> +           int32 acceleration =
> +               pg_atoi(defGetString(def), sizeof(acceleration), 0);
> +
> +           if (acceleration < INT32_MIN || acceleration > INT32_MAX)
> +               ereport(ERROR,
> +                   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                    errmsg("unexpected value for lz4 compression acceleration: \"%s\"",
> +                               defGetString(def)),
> +                    errhint("expected value between INT32_MIN and
> INT32_MAX")
> +                   ));
>
> Then comes the part with LZ4 and its acceleration.  The default
> compression level used by LZ4 compresses data the most, and it is
> already pretty cheap in CPU.  Do you have cases where this would be
> useful?  Increasing the acceleration reduces the compression to be
> close to zero, but if one cares about the compression cost, he/she
> could fall back to the external storage for basically the same
> effect.  Is there really a use-case for something in-between?

IMHO there is certainly a use case, basically, if we compress the data
so that we can avoid storing it externally.  Now suppose for some
data, with default LZ4 compression, the compression ratio is so high
that you are able to compress to the size which is way under the
limit.  So for such data, the acceleration can be increased such that
compression is fast and compression ratio is good enough that it is
not going to the external storage.  I agree it will be difficult for
the user to make such a decision and select the acceleration value but
based on the data pattern and their compressed length the admin can
make such a decision.  So in short select the acceleration value such
that you can compress it fast and the compression ratio is good enough
to keep it from storing externally.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Toast compression method options

From
Michael Paquier
Date:
On Tue, Jun 22, 2021 at 11:05:22AM +0530, Dilip Kumar wrote:
> IMHO there is certainly a use case, basically, if we compress the data
> so that we can avoid storing it externally.  Now suppose for some
> data, with default LZ4 compression, the compression ratio is so high
> that you are able to compress to the size which is way under the
> limit.  So for such data, the acceleration can be increased such that
> compression is fast and compression ratio is good enough that it is
> not going to the external storage.  I agree it will be difficult for
> the user to make such a decision and select the acceleration value but
> based on the data pattern and their compressed length the admin can
> make such a decision.  So in short select the acceleration value such
> that you can compress it fast and the compression ratio is good enough
> to keep it from storing externally.

Theoritically, I agree that there could be a use case, and that was
the point I was trying to outline above.  My point is more from a
practical point of view.  LZ4 is designed to be fast and cheap in CPU
with a rather low compression ratio compared to other modern algos.

Could it be possible to think about some worst cases where one may
want to reduce its compression to save some CPU?  The point, as you
say, to allow a tuning of the acceleration would be that one may want
to save a bit of CPU and does not care about the extra disk space it
takes.  Still, I am wondering why one would not just store the values
externally in such cases and just save as much compression effort as
possible.
--
Michael

Attachment

Re: Toast compression method options

From
vignesh C
Date:
On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, May 3, 2021 at 6:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > We have already pushed the configurable lz4 toast compression code[1].
> > In the custom compression thread, we were already having the patch to
> > support the compression method options[2].  But the design for the
> > base patches was heavily modified before commit but I never rebased
> > this patch based on the new design.  Now, I have rebased this patch so
> > that we don't lose track and we can work on this for v15.  This is
> > still a WIP patch.
> >
> > For v15 I will work on improving the code and I will also work on
> > analyzing the usage of compression method options (compression
> > speed/ratio).
> >
> > [1] https://www.postgresql.org/message-id/E1lNKw9-0008DT-1L%40gemulon.postgresql.org
> > [2] https://www.postgresql.org/message-id/CAFiTN-s7fno8pGwfK7jwSf7uNaVhPZ38C3LAcF%2B%3DWHu7jNvy7g%40mail.gmail.com
> >
>
> I have fixed some comments offlist reported by Justin.  Apart from
> that, I have also improved documentation and test case.  Stil it has a
> lot of cleanup to be done but I am not planning to do that
> immediately.

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh



Re: Toast compression method options

From
Dilip Kumar
Date:
On Tue, Jun 22, 2021 at 1:37 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jun 22, 2021 at 11:05:22AM +0530, Dilip Kumar wrote:
> > IMHO there is certainly a use case, basically, if we compress the data
> > so that we can avoid storing it externally.  Now suppose for some
> > data, with default LZ4 compression, the compression ratio is so high
> > that you are able to compress to the size which is way under the
> > limit.  So for such data, the acceleration can be increased such that
> > compression is fast and compression ratio is good enough that it is
> > not going to the external storage.  I agree it will be difficult for
> > the user to make such a decision and select the acceleration value but
> > based on the data pattern and their compressed length the admin can
> > make such a decision.  So in short select the acceleration value such
> > that you can compress it fast and the compression ratio is good enough
> > to keep it from storing externally.
>
> Theoritically, I agree that there could be a use case, and that was
> the point I was trying to outline above.  My point is more from a
> practical point of view.  LZ4 is designed to be fast and cheap in CPU
> with a rather low compression ratio compared to other modern algos.
>
> Could it be possible to think about some worst cases where one may
> want to reduce its compression to save some CPU?  The point, as you
> say, to allow a tuning of the acceleration would be that one may want
> to save a bit of CPU and does not care about the extra disk space it
> takes.  Still, I am wondering why one would not just store the values
> externally in such cases and just save as much compression effort as
> possible.

Well, that actually depends upon the data, basically, LZ4 acceleration
searches in wider increments, which may reduce the number of potential
matches but increase the speed.  So based on the actual data pattern
it is highly possible that you get the speed benefit without losing
much or nothing on the compression ratio.  So IMHO, this is user
exposed option so based on the user's data pattern why it is not wise
to provide an option for the user to give the acceration when the user
is sure that selecting a better speed will not harm anything on
compression ratio for their data pattern.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Toast compression method options

From
Dilip Kumar
Date:
On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote
>
> The patch does not apply on Head anymore, could you rebase and post a
> patch. I'm changing the status to "Waiting for Author".

Okay, I will rebase and send it by next week.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Toast compression method options

From
Dilip Kumar
Date:
On Sun, Jul 18, 2021 at 9:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote
> >
> > The patch does not apply on Head anymore, could you rebase and post a
> > patch. I'm changing the status to "Waiting for Author".
>
> Okay, I will rebase and send it by next week.

I have rebased the patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Toast compression method options

From
Jaime Casanova
Date:
On Mon, Jul 19, 2021 at 01:24:03PM +0530, Dilip Kumar wrote:
> On Sun, Jul 18, 2021 at 9:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote
> > >
> > > The patch does not apply on Head anymore, could you rebase and post a
> > > patch. I'm changing the status to "Waiting for Author".
> >
> > Okay, I will rebase and send it by next week.
> 
> I have rebased the patch.
> 

Hi,

This doesn't apply cleanly nor compile.
Are you planning to send a rebased version?

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL



Re: Toast compression method options

From
Dilip Kumar
Date:
On Fri, 10 Sep 2021 at 10:40 AM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
On Mon, Jul 19, 2021 at 01:24:03PM +0530, Dilip Kumar wrote:
> On Sun, Jul 18, 2021 at 9:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote
> > >
> > > The patch does not apply on Head anymore, could you rebase and post a
> > > patch. I'm changing the status to "Waiting for Author".
> >
> > Okay, I will rebase and send it by next week.
>
> I have rebased the patch.
>

Hi,

This doesn't apply cleanly nor compile.
Are you planning to send a rebased version?

I will do that early next week.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Toast compression method options

From
Jaime Casanova
Date:
On Fri, Sep 10, 2021 at 10:54:04AM +0530, Dilip Kumar wrote:
> On Fri, 10 Sep 2021 at 10:40 AM, Jaime Casanova <
> jcasanov@systemguards.com.ec> wrote:
> 
> > On Mon, Jul 19, 2021 at 01:24:03PM +0530, Dilip Kumar wrote:
> > > On Sun, Jul 18, 2021 at 9:15 PM Dilip Kumar <dilipbalaut@gmail.com>
> > wrote:
> > > >
> > > > On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:
> > > > >
> > > > > On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com>
> > wrote
> > > > >
> > > > > The patch does not apply on Head anymore, could you rebase and post a
> > > > > patch. I'm changing the status to "Waiting for Author".
> > > >
> > > > Okay, I will rebase and send it by next week.
> > >
> > > I have rebased the patch.
> > >
> >
> > Hi,
> >
> > This doesn't apply cleanly nor compile.
> > Are you planning to send a rebased version?
> 
> 
> I will do that early next week.
> 

Great! I'm marking the patch as "waiting on author". 
Thanks for keep working on this.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL



Re: Toast compression method options

From
Dilip Kumar
Date:
On Fri, Sep 10, 2021 at 9:54 PM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:

> > I will do that early next week.
> >
>
> Great! I'm marking the patch as "waiting on author".
> Thanks for keep working on this.
>

I haved rebased the patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Toast compression method options

From
Michael Paquier
Date:
On Mon, Sep 13, 2021 at 05:10:22PM +0530, Dilip Kumar wrote:
> I haved rebased the patch.

Please note that the patch does not apply.  FWIW, I still don't think
that this is a good idea to have that.  I don't recall seeing much on
this list that users would like to have such a level of tuning for
pglz, while lz4 only offers the option to reduce the compression rate
while being already very cheap in CPU so the impact is limited.  On
top of that, this adds a new attribute to pg_attribute with much more
complexity into ALTER TABLE code paths in tablecmds.c..
--
Michael

Attachment

Re: Toast compression method options

From
Dilip Kumar
Date:
On Fri, Oct 1, 2021 at 12:42 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Sep 13, 2021 at 05:10:22PM +0530, Dilip Kumar wrote:
> > I haved rebased the patch.
>
> Please note that the patch does not apply.  FWIW, I still don't think
> that this is a good idea to have that.  I don't recall seeing much on
> this list that users would like to have such a level of tuning for
> pglz, while lz4 only offers the option to reduce the compression rate
> while being already very cheap in CPU so the impact is limited.  On
> top of that, this adds a new attribute to pg_attribute with much more
> complexity into ALTER TABLE code paths in tablecmds.c..

Thanks for the feedback, seeing no much interest from other hackers
and also as Michael pointed out that there is no much use case for
this, I am withdrawing this patch from the commitfest.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com