Thread: Toast compression method options
We have already pushed the configurable LZ4 TOAST compression code[1]. In the custom compression thread, we already had a patch to support compression method options[2], but the design of the base patches was heavily modified before commit and I never rebased this patch onto the new design. Now, I have rebased this patch so that we don't lose track of it and can work on it for v15. This is still a WIP patch.

For v15 I will work on improving the code, and I will also work on analyzing the usage of compression method options (compression speed/ratio).

[1] https://www.postgresql.org/message-id/E1lNKw9-0008DT-1L%40gemulon.postgresql.org
[2] https://www.postgresql.org/message-id/CAFiTN-s7fno8pGwfK7jwSf7uNaVhPZ38C3LAcF%2B%3DWHu7jNvy7g%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, May 3, 2021 at 6:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> We have already pushed the configurable LZ4 TOAST compression code[1].
> In the custom compression thread, we already had a patch to support
> compression method options[2], but the design of the base patches was
> heavily modified before commit and I never rebased this patch onto the
> new design. Now, I have rebased this patch so that we don't lose track
> of it and can work on it for v15. This is still a WIP patch.
>
> For v15 I will work on improving the code, and I will also work on
> analyzing the usage of compression method options (compression
> speed/ratio).
>
> [1] https://www.postgresql.org/message-id/E1lNKw9-0008DT-1L%40gemulon.postgresql.org
> [2] https://www.postgresql.org/message-id/CAFiTN-s7fno8pGwfK7jwSf7uNaVhPZ38C3LAcF%2B%3DWHu7jNvy7g%40mail.gmail.com

I have fixed some comments reported offlist by Justin. Apart from that, I have also improved the documentation and the test case. It still has a lot of cleanup to be done, but I am not planning to do that immediately.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 06, 2021 at 07:23:48PM +0530, Dilip Kumar wrote:
> I have fixed some comments reported offlist by Justin. Apart from that,
> I have also improved the documentation and the test case. It still has
> a lot of cleanup to be done, but I am not planning to do that
> immediately.

I was testing the various compression algos we touch in core, and I am not really convinced that we need more code to control that. First, pglz has been living as-is in the backend code for a very long time and no one has expressed an interest in controlling the compression strategy used, AFAIK. On top of that, LZ4 outclasses it easily, so if there is a need to worry about performance with TOAST data, users could just move to LZ4.

+   if (strcmp(def->defname, "acceleration") == 0)
+   {
+       int32       acceleration =
+       pg_atoi(defGetString(def), sizeof(acceleration), 0);
+
+       if (acceleration < INT32_MIN || acceleration > INT32_MAX)
+           ereport(ERROR,
+                   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                    errmsg("unexpected value for lz4 compression acceleration: \"%s\"",
+                           defGetString(def)),
+                    errhint("expected value between INT32_MIN and INT32_MAX")
+                   ));

Then comes the part with LZ4 and its acceleration. The default compression level used by LZ4 compresses the data the most, and it is already pretty cheap in CPU. Do you have cases where this would be useful? Increasing the acceleration reduces the compression to close to zero, but if one cares about the compression cost, one could fall back to external storage for basically the same effect. Is there really a use case for something in-between?

--
Michael
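For context, the "acceleration" option debated above presumably feeds the acceleration parameter of liblz4's LZ4_compress_fast(). The snippet below is only a minimal illustration of that library API (the sample buffer is made up), not code from the patch:

    #include <lz4.h>
    #include <stdio.h>

    int
    main(void)
    {
        const char  src[] = "sample payload standing in for a TOASTable datum";
        char        dst[256];       /* comfortably above LZ4_compressBound(sizeof(src)) */
        int         def_sz, fast_sz;

        /* LZ4_compress_default() behaves like acceleration == 1 */
        def_sz = LZ4_compress_default(src, dst, (int) sizeof(src), (int) sizeof(dst));

        /* Larger acceleration values trade compression ratio for speed */
        fast_sz = LZ4_compress_fast(src, dst, (int) sizeof(src), (int) sizeof(dst), 8);

        printf("acceleration=1: %d bytes, acceleration=8: %d bytes\n", def_sz, fast_sz);
        return 0;
    }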
On Fri, Jun 18, 2021 at 12:13 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, May 06, 2021 at 07:23:48PM +0530, Dilip Kumar wrote:
> > I have fixed some comments reported offlist by Justin. Apart from that,
> > I have also improved the documentation and the test case. It still has
> > a lot of cleanup to be done, but I am not planning to do that
> > immediately.
>
> I was testing the various compression algos we touch in core, and I am
> not really convinced that we need more code to control that. First,
> pglz has been living as-is in the backend code for a very long time and
> no one has expressed an interest in controlling the compression
> strategy used, AFAIK. On top of that, LZ4 outclasses it easily, so if
> there is a need to worry about performance with TOAST data, users could
> just move to LZ4.
>
> +   if (strcmp(def->defname, "acceleration") == 0)
> +   {
> +       int32       acceleration =
> +       pg_atoi(defGetString(def), sizeof(acceleration), 0);
> +
> +       if (acceleration < INT32_MIN || acceleration > INT32_MAX)
> +           ereport(ERROR,
> +                   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                    errmsg("unexpected value for lz4 compression acceleration: \"%s\"",
> +                           defGetString(def)),
> +                    errhint("expected value between INT32_MIN and INT32_MAX")
> +                   ));
>
> Then comes the part with LZ4 and its acceleration. The default
> compression level used by LZ4 compresses the data the most, and it is
> already pretty cheap in CPU. Do you have cases where this would be
> useful? Increasing the acceleration reduces the compression to close
> to zero, but if one cares about the compression cost, one could fall
> back to external storage for basically the same effect. Is there really
> a use case for something in-between?

IMHO there is certainly a use case: basically, we compress the data so that we can avoid storing it externally. Now suppose that for some data, with the default LZ4 compression, the compression ratio is so high that you can compress it to a size well under the limit. For such data, the acceleration can be increased so that compression is fast while the compression ratio is still good enough to keep the data from going to external storage. I agree it will be difficult for the user to make such a decision and pick the acceleration value, but based on the data pattern and its compressed length the admin can make that call. In short, select the acceleration value such that compression is fast and the compression ratio is still good enough to keep the data from being stored externally.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
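To make that argument concrete, here is a minimal C sketch of the decision being described, assuming the column option is passed straight through to LZ4_compress_fast(); the 2000-byte inline limit and the helper name are hypothetical stand-ins for illustration, not code from the patch:

    #include <lz4.h>
    #include <stdbool.h>
    #include <stdlib.h>

    /*
     * Hypothetical check: would this datum, compressed with the given LZ4
     * acceleration, still fit under an assumed inline limit, i.e. avoid
     * being pushed out to external TOAST storage?
     */
    static bool
    fits_inline_with_acceleration(const char *datum, int len, int acceleration)
    {
        const int   inline_limit = 2000;    /* made-up threshold, for illustration only */
        int         bound = LZ4_compressBound(len);
        char       *buf = malloc(bound);
        int         compressed;

        if (buf == NULL)
            return false;

        /* acceleration == 1 behaves like LZ4_compress_default() */
        compressed = LZ4_compress_fast(datum, buf, len, bound, acceleration);
        free(buf);

        /* LZ4_compress_fast() returns 0 on failure */
        return compressed > 0 && compressed <= inline_limit;
    }

If a column's values still pass such a check at, say, acceleration 8, the higher setting saves CPU without changing where the data ends up; that is the in-between case being argued for here.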
On Tue, Jun 22, 2021 at 11:05:22AM +0530, Dilip Kumar wrote:
> IMHO there is certainly a use case: basically, we compress the data so
> that we can avoid storing it externally. Now suppose that for some
> data, with the default LZ4 compression, the compression ratio is so
> high that you can compress it to a size well under the limit. For such
> data, the acceleration can be increased so that compression is fast
> while the compression ratio is still good enough to keep the data from
> going to external storage. I agree it will be difficult for the user
> to make such a decision and pick the acceleration value, but based on
> the data pattern and its compressed length the admin can make that
> call. In short, select the acceleration value such that compression is
> fast and the compression ratio is still good enough to keep the data
> from being stored externally.

Theoretically, I agree that there could be a use case, and that was the point I was trying to outline above. My point is more from a practical point of view. LZ4 is designed to be fast and cheap in CPU, with a rather low compression ratio compared to other modern algos.

Could it be possible to think about some worst cases where one may want to reduce its compression to save some CPU? The point, as you say, of allowing a tuning of the acceleration would be that one may want to save a bit of CPU and does not care about the extra disk space it takes. Still, I am wondering why one would not just store the values externally in such cases and save as much compression effort as possible.

--
Michael
On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, May 3, 2021 at 6:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > We have already pushed the configurable LZ4 TOAST compression code[1].
> > In the custom compression thread, we already had a patch to support
> > compression method options[2], but the design of the base patches was
> > heavily modified before commit and I never rebased this patch onto the
> > new design. Now, I have rebased this patch so that we don't lose track
> > of it and can work on it for v15. This is still a WIP patch.
> >
> > For v15 I will work on improving the code, and I will also work on
> > analyzing the usage of compression method options (compression
> > speed/ratio).
> >
> > [1] https://www.postgresql.org/message-id/E1lNKw9-0008DT-1L%40gemulon.postgresql.org
> > [2] https://www.postgresql.org/message-id/CAFiTN-s7fno8pGwfK7jwSf7uNaVhPZ38C3LAcF%2B%3DWHu7jNvy7g%40mail.gmail.com
>
> I have fixed some comments reported offlist by Justin. Apart from that,
> I have also improved the documentation and the test case. It still has
> a lot of cleanup to be done, but I am not planning to do that
> immediately.

The patch does not apply on Head anymore, could you rebase and post a patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh
On Tue, Jun 22, 2021 at 1:37 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jun 22, 2021 at 11:05:22AM +0530, Dilip Kumar wrote:
> > IMHO there is certainly a use case: basically, we compress the data so
> > that we can avoid storing it externally. Now suppose that for some
> > data, with the default LZ4 compression, the compression ratio is so
> > high that you can compress it to a size well under the limit. For such
> > data, the acceleration can be increased so that compression is fast
> > while the compression ratio is still good enough to keep the data from
> > going to external storage. I agree it will be difficult for the user
> > to make such a decision and pick the acceleration value, but based on
> > the data pattern and its compressed length the admin can make that
> > call. In short, select the acceleration value such that compression is
> > fast and the compression ratio is still good enough to keep the data
> > from being stored externally.
>
> Theoretically, I agree that there could be a use case, and that was the
> point I was trying to outline above. My point is more from a practical
> point of view. LZ4 is designed to be fast and cheap in CPU, with a
> rather low compression ratio compared to other modern algos.
>
> Could it be possible to think about some worst cases where one may want
> to reduce its compression to save some CPU? The point, as you say, of
> allowing a tuning of the acceleration would be that one may want to
> save a bit of CPU and does not care about the extra disk space it
> takes. Still, I am wondering why one would not just store the values
> externally in such cases and save as much compression effort as
> possible.

Well, that actually depends on the data. Basically, LZ4 acceleration makes the compressor search for matches in wider increments, which may reduce the number of potential matches but increases the speed. So, depending on the actual data pattern, it is quite possible to get the speed benefit while losing little or nothing on the compression ratio. Since this is a user-exposed option, IMHO it is reasonable to let the user pick an acceleration value based on their data pattern, when they are sure that the extra speed will not cost them anything meaningful in compression ratio.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
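As a rough sketch of the kind of analysis an admin could run on a representative sample of their data before choosing a value (none of this comes from the patch; the sample buffer and candidate accelerations are arbitrary):

    #include <lz4.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int
    main(void)
    {
        int     len = 64 * 1024;            /* stand-in for a representative column value */
        int     accels[] = {1, 4, 16, 64};  /* arbitrary candidate settings */
        char   *src = malloc(len);
        char   *dst = malloc(LZ4_compressBound(len));

        if (src == NULL || dst == NULL)
            return 1;

        /* Repetitive content, so it compresses well at any acceleration */
        for (int i = 0; i < len; i++)
            src[i] = "abcabcabd"[i % 9];

        for (int i = 0; i < 4; i++)
        {
            int     out = 0;
            clock_t start = clock();

            /* Repeat the compression so the timing is measurable */
            for (int rep = 0; rep < 1000; rep++)
                out = LZ4_compress_fast(src, dst, len, LZ4_compressBound(len), accels[i]);

            printf("acceleration=%-3d: %d -> %d bytes, 1000 rounds in %.3f s\n",
                   accels[i], len, out,
                   (double) (clock() - start) / CLOCKS_PER_SEC);
        }

        free(src);
        free(dst);
        return 0;
    }

If the compressed size barely moves between acceleration 1 and a higher value while the runtime drops, that data pattern is exactly the case where raising the option looks attractive.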
On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote
>
> The patch does not apply on Head anymore, could you rebase and post a
> patch. I'm changing the status to "Waiting for Author".

Okay, I will rebase and send it by next week.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Sun, Jul 18, 2021 at 9:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote
> >
> > The patch does not apply on Head anymore, could you rebase and post a
> > patch. I'm changing the status to "Waiting for Author".
>
> Okay, I will rebase and send it by next week.

I have rebased the patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Jul 19, 2021 at 01:24:03PM +0530, Dilip Kumar wrote:
> On Sun, Jul 18, 2021 at 9:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote
> > >
> > > The patch does not apply on Head anymore, could you rebase and post a
> > > patch. I'm changing the status to "Waiting for Author".
> >
> > Okay, I will rebase and send it by next week.
>
> I have rebased the patch.

Hi,

This doesn't apply cleanly nor compile.

Are you planning to send a rebased version?

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
On Fri, 10 Sep 2021 at 10:40 AM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
On Mon, Jul 19, 2021 at 01:24:03PM +0530, Dilip Kumar wrote:
> On Sun, Jul 18, 2021 at 9:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote
> > >
> > > The patch does not apply on Head anymore, could you rebase and post a
> > > patch. I'm changing the status to "Waiting for Author".
> >
> > Okay, I will rebase and send it by next week.
>
> I have rebased the patch.
>
Hi,
This doesn't apply cleanly nor compile.
Are you planning to send a rebased version?
I will do that early next week.
On Fri, Sep 10, 2021 at 10:54:04AM +0530, Dilip Kumar wrote:
> On Fri, 10 Sep 2021 at 10:40 AM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
>
> > On Mon, Jul 19, 2021 at 01:24:03PM +0530, Dilip Kumar wrote:
> > > On Sun, Jul 18, 2021 at 9:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Wed, Jul 14, 2021 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:
> > > > >
> > > > > On Thu, May 6, 2021 at 7:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote
> > > > >
> > > > > The patch does not apply on Head anymore, could you rebase and post a
> > > > > patch. I'm changing the status to "Waiting for Author".
> > > >
> > > > Okay, I will rebase and send it by next week.
> > >
> > > I have rebased the patch.
> >
> > Hi,
> >
> > This doesn't apply cleanly nor compile.
> > Are you planning to send a rebased version?
>
> I will do that early next week.

Great! I'm marking the patch as "waiting on author".
Thanks for continuing to work on this.

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
On Fri, Sep 10, 2021 at 9:54 PM Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
>
> > I will do that early next week.
>
> Great! I'm marking the patch as "waiting on author".
> Thanks for continuing to work on this.

I have rebased the patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Sep 13, 2021 at 05:10:22PM +0530, Dilip Kumar wrote:
> I have rebased the patch.

Please note that the patch does not apply. FWIW, I still don't think that it is a good idea to have this. I don't recall seeing much demand on this list for such a level of tuning for pglz, while LZ4 only offers the option to reduce the compression rate and is already very cheap in CPU, so the impact is limited. On top of that, this adds a new attribute to pg_attribute and much more complexity to the ALTER TABLE code paths in tablecmds.c.

--
Michael
On Fri, Oct 1, 2021 at 12:42 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Sep 13, 2021 at 05:10:22PM +0530, Dilip Kumar wrote:
> > I have rebased the patch.
>
> Please note that the patch does not apply. FWIW, I still don't think
> that it is a good idea to have this. I don't recall seeing much demand
> on this list for such a level of tuning for pglz, while LZ4 only offers
> the option to reduce the compression rate and is already very cheap in
> CPU, so the impact is limited. On top of that, this adds a new
> attribute to pg_attribute and much more complexity to the ALTER TABLE
> code paths in tablecmds.c.

Thanks for the feedback. Seeing not much interest from other hackers, and given that, as Michael pointed out, there is not much of a use case for this, I am withdrawing this patch from the commitfest.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com