Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Custom compression methods
Msg-id CAFiTN-v=cXD8ntnVhQUnNspFZ8ZmeCxwyoiMNONmLbh-G1vmMw@mail.gmail.com
In response to Re: [HACKERS] Custom compression methods  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Feb 10, 2021 at 1:42 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Please remember to trim unnecessary quoted material.

Okay, I will.

> On Sun, Feb 7, 2021 at 6:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > [ a whole lot of quoted stuff ]
> >
> > Conclusion:
> > 1. In most cases lz4 is faster and doing better compression as well.
> > 2. In Test2 when small data is incompressible then lz4 tries to
> > compress whereas pglz doesn't try so there is some performance loss.
> > But if we want we can fix
> > it by setting some minimum limit of size for lz4 as well, maybe the
> > same size as pglz?
>
> So my conclusion here is that perhaps there's no real problem. It
> looks like externalizing is so expensive compared to compression that
> it's worth trying to compress even though it may not always pay off.
> If, by trying to compress, we avoid externalizing, it's a huge win
> (~5x). If we try to compress and don't manage to avoid externalizing,
> it's a small loss (~6%). It's probably reasonable to expect that
> compressible data is more common than incompressible data, so not only
> is the win a lot bigger than the loss, but we should be able to expect
> it to happen a lot more often. It's not impossible that somebody could
> get bitten, but it doesn't feel like a huge risk to me.

I agree with this. That said, maybe we could also test pglz's
performance with its minimum compression size limit lowered or removed,
though maybe that should be an independent change.
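
To put rough numbers on the win/loss argument above, here is a
back-of-the-envelope sketch. It assumes that "~5x" means the cost drops
to 1/5 when compression avoids externalizing, and that "~6%" means the
cost rises to 1.06x when it does not. The function name is hypothetical
and this is only an illustration of the arithmetic, not a measurement.

```c
#include <assert.h>

/*
 * Hypothetical helper: fraction p of values that must benefit from
 * compression for the attempt to break even on average.
 *
 * Solves  p * (1 / win_factor) + (1 - p) * (1 + loss_fraction) = 1
 * for p, where cost 1.0 is "never try to compress".
 */
static double
sketch_break_even_fraction(double win_factor, double loss_fraction)
{
    assert(win_factor > 1.0 && loss_fraction >= 0.0);

    return loss_fraction / ((1.0 + loss_fraction) - 1.0 / win_factor);
}
```

Under those assumptions, sketch_break_even_fraction(5.0, 0.06) comes out
at about 0.07, so trying to compress pays off on average once roughly 7%
of the values avoid externalizing.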

> One thing that does occur to me is that it might be a good idea to
> skip compression if it doesn't change the number of chunks that will
> be stored into the TOAST table. If we compress the value but still
> need to externalize it, and the compression didn't save enough to
> reduce the number of chunks, I suppose we ideally would externalize
> the uncompressed version. That would save decompression time later,
> without really costing anything. However, I suppose that would be a
> separate improvement from this patch.

Yeah, this seems like a good idea and we can work on that in a different thread.
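
As a starting point for that thread, the chunk-count check could look
roughly like the sketch below. All names are hypothetical and this is
not how the patch or the existing TOAST code is written. The chunk size
is passed in as a parameter; I believe TOAST_MAX_CHUNK_SIZE works out to
1996 bytes on a default 8kB-block build, but treat that as an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of chunks needed to store 'size' bytes externally. */
static size_t
sketch_num_chunks(size_t size, size_t chunk_size)
{
    assert(chunk_size > 0);

    /* Round up: a partially filled last chunk still costs a chunk. */
    return (size + chunk_size - 1) / chunk_size;
}

/*
 * For a value that must be externalized either way, decide which form
 * to store.  Returns true if the compressed form needs fewer chunks
 * than the raw form, false if the raw form should be stored so that
 * reads do not pay for decompression.
 */
static bool
sketch_store_compressed(size_t raw_size, size_t compressed_size,
                        size_t chunk_size)
{
    return sketch_num_chunks(compressed_size, chunk_size) <
           sketch_num_chunks(raw_size, chunk_size);
}
```

With a chunk size of 1996, a 4000-byte value that compresses to 3000
bytes goes from 3 chunks to 2, so the compressed form is kept. A
3990-byte value that compresses to 2100 bytes needs 2 chunks either way,
so the uncompressed form would be externalized.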

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


