Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id 20200623200042.gkzftoz5n6kn6lgh@alap3.anarazel.de
In response to Re: [HACKERS] Custom compression methods  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Custom compression methods  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

Hi,

On 2020-06-23 14:27:47 -0400, Robert Haas wrote:
> On Mon, Jun 22, 2020 at 4:53 PM Andres Freund <andres@anarazel.de> wrote:
> > > Or maybe we add 1 or 2 "privileged" built-in compressors that get
> > > dedicated bit-patterns in the upper 2 bits of the size field, with the
> > > last bit pattern being reserved for future algorithms. (e.g. 0x00 =
> > > pglz, 0x01 = lz4, 0x10 = zstd, 0x11 = something else - see within for
> > > details).
> >
> > Agreed. I favor an approach roughly like I'd implemented below
> > https://postgr.es/m/20130605150144.GD28067%40alap2.anarazel.de
> > I.e. leave the vartag etc as-is, but utilize the fact that pglz
> > compressed datums starts with a 4 byte length header, and that due to
> > the 1GB limit, the first two bits currently have to be 0. That allows to
> > indicate 2 compression methods without any space overhead, and
> > additional compression methods are supported by using an additional byte
> > (or some variable length encoded larger amount) if both bits are 1.

https://postgr.es/m/20130621000900.GA12425%40alap2.anarazel.de is a
thread with more information / patches further along.


> I think there's essentially no difference between these two ideas,
> unless the two bits we're talking about stealing are not the same in
> the two cases. Am I missing something?

I confused this patch with the approach in
https://www.postgresql.org/message-id/d8576096-76ba-487d-515b-44fdedba8bb5%402ndquadrant.com
sorry for that.  It obviously still differs by having lower space
overhead (by virtue of not having a 4 byte 'va_cmid': no additional
space for two methods, and then 1 byte overhead for 256 more), but
that's not that fundamental a difference.
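
To make the space accounting concrete, here is a rough sketch of a 4 byte
raw-size header whose top two bits carry the compression method, with one
extension byte when both bits are set. All names (CompressionHeader,
header_encode, header_decode), the assignment of bit patterns to methods,
and the explicit big-endian layout are assumptions for illustration, not
what any posted patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 30 bits of raw size (1GB limit), 2 bits of method tag. */
#define RAWSIZE_BITS    30
#define RAWSIZE_MASK    ((UINT32_C(1) << RAWSIZE_BITS) - 1)
#define METHOD_EXTENDED 3       /* both top bits set: one extra id byte follows */

typedef struct
{
    uint32_t rawsize;           /* uncompressed size, must be < 1GB */
    uint32_t method;            /* 0..2 stored inline, 3..258 via extension byte */
} CompressionHeader;

/* Writes the header into buf (up to 5 bytes); returns bytes written. */
static size_t
header_encode(uint8_t *buf, CompressionHeader h)
{
    uint32_t tag = h.method < METHOD_EXTENDED ? h.method : METHOD_EXTENDED;
    uint32_t word = (tag << RAWSIZE_BITS) | (h.rawsize & RAWSIZE_MASK);

    assert(h.rawsize <= RAWSIZE_MASK);
    assert(h.method < METHOD_EXTENDED + 256);

    /* Assumption: fixed big-endian byte order, chosen only for the example. */
    buf[0] = (uint8_t) (word >> 24);
    buf[1] = (uint8_t) (word >> 16);
    buf[2] = (uint8_t) (word >> 8);
    buf[3] = (uint8_t) word;

    if (tag == METHOD_EXTENDED)
    {
        buf[4] = (uint8_t) (h.method - METHOD_EXTENDED);
        return 5;               /* 1 byte of overhead for 256 more methods */
    }
    return 4;                   /* no overhead beyond the existing size word */
}

/* Reads a header from buf; returns the number of bytes consumed. */
static size_t
header_decode(const uint8_t *buf, CompressionHeader *out)
{
    uint32_t word = ((uint32_t) buf[0] << 24) | ((uint32_t) buf[1] << 16) |
                    ((uint32_t) buf[2] << 8) | (uint32_t) buf[3];
    uint32_t tag = word >> RAWSIZE_BITS;

    out->rawsize = word & RAWSIZE_MASK;
    if (tag == METHOD_EXTENDED)
    {
        out->method = METHOD_EXTENDED + buf[4];
        return 5;
    }
    out->method = tag;
    return 4;
}
```

With this layout a datum using an inline method keeps the 4 byte header it
already has, and only methods behind the extension byte pay for a fifth byte.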

I do think it's nicer to hide the details of the compression inside
toast specific code as the version in the "further along" thread above
did.


The varlena stuff feels so archaic, it's hard to keep it all in my head...


I think I've pondered that elsewhere before (but perhaps just on IM with
you?), but I do think we'll need a better toast pointer format at some
point. It's pretty fundamentally based on having the 1GB limit, which I
don't think we can justify for that much longer.

Using something like https://postgr.es/m/20191210015054.5otdfuftxrqb5gum%40alap3.anarazel.de
I'd probably make it something roughly like:

1) signed varint indicating "in-place" length
1a) if positive, it's "plain" "in-place" data
1b) if negative, data type indicator follows. abs(length) includes size of metadata.
2) optional: unsigned varint metadata type indicator
3) data

Because 1) is the size of the data, toast datums can be skipped with
relatively few instructions during tuple deforming, instead of needing a
fair number of branches, as is the case right now.

So a small in-place uncompressed varlena2 would have an overhead of 1
byte up to 63 bytes, and 2 bytes otherwise (with 8 kb pages at least).
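
A rough sketch of the length header and of skipping a datum follows. It
assumes a zigzag-mapped LEB128-style varint, which is not necessarily the
encoding from the thread linked above; it just happens to reproduce the
1 byte / 2 byte boundaries mentioned here. All function names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Zigzag: map signed values to unsigned so small magnitudes stay small. */
static uint64_t
zigzag_encode(int64_t v)
{
    uint64_t u = (uint64_t) v << 1;

    return v < 0 ? ~u : u;
}

static int64_t
zigzag_decode(uint64_t u)
{
    return (int64_t) (u >> 1) ^ -(int64_t) (u & 1);
}

/* LEB128-style: 7 payload bits per byte, high bit set means "more follows". */
static size_t
varint_encode(uint8_t *buf, uint64_t u)
{
    size_t n = 0;

    while (u >= 0x80)
    {
        buf[n++] = (uint8_t) ((u & 0x7F) | 0x80);
        u >>= 7;
    }
    buf[n++] = (uint8_t) u;
    return n;
}

static size_t
varint_decode(const uint8_t *buf, uint64_t *out)
{
    uint64_t u = 0;
    size_t   n = 0;
    int      shift = 0;

    while (buf[n] & 0x80)
    {
        u |= (uint64_t) (buf[n++] & 0x7F) << shift;
        shift += 7;
    }
    u |= (uint64_t) buf[n++] << shift;
    *out = u;
    return n;
}

/*
 * Total bytes occupied by the datum starting at p: the length header plus
 * abs(length). A negative length only says that metadata follows; since
 * abs(length) already includes the metadata, skipping never has to look
 * at the datum type.
 */
static size_t
datum_total_size(const uint8_t *p)
{
    uint64_t u;
    size_t   hdr = varint_decode(p, &u);
    int64_t  len = zigzag_decode(u);

    return hdr + (size_t) (len < 0 ? -len : len);
}
```

Under these assumptions, lengths from -64 to 63 fit in one header byte and
lengths from -8192 to 8191 fit in two.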

An in-place compressed datum could have an overhead as low as 3 bytes (1
byte length, 1 byte indicator for type of compression, 1 byte raw size),
although I suspect it's rarely going to be useful at such small sizes.


Anyway. I think it's probably reasonable to utilize those two bits
before going to a new toast format. But if somebody were more interested
in working on toastv2 I'd not push back either.

Regards,

Andres


