Re: Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Re: [HACKERS] Custom compression methods
Msg-id CA+TgmobSDVgUage9qQ5P_=F_9jaMkCgyKxUQGtFQU7oN4kX-AA@mail.gmail.com
In response to Re: Re: [HACKERS] Custom compression methods  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List pgsql-hackers

On Thu, Mar 7, 2019 at 2:51 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Yes.  I took a look at code of this patch.  I think it's in pretty
> good shape.  But high level review/discussion is required.

I agree that the code of this patch is in pretty good shape, although
there is a lot of rebasing needed at this point. Here is an attempt at
some high level review and discussion:

- As far as I can see, there is broad agreement that we shouldn't
consider ourselves to be locked into 'pglz' forever. I believe
numerous people have reported that there are other methods of doing
compression that either compress better, or compress faster, or
decompress faster, or all of the above. This isn't surprising, nor is
it a knock on 'pglz'; Jan committed it in 1999, and it's not
surprising that in 20 years some people have come up with better
ideas. Not only that, but the quantity and quality of open source
software that is available for this kind of thing and for many other
kinds of things have improved dramatically in that time.

- I can see three possible ways of breaking our dependence on 'pglz'
for TOAST compression. Option #1 is to pick one new algorithm which we
think is better than 'pglz' in all relevant ways and use it as the
default for all new compressed datums. This would be dramatically
simpler than what this patch does, because there would be no user
interface. It would just be out with the old and in with the new.
Option #2 is to create a short list of new algorithms that have
different trade-offs; e.g. one that is very fast (like lz4) and one
that has an extremely high compression ratio, and provide an interface
for users to choose between them. This would be moderately simpler
than what this patch does, because we would expose to the user
anything about how a new compression method could be added, but it
would still require a UI for the user to choose between the available
(and hard-coded) options. It has the further advantage that every
PostgreSQL cluster will offer the same options (or a subset of them,
perhaps, depending on configure flags) and so you don't have to worry
that, say, a pg_am row gets lost and suddenly all of your toasted data
is inaccessible and uninterpretable. Option #3 is to do what this
patch actually does, which is to allow for the addition of any number
of compressors, including by extensions. It has the advantage that new
compressors can be added without core's permission, so, for example, if
it is unclear whether some excellent compressor is free of patent
problems, we can elect not to ship support for it in core, while at
the same time people who are willing to accept the associated legal
risk can add that functionality to their own copy as an extension
without having to patch core. The legal climate may even vary by
jurisdiction, so what might be questionable in country A might be
clearly just fine in country B. Aside from those issues, this approach
allows people to experiment and innovate outside of core relatively
quickly, instead of being bound by the somewhat cumbrous development
process which has left this patch in limbo for the last few years. My
view is that option #1 is likely to be impractical, because getting
people to agree is hard, and better things are likely to come along
later, and people like options. So I prefer either #2 or #3.

- The next question is how a datum compressed with some non-default
method should be represented on disk. The patch handles this first of
all by making the observation that the compressed size can't be >=1GB,
because the uncompressed size can't be >=1GB, and we wouldn't have
stored it compressed if it expanded. Therefore, the upper two bits of
the compressed size should always be zero on disk, and the patch
steals one of them to indicate whether "custom" compression is in use.
If it is, the 4-byte varlena header is followed not only by a 4-byte
size (now with the new flag bit also included) but also by a 4-byte
OID, indicating the compression AM in use. I don't think this is a
terrible approach, but I don't think it's amazing, either. 4 bytes is
quite a bit to use for this; if I guess correctly what will be a
typical cluster configuration, you probably would really only need
about 2 bits. For a datum that is both stored externally and
compressed, the overhead is likely negligible, because the length is
probably measured in kB or MB. But for a datum that is compressed but
not stored externally, it seems pretty expensive; the datum is
probably short, and having an extra 4 bytes of uncompressible data
kinda sucks. One possibility would be to allow only one byte here:
require each compression AM that is installed to advertise a one-byte
value that will denote its compressed datums. If more than one AM
tries to claim the same byte value, complain. Another possibility is
to abandon this approach and go with #2 from the previous paragraph.
Or maybe we add 1 or 2 "privileged" built-in compressors that get
dedicated bit-patterns in the upper 2 bits of the size field, with the
last bit pattern being reserved for future algorithms (e.g. 00 =
pglz, 01 = lz4, 10 = zstd, 11 = something else - see within for
details).
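
To make the preceding paragraph concrete, here is a rough sketch in C
of the two inline layouts being compared. It is written from the
description above rather than from the patch, and every type, field,
and flag name below is invented for illustration.

    /* Illustrative only: names are invented here, not taken from the patch. */
    #include "postgres.h"

    /*
     * Existing inline-compressed layout: a 4-byte varlena header, then a
     * 4-byte raw (uncompressed) size, then the compressed bytes.  The raw
     * size is always < 1GB, so its top two bits are currently unused.
     */
    typedef struct
    {
        uint32      va_header;
        uint32      va_rawsize;     /* top 2 bits always zero today */
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    } varattrib_compressed_sketch;

    /*
     * Approximately what the patch does: steal one of those spare bits to
     * flag "custom" compression and follow the size word with the OID of
     * the compression AM, i.e. 4 extra uncompressible bytes per datum.
     */
    typedef struct
    {
        uint32      va_header;
        uint32      va_info;        /* raw size plus a custom-compression flag bit */
        Oid         va_cmoid;       /* pg_am OID of the compression method */
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    } varattrib_custom_compressed_sketch;

The one-byte alternative floated above would replace va_cmoid with
something like "uint8 va_cmid", a value that each installed compression
AM advertises and that must be unique within the cluster.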

- I don't really like the use of the phrase "custom compression". I
think the terminology needs to be rethought so that we just talk about
compression methods. Perhaps in certain contexts we need to specify
that we mean extensible compression methods or user-provided
compression methods or something like that, but I don't think the word
"custom" is very well-suited here. The main point of this shouldn't be
for every cluster in the universe to use a different approach to
compression, or to compress columns within a database in 47 different
ways, but rather to help us get out from under 'pglz'. Eventually we
probably want to change the default, but as the patch phrases things
now, that default would be a custom method, which is almost a
contradiction in terms.

- Yet another possible approach to the on-disk format is to leave
varatt_external.va_extsize and varattrib_4b.rawsize untouched and
instead add new compression methods by adding new vartag_external
values. There's quite a lot of bit-space available there: we have a
whole byte, and we're currently only using 4 values. We could easily
add a half-dozen new possibilities there for new compression methods
without sweating the bit-space consumption. The main thing I don't
like about this is that it only seems like a useful way to provide for
out-of-line compression. Perhaps it could be generalized to allow for
inline compression as well, but it seems like it would take some
hacking.
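
For reference, here is the existing enum from src/include/postgres.h;
the entries in the comment are a purely hypothetical illustration (the
names are invented here) of how new compression methods could claim
additional tag values.

    typedef enum vartag_external
    {
        VARTAG_INDIRECT = 1,
        VARTAG_EXPANDED_RO = 2,
        VARTAG_EXPANDED_RW = 3,
        VARTAG_ONDISK = 18
        /*
         * Hypothetical additions, each meaning "stored out of line,
         * compressed with that algorithm":
         *
         *     VARTAG_ONDISK_LZ4 = 19,
         *     VARTAG_ONDISK_ZSTD = 20,
         */
    } vartag_external;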

- One thing I really don't like about the patch is that it consumes a
bit from infomask2 for a new flag HEAP_HASCUSTOMCOMPRESSED. infomask
bits are at a premium, and there's been no real progress in the decade
plus that I've been hanging around here in clawing back any bit-space.
I think we really need to avoid burning our remaining bits for
anything other than a really critical need, and I don't think I
understand what the need is in this case. I might be missing
something, but I'd really strongly suggest looking for a way to get
rid of this. It also invents the concept of a TupleDesc flag, and the
flag invented is TD_ATTR_CUSTOM_COMPRESSED; I'm not sure I see why we
need that, either.

- It seems like this kind of approach has a sort of built-in
circularity problem. It means that every place that might need to
detoast a datum needs to be able to access the pg_am catalog. I wonder
if that's actually true. For instance, consider logical decoding. I
guess that can do catalog lookups in general, but can it do them from
the places where detoasting is happening? Moreover, can it do them
with the right snapshot? Suppose we rewrite a table to change the
compression method, then drop the old compression method, then try to
decode a transaction that modified that table before those operations
were performed. As an even more extreme example, suppose we need to
open pg_am, and to do that we have to build a relcache entry for it,
and suppose the relevant pg_class entry had a relacl or reloptions
field that happened to be custom-compressed. Or equally suppose that
any of the various other tables we use when building a relcache entry
had the same kind of problem, especially those that have TOAST tables.
We could just disallow the use of non-default compressors in the
system catalogs, but the benefits mentioned in
http://postgr.es/m/5541614A.5030208@2ndquadrant.com seem too large to
ignore.

- I think it would be awfully appealing if we could find some way of
dividing this great big patch into some somewhat smaller patches. For
example:

Patch #1. Add syntax allowing a compression method to be specified,
but the only possible choice is pglz, and the PRESERVE stuff isn't
supported, and changing the value associated with an existing column
isn't supported, but we can add tab-completion support and stuff.

Patch #2. Add a second built-in method, like gzip or lz4.

Patch #3. Add support for changing the compression method associated
with a column, forcing a table rewrite.

Patch #4. Add support for PRESERVE, so that you can change the
compression method associated with a column without forcing a table
rewrite, by including the old method in the PRESERVE list, or with a
rewrite, by not including it in the PRESERVE list.

Patch #5. Add support for compression methods via the AM interface.
Perhaps methods added in this manner are prohibited in system
catalogs. (This could also go before #4 or even before #3, but with a
noticeable hit to usability.) A rough sketch of what such an interface
might look like appears just after this list.

Patch #6 (new development). Add a contrib module using the facility
added in #5, perhaps with a slightly off-beat compressor like bzip2
that is more of a niche use case.
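
To make #5 a bit more concrete: the handler function for such an access
method would presumably return a struct of callbacks, analogous to
IndexAmRoutine. Everything in this sketch is invented for illustration
and is not the patch's actual API.

    /* Sketch only -- the patch's real interface may differ substantially. */
    #include "postgres.h"
    #include "nodes/nodes.h"

    typedef struct CompressionAmRoutine
    {
        NodeTag     type;

        /*
         * Compress 'size' bytes at 'data'; return a palloc'd varlena holding
         * the compressed payload, or NULL if compression did not help.
         */
        struct varlena *(*datum_compress) (const char *data, int32 size);

        /* Decompress 'value' back into its original 'rawsize' bytes. */
        struct varlena *(*datum_decompress) (const struct varlena *value,
                                             int32 rawsize);
    } CompressionAmRoutine;

A contrib module like the bzip2 one suggested in #6 would then just
supply a handler function that returns one of these.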

I think that if the patch set were broken up this way, it would be a
lot easier to review and get committed. I think you could commit each
bit separately. I don't think you'd want to commit #1 unless you had a
sense that #2 was pretty close to done, and similarly for #5 and #6,
but that would still make things a lot easier than having one giant
monolithic patch, at least IMHO.

There might be more to say here, but that's what I have got for now. I
hope it helps.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


