Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id CAFiTN-vh==jT9KdNAeneY=LRNdH6ZH255GBHZW6PiN1QpmQeww@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Custom compression methods  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [HACKERS] Custom compression methods
List pgsql-hackers
On Wed, Dec 9, 2020 at 5:37 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sat, Nov 21, 2020 at 3:50 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Wed, Nov 11, 2020 at 9:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > There were a few problems in this rebased version, basically, the
> > > compression options were not passed while compressing values from the
> > > brin_form_tuple, so I have fixed this.
> >
> > Since the authorship history of this patch is complicated, it would be
> > nice if you would include authorship information and relevant
> > "Discussion" links in the patches.
>
> I have added that.
>
> > Design level considerations and overall notes:
> >
> > configure is autogenerated from configure.in, so the patch shouldn't
> > include changes only to the former.
>
> Yeah, I missed those changes. Done now.
>
> > Looking over the changes to src/include:
> >
> > +       PGLZ_COMPRESSION_ID,
> > +       LZ4_COMPRESSION_ID
> >
> > I think that it would be good to assign values to these explicitly.
>
> Done
>
> > +/* compresion handler routines */
> >
> > Spelling.
>
> Done
>
> > +       /* compression routine for the compression method */
> > +       cmcompress_function cmcompress;
> > +
> > +       /* decompression routine for the compression method */
> > +       cmcompress_function cmdecompress;
> >
> > Don't reuse cmcompress_function; that's confusing. Just have a typedef
> > per structure member, even if they end up being the same.
>
> Fixed as suggested
>
> >  #define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \
> > -       (((toast_compress_header *) (ptr))->rawsize = (len))
> > +do { \
> > +       Assert(len > 0 && len <= RAWSIZEMASK); \
> > +       ((toast_compress_header *) (ptr))->info = (len); \
> > +} while (0)
> >
> > Indentation.
>
> Done
>
> > +#define TOAST_COMPRESS_SET_COMPRESSION_METHOD(ptr, cm_method) \
> > +       ((toast_compress_header *) (ptr))->info |= ((cm_method) << 30);
> >
> > What about making TOAST_COMPRESS_SET_RAWSIZE() take another argument?
> > And possibly also rename it to TEST_COMPRESS_SET_SIZE_AND_METHOD() or
> > something? It seems not great to have separate functions each setting
> > part of a 4-byte quantity. Too much chance of failing to set both
> > parts. I guess you've got a function called
> > toast_set_compressed_datum_info() for that, but it's just a wrapper
> > around two macros that could just be combined, which would reduce
> > complexity overall.
>
> Done that way
>
> > +       T_CompressionRoutine,           /* in access/compressionapi.h */
> >
> > This looks misplaced. I guess it should go just after these:
> >
> >     T_FdwRoutine,               /* in foreign/fdwapi.h */
> >     T_IndexAmRoutine,           /* in access/amapi.h */
> >     T_TableAmRoutine,           /* in access/tableam.h */
>
> Done
>
> > Looking over the regression test changes:
> >
> > The tests at the top of create_cm.out that just test that we can
> > create tables with various storage types seem unrelated to the purpose
> > of the patch. And the file doesn't test creating a compression method
> > either, as the file name would suggest, so either the file name needs
> > to be changed (compression, compression_method?) or the tests don't go
> > here.
>
> Changed to "compression"
>
> > +-- check data is okdd
> >
> > I guess whoever is responsible for this comment prefers vi to emacs.
>
> Fixed
>
> > I don't quite understand the purpose of all of these tests, and there
> > are some things that I feel like ought to be tested that seemingly
> > aren't. Like, you seem to test using an UPDATE to move a datum from a
> > table to another table with the same compression method, but not one
> > with a different compression method.
>
> Added test for this, and some other tests to improve overall coverage.
>
>  Testing the former is nice and
> > everything, but that's the easy case: I think we also need to test the
> > latter. I think it would be good to verify not only that the data is
> > readable but that it's compressed the way we expect. I think it would
> > be a great idea to add a pg_column_compression() function in a similar
> > spirit to pg_column_size(). Perhaps it could return NULL when
> > compression is not in use or the data type is not varlena, and the
> > name of the compression method otherwise. That would allow for better
> > testing of this feature, and it would also be useful to users who are
> > switching methods, to see what data they still have that's using the
> > old method. It could be useful for debugging problems on customer
> > systems, too.
>
> This is a really great idea, I have added this function and used in my test.
>
> > I wonder if we need a test that moves data between tables through an
> > intermediary. For instance, suppose a plpgsql function or DO block
> > fetches some data and stores it in a plpgsql variable and then uses
> > the variable to insert into another table. Hmm, maybe that would force
> > de-TOASTing. But perhaps there are other cases. Maybe a more general
> > way to approach the problem is: have you tried running a coverage
> > report and checked which parts of your code are getting exercised by
> > the existing tests and which parts are not? The stuff that isn't, we
> > should try to add more tests. It's easy to get corner cases wrong with
> > this kind of thing.
> >
> > I notice that LIKE INCLUDING COMPRESSION doesn't seem to be tested, at
> > least not by 0001, which reinforces my feeling that the tests here are
> > not as thorough as they could be.
>
> Added test for this as well.
>
> > +NOTICE:  pg_compression contains unpinned initdb-created object(s)
>
> > This seems wrong to me - why is it OK?
>
> Yeah, this is wrong, now fixed.
>
> > -       result = (struct varlena *)
> > -               palloc(TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ);
> > -       SET_VARSIZE(result, TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ);
> > +       cmoid = GetCompressionOidFromCompressionId(TOAST_COMPRESS_METHOD(attr));
> >
> > -       if (pglz_decompress(TOAST_COMPRESS_RAWDATA(attr),
> > -                                               TOAST_COMPRESS_SIZE(attr),
> > -                                               VARDATA(result),
> > -
> > TOAST_COMPRESS_RAWSIZE(attr), true) < 0)
> > -               elog(ERROR, "compressed data is corrupted");
> > +       /* get compression method handler routines */
> > +       cmroutine = GetCompressionRoutine(cmoid);
> >
> > -       return result;
> > +       return cmroutine->cmdecompress(attr);
> >
> > I'm worried about how expensive this might be, and I think we could
> > make it cheaper. The reason why I think this might be expensive is:
> > currently, for every datum, you have a single direct function call.
> > Now, with this, you first have a direct function call to
> > GetCompressionOidFromCompressionId(). Then you have a call to
> > GetCompressionRoutine(), which does a syscache lookup and calls a
> > handler function, which is quite a lot more expensive than a single
> > function call. And the handler isn't even returning a statically
> > allocated structure, but is allocating new memory every time, which
> > involves more function calls and maybe memory leaks. Then you use the
> > results of all that to make an indirect function call.
> >
> > I'm not sure exactly what combination of things we could use to make
> > this better, but it seems like there are a few possibilities:
> >
> > (1) The handler function could return a pointer to the same
> > CompressionRoutine every time instead of constructing a new one every
> > time.
> > (2) The CompressionRoutine to which the handler function returns a
> > pointer could be statically allocated instead of being built at
> > runtime.
> > (3) GetCompressionRoutine could have an OID -> handler cache instead
> > of relying on syscache + calling the handler function all over again.
> > (4) For the compression types that have dedicated bit patterns in the
> > high bits of the compressed TOAST size, toast_compress_datum() could
> > just have hard-coded logic to use the correct handlers instead of
> > translating the bit pattern into an OID and then looking it up over
> > again.
> > (5) Going even further than #4 we could skip the handler layer
> > entirely for such methods, and just call the right function directly.
> > I think we should definitely do (1), and also (2) unless there's some
> > reason it's hard. (3) doesn't need to be part of this patch, but might
> > be something to consider later in the series. It's possible that it
> > doesn't have enough benefit to be worth the work, though. Also, I
> > think we should do either (4) or (5). I have a mild preference for (5)
> > unless it looks too ugly.
> > Note that I'm not talking about hard-coding a fast path for a
> > hard-coded list of OIDs - which would seem a little bit unprincipled -
> > but hard-coding a fast path for the bit patterns that are themselves
> > hard-coded. I don't think we lose anything in terms of extensibility
> > or even-handedness there; it's just avoiding a bunch of rigamarole
> > that doesn't really buy us anything.
> >
> > All these points apply equally to toast_decompress_datum_slice() and
> > toast_compress_datum().
>
> Fixed as discussed at [1]
>
> > +       /* Fallback to default compression method, if not specified */
> > +       if (!OidIsValid(cmoid))
> > +               cmoid = DefaultCompressionOid;
> >
> > I think that the caller should be required to specify a legal value,
> > and this should be an elog(ERROR) or an Assert().
> >
> > The change to equalTupleDescs() makes me wonder. Like, can we specify
> > the compression method for a function parameter, or a function return
> > value? I would think not. But then how are the tuple descriptors set
> > up in that case? Under what circumstances do we actually need the
> > tuple descriptors to compare unequal?
>
> If we alter the compression method then we check whether we need to
> rebuild the tuple descriptor or not based on what value is changed so
> if the attribute compression method is changed we need to rebuild the
> compression method right.  You might say that in the first patch we
> are not allowing altering the compression method so we might move this
> to the second patch but I thought since we added this field to
> pg_attribute in this patch then better to add this check as well.
> What am I missing?
>
> > lz4.c's header comment calls it cm_lz4.c, and the pathname is wrong too.
> >
> > I wonder if we should try to adopt a convention for the names of these
> > files that isn't just the compression method name, like cmlz4 or
> > compress_lz4. I kind of like the latter one. I am a little worried
> > that just calling it lz4.c will result in name collisions later - not
> > in this directory, of course, but elsewhere in the system. It's not a
> > disaster if that happens, but for example verbose error reports print
> > the file name, so it's nice if it's unambiguous.
>
> Changed to compress_lz4.
>
> > +               if (!IsBinaryUpgrade &&
> > +                       (relkind == RELKIND_RELATION ||
> > +                        relkind == RELKIND_PARTITIONED_TABLE))
> > +                       attr->attcompression =
> > +
> > GetAttributeCompressionMethod(attr, colDef->compression);
> > +               else
> > +                       attr->attcompression = InvalidOid;
> >
> > Storing InvalidOid in the IsBinaryUpgrade case looks wrong. If
> > upgrading from pre-v14, we need to store PGLZ_COMPRESSION_OID.
> > Otherwise, we need to preserve whatever value was present in the old
> > version. Or am I confused here?
>
> Okay, so I think we can simply remove the IsBinaryUpgrade check so it
> will behave as expected.  Basically, now it the compression method is
> specified then it will take that compression method and if it is not
> specified then it will take the PGLZ_COMPRESSION_OID.
>
> > I think there should be tests for the way this interacts with
> > partitioning, and I think the intended interaction should be
> > documented. Perhaps it should behave like TABLESPACE, where the parent
> > property has no effect on what gets stored because the parent has no
> > storage, but is inherited by each new child.
>
> I have added the test for this and also documented the same.
>
> > I wonder in passing about TOAST tables and materialized views, which
> > are the other things that have storage. What gets stored for
> > attcompression?
>
> I have changed this to store the Invalid compression method always.
>
>  For a TOAST table it probably doesn't matter much
> > since TOAST table entries shouldn't ever be toasted themselves, so
> > anything that doesn't crash is fine (but maybe we should test that
> > trying to alter the compression properties of a TOAST table doesn't
> > crash, for example).
>
> You mean to update the pg_attribute table for the toasted field (e.g
> chunk_data) and set the attcompression to something valid?  Or there
> is a better way to write this test?
>
>  For a materialized view it seems reasonable to
> > want to set column properties, but I'm not quite sure how that works
> > today for things like STORAGE anyway. If we do allow setting STORAGE
> > or COMPRESSION for materialized view columns then dump-and-reload
> > needs to preserve the values.
>
> Fixed as described as [2]
>
> > +       /*
> > +        * Use default compression method if the existing compression method is
> > +        * invalid but the new storage type is non plain storage.
> > +        */
> > +       if (!OidIsValid(attrtuple->attcompression) &&
> > +               (newstorage != TYPSTORAGE_PLAIN))
> > +               attrtuple->attcompression = DefaultCompressionOid;
> >
> > You have a few too many parens in there.
>
> Fixed
>
> > I don't see a particularly good reason to treat plain and external
> > differently. More generally, I think there's a question here about
> > when we need an attribute to have a valid compression type and when we
> > don't. If typstorage is plan or external, then there's no point in
> > ever having a compression type and maybe we should even reject
> > attempts to set one (but I'm not sure). However, the attstorage is a
> > different case. Suppose the column is created with extended storage
> > and then later it's changed to plain. That's only a hint, so there may
> > still be toasted values in that column, so the compression setting
> > must endure. At any rate, we need to make sure we have clear and
> > sensible rules for when attcompression (a) must be valid, (b) may be
> > valid, and (c) must be invalid. And those rules need to at least be
> > documented in the comments, and maybe in the SGML docs.
> >
> > I'm out of time for today, so I'll have to look at this more another
> > day. Hope this helps for a start.
>
> Fixed as I have described at [2], and the rules are documented in
> pg_attribute.h (atop attcompression field)
>
> [1] https://www.postgresql.org/message-id/CA%2BTgmob3W8cnLgOQX%2BJQzeyGN3eKGmRrBkUY6WGfNyHa%2Bt_qEw%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CAFiTN-tzTTT2oqWdRGLv1dvvS5MC1W%2BLE%2B3bqWPJUZj4GnHOJg%40mail.gmail.com
>

I was working on analyzing the behavior of how the attribute merging
should work for the compression method for an inherited child so for
that, I was analyzing the behavior for the storage method.  I found
some behavior that doesn't seem right.  Basically, while creating the
inherited child we don't allow the storage to be different than the
parent attribute's storage but later we are allowed to alter that, is
that correct behavior.

Here is the test case to demonstrate this.

postgres[12546]=# create table t (a varchar);
postgres[12546]=# alter table t ALTER COLUMN a SET STORAGE plain;
postgres[12546]=# create table t1 (a varchar);
postgres[12546]=# alter table t1 ALTER COLUMN a SET STORAGE external;

/* Not allowing to set the external because parent attribute has plain */
postgres[12546]=# create table t2 (LIKE t1 INCLUDING STORAGE) INHERITS ( t);
NOTICE:  00000: merging column "a" with inherited definition
LOCATION:  MergeAttributes, tablecmds.c:2685
ERROR:  42804: column "a" has a storage parameter conflict
DETAIL:  PLAIN versus EXTERNAL
LOCATION:  MergeAttributes, tablecmds.c:2730


postgres[12546]=# create table t2 (LIKE t1 ) INHERITS (t);

/* But you can alter now */
postgres[12546]=# alter TABLE t2 ALTER COLUMN a SET STORAGE EXTERNAL ;

postgres[12546]=# \d+ t
                                                 Table "public.t"
 Column |       Type        | Collation | Nullable | Default | Storage
| Compression | Stats target | Description
--------+-------------------+-----------+----------+---------+---------+-------------+--------------+-------------
 a      | character varying |           |          |
  | plain   | pglz        |              |

Child tables: t2
Access method: heap

postgres[12546]=# \d+ t2
                                                 Table "public.t2"
 Column |       Type        | Collation | Nullable | Default | Storage
 | Compression | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | character varying |           |          |
| external | pglz        |              |
Inherits: t
Access method: heap

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: New Table Access Methods for Multi and Single Inserts
Next
From: Ajin Cherian
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions