Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Custom compression methods
Msg-id CAFiTN-u9+ePF_FTiMBpHNzdxmOQYj9n2cjFx+XbyyJr-vXxgOw@mail.gmail.com
In response to Re: [HACKERS] Custom compression methods  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sat, Nov 21, 2020 at 3:50 AM Robert Haas <robertmhaas@gmail.com> wrote:

While working on this comment, I have some doubts.

> I wonder in passing about TOAST tables and materialized views, which
> are the other things that have storage. What gets stored for
> attcompression? For a TOAST table it probably doesn't matter much
> since TOAST table entries shouldn't ever be toasted themselves, so
> anything that doesn't crash is fine (but maybe we should test that
> trying to alter the compression properties of a TOAST table doesn't
> crash, for example).

Yeah, for the TOAST table it doesn't matter, but I am not sure what
you mean by altering the compression method for the TOAST table.  Do
you mean manually updating the pg_attribute tuple of the TOAST table
to set a different compression method?  Or is there some direct way to
alter the TOAST table?
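
The only route I can think of today is a direct catalog hack, roughly
like the sketch below (the TOAST relation name and the value written
into attcompression are placeholders, and attcompression is of course
the column added by this patch):

    -- find the TOAST table behind some relation "t"
    SELECT reltoastrelid::regclass FROM pg_class WHERE oid = 't'::regclass;

    -- then poke the patch's attcompression field of chunk_data by hand
    UPDATE pg_attribute
       SET attcompression = 0        -- placeholder value, patch-specific column
     WHERE attrelid = 'pg_toast.pg_toast_16384'::regclass   -- placeholder name
       AND attname = 'chunk_data';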

> For a materialized view it seems reasonable to
> want to set column properties, but I'm not quite sure how that works
> today for things like STORAGE anyway. If we do allow setting STORAGE
> or COMPRESSION for materialized view columns then dump-and-reload
> needs to preserve the values.

I see that we allow setting STORAGE for a materialized view, but I am
not sure what the use case is.  The tuples are selected directly from
the source table and inserted into the materialized view without
checking the source or target storage type.  The behavior is the same
as executing INSERT INTO dest_table SELECT * FROM source_table: if the
source_table attribute has extended storage and the target column has
plain storage, the value is still inserted into the target table as it
is, without any conversion (see the example below).  For a plain table
that is still fine, because any tuple inserted afterwards is stored
according to the new storage setting, but I don't know of any use case
for the materialized view.  So now I am wondering what the behavior
for the materialized view should be.
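
To spell out the scenario (this is stock SQL, nothing patch-specific;
the table and column names are just for illustration):

    CREATE TABLE source_table (val text);
    ALTER TABLE source_table ALTER COLUMN val SET STORAGE EXTENDED;

    CREATE TABLE dest_table (val text);
    ALTER TABLE dest_table ALTER COLUMN val SET STORAGE PLAIN;

    INSERT INTO source_table VALUES (repeat('a', 1000000));
    -- the value is copied over as-is, without being converted to match
    -- dest_table's PLAIN setting
    INSERT INTO dest_table SELECT * FROM source_table;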

For the materialized view, can we have the same behavior as for
STORAGE?  For the built-in compression methods that might not be a
problem, but for an external compression method, how can we handle the
dependency?  Suppose that when the materialized view was created the
table was using an external compression method "cm1", and the
materialized view was built from that table.  If we now alter the
table, set a new compression method, and force a table rewrite, what
happens to the tuples inside the materialized view?  They are still
compressed with "cm1", but there is no attribute maintaining a
dependency on "cm1", because the materialized view's column can point
to any compression method.  So if we now drop cm1, the drop will be
allowed.  Therefore I think that, for the compression method, we can
treat the materialized view the same as a table: allow setting the
compression method for the materialized view, and always ensure that
every tuple in the view is compressed with the current or one of the
preserved compression methods.  That means whenever we insert into the
materialized view we should compare the datum's compression method
with the target column's compression method.  (A sketch of the
problematic sequence is below.)
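
Roughly the sequence I am worried about, written with the syntax
proposed by the patch as far as I remember it (the exact DDL may
differ; cm1 and cm2 stand for external compression methods created
through the patch):

    CREATE TABLE t (payload text COMPRESSION cm1);
    CREATE MATERIALIZED VIEW mv AS SELECT payload FROM t;

    -- rewrite t so that none of *its* tuples use cm1 anymore
    ALTER TABLE t ALTER COLUMN payload SET COMPRESSION cm2;

    -- mv still contains cm1-compressed datums, but nothing records that
    -- dependency, so dropping cm1 would be allowed
    DROP ACCESS METHOD cm1;    -- or whatever the drop syntax ends up being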


> +       /*
> +        * Use default compression method if the existing compression method is
> +        * invalid but the new storage type is non plain storage.
> +        */
> +       if (!OidIsValid(attrtuple->attcompression) &&
> +               (newstorage != TYPSTORAGE_PLAIN))
> +               attrtuple->attcompression = DefaultCompressionOid;
>
> You have a few too many parens in there.
>
> I don't see a particularly good reason to treat plain and external
> differently.

Yeah, I think they should be treated the same: there is no point in
assigning a default compression method when the new storage is
external either, since externally stored values are never compressed.

> More generally, I think there's a question here about
> when we need an attribute to have a valid compression type and when we
> don't. If typstorage is plain or external, then there's no point in
> ever having a compression type and maybe we should even reject
> attempts to set one (but I'm not sure).

I agree.

> However, the attstorage is a
> different case. Suppose the column is created with extended storage
> and then later it's changed to plain. That's only a hint, so there may
> still be toasted values in that column, so the compression setting
> must endure. At any rate, we need to make sure we have clear and
> sensible rules for when attcompression (a) must be valid, (b) may be
> valid, and (c) must be invalid. And those rules need to at least be
> documented in the comments, and maybe in the SGML docs.

IIUC, even if we change attstorage, the existing tuples are kept as
they are; their storage is not changed.  So I think that even if
attstorage is changed, attcompression should not be changed either.
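
For example (stock SQL, nothing patch-specific; names are
illustrative):

    CREATE TABLE t (payload text);                -- text defaults to EXTENDED
    INSERT INTO t VALUES (repeat('a', 1000000));  -- gets compressed/toasted
    ALTER TABLE t ALTER COLUMN payload SET STORAGE PLAIN;
    -- no rewrite happens: the existing datum stays compressed/out-of-line,
    -- only values inserted from now on follow the PLAIN hint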

After observing this behavior of STORAGE, I tend to think that the
built-in compression methods should behave the same way: if a tuple is
compressed with one of the built-in compression methods, then altering
the column's compression method, or doing an INSERT INTO ... SELECT
into a target column with a different compression method, should not
rewrite/decompress those tuples.  Basically, I mean to say that the
built-in compression methods can always be treated as PRESERVE,
because they can never be dropped.  (A sketch of what I mean is
below.)
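
To illustrate, again using the patch's proposed column syntax as I
remember it (pglz is the existing built-in method; "other_builtin" is
just a placeholder for any other built-in method the patch adds):

    CREATE TABLE a (t text COMPRESSION pglz);
    CREATE TABLE b (t text COMPRESSION other_builtin);   -- placeholder name

    INSERT INTO a VALUES (repeat('a', 100000));          -- compressed with pglz
    -- proposal: keep the datum compressed with pglz rather than decompressing
    -- and recompressing it, since pglz, being built in, can never be dropped
    INSERT INTO b SELECT * FROM a;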




--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


