Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CAFiTN-u9+ePF_FTiMBpHNzdxmOQYj9n2cjFx+XbyyJr-vXxgOw@mail.gmail.com |
In response to | Re: [HACKERS] Custom compression methods (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: [HACKERS] Custom compression methods |
List | pgsql-hackers |
On Sat, Nov 21, 2020 at 3:50 AM Robert Haas <robertmhaas@gmail.com> wrote:

While working on this comment I have doubts.

> I wonder in passing about TOAST tables and materialized views, which
> are the other things that have storage. What gets stored for
> attcompression? For a TOAST table it probably doesn't matter much
> since TOAST table entries shouldn't ever be toasted themselves, so
> anything that doesn't crash is fine (but maybe we should test that
> trying to alter the compression properties of a TOAST table doesn't
> crash, for example).

Yeah, for the toast table it doesn't matter, but I am not sure what
you mean by altering the compression method for the toast table. Do
you mean manually updating the pg_attribute tuple for the toast table
to set a different compression method? Or is there some direct way to
alter the toast table?

> For a materialized view it seems reasonable to
> want to set column properties, but I'm not quite sure how that works
> today for things like STORAGE anyway. If we do allow setting STORAGE
> or COMPRESSION for materialized view columns then dump-and-reload
> needs to preserve the values.

I see that we allow setting the STORAGE for a materialized view, but I
am not sure what the use case is. Basically, the tuples are selected
directly from the source table and inserted into the materialized view
without comparing the source and target storage types. The behavior is
the same if you execute INSERT INTO dest_table SELECT * FROM
source_table: if the source_table attribute has extended storage and
the target table has plain storage, the value is still inserted
directly into the target table without any conversion. For a table,
however, you can insert new tuples, and those are stored as per the new
storage method, so that is still fine. But I don't know of any use case
for the materialized view. Now I am wondering what the behavior should
be for the materialized view.

For the materialized view, can we have the same behavior as for
storage? For the built-in compression methods that might not be a
problem, but for an external compression method, how do we handle the
dependency? Suppose the table had an external compression method "cm1"
when the materialized view was created from it. If we then alter the
table to set a new compression method and force a table rewrite, what
happens to the tuples inside the materialized view? They are still
compressed with "cm1", but no attribute maintains a dependency on
"cm1", because the materialized view can point to any compression
method. So dropping cm1 would now be allowed.

So I think that, for the compression method, we can treat the
materialized view the same as a table: we can allow setting the
compression method for the materialized view, and we can always ensure
that every tuple in the view is compressed with the current or one of
the preserved compression methods. So whenever we insert into the
materialized view, we should compare the datum's compression method
with the target compression method.

> +       /*
> +        * Use default compression method if the existing compression method is
> +        * invalid but the new storage type is non plain storage.
> +        */
> +       if (!OidIsValid(attrtuple->attcompression) &&
> +               (newstorage != TYPSTORAGE_PLAIN))
> +               attrtuple->attcompression = DefaultCompressionOid;
>
> You have a few too many parens in there.
>
> I don't see a particularly good reason to treat plain and external
> differently.

Yeah, I think they should be treated the same.

> More generally, I think there's a question here about
> when we need an attribute to have a valid compression type and when we
> don't. If typstorage is plain or external, then there's no point in
> ever having a compression type and maybe we should even reject
> attempts to set one (but I'm not sure).

I agree.
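
To make the insert-time check proposed above for materialized views concrete, here is a minimal, self-contained sketch. It is not code from the patch: `CmOid`, `INVALID_CM`, `TargetColumn` and `datum_cm_is_acceptable` are hypothetical stand-ins invented for illustration, and the real check would have to work on PostgreSQL's own varlena and catalog structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT PostgreSQL's real definitions. */
typedef unsigned int CmOid;
#define INVALID_CM ((CmOid) 0)

typedef struct TargetColumn
{
    CmOid        current_cm;     /* method used for newly compressed values */
    const CmOid *preserved_cms;  /* older methods allowed to stay in place */
    size_t       npreserved;
} TargetColumn;

/*
 * Return true if a datum compressed with datum_cm may be stored in the
 * target column as is. Return false if it would first have to be
 * decompressed and recompressed with target->current_cm.
 */
bool
datum_cm_is_acceptable(CmOid datum_cm, const TargetColumn *target)
{
    assert(target != NULL);

    /* An uncompressed datum depends on no compression method at all. */
    if (datum_cm == INVALID_CM)
        return true;

    /* Already compressed with the column's current method. */
    if (datum_cm == target->current_cm)
        return true;

    /* Otherwise it must match one of the preserved methods. */
    for (size_t i = 0; i < target->npreserved; i++)
    {
        if (target->preserved_cms[i] == datum_cm)
            return true;
    }
    return false;
}
```

With a check of this shape, a tuple still compressed with "cm1" would be rewritten on its way into the view unless "cm1" is the view column's current or a preserved method, so the view never holds data that depends on a method nothing references.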

> However, the attstorage is a
> different case. Suppose the column is created with extended storage
> and then later it's changed to plain. That's only a hint, so there may
> still be toasted values in that column, so the compression setting
> must endure. At any rate, we need to make sure we have clear and
> sensible rules for when attcompression (a) must be valid, (b) may be
> valid, and (c) must be invalid. And those rules need to at least be
> documented in the comments, and maybe in the SGML docs.

IIUC, even if we change the attstorage, the existing tuples stay
stored as they are; their storage is not changed. So I think that even
if the attstorage is changed, the attcompression should not change.

After observing this behavior of storage, I tend to think that the
built-in compression methods should behave the same way. I mean, if a
tuple is compressed with one of the built-in compression methods and we
alter the compression method, or we do INSERT INTO ... SELECT into a
target column that has a different compression method, then we should
not rewrite/decompress those tuples. Basically, the built-in
compression methods can always be treated as PRESERVE, because they
cannot be dropped.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
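
One possible reading of the attcompression rules discussed in this message, written out as a small sketch. This is an assumption about where the discussion is heading, not what the patch implements: the function names and `CmOid` are hypothetical, and only the storage letters ('p', 'e', 'x', 'm') match the values PostgreSQL uses for typstorage and attstorage.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a compression method identifier. */
typedef unsigned int CmOid;

/* Storage strategy letters, as used for typstorage/attstorage. */
#define STORAGE_PLAIN    'p'
#define STORAGE_EXTERNAL 'e'
#define STORAGE_EXTENDED 'x'
#define STORAGE_MAIN     'm'

/*
 * Assumed rule: whether attcompression must be valid depends only on the
 * type's storage. Plain and external types can never hold compressed
 * values, so they are treated alike and need no compression method.
 */
bool
attcompression_must_be_valid(char typstorage)
{
    assert(typstorage == STORAGE_PLAIN || typstorage == STORAGE_EXTERNAL ||
           typstorage == STORAGE_EXTENDED || typstorage == STORAGE_MAIN);

    return typstorage == STORAGE_EXTENDED || typstorage == STORAGE_MAIN;
}

/*
 * Assumed rule: changing attstorage is only a hint for future values, so
 * already stored (possibly compressed) values remain and attcompression
 * is carried over unchanged.
 */
CmOid
attcompression_after_set_storage(CmOid current_cm, char new_attstorage)
{
    (void) new_attstorage;      /* deliberately ignored */
    return current_cm;
}
```

Under this reading, a column of a compressible type keeps its compression method across ALTER ... SET STORAGE, while a column whose type is plain or external never needs one; whether setting one in that case should be rejected outright is left open above.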