Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CAFiTN-t6pS3vnzXr7SdvHQP_brYLWCt3-0avabP2y2wGmT1Kkw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Custom compression methods (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: [HACKERS] Custom compression methods
|
List | pgsql-hackers |
On Fri, Oct 9, 2020 at 3:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Oct 9, 2020 at 3:24 AM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > > > On Thu, Oct 08, 2020 at 02:38:27PM +0530, Dilip Kumar wrote: > > >On Wed, Oct 7, 2020 at 5:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > >> > > >> On Wed, Oct 7, 2020 at 10:26 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > >> > > > >> > On Tue, Oct 6, 2020 at 10:21 PM Tomas Vondra > > >> > <tomas.vondra@2ndquadrant.com> wrote: > > >> > > > > >> > > On Tue, Oct 06, 2020 at 11:00:55AM +0530, Dilip Kumar wrote: > > >> > > >On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra > > >> > > ><tomas.vondra@2ndquadrant.com> wrote: > > >> > > >> > > >> > > >> On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote: > > >> > > >> >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra > > >> > > >> ><tomas.vondra@2ndquadrant.com> wrote: > > >> > > >> >> > > >> > > >> >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: > > >> > > >> >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra > > >> > > >> >> ><tomas.vondra@2ndquadrant.com> wrote: > > >> > > >> >> > > > >> > > >> >> >Thanks, Tomas for your feedback. > > >> > > >> >> > > > >> > > >> >> >> 9) attcompression ... > > >> > > >> >> >> > > >> > > >> >> >> The main issue I see is what the patch does with attcompression. Instead > > >> > > >> >> >> of just using it to store a the compression method, it's also used to > > >> > > >> >> >> store the preserved compression methods. And using NameData to store > > >> > > >> >> >> this seems wrong too - if we really want to store this info, the correct > > >> > > >> >> >> way is either using text[] or inventing charvector or similar. > > >> > > >> >> > > > >> > > >> >> >The reason for using the NameData is the get it in the fixed part of > > >> > > >> >> >the data structure. > > >> > > >> >> > > > >> > > >> >> > > >> > > >> >> Why do we need that? It's possible to have varlena fields with direct > > >> > > >> >> access (see pg_index.indkey for example). > > >> > > >> > > > >> > > >> >I see. While making it NameData I was thinking whether we have an > > >> > > >> >option to direct access the varlena. Thanks for pointing me there. I > > >> > > >> >will change this. > > >> > > >> > > > >> > > >> > Adding NameData just to make > > >> > > >> >> it fixed-length means we're always adding 64B even if we just need a > > >> > > >> >> single byte, which means ~30% overhead for the FormData_pg_attribute. > > >> > > >> >> That seems a bit unnecessary, and might be an issue with many attributes > > >> > > >> >> (e.g. with many temp tables, etc.). > > >> > > >> > > > >> > > >> >You are right. Even I did not like to keep 64B for this, so I will change it. > > >> > > >> > > > >> > > >> >> > > >> > > >> >> >> But to me this seems very much like a misuse of attcompression to track > > >> > > >> >> >> dependencies on compression methods, necessary because we don't have a > > >> > > >> >> >> separate catalog listing compression methods. If we had that, I think we > > >> > > >> >> >> could simply add dependencies between attributes and that catalog. > > >> > > >> >> > > > >> > > >> >> >Basically, up to this patch, we are having only built-in compression > > >> > > >> >> >methods and those can not be dropped so we don't need any dependency > > >> > > >> >> >at all. We just want to know what is the current compression method > > >> > > >> >> >and what is the preserve compression methods supported for this > > >> > > >> >> >attribute. Maybe we can do it better instead of using the NameData > > >> > > >> >> >but I don't think it makes sense to add a separate catalog? > > >> > > >> >> > > > >> > > >> >> > > >> > > >> >> Sure, I understand what the goal was - all I'm saying is that it looks > > >> > > >> >> very much like a workaround needed because we don't have the catalog. > > >> > > >> >> > > >> > > >> >> I don't quite understand how could we support custom compression methods > > >> > > >> >> without listing them in some sort of catalog? > > >> > > >> > > > >> > > >> >Yeah for supporting custom compression we need some catalog. > > >> > > >> > > > >> > > >> >> >> Moreover, having the catalog would allow adding compression methods > > >> > > >> >> >> (from extensions etc) instead of just having a list of hard-coded > > >> > > >> >> >> compression methods. Which seems like a strange limitation, considering > > >> > > >> >> >> this thread is called "custom compression methods". > > >> > > >> >> > > > >> > > >> >> >I think I forgot to mention while submitting the previous patch that > > >> > > >> >> >the next patch I am planning to submit is, Support creating the custom > > >> > > >> >> >compression methods wherein we can use pg_am catalog to insert the new > > >> > > >> >> >compression method. And for dependency handling, we can create an > > >> > > >> >> >attribute dependency on the pg_am row. Basically, we will create the > > >> > > >> >> >attribute dependency on the current compression method AM as well as > > >> > > >> >> >on the preserved compression methods AM. As part of this, we will > > >> > > >> >> >add two build-in AMs for zlib and pglz, and the attcompression field > > >> > > >> >> >will be converted to the oid_vector (first OID will be of the current > > >> > > >> >> >compression method, followed by the preserved compression method's > > >> > > >> >> >oids). > > >> > > >> >> > > > >> > > >> >> > > >> > > >> >> Hmmm, ok. Not sure pg_am is the right place - compression methods don't > > >> > > >> >> quite match what I though AMs are about, but maybe it's just my fault. > > >> > > >> >> > > >> > > >> >> FWIW it seems a bit strange to first do the attcompression magic and > > >> > > >> >> then add the catalog later - I think we should start with the catalog > > >> > > >> >> right away. The advantage is that if we end up committing only some of > > >> > > >> >> the patches in this cycle, we already have all the infrastructure etc. > > >> > > >> >> We can reorder that later, though. > > >> > > >> > > > >> > > >> >Hmm, yeah we can do this way as well that first create a new catalog > > >> > > >> >table and add entries for these two built-in methods and the > > >> > > >> >attcompression can store the oid vector. But if we only commit the > > >> > > >> >build-in compression methods part then does it make sense to create an > > >> > > >> >extra catalog or adding these build-in methods to the existing catalog > > >> > > >> >(if we plan to use pg_am). Then in attcompression instead of using > > >> > > >> >one byte for each preserve compression method, we need to use oid. So > > >> > > >> >from Robert's mail[1], it appeared to me that he wants that the > > >> > > >> >build-in compression methods part should be independently committable > > >> > > >> >and if we think from that perspective then adding a catalog doesn't > > >> > > >> >make much sense. But if we are planning to commit the custom method > > >> > > >> >also then it makes more sense to directly start with the catalog > > >> > > >> >because that way it will be easy to expand without much refactoring. > > >> > > >> > > > >> > > >> >[1] https://www.postgresql.org/message-id/CA%2BTgmobSDVgUage9qQ5P_%3DF_9jaMkCgyKxUQGtFQU7oN4kX-AA%40mail.gmail.com > > >> > > >> > > > >> > > >> > > >> > > >> Hmmm. Maybe I'm missing something subtle, but I think that plan can be > > >> > > >> interpreted in various ways - it does not really say whether the initial > > >> > > >> list of built-in methods should be in some C array, or already in a proper > > >> > > >> catalog. > > >> > > >> > > >> > > >> All I'm saying is it seems a bit weird to first implement dependencies > > >> > > >> based on strange (mis)use of attcompression attribute, and then replace > > >> > > >> it with a proper catalog. My understanding is those patches are expected > > >> > > >> to be committable one by one, but the attcompression approach seems a > > >> > > >> bit too hacky to me - not sure I'd want to commit that ... > > >> > > > > > >> > > >Okay, I will change this. So I will make create a new catalog > > >> > > >pg_compression and add the entry for two built-in compression methods > > >> > > >from the very first patch. > > >> > > > > > >> > > > > >> > > OK. > > >> > > >> I have changed the first 2 patches, basically, now we are providing a > > >> new catalog pg_compression and the pg_attribute is storing the oid of > > >> the compression method. The patches still need some cleanup and there > > >> is also one open comment that for index we should use its table > > >> compression. > > >> > > >> I am still working on the preserve patch. For preserving the > > >> compression method I am planning to convert the attcompression field > > >> to the oidvector so that we can store the oid of the preserve method > > >> also. I am not sure whether we can access this oidvector as a fixed > > >> part of the FormData_pg_attribute or not. The reason is that for > > >> building the tuple descriptor, we need to give the size of the fixed > > >> part (#define ATTRIBUTE_FIXED_PART_SIZE \ > > >> (offsetof(FormData_pg_attribute,attcompression) + sizeof(Oid))). But > > >> if we convert this to the oidvector then we don't know the size of the > > >> fixed part. Am I missing something? > > > > > >I could think of two solutions here > > >Sol1. > > >Make the first oid of the oidvector as part of the fixed size, like below > > >#define ATTRIBUTE_FIXED_PART_SIZE \ > > >(offsetof(FormData_pg_attribute, attcompression) + OidVectorSize(1)) > > > > > >Sol2: > > >Keep attcompression as oid only and for the preserve list, adds > > >another field in the variable part which will be of type oidvector. I > > >think most of the time we need to access the current compression > > >method and with this solution, we will be able to access that as part > > >of the tuple desc. > > > > > > > And is the oidvector actually needed? If we have the extra catalog, > > can't we track this simply using the regular dependencies? So we'd have > > the attcompression OID of the current compression method, and the > > preserved values would be tracked in pg_depend. > > Right, we can do that as well. Actually, the preserved list need to > be accessed only in case of ALTER TABLE SET COMPRESSION and INSERT > INTO SELECT * FROM queries. So in such cases, I think it is okay to > get the preserved compression oids from pg_depends. I have worked on this patch, so as discussed now I am maintaining the preserved compression methods using dependency. Still PRESERVE ALL syntax is not supported, I will work on that part. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: