Re: Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CA+TgmoYYeMv=eJqf2JF8VWDPe=S6rYfH78DOxzE-Sa2kQHhiPw@mail.gmail.com |
In response to | Re: Re: [HACKERS] Custom compression methods (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses | Re: Re: [HACKERS] Custom compression methods |
List | pgsql-hackers |
On Mon, Aug 24, 2020 at 2:12 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> IIUC, the main reason for using this flag is for taking the decision
> whether we need any detoasting for this tuple. For example, if we are
> rewriting the table because the compression method is changed then if
> HEAP_HASCUSTOMCOMPRESSED bit is not set in the tuple header and tuple
> length, not tup->t_len > TOAST_TUPLE_THRESHOLD then we don't need to
> call heap_toast_insert_or_update function for this tuple. Whereas if
> this flag is set then we need to because we might need to uncompress
> and compress back using a different compression method. The same is
> the case with INSERT into SELECT * FROM.

This doesn't really seem worth it to me. I don't see how we can justify
burning an on-disk bit just to save a little bit of overhead during a
rare maintenance operation. If there's a performance problem here we
need to look for another way of mitigating it. Slowing CLUSTER and/or
VACUUM FULL down by a large amount for this feature would be
unacceptable, but is that really a problem? And if so, can we solve it
without requiring this bit?

> > > something, but I'd really strongly suggest looking for a way to get
> > > rid of this. It also invents the concept of a TupleDesc flag, and the
> > > flag invented is TD_ATTR_CUSTOM_COMPRESSED; I'm not sure I see why we
> > > need that, either.
>
> This is also used in a similar way as the above but for the target
> table, i.e. if the target table has the custom compressed attribute
> then maybe we can not directly insert the tuple because it might have
> compressed data which are compressed using the default compression
> methods.

I think this is just an in-memory flag, which is much less precious than
an on-disk bit. However, I still wonder whether it's really the right
design. I think that if we offer lz4 we may well want to make it the
default eventually, or perhaps even right away.
If that ends up causing this flag to get set on every tuple all the time,
then it won't really optimize anything.

> I have already extracted these 2 patches from the main patch set.
> But, in these patches, I am still storing the am_oid in the toast
> header. I am not sure can we get rid of that at least for these 2
> patches? But, then wherever we try to uncompress the tuple we need to
> know the tuple descriptor to get the am_oid but I think that is not
> possible in all the cases. Am I missing something here?

I think we should instead use the high bits of the toast size word for
patches #1-#4, as discussed upthread.

> > > Patch #3. Add support for changing the compression method associated
> > > with a column, forcing a table rewrite.
> > >
> > > Patch #4. Add support for PRESERVE, so that you can change the
> > > compression method associated with a column without forcing a table
> > > rewrite, by including the old method in the PRESERVE list, or with a
> > > rewrite, by not including it in the PRESERVE list.
>
> Does this make sense to have Patch #3 and Patch #4, without having
> Patch #5? I mean why do we need to support rewrite or preserve unless
> we have the customer compression methods right? because the build-in
> compression method can not be dropped so why do we need to preserve?

I think that patch #3 makes sense because somebody might have a table
that is currently compressed with pglz and they want to switch to lz4,
and I think patch #4 also makes sense because they might want to start
using lz4 for future data but not force a rewrite to get rid of all the
pglz data they've already got. Those options are valuable as soon as
there is more than one possible compression algorithm, even if they're
all built in.

Now, as I said upthread, it's also true that you could do #5 before #3
and #4. I don't think that's insane. But I prefer it in the other order,
because I think having #5 without #3 and #4 wouldn't be too much fun for
users.
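
To make the "high bits of the toast size word" suggestion above a bit more concrete, here is a rough sketch of what I have in mind. It assumes a 32-bit size word in which 30 bits are enough for the raw size (varlena values are capped at 1GB), which leaves the top 2 bits free to identify a built-in compression method. All of the names below (the sketch_* functions, the SKETCH_* macros and the enum values) are made up for illustration and are not taken from any of the patches.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only:
 *   bits 0..29  raw (uncompressed) size
 *   bits 30..31 compression method identifier
 */
#define SKETCH_RAWSIZE_BITS 30
#define SKETCH_RAWSIZE_MASK ((UINT32_C(1) << SKETCH_RAWSIZE_BITS) - 1)

/* Hypothetical method identifiers; two bits allow at most four of them. */
typedef enum SketchCompressionId
{
    SKETCH_COMPRESSION_PGLZ = 0,
    SKETCH_COMPRESSION_LZ4 = 1
} SketchCompressionId;

/* Combine the raw size and the method identifier into one size word. */
static inline uint32_t
sketch_pack_size_word(uint32_t rawsize, SketchCompressionId method)
{
    assert(rawsize <= SKETCH_RAWSIZE_MASK);   /* must fit in 30 bits */
    assert((uint32_t) method < 4);            /* must fit in 2 bits */
    return ((uint32_t) method << SKETCH_RAWSIZE_BITS) | rawsize;
}

/* Extract the raw size, ignoring the method bits. */
static inline uint32_t
sketch_rawsize(uint32_t word)
{
    return word & SKETCH_RAWSIZE_MASK;
}

/* Extract the method identifier from the top two bits. */
static inline SketchCompressionId
sketch_method(uint32_t word)
{
    return (SketchCompressionId) (word >> SKETCH_RAWSIZE_BITS);
}
```

With something along these lines, decompression can pick the right routine from the datum itself, without needing the tuple descriptor or an am_oid in the toast header. Since pglz maps to zero in this sketch, a size word written without any method bits would still decode as pglz.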

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company