Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CAFiTN-uHnBhkVBDYznSS-gNGtaLzXCOLVxXQ1SxhnkzYAKRU0A@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Custom compression methods (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Custom compression methods
|
List | pgsql-hackers |
On Thu, Feb 11, 2021 at 3:26 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Feb 10, 2021 at 3:06 PM Robert Haas <robertmhaas@gmail.com> wrote: > > I think you have a fairly big problem with row types. Consider this example: > > > > create table t1 (a int, b text compression pglz); > > create table t2 (a int, b text compression lz4); > > create table t3 (x t1); > > insert into t1 values (1, repeat('foo', 1000)); > > insert into t2 values (1, repeat('foo', 1000)); > > insert into t3 select t1 from t1; > > insert into t3 select row(a, b)::t1 from t2; > > > > rhaas=# select pg_column_compression((t3.x).b) from t3; > > pg_column_compression > > ----------------------- > > pglz > > lz4 > > (2 rows) > > > > That's not good, because now Yeah, that's really bad. > ...because now I hit send too soon. Also, because now column b has > implicit dependencies on both compression AMs and the rest of the > system has no idea. I think we probably should have a rule that > nothing except pglz is allowed inside of a record, just to keep it > simple. The record overall can be toasted so it's not clear why we > should also be toasting the original columns at all, but I think > precedent probably argues for continuing to allow PGLZ, as it can > already be like that on disk and pg_upgrade is a thing. The same kind > of issue probably exists for arrays and range types. While constructing a row type from the tuple we flatten the external data I think that would be the place to decompress the data if they are not compressed with PGLZ. For array-type, we are already detoasting/decompressing the source attribute see "construct_md_array" so the array type doesn't have this problem. I haven't yet checked the range type. Based on my analysis it appeared that the different data types are getting constructed at different paths so maybe we should find some centralized place or we need to make some function call in all such places so that we can decompress the attribute if required before forming the tuple for the composite type. I have quickly hacked the code and after that, your test case is working fine. postgres[55841]=# select pg_column_compression((t3.x).b) from t3; pg_column_compression ----------------------- pglz (2 rows) -> now the attribute 'b' inside the second tuple is decompressed (because it was not compressed with PGLZ) so the compression method of b is NULL postgres[55841]=# select pg_column_compression((t3.x)) from t3; pg_column_compression ----------------------- pglz (2 rows) --> but the second row itself is compressed back using the local compression method of t3 W.R.T the attached patch, In HeapTupleHeaderGetDatum, we don't even attempt to detoast if there is no external field in the tuple, in POC I have got rid of that check, but I think we might need to do better. Maybe we can add a flag in infomask to detect whether the tuple has any compressed data or not as we have for detecting the external data (HEAP_HASEXTERNAL). So I will do some more analysis in this area and try to come up with a clean solution. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: