Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CA+TgmoZ9Rap=NuJLSvw2u=9fibUdB2L_Cs2Wb8FD49vhiXsdpQ@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Custom compression methods (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Custom compression methods
Re: [HACKERS] Custom compression methods |
List | pgsql-hackers |
On Wed, Feb 10, 2021 at 3:06 PM Robert Haas <robertmhaas@gmail.com> wrote: > I think you have a fairly big problem with row types. Consider this example: > > create table t1 (a int, b text compression pglz); > create table t2 (a int, b text compression lz4); > create table t3 (x t1); > insert into t1 values (1, repeat('foo', 1000)); > insert into t2 values (1, repeat('foo', 1000)); > insert into t3 select t1 from t1; > insert into t3 select row(a, b)::t1 from t2; > > rhaas=# select pg_column_compression((t3.x).b) from t3; > pg_column_compression > ----------------------- > pglz > lz4 > (2 rows) > > That's not good, because now ...because now I hit send too soon. Also, because now column b has implicit dependencies on both compression AMs and the rest of the system has no idea. I think we probably should have a rule that nothing except pglz is allowed inside of a record, just to keep it simple. The record overall can be toasted so it's not clear why we should also be toasting the original columns at all, but I think precedent probably argues for continuing to allow PGLZ, as it can already be like that on disk and pg_upgrade is a thing. The same kind of issue probably exists for arrays and range types. I poked around a bit trying to find ways of getting data compressed with one compression method into a column that was marked with another compression method, but wasn't able to break it. In CompareCompressionMethodAndDecompress, I think this is still playing a bit fast and loose with the rules around slots. I think we can do better. Suppose that at the point where we discover that we need to decompress at least one attribute, we create the new slot right then, and also memcpy tts_values and tts_isnull. Then, for that attribute and any future attributes that need decompression, we reset tts_values in the *new* slot, leaving the old one untouched. Then, after finishing all the attributes, the if (decompressed_any) block, you just have a lot less stuff to do. The advantage of this is that you haven't tainted the old slot; it's still got whatever contents it had before, and is in a clean state, which seems better to me. It's unclear to me whether this function actually needs to ExecMaterializeSlot(newslot). It definitely does need to ExecStoreVirtualTuple(newslot) and I think it's a very good idea, if not absolutely mandatory, for it not to modify anything about the old slot. But what's the argument that the new slot needs to be materialized at this point? It may be needed, if the old slot would've had to be materialized at this point. But it's something to think about. The CREATE TABLE documentation says that COMPRESSION is a kind of column constraint, but that's wrong. For example, you can't write CREATE TABLE a (b int4 CONSTRAINT thunk COMPRESSION lz4), for example, contrary to what the syntax summary implies. When you fix this so that the documentation matches the grammar change, you may also need to move the longer description further up in create_table.sgml so the order matches. The use of VARHDRSZ_COMPRESS in toast_get_compression_oid() appears to be incorrect. VARHDRSZ_COMPRESS is offsetof(varattrib_4b, va_compressed.va_data). But what gets externalized in the case of a compressed datum is just VARDATA(dval), which excludes the length word, unlike VARHDRSZ_COMPRESS, which does not. This has no consequences since we're only going to fetch 1 chunk either way, but I think we should make it correct. TOAST_COMPRESS_SET_SIZE_AND_METHOD() could Assert something about cm_method. Small delta patch with a few other suggested changes attached. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: