Re: RFD: ALTER COLUMN .. SET STORAGE COMPRESSED; - Mailing list pgsql-hackers

From Dawid Kuroczko
Subject Re: RFD: ALTER COLUMN .. SET STORAGE COMPRESSED;
Date
Msg-id 758d5e7f0806101751m6c48c27drca00db2f0957c519@mail.gmail.com
In response to Re: RFD: ALTER COLUMN .. SET STORAGE COMPRESSED;  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Jun 10, 2008 at 5:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Dawid Kuroczko" <qnex42@gmail.com> writes:
>> As we already have four types of ALTER COLUMN .. SET STORAGE
>> { PLAIN | EXTERNAL | EXTENDED | MAIN } I would like to add
>> "COMPRESSED" which would force column compression (if column is
>> smaller than some minimum, I guess somewhere between 16 and 32 bytes).
>
> Please see previous discussions about per-column toasting parameters,
> for instance
> http://archives.postgresql.org/pgsql-hackers/2007-08/msg00082.php
> http://archives.postgresql.org/pgsql-general/2007-08/msg01129.php
>
> I think the general consensus was that we want more flexible access to
> the compression knobs than just another STORAGE setting.

Sounds like the right way to do it.  Perhaps the syntax should be something like:

ALTER TABLE tab ALTER COLUMN x WITH (storage_parameter = value, ...);

With storage parameters like:
  compress -- enable/disable compression (like PLAIN or EXTERNAL)
  min_input_size -- don't compress if smaller than this size
  min_comp_rate -- leave uncompressed if the rate is smaller than this
  toast -- for out-of-line storage parameters?
  compression_algo -- for specifying alternative algorithms, if any
(per Alvaro's suggestion).
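
To make the intended semantics of compress, min_input_size and min_comp_rate a bit more concrete, here is a rough C sketch of the decision they could drive.  This is only an illustration under my own assumptions, not existing PostgreSQL code: the struct, the function names, and the reading of min_comp_rate as "percentage of the raw size saved" are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-column options mirroring the parameters listed above. */
typedef struct ColumnCompressOpts
{
    bool   compress;        /* compress: enable/disable compression */
    size_t min_input_size;  /* do not compress values smaller than this (bytes) */
    double min_comp_rate;   /* assumed meaning: required saving, in percent of raw size */
} ColumnCompressOpts;

/* Decide whether compression should be attempted at all for a value. */
static bool
should_try_compress(const ColumnCompressOpts *opts, size_t raw_size)
{
    assert(opts != NULL);
    return opts->compress && raw_size >= opts->min_input_size;
}

/* After compressing, decide whether the result is worth keeping. */
static bool
keep_compressed(const ColumnCompressOpts *opts, size_t raw_size, size_t comp_size)
{
    assert(opts != NULL);
    if (raw_size == 0 || comp_size >= raw_size)
        return false;       /* nothing saved, store uncompressed */

    double saved_pct = 100.0 * (double) (raw_size - comp_size) / (double) raw_size;
    return saved_pct >= opts->min_comp_rate;
}
```

The split into two checks reflects that min_input_size can be tested before any work is done, while min_comp_rate can only be tested once a compressed copy exists.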

Perhaps it would be wise to introduce GUCs with default values (as we have now
ALTER COLUMN .. SET STATISTICS and default_statistics_target), named
for example: default_column_min_input_size (and so on).
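
A minimal C sketch of that fallback, assuming an "unset" sentinel for the per-column value.  The variable standing in for the GUC, the sentinel, and the function name are hypothetical, and the default of 32 is just an arbitrary value for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for a GUC named default_column_min_input_size. */
static int default_column_min_input_size = 32;

/* Assumed sentinel meaning "no per-column value was set". */
#define COLUMN_OPT_UNSET (-1)

/* Resolve the effective threshold: per-column value if set, else the default. */
static int
effective_min_input_size(int column_setting)
{
    assert(column_setting >= COLUMN_OPT_UNSET);
    return (column_setting == COLUMN_OPT_UNSET)
        ? default_column_min_input_size
        : column_setting;
}
```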

ALTER COLUMN .. SET STORAGE ... should become an alias for WITH (...) and be
deprecated, I guess.

The HEAP_HASEXTERNAL infomask bit should probably be used to "trigger"
TOASTing code.  Perhaps it should be renamed then?  I am worried that storage
parameters might introduce overhead in PostgreSQL's key parts.

...as for compression_algo, perhaps it could be an oid of compression
function(s) (we need to decompress too).  Also we would need to store
information about which algo was used to compress the column.  Perhaps a byte
between the varlena header and the actual compressed data (this way we could
have multiple algos simultaneously).
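
To illustrate the "one byte naming the algorithm" idea, here is a small self-contained C sketch.  It does not touch the real varlena layout; it only shows a tag byte in front of the payload and a dispatch table keyed by that tag.  The algorithm ids, the toy run-length codec, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical algorithm ids stored in the tag byte. */
enum { ALGO_STORED = 0, ALGO_RLE = 1 };

typedef bool (*decompress_fn) (const uint8_t *src, size_t src_len,
                               uint8_t *dst, size_t dst_cap, size_t *out_len);

/* "Stored" codec: the payload is the data itself. */
static bool
stored_decode(const uint8_t *src, size_t src_len,
              uint8_t *dst, size_t dst_cap, size_t *out_len)
{
    if (src_len > dst_cap)
        return false;
    memcpy(dst, src, src_len);
    *out_len = src_len;
    return true;
}

/* Toy run-length codec: the payload is a sequence of (count, byte) pairs. */
static bool
rle_decode(const uint8_t *src, size_t src_len,
           uint8_t *dst, size_t dst_cap, size_t *out_len)
{
    size_t written = 0;

    if (src_len % 2 != 0)
        return false;
    for (size_t i = 0; i < src_len; i += 2)
    {
        size_t run = src[i];

        if (run > dst_cap - written)
            return false;
        memset(dst + written, src[i + 1], run);
        written += run;
    }
    *out_len = written;
    return true;
}

/* Dispatch table indexed by the tag byte. */
static const decompress_fn decompressors[] = { stored_decode, rle_decode };

/* Write the tag byte followed by the already-compressed payload. */
static bool
frame_value(uint8_t algo, const uint8_t *payload, size_t len,
            uint8_t *out, size_t out_cap, size_t *frame_len)
{
    assert(out != NULL && frame_len != NULL);
    if (out_cap < len + 1)
        return false;
    out[0] = algo;
    memcpy(out + 1, payload, len);
    *frame_len = len + 1;
    return true;
}

/* Read the tag byte and hand the rest to the matching decompressor. */
static bool
unframe_value(const uint8_t *frame, size_t frame_len,
              uint8_t *dst, size_t dst_cap, size_t *out_len)
{
    if (frame_len < 1)
        return false;
    if (frame[0] >= sizeof(decompressors) / sizeof(decompressors[0]))
        return false;       /* unknown algorithm id */
    return decompressors[frame[0]] (frame + 1, frame_len - 1,
                                    dst, dst_cap, out_len);
}
```

Since every stored value carries its own tag, values written with different algorithms can coexist in the same column, and changing the column's compression_algo would not require rewriting old rows.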

Speaking of algorithms, I think that e2compr (ext2 filesystem with transparent
compression) could be a nice source of input in this area. http://e2compr.sourceforge.net/
(Having algos as plugins would allow us to use foreign licenses (gzip) or
even patented algos in countries where software patents are prohibited
without risking anything in core PostgreSQL)

OK, enough for today.  Good night.
 Regards,    Dawid
-- 
Solving [site load issues] with [more database replication] is a lot
like solving your own personal problems with heroin - at first it
sorta works, but after a while things just get out of hand.

