Thread: Compression of text columns

Compression of text columns

From: Stef
I have a table in the databases I work with that contains
two text columns with XML data stored in them.

This table is by far the biggest table in the databases,
and the text columns use up the most space.
I saw that the default storage type for text columns is
"EXTENDED" which, according to the documentation, uses up extra
space to make substring operations faster.

The data in those columns is only really ever _used_ once, but may be
needed in future, mostly for viewing, and I cannot really change the
underlying structure of the table. What can I do to minimize the
disk space used by the table? (There are no indexes on these two columns.)
I've thought about compression using something like ztext:
http://www.mahalito.net/~harley/sw/postgres/

but I have to change the table structure a lot, and I've already
encountered problems unzipping the data again.
The other problem with this solution is that database dumps almost double
in size, because of double compression.

Any suggestions much appreciated

TIA
Stefan
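
One plausible reading of the dump-size problem above: data that is already
compressed gains almost nothing from a second compression pass, so a
compressed dump cannot shrink it any further. A minimal sketch of that
effect, using Python's zlib as a stand-in (this is neither ztext nor
pg_dump, and the sample XML is made up):

```python
import random
import zlib

# Hypothetical sample data standing in for the XML held in the text columns.
rng = random.Random(0)
words = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]
xml = "".join(
    f"<order><id>{rng.randint(0, 10**6)}</id>"
    f"<customer>{rng.choice(words)}</customer>"
    f"<item>{rng.choice(words)}</item><qty>{rng.randint(1, 99)}</qty></order>"
    for _ in range(2000)
).encode("utf-8")

once = zlib.compress(xml)    # first pass: the repeated markup shrinks well
twice = zlib.compress(once)  # second pass: little redundancy left to remove

print("raw bytes:       ", len(xml))
print("compressed once: ", len(once))
print("compressed twice:", len(twice))
```

The exact numbers depend on the data; the point is only that the second
pass should not be expected to help.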

Re: Compression of text columns

From: Tino Wildenhain
Stef wrote:
> I have a table in the databases I work with that contains
> two text columns with XML data stored in them.
>
> This table is by far the biggest table in the databases,
> and the text columns use up the most space.
> I saw that the default storage type for text columns is
> "EXTENDED" which, according to the documentation, uses up extra
> space to make substring operations faster.

Well, text columns are automatically compressed via the toast mechanism.
This is handled transparently for you.


Re: Compression of text columns

From: Stef
Tino Wildenhain mentioned:
=> Well, text columns are automatically compressed via the toast mechanism.
=> This is handled transparently for you.

OK, I misread the documentation, and I forgot to mention that
I'm using PostgreSQL 7.3 and 8.0.
It's actually the EXTERNAL storage type that is larger, not EXTENDED.
What kind of compression is used in the EXTERNAL storage type?
Is there any way to achieve better compression?

Re: Compression of text columns

From: Tom Lane
Stef <svb@ucs.co.za> writes:
> I saw that the default storage type for text columns is
> "EXTENDED" which, according to the documentation, uses up extra
> space to make substring operations faster.

You misread it.  EXTENDED does compression by default on long strings.
EXTERNAL is the one that suppresses compression.

            regards, tom lane
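
For reference, the storage strategy of a column is chosen with
ALTER TABLE ... ALTER COLUMN ... SET STORAGE. A small sketch that only
builds the statement text; the table and column names are hypothetical, and
the helper functions are illustrative rather than part of any PostgreSQL
client library:

```python
# Storage strategies accepted by SET STORAGE.
VALID_STORAGE = {"PLAIN", "MAIN", "EXTERNAL", "EXTENDED"}


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def set_storage_sql(table: str, column: str, mode: str) -> str:
    """Return the ALTER TABLE statement that sets a column's storage strategy."""
    mode = mode.upper()
    if mode not in VALID_STORAGE:
        raise ValueError(f"unknown storage strategy: {mode}")
    return (
        f"ALTER TABLE {quote_ident(table)} "
        f"ALTER COLUMN {quote_ident(column)} SET STORAGE {mode};"
    )


# Hypothetical names: EXTENDED allows compression, EXTERNAL suppresses it.
print(set_storage_sql("xml_log", "request_xml", "extended"))
```

Changing the strategy only affects values written afterwards; existing rows
keep their current form until they are rewritten.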

Re: Compression of text columns

From: Simon Riggs
On Mon, 2005-10-10 at 14:57 +0200, Stef wrote:
> Is there any way to achieve better compression?

You can use XML schema-aware compression techniques, but PostgreSQL
doesn't know about those. You have to do it yourself, or translate the
XML into an infoset-preserving form that will still allow XPath and
friends.

Best Regards, Simon Riggs
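
As a rough illustration of the "do it yourself" route, and only a crude
approximation of schema-aware compression: if the tag names are known in
advance, they can be handed to zlib as a preset dictionary so that even short
documents compress well. This sketch is not a real schema-aware XML codec
(the thread does not name one); the tag names, SCHEMA_DICT, pack and unpack
are all assumptions made for the example:

```python
import zlib

# Assumption: the markup vocabulary is known up front, e.g. from the schema.
SCHEMA_DICT = b"<order></order><customer></customer><item></item><qty></qty>"


def pack(xml: bytes) -> bytes:
    """Compress with zlib, primed with the known markup."""
    c = zlib.compressobj(level=9, zdict=SCHEMA_DICT)
    return c.compress(xml) + c.flush()


def unpack(blob: bytes) -> bytes:
    """Decompress; the same dictionary must be supplied again."""
    d = zlib.decompressobj(zdict=SCHEMA_DICT)
    return d.decompress(blob) + d.flush()


doc = b"<order><customer>acme</customer><item>widget</item><qty>3</qty></order>"
plain = zlib.compress(doc, 9)  # no prior knowledge of the tags
primed = pack(doc)             # tags already present in the dictionary

assert unpack(primed) == doc
print("raw:", len(doc), "zlib:", len(plain), "zlib + dictionary:", len(primed))
```

The application has to compress before storing and decompress after reading,
and the dictionary must never change once data has been written with it.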