Thread: Compression of text columns

Compression of text columns

From
Stef
Date:
I have a table in the databases I work with,
that contains two text columns with XML data
stored inside them.

This table is by far the biggest table in the databases,
and the text columns use up the most space.
I saw that the default storage type for text columns is
"EXTENDED" which, according to the documentation, uses up extra
space to make possible substring functioning faster.

Suppose that the data in those columns are only really ever
_used_ once, but may be needed in future for viewing purposes mostly,
and I cannot really change the underlying structure of the table,
what can I possibly do to maximally reduce the amount of disk space
used by the table on disk. (There are no indexes on these two columns.)
I've thought about compression using something like :
ztext http://www.mahalito.net/~harley/sw/postgres/

but I have to change the table structure a lot and I've already
encountered problems unzipping the data again.
The other problem with this solution, is that database dumps almost double
in size, because of double compression.

Any suggestions much appreciated

TIA
Stefan

Re: Compression of text columns

From
"Jim C. Nasby"
Date:
On Mon, Oct 10, 2005 at 02:18:39PM +0200, Stef wrote:
> I have a table in the databases I work with,
> that contains two text columns with XML data
> stored inside them.
>
> This table is by far the biggest table in the databases,
> and the text columns use up the most space.
> I saw that the default storage type for text columns is
> "EXTENDED" which, according to the documentation, uses up extra
> space to make possible substring functioning faster.

You're thinking of EXTERNAL. From http://lnk.nu/postgresql.org/4ff.html:

EXTERNAL is for external, uncompressed data, and EXTENDED is for
external, compressed data. EXTENDED is the default for most data types
that support non-PLAIN storage. Use of EXTERNAL will make substring
operations on text and bytea columns faster, at the penalty of increased
storage space.

> Suppose that the data in those columns are only really ever
> _used_ once, but may be needed in future for viewing purposes mostly,
> and I cannot really change the underlying structure of the table,
> what can I possibly do to maximally reduce the amount of disk space
> used by the table on disk. (There are no indexes on these two columns.)
> I've thought about compression using something like :
> ztext http://www.mahalito.net/~harley/sw/postgres/
>
> but I have to change the table structure a lot and I've already
> encountered problems unzipping the data again.
> The other problem with this solution, is that database dumps almost double
> in size, because of double compression.

Are you seeing much gain using ztext over using EXTENDED?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Compression of text columns

From
Stef
Date:
Jim C. Nasby mentioned :
=> Are you seeing much gain using ztext over using EXTENDED?

When storage is defined external, there is a visible gain,
but not really significant enough to make me want to use it.

And I had instances where not all the rows would uncompress again
afterwards. I want to try and get the bzip2 compression working, and
see if that does anything for me. But at this stage, I think it's as compressed
as I will be able to get it.

I'm not getting the best results, because a lot of the data rows don't contain
enough data to compress properly, and I think with ztext the column sizes sometimes
end up bigger than they were to start off with.

Kind Regards
Stefan