Thread: Compression of text columns
I have a table in the databases I work with that contains two text columns with XML data stored inside them. This table is by far the biggest table in the databases, and the text columns use up the most space. I saw that the default storage type for text columns is "EXTENDED" which, according to the documentation, uses up extra space to make substring operations faster.

Suppose that the data in those columns is only really ever _used_ once, but may be needed in future, mostly for viewing purposes, and I cannot really change the underlying structure of the table. What can I do to reduce the disk space used by the table as much as possible? (There are no indexes on these two columns.)

I've thought about compression using something like ztext: http://www.mahalito.net/~harley/sw/postgres/ but I would have to change the table structure a lot, and I've already encountered problems unzipping the data again. The other problem with this solution is that database dumps almost double in size because of double compression.

Any suggestions much appreciated.

TIA
Stefan
Stef wrote:
> I have a table in the databases I work with,
> that contains two text columns with XML data
> stored inside them.
>
> This table is by far the biggest table in the databases,
> and the text columns use up the most space.
> I saw that the default storage type for text columns is
> "EXTENDED" which, according to the documentation, uses up extra
> space to make possible substring functioning faster.
>
> Suppose that the data in those columns are only really ever
> _used_ once, but may be needed in future for viewing purposes mostly,
> and I cannot really change the underlying structure of the table,
> what can I possibly do to maximally reduce the amount of disk space
> used by the table on disk. (There are no indexes on these two columns.)
> I've thought about compression using something like :
> ztext http://www.mahalito.net/~harley/sw/postgres/
>
> but I have to change the table structure a lot and I've already
> encountered problems unzipping the data again.
> The other problem with this solution, is that database dumps almost double
> in size, because of double compression.
>
> Any suggestions much appreciated

Well, text columns are automatically compressed via the TOAST mechanism. This is handled transparently for you.
Tino Wildenhain mentioned:
=> Well, text columns are automatically compressed via the toast mechanism.
=> This is handled transparently for you.

OK, I misread the documentation, and I forgot to mention that I'm using PostgreSQL 7.3 and 8.0. It's actually the EXTERNAL storage type that is larger, not EXTENDED.

What kind of compression is used in the EXTERNAL storage type? Is there any way to achieve better compression?
Stef <svb@ucs.co.za> writes:
> I saw that the default storage type for text columns is
> "EXTENDED" which, according to the documentation, uses up extra
> space to make possible substring functioning faster.

You misread it. EXTENDED does compression by default on long strings. EXTERNAL is the one that suppresses compression.

regards, tom lane
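As a rough illustration of the distinction described above, here is a minimal Python sketch of a toy model, not PostgreSQL's actual implementation. The function name `stored_bytes`, the cutoff `LONG_VALUE_BYTES`, and the sample XML are all assumptions made up for the example, and `zlib` is only a stand-in for the server's own compressor.

```python
import zlib

# Assumption: a stand-in cutoff for what counts as a "long" string.
# The real threshold is internal to PostgreSQL and is not modelled here.
LONG_VALUE_BYTES = 2000


def stored_bytes(value: str, storage: str) -> int:
    """Toy estimate of the bytes kept for one text value under a storage type."""
    raw = value.encode("utf-8")
    if storage == "EXTERNAL":
        # EXTERNAL suppresses compression, so the value is kept as-is.
        return len(raw)
    if storage == "EXTENDED":
        if len(raw) < LONG_VALUE_BYTES:
            # Short values are left alone in this model.
            return len(raw)
        # zlib stands in for PostgreSQL's own compressor here.
        packed = zlib.compress(raw)
        # Keep the compressed form only if it is actually smaller.
        return min(len(raw), len(packed))
    raise ValueError(f"unknown storage type: {storage}")


# Hypothetical XML payload, repeated so it passes the "long" cutoff.
xml = "<order><customer>Stefan</customer><qty>3</qty></order>" * 100

print(stored_bytes(xml, "EXTERNAL"))
print(stored_bytes(xml, "EXTENDED"))
```

The second figure should come out well below the first for repetitive XML like this, which is the sense in which EXTENDED is the space-saving choice and EXTERNAL the larger one.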
On Mon, 2005-10-10 at 14:57 +0200, Stef wrote:
> Is there any way to achieve better compression?

You can use XML schema aware compression techniques, but PostgreSQL doesn't know about those. You have to do it yourself, or translate the XML into an infoset-preserving form that will still allow XPath and friends.

Best Regards, Simon Riggs
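To give a feel for what "doing it yourself" with knowledge of the document structure might look like on the client side, here is a hedged Python sketch. It is not a real schema-aware XML compressor and it does not keep the data queryable with XPath; it only primes `zlib` with a preset dictionary built from a document skeleton. The element names, the skeleton, and the helper functions are hypothetical.

```python
import zlib

# Hypothetical document skeleton, assumed to be known in advance from the schema.
SCHEMA_SKELETON = (
    b"<purchaseOrder><customerName></customerName>"
    b"<shippingAddress></shippingAddress>"
    b"<itemDescription></itemDescription>"
    b"<quantity></quantity></purchaseOrder>"
)


def compress_with_schema(xml: bytes) -> bytes:
    """Deflate the document with the skeleton as a preset dictionary."""
    packer = zlib.compressobj(level=9, zdict=SCHEMA_SKELETON)
    return packer.compress(xml) + packer.flush()


def decompress_with_schema(blob: bytes) -> bytes:
    """Inverse of compress_with_schema; needs the very same skeleton."""
    unpacker = zlib.decompressobj(zdict=SCHEMA_SKELETON)
    return unpacker.decompress(blob) + unpacker.flush()


# Hypothetical sample document that follows the skeleton.
document = (
    b"<purchaseOrder><customerName>Stefan</customerName>"
    b"<shippingAddress>12 Main Road</shippingAddress>"
    b"<itemDescription>widget</itemDescription>"
    b"<quantity>3</quantity></purchaseOrder>"
)

plain = zlib.compress(document, 9)       # no knowledge of the structure
aware = compress_with_schema(document)   # tags come from the dictionary

# Sizes in bytes: original, plain deflate, dictionary-primed deflate.
print(len(document), len(plain), len(aware))
```

The gain comes from the tags being matched against the dictionary rather than spelled out in every document, so it matters most for many small documents that share one structure. The trade-off is that the stored bytes are opaque to the database and can only be read back by code that has the same skeleton.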