Thread: Compression of text columns
I have a table in the databases I work with, that contains two text columns with XML data stored inside them. This table is by far the biggest table in the databases, and the text columns use up the most space. I saw that the default storage type for text columns is "EXTENDED" which, according to the documentation, uses up extra space to make possible substring functioning faster. Suppose that the data in those columns are only really ever _used_ once, but may be needed in future for viewing purposes mostly, and I cannot really change the underlying structure of the table, what can I possibly do to maximally reduce the amount of disk space used by the table on disk. (There are no indexes on these two columns.) I've thought about compression using something like : ztext http://www.mahalito.net/~harley/sw/postgres/ but I have to change the table structure a lot and I've already encountered problems unzipping the data again. The other problem with this solution, is that database dumps almost double in size, because of double compression. Any suggestions much appreciated TIA Stefan
On Mon, Oct 10, 2005 at 02:18:39PM +0200, Stef wrote: > I have a table in the databases I work with, > that contains two text columns with XML data > stored inside them. > > This table is by far the biggest table in the databases, > and the text columns use up the most space. > I saw that the default storage type for text columns is > "EXTENDED" which, according to the documentation, uses up extra > space to make possible substring functioning faster. You're thinking of EXTERNAL. From http://lnk.nu/postgresql.org/4ff.html: EXTERNAL is for external, uncompressed data, and EXTENDED is for external, compressed data. EXTENDED is the default for most data types that support non-PLAIN storage. Use of EXTERNAL will make substring operations on text and bytea columns faster, at the penalty of increased storage space. > Suppose that the data in those columns are only really ever > _used_ once, but may be needed in future for viewing purposes mostly, > and I cannot really change the underlying structure of the table, > what can I possibly do to maximally reduce the amount of disk space > used by the table on disk. (There are no indexes on these two columns.) > I've thought about compression using something like : > ztext http://www.mahalito.net/~harley/sw/postgres/ > > but I have to change the table structure a lot and I've already > encountered problems unzipping the data again. > The other problem with this solution, is that database dumps almost double > in size, because of double compression. Are you seeing much gain using ztext over using EXTENDED? -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby mentioned : => Are you seeing much gain using ztext over using EXTENDED? When storage is defined external, there is a visible gain, but not really significant enough to make me want to use it. And I had instances where not all the rows would uncompress again afterwards. I want to try and get the bzip2 compression working, and see if that does anything for me. But at this stage, I think it's as compressed as I will be able to get it. I'm not getting the best results, because a lot of the data rows don't contain enough data to compress properly, and I think with ztext the column sizes sometimes end up bigger than they were to start off with. Kind Regards Stefan