Thread: Compressing table images
My apologies if this subject has already been hashed to death, or if this is the wrong list, but I was wondering if people had seen this paper:
http://www.cwi.nl/htbin/ins1/publications?request=intabstract&key=ZuHeNeBo:ICDE:06

Basically, it describes a compression algorithm for database tables. The big advantage is that it reduces disk traffic by approximately a factor of four, at the cost of higher CPU utilization.

Any thoughts or comments?

Brian
Brian Hurt wrote:
> Any thoughts or comments?

I don't know if that is the algorithm we use but PostgreSQL will compress its data within the table.

Joshua D. Drake
--
Command Prompt, Inc. http://www.commandprompt.com/
Joshua D. Drake wrote:
> I don't know if that is the algorithm we use but PostgreSQL will
> compress its data within the table.

But only in certain very specific cases. And we compress on a per-attribute basis. Compressing at the page level is pretty much out of the question; but compressing at the tuple level I think is doable. How much benefit that brings is another matter. I think we still have more use for our limited manpower elsewhere.

--
Alvaro Herrera http://www.CommandPrompt.com/
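To make the per-attribute idea above concrete, here is a minimal sketch in Python. It is not PostgreSQL's implementation: zlib stands in for whatever compressor the server actually uses, and the names `compress_attribute`, `store_tuple`, `load_attribute` and the `min_size` cutoff are hypothetical, chosen only for illustration. Each attribute is compressed independently, and the compressed form is kept only when it is actually smaller.

```python
import zlib
from typing import Dict, Tuple

# Hypothetical cutoff: values shorter than this are never worth compressing.
MIN_SIZE = 32


def compress_attribute(value: bytes, min_size: int = MIN_SIZE) -> Tuple[bool, bytes]:
    """Return (is_compressed, stored_bytes) for a single attribute value."""
    if len(value) < min_size:
        return False, value
    packed = zlib.compress(value)  # zlib is a stand-in compressor (assumption)
    if len(packed) >= len(value):
        # Compression did not help, so keep the raw bytes.
        return False, value
    return True, packed


def store_tuple(row: Dict[str, bytes]) -> Dict[str, Tuple[bool, bytes]]:
    """Compress each attribute of a row independently of the others."""
    return {name: compress_attribute(value) for name, value in row.items()}


def load_attribute(stored: Tuple[bool, bytes]) -> bytes:
    """Undo compress_attribute for one stored attribute."""
    is_compressed, data = stored
    return zlib.decompress(data) if is_compressed else data
```

Because every attribute carries its own flag, a reader that only needs a short, uncompressed column never pays the decompression cost for a large one in the same row.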
On Thu, May 11, 2006 at 05:05:26PM -0400, Alvaro Herrera wrote:
> But only in certain very specific cases. And we compress on a
> per-attribute basis. Compressing at the page level is pretty much out
> of the question; but compressing at the tuple level I think is doable.
> How much benefit that brings is another matter. I think we still have
> more use for our limited manpower elsewhere.

Except that I think it would be highly useful to allow users to change the limits used for both toasting and compressing on a per-table and/or per-field basis. For example, if you have a varchar(1500) in a table it's unlikely to ever be large enough to trigger toasting, but if that field is rarely updated it could be a big win to store it toasted. Of course you can always create a 'side table' (vertical partitioning), but all of that framework already exists in the database; we just don't provide the required knobs.

I suspect it wouldn't be that hard to expose those knobs. In fact, if we could agree on syntax, this is probably a beginner TODO. ISTR having this discussion on one of the lists recently, but I can't find it, and don't see anything in the TODO.

Basically, I think we'd want knobs that say: if this field is over X size, compress it. If it's over Y size, store it externally. Per-table and per-cluster (ie: GUC) knobs for that would be damn handy as well.

--
Jim C. Nasby, Sr. Engineering Consultant
Pervasive Software http://pervasive.com
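The knobs proposed above could behave roughly as in the following Python sketch. Everything in it is an assumption made for illustration: the `Thresholds` type, the `resolve` and `storage_decision` functions, the lookup order (field, then table, then cluster-wide default), and the numeric values are hypothetical and do not describe existing PostgreSQL settings or syntax. For simplicity, both checks use the uncompressed size of the value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Thresholds:
    # "X" in the proposal: compress values larger than this many bytes.
    compress_over: Optional[int] = None
    # "Y" in the proposal: store values larger than this many bytes externally.
    external_over: Optional[int] = None


# Hypothetical cluster-wide defaults (the GUC level); numbers are illustrative.
CLUSTER_DEFAULT = Thresholds(compress_over=2000, external_over=2000)


def resolve(field: Thresholds, table: Thresholds,
            cluster: Thresholds = CLUSTER_DEFAULT) -> Thresholds:
    """Pick each knob from the most specific level that sets it."""
    def pick(name: str) -> int:
        for level in (field, table, cluster):
            value = getattr(level, name)
            if value is not None:
                return value
        raise ValueError(f"no value configured for {name}")
    return Thresholds(pick("compress_over"), pick("external_over"))


def storage_decision(size: int, t: Thresholds) -> Tuple[bool, bool]:
    """Return (compress, store_externally) for a value of the given size."""
    return size > t.compress_over, size > t.external_over
```

With these assumptions, the varchar(1500) case from the message could be handled by setting only the field-level external threshold, for example `resolve(Thresholds(external_over=1000), Thresholds())`, so that a 1500-byte value is stored externally without being compressed while every other field keeps the cluster defaults.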