Thread: page compression
I know it's been discussed before, and one big problem is license and patent problems.

Would this project be a problem:

http://oldhome.schmorp.de/marc/liblzf.html

-Andy
On Tue, Dec 28, 2010 at 10:10 AM, Andy Colson <andy@squeakycode.net> wrote:
> I know it's been discussed before, and one big problem is license and patent
> problems.
>
> Would this project be a problem:
>
> http://oldhome.schmorp.de/marc/liblzf.html

It looks like even liblzf is not going to be accepted. I have proposed to only link against liblzf if available for pg_dump and have somehow failed, see:

http://archives.postgresql.org/pgsql-hackers/2010-11/msg00824.php

Remember that PostgreSQL has TOAST tables to compress large values and store them externally, so it still has to be proven that page compression has the same benefit for PostgreSQL as for other databases.

Ironically, we also use an LZ compression algorithm for TOAST compression (defined in pg_lzcompress.c). I am still failing to understand why linking against liblzf would bring us deeper into the compression-patent minefield than we already are by hardwiring and shipping this other algorithm in pg_lzcompress.c.

Joachim
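For context, liblzf's interface is just a pair of functions. The following is only a minimal sketch of a round trip through lzf_compress() and lzf_decompress(); the buffer sizing, the sample data and the error handling are illustrative and not taken from any proposed patch.

/*
 * Minimal round trip through liblzf.  Buffer sizing and error handling
 * here are illustrative only.
 */
#include <stdio.h>
#include <string.h>
#include <lzf.h>

int
main(void)
{
    const char   in[] = "row data, row data, row data, row data, row data";
    char         compressed[2 * sizeof(in)];
    char         restored[sizeof(in)];
    unsigned int clen;
    unsigned int rlen;

    /* lzf_compress returns 0 if the result does not fit into the output buffer */
    clen = lzf_compress(in, sizeof(in), compressed, sizeof(compressed));
    if (clen == 0)
    {
        fprintf(stderr, "not compressible, would store raw\n");
        return 1;
    }

    rlen = lzf_decompress(compressed, clen, restored, sizeof(restored));
    printf("compressed %u -> %u bytes, decompressed back to %u\n",
           (unsigned int) sizeof(in), clen, rlen);

    return memcmp(in, restored, sizeof(in)) == 0 ? 0 : 1;
}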
On Dec 28, 2010, at 10:33 AM, Joachim Wieland <joe@mcknight.de> wrote:
> On Tue, Dec 28, 2010 at 10:10 AM, Andy Colson <andy@squeakycode.net> wrote:
>> I know it's been discussed before, and one big problem is license and patent
>> problems.
>>
>> Would this project be a problem:
>>
>> http://oldhome.schmorp.de/marc/liblzf.html
>
> It looks like even liblzf is not going to be accepted. I have proposed
> to only link against liblzf if available for pg_dump and have somehow
> failed, see:

I thought that was mostly about not wanting multiple changes in one patch. I don't see why liblzf would be objectionable in general.

...Robert
On Tue, 2010-12-28 at 09:10 -0600, Andy Colson wrote:
> I know it's been discussed before, and one big problem is license and
> patent problems.

Would like to see a design for that. There are a few different ways we might want to do it, and I'm interested to see if it's possible to get compressed pages to be indexable as well.

For example, if you compress 2 pages into 8kB then you do one I/O and out pop 2 buffers. That would work nicely with ring buffers.

Or you might try to have pages > 8kB in one block, which would mean decompressing every time you access the page. That wouldn't be much of a problem if we were just seq scanning.

Or you might want to compress the whole table at once, so it can only be read by seq scan. Efficient, but no indexes.

It would be interesting to explore pre-populating the compression dictionary with some common patterns.

Anyway, interesting topic.

-- 
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Jan 2, 2011, at 5:36 PM, Simon Riggs wrote:
> On Tue, 2010-12-28 at 09:10 -0600, Andy Colson wrote:
>
>> I know it's been discussed before, and one big problem is license and
>> patent problems.
>
> Would like to see a design for that. There are a few different ways we
> might want to do it, and I'm interested to see if it's possible to get
> compressed pages to be indexable as well.
>
> For example, if you compress 2 pages into 8kB then you do one I/O and
> out pop 2 buffers. That would work nicely with ring buffers.
>
> Or you might try to have pages > 8kB in one block, which would mean
> decompressing every time you access the page. That wouldn't be much of a
> problem if we were just seq scanning.
>
> Or you might want to compress the whole table at once, so it can only be
> read by seq scan. Efficient, but no indexes.

FWIW, last time I looked at how Oracle handled compression, it would only compress existing data. As soon as you modified a row, it ended up un-compressed, presumably in a different page that was also un-compressed.

I wonder if it would be feasible to use a fork to store where a compressed page lives inside the heap... if we could do that I don't see any reason why indexes wouldn't work. The changes required to support that might not be too horrific either...

-- 
Jim C. Nasby, Database Architect       jim@nasby.net
512.569.9461 (cell)                    http://jim.nasby.net
On Mon, Jan 3, 2011 at 4:02 AM, Jim Nasby <jim@nasby.net> wrote:
> FWIW, last time I looked at how Oracle handled compression, it would only
> compress existing data. As soon as you modified a row, it ended up
> un-compressed, presumably in a different page that was also un-compressed.

IIUC, InnoDB basically compresses a block as small as it'll go, and then stores it in a regular size block. That leaves free space at the end, which can be used to cram additional tuples into the page. Eventually that free space is exhausted, at which point you try to recompress the whole page and see if that gives you room to cram in even more stuff. I thought that was a pretty clever approach.

> I wonder if it would be feasible to use a fork to store where a compressed
> page lives inside the heap... if we could do that I don't see any reason why
> indexes wouldn't work. The changes required to support that might not be too
> horrific either...

At first blush, that sounds like a recipe for large amounts of undesirable random I/O.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
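For readers unfamiliar with that scheme, a rough sketch of the insert path it implies follows. This is not InnoDB code; every name and layout below is made up, and the recompression step is only declared, not implemented.

/*
 * Keep the page compressed at the front of a regular-size block, append
 * new tuples uncompressed into the leftover space, and recompress
 * everything only when that space runs out.
 */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

typedef struct PackedPage
{
    uint16_t    comp_len;       /* bytes of compressed "base" data */
    uint16_t    tail_used;      /* bytes of uncompressed tuples appended */
    char        data[BLCKSZ - 2 * sizeof(uint16_t)];
} PackedPage;

/* Assumed to exist elsewhere: fold the appended tuples into the compressed base. */
bool recompress_page(PackedPage *pg);

/*
 * Try to add a tuple.  The fast path is a simple append; the expensive
 * recompression only happens when the free space is exhausted.
 */
bool
packed_page_add_tuple(PackedPage *pg, const char *tuple, uint16_t len)
{
    if (pg->comp_len + pg->tail_used + len <= sizeof(pg->data))
    {
        memcpy(pg->data + pg->comp_len + pg->tail_used, tuple, len);
        pg->tail_used += len;
        return true;
    }

    /* No room: recompress the whole page and retry once. */
    if (!recompress_page(pg))
        return false;

    if (pg->comp_len + pg->tail_used + len > sizeof(pg->data))
        return false;           /* still doesn't fit; caller finds a new page */

    memcpy(pg->data + pg->comp_len + pg->tail_used, tuple, len);
    pg->tail_used += len;
    return true;
}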