Re: [HACKERS] compression in LO and other fields - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] compression in LO and other fields
Date
Msg-id 26760.942419535@sss.pgh.pa.us
In response to Re: [HACKERS] compression in LO and other fields  (wieck@debis.com (Jan Wieck))
Responses Re: [HACKERS] compression in LO and other fields  (wieck@debis.com (Jan Wieck))
Re: [HACKERS] compression in LO and other fields  (The Hermit Hacker <scrappy@hub.org>)
List pgsql-hackers
wieck@debis.com (Jan Wieck) writes:
> Tom Lane wrote:
>> It occurred to me last night that applying compression to individual
>> fields might not be the best approach.  Certainly a "bytez" data type
>> is the easiest thing to fit into the existing system, but it's leaving
>> some space savings on the table.  What about compressing the *whole*
>> data contents of a tuple on-disk, as a single entity?  That should save
>> more space than field-by-field compression.

>     But  it requires decompression of every tuple into palloc()'d
>     memory during heap access. AFAIK, the  heap  access  routines
>     currently  return  a  pointer  to  the  tuple  inside the shm
>     buffer. Don't know what its performance impact would be.

Good point, but the same will be needed when a tuple is split across
multiple blocks.  I would expect that (given a reasonably fast
decompressor) there will be a net performance *gain* due to having
less disk I/O to do.  Also, this won't be happening for "every" tuple,
just those exceeding a size threshold --- we'd be able to tune the
threshold value to trade off speed and space.
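
A minimal sketch of the size-threshold idea, for illustration only. It uses Python's zlib as a stand-in compressor; the names COMPRESS_THRESHOLD, store_tuple_data and load_tuple_data and the 2048-byte default are assumptions made for this example, not anything proposed in the thread.

```python
import zlib
from typing import Tuple

# Hypothetical tuning knob: data at or below this size is stored as-is.
COMPRESS_THRESHOLD = 2048


def store_tuple_data(data: bytes,
                     threshold: int = COMPRESS_THRESHOLD) -> Tuple[bool, bytes]:
    """Return (is_compressed, stored_bytes) for a tuple's data area."""
    if len(data) <= threshold:
        # Small tuples skip compression entirely, so they cost no extra CPU.
        return False, data
    packed = zlib.compress(data)
    # Keep the compressed form only when it actually saves space.
    if len(packed) >= len(data):
        return False, data
    return True, packed


def load_tuple_data(is_compressed: bool, stored: bytes) -> bytes:
    """Undo store_tuple_data; uncompressed data is returned untouched."""
    return zlib.decompress(stored) if is_compressed else stored
```

Raising the threshold means fewer tuples pay the decompression cost on read; lowering it saves more space.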

One thing that does occur to me is that we need to store the
uncompressed as well as the compressed data size, so that the
working space can be palloc'd before starting the decompression.
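
As a rough illustration of why both sizes are useful, the sketch below prefixes the compressed bytes with the two lengths and sizes the output buffer from the stored uncompressed length before decompressing. The layout (two little-endian 32-bit integers) and the names SIZE_HEADER, pack_compressed and unpack_compressed are assumptions for this example; zlib again stands in for whatever compressor would really be used.

```python
import struct
import zlib

# Hypothetical layout: (raw_len, comp_len) as unsigned 32-bit ints,
# followed by comp_len bytes of compressed data.
SIZE_HEADER = struct.Struct("<II")


def pack_compressed(data: bytes) -> bytes:
    """Compress data and record both the original and compressed sizes."""
    comp = zlib.compress(data)
    return SIZE_HEADER.pack(len(data), len(comp)) + comp


def unpack_compressed(blob: bytes) -> bytes:
    """Read both sizes, then decompress into a buffer sized up front."""
    raw_len, comp_len = SIZE_HEADER.unpack_from(blob, 0)
    start = SIZE_HEADER.size
    comp = blob[start:start + comp_len]
    if len(comp) != comp_len:
        raise ValueError("compressed data is truncated")
    # bufsize sets the initial output buffer size; this plays the role of
    # allocating raw_len bytes of working space before decompression starts.
    out = zlib.decompress(comp, bufsize=max(raw_len, 1))
    if len(out) != raw_len:
        raise ValueError("uncompressed size does not match stored size")
    return out
```

The stored compressed size tells the reader where the data ends, and the stored uncompressed size doubles as a cheap sanity check on the result.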

Also, in case it wasn't clear, I was envisioning leaving the tuple
header uncompressed, so that time quals etc can be checked before
decompressing the tuple data.
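
To make the benefit concrete, here is a toy scan that checks visibility from uncompressed header fields and only decompresses tuples that pass. Everything in it is an assumption for illustration: the StoredTuple fields, the 64-byte threshold, and especially is_visible, which is a drastically simplified stand-in for the real time qualification checks.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StoredTuple:
    xmin: int         # inserting transaction id, kept uncompressed
    xmax: int         # deleting transaction id, 0 if never deleted
    compressed: bool  # whether body holds compressed data
    raw_len: int      # uncompressed size of the data area
    body: bytes


def make_tuple(xmin: int, xmax: int, data: bytes,
               threshold: int = 64) -> StoredTuple:
    """Build a tuple, compressing the body only above the threshold."""
    if len(data) > threshold:
        comp = zlib.compress(data)
        if len(comp) < len(data):
            return StoredTuple(xmin, xmax, True, len(data), comp)
    return StoredTuple(xmin, xmax, False, len(data), data)


def is_visible(t: StoredTuple, snapshot_xid: int) -> bool:
    """Toy visibility rule that reads header fields only."""
    return t.xmin <= snapshot_xid and (t.xmax == 0 or t.xmax > snapshot_xid)


def scan(tuples: List[StoredTuple],
         snapshot_xid: int) -> Tuple[List[bytes], int]:
    """Return visible rows and the number of decompressions performed."""
    rows, decompressions = [], 0
    for t in tuples:
        if not is_visible(t, snapshot_xid):
            continue  # rejected from the header alone, body never touched
        if t.compressed:
            rows.append(zlib.decompress(t.body, bufsize=max(t.raw_len, 1)))
            decompressions += 1
        else:
            rows.append(t.body)
    return rows, decompressions
```

Invisible tuples are skipped without any decompression work, which is the point of keeping the header in the clear.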
        regards, tom lane

