Re: [HACKERS] compression in LO and other fields - Mailing list pgsql-hackers

From wieck@debis.com (Jan Wieck)
Subject Re: [HACKERS] compression in LO and other fields
Msg-id m11mI1a-0003kLC@orion.SAPserv.Hamburg.dsh.de
In response to Re: [HACKERS] compression in LO and other fields  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] compression in LO and other fields
List pgsql-hackers
Tom Lane wrote:

> wieck@debis.com (Jan Wieck) writes:
> >     Html input might be somewhat  optimal  for  Adisak's  storage
> >     format,  but  taking into account that my source implementing
> >     the type input and  output  functions  is  smaller  than  600
> >     lines,  I  think 11% difference to a gzip -9 is a good result
> >     anyway.
>
> These strike me as very good results.  I'm not at all sure that using
> gzip or bzip would give much better results in practice in Postgres,
> because those compressors are optimized for relatively large files,
> whereas a compressed-field datatype would likely be getting relatively
> small field values to work on.  (So your test data set is probably a
> good one for our purposes --- do the numbers change if you exclude
> all the files over, say, 10K?)

    Will give it a try.
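
    As a rough illustration of the point about small field values (not
    the test actually run in this thread), here is a minimal sketch. It
    assumes Python's zlib module, which uses the same DEFLATE algorithm
    as gzip. The sample row and the names compressed_ratio, small_value
    and large_value are made up for the example.

```python
import zlib


def compressed_ratio(data: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, level)) / len(data)


# Hypothetical sample data: one short HTML table row versus many such rows.
ROW = b"<tr><td>name</td><td>value</td></tr>\n"
small_value = ROW          # a field value of a few dozen bytes
large_value = ROW * 500    # a file-sized input with lots of repetition

if __name__ == "__main__":
    for label, value in (("small", small_value), ("large", large_value)):
        print(f"{label}: {len(value)} bytes, ratio {compressed_ratio(value):.3f}")
```

    The small value has little redundancy for the compressor to exploit
    and still pays the fixed stream overhead, so its ratio is far worse
    than that of the large input.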

> It occurred to me last night that applying compression to individual
> fields might not be the best approach.  Certainly a "bytez" data type
> is the easiest thing to fit into the existing system, but it's leaving
> some space savings on the table.  What about compressing the *whole*
> data contents of a tuple on-disk, as a single entity?  That should save
> more space than field-by-field compression.

    But  it requires decompression of every tuple into palloc()'d
    memory during heap access. AFAIK, the  heap  access  routines
    currently  return  a  pointer  to  the  tuple  inside the shm
    buffer. Don't know what its performance impact would be.
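
    To make the trade-off concrete, here is a small sketch under stated
    assumptions. It is Python with zlib, not backend code. A bytearray
    stands in for a shared buffer page, a memoryview stands in for the
    pointer the heap access routines return, and the freshly allocated
    bytes object stands in for palloc()'d memory. All function names and
    the sample fields are hypothetical.

```python
import zlib

# Hypothetical tuple whose fields share a lot of markup.
FIELDS = [
    b"<td>Hamburg</td><td>Germany</td>",
    b"<td>Berlin</td><td>Germany</td>",
    b"<td>Munich</td><td>Germany</td>",
]


def per_field_size(fields, level: int = 9) -> int:
    """Total size when every field is compressed on its own."""
    return sum(len(zlib.compress(f, level)) for f in fields)


def whole_tuple_size(fields, level: int = 9) -> int:
    """Size when the concatenated fields are compressed as one entity."""
    return len(zlib.compress(b"".join(fields), level))


def read_uncompressed(page: bytearray, offset: int, length: int) -> memoryview:
    """Zero-copy read: a view into the page itself, like a pointer into the buffer."""
    return memoryview(page)[offset:offset + length]


def read_compressed(page: bytearray, offset: int, length: int) -> bytes:
    """Compressed read: the result has to live in newly allocated memory."""
    return zlib.decompress(memoryview(page)[offset:offset + length])


if __name__ == "__main__":
    print("per field  :", per_field_size(FIELDS), "bytes")
    print("whole tuple:", whole_tuple_size(FIELDS), "bytes")
```

    Whole-tuple compression wins on size because redundancy across
    fields is shared and the stream overhead is paid once, but every
    read then goes through read_compressed() and allocates a copy.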


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
