Tom Lane wrote:
> wieck@debis.com (Jan Wieck) writes:
> > HTML input might be close to optimal for Adisak's storage
> > format, but considering that my source implementing the
> > type's input and output functions is under 600 lines, I
> > think coming within 11% of gzip -9 is a good result
> > anyway.
>
> These strike me as very good results. I'm not at all sure that using
> gzip or bzip would give much better results in practice in Postgres,
> because those compressors are optimized for relatively large files,
> whereas a compressed-field datatype would likely be getting relatively
> small field values to work on. (So your test data set is probably a
> good one for our purposes --- do the numbers change if you exclude
> all the files over, say, 10K?)
Will give it a try.
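Incidentally, the small-input penalty Tom describes is easy to
demonstrate. A general-purpose compressor pays a fixed header and
block overhead per call, and on a short value its window has almost
no history to draw on. Below is a minimal standalone sketch (zlib is
used here purely for illustration; my type uses Adisak's format, not
zlib):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

/* Compress a buffer with zlib at level 9 and report the ratio. */
static void
report(const char *label, const Bytef *src, uLong srcLen)
{
    /* generous worst-case output bound: input + 0.1% + slack */
    uLongf destLen = srcLen + srcLen / 1000 + 64;
    Bytef *dest = malloc(destLen);

    if (dest == NULL || compress2(dest, &destLen, src, srcLen, 9) != Z_OK)
    {
        fprintf(stderr, "compression failed for %s\n", label);
        exit(1);
    }
    printf("%-12s %6lu -> %6lu bytes (%.1f%%)\n",
           label, (unsigned long) srcLen, (unsigned long) destLen,
           100.0 * destLen / srcLen);
    free(dest);
}

int
main(void)
{
    /* A short, field-sized value: redundant, but the deflate
     * header and block overhead eat into the savings. */
    const char *small = "<p>hello world, hello world, hello world</p>";
    size_t n = strlen(small);

    /* The same text repeated to file-like size: now the window
     * has plenty of history and the ratio improves dramatically. */
    static char big[64 * 1024];
    size_t pos = 0;

    while (pos + n <= sizeof(big))
    {
        memcpy(big + pos, small, n);
        pos += n;
    }

    report("small field", (const Bytef *) small, (uLong) n);
    report("large blob", (const Bytef *) big, (uLong) pos);
    return 0;
}

The short value should compress far less effectively (very short
inputs can even grow), while the same text repeated to 64K shrinks
to a tiny fraction of its size. That is why field-sized inputs are
the tougher test for gzip and friends.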
> It occurred to me last night that applying compression to individual
> fields might not be the best approach. Certainly a "bytez" data type
> is the easiest thing to fit into the existing system, but it's leaving
> some space savings on the table. What about compressing the *whole*
> data contents of a tuple on-disk, as a single entity? That should save
> more space than field-by-field compression.
But that requires decompressing every tuple into palloc()'d
memory during heap access. AFAIK, the heap access routines
currently return a pointer to the tuple inside the shared
memory buffer. I don't know what the performance impact of
that would be.
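To make that cost concrete, here is a rough sketch of what every
heap fetch would have to do under whole-tuple compression. The names
heap_getnext_shared(), uncompressed_size() and inflate_tuple() are
hypothetical stand-ins, not existing backend routines:

#include <stddef.h>

/* Hypothetical sketch only: a simplified tuple header, not the
 * real HeapTupleData declaration from the backend. */
typedef struct HeapTupleData
{
    unsigned int t_len;     /* length of the tuple image */
    void        *t_data;    /* today: points into the shm buffer */
} HeapTupleData;

extern void *palloc(size_t size);
extern HeapTupleData *heap_getnext_shared(void *scan);
extern unsigned int uncompressed_size(const void *src);
extern void inflate_tuple(const void *src, unsigned int len, void *dst);

/* With whole-tuple compression, every fetch must copy: inflate the
 * on-disk image into freshly palloc()'d backend-local memory instead
 * of handing back a pointer into the shared buffer. */
HeapTupleData *
fetch_compressed_tuple(void *scan)
{
    HeapTupleData *disk = heap_getnext_shared(scan);
    HeapTupleData *tup = (HeapTupleData *) palloc(sizeof(HeapTupleData));

    tup->t_len = uncompressed_size(disk->t_data);
    tup->t_data = palloc(tup->t_len);
    inflate_tuple(disk->t_data, disk->t_len, tup->t_data);

    return tup;             /* caller owns a private, decompressed copy */
}

Today only the first step exists: the caller simply gets the pointer
into the shared buffer. The extra palloc() and inflate on every
single fetch is the part whose cost I can't judge yet.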
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #