Thread: compression in LO and other fields

compression in LO and other fields

From
Karel Zak - Zakkr
Date:

Hi,
in the TODO list there is the item "Allow compression of large fields or a
compressed field type". It is a good idea, but it probably needs a
binary field type first (or not?).
I have looked at the inv_api and the other LO routines, and my idea is to add
support for a bzip2 stream to the inv_api and allow compression to be used in
the current LO routines. Is that a good idea?
                    Karel Zak
 

------------------------------------------------------------------------------
Karel Zak <zakkr@zf.jcu.cz>                      http://home.zf.jcu.cz/~zakkr/

Docs:         http://docs.linux.cz                          (big docs archive)    
Kim Project:  http://home.zf.jcu.cz/~zakkr/kim/              (process manager)
FTP:          ftp://ftp2.zf.jcu.cz/users/zakkr/              (C/ncurses/PgSQL)
------------------------------------------------------------------------------



Re: [HACKERS] compression in LO and other fields

From
Tom Lane
Date:
Karel Zak - Zakkr <zakkr@zf.jcu.cz> writes:
> I have looked at the inv_api and the other LO routines, and my idea is to add
> support for a bzip2 stream to the inv_api and allow compression to be used in
> the current LO routines. Is that a good idea?

LO is a dead end.  What we really want to do is eliminate tuple-size
restrictions and then have large ordinary fields (probably of type
bytea) in regular tuples.  I'd suggest working on compression in that
context, say as a new data type called "bytez" or something like that.
bytez would act just like bytea except the on-disk representation would
be compressed.  A compressed variant of type "text" would also be useful.
In the long run this will be much more useful and easier to work with
than adding another frammish to large objects.
        regards, tom lane


Re: [HACKERS] compression in LO and other fields

From
Tatsuo Ishii
Date:
> LO is a dead end.  What we really want to do is eliminate tuple-size
> restrictions and then have large ordinary fields (probably of type
> bytea) in regular tuples.  I'd suggest working on compression in that
> context, say as a new data type called "bytez" or something like that.

It sounds ideal, but I remember that Vadim said inserting a 2GB record
is not a good idea since it will be written into the log too. If it's a
necessary limitation from the point of view of WAL, we have to accept
it, I think.

BTW, I still don't have enough time to run the huge sort tests on
6.5.x. I will probably have a chance to do that next week...
---
Tatsuo Ishii


Re: [HACKERS] compression in LO and other fields

From
wieck@debis.com (Jan Wieck)
Date:
Tatsuo Ishii wrote:

> > LO is a dead end.  What we really want to do is eliminate tuple-size
> > restrictions and then have large ordinary fields (probably of type
> > bytea) in regular tuples.  I'd suggest working on compression in that
> > context, say as a new data type called "bytez" or something like that.
>
> It sounds ideal but I remember that Vadim said inserting a 2GB record
> is not good idea since it will be written into the log too. If it's a
> necessary limitation from the point of view of WAL, we have to accept
> it, I think.

    Just in case someone wants to implement a complete compressed
    data type (including comparison functions, operators and a
    default operator class for indexing):

    I  already  made  some  tests  with  a type I called 'lztext'
    locally.  Only the input-/output-functions exist so  far  and
    as  the  name  might  suggest, it would be an alternative for
    'text'. It uses a simple but fast, byte-oriented LZ backward
    pointing method. No Huffman coding or variable offset/size
    tagging. The first byte of a chunk tells, bitwise, whether each
    of the following 8 items is a raw byte to copy or a 12-bit
    offset, 4-bit size copy instruction. That gives a max back
    offset of 4096 and a max match size of 17 bytes.

    What   made  it  my  preferred  method  was  the  fact,  that
    decompression is done entirely using the already decompressed
    portion  of  the data, so it does not need any code tables or
    the like at that time.

    It is really FASTEST on decompression, which I assume would
    be the most often used operation on huge data types. With
    some care, comparison could be done on the fly while
    decompressing two values, so that the entire comparison can
    be aborted at the occurrence of the first difference.
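
To sketch roughly what such a decompression loop looks like (this is only an
illustration; the exact tag layout, with the low nibble of the first tag byte
holding the high bits of the 12-bit offset, the offset stored minus one, and
the size stored as length minus 2 to give the stated maximum of 17, is an
assumption and not the actual code):

#include <stddef.h>

static void
lz_decompress_sketch(const unsigned char *src, size_t srclen, unsigned char *dst)
{
    const unsigned char *sp = src;
    const unsigned char *srcend = src + srclen;
    unsigned char *dp = dst;

    while (sp < srcend)
    {
        unsigned char ctrl = *sp++;         /* 8 flag bits, one per item */
        int i;

        for (i = 0; i < 8 && sp < srcend; i++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                /* tagged item: 12-bit back offset, 4-bit match size */
                int off = (((sp[0] & 0x0f) << 8) | sp[1]) + 1;  /* 1..4096 */
                int len = (sp[0] >> 4) + 2;                     /* 2..17   */
                unsigned char *cp = dp - off;

                sp += 2;
                while (len-- > 0)
                    *dp++ = *cp++;  /* copies only from already written output */
            }
            else
                *dp++ = *sp++;      /* literal byte, copied as-is */
        }
    }
}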

    The compression rates aren't that gigantic.  I've  got  30-50%
    for  rule  plan  strings  (size  limit  on views!!!). And the
    method used only allows for  buffer  back  references  of  4K
    offsets  at  most,  so the rate will not grow for larger data
    chunks. That's a heavy tradeoff between compression rate  and
    no  memory  leakage  for sure and speed, I know, but I prefer
    not to force it, instead I usually use a bigger  hammer  (the
    tuple  size limit is still our original problem - and another
    IBM 72GB disk doing 22-37 MB/s will make any compressing data
    type obsolete then).

    Sorry  for the compression specific slang here.  Well, anyone
    interested in the code?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] compression in LO and other fields

From
Bruce Momjian
Date:
>     The compression rates aren't that giantic.  I've  got  30-50%
>     for  rule  plan  strings  (size  limit  on views!!!). And the
>     method used only allows for  buffer  back  references  of  4K
>     offsets  at  most,  so the rate will not grow for larger data
>     chunks. That's a heavy tradeoff between compression rate  and
>     no  memory  leakage  for sure and speed, I know, but I prefer
>     not to force it, instead I usually use a bigger  hammer  (the
>     tuple  size limit is still our original problem - and another
>     IBM 72GB disk doing 22-37 MB/s will make any compressing data
>     type obsolete then).
> 
>     Sorry  for the compression specific slang here.  Well, anyone
>     interested in the code?

In contrib?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: [HACKERS] compression in LO and other fields

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> LO is a dead end.  What we really want to do is eliminate tuple-size
>> restrictions and then have large ordinary fields (probably of type
>> bytea) in regular tuples.  I'd suggest working on compression in that
>> context, say as a new data type called "bytez" or something like that.

> It sounds ideal but I remember that Vadim said inserting a 2GB record
> is not good idea since it will be written into the log too. If it's a
> necessary limitation from the point of view of WAL, we have to accept
> it, I think.

LO won't make that any better: the data still goes into a table.
You'd have 2GB worth of WAL entries either way.

The only thing LO would do for you is divide the data into block-sized
tuples, so there would be a bunch of little WAL entries instead of one
big one.  But that'd probably be easy to duplicate too.  If we implement
big tuples by chaining together disk-block-sized segments, which seems
like the most likely approach, couldn't WAL log each segment as a
separate log entry?  If so, there's almost no difference between LO and
inline field for logging purposes.
        regards, tom lane


Re: [HACKERS] compression in LO and other fields

From
Karel Zak - Zakkr
Date:
On Fri, 12 Nov 1999, Tom Lane wrote:

> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >> LO is a dead end.  What we really want to do is eliminate tuple-size
> >> restrictions and then have large ordinary fields (probably of type
> >> bytea) in regular tuples.  I'd suggest working on compression in that
> >> context, say as a new data type called "bytez" or something like that.

--- cut ---
> 
> The only thing LO would do for you is divide the data into block-sized
> tuples, so there would be a bunch of little WAL entries instead of one
> big one.  But that'd probably be easy to duplicate too.  If we implement
> big tuples by chaining together disk-block-sized segments, which seems
> like the most likely approach, couldn't WAL log each segment as a
> separate log entry?  If so, there's almost no difference between LO and
> inline field for logging purposes.
> 

I'm not sure that LO is a dead end for every user. Big (blob) fields
go through the SQL engine (?), but why - if I don't need to use this data as
typical SQL data (I don't need indexing, searching ... in, for example, gif files)?
It would be a pity if LO development stopped. I still think that LO compression is
not a bad idea :-)

Other possible compression questions:

* some applications allow a compressed stream to be used over slow networks
  between client<->server - what about PostgreSQL?

* MySQL's dump allows making a compressed dump file, which is good - what about PostgreSQL?
                        Karel
 



Re: [HACKERS] compression in LO and other fields

From
Karel Zak - Zakkr
Date:
On Fri, 12 Nov 1999, Jan Wieck wrote:

>     Just  in case someone want to implement a complete compressed
>     data type  (including  comarision  functions,  operators  and
>     indexing default operator class).
> 
>     I  already  made  some  tests  with  a type I called 'lztext'
>     locally.  Only the input-/output-functions exist so  far  and
>     as  the  name  might  suggest, it would be an alternative for
>     'text'. It uses a simple but fast, byte oriented LZ  backward
>     pointing  method.  No  Huffman coding or variable offset/size
>     tagging. First byte of a chunk  tells  bitwise  if  the  next
>     following  8  items are raw bytes to copy or 12 bit offset, 4
>     bit size copy information.  That is max back offset 4096  and
>     max match size 17 bytes.

Is it your original implementation, or did you use some existing compression
code? I tried bzip2, but the output from this algorithm is completely binary,
and I don't know how to use that in PgSQL if all the backend (in/out) routines
use char * (yes, I'm a newbie at PgSQL hacking :-).

> 
>     What   made  it  my  preferred  method  was  the  fact,  that
>     decompression is done entirely using the already decompressed
>     portion  of  the data, so it does not need any code tables or
>     the like at that time.
> 
>     It is really FASTEST on decompression, which I  assume  would
>     be  the  mostly often used operation on huge data types. With
>     some care,  comparision  could  be  done  on  the  fly  while
>     decompressing  two values, so that the entire comparision can
>     be aborted at the occurence of the first difference.
> 
>     The compression rates aren't that giantic.  I've  got  30-50%

Isn't it a problem that your implementation compresses all the data at once?
Typically compression uses a stream, and compresses only a small buffer
in each cycle.

>     for  rule  plan  strings  (size  limit  on views!!!). And the
>     method used only allows for  buffer  back  references  of  4K
>     offsets  at  most,  so the rate will not grow for larger data
>     chunks. That's a heavy tradeoff between compression rate  and
>     no  memory  leakage  for sure and speed, I know, but I prefer
>     not to force it, instead I usually use a bigger  hammer  (the
>     tuple  size limit is still our original problem - and another
>     IBM 72GB disk doing 22-37 MB/s will make any compressing data
>     type obsolete then).
> 
>     Sorry  for the compression specific slang here.  Well, anyone
>     interested in the code?

Yes, for me - I am finishing the to_char()/to_data() Oracle-compatible routines
(Thomas, are you still quiet?) and this is a new appeal for me :-)
                    Karel



Re: [HACKERS] compression in LO and other fields

From
Tatsuo Ishii
Date:
> > It sounds ideal but I remember that Vadim said inserting a 2GB record
> > is not good idea since it will be written into the log too. If it's a
> > necessary limitation from the point of view of WAL, we have to accept
> > it, I think.
> 
> LO won't make that any better: the data still goes into a table.
> You'd have 2GB worth of WAL entries either way.

What in my mind was LO that is not under the transaction control.  I
would not say this is a good thing, but I'm afraid we might need this
kind of beast in WAL.

> The only thing LO would do for you is divide the data into block-sized
> tuples, so there would be a bunch of little WAL entries instead of one
> big one.  But that'd probably be easy to duplicate too.  If we implement
> big tuples by chaining together disk-block-sized segments, which seems
> like the most likely approach, couldn't WAL log each segment as a
> separate log entry?  If so, there's almost no difference between LO and
> inline field for logging purposes.

Right.

BTW, does anybody know how BLOBs are handled by WAL in commercial
DBMSs?
---
Tatsuo Ishii


Re: [HACKERS] compression in LO and other fields

From
wieck@debis.com (Jan Wieck)
Date:
Karel Zak - Zakkr wrote:

> On Fri, 12 Nov 1999, Jan Wieck wrote:
>
> >     I  already  made  some  tests  with  a type I called 'lztext'
> >     locally.  Only the input-/output-functions exist so  far  and
>
> Is it your original implementation, or did you use some existing compression
> code? I tried bzip2, but the output from this algorithm is completely binary,
> and I don't know how to use that in PgSQL if all the backend (in/out) routines
> use char * (yes, I'm a newbie at PgSQL hacking :-).

    The  internal  storage  format is based on an article I found
    at:

        http://www.neutralzone.org/home/faqsys/docs/slz_art.txt

        Simple Compression using an LZ buffer
        Part 3 Revision 1.d:
        An introduction to compression on the Amiga by Adisak Pochanayon

        Freely Distributable as long as reproduced completely.
        Copyright 1993 Adisak Pochanayon

    I've written the code from scratch.

    The internal representation  is  binary,  for  sure.  It's  a
    PostgreSQL variable length data format as usual.

    I don't know if there's a compression library available that
    fits our need. First and most important, it must have a
    license that permits us to include it in the distribution
    under our existing license. Second, its implementation must
    not cause any problems in the backend like memory leakage or
    the like.

> >     The compression rates aren't that giantic.  I've  got  30-50%
>
> Isn't it a problem that your implementation compresses all the data at once?
> Typically compression uses a stream, and compresses only a small buffer
> in each cycle.

    No, that's no problem. On type input, the original  value  is
    completely  in  memory  given  as  a  char*, and the internal
    representation is returned as a palloc()'d Datum. For  output
    it's vice versa.
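
    To make that concrete, a pair of such I/O functions could be
    shaped roughly like this. It is only a sketch under assumptions:
    lz_compress()/lz_decompress() are hypothetical stand-ins for the
    real routines, and the actual lztext header layout may well differ.

#include "postgres.h"
#include <stddef.h>
#include <string.h>

/* hypothetical stand-ins for the real compression routines */
extern int32 lz_compress(const char *source, int32 slen, char *dest);
extern void  lz_decompress(const char *source, int32 rawsize, char *dest);

typedef struct lztext_sketch
{
    int32   vl_len;         /* total size of the datum, varlena-style        */
    int32   rawsize;        /* uncompressed size, so output can palloc() it  */
    char    data[1];        /* compressed bytes follow                       */
} lztext_sketch;

lztext_sketch *
lztext_sketch_in(char *str)
{
    int32          rawsize = strlen(str);
    /* worst case is roughly rawsize * 1.125 + 2 (see the note further down-thread) */
    lztext_sketch *result = (lztext_sketch *)
        palloc(offsetof(lztext_sketch, data) + rawsize + (rawsize + 7) / 8 + 2);
    int32          complen = lz_compress(str, rawsize, result->data);

    result->vl_len = offsetof(lztext_sketch, data) + complen;
    result->rawsize = rawsize;
    return result;
}

char *
lztext_sketch_out(lztext_sketch *lz)
{
    char *result = (char *) palloc(lz->rawsize + 1);

    lz_decompress(lz->data, lz->rawsize, result);
    result[lz->rawsize] = '\0';
    return result;
}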

    O.K.  some  details  on  the  compression rate. I've used 112
    .html files with a total size of  1188346  bytes  this  time.
    The  smallest one was 131 bytes, the largest one 114549 bytes
    and most of the files are somewhere between 3-12K.

    Compression results on the binary level are:

        gzip -9 outputs 398180 bytes (66.5% rate)

        gzip -1 outputs 447597 bytes (62.3% rate)

        my code outputs 529420 bytes (55.4% rate)

    Html input might be somewhat  optimal  for  Adisak's  storage
    format,  but  taking into account that my source implementing
    the type input and  output  functions  is  smaller  than  600
    lines,  I  think 11% difference to a gzip -9 is a good result
    anyway.

> >     Sorry  for the compression specific slang here.  Well, anyone
> >     interested in the code?
>
> Yes, for me - I am finishing the to_char()/to_data() Oracle-compatible routines
> (Thomas, are you still quiet?) and this is a new appeal for me :-)

    Bruce suggested the contrib area, but I'm not sure if  that's
    the right place. If it goes into the distribution at all, I'd
    like to use this data type for rule plan strings and function
    source text in the system catalogs. I don't expect we'll have
    a general solution for tuples split  across  multiple  blocks
    for  v7.0.  And  using  lztext for rules and function sources
    would lower some FRP's. But using it in the catalogs requires
    it to be builtin.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] compression in LO and other fields

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>     Html input might be somewhat  optimal  for  Adisak's  storage
>     format,  but  taking into account that my source implementing
>     the type input and  output  functions  is  smaller  than  600
>     lines,  I  think 11% difference to a gzip -9 is a good result
>     anyway.

These strike me as very good results.  I'm not at all sure that using
gzip or bzip would give much better results in practice in Postgres,
because those compressors are optimized for relatively large files,
whereas a compressed-field datatype would likely be getting relatively
small field values to work on.  (So your test data set is probably a
good one for our purposes --- do the numbers change if you exclude
all the files over, say, 10K?)

>     Bruce suggested the contrib area, but I'm not sure if  that's
>     the right place. If it goes into the distribution at all, I'd
>     like to use this data type for rule plan strings and function
>     source text in the system catalogs.

Right, if we are going to bother with it at all, we should put it
into the core so that we can use it for rule plans.

>     I don't expect we'll have
>     a general solution for tuples split  across  multiple  blocks
>     for  v7.0.

I haven't given up hope of that yet --- but even if we do, compressing
the data is an attractive choice to reduce the frequency with which
tuples must be split across blocks.


It occurred to me last night that applying compression to individual
fields might not be the best approach.  Certainly a "bytez" data type
is the easiest thing to fit into the existing system, but it's leaving
some space savings on the table.  What about compressing the *whole*
data contents of a tuple on-disk, as a single entity?  That should save
more space than field-by-field compression.  It could be triggered in
the tuple storage routines whenever the uncompressed size exceeds some
threshold.  (We'd need a flag in the tuple header to indicate compressed
data, but I think there are bits to spare.)  When we get around to
having split tuples, the code would still be useful because it'd be
applied as a first resort before splitting a large tuple; it'd reduce
the frequency of splits and the number of sections big tuples get split
into.  All automatic and transparent, too --- the user doesn't have to
change data declarations at all.
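
A rough sketch of what that insert-path decision could look like; the
threshold, the HEAP_COMPRESSED flag bit and the lz_compress() helper are all
invented names for illustration, not existing backend code:

#include "postgres.h"
#include <string.h>

extern int32 lz_compress(const char *source, int32 slen, char *dest);  /* hypothetical */

#define TUPLE_COMPRESS_THRESHOLD 512    /* bytes; purely a tunable guess    */
#define HEAP_COMPRESSED 0x0800          /* an assumed spare infomask bit    */

/*
 * Try to compress the data part of a tuple, leaving the header alone.
 * Returns the (possibly reduced) length of the data part and sets the
 * assumed HEAP_COMPRESSED bit when the compressed form is kept.
 */
static int32
maybe_compress_tuple_data(char *data, int32 datalen, uint16 *infomask)
{
    char   *buf;
    int32   complen;

    if (datalen <= TUPLE_COMPRESS_THRESHOLD)
        return datalen;                         /* small tuple: store as-is */

    buf = (char *) palloc(datalen + (datalen + 7) / 8 + 2);     /* worst case */
    complen = lz_compress(data, datalen, buf);

    if (complen < datalen)                      /* keep it only if smaller */
    {
        memcpy(data, buf, complen);
        *infomask |= HEAP_COMPRESSED;
        datalen = complen;
    }
    pfree(buf);
    return datalen;
}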

Also, if we do it that way, then it would *automatically* apply to
both regular tuples and LO, because the current LO implementation is
just tuples.  (Tatsuo's idea of a non-transaction-controlled LO would
need extra work, of course, if we decide that's a good idea...)
        regards, tom lane


Re: [HACKERS] compression in LO and other fields

From
Thomas Lockhart
Date:
> Yes, for me - I am finishing the to_char()/to_data() Oracle-compatible routines
> (Thomas, are you still quiet?) and this is a new appeal for me :-)

Ack! When I saw this I rolled up my primary Netscape window and found
your almost-completed reply underneath. I've sent it along now.
                    - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] compression in LO and other fields

From
The Hermit Hacker
Date:
On Fri, 12 Nov 1999, Jan Wieck wrote:

>     I  don't know if there's a compression library available that
>     fit's our need. First and  most  important  it  must  have  a
>     license  that  permits  us  to include it in the distribution
>     under our existing license. Second it's  implementation  must
>     not  cause any problems in the backend like memory leakage or
>     the like.

Is this something that could be a configure option?  Put the stubs in
place, and if someone wants to enable that feature, they can install the
compression library first and run with it?


Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] compression in LO and other fields

From
wieck@debis.com (Jan Wieck)
Date:
Tom Lane wrote:

> wieck@debis.com (Jan Wieck) writes:
> >     Html input might be somewhat  optimal  for  Adisak's  storage
> >     format,  but  taking into account that my source implementing
> >     the type input and  output  functions  is  smaller  than  600
> >     lines,  I  think 11% difference to a gzip -9 is a good result
> >     anyway.
>
> These strike me as very good results.  I'm not at all sure that using
> gzip or bzip would give much better results in practice in Postgres,
> because those compressors are optimized for relatively large files,
> whereas a compressed-field datatype would likely be getting relatively
> small field values to work on.  (So your test data set is probably a
> good one for our purposes --- do the numbers change if you exclude
> all the files over, say, 10K?)

    Will give it a try.

> It occurred to me last night that applying compression to individual
> fields might not be the best approach.  Certainly a "bytez" data type
> is the easiest thing to fit into the existing system, but it's leaving
> some space savings on the table.  What about compressing the *whole*
> data contents of a tuple on-disk, as a single entity?  That should save
> more space than field-by-field compression.

    But it requires decompression of every tuple into palloc()'d
    memory during heap access. AFAIK, the heap access routines
    currently return a pointer to the tuple inside the shm
    buffer. I don't know what its performance impact would be.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] compression in LO and other fields

From
wieck@debis.com (Jan Wieck)
Date:
Marc G. Fournier wrote:

> On Fri, 12 Nov 1999, Jan Wieck wrote:
>
> >     I  don't know if there's a compression library available that
> >     fit's our need. First and  most  important  it  must  have  a
> >     license  that  permits  us  to include it in the distribution
> >     under our existing license. Second it's  implementation  must
> >     not  cause any problems in the backend like memory leakage or
> >     the like.
>
> Is this something that could be a configure option?  Put the stubs in
> place, and if someone wants to enable that feature, they can install the
> compression library first and run with it?

    If  using  the  new type in system catalogs, the option could
    only be what kind of compression to use. And we need our  own
    default compression code shipped anyway.

    Of  course, it could depend on the config what types are used
    in the syscat. But making the catalog headers things that are
    shipped as a .in isn't really that good IMHO.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] compression in LO and other fields

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
> Tom Lane wrote:
>> It occurred to me last night that applying compression to individual
>> fields might not be the best approach.  Certainly a "bytez" data type
>> is the easiest thing to fit into the existing system, but it's leaving
>> some space savings on the table.  What about compressing the *whole*
>> data contents of a tuple on-disk, as a single entity?  That should save
>> more space than field-by-field compression.

>     But  it requires decompression of every tuple into palloc()'d
>     memory during heap access. AFAIK, the  heap  access  routines
>     currently  return  a  pointer  to  the  tuple  inside the shm
>     buffer. Don't know what it's performance impact would be.

Good point, but the same will be needed when a tuple is split across
multiple blocks.  I would expect that (given a reasonably fast
decompressor) there will be a net performance *gain* due to having
less disk I/O to do.  Also, this won't be happening for "every" tuple,
just those exceeding a size threshold --- we'd be able to tune the
threshold value to trade off speed and space.

One thing that does occur to me is that we need to store the
uncompressed as well as the compressed data size, so that the
working space can be palloc'd before starting the decompression.
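
In other words, the stored image needs a small header carrying both sizes.
A purely hypothetical layout, just to illustrate the point (lz_decompress()
is an assumed helper):

#include "postgres.h"

typedef struct CompressedDataHeader
{
    int32   complen;        /* number of compressed bytes that follow       */
    int32   rawlen;         /* original size: how much the reader palloc()s */
    /* compressed data follows */
} CompressedDataHeader;

/* reader side, sketched:
 *
 *     char *raw = palloc(hdr->rawlen);
 *     lz_decompress((char *) (hdr + 1), hdr->rawlen, raw);
 */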

Also, in case it wasn't clear, I was envisioning leaving the tuple
header uncompressed, so that time quals etc can be checked before
decompressing the tuple data.
        regards, tom lane


Re: [HACKERS] compression in LO and other fields

From
Philip Warner
Date:
>
>BTW, does anybody know how BLOBs are handled by WAL in commercial
>DBMSs?

Dec/RDB stores them in its equivalent of the WAL; full rollback etc. is
supported. If you load lots of large blobs, you need to make sure you have
enough disk space for the journal copies.

----------------------------------------------------------------
Philip Warner
Albatross Consulting Pty. Ltd. (A.C.N. 008 659 498)
Tel: +61-03-5367 7422
Fax: +61-03-5367 7430
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371




Re: [HACKERS] compression in LO and other fields

From
Karel Zak - Zakkr
Date:
On Fri, 12 Nov 1999, Jan Wieck wrote:

> 
>     I  don't know if there's a compression library available that
>     fit's our need. First and  most  important  it  must  have  a
>     license  that  permits  us  to include it in the distribution
IMHO the bzip2 compression algorithm is free and open source - from the README
in the bzip2 source:

"bzip2-0.9.5 is distributed under a BSD-style license.  For details,
see the file LICENSE"


>     Bruce suggested the contrib area, but I'm not sure if  that's
>     the right place. If it goes into the distribution at all, I'd

Is there any space (on the postgresql ftp?) for this unstable code? A good project
has an incoming ftp area for devel. versions...

If you don't move this code to contrib, please send it to me (as a patch?).
                    Karel




Re: [HACKERS] compression in LO and other fields

From
wieck@debis.com (Jan Wieck)
Date:
Tom Lane wrote:

> wieck@debis.com (Jan Wieck) writes:
>
> >     But  it requires decompression of every tuple into palloc()'d
> >     memory during heap access. AFAIK, the  heap  access  routines
> >     currently  return  a  pointer  to  the  tuple  inside the shm
> >     buffer. Don't know what it's performance impact would be.
>
> Good point, but the same will be needed when a tuple is split across
> multiple blocks.  I would expect that (given a reasonably fast
> decompressor) there will be a net performance *gain* due to having
> less disk I/O to do.  Also, this won't be happening for "every" tuple,
> just those exceeding a size threshold --- we'd be able to tune the
> threshold value to trade off speed and space.

    Right, this time it's your good point. All of the problems
    will be there for the tuple split implementation anyway.

    The major problem I see is that a palloc()'d tuple has to be
    pfree()'d after the fetcher is done with it. Since tuples are
    actually in the shared buffer today, the fetcher doesn't have
    to care.

> One thing that does occur to me is that we need to store the
> uncompressed as well as the compressed data size, so that the
> working space can be palloc'd before starting the decompression.

    Yepp - and I'm doing so. Only during compression the result
    size isn't known in advance. But there is a well-known maximum,
    that is the header overhead plus the data size times 1.125 plus
    2 bytes (the total worst case on uncompressable data). And a
    general mechanism working on the tuple level would fall back to
    storing uncompressed data in case the compressed size is bigger.
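
    Where the 1.125 comes from: in the worst case every item is a
    literal, so each group of 8 input bytes costs 8 literal bytes
    plus 1 control byte, i.e. 9/8 of the input. A safe output-buffer
    bound along those lines (the header size and the "+ 2" slack
    follow the figures above) might be written as:

/* a sketch of the worst-case output size; hdrsz is whatever fixed header
 * the stored format carries in front of the compressed data */
#define LZ_MAX_OUTPUT(rawsize, hdrsz) \
    ((hdrsz) + (rawsize) + ((rawsize) + 7) / 8 + 2)

/* e.g. an 8000-byte uncompressable value: 8000 + 1000 + 2 = 9002 bytes
 * of data, plus the header */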

> Also, in case it wasn't clear, I was envisioning leaving the tuple
> header uncompressed, so that time quals etc can be checked before
> decompressing the tuple data.

    Of course.

    Well, you asked for the rates on the smaller html files only.
    78  files,  131  bytes  min, 10000 bytes max, 4582 bytes avg,
    357383 bytes total.

    gzip -9 outputs 145659 bytes (59.2%)
    gzip -1 outputs 155113 bytes (56.6%)
    my code outputs 184109 bytes (48.5%)

    67 files, 2000 bytes min, 10000 bytes max,  5239  bytes  avg,
    351006 bytes total.

    gzip -9 outputs 141772 bytes (59.6%)
    gzip -1 outputs 151150 bytes (56.9%)
    my code outputs 179428 bytes (48.9%)

    The  threshold will surely be a tuning parameter of interest.
    Another tuning option must be to allow/deny  compression  per
    table  at  all.   Then  we  could  have both options, using a
    compressing field type to define which portion of a tuple  to
    compress, or allow to compress the entire tuples.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] compression in LO and other fields

From
"Ross J. Reedstrom"
Date:
On Fri, Nov 12, 1999 at 10:16:21AM +0100, Karel Zak - Zakkr wrote:
> 
> 
> Other possible compression questions:
> 
> * MySQL's dump allows making a compressed dump file, which is good - what about PostgreSQL?
> 

pg_dump database | gzip >db.out.gz

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005


Re: [HACKERS] compression in LO and other fields

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>     The major problem I see is that a palloc()'d tuple should  be
>     pfree()'d  after  the fetcher is done with it. Since they are
>     in buffer actually, the fetcher doesn't have to care.

I think this may not be as big a problem as it looks.  Most places
in the executor keep tuples in TupleTableSlots, which are responsible
for pfree'ing the tuple if (and only if) necessary; all that code is
ready for this change already.  There are probably some routines in
heapam/indexam that assume they only work with tuples that never need
to be freed, but I don't think the fixes will be pervasive.  And we're
going to have to do that work in any case to support big tuples
(assuming we do it by splitting tuples into segments that fit in disk
pages).

>     And a general
>     mechanism working on the tuple level would fallback to  store
>     uncompressed  data in the case the compressed size is bigger.

Right.  Another possible place for speed-vs-space tuning would be to
store the uncompressed representation unless the compressed version is
at least X percent smaller, not just at-least-one-byte smaller.
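
As a sketch of that knob (the 25% figure below is an arbitrary example,
not a proposal):

#include "postgres.h"

#define MIN_COMPRESSION_SAVING 25       /* percent; example value only */

/* keep the compressed form only if it is at least the minimum share smaller
 * (ignores int32 overflow for very large values in this sketch) */
static bool
worth_storing_compressed(int32 rawlen, int32 complen)
{
    return complen * 100 <= rawlen * (100 - MIN_COMPRESSION_SAVING);
}
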
        regards, tom lane


Re: [HACKERS] compression in LO and other fields

From
Karel Zak - Zakkr
Date:

On Fri, 12 Nov 1999, Ross J. Reedstrom wrote:

> On Fri, Nov 12, 1999 at 10:16:21AM +0100, Karel Zak - Zakkr wrote:
> > 
> > 
> > Other possible compression questions:
> > 
> > * MySQL's dump allows making a compressed dump file, which is good - what about PostgreSQL?
> > 
> 
> pg_dump database | gzip >db.out.gz

Thank... :-)) 
But mysqldump --compress is very nice.
                    Karel



Re: [HACKERS] compression in LO and other fields

From
Bruce Momjian
Date:
> >     Bruce suggested the contrib area, but I'm not sure if  that's
> >     the right place. If it goes into the distribution at all, I'd
> >     like to use this data type for rule plan strings and function
> >     source text in the system catalogs.
> 
> Right, if we are going to bother with it at all, we should put it
> into the core so that we can use it for rule plans.

Agreed.  I suggested contrib so the code doesn't just disappear.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: [HACKERS] compression in LO and other fields

From
Don Baccus
Date:
At 05:15 PM 11/12/99 +0100, Karel Zak - Zakkr wrote:

>> pg_dump database | gzip >db.out.gz

> Thank... :-)) 

> But mysqldump --compress is very nice.

Why add functionality for something that can be done so
easily by piping the output of pg_dump?

This is exactly the kind of coupling of tools that pipes
were invented for.



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service
  and other goodies at http://donb.photo.net.
 


Re: [HACKERS] compression in LO and other fields

From
Hannu Krosing
Date:
Jan Wieck wrote:
> 
>     Of course.
> 
>     Well, you asked for the rates on the smaller html files only.
>     78  files,  131  bytes  min, 10000 bytes max, 4582 bytes avg,
>     357383 bytes total.
> 
>     gzip -9 outputs 145659 bytes (59.2%)
>     gzip -1 outputs 155113 bytes (56.6%)
>     my code outputs 184109 bytes (48.5%)
> 
>     67 files, 2000 bytes min, 10000 bytes max,  5239  bytes  avg,
>     351006 bytes total.
> 
>     gzip -9 outputs 141772 bytes (59.6%)
>     gzip -1 outputs 151150 bytes (56.9%)
>     my code outputs 179428 bytes (48.9%)
> 
>     The  threshold will surely be a tuning parameter of interest.
>     Another tuning option must be to allow/deny  compression  per
>     table  at  all.   Then  we  could  have both options, using a
>     compressing field type to define which portion of a tuple  to
>     compress, or allow to compress the entire tuples.

The next step would be tweaking the costs for sequential scans vs.
index scans.

I guess that the indexes would stay uncompressed ?

------
Hannu


Re: [HACKERS] compression in LO and other fields

From
Hannu Krosing
Date:
Don Baccus wrote:
> 
> At 05:15 PM 11/12/99 +0100, Karel Zak - Zakkr wrote:
> 
> >> pg_dump database | gzip >db.out.gz
> 
> > Thank... :-))
> 
> > But mysqldump --compress is very nice.
> 
> Why add functionality for something that can be done so
> easily by piping the output of pg_dump?
> 
> This is exactly the kind of coupling of tools that pipes
> were invented for.

Exactly !

Another version of the same is used when dumping databases 
bigger than 2GB (or whatever the file system size limit is)

just do:

pg_dump database | gzip | split -b 1000000000 - db.out.gz.

and restore it using

cat db.out.gz.* | gunzip | psql database

you could also do other fancy things with your dumps - send 
them directly to tape storage in Japan or whatever ;)

If you need that functionality often enough then write a shell 
script.

---------------
Hannu


Re: [HACKERS] compression in LO and other fields

From
wieck@debis.com (Jan Wieck)
Date:
>
> Jan Wieck wrote:
> >
> >     Of course.
> >
> >     Well, you asked for the rates on the smaller html files only.
> >     78  files,  131  bytes  min, 10000 bytes max, 4582 bytes avg,
> >     357383 bytes total.
> >
> >     gzip -9 outputs 145659 bytes (59.2%)
> >     gzip -1 outputs 155113 bytes (56.6%)
> >     my code outputs 184109 bytes (48.5%)
> >
> >     67 files, 2000 bytes min, 10000 bytes max,  5239  bytes  avg,
> >     351006 bytes total.
> >
> >     gzip -9 outputs 141772 bytes (59.6%)
> >     gzip -1 outputs 151150 bytes (56.9%)
> >     my code outputs 179428 bytes (48.9%)
> >
> >     The  threshold will surely be a tuning parameter of interest.
> >     Another tuning option must be to allow/deny  compression  per
> >     table  at  all.   Then  we  could  have both options, using a
> >     compressing field type to define which portion of a tuple  to
> >     compress, or allow to compress the entire tuples.
>
> The next step would be tweaking the costs for sequential scans vs.
> index scans.
>
> I guess that the indexes would stay uncompressed ?
>
> ------
> Hannu
>


--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] compression in LO and other fields

From
The Hermit Hacker
Date:
On Fri, 12 Nov 1999, Tom Lane wrote:

> wieck@debis.com (Jan Wieck) writes:
> > Tom Lane wrote:
> >> It occurred to me last night that applying compression to individual
> >> fields might not be the best approach.  Certainly a "bytez" data type
> >> is the easiest thing to fit into the existing system, but it's leaving
> >> some space savings on the table.  What about compressing the *whole*
> >> data contents of a tuple on-disk, as a single entity?  That should save
> >> more space than field-by-field compression.
> 
> >     But  it requires decompression of every tuple into palloc()'d
> >     memory during heap access. AFAIK, the  heap  access  routines
> >     currently  return  a  pointer  to  the  tuple  inside the shm
> >     buffer. Don't know what it's performance impact would be.
> 
> Good point, but the same will be needed when a tuple is split across
> multiple blocks.  I would expect that (given a reasonably fast
> decompressor) there will be a net performance *gain* due to having
> less disk I/O to do.  
Right now, we're dealing in theory... my concern is what Jan points
out, "what its performance impact would be"... how much harder would it
be to extend our "CREATE TABLE" syntax to do something like:

CREATE TABLE classname ( .. ) compressed;

Or something similar?  Something that leaves the ability to do
this in the core, but makes the use of this the choice of the admin?

*Assuming* that I'm also reading this thread correctly, it should
almost be extended into "ALTER TABLE classname SET COMPRESSED on;", or
something like that.  Where all new records are *written* compressed (or
uncompressed), but any read checks whether compressed size == uncompressed
size, and decompresses accordingly...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 




Re: [HACKERS] compression in LO and other fields

From
wieck@debis.com (Jan Wieck)
Date:
Ech - wrong key :-)

Hannu Krosing wrote:

> Jan Wieck wrote:
> >
>
> The next step would be tweaking the costs for sequential scans vs.
> index scans.
>
> I guess that the indexes would stay uncompressed ?

    I'm  sure  about  this.  On  a  database of significant size,
    anyone indexing a field with a possible size over  100  bytes
    is  doing  something  wrong  (and  only  idiots  go above 500
    bytes). They are IMPLEMENTING a not well thought out database
    DESIGN.  A  database  engine should support indices on bigger
    fields, but it's still a bad schema and thus idiotic.

    Currently, we don't check the size of indexed fields. And the
    only problems I've seen with it were some reports that huge
    PL functions could not be created because there was an unused
    (idiotic) index on the prosrc attribute and they exceeded the
    4K limit for index tuples. I've removed this index already in
    the  v7.0 tree. The ?bug? in the btree code, failing to split
    a page if the key values exceed 4K, is  still  there.  But  I
    don't think anyone really cares for it.

    Thus,  I  assume  there  aren't  many idiots out there. And I
    don't expect that anyone would ever  create  an  index  on  a
    compressed data type.

    ?bug?  ->  The  difference  between  a  bug  and a feature is
    DOCUMENTATION.  Thomas, would you please add  this  limit  on
    index tuples to the doc's so we have a new FEATURE to tell in
    the v7.0 announcement?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] compression in LO and other fields

From
wieck@debis.com (Jan Wieck)
Date:
Marc G. Fournier wrote:

> On Fri, 12 Nov 1999, Tom Lane wrote:
> > wieck@debis.com (Jan Wieck) writes:
> > > Tom Lane wrote:

>    Right now, we're dealing in theory... my concern is what Jan points
> out, "what its performance impact would be"... how much harder would it
> be to extend our "CREATE TABLE" syntax to do something like:
>
> CREATE TABLE classname ( .. ) compressed;
>
>    Or something similar?  Something that leaves the ability to do
> this in the core, but makes the use of this the choice of the admin?

    Yepp,   exactly   that's  what  I  meant  with  making  tuple
    compression a per table option. Obviously,  ALTER  TABLE  ...
    must  be supported too - that's simply a parser -> utility ->
    flip flag in pg_class thing (90% cut&paste).

    I think the part  on  deciding  what  to  compress  is  easy,
    because  the  flag  telling  if  heap  access  should  try to
    compress a tuple on append (i.e. INSERT or UPDATE) has to  be
    in pg_class. And the content of a relations pg_class entry is
    somewhere below the Relation struct (thus already known after
    heap_open).

    The  idea  was to use another bit in the tuple header to tell
    if an existing heap tuple's data is compressed or not. So the
    heap  fetching  always  looks at the bit in the tuple header,
    and the heap appending looks at  the  flag  in  the  relation
    pointer. That's exactly what you want, no?

    The  major  part  is  to make all callers of heap_fetch() and
    sisters treat in memory decompressed  (or  from  block  split
    reconstructed) tuples the right way.
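
    The shape of that two-level decision, as a self-contained
    sketch; the struct fields, the flag name and the helpers are
    all invented here, since nothing of the kind exists in pg_class
    or the tuple header yet:

#include <stdbool.h>

#define HEAP_COMPRESSED 0x0800      /* assumed spare tuple-header bit */

typedef struct SketchRelation { bool relcompress; } SketchRelation;      /* stands in for a pg_class flag  */
typedef struct SketchTuple    { unsigned short infomask; } SketchTuple;  /* stands in for the tuple header */

/* append path (INSERT/UPDATE): the per-relation flag only decides whether
 * compression is attempted at all */
static bool
should_try_compression(const SketchRelation *rel, int datalen, int threshold)
{
    return rel->relcompress && datalen > threshold;
}

/* fetch path: only the per-tuple bit matters, so tuples written before an
 * ALTER TABLE flipped the relation flag are still read correctly */
static bool
is_stored_compressed(const SketchTuple *tup)
{
    return (tup->infomask & HEAP_COMPRESSED) != 0;
}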


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] compression in LO and other fields

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>     The  idea  was to use another bit in the tuple header to tell
>     if an existing heap tuple's data is compressed or not. So the
>     heap  fetching  allways looks at the bit in the tuple header,
>     and the heap appending looks at  the  flag  in  the  relation
>     pointer. That's exactly what you want, no?

Right.  Compressed tuples must be unambiguously marked as such on-disk.
Whether to compress a tuple when writing it out is a decision that
can be made on-the-fly, using strategies that could change from time
to time, without invalidating the data that's already out there or
affecting the tuple-reading code.

If we choose to provide also a way of compressing individual fields
rather than whole tuples, it would be good to provide the same
flexibility at the field level.  Some tuples might contain the field
in compressed form, some in uncompressed form.  The reading logic
should not need to be aware of the way that the writing logic chooses
which to do.
        regards, tom lane


Re: [HACKERS] compression in LO and other fields

From
inoue
Date:

Tom Lane wrote:

> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >> LO is a dead end.  What we really want to do is eliminate tuple-size
> >> restrictions and then have large ordinary fields (probably of type
> >> bytea) in regular tuples.  I'd suggest working on compression in that
> >> context, say as a new data type called "bytez" or something like that.
>
> > It sounds ideal but I remember that Vadim said inserting a 2GB record
> > is not good idea since it will be written into the log too. If it's a
> > necessary limitation from the point of view of WAL, we have to accept
> > it, I think.
>
> LO won't make that any better: the data still goes into a table.
> You'd have 2GB worth of WAL entries either way.
>
> The only thing LO would do for you is divide the data into block-sized
> tuples, so there would be a bunch of little WAL entries instead of one
> big one.  But that'd probably be easy to duplicate too.  If we implement
> big tuples by chaining together disk-block-sized segments, which seems
> like the most likely approach, couldn't WAL log each segment as a
> separate log entry?  If so, there's almost no difference between LO and
> inline field for logging purposes.
>

I don't know LO well.
But it seems LO allows partial updates.
Big tuples
If so, isn't it a significant difference?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: [HACKERS] compression in LO and other fields

From
inoue
Date:
Sorry,
I sent a mail by mistake.
Ignore my previous mail.

inoue wrote:

> Tom Lane wrote:
>
> > Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > >> LO is a dead end.  What we really want to do is eliminate tuple-size
> > >> restrictions and then have large ordinary fields (probably of type
> > >> bytea) in regular tuples.  I'd suggest working on compression in that
> > >> context, say as a new data type called "bytez" or something like that.
> >
> > > It sounds ideal but I remember that Vadim said inserting a 2GB record
> > > is not good idea since it will be written into the log too. If it's a
> > > necessary limitation from the point of view of WAL, we have to accept
> > > it, I think.
> >
> > LO won't make that any better: the data still goes into a table.
> > You'd have 2GB worth of WAL entries either way.
> >
> > The only thing LO would do for you is divide the data into block-sized
> > tuples, so there would be a bunch of little WAL entries instead of one
> > big one.  But that'd probably be easy to duplicate too.  If we implement
> > big tuples by chaining together disk-block-sized segments, which seems
> > like the most likely approach, couldn't WAL log each segment as a
> > separate log entry?  If so, there's almost no difference between LO and
> > inline field for logging purposes.
> >
>
> I don't know LO well.
> But seems LO allows partial update.
> Big tuples
> If so,isn't it a significant difference ?
>
> Regards.
>
> Hiroshi Inoue
> Inoue@tpf.co.jp