Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Date	January 6, 2009 13:57:25
Msg-id	603c8f070901060657k40de254ew53f510e6b5a0b2dd@mail.gmail.com
In response to	Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) (Gregory Stark <stark@enterprisedb.com>)
Responses	Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) (Alvaro Herrera <alvherre@commandprompt.com>)
List	pgsql-hackers

>>> not compressing very small datums (< 256 bytes) also seems smart,
>>> since that could end up producing a lot of extra compression attempts,
>>> most of which will end up saving little or no space.
>
> That was presumably the rationale for the original logic. However experience
> shows that there are certainly databases that store a lot of compressible
> short strings.
>
> Obviously databases with CHAR(n) desperately need us to compress them. But
> even plain text data are often moderately compressible even with our fairly
> weak compression algorithm.
>
> One other thing that bothers me about our toast mechanism is that it only
> kicks in for tuples that are "too large". It seems weird that the same column
> is worth compressing or not depending on what other columns are in the same
> tuple.

That's a fair point.  There's definitely some inconsistency in the
current behavior.  It seems to me that, in theory, compression and
out-of-line storage are two separate behaviors.  Out-of-line storage
is pretty much a requirement for dealing with large objects, given
that the page size is a constant; compression is not a requirement,
but definitely beneficial under some circumstances, particularly when
it removes the need for out-of-line storage.
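
To make the "two separate behaviors" idea concrete, here is a rough sketch in C.  It is not the actual toast code; the names StorageChoice, choose_storage, and inline_limit are hypothetical, and it assumes the caller already knows what size the compressor would produce.  The point is only that the compression decision and the out-of-line decision can be made independently, with the second looking at whatever size survives the first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: the two decisions are recorded separately. */
typedef struct StorageChoice
{
    bool        compress;       /* store the compressed form? */
    bool        out_of_line;    /* move the value out of the main tuple? */
} StorageChoice;

/*
 * Hypothetical helper, for illustration only.
 *
 * raw_len        size of the datum as supplied
 * compressed_len size the compressor would produce (assumed already known)
 * inline_limit   largest size we are willing to keep inline (an assumption,
 *                not a real PostgreSQL setting)
 */
static StorageChoice
choose_storage(size_t raw_len, size_t compressed_len, size_t inline_limit)
{
    StorageChoice choice = {false, false};
    size_t      stored_len = raw_len;

    /* Decision 1: compress whenever it actually saves space. */
    if (compressed_len < raw_len)
    {
        choice.compress = true;
        stored_len = compressed_len;
    }

    /* Decision 2: go out of line only if the stored form still won't fit. */
    if (stored_len > inline_limit)
        choice.out_of_line = true;

    return choice;
}
```

With an assumed inline_limit of 2000, a 3000-byte datum that compresses to 1200 bytes comes back as compress = true, out_of_line = false, which is the case where compression removes the need for out-of-line storage.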

char(n) is kind of a weird case because you could also compress by
storing a count of the trailing spaces, without applying a
general-purpose compression algorithm.  But either way the field is no
longer fixed-width, and therefore field access can't be done as a
simple byte offset from the start of the tuple.
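
A minimal sketch of the trailing-space idea, in C.  The function names strip_trailing_spaces and restore_padding are made up for illustration and are not backend functions; the sketch also assumes single-byte characters and that the declared width n is known from the column definition.  Only the unpadded prefix and its length would be stored, which is exactly why the stored field stops being fixed-width.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the length of a char(n) value once its
 * trailing spaces are dropped.  Only this many bytes need to be stored.
 */
static size_t
strip_trailing_spaces(const char *data, size_t n)
{
    while (n > 0 && data[n - 1] == ' ')
        n--;
    return n;
}

/*
 * Hypothetical helper: rebuild the full n-byte, space-padded value from
 * the stored prefix.  dst must have room for n bytes, and stored_len must
 * not exceed n.
 */
static void
restore_padding(char *dst, const char *stored, size_t stored_len, size_t n)
{
    memcpy(dst, stored, stored_len);
    memset(dst + stored_len, ' ', n - stored_len);
}
```

For a char(10) value holding "abc" plus seven spaces, strip_trailing_spaces reports 3, and restore_padding with n = 10 reproduces the original ten bytes.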

It's difficult even to enumerate the possible use cases, let alone
what knobs would be needed to cater to all of them.

...Robert
