Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Msg-id 603c8f070901060657k40de254ew53f510e6b5a0b2dd@mail.gmail.com
In response to Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
>>> not compressing very small datums (< 256 bytes) also seems smart,
>>> since that could end up producing a lot of extra compression attempts,
>>> most of which will end up saving little or no space.
>
> That was presumably the rationale for the original logic. However experience
> shows that there are certainly databases that store a lot of compressible
> short strings.
>
> Obviously databases with CHAR(n) desperately need us to compress them. But
> even plain text data are often moderately compressible even with our fairly
> weak compression algorithm.
>
> One other thing that bothers me about our toast mechanism is that it only
> kicks in for tuples that are "too large". It seems weird that the same column
> is worth compressing or not depending on what other columns are in the same
> tuple.

That's a fair point.  There's definitely some inconsistency in the
current behavior.  It seems to me that, in theory, compression and
out-of-line storage are two separate behaviors.  Out-of-line storage
is pretty much a requirement for dealing with large objects, given
that the page size is a constant; compression is not a requirement,
but definitely beneficial under some circumstances, particularly when
it removes the need for out-of-line storage.
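
To make the "two separate behaviors" idea concrete, here is a minimal sketch of a per-column storage decision that looks only at the column's own size, never at the rest of the tuple. It first decides whether compression is worth it, then decides whether the (possibly compressed) result still needs out-of-line storage. The names plan_storage and StoragePlan and both thresholds are hypothetical, chosen for illustration only. They are not PostgreSQL's actual TOAST code or constants.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thresholds, for illustration only (not PostgreSQL's values). */
#define MAX_INLINE_DATUM 2000  /* largest datum allowed to stay in the tuple */
#define MIN_SAVINGS_PCT  10    /* compress only if this many percent is saved */

typedef enum
{
    STORE_PLAIN,               /* inline, uncompressed */
    STORE_COMPRESSED,          /* inline, compressed */
    STORE_EXTERNAL,            /* out of line, uncompressed */
    STORE_COMPRESSED_EXTERNAL  /* out of line, compressed */
} StoragePlan;

/*
 * Decide how to store one datum, given its raw length and the length it
 * would have after compression.  The two choices are made independently:
 * compression is judged on its savings alone, and out-of-line storage is
 * judged on the size that would actually be stored.
 */
static StoragePlan plan_storage(size_t raw_len, size_t compressed_len)
{
    bool worth_compressing =
        compressed_len * 100 <= raw_len * (100 - MIN_SAVINGS_PCT);
    size_t stored_len = worth_compressing ? compressed_len : raw_len;
    bool needs_external = stored_len > MAX_INLINE_DATUM;

    if (worth_compressing)
        return needs_external ? STORE_COMPRESSED_EXTERNAL : STORE_COMPRESSED;
    return needs_external ? STORE_EXTERNAL : STORE_PLAIN;
}
```

With these made-up numbers, a 5000-byte value that compresses to 1500 bytes stays inline as STORE_COMPRESSED, which is the case where compression "removes the need for out-of-line storage"; the same column gets the same answer no matter what other columns share the tuple.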

char(n) is kind of a weird case because you could also compress by
storing a count of the trailing spaces, without applying a
general-purpose compression algorithm.  But either way the field is no
longer fixed-width, and therefore field access can't be done as a
simple byte offset from the start of the tuple.
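
A minimal sketch of the trailing-space idea, assuming a hypothetical on-disk layout of one byte holding the trailing-space count followed by the significant prefix (so it only works for n <= 255). The function names are made up for illustration. It also shows the cost mentioned above: since each encoded field has its own length, finding field k means walking the fields before it instead of computing a fixed offset.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical encoding for a blank-padded char(n) value (n <= 255):
 *   byte 0      : number of trailing spaces stripped
 *   bytes 1..m  : the significant prefix (m = n - trailing spaces)
 * Returns the number of bytes written to out.
 */
static size_t charn_encode(const char *padded, size_t n, unsigned char *out)
{
    size_t len = n;

    while (len > 0 && padded[len - 1] == ' ')
        len--;
    out[0] = (unsigned char) (n - len);  /* count of trailing spaces */
    memcpy(out + 1, padded, len);        /* significant prefix only */
    return 1 + len;
}

/* Rebuild the full n-byte padded value; returns bytes consumed from in. */
static size_t charn_decode(const unsigned char *in, size_t n, char *padded)
{
    size_t len = n - in[0];

    memcpy(padded, in + 1, len);
    memset(padded + len, ' ', in[0]);    /* restore the trailing spaces */
    return 1 + len;
}

/*
 * Byte offset of the k-th encoded char(n) field in a row of such fields.
 * Every field's size depends on its contents, so we must walk the earlier
 * fields; there is no fixed k * width shortcut any more.
 */
static size_t charn_field_offset(const unsigned char *row, size_t n, size_t k)
{
    size_t off = 0;

    for (size_t i = 0; i < k; i++)
        off += 1 + (n - row[off]);
    return off;
}
```

For char(10), "abc" padded to ten bytes encodes to 4 bytes and "hello" to 6, so the second field starts at offset 4 rather than at a fixed offset of 10.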

It's difficult even to enumerate the possible use cases, let alone
what knobs would be needed to cater to all of them.

...Robert

