Re: Repair cosmetic damage (done by pg_indent?) - Mailing list pgsql-patches

From Gregory Stark
Subject Re: Repair cosmetic damage (done by pg_indent?)
Date
Msg-id 87myxfqxit.fsf@oxford.xeocode.com
In response to Re: Repair cosmetic damage (done by pg_indent?)  (Decibel! <decibel@decibel.org>)
Responses Re: Repair cosmetic damage (done by pg_indent?)
Re: Repair cosmetic damage (done by pg_indent?)
List pgsql-patches
"Decibel!" <decibel@decibel.org> writes:

> On Fri, Jul 27, 2007 at 04:07:01PM +0100, Gregory Stark wrote:
>> Fwiw, do we really not want to compress anything smaller than 256 bytes
>> (everyone in Postgres uses the default strategy, not the always strategy).
>
> Is there actually a way to specify always compressing? I'm not seeing it
> on http://www.postgresql.org/docs/8.2/interactive/storage-toast.html

In the code there's an "always" strategy, but nothing in Postgres uses it so
there's no way to set it using ALTER TABLE ... SET STORAGE.

That might be an interesting approach though. We could add another SET STORAGE
value "COMPRESSIBLE" which says to use the always strategy. The neat thing
about this is we could set bpchar to use this storage type by default.
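
To make the difference between the two strategies concrete, here is a minimal sketch. The struct and function names are hypothetical and are not the ones in pg_lzcompress.c; the only values taken from this thread are the 256-byte floor of the default strategy and the idea that an "always" strategy has no such floor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a compression strategy. */
typedef struct SketchStrategy
{
    size_t min_input_size;  /* inputs shorter than this are never compressed */
} SketchStrategy;

/* Mirrors the 256-byte floor discussed in this thread. */
static const SketchStrategy sketch_default = { 256 };

/* An "always"-style strategy: no minimum size at all. */
static const SketchStrategy sketch_always = { 0 };

/* Decide whether compression should even be attempted for this input. */
static bool
sketch_should_try_compress(const SketchStrategy *strategy, size_t input_len)
{
    assert(strategy != NULL);
    return input_len > 0 && input_len >= strategy->min_input_size;
}
```

With these assumed values a 100-byte datum is skipped under sketch_default but attempted under sketch_always, which is roughly what a "COMPRESSIBLE" storage setting would change.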

It's sort of sad that we're storing the extra padding bytes at all. But I
don't see a way around it. I would really prefer to strip them on input and
add them on output, but then we're back to the issue of possibly having lost
the typmod at some point along the way.

>> ISTM that with things like CHAR(n) around we might very well have some
>> databases where compression for smaller sized datums would be beneficial. I
>> would suggest 32 for the minimum.
>
> CPU is generally cheaper than IO now-a-days, so I agree with something
> less than 256. Not sure what would be best though.

Well, it depends a lot on how large your database is. If your whole database
fits in RAM and you use datatypes like CHAR(n) only for storing data which is
exactly n characters long, then there's really no benefit to trying to compress
smaller data.

If on the other hand your database is heavily I/O-bound and you're using
CHAR(n) or storing other highly repetitive short strings then compressing data
will save I/O bandwidth at the expense of cpu cycles.
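
As a rough way to see how much of a CHAR(n) value is padding, here is a small sketch that counts the trailing spaces in a fixed-width buffer. The function name is made up for illustration and this is not how Postgres inspects bpchar values; it only shows the kind of repetitive run a compressor could shrink.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the trailing space padding in a fixed-width, CHAR(n)-style buffer.
 * "width" is the declared length n; the buffer need not be NUL-terminated.
 */
static size_t
sketch_trailing_pad(const char *buf, size_t width)
{
    size_t pad = 0;

    assert(buf != NULL || width == 0);
    while (pad < width && buf[width - 1 - pad] == ' ')
        pad++;
    return pad;
}
```

For example, a 3-character string stored in a CHAR(10) column carries 7 padding bytes, so most of the stored value is a single repeated byte.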

> I do have a database that has both user-entered information as well as
> things like email addresses, so I could do some testing on that if
> people want.

You would have to recompile with the value at line 214 of
src/backend/utils/adt/pg_lzcompress.c set to a lower value.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

