Re: Compressed binary field - Mailing list pgsql-general

From Jeff Janes
Subject Re: Compressed binary field
Msg-id CAMkU=1yENbW+cjA-L8Najitv=E-7Bqa4re1Uuamujcgd+OTpNg@mail.gmail.com
In response to Re: Compressed binary field  (Edson Richter <edsonrichter@hotmail.com>)
Responses Re: Compressed binary field  (Edson Richter <edsonrichter@hotmail.com>)
List pgsql-general
On Tue, Sep 11, 2012 at 9:34 AM, Edson Richter <edsonrichter@hotmail.com> wrote:
>
> No, there is no problem. Just trying to reduce database size by forcing
> these fields to compress.
> Actual database size = 8 GB
> Backup size = 1.6 GB (5x smaller)
>
> Seems to me (IMHO) that there is room for improvement in database storage
> (we don't have many indexes, and the biggest tables are just the ones with
> bytea fields). That's why I've asked for expert counsel.

There are two things to keep in mind.  One is that each datum is
compressed separately, so redundancy that occurs between fields of
different tuples, but not within any single tuple, is not available to
TOAST compression, though it is available to the compression of a
dump file.
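
As a quick illustration (just a sketch, the scratch table is made up):
pg_column_size() reports a datum's on-disk size after any TOAST
compression, so you can watch two identical rows each pay the full
storage cost that a gzipped dump would pay only once:

    CREATE TEMP TABLE blobs (doc bytea);

    -- Store the same ~300 kB value twice; each copy is compressed on its own.
    INSERT INTO blobs
      SELECT convert_to(repeat('some repetitive payload ', 12500), 'UTF8')
      FROM generate_series(1, 2);

    SELECT octet_length(doc)   AS raw_bytes,
           pg_column_size(doc) AS stored_bytes
    FROM blobs;
    -- Both rows report the same stored_bytes: the redundancy *between*
    -- rows is invisible to TOAST, but "pg_dump | gzip" can exploit it.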

Another thing is that PG's TOAST compression was designed to be simple,
fast, and patent free, and often it is not all that good.  It is quite
good if you have long stretches of a single repeated character, or
exact, densely spaced repeats of a sequence of characters
("123123123123123..."), but when the redundancy is less simple it does
a much worse job than, for example, gzip does.
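
To get a feel for this on your own server (again just a sketch, names
made up), compare a dense repeat against effectively random bytes:

    CREATE TEMP TABLE cmp (label text, v bytea);

    -- A dense exact repeat: pglz handles this very well.
    INSERT INTO cmp
      VALUES ('repeat', convert_to(repeat('123', 100000), 'UTF8'));

    -- Pseudo-random bytes: essentially incompressible, so TOAST stores
    -- them uncompressed after the attempt fails.
    INSERT INTO cmp
      SELECT 'random', decode(string_agg(md5(i::text), ''), 'hex')
      FROM generate_series(1, 20000) AS g(i);

    SELECT label,
           octet_length(v)   AS raw_bytes,
           pg_column_size(v) AS stored_bytes
    FROM cmp;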

It is possible, though unlikely, that there is a bug somewhere; most
likely your documents just aren't very compressible with pglz_compress.
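
If that is the diagnosis, you can at least skip the futile compression
attempt so no CPU is wasted on it; the column storage strategy controls
this (table and column names below are hypothetical):

    -- EXTERNAL allows out-of-line (TOAST) storage but disables compression;
    -- EXTENDED, the bytea default, allows both.
    ALTER TABLE documents ALTER COLUMN doc SET STORAGE EXTERNAL;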

Cheers,

Jeff

