Re: Compressed binary field - Mailing list pgsql-general

From Jeff Janes
Subject Re: Compressed binary field
Msg-id CAMkU=1yENbW+cjA-L8Najitv=E-7Bqa4re1Uuamujcgd+OTpNg@mail.gmail.com
In response to Re: Compressed binary field  (Edson Richter <edsonrichter@hotmail.com>)
Responses Re: Compressed binary field  (Edson Richter <edsonrichter@hotmail.com>)
List pgsql-general
On Tue, Sep 11, 2012 at 9:34 AM, Edson Richter <edsonrichter@hotmail.com> wrote:
>
> No, there is no problem. Just trying to reduce database size by forcing
> these fields to compress.
> Actual database size = 8 GB
> Backup size = 1.6 GB (5x smaller)
>
> Seems to me (IMHO) that there is room for improvement in database storage
> (we don't have many indexes, and the biggest tables are just the ones with
> bytea fields). That's why I've asked for expert counsel.

There are two things to keep in mind.  One is that each datum is
compressed separately, so redundancy that occurs between fields of
different tuples, but not within any single tuple, is not available to
TOAST compression, though it is available to the compression of a
dump file.
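
As a quick illustration (just a sketch, the scratch table is made up):
pg_column_size() reports a datum's on-disk size after any TOAST
compression, so you can watch two identical rows each pay the full
storage cost that a gzipped dump would pay only once:

    CREATE TEMP TABLE blobs (doc bytea);

    -- Store the same ~300 kB value twice; each copy is compressed on its own.
    INSERT INTO blobs
      SELECT convert_to(repeat('some repetitive payload ', 12500), 'UTF8')
      FROM generate_series(1, 2);

    SELECT octet_length(doc)   AS raw_bytes,
           pg_column_size(doc) AS stored_bytes
    FROM blobs;
    -- Both rows report the same stored_bytes: the redundancy *between*
    -- rows is invisible to TOAST, but "pg_dump | gzip" can exploit it.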

Another thing is that PG's TOAST compression was designed to be simple,
fast, and patent free, and often it is not all that good.  It is quite
good if you have long stretches of a single repeated character, or
exact, densely spaced repeats of a sequence of characters
("123123123123123..."), but when the redundancy is less simple it does
a much worse job than, for example, gzip does.
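
To get a feel for this on your own server (again just a sketch, names
made up), compare a dense repeat against effectively random bytes:

    CREATE TEMP TABLE cmp (label text, v bytea);

    -- A dense exact repeat: pglz handles this very well.
    INSERT INTO cmp
      VALUES ('repeat', convert_to(repeat('123', 100000), 'UTF8'));

    -- Pseudo-random bytes: essentially incompressible, so TOAST stores
    -- them uncompressed after the attempt fails.
    INSERT INTO cmp
      SELECT 'random', decode(string_agg(md5(i::text), ''), 'hex')
      FROM generate_series(1, 20000) AS g(i);

    SELECT label,
           octet_length(v)   AS raw_bytes,
           pg_column_size(v) AS stored_bytes
    FROM cmp;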

It is possible, though unlikely, that there is a bug somewhere; most
likely your documents just aren't very compressible with pglz_compress.
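
If that is the diagnosis, you can at least skip the futile compression
attempt so no CPU is wasted on it; the column storage strategy controls
this (table and column names below are hypothetical):

    -- EXTERNAL allows out-of-line (TOAST) storage but disables compression;
    -- EXTENDED, the bytea default, allows both.
    ALTER TABLE documents ALTER COLUMN doc SET STORAGE EXTERNAL;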

Cheers,

Jeff

