Thread: Compression in PG

Compression in PG

From
Shaul Dar
Date:
Hi,

I have several long text fields in my DB that I would love to compress (descriptions, URLs etc). Can you tell me what options exists in PG (+pointers please), typical effect on space and run time?

Thanks,

-- Shaul

Re: Compression in PG

From
Pavel Stehule
Date:
2009/11/1 Shaul Dar <shauldar@gmail.com>:
> Hi,
>
> I have several long text fields in my DB that I would love to compress
> (descriptions, URLs etc). Can you tell me what options exists in PG
> (+pointers please), typical effect on space and run time?

Hello

You can do nothing. PostgreSQL compresses data automatically

http://www.postgresql.org/docs/8.4/interactive/storage-toast.html

Regards
Pavel Stehule
>
> Thanks,
>
> -- Shaul
>

Re: Compression in PG

From
Jesper Krogh
Date:
Shaul Dar wrote:
> Hi,
>
> I have several long text fields in my DB that I would love to compress
> (descriptions, URLs etc). Can you tell me what options exists in PG
> (+pointers please), typical effect on space and run time?

variable length text fields .. e.g TEXT will automatically be stored in
a TOAST table and compressed. Search the manual for toast.

--
Jesper

Re: Compression in PG

From
Scott Marlowe
Date:
On Sun, Nov 1, 2009 at 7:56 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2009/11/1 Shaul Dar <shauldar@gmail.com>:
>> Hi,
>>
>> I have several long text fields in my DB that I would love to compress
>> (descriptions, URLs etc). Can you tell me what options exists in PG
>> (+pointers please), typical effect on space and run time?
>
> Hello
>
> You can do nothing. PostgreSQL compresses data automatically
>
> http://www.postgresql.org/docs/8.4/interactive/storage-toast.html

Well you can pick a strategy.  But yeah, there's not much for the
average user to do really.

Re: Compression in PG

From
Shaul Dar
Date:
Guys,

I am aware of the TOAST mechanism (actually complained about it in this forum...). The text fields I have are below the limits that trigger this mechanism, and also I may want to compress specific fields, not all of them. And also I have performance concerns as TOAST splits tables and can potentially cause a performance hit on queries.

My question is if PG can compress smaller text fields e.g 0.5-1KB, or must I do this outside PG?

-- Shaul

On Sun, Nov 1, 2009 at 5:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Sun, Nov 1, 2009 at 7:56 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2009/11/1 Shaul Dar <shauldar@gmail.com>:
>> Hi,
>>
>> I have several long text fields in my DB that I would love to compress
>> (descriptions, URLs etc). Can you tell me what options exists in PG
>> (+pointers please), typical effect on space and run time?
>
> Hello
>
> You can do nothing. PostgreSQL compresses data automatically
>
> http://www.postgresql.org/docs/8.4/interactive/storage-toast.html

Well you can pick a strategy.  But yeah, there's not much for the
average user to do really.

Re: Compression in PG

From
Adam Tauno Williams
Date:
On Sun, 2009-11-01 at 18:53 +0200, Shaul Dar wrote:
> I am aware of the TOAST mechanism (actually complained about it in
> this forum...). The text fields I have are below the limits that
> trigger this mechanism,

Have you proved somehow that compressing tiny values has any value?

>  and also I may want to compress *specific* fields, not all of them.

You can do that.

“ALTER TABLE table ALTER COLUMN comment SET STORAGE mechanism;"

For example:

ALTER TABLE job_history_info ALTER COLUMN comment SET STORAGE
EXTERNAL;

Where mechanism is -

<quote source="WMOGAG"
url="http://docs.opengroupware.org/Members/whitemice/wmogag/file_view">
* Extended – With the extended TOAST strategy the long value, once it
exceeds the TOASTing threshold will be compressed. If the compression
reduced the length to below the TOAST threshold the value will be
stored, compressed, in the original table. If compression does not
reduce the value to below the TOAST threshold the value will be stored
uncompressed in the table's TOAST table. Because the value is stored
compressed it most be uncompressed in order to perform value
comparisons; for large tables with many compressed values this can
result in spikes of processor utilization. On the other hand this
storage mechanism conserves disk space and reduces the need to perform
seek-and-read operations on the TOAST table. Extended is the default,
and usually recommended, TOAST storage mechanism.
* External – With the external TOAST strategy a long value is
immediately migrated to the TOAST table, compression is disabled.
Disabling compressions can increase the performance for substring
searches on long text values at the cost of increasing seeks in the
TOAST table as well as disk consumption.
* Main – Main enables compression and uses any means available to avoid
migrating the value to the TOAST table.
</quote>

As I recall all the above is in the PostgreSQL TOAST documentation; you
should go look at that.

>  And also I have performance concerns as TOAST splits tables and can
> potentially cause a performance hit on queries.

Then change your TOAST mechanism to "MAIN".

But benchmarking [aka: knowing] is always preferable to having
"concerns".  I'd wager your biggest bottlenecks will be elsewhere.

> My question is if PG can compress smaller text fields e.g 0.5-1KB, or
> must I do this outside PG?

I just think compressing small documents seems pointless.
--
OpenGroupware developer: awilliam@whitemice.org
<http://whitemiceconsulting.blogspot.com/>
OpenGroupare & Cyrus IMAPd documenation @
<http://docs.opengroupware.org/Members/whitemice/wmogag/file_view>


Re: Compression in PG

From
Scott Marlowe
Date:
On Sun, Nov 1, 2009 at 9:53 AM, Shaul Dar <shauldar@gmail.com> wrote:
> Guys,
>
> I am aware of the TOAST mechanism (actually complained about it in this
> forum...). The text fields I have are below the limits that trigger this
> mechanism, and also I may want to compress specific fields, not all of them.
> And also I have performance concerns as TOAST splits tables and can
> potentially cause a performance hit on queries.

Did you even read the link posted?

Re: Compression in PG

From
Robert Haas
Date:
On Sun, Nov 1, 2009 at 11:53 AM, Shaul Dar <shauldar@gmail.com> wrote:
> I am aware of the TOAST mechanism (actually complained about it in this
> forum...). The text fields I have are below the limits that trigger this
> mechanism, and also I may want to compress specific fields, not all of them.
> And also I have performance concerns as TOAST splits tables and can
> potentially cause a performance hit on queries.
>
> My question is if PG can compress smaller text fields e.g 0.5-1KB, or must I
> do this outside PG?

The manual explains the behavior of the system fairly clearly.  You
should read it through carefully, since it doesn't seem that you aware
of all the ins and outs (for example, compression and external storage
can be configured independently of one another, and on a per-column
basis).  If it doesn't seem like the behavior you want, then it is
likely that you are trying to solve a different problem than the one
that TOAST is designed to solve.  In that case, you should tell us
more about what you're trying to do.

The only reason I can think of for wanting to compress very small
datums is if you have a gajillion of them, they're highly
compressible, and you have extra CPU time coming out of your ears.  In
that case - yeah, you might want to think about pre-compressing them
outside of Pg.  If you're doing this for some other reason you could
probably get some better advice if you explain what it is...

...Robert

Re: Compression in PG

From
Eduardo Morras
Date:
At 05:24 02/11/2009, you wrote:

>The only reason I can think of for wanting to compress very small
>datums is if you have a gajillion of them, they're highly
>compressible, and you have extra CPU time coming out of your ears.  In
>that case - yeah, you might want to think about pre-compressing them
>outside of Pg.  If you're doing this for some other reason you could
>probably get some better advice if you explain what it is...
>
>...Robert

There is another reason. If you compress (lossless) all the small text datums with the same algorithm, you get unique
andsmaller representations of the text that can be used as primary unique keys. You can see it like a variable length
hashingalgorithm. 

Depending the compression method (f.ex. static huffman) you can compare 2 texts using only the compressed versions or
sortthem faster. This cannot be done with the actual LZ algorithm. 


--------------------------------
Eduardo Morrás González
Dept. I+D+i e-Crime Vigilancia Digital
S21sec Labs
Tlf: +34 902 222 521
Móvil: +34 555 555 555
www.s21sec.com, blog.s21sec.com


Salvo que se indique lo contrario, esta información es CONFIDENCIAL y
contiene datos de carácter personal que han de ser tratados conforme a la
legislación vigente en materia de protección de datos. Si usted no es
destinatario original de este mensaje, le comunicamos que no está autorizado
a revisar, reenviar, distribuir, copiar o imprimir la información en él
contenida y le rogamos que proceda a borrarlo de sus sistemas.

Kontrakoa adierazi ezean, posta elektroniko honen barruan doana ISILPEKO
informazioa da eta izaera pertsonaleko datuak dituenez, indarrean dagoen
datu pertsonalak babesteko legediaren arabera tratatu beharrekoa. Posta
honen hartzaile ez zaren kasuan, jakinarazten dizugu baimenik ez duzula
bertan dagoen informazioa aztertu, igorri, banatu, kopiatu edo inprimatzeko.
Hortaz, erregutzen dizugu posta hau zure sistemetatik berehala ezabatzea.

Antes de imprimir este mensaje valora si verdaderamente es necesario. De
esta forma contribuimos a la preservación del Medio Ambiente.