Thread: Compression in PG
Hi,
I have several long text fields in my DB that I would love to compress (descriptions, URLs etc). Can you tell me what options exists in PG (+pointers please), typical effect on space and run time?
Thanks,
-- Shaul
I have several long text fields in my DB that I would love to compress (descriptions, URLs etc). Can you tell me what options exists in PG (+pointers please), typical effect on space and run time?
Thanks,
-- Shaul
2009/11/1 Shaul Dar <shauldar@gmail.com>: > Hi, > > I have several long text fields in my DB that I would love to compress > (descriptions, URLs etc). Can you tell me what options exists in PG > (+pointers please), typical effect on space and run time? Hello You can do nothing. PostgreSQL compresses data automatically http://www.postgresql.org/docs/8.4/interactive/storage-toast.html Regards Pavel Stehule > > Thanks, > > -- Shaul >
Shaul Dar wrote: > Hi, > > I have several long text fields in my DB that I would love to compress > (descriptions, URLs etc). Can you tell me what options exists in PG > (+pointers please), typical effect on space and run time? variable length text fields .. e.g TEXT will automatically be stored in a TOAST table and compressed. Search the manual for toast. -- Jesper
On Sun, Nov 1, 2009 at 7:56 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2009/11/1 Shaul Dar <shauldar@gmail.com>: >> Hi, >> >> I have several long text fields in my DB that I would love to compress >> (descriptions, URLs etc). Can you tell me what options exists in PG >> (+pointers please), typical effect on space and run time? > > Hello > > You can do nothing. PostgreSQL compresses data automatically > > http://www.postgresql.org/docs/8.4/interactive/storage-toast.html Well you can pick a strategy. But yeah, there's not much for the average user to do really.
Guys,
I am aware of the TOAST mechanism (actually complained about it in this forum...). The text fields I have are below the limits that trigger this mechanism, and also I may want to compress specific fields, not all of them. And also I have performance concerns as TOAST splits tables and can potentially cause a performance hit on queries.
My question is if PG can compress smaller text fields e.g 0.5-1KB, or must I do this outside PG?
-- Shaul
I am aware of the TOAST mechanism (actually complained about it in this forum...). The text fields I have are below the limits that trigger this mechanism, and also I may want to compress specific fields, not all of them. And also I have performance concerns as TOAST splits tables and can potentially cause a performance hit on queries.
My question is if PG can compress smaller text fields e.g 0.5-1KB, or must I do this outside PG?
-- Shaul
On Sun, Nov 1, 2009 at 5:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Sun, Nov 1, 2009 at 7:56 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:Well you can pick a strategy. But yeah, there's not much for the
> 2009/11/1 Shaul Dar <shauldar@gmail.com>:
>> Hi,
>>
>> I have several long text fields in my DB that I would love to compress
>> (descriptions, URLs etc). Can you tell me what options exists in PG
>> (+pointers please), typical effect on space and run time?
>
> Hello
>
> You can do nothing. PostgreSQL compresses data automatically
>
> http://www.postgresql.org/docs/8.4/interactive/storage-toast.html
average user to do really.
On Sun, 2009-11-01 at 18:53 +0200, Shaul Dar wrote: > I am aware of the TOAST mechanism (actually complained about it in > this forum...). The text fields I have are below the limits that > trigger this mechanism, Have you proved somehow that compressing tiny values has any value? > and also I may want to compress *specific* fields, not all of them. You can do that. “ALTER TABLE table ALTER COLUMN comment SET STORAGE mechanism;" For example: ALTER TABLE job_history_info ALTER COLUMN comment SET STORAGE EXTERNAL; Where mechanism is - <quote source="WMOGAG" url="http://docs.opengroupware.org/Members/whitemice/wmogag/file_view"> * Extended – With the extended TOAST strategy the long value, once it exceeds the TOASTing threshold will be compressed. If the compression reduced the length to below the TOAST threshold the value will be stored, compressed, in the original table. If compression does not reduce the value to below the TOAST threshold the value will be stored uncompressed in the table's TOAST table. Because the value is stored compressed it most be uncompressed in order to perform value comparisons; for large tables with many compressed values this can result in spikes of processor utilization. On the other hand this storage mechanism conserves disk space and reduces the need to perform seek-and-read operations on the TOAST table. Extended is the default, and usually recommended, TOAST storage mechanism. * External – With the external TOAST strategy a long value is immediately migrated to the TOAST table, compression is disabled. Disabling compressions can increase the performance for substring searches on long text values at the cost of increasing seeks in the TOAST table as well as disk consumption. * Main – Main enables compression and uses any means available to avoid migrating the value to the TOAST table. </quote> As I recall all the above is in the PostgreSQL TOAST documentation; you should go look at that. > And also I have performance concerns as TOAST splits tables and can > potentially cause a performance hit on queries. Then change your TOAST mechanism to "MAIN". But benchmarking [aka: knowing] is always preferable to having "concerns". I'd wager your biggest bottlenecks will be elsewhere. > My question is if PG can compress smaller text fields e.g 0.5-1KB, or > must I do this outside PG? I just think compressing small documents seems pointless. -- OpenGroupware developer: awilliam@whitemice.org <http://whitemiceconsulting.blogspot.com/> OpenGroupare & Cyrus IMAPd documenation @ <http://docs.opengroupware.org/Members/whitemice/wmogag/file_view>
On Sun, Nov 1, 2009 at 9:53 AM, Shaul Dar <shauldar@gmail.com> wrote: > Guys, > > I am aware of the TOAST mechanism (actually complained about it in this > forum...). The text fields I have are below the limits that trigger this > mechanism, and also I may want to compress specific fields, not all of them. > And also I have performance concerns as TOAST splits tables and can > potentially cause a performance hit on queries. Did you even read the link posted?
On Sun, Nov 1, 2009 at 11:53 AM, Shaul Dar <shauldar@gmail.com> wrote: > I am aware of the TOAST mechanism (actually complained about it in this > forum...). The text fields I have are below the limits that trigger this > mechanism, and also I may want to compress specific fields, not all of them. > And also I have performance concerns as TOAST splits tables and can > potentially cause a performance hit on queries. > > My question is if PG can compress smaller text fields e.g 0.5-1KB, or must I > do this outside PG? The manual explains the behavior of the system fairly clearly. You should read it through carefully, since it doesn't seem that you aware of all the ins and outs (for example, compression and external storage can be configured independently of one another, and on a per-column basis). If it doesn't seem like the behavior you want, then it is likely that you are trying to solve a different problem than the one that TOAST is designed to solve. In that case, you should tell us more about what you're trying to do. The only reason I can think of for wanting to compress very small datums is if you have a gajillion of them, they're highly compressible, and you have extra CPU time coming out of your ears. In that case - yeah, you might want to think about pre-compressing them outside of Pg. If you're doing this for some other reason you could probably get some better advice if you explain what it is... ...Robert
At 05:24 02/11/2009, you wrote: >The only reason I can think of for wanting to compress very small >datums is if you have a gajillion of them, they're highly >compressible, and you have extra CPU time coming out of your ears. In >that case - yeah, you might want to think about pre-compressing them >outside of Pg. If you're doing this for some other reason you could >probably get some better advice if you explain what it is... > >...Robert There is another reason. If you compress (lossless) all the small text datums with the same algorithm, you get unique andsmaller representations of the text that can be used as primary unique keys. You can see it like a variable length hashingalgorithm. Depending the compression method (f.ex. static huffman) you can compare 2 texts using only the compressed versions or sortthem faster. This cannot be done with the actual LZ algorithm. -------------------------------- Eduardo Morrás González Dept. I+D+i e-Crime Vigilancia Digital S21sec Labs Tlf: +34 902 222 521 Móvil: +34 555 555 555 www.s21sec.com, blog.s21sec.com Salvo que se indique lo contrario, esta información es CONFIDENCIAL y contiene datos de carácter personal que han de ser tratados conforme a la legislación vigente en materia de protección de datos. Si usted no es destinatario original de este mensaje, le comunicamos que no está autorizado a revisar, reenviar, distribuir, copiar o imprimir la información en él contenida y le rogamos que proceda a borrarlo de sus sistemas. Kontrakoa adierazi ezean, posta elektroniko honen barruan doana ISILPEKO informazioa da eta izaera pertsonaleko datuak dituenez, indarrean dagoen datu pertsonalak babesteko legediaren arabera tratatu beharrekoa. Posta honen hartzaile ez zaren kasuan, jakinarazten dizugu baimenik ez duzula bertan dagoen informazioa aztertu, igorri, banatu, kopiatu edo inprimatzeko. Hortaz, erregutzen dizugu posta hau zure sistemetatik berehala ezabatzea. Antes de imprimir este mensaje valora si verdaderamente es necesario. De esta forma contribuimos a la preservación del Medio Ambiente.