Thread: Compression

Compression

From
Yang Zhang
Date:
Is there any effort to add compression into PG, a la MySQL's
row_format=compressed or HBase's LZO block compression?

Re: Compression

From
Adrian Klaver
Date:
On Thursday, April 14, 2011 4:01:54 pm Yang Zhang wrote:
> Is there any effort to add compression into PG, a la MySQL's
> row_format=compressed or HBase's LZO block compression?

TOAST?
http://www.postgresql.org/docs/9.0/interactive/storage-toast.html
--
Adrian Klaver
adrian.klaver@gmail.com

Re: Compression

From
Craig Ringer
Date:
On 15/04/2011 7:01 AM, Yang Zhang wrote:
> Is there any effort to add compression into PG, a la MySQL's
> row_format=compressed or HBase's LZO block compression?

There's no row compression, but as mentioned by others there is
out-of-line compression of large values using TOAST.

Row compression would be interesting, but I can't imagine it not having
been investigated already.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Compression

From
Adrian Klaver
Date:

On Thursday, April 14, 2011 4:50:44 pm Craig Ringer wrote:
> On 15/04/2011 7:01 AM, Yang Zhang wrote:
> > Is there any effort to add compression into PG, a la MySQL's
> > row_format=compressed or HBase's LZO block compression?
>
> There's no row compression, but as mentioned by others there is
> out-of-line compression of large values using TOAST.

I could be misunderstanding, but I thought compression happened in the row as well. From the docs:

"EXTENDED allows both compression and out-of-line storage. This is the default for most TOAST-able data types. Compression will be attempted first, then out-of-line storage if the row is still too big."

> Row compression would be interesting, but I can't imagine it not having
> been investigated already.

--
Adrian Klaver
adrian.klaver@gmail.com

Re: Compression

From
Yang Zhang
Date:
On Thu, Apr 14, 2011 at 5:07 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> On Thursday, April 14, 2011 4:50:44 pm Craig Ringer wrote:
>> On 15/04/2011 7:01 AM, Yang Zhang wrote:
>> > Is there any effort to add compression into PG, a la MySQL's
>> > row_format=compressed or HBase's LZO block compression?
>>
>> There's no row compression, but as mentioned by others there is
>> out-of-line compression of large values using TOAST.
>
> I could be misunderstanding but I thought compression happened in the row as
> well. From the docs:
>
> "EXTENDED allows both compression and out-of-line storage. This is the
> default for most TOAST-able data types. Compression will be attempted first,
> then out-of-line storage if the row is still too big."
>
>> Row compression would be interesting, but I can't imagine it not having
>> been investigated already.
>
> --
> Adrian Klaver
> adrian.klaver@gmail.com

Already know about TOAST.  I could've been clearer, but that's not the
same as the block-/page-level compression I was referring to.

--
Yang Zhang
http://yz.mit.edu/

Re: Compression

From
"mark"
Date:

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Yang Zhang
> Sent: Thursday, April 14, 2011 6:51 PM
> To: Adrian Klaver
> Cc: pgsql-general@postgresql.org; Craig Ringer
> Subject: Re: [GENERAL] Compression
>
> On Thu, Apr 14, 2011 at 5:07 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> > On Thursday, April 14, 2011 4:50:44 pm Craig Ringer wrote:
> >> On 15/04/2011 7:01 AM, Yang Zhang wrote:
> >> > Is there any effort to add compression into PG, a la MySQL's
> >> > row_format=compressed or HBase's LZO block compression?
> >>
> >> There's no row compression, but as mentioned by others there is
> >> out-of-line compression of large values using TOAST.
> >
> > I could be misunderstanding but I thought compression happened in the row as
> > well. From the docs:
> >
> > "EXTENDED allows both compression and out-of-line storage. This is the
> > default for most TOAST-able data types. Compression will be attempted first,
> > then out-of-line storage if the row is still too big."
> >
> >> Row compression would be interesting, but I can't imagine it not having
> >> been investigated already.
> >
> > --
> > Adrian Klaver
> > adrian.klaver@gmail.com
>
> Already know about TOAST.  I could've been clearer, but that's not the
> same as the block-/page-level compression I was referring to.

There is a (closed-source) PG fork that has row- (or column-) oriented storage
that can have compression applied to it... if you are willing to give up
updates and deletes on the table, that is.


I haven't seen a lot of people talking about wanting that in the Postgres
core tho.


-M

>
> --
> Yang Zhang
> http://yz.mit.edu/
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Re: Compression

From
Adrian Klaver
Date:

On Thursday, April 14, 2011 5:51:21 pm Yang Zhang wrote:
> Already know about TOAST. I could've been clearer, but that's not the
> same as the block-/page-level compression I was referring to.

I am obviously missing something. The TOAST mechanism is designed to keep tuple data below the default 8 kB page size. In fact, it kicks in at a lower level than that:

"The TOAST code is triggered only when a row value to be stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code will compress and/or move field values out-of-line until the row value is shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB) or no more gains can be had. During an UPDATE operation, values of unchanged fields are normally preserved as-is; so an UPDATE of a row with out-of-line values incurs no TOAST costs if none of the out-of-line values change."

Granted, not all data types are TOASTable. Are you looking for something more aggressive than that?
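
To make the quoted rule concrete, here is a toy Python model of it. This is only a sketch under stated assumptions, not PostgreSQL's actual code: zlib stands in for PostgreSQL's own compressor, the 2000-byte limits approximate "normally 2 kB", and the 18-byte pointer size and the name toast_row are assumptions made for illustration.

```python
import random
import zlib

# Values quoted in the thread ("normally 2 kB"); the exact byte counts are assumptions.
TOAST_TUPLE_THRESHOLD = 2000
TOAST_TUPLE_TARGET = 2000
POINTER_SIZE = 18  # assumed size of the in-row pointer to an out-of-line value


def toast_row(fields):
    """Toy model of the quoted rule; returns one (storage, in_row_bytes) pair per field."""
    stored = [("plain", len(f)) for f in fields]

    def row_size():
        return sum(size for _, size in stored)

    # TOAST is triggered only for rows wider than the threshold.
    if row_size() <= TOAST_TUPLE_THRESHOLD:
        return stored

    # Work on the widest fields first.
    order = sorted(range(len(fields)), key=lambda i: len(fields[i]), reverse=True)

    # Step 1: try compression, keeping the result only if it shrinks the value.
    for i in order:
        if row_size() < TOAST_TUPLE_TARGET:
            break
        compressed = zlib.compress(fields[i])
        if len(compressed) < len(fields[i]):
            stored[i] = ("compressed", len(compressed))

    # Step 2: if the row is still too big, move values out of line.
    for i in order:
        if row_size() < TOAST_TUPLE_TARGET:
            break
        if stored[i][1] > POINTER_SIZE:
            stored[i] = ("out-of-line", POINTER_SIZE)

    return stored


rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(5000))  # effectively incompressible

print(toast_row([b"short value"]))  # narrow row: left alone
print(toast_row([b"a" * 5000]))     # repetitive value: compressed, stays in the row
print(toast_row([noise]))           # incompressible value: moved out of line
```

Note that everything here works on individual field values. Nothing in this model (or in the quoted docs) compresses a whole page of rows together.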

--

Adrian Klaver

adrian.klaver@gmail.com

Re: Compression

From
Yang Zhang
Date:
On Thu, Apr 14, 2011 at 7:42 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> On Thursday, April 14, 2011 5:51:21 pm Yang Zhang wrote:
>> Already know about TOAST. I could've been clearer, but that's not the
>> same as the block-/page-level compression I was referring to.
>
> I am obviously missing something. The TOAST mechanism is designed to keep
> tuple data below the default 8KB page size. In fact it kicks in at a lower
> level than that:
>
> "The TOAST code is triggered only when a row value to be stored in a table
> is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code
> will compress and/or move field values out-of-line until the row value is
> shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB) or no more gains
> can be had. During an UPDATE operation, values of unchanged fields are
> normally preserved as-is; so an UPDATE of a row with out-of-line values
> incurs no TOAST costs if none of the out-of-line values change."
>
> Granted not all data types are TOASTable. Are you looking for something more
> aggressive than that?

Yes.

http://blog.oskarsson.nu/2009/03/hadoop-feat-lzo-save-disk-space-and.html

http://wiki.apache.org/hadoop/UsingLzoCompression

http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals-algorithms.html

--
Yang Zhang
http://yz.mit.edu/

Re: Compression

From
Adrian Klaver
Date:
On Thursday, April 14, 2011 7:46:34 pm Yang Zhang wrote:
> On Thu, Apr 14, 2011 at 7:42 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> > Granted not all data types are TOASTable. Are you looking for something
> > more aggressive than that?
>
> Yes.
>
> http://blog.oskarsson.nu/2009/03/hadoop-feat-lzo-save-disk-space-and.html
>
> http://wiki.apache.org/hadoop/UsingLzoCompression
>
> http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals-algorithms.html

I can see that as another use case for SQL/MED in 9.1+.

--
Adrian Klaver
adrian.klaver@gmail.com

Re: Compression

From
Yang Zhang
Date:
On Thu, Apr 14, 2011 at 6:46 PM, mark <dvlhntr@gmail.com> wrote:
>> -----Original Message-----
>> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Yang Zhang
>> Sent: Thursday, April 14, 2011 6:51 PM
>> To: Adrian Klaver
>> Cc: pgsql-general@postgresql.org; Craig Ringer
>> Subject: Re: [GENERAL] Compression
>>
>> On Thu, Apr 14, 2011 at 5:07 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>> > On Thursday, April 14, 2011 4:50:44 pm Craig Ringer wrote:
>> >> On 15/04/2011 7:01 AM, Yang Zhang wrote:
>> >> > Is there any effort to add compression into PG, a la MySQL's
>> >> > row_format=compressed or HBase's LZO block compression?
>> >>
>> >> There's no row compression, but as mentioned by others there is
>> >> out-of-line compression of large values using TOAST.
>> >
>> > I could be misunderstanding but I thought compression happened in the row as
>> > well. From the docs:
>> >
>> > "EXTENDED allows both compression and out-of-line storage. This is the
>> > default for most TOAST-able data types. Compression will be attempted first,
>> > then out-of-line storage if the row is still too big."
>> >
>> >> Row compression would be interesting, but I can't imagine it not having
>> >> been investigated already.
>> >
>> > --
>> > Adrian Klaver
>> > adrian.klaver@gmail.com
>>
>> Already know about TOAST.  I could've been clearer, but that's not the
>> same as the block-/page-level compression I was referring to.
>
> There is a (closed source) PG fork that has row (or column) oriented storage
> that can have compression applied to them.... if you are willing to give up
> updates and deletes on the table that is.

Greenplum and Aster?

We *are* mainly doing analytical (non-updating/deleting) processing.
But it's not a critical pain point - we're mainly interested in FOSS
for now.

>
>
> I haven't seen a lot of people talking about wanting that in the Postgres
> core tho.
>
>
> -M

--
Yang Zhang
http://yz.mit.edu/

Re: Compression

From
Craig Ringer
Date:
On 15/04/2011 8:07 AM, Adrian Klaver wrote:

> "EXTENDED allows both compression and out-of-line storage. This is the
> default for most TOAST-able data types. Compression will be attempted
> first, then out-of-line storage if the row is still too big."

Good point. I was unclear; thanks for pointing it out.

What I was trying to say is that there's no whole-row compression, i.e.
compression of the whole tuple except for minimal headers. A value in a
field may be compressed, but you can't (say) compress a 100-column row
of integers in Pg, because the individual fields don't support compression.
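
A rough way to see the difference is the Python sketch below. It is an illustration only and makes several assumptions: the data is made up, values are packed as 4-byte integers, and zlib stands in for whatever algorithm a real implementation would use. It compares compressing each integer field on its own with compressing a block of many such rows at once, which is closer to the block-/page-level compression asked about.

```python
import struct
import zlib

# Hypothetical data: 80 rows of 100 small integers, each stored as a 4-byte int.
rows = [[(r + c) % 7 for c in range(100)] for r in range(80)]


def pack_row(row):
    """Pack one row of integers into little-endian 4-byte values."""
    return struct.pack(f"<{len(row)}i", *row)


# Uncompressed size of all rows.
raw_size = sum(len(pack_row(row)) for row in rows)

# Field-level: every 4-byte value is compressed separately.
field_level_size = sum(
    len(zlib.compress(struct.pack("<i", value))) for row in rows for value in row
)

# Block-level: all rows are compressed together as one block.
block_level_size = len(zlib.compress(b"".join(pack_row(row) for row in rows)))

print("raw bytes:        ", raw_size)
print("field-level bytes:", field_level_size)
print("block-level bytes:", block_level_size)
```

Compressing a 4-byte value by itself only adds overhead, so the field-level total comes out larger than the raw data, while the block-level result is smaller because the compressor can exploit redundancy across fields and rows.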


--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Compression

From
Adrian Klaver
Date:
On Thursday, April 14, 2011 9:37:10 pm Craig Ringer wrote:
> On 15/04/2011 8:07 AM, Adrian Klaver wrote:
> > "EXTENDED allows both compression and out-of-line storage. This is the
> > default for most TOAST-able data types. Compression will be attempted
> > first, then out-of-line storage if the row is still too big."
>
> Good point. I was unclear; thanks for pointing it out.
>
> What I was trying to say is that there's no whole-row compression, ie
> compression of the whole tuple except for minimal headers. A value in a
> field may be compressed, but you can't (say) compress a 100-column row
> of integers in Pg, because the individual fields don't support compression.

Got it now, thanks.
--
Adrian Klaver
adrian.klaver@gmail.com

Re: Compression

From
rtshadow
Date:
Where can I find more information about the PG fork you mentioned?


