Thread: Are there plans to add data compression feature to postgresql?
On Sun, Oct 26, 2008 at 9:54 AM, 小波 顾 <guxiaobo1982@hotmail.com> wrote:
> Are there plans to add data compression feature to postgresql?

There already is data compression in postgresql.
Straight from Postgres doc
The zlib compression library will be used by default. If you don't want to use it then you must specify the --without-zlib option for configure. Using this option disables support for compressed archives in pg_dump and pg_restore.
Martin
2008/10/26 Martin Gainty <mgainty@hotmail.com>:
> Straight from Postgres doc
> The zlib compression library will be used by default. [...]

I was thinking more along the lines of the automatic compression of text types over 4k or so when they are moved out of line and into toast tables. The original question was a little vague though, wasn't it?
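As a rough illustration of the automatic TOAST compression being discussed (the table and data below are hypothetical, and exact byte counts will vary with version and settings), one can compare a large text value's logical length with the number of bytes actually used to store it:

    -- Hypothetical table, for illustration only
    CREATE TABLE toast_demo (id serial PRIMARY KEY, body text);

    -- A large, repetitive value: big enough to be toasted, and highly compressible
    INSERT INTO toast_demo (body)
    SELECT repeat('PostgreSQL already compresses large text values. ', 500);

    -- octet_length() reports the logical size; pg_column_size() the stored size
    SELECT octet_length(body) AS logical_bytes,
           pg_column_size(body) AS stored_bytes
    FROM toast_demo;

If TOAST compression kicks in, stored_bytes should come back much smaller than logical_bytes.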
You might want to try using a file system (ZFS, NTFS) that does compression, depending on what you're trying to compress.
Re: Are there plans to add data compression feature to postgresql? (Chris.Ellis@shropshire.gov.uk, Mon, 27 Oct 2008)
Note that most data stored in the TOAST table is compressed.
I.e. a text value with a length greater than around 2K will be stored in the TOAST table. By default, data in the TOAST table is compressed; this can be overridden.
However I expect that compression will reduce the performance of certain queries.
http://www.postgresql.org/docs/8.3/interactive/storage-toast.html
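The per-column override mentioned above is done with ALTER TABLE ... SET STORAGE; a minimal sketch, with hypothetical table and column names:

    -- EXTENDED (the default for text) allows both compression and out-of-line storage.
    -- EXTERNAL stores the value out of line but skips compression.
    -- MAIN prefers compression and keeps the value in the main table where possible.
    ALTER TABLE documents ALTER COLUMN body SET STORAGE EXTERNAL;  -- no compression

    -- Restore the default behaviour
    ALTER TABLE documents ALTER COLUMN body SET STORAGE EXTENDED;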
Out of interest, in what context did you want compression?
Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote on 27/10/2008 07:34:
> You might want to try using a file system (ZFS, NTFS) that
> does compression, depending on what you're trying to compress.
1. Values of integer types that used to take 8 bytes now take only 4 or 2 bytes if the values are not large.
2. If two values share the same text or pattern, only one copy is stored, and that copy is also compressed with traditional compression methods.
On Wed, Oct 29, 2008, replying to Grzegorz Jaśkiewicz <gryzman@gmail.com>, 小波 顾 <guxiaobo1982@hotmail.com> wrote:
Data Compression
The new data compression feature in SQL Server 2008 reduces the size of tables, indexes or a subset of their partitions by storing fixed-length data types in variable length storage format and by reducing the redundant data. The space savings achieved depends on the schema and the data distribution. Based on our testing with various data warehouse databases, we have seen a reduction in the size of real user databases up to 87% (a 7 to 1 compression ratio) but more commonly you should expect a reduction in the range of 50-70% (a compression ratio between roughly 2 to 1 and 3 to 1).
SQL Server provides two types of compression as follows:
· ROW compression enables storing fixed length types in variable length storage format. So for example, if you have a column of data type BIGINT which takes 8 bytes of storage in fixed format, when compressed it takes a variable number of bytes—anywhere from 0 bytes to up to 8 bytes. Since column values are stored as variable length, an additional 4‑bit length code is stored for each field within the row. Additionally, zero and NULL values don’t take any storage except for the 4‑bit code.
· PAGE compression is built on top of ROW compression. It minimizes storage of redundant data on the page by storing commonly occurring byte patterns on the page once and then referencing these values for respective columns. The byte pattern recognition is type-independent. Under PAGE compression, SQL Server optimizes space on a page using two techniques.
The first technique is column prefix. In this case, the system looks for a common byte pattern as a prefix for all values of a specific column across rows on the page. This process is repeated for all the columns in the table or index. The column prefix values that are computed are stored as an anchor record on the page and the data or index rows refer to the anchor record for the common prefix, if available, for each column.
The second technique is page level dictionary. This dictionary stores common values across columns and rows and stores them in a dictionary. The columns are then modified to refer to the dictionary entry.
Compression comes with additional CPU cost. This overhead is paid when you query or execute DML operations on compressed data. The relative CPU overhead with ROW is less than for PAGE, but PAGE compression can provide better compression. Since there are many kinds of workloads and data patterns, SQL Server exposes compression granularity at a partition level. You can choose to compress the whole table or index or a subset of partitions. For example, in a DW workload, if CPU is the dominant cost in your workload but you want to save some disk space, you may want to enable PAGE compression on partitions that are not accessed frequently while not compressing the current partition(s) that are accessed and manipulated more frequently. This reduces the total CPU cost, at a small increase in disk space requirements. If I/O cost is dominant for your workload, or you need to reduce disk space costs, compressing all data using PAGE compression may be the best choice. Compression can give many-fold speedups if it causes your working set of frequently touched pages to be cached in the main memory buffer pool, when it does not otherwise fit in memory. Preliminary performance results on one large-scale internal DW query performance benchmark used to test SQL Server 2008 show a 58% disk savings, an average 15% reduction in query runtime, and an average 20% increase in CPU cost. Some queries speeded up by a factor of up to seven. Your results depend on your workload, database, and hardware.
The commands to compress data are exposed as options in CREATE/ALTER DDL statements and support both ONLINE and OFFLINE mode. Additionally, a stored procedure is provided to help you estimate the space savings prior to actual compression.
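As a sketch of what that DDL and the estimation procedure look like in SQL Server 2008 (the object names here are invented for the example; see the MSDN page linked below for the authoritative syntax):

    -- Estimate the savings first (schema, object, index_id, partition_number, setting)
    EXEC sp_estimate_data_compression_savings 'dbo', 'FactSales', NULL, NULL, 'PAGE';

    -- Enable PAGE compression on the whole table
    ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- Or compress a single partition of an index with the cheaper ROW compression
    ALTER INDEX ix_FactSales_Date ON dbo.FactSales
        REBUILD PARTITION = 3 WITH (DATA_COMPRESSION = ROW);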
Backup Compression
Backup compression helps you to save in multiple ways.
By reducing the size of your SQL backups, you save significantly on disk media for your SQL backups. While all compression results depend on the nature of the data being compressed, results of 50% are not uncommon, and greater compression is possible. This enables you to use less storage for keeping your backups online, or to keep more cycles of backups online using the same storage.
Backup compression also saves you time. Traditional SQL backups are almost entirely limited by I/O performance. By reducing the I/O load of the backup process, we actually speed up both backups and restores.
Of course, nothing is entirely free, and this reduction in space and time comes at the expense of using CPU cycles. The good news here is that the savings in I/O time offsets the increased use of CPU time, and you can control how much CPU is used by your backups at the expense of the rest of your workload by taking advantage of the Resource Governor.
URL:http://msdn.microsoft.com/en-us/library/cc278097.aspx
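For reference, the backup compression described above is exposed as a single WITH option (the database name and path below are placeholders; availability depends on edition):

    -- Compressed backup; restore is unchanged because the compression is transparent
    BACKUP DATABASE SalesDW
        TO DISK = 'D:\Backups\SalesDW.bak'
        WITH COMPRESSION, STATS = 10;

    RESTORE DATABASE SalesDW
        FROM DISK = 'D:\Backups\SalesDW.bak'
        WITH REPLACE;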
小波 顾 <guxiaobo1982@hotmail.com> writes:
> [ snip a lot of marketing for SQL Server ]

I think the part of this you need to pay attention to is

> Of course, nothing is entirely free, and this reduction in space and
> time comes at the expense of using CPU cycles.

We already have the portions of this behavior that seem to me to be likely to be worthwhile (such as NULL elimination and compression of large field values). Shaving a couple bytes from a bigint doesn't strike me as interesting.

(Note: you could build a user-defined type that involved a one-byte length indicator followed by however many bytes of the bigint you needed. So someone who thought this might be worthwhile could do it for themselves. I don't see it being a win, though.)

regards, tom lane
On Wed, Oct 29, 2008 at 10:09 AM, 小波 顾 <guxiaobo1982@hotmail.com> wrote:
> Data Compression
>
> The new data compression feature in SQL Server 2008 reduces the size of
> tables, indexes or a subset of their partitions by storing fixed-length data
> types in variable length storage format and by reducing the redundant data.
> [...]

I'm sure this makes for a nice brochure or PowerPoint presentation, but in the real world I can't imagine putting that much effort into it when compressed file systems seem the place to be doing this.
Tom Lane wrote:
> We already have the portions of this behavior that seem to me to be
> likely to be worthwhile (such as NULL elimination and compression of
> large field values). Shaving a couple bytes from a bigint doesn't
> strike me as interesting.

Think about it on a fact table for a warehouse. A few bytes per bigint multiplied by several billions/trillions of bigints (not an exaggeration in a DW) and you're talking some significant storage saving on the main storage hog in a DW. Not to mention the performance _improvements_ you can get, even with some CPU overhead for dynamic decompression, if the planner/optimiser understands how to work with the compression index/map to perform things like range/partition elimination etc. Admittedly this depends heavily on the storage mechanics and optimisation techniques of the DB, but there is value to be had there ... IBM is seeing typical storage savings in the 40-60% range, mostly based on boring, bog-standard int, char and varchar data.

The IDUG (so DB2 users themselves, not IBM's marketing) had a competition to see what was happening in the real world, take a look if interested: http://www.idug.org/wps/portal/idug/compressionchallenge

Other big benefits come with XML ... but that is even more dependent on the starting point. Oracle and SQL Server will see big benefits in compression with this, because their XML technology is so mind-bogglingly broken in the first place.

So there's certainly utility in this kind of feature ... but whether it rates above some of the other great stuff in the PostgreSQL pipeline is questionable.

Ciao
Fuzzy
:-)
Grant Allen wrote:
> ...warehouse...DB2...IBM is seeing typical
> storage savings in the 40-60% range

Sounds about the same as what compressing file systems claim:

http://opensolaris.org/os/community/zfs/whatis/
"ZFS provides built-in compression. In addition to reducing space usage by 2-3x, compression also reduces the amount of I/O by 2-3x. For this reason, enabling compression actually makes some workloads go faster."

I do note that Netezza got a lot of PR around their compression release, claiming it doubled performance. Wonder if they added that at the file system or higher in the DB.
Ron Mayer wrote:
> I do note that Netezza got a lot of PR around their compression release,
> claiming it doubled performance. Wonder if they added that at the file
> system or higher in the DB.

I just so happen to have access to a Netezza system :-) I'll see if I can find out.

One other thing I forgot to mention: compression by the DB trumps filesystem compression in one very important area - shared_buffers! (or buffer_cache, bufferpool or whatever your favourite DB calls its working memory for caching data). Because the data stays compressed in the block/page when cached by the database in one of its buffers, you get more bang for your memory buck in many circumstances! Just another angle to contemplate :-)

Ciao
Fuzzy
:-)
On Oct 29, 2008, at 9:50 PM, Grant Allen wrote:
> One other thing I forgot to mention: compression by the DB trumps
> filesystem compression in one very important area - shared_buffers!
> [...] Because the data stays compressed in the block/page when cached
> by the database in one of its buffers, you get more bang for your
> memory buck in many circumstances!

The additional latency added by decompression is reasonably small compared with traditional disk access time. It's rather large compared to memory access time.

Cheers,
  Steve
Steve Atkins wrote:
> The additional latency added by decompression is reasonably small
> compared with traditional disk access time. It's rather large compared
> to memory access time.

The one place where compression is an immediate benefit is the wire. It is easy to forget that one of our number one bottlenecks (even at gigabit) is the amount of data we are pushing over the wire.

Joshua D. Drake
On Oct 29, 2008, at 10:43 PM, Joshua D. Drake wrote:
> The one place where compression is an immediate benefit is the wire.
> It is easy to forget that one of our number one bottlenecks (even at
> gigabit) is the amount of data we are pushing over the wire.

Wouldn't "ssl_ciphers=NULL-MD5" or somesuch give zlib compression over the wire?

Cheers,
  Steve
Steve Atkins wrote:
> Wouldn't "ssl_ciphers=NULL-MD5" or somesuch give zlib compression over
> the wire?

I don't think so.

Joshua D. Drake
On Thu, Oct 30, 2008 at 03:50:20PM +1100, Grant Allen wrote:
> One other thing I forgot to mention: compression by the DB trumps
> filesystem compression in one very important area - shared_buffers!
> [...] Because the data stays compressed in the block/page when cached
> by the database in one of its buffers, you get more bang for your
> memory buck in many circumstances!

The database research project known as MonetDB/X100 has been looking at this recently; the first paper below gives a bit of an introduction into the design of the database and the second into the effects of different compression schemes:

http://www.cwi.nl/htbin/ins1/publications?request=pdf&key=ZuBoNeHe:DEBULL:05
http://www.cwi.nl/htbin/ins1/publications?request=pdf&key=ZuHeNeBo:ICDE:06

The important thing seems to be that you don't want a storage-efficient compression scheme; decent RAID subsystems demand a very lightweight scheme that can be decompressed at several GB/s (i.e. two or three cycles per tuple, not 50 to 100 like traditional schemes such as zlib or bzip).

It's very interesting reading (references to "commercial DBMS `X'" being somewhat comical), but it's a *long* way from being directly useful to Postgres. It's interesting to bear in mind some of the things they talk about when writing new code; the importance of designing cache conscious algorithms (and then when writing the code) seems to have stuck in my mind the most. Am I just old fashioned, or is this focus on cache conscious design quite a new thing and somewhat undervalued in the rest of the software world?

  Sam

p.s. if you're interested, there are more papers about MonetDB here:
http://monetdb.cwi.nl/projects/monetdb/Development/Research/Articles/index.html
Currently postgresql is slower on RAID, so something tells me that a little bit of compression underneath will make it far worse, not better. But I guess Tom will be the man to know more about it.
On Thu, Oct 30, 2008 at 10:53:27AM +1100, Grant Allen wrote:
> Other big benefits come with XML ... but that is even more dependent on the
> starting point. Oracle and SQL Server will see big benefits in compression
> with this, because their XML technology is so mind-bogglingly broken in the
> first place.

It seems to me that for this use case, you can already get the interesting compression advantages in Postgres, and have been getting them since TOAST was introduced back when the 8k row limit was broken. It's recently been enhanced, ISTR, so that you can SET STORAGE with better granularity. Indeed, it seems to me that in some ways, the big databases are only catching up with Postgres now on this front. That alone oughta be news :)

--
Andrew Sullivan
ajs@commandprompt.com
http://www.commandprompt.com/
Grzegorz Jaśkiewicz wrote:
> Currently postgresql is slower on RAID, so something tells me that a
> little bit of compression underneath will make it far worse, not better.
> But I guess Tom will be the man to know more about it.

What? PostgreSQL is slower on RAID? Care to define that better?
> What? PostgreSQL is slower on RAID? Care to define that better?

up to 8.3 it was massively slower on raid1 (software raid on linux), starting from 8.3 things got lot lot better (we speak 3x speed improvement here), but it still isn't same as on 'plain' drive.

--
GJ
On Oct 30, 2008, at 8:10 AM, Grzegorz Jaśkiewicz wrote:
> up to 8.3 it was massively slower on raid1 (software raid on linux),
> starting from 8.3 things got lot lot better (we speak 3x speed
> improvement here), but it still isn't same as on 'plain' drive.

I'm a bit surprised to hear that; what would pg be doing, unique to it, that would cause it to be slower on a RAID-1 cluster than on a plain drive?
Grzegorz Jaśkiewicz wrote:
> up to 8.3 it was massively slower on raid1 (software raid on linux),
> starting from 8.3 things got lot lot better (we speak 3x speed
> improvement here), but it still isn't same as on 'plain' drive.

Slower on RAID1 than what and doing what?

Joshua D. Drake
On Thu, Oct 30, 2008 at 3:27 PM, Christophe <xof@thebuild.com> wrote:
> I'm a bit surprised to hear that; what would pg be doing, unique to it,
> that would cause it to be slower on a RAID-1 cluster than on a plain drive?

yes, it is slower on mirror-raid from single drive. I can give you all the /proc/* dumps if you want, as far as computer goes, it isn't anything fancy. dual way p4, and sata drives of some sort.

--
GJ
Grzegorz Jaśkiewicz wrote:
> On Thu, Oct 30, 2008 at 3:27 PM, Christophe <xof@thebuild.com> wrote:
>> I'm a bit surprised to hear that; what would pg be doing, unique to it,
>> that would cause it to be slower on a RAID-1 cluster than on a plain drive?
>
> yes, it is slower on mirror-raid from single drive. I can give you all the
> /proc/* dumps if you want, as far as computer goes, it isn't anything fancy.
> dual way p4, and sata drives of some sort.

O.k. that doesn't actually surprise me all that much. Software RAID 1 on SATA drives for specific workloads would be slower than a single drive. It should still be faster for reads assuming some level of concurrency, but not likely faster for a single thread. Writes would be expected to be slower because you are managing identical writes across two spindles, and SATA is slow for that type of thing.

Joshua D. Drake
tgl@sss.pgh.pa.us (Tom Lane) writes:
> We already have the portions of this behavior that seem to me to be
> likely to be worthwhile (such as NULL elimination and compression of
> large field values). Shaving a couple bytes from a bigint doesn't
> strike me as interesting.

I expect that there would be value in doing this with the inet type, to distinguish between the smaller IPv4 addresses and the larger IPv6 ones. We use the inet type (surprise! ;-)) and would benefit from having it "usually smaller" (notably since IPv6 addresses are a relative rarity, at this point).

That doesn't contradict you; just points out one of the cases where there might be some value in *a* form of compression... (Of course, this may already be done; I'm not remembering just now...)

--
Christopher Browne
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > I'm sure this makes for a nice brochure or power point presentation, > but in the real world I can't imagine putting that much effort into it > when compressed file systems seem the place to be doing this. I can't really see trusting Postgres on a filesystem that felt free to compress portions of it. Would the filesystem still be able to guarantee that torn pages won't "tear" across adjacent blocks? What about torn pages that included hint bits being set? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
Chris Browne <cbbrowne@acm.org> writes:
> I expect that there would be value in doing this with the inet type,
> to distinguish between the smaller IPv4 addresses and the larger IPv6
> ones. We use the inet type (surprise! ;-)) and would benefit from
> having it "usually smaller" (notably since IPv6 addresses are a
> relative rarity, at this point).

Uh ... inet already does that. Now it's true you could save a byte or two more with a bespoke IPv4-only type, but the useful lifespan of such a type probably isn't very long.

regards, tom lane
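A quick way to check that inet already adapts its storage to the address family (the exact byte counts depend on the server version, so treat the output as indicative only):

    -- The IPv4 value should report noticeably fewer bytes than the IPv6 one
    SELECT pg_column_size('192.168.0.1'::inet) AS ipv4_bytes,
           pg_column_size('2001:db8::1'::inet) AS ipv6_bytes;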
On Thu, Oct 30, 2008 at 4:01 PM, Gregory Stark <stark@enterprisedb.com> wrote:
> I can't really see trusting Postgres on a filesystem that felt free to
> compress portions of it. Would the filesystem still be able to guarantee that
> torn pages won't "tear" across adjacent blocks? What about torn pages that
> included hint bits being set?

I can't see PostgreSQL noticing it. PostgreSQL hands the OS a 512-byte block, the OS compresses it and its brethren as they go to disk and uncompresses them as they come out, and as long as what you put in is what you get back it shouldn't really matter.
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > On Thu, Oct 30, 2008 at 4:01 PM, Gregory Stark <stark@enterprisedb.com> wrote: >> I can't really see trusting Postgres on a filesystem that felt free to >> compress portions of it. Would the filesystem still be able to guarantee that >> torn pages won't "tear" across adjacent blocks? What about torn pages that >> included hint bits being set? > I can't see PostgreSQL noticing it. PostgreSQL hands the OS a 512byte > block, the OS compresses it and it's brethren as the go to disk, > uncompresses as they come out, and as long as what you put in is what > you get back it shouldn't really matter. I think Greg's issue is exactly about what guarantees you'll have left after the data that comes back fails to be the data that went in. regards, tom lane
On Thu, Oct 30, 2008 at 4:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think Greg's issue is exactly about what guarantees you'll have left
> after the data that comes back fails to be the data that went in.

Sounds kinda hand wavy to me. If compressed file systems didn't give you back what you gave them I couldn't imagine them being around for very long.
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > On Thu, Oct 30, 2008 at 4:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "Scott Marlowe" <scott.marlowe@gmail.com> writes: >>> On Thu, Oct 30, 2008 at 4:01 PM, Gregory Stark <stark@enterprisedb.com> wrote: >>>> I can't really see trusting Postgres on a filesystem that felt free to >>>> compress portions of it. Would the filesystem still be able to guarantee that >>>> torn pages won't "tear" across adjacent blocks? What about torn pages that >>>> included hint bits being set? >> >>> I can't see PostgreSQL noticing it. PostgreSQL hands the OS a 512byte >>> block, the OS compresses it and it's brethren as the go to disk, >>> uncompresses as they come out, and as long as what you put in is what >>> you get back it shouldn't really matter. >> >> I think Greg's issue is exactly about what guarantees you'll have left >> after the data that comes back fails to be the data that went in. > > Sounds kinda hand wavy to me. If compressed file systems didn't give > you back what you gave them I couldn't imagine them being around for > very long. I don't know, NFS has lasted quite a while. So you tell me, I write 512 bytes of data to a compressed filesystem, how does it handle the torn page problem? Is it going to have to WAL log all data operations again? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
On Thu, Oct 30, 2008 at 6:03 PM, Gregory Stark <stark@enterprisedb.com> wrote:
> So you tell me, I write 512 bytes of data to a compressed filesystem, how
> does it handle the torn page problem? Is it going to have to WAL log all
> data operations again?

What is the torn page problem? Note I'm no big fan of compressed file systems, but I can't imagine them not working with databases, as I've seen them work quite reliably under Exchange server running a db-oriented storage subsystem. And I can't imagine them not being invisible to an application, otherwise you'd just be asking for trouble.
Scott Marlowe wrote:
> What is the torn page problem? Note I'm no big fan of compressed file
> systems, but I can't imagine them not working with databases, as I've
> seen them work quite reliably under Exchange server running a db-oriented
> storage subsystem. And I can't imagine them not being invisible to an
> application, otherwise you'd just be asking for trouble.

Exchange, isn't that the thing that's very prone to corrupted databases? I've heard lots of horror stories about that (and also about how you have to defragment the database once in a while, so what kind of database is it really?)

--
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Thu, Oct 30, 2008 at 7:37 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Exchange, isn't that the thing that's very prone to corrupted databases?
> I've heard lots of horror stories about that (and also about how you
> have to defragment the database once in a while, so what kind of
> database is it really?)

Sure, bashing Microsoft is easy. But it doesn't address the point: is a database safe on top of a compressed file system, and if not, why?
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > Sure, bash Microsoft it's easy. But it doesn't address the point, is > a database safe on top of a compressed file system and if not, why? It is certainly *less* safe than it is on top of an uncompressed filesystem. Any given hardware failure will affect more stored bits (if the compression is effective) in a less predictable way. If you assume that hardware failure rates are below your level of concern, this doesn't matter. But DBAs are paid to be paranoid. regards, tom lane
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > What is the torn page problem? Note I'm no big fan of compressed file > systems, but I can't imagine them not working with databases, as I've > seen them work quite reliably under exhange server running a db > oriented storage subsystem. And I can't imagine them not being > invisible to an application, otherwise you'd just be asking for > trouble. Invisible under normal operation sure, but when something fails the consequences will surely be different and I can't see how you could make a compressed filesystem safe without a huge performance hit. The torn page problem is what happens if the system loses power or crashes when only part of the data written has made it to disk. If you're compressing or encrypting data then you can't expect the old data portion and the new data portion to make sense together. So for example if Postgres sets a hint bit on one tuple in a block, then writes out that block and the filesystem recompresses it, the entire block will change. If the system crashes when only 4k of it has reached disk then when we read in that block it will fail decompression. And if the block size of the compressed filesystem is larger than the PostgreSQL block size your problems are even more severe. Even a regular WAL-logged write to a database block can cause the subsequent database block to become unreadable if power is lost before the entire set of database blocks within the filesystem block is written. The only way I could see this working is if you use a filesystem which logs data changes like ZFS or ext3 with data=journal. Even then you have to be very careful to make the filesystem block size that the journal treats as atomic match the Postgres block size or you'll still be in trouble. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
On Thu, Oct 30, 2008 at 9:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It is certainly *less* safe than it is on top of an uncompressed
> filesystem. Any given hardware failure will affect more stored bits
> (if the compression is effective) in a less predictable way.

Agreed. But I wasn't talking about hardware failures earlier, and someone made the point that a compressed file system, without hardware failure, was likely to eat your data. And I still don't think that's true.

Keep in mind a lot of the talk on this so far has been on data warehouses, which are mostly static and well backed up. If you could reduce the size on disk by a factor of 2 or 3, then it's worth taking a small chance on having to recreate the whole db should something go wrong.

To put it another way, if you find out you've got corrupted blocks in your main db, due to bad main memory or CPU or something, are you going to fix the bad blocks and memory and just keep going? Of course not, you're going to reinstall from a clean backup to a clean machine. You can't trust the data that the machine was mangling, whether it was on a compressed volume or not. So now your argument is one of degree, which wasn't the discussion point I was trying to make.

> If you assume that hardware failure rates are below your level of
> concern, this doesn't matter.

I assume hardware failure rates are zero, until there is one. Then I restore from a known good backup. Compressed file systems have little to do with that.
On Fri, Oct 31, 2008 at 2:49 AM, Gregory Stark <stark@enterprisedb.com> wrote:
> Invisible under normal operation sure, but when something fails the
> consequences will surely be different and I can't see how you could make a
> compressed filesystem safe without a huge performance hit.

While I'm quite willing to concede that a crashed machine can cause corruption in a compressed file system you wouldn't otherwise see, I'm also willing to admit there are times, much like the OP was talking about, where that's an acceptable loss, like Data Warehousing. No way would I run a db for data that mattered on a compressed file system.
Scott Marlowe wrote:
> Sure, bashing Microsoft is easy. But it doesn't address the point: is
> a database safe on top of a compressed file system, and if not, why?

I'm not bashing Microsoft. I'm just saying that your example application already shows signs that could, perhaps, be explained by the hypothesis put forward by Greg -- that a compressed filesystem is more prone to corruption.

--
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Fri, 31 Oct 2008 08:49:56 +0000, Gregory Stark <stark@enterprisedb.com> wrote:
> Invisible under normal operation sure, but when something fails the
> consequences will surely be different and I can't see how you could
> make a compressed filesystem safe without a huge performance hit.

Pardon my naiveness, but I can't get why compression and data integrity should always be considered clashing factors.

DB operations are supposed to be atomic if fsync actually does what it is supposed to do. So you'd have coherency assured by proper execution of "fsync" going down to all HW levels before it reaches permanent storage.

Now suppose your problem is "avoiding losing data", not avoiding losing coherency; e.g. you're handling a very fast stream of data coming from the LHC. The faster you write to the disk, the lower the chances of losing data in case you incur some kind of hardware failure during the write. Whether you choose data compression or not depends on which kind of failure you think is more probable on your hardware and the associated costs. If you expect gamma rays cooking your SCSI cables or an asteroid splashing your UPS, compression may be a good choice... it will make your data reach your permanent storage faster. If you expect your permanent storage to store data in an unreliable way (and not report back), a loss of 1 sector may correspond to a larger loss of data.

Another thing that should be put in the equation of understanding where your risk of data loss lies would be to factor in whether your "data source" has some form of "data persistence". If it has, you could introduce one more layer of "fsyncing"; that means your data source is not going to wipe the original copy till your DB reports back that everything went fine (no asteroid etc...). So data compression may be just one more tool to manage your budget for asteroid shelters.

An annoyance of compression may be that while compression *on average* may let you put data faster on permanent storage, it increases uncertainty about the instant speed of transfer, especially if fs-level and db-level compression are not aware of each other and fs-level compression is less aware of which data is worth compressing. If I had to push more for data compression I'd make it data-type aware and switchable (or auto-switchable based on ANALYZE or stats results).

Of course, if you expect to have faulty "permanent storage", data compression *may* not be a good bet... but still it depends on hardware cost, rate of compression, specific kind of failure... e.g. the more you compress, the cheaper RAID becomes.

I understand, Tom, that DBAs are paid to be paranoid and I really really really appreciate data stored in a format that doesn't require a long queue of tools to be read. I do really hate dependencies that translate into hours of *boring* work if something turns bad.

BTW I gave a glance to the MonetDB papers posted earlier and it seems that their compression algorithms are strongly read-only/search optimised.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Alvaro Herrera wrote:
> I'm not bashing Microsoft. I'm just saying that your example
> application already shows signs that could, perhaps, be explained by the
> hypothesis put forward by Greg -- that a compressed filesystem is more
> prone to corruption.
Each added layer could lead to corruption/instability. Yet, some people might be willing to try out some of these layers to enhance functionality.

Postgresql already uses an OS, and even an fs! Why would it decide to not recode its own raw device handler ... like some serious db ;)
--
Thomas SAMSON
Simplicity does not precede complexity, but follows it.
Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> Pardon my naiveness, but I can't get why compression and data
> integrity should always be considered clashing factors.

Well, the answer was in the next paragraph of my email, the one you've clipped out here.

> DB operations are supposed to be atomic if fsync actually does what
> it is supposed to do. So you'd have coherency assured by proper
> execution of "fsync" going down to all HW levels before it reaches
> permanent storage.

fsync lets the application know when the data has reached disk. Once it returns you know the data on disk is coherent. What we're talking about is what to do if the power fails or the system crashes before that happens.

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
scott.marlowe@gmail.com ("Scott Marlowe") writes:
> I assume hardware failure rates are zero, until there is one. Then I
> restore from a known good backup. Compressed file systems have little
> to do with that.

There's a way that compressed filesystems might *help* with a risk factor, here...

By reducing the number of disk drives required to hold the data, you may be reducing the risk of enough of them failing to invalidate the RAID array. If a RAID array is involved, where *some* failures may be silently coped with, I could readily see this *improving* reliability, in most cases.

This is at least *vaguely* similar to the way that aircraft have moved from requiring rather large numbers of engines for cross-Atlantic trips to requiring just 2. In the distant past, the engines were sufficiently unreliable that you wanted to have at least 4 in order to be reasonably assured that you could limp along with at least 2. With increases in engine reliability, it's now considered preferable to have *just* 2 engines, as having 4 means doubling the risk of there being a failure.

Disk drives and jet engines are hardly the same thing, but I suspect the analogy fits.

--
Christopher Browne
On Fri, 31 Oct 2008 17:08:52 +0000, Gregory Stark <stark@enterprisedb.com> wrote:
> Well, the answer was in the next paragraph of my email, the one
> you've clipped out here.

Sorry, I didn't want to hide your argument, just to cut the length of the email. Maybe I haven't been clear enough either.

I'd consider compression at the fs level more "risky" than compression at the DB level because re-compression at the fs level may more frequently span more data structures. But sorry, I still can't get WHY compression as a whole and data integrity are mutually exclusive.

What I think is going to happen (not necessarily what really happens) is:
- you make a change to the DB
- you ask the underlying fs to write that change to the disk (fsync)
- the fs may decide it has to re-compress more than one block, but I'd think it still has to oblige the fsync command and *start* to put them on permanent storage.

Now on *average* the write operations should be faster, so the risk you'll be hit by an asteroid between the time an fsync has been requested and the time it returns should be smaller. If you're not fsyncing... you've no warranty that your changes reached your permanent storage. Unless compressed fs don't abide by fsync as I'd expect.

Furthermore, you're starting from 3 assumptions that may not be true:
1) partially written compressed data are completely unrecoverable.
2) you don't have concurrent physical writes to permanent storage
3) the data that should have reached the DB would have survived if they were not sent to the DB

Compression changes the granularity of physical writes on a single write. But if you consider concurrent physical writes and unrecoverable transmission of data... higher throughput should reduce data loss. If I think of changes as trains with wagons, the chance a train can be struck by an asteroid grows the longer the train is. When you use compression, small changes to a data structure *may* result in longer trains leaving the station, but on average you *should* have shorter trains.

> fsync lets the application know when the data has reached disk.
> Once it returns you know the data on disk is coherent. What we're
> talking about is what to do if the power fails or the system
> crashes before that happens.

Yeah... actually successful fsyncs are at a higher integrity level than just "let as much data as possible reach the disk and make it so that it can be read later". But still, when you issue an fsync you're asking "put those data on permanent storage". Until then the fs is free to keep managing them in cache and modify/compress them there. The faster they reach the disk, the lower the chances you'll lose them. On the assumption that once an asteroid hits a wagon the whole train is lost that's not ideal... but still, the average length of trains *should* be less, reducing the *average* chance they get hit.

This *may* still not be the case, and it depends on the pattern with which data change. If most of the time you're changing 1 bit followed by an fsync, and that requires a 2-sector rewrite, that's bad. The chances of this happening are higher if compression takes place at the fs level rather than the DB level, since the DB should be more aware of which data can be efficiently compressed and what the trade-off could be in terms of data loss if something goes wrong in a 2-sector write where without compression you'd just write one.

But I think you could still take advantage of fs compression without sacrificing integrity by choosing which tables should reside on a compressed fs and which not, and in some circumstances fs compression may get better results than just TOAST, e.g. if there are several columns that are frequently updated together...

I'd say that compression could be one more tool for managing data integrity, not that it will inevitably have a negative impact on it (nor a positive one if not correctly managed). What am I still missing?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Scott Marlowe wrote:
> I can't see PostgreSQL noticing it. PostgreSQL hands the OS a 512-byte
> block, the OS compresses it and its brethren as they go to disk and
> uncompresses them as they come out, and as long as what you put in is
> what you get back it shouldn't really matter.

The question is whether a write of 512 writes to disk blocks that hold data for other parts of the file; in such a case we might not have the full page write copies of those pages to restore, and the compressed operating system might not be able to guarantee that the other parts of the file will be restored if only part of the 512 gets on disk.

--
Bruce Momjian <bruce@momjian.us>    http://momjian.us
EnterpriseDB                        http://enterprisedb.com
Chris Browne wrote:
> There's a way that compressed filesystems might *help* with a risk
> factor, here...
> By reducing the number of disk drives required to hold the data, you
> may be reducing the risk of enough of them failing to invalidate the
> RAID array.

And one more way. If neither your database nor your filesystem does checksums on blocks (it seems the compressing filesystems mostly do checksums, though), a one-bit error may go undetected, corrupting your data without you knowing it. With filesystem compression, that one-bit error is likely to grow into something big enough to be detected immediately.
Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> But sorry, I still can't get WHY compression as a whole and data
> integrity are mutually exclusive.
> ...
> Now on *average* the write operations should be faster, so the risk
> you'll be hit by an asteroid between the time an fsync has been
> requested and the time it returns should be smaller.
> If you're not fsyncing... you've no warranty that your changes
> reached your permanent storage.

Postgres *guarantees* that as long as everything else works correctly it doesn't lose data. Not that it minimizes the chances of losing data. It is interesting to discuss hardening against unforeseen circumstances as well, but it's of secondary importance to first of all guaranteeing 100% that there is no data loss in the expected scenarios.

That means Postgres has to guarantee 100% that if the power is lost mid-write it can recover all the data correctly. It does this by fsyncing logs of some changes and depending on filesystems and drives behaving in certain ways for others -- namely that a partially completed write will leave each byte with either the new or old value. Compressed filesystems might break that assumption, making Postgres's guarantee void.

I don't know how these hypothetical compressed filesystems are implemented so I can't say whether they work or not. When I first wrote the comment I was picturing a traditional filesystem with each block stored compressed. That can't guarantee anything like this.

However, later in the discussion I mentioned that ZFS with an 8k block size could actually get this right since it never overwrites existing data; it always writes to a new location and then changes metadata pointers. I expect ext3 with data=journal might also be ok. These both have to make performance sacrifices to get there though.

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Grzegorz Jaśkiewicz wrote, On 30-10-08 12:13:
> it should, every book on encryption says that if you compress your data
> before encryption - it's better.

Those books should also mention that you should leave this subject to the experts; there are numerous examples of systems that followed the book and are still broken. There are other techniques as well that make breaking it harder, such as the CBC and CTS modes. Using compression consumes processing power and resources, which makes DoS attacks a lot easier.

Also, I have yet to see a compression algorithm that can sustain over (or even anything close to, for that matter) 100MB/s on today's COTS hardware. Since TOAST already provides compression, maybe that data can be transmitted in compressed form (without recompression).

- Joris
Gregory Stark wrote, On 01-11-08 14:02:
> Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
>
>> But sorry I still can't get WHY compression as a whole and data
>> integrity are mutually exclusive.
> ... [snip performance theory]
>
> Postgres *guarantees* that as long as everything else works correctly it
> doesn't lose data. Not that it minimizes the chances of losing data. It is
> interesting to discuss hardening against unforeseen circumstances as well but
> it's of secondary importance to first of all guaranteeing 100% that there is
> no data loss in the expected scenarios.
>
> That means Postgres has to guarantee 100% that if the power is lost mid-write
> it can recover all the data correctly. It does this by fsyncing logs of
> some changes and depending on filesystems and drives behaving in certain ways
> for others -- namely that a partially completed write will leave each byte
> with either the new or old value. Compressed filesystems might break that
> assumption making Postgres's guarantee void.

The guarantee YOU want from the underlying file system is that, in case of, let's say, a power failure:
* Already existing data is not modified.
* Overwritten data might be corrupted, but it's either old or new data.
* If an fsync completes, all written data IS committed to disk.

If a (file) system CAN guarantee that, in any way possible, it is safe to use with PostgreSQL (assuming my list is complete, of course). As a side note: I consider the second assumption a bit too strong, but there are probably good reasons for it.

> I don't know how these hypothetical compressed filesystems are implemented so
> I can't say whether they work or not. When I first wrote the comment I was
> picturing a traditional filesystem with each block stored compressed. That
> can't guarantee anything like this.

Instead the discussion keeps reverting to file systems without even a glance at their method of operation. None of the algorithms the file systems use are written down, yet they are being discussed.

> However later in the discussion I mentioned that ZFS with an 8k block size
> could actually get this right since it never overwrites existing data, it
> always writes to a new location and then changes metadata pointers. I expect
> ext3 with data=journal might also be ok. These both have to make performance
> sacrifices to get there though.

Instead, here we get to the specifics we needed a long time ago: ZFS takes 8kB as its optimal point(*) and never overwrites existing data. So it should be as safe as any other file system, if he is indeed correct. Does a different block size (of ZFS or PostgreSQL) make any difference to that? No, it still guarantees the list above.

Performance is a discussion better left alone, since it is really, really dependent on your workload, installation and other specifics. It could be better and it could be worse.

- Joris

(*) Larger block sizes improve the compression ratio. However, you pay a bigger penalty on writes, as more must be read, processed and written.
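As a concrete illustration of the fsync guarantee discussed above, here is a minimal sketch using only standard POSIX calls (file name and payload are arbitrary examples, not anything from PostgreSQL itself): nothing written is treated as durable until fsync() has returned successfully.

    /* Write-then-fsync ordering: only report success after fsync() returns 0. */
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        const char buf[] = "some WAL-ish record\n";
        int fd = open("example.wal", O_WRONLY | O_CREAT | O_APPEND, 0600);
        if (fd < 0) { perror("open"); return 1; }

        if (write(fd, buf, strlen(buf)) != (ssize_t) strlen(buf)) {
            perror("write");
            return 1;
        }

        /* The record may only be considered durable after fsync() succeeds. */
        if (fsync(fd) != 0) {
            perror("fsync");
            return 1;
        }

        close(fd);
        puts("record durable (to the extent the OS and drive honour fsync)");
        return 0;
    }

The whole debate above is really about whether a compressing layer underneath can still honour that contract.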
Joris Dobbelsteen wrote:
> Also I still have to see a compression algorithm that can sustain over
> (or even anything close to, for that matter) 100MB/s on today's COTS
> hardware. As TOAST provides compression, maybe that data can be
> transmitted in compressed manner (without recompression).

I did a few quick tests of compression speed, as I was curious about just what sort of performance was available. I was under the impression that modern hardware could easily top 100 Mbit/s with common compression algorithms, and wanted to test that. Based on the results I'd have to agree with the quoted claim; I was apparently thinking of symmetric encryption throughput rather than compression throughput.

I get 19 Mbit/s from gzip (deflate) on my 2.4GHz Core 2 Duo laptop. With lzop (LZO) the machine achieves 45 Mbit/s. In both cases only a single core is used. With 7zip (LZMA) it only manages 3.1 Mb/s using BOTH cores together. All tests were done on a 278MB block of data that was precached in RAM. Output went to /dev/null, except in the LZMA case (due to utility limitations) where output was written to a tmpfs.

Perhaps a multi-core and/or SIMD-ized implementation of LZO (if such a thing is possible or practical) might manage 100 Mbit/s, or you might pull it off on an absolutely top-of-the-range desktop (or server) CPU like the 3.3 GHz Core 2 Duo. Maybe, but probably not without considerable overclocking, which eliminates the "COTS" aspect rather soundly.

Given that very few people have dedicated gzip (or other algorithm) acceleration cards in their systems, it looks like it should be faster to do transfers uncompressed over a network of any respectable speed. Not entirely surprising, really, or it'd be used a lot more in common file server protocols.

Wire protocol compression support in PostgreSQL would probably still be extremely useful for Internet or WAN based clients, though, and there are probably more than a few of those around. I know it'd benefit me massively, as I have users using PostgreSQL over 3G cellular radio (UMTS/HSDPA) where real-world speeds are around 0.1 - 1.5 Mbit/s, data transfer limits are low and data transfer charges are high. Compression would clearly need to be a negotiated connection option, though.

Interestingly, the Via thin clients at work, which have AES-256 (among other things) implemented in hardware, can encrypt with AES-256 at over 300 MB/s. Yes, megabytes, not megabits. Given that the laptop used in the above testing only gets 95 MB/s, it makes you wonder whether it'd be worthwhile for CPU designers to offer a common compression algorithm like LZO, deflate, or LZMA in hardware for server CPUs.

-- Craig Ringer
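For anyone who wants to repeat this kind of measurement without the command-line tools, below is a rough sketch of an in-memory throughput test using zlib's compress2(). The buffer size, filler data and timing method are illustrative assumptions, not the methodology used for the numbers above.

    /* Time an in-memory deflate pass so disk I/O is excluded from the result. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <zlib.h>

    int main(void)
    {
        const size_t n = 64 * 1024 * 1024;            /* 64 MB of test data */
        unsigned char *src = malloc(n);
        uLongf clen = compressBound(n);
        unsigned char *dst = malloc(clen);
        if (!src || !dst) return 1;

        for (size_t i = 0; i < n; i++)                /* semi-compressible filler */
            src[i] = (unsigned char) (i % 251);

        clock_t t0 = clock();
        compress2(dst, &clen, src, n, Z_DEFAULT_COMPRESSION);
        double secs = (double) (clock() - t0) / CLOCKS_PER_SEC;

        printf("compressed %zu -> %lu bytes in %.2f s (%.1f MB/s)\n",
               n, (unsigned long) clen, secs, n / (1024.0 * 1024.0) / secs);
        free(src);
        free(dst);
        return 0;
    }

Note that the gzip and lzop command-line tools add file-format framing and I/O on top of the raw algorithm, so their numbers will not match a pure in-memory test exactly.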
On Mon, Nov 03, 2008 at 08:18:54AM +0900, Craig Ringer wrote:
> Joris Dobbelsteen wrote:
> > Also I still have to see a compression algorithm that can sustain over
> > (or even anything close to, for that matter) 100MB/s on today's COTS
> > hardware. As TOAST provides compression, maybe that data can be
> > transmitted in compressed manner (without recompression).
>
> I get 19 Mbit/s from gzip (deflate) on my 2.4GHz Core 2 Duo laptop. With
> lzop (LZO) the machine achieves 45 Mbit/s. In both cases only a single
> core is used. With 7zip (LZMA) it only manages 3.1 Mb/s using BOTH cores
> together.

The algorithms in the MonetDB/X100 paper I posted upstream[1] appear to be designed more for this use. Their PFOR algorithm gets between ~0.4GB/s and ~1.7GB/s in compression and between ~0.9GB/s and 3GB/s in decompression.

Your lzop numbers look *very* low; the paper suggests compression going up to ~0.3GB/s on a 2GHz Opteron. In fact, an old page for lzop[2] reports 5MB/s on a Pentium 133, so I don't think I'm understanding what your numbers are. I'll see if I can write some code that implements their algorithms and send another mail.

If PFOR really is this fast then it may be good for TOAST compression, though judging by the comments in pg_lzcompress.c it may not be worth it, as the time spent on compression gets lost in the noise.

Sam

[1] http://old-www.cwi.nl/themes/ins1/publications/docs/ZuHeNeBo:ICDE:06.pdf
[2] http://www.oberhumer.com/opensource/lzo/#speed
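For readers who haven't seen it, the sketch below illustrates the frame-of-reference idea that PFOR builds on: store a per-block reference value plus small fixed-width deltas, and keep an exception list for values that don't fit. This is a deliberately simplified illustration in C, not the encoding, bit widths or layout used in the X100 paper.

    /* Simplified frame-of-reference coding with a patch-style exception list. */
    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK 8

    struct for_block {
        uint32_t reference;          /* block minimum */
        uint8_t  deltas[BLOCK];      /* value - reference, when it fits in 8 bits */
        uint32_t exceptions[BLOCK];  /* full values that did not fit */
        uint8_t  exc_pos[BLOCK];     /* positions of those exceptions */
        int      n_exc;
    };

    static void for_encode(const uint32_t *in, struct for_block *out)
    {
        uint32_t ref = in[0];
        for (int i = 1; i < BLOCK; i++)
            if (in[i] < ref)
                ref = in[i];

        out->reference = ref;
        out->n_exc = 0;
        for (int i = 0; i < BLOCK; i++) {
            uint32_t d = in[i] - ref;
            if (d <= 0xFF) {
                out->deltas[i] = (uint8_t) d;
            } else {                          /* too big: record as an exception */
                out->deltas[i] = 0;
                out->exc_pos[out->n_exc] = (uint8_t) i;
                out->exceptions[out->n_exc++] = in[i];
            }
        }
    }

    static void for_decode(const struct for_block *in, uint32_t *out)
    {
        for (int i = 0; i < BLOCK; i++)       /* fast path: reference + delta */
            out[i] = in->reference + in->deltas[i];
        for (int i = 0; i < in->n_exc; i++)   /* then patch the exceptions */
            out[in->exc_pos[i]] = in->exceptions[i];
    }

    int main(void)
    {
        uint32_t vals[BLOCK] = {1000, 1003, 1001, 1999, 1002, 900000, 1005, 1004};
        uint32_t back[BLOCK];
        struct for_block b;

        for_encode(vals, &b);
        for_decode(&b, back);
        for (int i = 0; i < BLOCK; i++)
            printf("%u%s", back[i], i == BLOCK - 1 ? "\n" : " ");
        return 0;
    }

Real implementations pack the deltas at arbitrary bit widths and decode whole blocks in tight, branch-light loops, which is where the multi-GB/s figures come from.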
Sam Mason wrote:
> On Mon, Nov 03, 2008 at 08:18:54AM +0900, Craig Ringer wrote:
>> Joris Dobbelsteen wrote:
>>> Also I still have to see a compression algorithm that can sustain over
>>> (or even anything close to, for that matter) 100MB/s on today's COTS
>>> hardware. As TOAST provides compression, maybe that data can be
>>> transmitted in compressed manner (without recompression).
>
>> I get 19 Mbit/s from gzip (deflate) on my 2.4GHz Core 2 Duo laptop. With
>> lzop (LZO) the machine achieves 45 Mbit/s. In both cases only a single
>> core is used. With 7zip (LZMA) it only manages 3.1 Mb/s using BOTH cores
>> together.
>
> Your lzop numbers look *very* low; the paper suggests
> compression going up to ~0.3GB/s on a 2GHz Opteron.

Er ... ENOCOFFEE? s/Mb(it)?/MB/g. And I'm normally *so* careful about Mb/MB etc; this was just a complete thinko at some level. My apologies, and thanks for catching that stupid error. The paragraph should've read:

I get 19 MB/s (152 Mb/s) from gzip (deflate) on my 2.4GHz Core 2 Duo laptop. With lzop (LZO) the machine achieves 45 MB/s (360 Mb/s). In both cases only a single core is used. With 7zip (LZMA) it only manages 3.1 MB/s (24.8 Mb/s) using BOTH cores together.

So - it's potentially even worth compressing the wire protocol for use on a 100 megabit LAN if a lightweight scheme like LZO can be used.

-- Craig Ringer
Craig Ringer <craig@postnewspapers.com.au> writes:
> I get 19 Mbit/s from gzip (deflate) on my 2.4GHz Core 2 Duo laptop. With
> lzop (LZO) the machine achieves 45 Mbit/s. In both cases only a single
> core is used. With 7zip (LZMA) it only manages 3.1 Mb/s using BOTH cores
> together.

It'd be interesting to know where pg_lzcompress fits in.

> Wire protocol compression support in PostgreSQL would probably still be
> extremely useful for Internet or WAN based clients, though,

Use an ssh tunnel ... get compression *and* encryption, which you surely should want on a WAN link.

regards, tom lane
Tom Lane wrote:
>> Wire protocol compression support in PostgreSQL would probably still be
>> extremely useful for Internet or WAN based clients, though,
>
> Use an ssh tunnel ... get compression *and* encryption, which you surely
> should want on a WAN link.

An ssh tunnel, while very useful, is only suitable for more capable users and is far from transparent. It requires an additional setup step before connecting to the database, which is going to cause support problems and confuse users. It's also somewhat painful on Windows machines. Additionally, an ssh tunnel makes it much, MUCH more difficult for an application to recover transparently after a connection is broken.

As you know, PostgreSQL supports SSL/TLS for encryption of wire communications, and you can use client certificates as an additional layer of authentication, much as you can use an ssh key. It's clean, and to the end user it's basically transparent. All the major clients, like the ODBC and JDBC drivers, already support it.

Adding optional compression within that would be wonderful - and since the client and server are already designed to communicate through filters (for encryption), it shouldn't be that hard to stack another filter layer on top. It's something I'm going to have to look at myself, actually, though I have some work on the qemu LSI SCSI driver that I *really* have to finish first.

-- Craig Ringer
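For comparison, this is roughly what the existing, transparent SSL support already looks like from a libpq client; the host, database and user names below are placeholders, and the check via PQgetssl() assumes a libpq built with SSL support:

    /* Connect with TLS required and report whether the link is encrypted. */
    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb(
            "host=db.example.com dbname=appdb user=appuser sslmode=require");

        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* PQgetssl() returns a non-NULL handle when SSL is actually in use. */
        printf("connected, SSL in use: %s\n", PQgetssl(conn) ? "yes" : "no");

        PQfinish(conn);
        return 0;
    }

The appeal of a compression option is that it could be negotiated in exactly this kind of transparent way, as just another connection parameter.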
On Mon, Nov 03, 2008 at 10:01:31AM +0900, Craig Ringer wrote:
> Sam Mason wrote:
> > Your lzop numbers look *very* low; the paper suggests
> > compression going up to ~0.3GB/s on a 2GHz Opteron.
>
> Er ... ENOCOFFEE? s/Mb(it)?/MB/g. And I'm normally *so* careful about
> Mb/MB etc; this was just a complete thinko at some level. My apologies,
> and thanks for catching that stupid error.

Nice to know we're all human here :)

> The paragraph should've read:
>
> I get 19 MB/s (152 Mb/s) from gzip (deflate) on my 2.4GHz Core 2 Duo
> laptop. With lzop (LZO) the machine achieves 45 MB/s (360 Mb/s). In both
> cases only a single core is used. With 7zip (LZMA) it only manages 3.1
> MB/s (24.8 Mb/s) using BOTH cores together.

Hum, I've just had a look and found that Debian ships an lzop compression program. I uncompressed a copy of the Postgres source for a test and I'm getting around 120MB/s when compressing on a 2.1GHz Core2 processor (72MB in 0.60 seconds, fast mode). If I save the output and recompress it I get about 40MB/s (22MB in 0.67 seconds), so the compression rate seems to be very dependent on the type of data.

As a test, I've just written some code that writes out (what I guess is the "LINENUMBER" test in the X100 paper) a file consisting of small integers (less than 2 decimal digits, i.e. lots of zero bytes) and now get up to 0.4GB/s (200MB in 0.5 seconds), which nicely matches my eyeballing of the figure in the paper. It does point out that compression rates seem to be very data dependent!

> So - it's potentially even worth compressing the wire protocol for use
> on a 100 megabit LAN if a lightweight scheme like LZO can be used.

The problem is that you're then dedicating most of a processor to doing the compression, one that would otherwise be engaged in doing useful work for other clients.

BTW, the X100 work was about trying to become less IO bound; they had a 350MB/s RAID array and were highly IO bound. If I'm reading the paper right, with their PFOR algorithm they got the final query (i.e. decompressing and doing useful work) running at 500MB/s.

Sam
On Sun, Nov 2, 2008 at 7:19 PM, Sam Mason <sam@samason.me.uk> wrote:
> On Mon, Nov 03, 2008 at 10:01:31AM +0900, Craig Ringer wrote:
>> So - it's potentially even worth compressing the wire protocol for use
>> on a 100 megabit LAN if a lightweight scheme like LZO can be used.
>
> The problem is that you're then dedicating most of a processor to
> doing the compression, one that would otherwise be engaged in doing
> useful work for other clients.

Considering the low cost of gigabit networks nowadays (even my four-year-old T42 Thinkpad has gigabit in it), it would most of the time be cheaper to buy gigabit NICs and cheap switches than to worry about the network component. On WANs it's another story, of course.
Craig Ringer wrote:
> So - it's potentially even worth compressing the wire protocol for use
> on a 100 megabit LAN if a lightweight scheme like LZO can be used.

LZO is under the GPL though.
Peter Eisentraut wrote:
> Craig Ringer wrote:
>> So - it's potentially even worth compressing the wire protocol for use
>> on a 100 megabit LAN if a lightweight scheme like LZO can be used.
>
> LZO is under the GPL though.

Good point. I'm so used to libraries being under more appropriate licenses like the LGPL or BSD license that I completely forgot to check.

It doesn't matter that much, anyway, in that deflate would also do the job quite well for any sort of site-to-site or user-to-site WAN link.

-- Craig Ringer
> It doesn't matter that much, anyway, in that deflate would also do the
> job quite well for any sort of site-to-site or user-to-site WAN link.

I used to use that, then switched to bzip. Thing is, if your client is really just issuing SQL, how much does it matter? Compression can't help with latency. Which is why I went with 3 tiers, so that all communication with Postgres occurs on the server, and all communication between server & client is binary, compressed, and a single request/response per user request regardless of how many tables the data is pulled from.

-- Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice
Scott Ribe wrote:
>> It doesn't matter that much, anyway, in that deflate would also do the
>> job quite well for any sort of site-to-site or user-to-site WAN link.
>
> I used to use that, then switched to bzip. Thing is, if your client is
> really just issuing SQL, how much does it matter?

It depends a lot on what your requests are. If you have queries that must return significant chunks of data to the client, then compression will help with total request time on a slow link, in that there's less data to transfer so the last byte arrives sooner. Of course it's generally preferable to avoid transferring hundreds of KB of data to the client in the first place, but it's not always practical.

Additionally, not all connection types have effectively unlimited data transfers. Many mobile networks, for example, tend to have limits on monthly data transfers or charge per MB/KB transferred.

Wire compression would be nice for performance on slower networks, but it's mostly appealing for reducing the impact on other users on a WAN, reducing data transfer costs, reducing required WAN capacity, etc. It's appealing because it looks like it should be possible to make it quite simple to enable or disable, so it'd be a simple ODBC/JDBC connection option.

> Compression can't help
> with latency.

Not with network round-trip latency, no.

-- Craig Ringer
Peter Eisentraut wrote:
> Craig Ringer wrote:
>> So - it's potentially even worth compressing the wire protocol for use
>> on a 100 megabit LAN if a lightweight scheme like LZO can be used.
>
> LZO is under the GPL though.

But liblzf is BSD-style.

http://www.goof.com/pcg/marc/liblzf.html
On Thu, 2008-11-06 at 00:27 +0100, Ivan Voras wrote:
> Peter Eisentraut wrote:
> > Craig Ringer wrote:
> >> So - it's potentially even worth compressing the wire protocol for use
> >> on a 100 megabit LAN if a lightweight scheme like LZO can be used.

Yes, compressing the wire protocol is a benefit. You can troll the archives for when this has come up in the past. CMD at one time had a hacked-up version that proved compression was a benefit (even at 100Mb). Alas, it was ugly, :P... If it was done right, it would be a great benefit to folks out there.

Joshua D. Drake

> > LZO is under the GPL though.
>
> But liblzf is BSD-style.
>
> http://www.goof.com/pcg/marc/liblzf.html
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Ivan Voras
> Sent: Wednesday, November 05, 2008 3:28 PM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Are there plans to add data compression feature to postgresql?
>
> Peter Eisentraut wrote:
> > Craig Ringer wrote:
> >> So - it's potentially even worth compressing the wire protocol for use
> >> on a 100 megabit LAN if a lightweight scheme like LZO can be used.
> >
> > LZO is under the GPL though.
>
> But liblzf is BSD-style.
>
> http://www.goof.com/pcg/marc/liblzf.html

Here is a 64-bit Windows port of that library:

http://cap.connx.com/chess-engines/new-approach/liblzf34.zip

It has fantastic compression/decompression speed (100 MB takes well under a second to either compress or decompress) and I see about 50% compression.
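For reference, a minimal sketch of what using liblzf looks like from C. The signatures are from memory of lzf.h (both functions return the output length, or 0 on failure), so treat the exact API as an assumption and check the header before relying on this:

    /* Compress a buffer with liblzf, then decompress and verify the round trip. */
    #include <stdio.h>
    #include <string.h>
    #include <lzf.h>

    int main(void)
    {
        char in[4096], comp[4096], back[4096];
        memset(in, 'x', sizeof in);               /* trivially compressible input */

        /* lzf_compress returns the compressed size, or 0 if the output would
         * not fit in the buffer (i.e. the data is effectively incompressible). */
        unsigned int clen = lzf_compress(in, sizeof in, comp, sizeof comp - 1);
        if (clen == 0) {
            puts("data did not compress; would store it uncompressed instead");
            return 0;
        }

        unsigned int dlen = lzf_decompress(comp, clen, back, sizeof back);
        printf("4096 -> %u -> %u bytes, round-trip %s\n",
               clen, dlen,
               dlen == sizeof in && memcmp(in, back, sizeof in) == 0 ? "ok" : "FAILED");
        return 0;
    }

Passing an output buffer slightly smaller than the input is a common idiom to ensure compression is only used when it actually saves space, falling back to storing the data raw otherwise.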