Thread: Are there plans to add data compression feature to postgresql?
On Sun, Oct 26, 2008 at 9:54 AM, 小波 顾 <guxiaobo1982@hotmail.com> wrote:
> Are there plans to add data compression feature to postgresql?

There already is data compression in postgresql.
Straight from Postgres doc
The zlib compression library will be used by default. If you don't want to use it then you must specify the --without-zlib option for configure. Using this option disables support for compressed archives in pg_dump and pg_restore.
Martin
2008/10/26 Martin Gainty <mgainty@hotmail.com>:
> Straight from Postgres doc
> The zlib compression library will be used by default. [...]

I was thinking more along the lines of the automatic compression of text types over 4k or so when they are moved out of line and into toast tables. The original question was a little vague though, wasn't it?
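As a rough illustration of the automatic TOAST compression being discussed (the table and data below are hypothetical, and exact byte counts will vary with version and settings), one can compare a large text value's logical length with the number of bytes actually used to store it:

    -- Hypothetical table, for illustration only
    CREATE TABLE toast_demo (id serial PRIMARY KEY, body text);

    -- A large, repetitive value: big enough to be toasted, and highly compressible
    INSERT INTO toast_demo (body)
    SELECT repeat('PostgreSQL already compresses large text values. ', 500);

    -- octet_length() reports the logical size; pg_column_size() the stored size
    SELECT octet_length(body) AS logical_bytes,
           pg_column_size(body) AS stored_bytes
    FROM toast_demo;

If TOAST compression kicks in, stored_bytes should come back much smaller than logical_bytes.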
You might want to try using a file system (ZFS, NTFS) that does compression, depending on what you're trying to compress.
Re: Are there plans to add data compression feature to postgresql? (Chris.Ellis@shropshire.gov.uk, Mon, 27 Oct 2008)
Note that most data stored in the TOAST table is compressed.
I.e. a text value with a length greater than around 2K will be stored in the TOAST table. By default, data in the TOAST table is compressed; this can be overridden.
However I expect that compression will reduce the performance of certain queries.
http://www.postgresql.org/docs/8.3/interactive/storage-toast.html
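The per-column override mentioned above is done with ALTER TABLE ... SET STORAGE; a minimal sketch, with hypothetical table and column names:

    -- EXTENDED (the default for text) allows both compression and out-of-line storage.
    -- EXTERNAL stores the value out of line but skips compression.
    -- MAIN prefers compression and keeps the value in the main table where possible.
    ALTER TABLE documents ALTER COLUMN body SET STORAGE EXTERNAL;  -- no compression

    -- Restore the default behaviour
    ALTER TABLE documents ALTER COLUMN body SET STORAGE EXTENDED;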
Out of interest, in what context did you want compression?
Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote on 27/10/2008 07:34:
> You might want to try using a file system (ZFS, NTFS) that
> does compression, depending on what you're trying to compress.
1. Values of integer types that used to take 8 bytes now take only 4 or 2 bytes if the values are not large.
2. If two values share the same text or pattern, only one copy is stored, and that copy is also compressed with traditional compression methods.
On Wed, Oct 29, 2008, replying to Grzegorz Jaśkiewicz <gryzman@gmail.com>, 小波 顾 <guxiaobo1982@hotmail.com> wrote:
Data Compression
The new data compression feature in SQL Server 2008 reduces the size of tables, indexes or a subset of their partitions by storing fixed-length data types in variable length storage format and by reducing the redundant data. The space savings achieved depends on the schema and the data distribution. Based on our testing with various data warehouse databases, we have seen a reduction in the size of real user databases up to 87% (a 7 to 1 compression ratio) but more commonly you should expect a reduction in the range of 50-70% (a compression ratio between roughly 2 to 1 and 3 to 1).
SQL Server provides two types of compression as follows:
· ROW compression enables storing fixed length types in variable length storage format. So for example, if you have a column of data type BIGINT which takes 8 bytes of storage in fixed format, when compressed it takes a variable number of bytes—anywhere from 0 bytes to up to 8 bytes. Since column values are stored as variable length, an additional 4‑bit length code is stored for each field within the row. Additionally, zero and NULL values don’t take any storage except for the 4‑bit code.
· PAGE compression is built on top of ROW compression. It minimizes storage of redundant data on the page by storing commonly occurring byte patterns on the page once and then referencing these values for respective columns. The byte pattern recognition is type-independent. Under PAGE compression, SQL Server optimizes space on a page using two techniques.
The first technique is column prefix. In this case, the system looks for a common byte pattern as a prefix for all values of a specific column across rows on the page. This process is repeated for all the columns in the table or index. The column prefix values that are computed are stored as an anchor record on the page and the data or index rows refer to the anchor record for the common prefix, if available, for each column.
The second technique is page level dictionary. This dictionary stores common values across columns and rows and stores them in a dictionary. The columns are then modified to refer to the dictionary entry.
Compression comes with additional CPU cost. This overhead is paid when you query or execute DML operations on compressed data. The relative CPU overhead with ROW is less than for PAGE, but PAGE compression can provide better compression. Since there are many kinds of workloads and data patterns, SQL Server exposes compression granularity at a partition level. You can choose to compress the whole table or index or a subset of partitions. For example, in a DW workload, if CPU is the dominant cost in your workload but you want to save some disk space, you may want to enable PAGE compression on partitions that are not accessed frequently while not compressing the current partition(s) that are accessed and manipulated more frequently. This reduces the total CPU cost, at a small increase in disk space requirements. If I/O cost is dominant for your workload, or you need to reduce disk space costs, compressing all data using PAGE compression may be the best choice. Compression can give many-fold speedups if it causes your working set of frequently touched pages to be cached in the main memory buffer pool, when it does not otherwise fit in memory. Preliminary performance results on one large-scale internal DW query performance benchmark used to test SQL Server 2008 show a 58% disk savings, an average 15% reduction in query runtime, and an average 20% increase in CPU cost. Some queries speeded up by a factor of up to seven. Your results depend on your workload, database, and hardware.
The commands to compress data are exposed as options in CREATE/ALTER DDL statements and support both ONLINE and OFFLINE mode. Additionally, a stored procedure is provided to help you estimate the space savings prior to actual compression.
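As a sketch of what that DDL and the estimation procedure look like in SQL Server 2008 (the object names here are invented for the example; see the MSDN page linked below for the authoritative syntax):

    -- Estimate the savings first (schema, object, index_id, partition_number, setting)
    EXEC sp_estimate_data_compression_savings 'dbo', 'FactSales', NULL, NULL, 'PAGE';

    -- Enable PAGE compression on the whole table
    ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- Or compress a single partition of an index with the cheaper ROW compression
    ALTER INDEX ix_FactSales_Date ON dbo.FactSales
        REBUILD PARTITION = 3 WITH (DATA_COMPRESSION = ROW);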
Backup Compression
Backup compression helps you to save in multiple ways.
By reducing the size of your SQL backups, you save significantly on disk media for your SQL backups. While all compression results depend on the nature of the data being compressed, results of 50% are not uncommon, and greater compression is possible. This enables you to use less storage for keeping your backups online, or to keep more cycles of backups online using the same storage.
Backup compression also saves you time. Traditional SQL backups are almost entirely limited by I/O performance. By reducing the I/O load of the backup process, we actually speed up both backups and restores.
Of course, nothing is entirely free, and this reduction in space and time comes at the expense of using CPU cycles. The good news here is that the savings in I/O time offsets the increased use of CPU time, and you can control how much CPU is used by your backups at the expense of the rest of your workload by taking advantage of the Resource Governor.
URL:http://msdn.microsoft.com/en-us/library/cc278097.aspx
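For reference, the backup compression described above is exposed as a single WITH option (the database name and path below are placeholders; availability depends on edition):

    -- Compressed backup; restore is unchanged because the compression is transparent
    BACKUP DATABASE SalesDW
        TO DISK = 'D:\Backups\SalesDW.bak'
        WITH COMPRESSION, STATS = 10;

    RESTORE DATABASE SalesDW
        FROM DISK = 'D:\Backups\SalesDW.bak'
        WITH REPLACE;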
小波 顾 <guxiaobo1982@hotmail.com> writes:
> [ snip a lot of marketing for SQL Server ]

I think the part of this you need to pay attention to is

> Of course, nothing is entirely free, and this reduction in space and
> time comes at the expense of using CPU cycles.

We already have the portions of this behavior that seem to me to be likely to be worthwhile (such as NULL elimination and compression of large field values). Shaving a couple bytes from a bigint doesn't strike me as interesting.

(Note: you could build a user-defined type that involved a one-byte length indicator followed by however many bytes of the bigint you needed. So someone who thought this might be worthwhile could do it for themselves. I don't see it being a win, though.)

regards, tom lane
On Wed, Oct 29, 2008 at 10:09 AM, 小波 顾 <guxiaobo1982@hotmail.com> wrote:
> Data Compression
>
> The new data compression feature in SQL Server 2008 reduces the size of
> tables, indexes or a subset of their partitions by storing fixed-length data
> types in variable length storage format and by reducing the redundant data.
> [...]

I'm sure this makes for a nice brochure or PowerPoint presentation, but in the real world I can't imagine putting that much effort into it when compressed file systems seem the place to be doing this.
Tom Lane wrote:
> We already have the portions of this behavior that seem to me to be
> likely to be worthwhile (such as NULL elimination and compression of
> large field values). Shaving a couple bytes from a bigint doesn't
> strike me as interesting.

Think about it on a fact table for a warehouse. A few bytes per bigint multiplied by several billions/trillions of bigints (not an exaggeration in a DW) and you're talking some significant storage saving on the main storage hog in a DW. Not to mention the performance _improvements_ you can get, even with some CPU overhead for dynamic decompression, if the planner/optimiser understands how to work with the compression index/map to perform things like range/partition elimination etc. Admittedly this depends heavily on the storage mechanics and optimisation techniques of the DB, but there is value to be had there ... IBM is seeing typical storage savings in the 40-60% range, mostly based on boring, bog-standard int, char and varchar data.

The IDUG (so DB2 users themselves, not IBM's marketing) had a competition to see what was happening in the real world, take a look if interested: http://www.idug.org/wps/portal/idug/compressionchallenge

Other big benefits come with XML ... but that is even more dependent on the starting point. Oracle and SQL Server will see big benefits in compression with this, because their XML technology is so mind-bogglingly broken in the first place.

So there's certainly utility in this kind of feature ... but whether it rates above some of the other great stuff in the PostgreSQL pipeline is questionable.

Ciao
Fuzzy
:-)
Grant Allen wrote:
> ...warehouse...DB2...IBM is seeing typical
> storage savings in the 40-60% range

Sounds about the same as what compressing file systems claim:

http://opensolaris.org/os/community/zfs/whatis/
"ZFS provides built-in compression. In addition to reducing space usage by 2-3x, compression also reduces the amount of I/O by 2-3x. For this reason, enabling compression actually makes some workloads go faster."

I do note that Netezza got a lot of PR around their compression release, claiming it doubled performance. Wonder if they added that at the file system or higher in the DB.
Ron Mayer wrote:
> I do note that Netezza got a lot of PR around their compression release,
> claiming it doubled performance. Wonder if they added that at the file
> system or higher in the DB.

I just so happen to have access to a Netezza system :-) I'll see if I can find out.

One other thing I forgot to mention: compression by the DB trumps filesystem compression in one very important area - shared_buffers! (or buffer_cache, bufferpool or whatever your favourite DB calls its working memory for caching data). Because the data stays compressed in the block/page when cached by the database in one of its buffers, you get more bang for your memory buck in many circumstances! Just another angle to contemplate :-)

Ciao
Fuzzy
:-)
On Oct 29, 2008, at 9:50 PM, Grant Allen wrote:
> One other thing I forgot to mention: compression by the DB trumps
> filesystem compression in one very important area - shared_buffers!
> [...] Because the data stays compressed in the block/page when cached
> by the database in one of its buffers, you get more bang for your
> memory buck in many circumstances!

The additional latency added by decompression is reasonably small compared with traditional disk access time. It's rather large compared to memory access time.

Cheers,
  Steve
Steve Atkins wrote:
> The additional latency added by decompression is reasonably small
> compared with traditional disk access time. It's rather large compared
> to memory access time.

The one place where compression is an immediate benefit is the wire. It is easy to forget that one of our number one bottlenecks (even at gigabit) is the amount of data we are pushing over the wire.

Joshua D. Drake
On Oct 29, 2008, at 10:43 PM, Joshua D. Drake wrote:
> The one place where compression is an immediate benefit is the wire.
> It is easy to forget that one of our number one bottlenecks (even at
> gigabit) is the amount of data we are pushing over the wire.

Wouldn't "ssl_ciphers=NULL-MD5" or somesuch give zlib compression over the wire?

Cheers,
  Steve
Steve Atkins wrote:
> Wouldn't "ssl_ciphers=NULL-MD5" or somesuch give zlib compression over
> the wire?

I don't think so.

Joshua D. Drake
On Thu, Oct 30, 2008 at 03:50:20PM +1100, Grant Allen wrote:
> One other thing I forgot to mention: compression by the DB trumps
> filesystem compression in one very important area - shared_buffers!
> [...] Because the data stays compressed in the block/page when cached
> by the database in one of its buffers, you get more bang for your
> memory buck in many circumstances!

The database research project known as MonetDB/X100 has been looking at this recently; the first paper below gives a bit of an introduction into the design of the database and the second into the effects of different compression schemes:

http://www.cwi.nl/htbin/ins1/publications?request=pdf&key=ZuBoNeHe:DEBULL:05
http://www.cwi.nl/htbin/ins1/publications?request=pdf&key=ZuHeNeBo:ICDE:06

The important thing seems to be that you don't want a storage-efficient compression scheme; decent RAID subsystems demand a very lightweight scheme that can be decompressed at several GB/s (i.e. two or three cycles per tuple, not 50 to 100 like traditional schemes such as zlib or bzip).

It's very interesting reading (references to "commercial DBMS `X'" being somewhat comical), but it's a *long* way from being directly useful to Postgres. It's interesting to bear in mind some of the things they talk about when writing new code; the importance of designing cache conscious algorithms (and then when writing the code) seems to have stuck in my mind the most. Am I just old fashioned, or is this focus on cache conscious design quite a new thing and somewhat undervalued in the rest of the software world?

  Sam

p.s. if you're interested, there are more papers about MonetDB here:
http://monetdb.cwi.nl/projects/monetdb/Development/Research/Articles/index.html
Currently postgresql is slower on RAID, so something tells me that a little bit of compression underneath will make it far worse, not better. But I guess Tom will be the man to know more about it.
On Thu, Oct 30, 2008 at 10:53:27AM +1100, Grant Allen wrote:
> Other big benefits come with XML ... but that is even more dependent on the
> starting point. Oracle and SQL Server will see big benefits in compression
> with this, because their XML technology is so mind-bogglingly broken in the
> first place.

It seems to me that for this use case, you can already get the interesting compression advantages in Postgres, and have been getting them since TOAST was introduced back when the 8k row limit was broken. It's recently been enhanced, ISTR, so that you can SET STORAGE with better granularity. Indeed, it seems to me that in some ways, the big databases are only catching up with Postgres now on this front. That alone oughta be news :)

--
Andrew Sullivan
ajs@commandprompt.com
http://www.commandprompt.com/
Grzegorz Jaśkiewicz wrote:
> Currently postgresql is slower on RAID, so something tells me that a
> little bit of compression underneath will make it far worse, not better.
> But I guess Tom will be the man to know more about it.

What? PostgreSQL is slower on RAID? Care to define that better?
> What? PostgreSQL is slower on RAID? Care to define that better?

up to 8.3 it was massively slower on raid1 (software raid on linux), starting from 8.3 things got lot lot better (we speak 3x speed improvement here), but it still isn't same as on 'plain' drive.

--
GJ
On Oct 30, 2008, at 8:10 AM, Grzegorz Jaśkiewicz wrote:
> up to 8.3 it was massively slower on raid1 (software raid on linux),
> starting from 8.3 things got lot lot better (we speak 3x speed
> improvement here), but it still isn't same as on 'plain' drive.

I'm a bit surprised to hear that; what would pg be doing, unique to it, that would cause it to be slower on a RAID-1 cluster than on a plain drive?
Grzegorz Jaśkiewicz wrote:
> up to 8.3 it was massively slower on raid1 (software raid on linux),
> starting from 8.3 things got lot lot better (we speak 3x speed
> improvement here), but it still isn't same as on 'plain' drive.

Slower on RAID1 than what and doing what?

Joshua D. Drake
On Thu, Oct 30, 2008 at 3:27 PM, Christophe <xof@thebuild.com> wrote:
> I'm a bit surprised to hear that; what would pg be doing, unique to it,
> that would cause it to be slower on a RAID-1 cluster than on a plain drive?

yes, it is slower on mirror-raid from single drive. I can give you all the /proc/* dumps if you want, as far as computer goes, it isn't anything fancy. dual way p4, and sata drives of some sort.

--
GJ
Grzegorz Jaśkiewicz wrote:
> On Thu, Oct 30, 2008 at 3:27 PM, Christophe <xof@thebuild.com> wrote:
>> I'm a bit surprised to hear that; what would pg be doing, unique to it,
>> that would cause it to be slower on a RAID-1 cluster than on a plain drive?
>
> yes, it is slower on mirror-raid from single drive. I can give you all the
> /proc/* dumps if you want, as far as computer goes, it isn't anything fancy.
> dual way p4, and sata drives of some sort.

O.k. that doesn't actually surprise me all that much. Software RAID 1 on SATA drives for specific workloads would be slower than a single drive. It should still be faster for reads assuming some level of concurrency, but not likely faster for a single thread. Writes would be expected to be slower because you are managing identical writes across two spindles, and SATA is slow for that type of thing.

Joshua D. Drake
tgl@sss.pgh.pa.us (Tom Lane) writes:
> We already have the portions of this behavior that seem to me to be
> likely to be worthwhile (such as NULL elimination and compression of
> large field values). Shaving a couple bytes from a bigint doesn't
> strike me as interesting.

I expect that there would be value in doing this with the inet type, to distinguish between the smaller IPv4 addresses and the larger IPv6 ones. We use the inet type (surprise! ;-)) and would benefit from having it "usually smaller" (notably since IPv6 addresses are a relative rarity, at this point).

That doesn't contradict you; just points out one of the cases where there might be some value in *a* form of compression... (Of course, this may already be done; I'm not remembering just now...)

--
Christopher Browne
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > I'm sure this makes for a nice brochure or power point presentation, > but in the real world I can't imagine putting that much effort into it > when compressed file systems seem the place to be doing this. I can't really see trusting Postgres on a filesystem that felt free to compress portions of it. Would the filesystem still be able to guarantee that torn pages won't "tear" across adjacent blocks? What about torn pages that included hint bits being set? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
Chris Browne <cbbrowne@acm.org> writes:
> I expect that there would be value in doing this with the inet type,
> to distinguish between the smaller IPv4 addresses and the larger IPv6
> ones. We use the inet type (surprise! ;-)) and would benefit from
> having it "usually smaller" (notably since IPv6 addresses are a
> relative rarity, at this point).

Uh ... inet already does that. Now it's true you could save a byte or two more with a bespoke IPv4-only type, but the useful lifespan of such a type probably isn't very long.

regards, tom lane
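A quick way to check that inet already adapts its storage to the address family (the exact byte counts depend on the server version, so treat the output as indicative only):

    -- The IPv4 value should report noticeably fewer bytes than the IPv6 one
    SELECT pg_column_size('192.168.0.1'::inet) AS ipv4_bytes,
           pg_column_size('2001:db8::1'::inet) AS ipv6_bytes;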
On Thu, Oct 30, 2008 at 4:01 PM, Gregory Stark <stark@enterprisedb.com> wrote:
> I can't really see trusting Postgres on a filesystem that felt free to
> compress portions of it. Would the filesystem still be able to guarantee that
> torn pages won't "tear" across adjacent blocks? What about torn pages that
> included hint bits being set?

I can't see PostgreSQL noticing it. PostgreSQL hands the OS a 512-byte block, the OS compresses it and its brethren as they go to disk and uncompresses them as they come out, and as long as what you put in is what you get back it shouldn't really matter.
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > On Thu, Oct 30, 2008 at 4:01 PM, Gregory Stark <stark@enterprisedb.com> wrote: >> I can't really see trusting Postgres on a filesystem that felt free to >> compress portions of it. Would the filesystem still be able to guarantee that >> torn pages won't "tear" across adjacent blocks? What about torn pages that >> included hint bits being set? > I can't see PostgreSQL noticing it. PostgreSQL hands the OS a 512byte > block, the OS compresses it and it's brethren as the go to disk, > uncompresses as they come out, and as long as what you put in is what > you get back it shouldn't really matter. I think Greg's issue is exactly about what guarantees you'll have left after the data that comes back fails to be the data that went in. regards, tom lane
On Thu, Oct 30, 2008 at 4:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think Greg's issue is exactly about what guarantees you'll have left
> after the data that comes back fails to be the data that went in.

Sounds kinda hand wavy to me. If compressed file systems didn't give you back what you gave them I couldn't imagine them being around for very long.
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > On Thu, Oct 30, 2008 at 4:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "Scott Marlowe" <scott.marlowe@gmail.com> writes: >>> On Thu, Oct 30, 2008 at 4:01 PM, Gregory Stark <stark@enterprisedb.com> wrote: >>>> I can't really see trusting Postgres on a filesystem that felt free to >>>> compress portions of it. Would the filesystem still be able to guarantee that >>>> torn pages won't "tear" across adjacent blocks? What about torn pages that >>>> included hint bits being set? >> >>> I can't see PostgreSQL noticing it. PostgreSQL hands the OS a 512byte >>> block, the OS compresses it and it's brethren as the go to disk, >>> uncompresses as they come out, and as long as what you put in is what >>> you get back it shouldn't really matter. >> >> I think Greg's issue is exactly about what guarantees you'll have left >> after the data that comes back fails to be the data that went in. > > Sounds kinda hand wavy to me. If compressed file systems didn't give > you back what you gave them I couldn't imagine them being around for > very long. I don't know, NFS has lasted quite a while. So you tell me, I write 512 bytes of data to a compressed filesystem, how does it handle the torn page problem? Is it going to have to WAL log all data operations again? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
On Thu, Oct 30, 2008 at 6:03 PM, Gregory Stark <stark@enterprisedb.com> wrote:
> So you tell me, I write 512 bytes of data to a compressed filesystem, how
> does it handle the torn page problem? Is it going to have to WAL log all
> data operations again?

What is the torn page problem? Note I'm no big fan of compressed file systems, but I can't imagine them not working with databases, as I've seen them work quite reliably under Exchange server running a db-oriented storage subsystem. And I can't imagine them not being invisible to an application, otherwise you'd just be asking for trouble.
Scott Marlowe wrote:
> What is the torn page problem? Note I'm no big fan of compressed file
> systems, but I can't imagine them not working with databases, as I've
> seen them work quite reliably under Exchange server running a db-oriented
> storage subsystem. And I can't imagine them not being invisible to an
> application, otherwise you'd just be asking for trouble.

Exchange, isn't that the thing that's very prone to corrupted databases? I've heard lots of horror stories about that (and also about how you have to defragment the database once in a while, so what kind of database is it really?)

--
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Thu, Oct 30, 2008 at 7:37 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Exchange, isn't that the thing that's very prone to corrupted databases?
> I've heard lots of horror stories about that (and also about how you
> have to defragment the database once in a while, so what kind of
> database is it really?)

Sure, bashing Microsoft is easy. But it doesn't address the point: is a database safe on top of a compressed file system, and if not, why?
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > Sure, bash Microsoft it's easy. But it doesn't address the point, is > a database safe on top of a compressed file system and if not, why? It is certainly *less* safe than it is on top of an uncompressed filesystem. Any given hardware failure will affect more stored bits (if the compression is effective) in a less predictable way. If you assume that hardware failure rates are below your level of concern, this doesn't matter. But DBAs are paid to be paranoid. regards, tom lane
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > What is the torn page problem? Note I'm no big fan of compressed file > systems, but I can't imagine them not working with databases, as I've > seen them work quite reliably under exhange server running a db > oriented storage subsystem. And I can't imagine them not being > invisible to an application, otherwise you'd just be asking for > trouble. Invisible under normal operation sure, but when something fails the consequences will surely be different and I can't see how you could make a compressed filesystem safe without a huge performance hit. The torn page problem is what happens if the system loses power or crashes when only part of the data written has made it to disk. If you're compressing or encrypting data then you can't expect the old data portion and the new data portion to make sense together. So for example if Postgres sets a hint bit on one tuple in a block, then writes out that block and the filesystem recompresses it, the entire block will change. If the system crashes when only 4k of it has reached disk then when we read in that block it will fail decompression. And if the block size of the compressed filesystem is larger than the PostgreSQL block size your problems are even more severe. Even a regular WAL-logged write to a database block can cause the subsequent database block to become unreadable if power is lost before the entire set of database blocks within the filesystem block is written. The only way I could see this working is if you use a filesystem which logs data changes like ZFS or ext3 with data=journal. Even then you have to be very careful to make the filesystem block size that the journal treats as atomic match the Postgres block size or you'll still be in trouble. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
On Thu, Oct 30, 2008 at 9:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It is certainly *less* safe than it is on top of an uncompressed
> filesystem. Any given hardware failure will affect more stored bits
> (if the compression is effective) in a less predictable way.

Agreed. But I wasn't talking about hardware failures earlier, and someone made the point that a compressed file system, without hardware failure, was likely to eat your data. And I still don't think that's true.

Keep in mind a lot of the talk on this so far has been on data warehouses, which are mostly static and well backed up. If you could reduce the size on disk by a factor of 2 or 3, then it's worth taking a small chance on having to recreate the whole db should something go wrong.

To put it another way, if you find out you've got corrupted blocks in your main db, due to bad main memory or CPU or something, are you going to fix the bad blocks and memory and just keep going? Of course not, you're going to reinstall from a clean backup to a clean machine. You can't trust the data that the machine was mangling, whether it was on a compressed volume or not. So now your argument is one of degree, which wasn't the discussion point I was trying to make.

> If you assume that hardware failure rates are below your level of
> concern, this doesn't matter.

I assume hardware failure rates are zero, until there is one. Then I restore from a known good backup. Compressed file systems have little to do with that.
On Fri, Oct 31, 2008 at 2:49 AM, Gregory Stark <stark@enterprisedb.com> wrote:
> Invisible under normal operation sure, but when something fails the
> consequences will surely be different and I can't see how you could make a
> compressed filesystem safe without a huge performance hit.

While I'm quite willing to concede that a crashed machine can cause corruption in a compressed file system you wouldn't otherwise see, I'm also willing to admit there are times, much like the OP was talking about, where that's an acceptable loss, like Data Warehousing. No way would I run a db for data that mattered on a compressed file system.
Scott Marlowe wrote:
> Sure, bashing Microsoft is easy. But it doesn't address the point: is
> a database safe on top of a compressed file system, and if not, why?

I'm not bashing Microsoft. I'm just saying that your example application already shows signs that could, perhaps, be explained by the hypothesis put forward by Greg -- that a compressed filesystem is more prone to corruption.

--
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Fri, 31 Oct 2008 08:49:56 +0000, Gregory Stark <stark@enterprisedb.com> wrote:
> Invisible under normal operation sure, but when something fails the
> consequences will surely be different and I can't see how you could
> make a compressed filesystem safe without a huge performance hit.

Pardon my naiveness, but I can't get why compression and data integrity should always be considered clashing factors.

DB operations are supposed to be atomic if fsync actually does what it is supposed to do. So you'd have coherency assured by proper execution of "fsync" going down to all HW levels before it reaches permanent storage.

Now suppose your problem is "avoiding losing data", not avoiding losing coherency; e.g. you're handling a very fast stream of data coming from the LHC. The faster you write to the disk, the lower the chances of losing data in case you incur some kind of hardware failure during the write. Whether you choose data compression or not depends on which kind of failure you think is more probable on your hardware and the associated costs. If you expect gamma rays cooking your SCSI cables or an asteroid splashing your UPS, compression may be a good choice... it will make your data reach your permanent storage faster. If you expect your permanent storage to store data in an unreliable way (and not report back), a loss of 1 sector may correspond to a larger loss of data.

Another thing that should be put in the equation of understanding where your risk of data loss lies would be to factor in whether your "data source" has some form of "data persistence". If it has, you could introduce one more layer of "fsyncing"; that means your data source is not going to wipe the original copy till your DB reports back that everything went fine (no asteroid etc...). So data compression may be just one more tool to manage your budget for asteroid shelters.

An annoyance of compression may be that while compression *on average* may let you put data faster on permanent storage, it increases uncertainty about the instant speed of transfer, especially if fs-level and db-level compression are not aware of each other and fs-level compression is less aware of which data is worth compressing. If I had to push more for data compression I'd make it data-type aware and switchable (or auto-switchable based on ANALYZE or stats results).

Of course, if you expect to have faulty "permanent storage", data compression *may* not be a good bet... but still it depends on hardware cost, rate of compression, specific kind of failure... e.g. the more you compress, the cheaper RAID becomes.

I understand, Tom, that DBAs are paid to be paranoid and I really really really appreciate data stored in a format that doesn't require a long queue of tools to be read. I do really hate dependencies that translate into hours of *boring* work if something turns bad.

BTW I gave a glance to the MonetDB papers posted earlier and it seems that their compression algorithms are strongly read-only/search optimised.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Alvaro Herrera wrote:
> I'm not bashing Microsoft. I'm just saying that your example
> application already shows signs that could, perhaps, be explained by the
> hypothesis put forward by Greg -- that a compressed filesystem is more
> prone to corruption.
Each added layer could lead to corruption/instability. Yet, some people might be willing to try out some of these layers to enhance functionality.

Postgresql already uses an OS, and even an fs! Why would it decide to not recode its own raw device handler ... like some serious db ;)
--
Thomas SAMSON
Simplicity does not precede complexity, but follows it.
Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> Pardon my naiveness, but I can't get why compression and data
> integrity should always be considered clashing factors.

Well, the answer was in the next paragraph of my email, the one you've clipped out here.

> DB operations are supposed to be atomic if fsync actually does what
> it is supposed to do. So you'd have coherency assured by proper
> execution of "fsync" going down to all HW levels before it reaches
> permanent storage.

fsync lets the application know when the data has reached disk. Once it returns you know the data on disk is coherent. What we're talking about is what to do if the power fails or the system crashes before that happens.

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
scott.marlowe@gmail.com ("Scott Marlowe") writes:
> I assume hardware failure rates are zero, until there is one. Then I
> restore from a known good backup. Compressed file systems have little
> to do with that.

There's a way that compressed filesystems might *help* with a risk factor, here...

By reducing the number of disk drives required to hold the data, you may be reducing the risk of enough of them failing to invalidate the RAID array. If a RAID array is involved, where *some* failures may be silently coped with, I could readily see this *improving* reliability, in most cases.

This is at least *vaguely* similar to the way that aircraft have moved from requiring rather large numbers of engines for cross-Atlantic trips to requiring just 2. In the distant past, the engines were sufficiently unreliable that you wanted to have at least 4 in order to be reasonably assured that you could limp along with at least 2. With increases in engine reliability, it's now considered preferable to have *just* 2 engines, as having 4 means doubling the risk of there being a failure.

Disk drives and jet engines are hardly the same thing, but I suspect the analogy fits.

--
Christopher Browne
On Fri, 31 Oct 2008 17:08:52 +0000, Gregory Stark <stark@enterprisedb.com> wrote:
> Well, the answer was in the next paragraph of my email, the one
> you've clipped out here.

Sorry, I didn't want to hide your argument, just to cut the length of the email. Maybe I haven't been clear enough either.

I'd consider compression at the fs level more "risky" than compression at the DB level because re-compression at the fs level may more frequently span more data structures. But sorry, I still can't get WHY compression as a whole and data integrity are mutually exclusive.

What I think is going to happen (not necessarily what really happens) is:
- you make a change to the DB
- you ask the underlying fs to write that change to the disk (fsync)
- the fs may decide it has to re-compress more than one block, but I'd think it still has to oblige the fsync command and *start* to put them on permanent storage.

Now on *average* the write operations should be faster, so the risk you'll be hit by an asteroid between the time an fsync has been requested and the time it returns should be smaller. If you're not fsyncing... you've no warranty that your changes reached your permanent storage. Unless compressed fs don't abide by fsync as I'd expect.

Furthermore, you're starting from 3 assumptions that may not be true:
1) partially written compressed data are completely unrecoverable.
2) you don't have concurrent physical writes to permanent storage
3) the data that should have reached the DB would have survived if they were not sent to the DB

Compression changes the granularity of physical writes on a single write. But if you consider concurrent physical writes and unrecoverable transmission of data... higher throughput should reduce data loss. If I think of changes as trains with wagons, the chance a train can be struck by an asteroid grows the longer the train is. When you use compression, small changes to a data structure *may* result in longer trains leaving the station, but on average you *should* have shorter trains.

> fsync lets the application know when the data has reached disk.
> Once it returns you know the data on disk is coherent. What we're
> talking about is what to do if the power fails or the system
> crashes before that happens.

Yeah... actually successful fsyncs are at a higher integrity level than just "let as much data as possible reach the disk and make it so that it can be read later". But still, when you issue an fsync you're asking "put those data on permanent storage". Until then the fs is free to keep managing them in cache and modify/compress them there. The faster they reach the disk, the lower the chances you'll lose them. On the assumption that once an asteroid hits a wagon the whole train is lost that's not ideal... but still, the average length of trains *should* be less, reducing the *average* chance they get hit.

This *may* still not be the case, and it depends on the pattern with which data change. If most of the time you're changing 1 bit followed by an fsync, and that requires a 2-sector rewrite, that's bad. The chances of this happening are higher if compression takes place at the fs level rather than the DB level, since the DB should be more aware of which data can be efficiently compressed and what the trade-off could be in terms of data loss if something goes wrong in a 2-sector write where without compression you'd just write one.

But I think you could still take advantage of fs compression without sacrificing integrity by choosing which tables should reside on a compressed fs and which not, and in some circumstances fs compression may get better results than just TOAST, e.g. if there are several columns that are frequently updated together...

I'd say that compression could be one more tool for managing data integrity, not that it will inevitably have a negative impact on it (nor a positive one if not correctly managed). What am I still missing?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Scott Marlowe wrote:
> I can't see PostgreSQL noticing it. PostgreSQL hands the OS a 512-byte
> block, the OS compresses it and its brethren as they go to disk and
> uncompresses them as they come out, and as long as what you put in is
> what you get back it shouldn't really matter.

The question is whether a write of 512 writes to disk blocks that hold data for other parts of the file; in such a case we might not have the full page write copies of those pages to restore, and the compressed operating system might not be able to guarantee that the other parts of the file will be restored if only part of the 512 gets on disk.

--
Bruce Momjian <bruce@momjian.us>    http://momjian.us
EnterpriseDB                        http://enterprisedb.com
Chris Browne wrote:
> There's a way that compressed filesystems might *help* with a risk
> factor, here...
> By reducing the number of disk drives required to hold the data, you
> may be reducing the risk of enough of them failing to invalidate the
> RAID array.

And one more way. If neither your database nor your filesystem does checksums on blocks (it seems the compressing filesystems mostly do checksums, though), a one-bit error may go undetected, corrupting your data without you knowing it. With filesystem compression, that one-bit error is likely to grow into something big enough to be detected immediately.
Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> But sorry, I still can't get WHY compression as a whole and data
> integrity are mutually exclusive.
> ...
> Now on *average* the write operations should be faster, so the risk
> you'll be hit by an asteroid between the time an fsync has been
> requested and the time it returns should be smaller.
> If you're not fsyncing... you've no warranty that your changes
> reached your permanent storage.

Postgres *guarantees* that as long as everything else works correctly it doesn't lose data. Not that it minimizes the chances of losing data. It is interesting to discuss hardening against unforeseen circumstances as well, but it's of secondary importance to first of all guaranteeing 100% that there is no data loss in the expected scenarios.

That means Postgres has to guarantee 100% that if the power is lost mid-write it can recover all the data correctly. It does this by fsyncing logs of some changes and depending on filesystems and drives behaving in certain ways for others -- namely that a partially completed write will leave each byte with either the new or old value. Compressed filesystems might break that assumption, making Postgres's guarantee void.

I don't know how these hypothetical compressed filesystems are implemented so I can't say whether they work or not. When I first wrote the comment I was picturing a traditional filesystem with each block stored compressed. That can't guarantee anything like this.

However, later in the discussion I mentioned that ZFS with an 8k block size could actually get this right since it never overwrites existing data; it always writes to a new location and then changes metadata pointers. I expect ext3 with data=journal might also be ok. These both have to make performance sacrifices to get there though.

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Grzegorz Jaśkiewicz wrote, On 30-10-08 12:13:
> it should, every book on encryption says that if you compress your data
> before encryption - it's better.

Those books should also mention that you should leave this subject to the experts; there are numerous examples of systems that followed the book and are still broken. There are other techniques as well that make breaking it harder, such as the CBC and CTS modes. Using compression consumes processing power and resources, which makes DoS attacks a lot easier.

Also, I have yet to see a compression algorithm that can sustain over (or even anything close to, for that matter) 100MB/s on today's COTS hardware. Since TOAST already provides compression, maybe that data can be transmitted in compressed form (without recompression).

- Joris
Gregory Stark wrote, On 01-11-08 14:02:
> Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
>
>> But sorry I still can't get WHY compression as a whole and data
>> integrity are mutually exclusive.
> ... [snip performance theory]
>
> Postgres *guarantees* that as long as everything else works correctly it
> doesn't lose data. Not that it minimizes the chances of losing data. It is
> interesting to discuss hardening against unforeseen circumstances as well but
> it's of secondary importance to first of all guaranteeing 100% that there is
> no data loss in the expected scenarios.
>
> That means Postgres has to guarantee 100% that if the power is lost mid-write
> it can recover all the data correctly. It does this by fsyncing logs of
> some changes and depending on filesystems and drives behaving in certain ways
> for others -- namely that a partially completed write will leave each byte
> with either the new or old value. Compressed filesystems might break that
> assumption making Postgres's guarantee void.

The guarantee YOU want from the underlying file system is that, in case of, let's say, a power failure:
* Already existing data is not modified.
* Overwritten data might be corrupted, but it's either old or new data.
* If an fsync completes, all written data IS committed to disk.

If a (file) system CAN guarantee that, in any way possible, it is safe to use with PostgreSQL (assuming my list is complete, of course). As a side note: I consider the second assumption a bit too strong, but there are probably good reasons for it.

> I don't know how these hypothetical compressed filesystems are implemented so
> I can't say whether they work or not. When I first wrote the comment I was
> picturing a traditional filesystem with each block stored compressed. That
> can't guarantee anything like this.

Instead the discussion keeps reverting to file systems without even a glance at their method of operation. None of the algorithms the file systems use are written down, yet they are being discussed.

> However later in the discussion I mentioned that ZFS with an 8k block size
> could actually get this right since it never overwrites existing data, it
> always writes to a new location and then changes metadata pointers. I expect
> ext3 with data=journal might also be ok. These both have to make performance
> sacrifices to get there though.

Instead, here we get to the specifics we needed a long time ago: ZFS takes 8kB as its optimal point(*) and never overwrites existing data. So it should be as safe as any other file system, if he is indeed correct. Does a different block size (of ZFS or PostgreSQL) make any difference to that? No, it still guarantees the list above.

Performance is a discussion better left alone, since it is really, really dependent on your workload, installation and other specifics. It could be better and it could be worse.

- Joris

(*) Larger block sizes improve the compression ratio. However, you pay a bigger penalty on writes, as more must be read, processed and written.
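As a concrete illustration of the fsync guarantee discussed above, here is a minimal sketch using only standard POSIX calls (file name and payload are arbitrary examples, not anything from PostgreSQL itself): nothing written is treated as durable until fsync() has returned successfully.

    /* Write-then-fsync ordering: only report success after fsync() returns 0. */
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        const char buf[] = "some WAL-ish record\n";
        int fd = open("example.wal", O_WRONLY | O_CREAT | O_APPEND, 0600);
        if (fd < 0) { perror("open"); return 1; }

        if (write(fd, buf, strlen(buf)) != (ssize_t) strlen(buf)) {
            perror("write");
            return 1;
        }

        /* The record may only be considered durable after fsync() succeeds. */
        if (fsync(fd) != 0) {
            perror("fsync");
            return 1;
        }

        close(fd);
        puts("record durable (to the extent the OS and drive honour fsync)");
        return 0;
    }

The whole debate above is really about whether a compressing layer underneath can still honour that contract.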
Joris Dobbelsteen wrote:
> Also I still have to see a compression algorithm that can sustain over
> (or even anything close to, for that matter) 100MB/s on today's COTS
> hardware. As TOAST provides compression, maybe that data can be
> transmitted in compressed manner (without recompression).

I did a few quick tests of compression speed, as I was curious about just what sort of performance was available. I was under the impression that modern hardware could easily top 100 Mbit/s with common compression algorithms, and wanted to test that. Based on the results I'd have to agree with the quoted claim; I was apparently thinking of symmetric encryption throughput rather than compression throughput.

I get 19 Mbit/s from gzip (deflate) on my 2.4GHz Core 2 Duo laptop. With lzop (LZO) the machine achieves 45 Mbit/s. In both cases only a single core is used. With 7zip (LZMA) it only manages 3.1 Mb/s using BOTH cores together. All tests were done on a 278MB block of data that was precached in RAM. Output went to /dev/null, except in the LZMA case (due to utility limitations) where output was written to a tmpfs.

Perhaps a multi-core and/or SIMD-ized implementation of LZO (if such a thing is possible or practical) might manage 100 Mbit/s, or you might pull it off on an absolutely top-of-the-range desktop (or server) CPU like the 3.3 GHz Core 2 Duo. Maybe, but probably not without considerable overclocking, which eliminates the "COTS" aspect rather soundly.

Given that very few people have dedicated gzip (or other algorithm) acceleration cards in their systems, it looks like it should be faster to do transfers uncompressed over a network of any respectable speed. Not entirely surprising, really, or it'd be used a lot more in common file server protocols.

Wire protocol compression support in PostgreSQL would probably still be extremely useful for Internet or WAN based clients, though, and there are probably more than a few of those around. I know it'd benefit me massively, as I have users using PostgreSQL over 3G cellular radio (UMTS/HSDPA) where real-world speeds are around 0.1 - 1.5 Mbit/s, data transfer limits are low and data transfer charges are high. Compression would clearly need to be a negotiated connection option, though.

Interestingly, the Via thin clients at work, which have AES-256 (among other things) implemented in hardware, can encrypt with AES-256 at over 300 MB/s. Yes, megabytes, not megabits. Given that the laptop used in the above testing only gets 95 MB/s, it makes you wonder whether it'd be worthwhile for CPU designers to offer a common compression algorithm like LZO, deflate, or LZMA in hardware for server CPUs.

-- Craig Ringer
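For anyone who wants to repeat this kind of measurement without the command-line tools, below is a rough sketch of an in-memory throughput test using zlib's compress2(). The buffer size, filler data and timing method are illustrative assumptions, not the methodology used for the numbers above.

    /* Time an in-memory deflate pass so disk I/O is excluded from the result. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <zlib.h>

    int main(void)
    {
        const size_t n = 64 * 1024 * 1024;            /* 64 MB of test data */
        unsigned char *src = malloc(n);
        uLongf clen = compressBound(n);
        unsigned char *dst = malloc(clen);
        if (!src || !dst) return 1;

        for (size_t i = 0; i < n; i++)                /* semi-compressible filler */
            src[i] = (unsigned char) (i % 251);

        clock_t t0 = clock();
        compress2(dst, &clen, src, n, Z_DEFAULT_COMPRESSION);
        double secs = (double) (clock() - t0) / CLOCKS_PER_SEC;

        printf("compressed %zu -> %lu bytes in %.2f s (%.1f MB/s)\n",
               n, (unsigned long) clen, secs, n / (1024.0 * 1024.0) / secs);
        free(src);
        free(dst);
        return 0;
    }

Note that the gzip and lzop command-line tools add file-format framing and I/O on top of the raw algorithm, so their numbers will not match a pure in-memory test exactly.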
On Mon, Nov 03, 2008 at 08:18:54AM +0900, Craig Ringer wrote:
> Joris Dobbelsteen wrote:
> > Also I still have to see a compression algorithm that can sustain over
> > (or even anything close to, for that matter) 100MB/s on today's COTS
> > hardware. As TOAST provides compression, maybe that data can be
> > transmitted in compressed manner (without recompression).
>
> I get 19 Mbit/s from gzip (deflate) on my 2.4GHz Core 2 Duo laptop. With
> lzop (LZO) the machine achieves 45 Mbit/s. In both cases only a single
> core is used. With 7zip (LZMA) it only manages 3.1 Mb/s using BOTH cores
> together.

The algorithms in the MonetDB/X100 paper I posted upstream[1] appear to be designed more for this use. Their PFOR algorithm gets between ~0.4GB/s and ~1.7GB/s in compression and between ~0.9GB/s and 3GB/s in decompression.

Your lzop numbers look *very* low; the paper suggests compression going up to ~0.3GB/s on a 2GHz Opteron. In fact, an old page for lzop[2] reports 5MB/s on a Pentium 133, so I don't think I'm understanding what your numbers are. I'll see if I can write some code that implements their algorithms and send another mail.

If PFOR really is this fast then it may be good for TOAST compression, though judging by the comments in pg_lzcompress.c it may not be worth it, as the time spent on compression gets lost in the noise.

Sam

[1] http://old-www.cwi.nl/themes/ins1/publications/docs/ZuHeNeBo:ICDE:06.pdf
[2] http://www.oberhumer.com/opensource/lzo/#speed
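For readers who haven't seen it, the sketch below illustrates the frame-of-reference idea that PFOR builds on: store a per-block reference value plus small fixed-width deltas, and keep an exception list for values that don't fit. This is a deliberately simplified illustration in C, not the encoding, bit widths or layout used in the X100 paper.

    /* Simplified frame-of-reference coding with a patch-style exception list. */
    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK 8

    struct for_block {
        uint32_t reference;          /* block minimum */
        uint8_t  deltas[BLOCK];      /* value - reference, when it fits in 8 bits */
        uint32_t exceptions[BLOCK];  /* full values that did not fit */
        uint8_t  exc_pos[BLOCK];     /* positions of those exceptions */
        int      n_exc;
    };

    static void for_encode(const uint32_t *in, struct for_block *out)
    {
        uint32_t ref = in[0];
        for (int i = 1; i < BLOCK; i++)
            if (in[i] < ref)
                ref = in[i];

        out->reference = ref;
        out->n_exc = 0;
        for (int i = 0; i < BLOCK; i++) {
            uint32_t d = in[i] - ref;
            if (d <= 0xFF) {
                out->deltas[i] = (uint8_t) d;
            } else {                          /* too big: record as an exception */
                out->deltas[i] = 0;
                out->exc_pos[out->n_exc] = (uint8_t) i;
                out->exceptions[out->n_exc++] = in[i];
            }
        }
    }

    static void for_decode(const struct for_block *in, uint32_t *out)
    {
        for (int i = 0; i < BLOCK; i++)       /* fast path: reference + delta */
            out[i] = in->reference + in->deltas[i];
        for (int i = 0; i < in->n_exc; i++)   /* then patch the exceptions */
            out[in->exc_pos[i]] = in->exceptions[i];
    }

    int main(void)
    {
        uint32_t vals[BLOCK] = {1000, 1003, 1001, 1999, 1002, 900000, 1005, 1004};
        uint32_t back[BLOCK];
        struct for_block b;

        for_encode(vals, &b);
        for_decode(&b, back);
        for (int i = 0; i < BLOCK; i++)
            printf("%u%s", back[i], i == BLOCK - 1 ? "\n" : " ");
        return 0;
    }

Real implementations pack the deltas at arbitrary bit widths and decode whole blocks in tight, branch-light loops, which is where the multi-GB/s figures come from.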
Sam Mason wrote:
> On Mon, Nov 03, 2008 at 08:18:54AM +0900, Craig Ringer wrote:
>> Joris Dobbelsteen wrote:
>>> Also I still have to see a compression algorithm that can sustain over
>>> (or even anything close to, for that matter) 100MB/s on today's COTS
>>> hardware. As TOAST provides compression, maybe that data can be
>>> transmitted in compressed manner (without recompression).
>
>> I get 19 Mbit/s from gzip (deflate) on my 2.4GHz Core 2 Duo laptop. With
>> lzop (LZO) the machine achieves 45 Mbit/s. In both cases only a single
>> core is used. With 7zip (LZMA) it only manages 3.1 Mb/s using BOTH cores
>> together.
>
> Your lzop numbers look *very* low; the paper suggests
> compression going up to ~0.3GB/s on a 2GHz Opteron.

Er ... ENOCOFFEE? s/Mb(it)?/MB/g. And I'm normally *so* careful about Mb/MB etc; this was just a complete thinko at some level. My apologies, and thanks for catching that stupid error. The paragraph should've read:

I get 19 MB/s (152 Mb/s) from gzip (deflate) on my 2.4GHz Core 2 Duo laptop. With lzop (LZO) the machine achieves 45 MB/s (360 Mb/s). In both cases only a single core is used. With 7zip (LZMA) it only manages 3.1 MB/s (24.8 Mb/s) using BOTH cores together.

So - it's potentially even worth compressing the wire protocol for use on a 100 megabit LAN if a lightweight scheme like LZO can be used.

-- Craig Ringer
Craig Ringer <craig@postnewspapers.com.au> writes:
> I get 19 Mbit/s from gzip (deflate) on my 2.4GHz Core 2 Duo laptop. With
> lzop (LZO) the machine achieves 45 Mbit/s. In both cases only a single
> core is used. With 7zip (LZMA) it only manages 3.1 Mb/s using BOTH cores
> together.

It'd be interesting to know where pg_lzcompress fits in.

> Wire protocol compression support in PostgreSQL would probably still be
> extremely useful for Internet or WAN based clients, though,

Use an ssh tunnel ... get compression *and* encryption, which you surely should want on a WAN link.

regards, tom lane
Tom Lane wrote:
>> Wire protocol compression support in PostgreSQL would probably still be
>> extremely useful for Internet or WAN based clients, though,
>
> Use an ssh tunnel ... get compression *and* encryption, which you surely
> should want on a WAN link.

An ssh tunnel, while very useful, is only suitable for more capable users and is far from transparent. It requires an additional setup step before connecting to the database, which is going to cause support problems and confuse users. It's also somewhat painful on Windows machines. Additionally, an ssh tunnel makes it much, MUCH more difficult for an application to recover transparently after a connection is broken.

As you know, PostgreSQL supports SSL/TLS for encryption of wire communications, and you can use client certificates as an additional layer of authentication, much as you can use an ssh key. It's clean, and to the end user it's basically transparent. All the major clients, like the ODBC and JDBC drivers, already support it.

Adding optional compression within that would be wonderful - and since the client and server are already designed to communicate through filters (for encryption), it shouldn't be that hard to stack another filter layer on top. It's something I'm going to have to look at myself, actually, though I have some work on the qemu LSI SCSI driver that I *really* have to finish first.

-- Craig Ringer
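For comparison, this is roughly what the existing, transparent SSL support already looks like from a libpq client; the host, database and user names below are placeholders, and the check via PQgetssl() assumes a libpq built with SSL support:

    /* Connect with TLS required and report whether the link is encrypted. */
    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb(
            "host=db.example.com dbname=appdb user=appuser sslmode=require");

        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* PQgetssl() returns a non-NULL handle when SSL is actually in use. */
        printf("connected, SSL in use: %s\n", PQgetssl(conn) ? "yes" : "no");

        PQfinish(conn);
        return 0;
    }

The appeal of a compression option is that it could be negotiated in exactly this kind of transparent way, as just another connection parameter.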
On Mon, Nov 03, 2008 at 10:01:31AM +0900, Craig Ringer wrote:
> Sam Mason wrote:
> > Your lzop numbers look *very* low; the paper suggests
> > compression going up to ~0.3GB/s on a 2GHz Opteron.
>
> Er ... ENOCOFFEE? s/Mb(it)?/MB/g. And I'm normally *so* careful about
> Mb/MB etc; this was just a complete thinko at some level. My apologies,
> and thanks for catching that stupid error.

Nice to know we're all human here :)

> The paragraph should've read:
>
> I get 19 MB/s (152 Mb/s) from gzip (deflate) on my 2.4GHz Core 2 Duo
> laptop. With lzop (LZO) the machine achieves 45 MB/s (360 Mb/s). In both
> cases only a single core is used. With 7zip (LZMA) it only manages 3.1
> MB/s (24.8 Mb/s) using BOTH cores together.

Hum, I've just had a look and found that Debian ships an lzop compression program. I uncompressed a copy of the Postgres source for a test and I'm getting around 120MB/s when compressing on a 2.1GHz Core2 processor (72MB in 0.60 seconds, fast mode). If I save the output and recompress it I get about 40MB/s (22MB in 0.67 seconds), so the compression rate seems to be very dependent on the type of data.

As a test, I've just written some code that writes out (what I guess is the "LINENUMBER" test in the X100 paper) a file consisting of small integers (less than 2 decimal digits, i.e. lots of zero bytes) and now get up to 0.4GB/s (200MB in 0.5 seconds), which nicely matches my eyeballing of the figure in the paper. It does point out that compression rates seem to be very data dependent!

> So - it's potentially even worth compressing the wire protocol for use
> on a 100 megabit LAN if a lightweight scheme like LZO can be used.

The problem is that you're then dedicating most of a processor to doing the compression, one that would otherwise be engaged in doing useful work for other clients.

BTW, the X100 work was about trying to become less IO bound; they had a 350MB/s RAID array and were highly IO bound. If I'm reading the paper right, with their PFOR algorithm they got the final query (i.e. decompressing and doing useful work) running at 500MB/s.

Sam
On Sun, Nov 2, 2008 at 7:19 PM, Sam Mason <sam@samason.me.uk> wrote:
> On Mon, Nov 03, 2008 at 10:01:31AM +0900, Craig Ringer wrote:
>> So - it's potentially even worth compressing the wire protocol for use
>> on a 100 megabit LAN if a lightweight scheme like LZO can be used.
>
> The problem is that you're then dedicating most of a processor to
> doing the compression, one that would otherwise be engaged in doing
> useful work for other clients.

Considering the low cost of gigabit networks nowadays (even my four-year-old T42 Thinkpad has gigabit in it), it would most of the time be cheaper to buy gigabit NICs and cheap switches than to worry about the network component. On WANs it's another story, of course.
Craig Ringer wrote:
> So - it's potentially even worth compressing the wire protocol for use
> on a 100 megabit LAN if a lightweight scheme like LZO can be used.

LZO is under the GPL though.
Peter Eisentraut wrote:
> Craig Ringer wrote:
>> So - it's potentially even worth compressing the wire protocol for use
>> on a 100 megabit LAN if a lightweight scheme like LZO can be used.
>
> LZO is under the GPL though.

Good point. I'm so used to libraries being under more appropriate licenses like the LGPL or BSD license that I completely forgot to check.

It doesn't matter that much, anyway, in that deflate would also do the job quite well for any sort of site-to-site or user-to-site WAN link.

-- Craig Ringer
> It doesn't matter that much, anyway, in that deflate would also do the
> job quite well for any sort of site-to-site or user-to-site WAN link.

I used to use that, then switched to bzip. Thing is, if your client is really just issuing SQL, how much does it matter? Compression can't help with latency. Which is why I went with 3 tiers, so that all communication with Postgres occurs on the server, and all communication between server & client is binary, compressed, and a single request/response per user request regardless of how many tables the data is pulled from.

-- Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice
Scott Ribe wrote:
>> It doesn't matter that much, anyway, in that deflate would also do the
>> job quite well for any sort of site-to-site or user-to-site WAN link.
>
> I used to use that, then switched to bzip. Thing is, if your client is
> really just issuing SQL, how much does it matter?

It depends a lot on what your requests are. If you have queries that must return significant chunks of data to the client, then compression will help with total request time on a slow link, in that there's less data to transfer so the last byte arrives sooner. Of course it's generally preferable to avoid transferring hundreds of KB of data to the client in the first place, but it's not always practical.

Additionally, not all connection types have effectively unlimited data transfers. Many mobile networks, for example, tend to have limits on monthly data transfers or charge per MB/KB transferred.

Wire compression would be nice for performance on slower networks, but it's mostly appealing for reducing the impact on other users on a WAN, reducing data transfer costs, reducing required WAN capacity, etc. It's appealing because it looks like it should be possible to make it quite simple to enable or disable, so it'd be a simple ODBC/JDBC connection option.

> Compression can't help
> with latency.

Not with network round-trip latency, no.

-- Craig Ringer
Peter Eisentraut wrote:
> Craig Ringer wrote:
>> So - it's potentially even worth compressing the wire protocol for use
>> on a 100 megabit LAN if a lightweight scheme like LZO can be used.
>
> LZO is under the GPL though.

But liblzf is BSD-style.

http://www.goof.com/pcg/marc/liblzf.html
On Thu, 2008-11-06 at 00:27 +0100, Ivan Voras wrote:
> Peter Eisentraut wrote:
> > Craig Ringer wrote:
> >> So - it's potentially even worth compressing the wire protocol for use
> >> on a 100 megabit LAN if a lightweight scheme like LZO can be used.

Yes, compressing the wire protocol is a benefit. You can troll the archives for when this has come up in the past. CMD at one time had a hacked-up version that proved compression was a benefit (even at 100Mb). Alas, it was ugly, :P... If it was done right, it would be a great benefit to folks out there.

Joshua D. Drake

> > LZO is under the GPL though.
>
> But liblzf is BSD-style.
>
> http://www.goof.com/pcg/marc/liblzf.html
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Ivan Voras
> Sent: Wednesday, November 05, 2008 3:28 PM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Are there plans to add data compression feature to postgresql?
>
> Peter Eisentraut wrote:
> > Craig Ringer wrote:
> >> So - it's potentially even worth compressing the wire protocol for use
> >> on a 100 megabit LAN if a lightweight scheme like LZO can be used.
> >
> > LZO is under the GPL though.
>
> But liblzf is BSD-style.
>
> http://www.goof.com/pcg/marc/liblzf.html

Here is a 64-bit Windows port of that library:

http://cap.connx.com/chess-engines/new-approach/liblzf34.zip

It has fantastic compression/decompression speed (100 MB takes well under a second to either compress or decompress) and I see about 50% compression.
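For reference, a minimal sketch of what using liblzf looks like from C. The signatures are from memory of lzf.h (both functions return the output length, or 0 on failure), so treat the exact API as an assumption and check the header before relying on this:

    /* Compress a buffer with liblzf, then decompress and verify the round trip. */
    #include <stdio.h>
    #include <string.h>
    #include <lzf.h>

    int main(void)
    {
        char in[4096], comp[4096], back[4096];
        memset(in, 'x', sizeof in);               /* trivially compressible input */

        /* lzf_compress returns the compressed size, or 0 if the output would
         * not fit in the buffer (i.e. the data is effectively incompressible). */
        unsigned int clen = lzf_compress(in, sizeof in, comp, sizeof comp - 1);
        if (clen == 0) {
            puts("data did not compress; would store it uncompressed instead");
            return 0;
        }

        unsigned int dlen = lzf_decompress(comp, clen, back, sizeof back);
        printf("4096 -> %u -> %u bytes, round-trip %s\n",
               clen, dlen,
               dlen == sizeof in && memcmp(in, back, sizeof in) == 0 ? "ok" : "FAILED");
        return 0;
    }

Passing an output buffer slightly smaller than the input is a common idiom to ensure compression is only used when it actually saves space, falling back to storing the data raw otherwise.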