Thread: Significantly larger toast tables on 8.4?
I just did a fresh import of my alpha database into 8.4 and noticed that the size of the database had increased significantly:

8.4: 6.3G
8.3: 4.9G

Tracking it down, the main difference seems to be one toast table, namely this one:

ls -ltrh 8.3/base/16516/430156
-rw------- 1 postgres postgres 145M 2009-01-01 00:12 8.3/base/16516/430156

and the same toast table in 8.4:

ls -ltrh 8.4/base/16513/347706
-rw------- 1 postgres postgres 967M 2009-01-01 20:56 8.4/base/16513/347706

This table consists mainly of Perl Storable binary blobs in bytea columns. The schema looks like:

    Column     |           Type           |                       Modifiers
---------------+--------------------------+-------------------------------------------------------
 action        | text                     |
 die_id        | integer                  | not null default nextval('dies_die_id_seq'::regclass)
 cparam        | bytea                    |
 date_created  | timestamp with time zone | not null default now()
 db_profile    | bytea                    |
 debug         | bytea                    |
 defunct       | smallint                 | not null default 0
 env           | bytea                    |
 login         | bytea                    |
 msg           | text                     |
 open_user_id  | integer                  |
 page_load_id  | integer                  |
 session_id    | integer                  |
 state         | bytea                    |
 state_action  | bytea                    |
 user_id       | integer                  |
 whoops        | bytea                    |
Indexes:
    "dies_pkey" PRIMARY KEY, btree (die_id)

My hunch is it's related to
http://git.postgresql.org/?p=postgresql.git;a=commit;h=3b6942b3f2fe733572c05a71cb2d12e5ece60cdbits
or, for the CVS inclined,
http://archives.postgresql.org/pgsql-committers/2008-03/msg00121.php

But if anything that looks like it should help reduce size... Ideas?
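For reference, the same comparison can be made from inside the database instead of with ls; a rough sketch (assuming the parent table is named dies, as above):

select c.relname,
       t.relname as toast_table,
       pg_size_pretty(pg_relation_size(c.reltoastrelid)) as toast_size
from pg_class c
join pg_class t on t.oid = c.reltoastrelid
where c.relname = 'dies';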
On Thu, Jan 1, 2009 at 21:30, Alex Hunsaker <badalex@gmail.com> wrote:
> http://git.postgresql.org/?p=postgresql.git;a=commit;h=3b6942b3f2fe733572c05a71cb2d12e5ece60cdbits

... typoed that, it's
http://git.postgresql.org/?p=postgresql.git;a=commit;h=3b6942b3f2fe733572c05a71cb2d12e5ece60cd
On Thu, Jan 1, 2009 at 21:30, Alex Hunsaker <badalex@gmail.com> wrote:
> My hunch is it's related to
> http://git.postgresql.org/?p=postgresql.git;a=commit;h=3b6942b3f2fe733572c05a71cb2d12e5ece60cd
> or for the CVS inclined
> http://archives.postgresql.org/pgsql-committers/2008-03/msg00121.php
>
> But if anything that looks like it should help reduce size...

Looking at the patch, we don't compress things > 1M anymore, so I thought maybe I was hitting that. But no luck: there are only 39 rows where the row size > 1M... with those 39 being about 22M each. Unless I calculated something wrong. Oh, and CLUSTER and VACUUM don't seem to help.

select count(1) from dies;
 count
-------
 52010

select count(1) from (
  select coalesce(pg_column_size(action), 0) + coalesce(pg_column_size(die_id), 0) +
         coalesce(pg_column_size(cparam), 0) + coalesce(pg_column_size(date_created), 0) +
         coalesce(pg_column_size(db_profile), 0) + coalesce(pg_column_size(debug), 0) +
         coalesce(pg_column_size(defunct), 0) + coalesce(pg_column_size(env), 0) +
         coalesce(pg_column_size(login), 0) + coalesce(pg_column_size(msg), 0) +
         coalesce(pg_column_size(open_user_id), 0) + coalesce(pg_column_size(page_load_id), 0) +
         coalesce(pg_column_size(session_id), 0) + coalesce(pg_column_size(state), 0) +
         coalesce(pg_column_size(state_action), 0) + coalesce(pg_column_size(user_id), 0) +
         coalesce(pg_column_size(whoops), 0) as row_size
  from dies
) as foo where foo.row_size > 1024*1024;
 count
-------
    39

BTW is there a "cooler" way to do this?
On Thu, Jan 1, 2009 at 22:44, Alex Hunsaker <badalex@gmail.com> wrote:
> Looking at the patch we don't compress things > 1M anymore so I thought
> maybe I was hitting that. But no luck there are only 39 rows where
> the row size > 1M... With those 39 being about 22M each.

Oh my... 25 * 40 = 1000M

So I guess my question is: are we going to recommend to people that they manually compress their data just for 8.4?
Alex Hunsaker wrote:
> On Thu, Jan 1, 2009 at 22:44, Alex Hunsaker <badalex@gmail.com> wrote:
>> Looking at the patch we don't compress things > 1M anymore so I thought
>> maybe I was hitting that. But no luck there are only 39 rows where
>> the row size > 1M... With those 39 being about 22M each.

> Oh my... 25 * 40 = 1000M

> So I guess my question is are we going to recommend to people that
> they manually compress their data just for 8.4?

What seems to be hurting the most is the 1MB upper limit. What is the rationale behind that limit?

What would be the downside of requiring compressibility instead?
-- 
Sincerely,
Stephen R. van den Berg.
"Always remember that you are unique. Just like everyone else."
"Stephen R. van den Berg" <srb@cuci.nl> writes: > What seems to be hurting the most is the 1MB upper limit. What is the > rationale behind that limit? The argument was that compressing/decompressing such large chunks would require a lot of CPU effort; also it would defeat attempts to fetch subsections of a large string. In the past we've required people to explicitly "ALTER TABLE SET STORAGE external" if they wanted to make use of the substring-fetch optimization, but it was argued that this would make that more likely to work automatically. I'm not entirely convinced by Alex' analysis anyway; the only way those 39 large values explain the size difference is if they are *tremendously* compressible, like almost all zeroes. The toast compressor isn't so bright that it's likely to get 10X compression on typical data. regards, tom lane
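For reference, the per-column override Tom mentions is spelled roughly like this (a sketch against the dies table from upthread; EXTERNAL stores values out of line but uncompressed, which is what enables the substring-fetch optimization, while EXTENDED is the default that allows compression):

-- out-of-line, uncompressed: fast substring fetches, larger storage
ALTER TABLE dies ALTER COLUMN debug SET STORAGE EXTERNAL;

-- the default: compression plus out-of-line storage allowed
ALTER TABLE dies ALTER COLUMN debug SET STORAGE EXTENDED;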
On Fri, Jan 2, 2009 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Stephen R. van den Berg" <srb@cuci.nl> writes: >> What seems to be hurting the most is the 1MB upper limit. What is the >> rationale behind that limit? > > The argument was that compressing/decompressing such large chunks would > require a lot of CPU effort; also it would defeat attempts to fetch > subsections of a large string. In the past we've required people to > explicitly "ALTER TABLE SET STORAGE external" if they wanted to make > use of the substring-fetch optimization, but it was argued that this > would make that more likely to work automatically. > > I'm not entirely convinced by Alex' analysis anyway; the only way > those 39 large values explain the size difference is if they are > *tremendously* compressible, like almost all zeroes. The toast > compressor isn't so bright that it's likely to get 10X compression > on typical data. I've seen gzip approach 10X on what was basically a large tab-separated values file, but I agree that some more experimentation to determine the real cause of the problem would be useful. I am a little mystified by the apparent double standard regarding compressibility. My suggestion that we disable compression for pg_statistic columns was perfunctorily shot down even though I provided detailed performance results demonstrating that it greatly sped up query planning on toasted statistics and even though the space savings from compression in that case are bound to be tiny. Here, we have a case where the space savings are potentially much larger, and the only argument against it is that someone might be disappointed in the performance of substring operations, if they happen to do any. What if they know that they don't want to do any and want to get compression? Even if the benefit is only 1.5X on their data rather than 10X, that seems like a pretty sane and useful thing to want to do. It's easy to shut off compression if you don't want it; if the system makes an arbitrary decision to disable it, how do you get it back? ...Robert
"Robert Haas" <robertmhaas@gmail.com> writes: > On Fri, Jan 2, 2009 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm not entirely convinced by Alex' analysis anyway; the only way >> those 39 large values explain the size difference is if they are >> *tremendously* compressible, like almost all zeroes. The toast >> compressor isn't so bright that it's likely to get 10X compression >> on typical data. > I've seen gzip approach 10X on what was basically a large > tab-separated values file, but I agree that some more experimentation > to determine the real cause of the problem would be useful. If I'm counting on my fingers correctly, you'd need to assume about 23X compression to get the reported size change... An easy way to prove or disprove the point would be to go into src/backend/utils/adt/pg_lzcompress.c, and change the second entry in strategy_default_data from "1024 * 1024" to "INT_MAX", then re-load the table with the modified backend (or you could even just do "CREATE TABLE foo AS SELECT * FROM existing_table" and then compare sizes). If that puts the toast table size back where it had been, then Alex correctly diagnosed the cause. regards, tom lane
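A sketch of that comparison, run against a backend built with the modified pg_lzcompress.c (table and column names assume the dies table from upthread):

-- re-toast the data under the modified compression strategy
CREATE TABLE dies_test AS SELECT * FROM dies;

-- compare total on-disk size, toast tables included
SELECT pg_size_pretty(pg_total_relation_size('dies')) AS original,
       pg_size_pretty(pg_total_relation_size('dies_test')) AS reloaded;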
On Fri, Jan 2, 2009 at 09:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The argument was that compressing/decompressing such large chunks would
> require a lot of CPU effort; also it would defeat attempts to fetch
> subsections of a large string. In the past we've required people to
> explicitly "ALTER TABLE SET STORAGE external" if they wanted to make
> use of the substring-fetch optimization, but it was argued that this
> would make that more likely to work automatically.

It is *way* faster. Here are some rough numbers:

$ cat q.sql
select length(substring(debug, 1024, 1024)) from dies where die_id = 295;

$ ./pgbench -T60 -n -f q.sql
8.4: 1532.327546
8.3: 21.295657

That's with 8.4 as an --enable-debug --enable-cassert build! (8.3 just has --enable-debug.)

> I'm not entirely convinced by Alex' analysis anyway; the only way
> those 39 large values explain the size difference is if they are
> *tremendously* compressible, like almost all zeroes. The toast
> compressor isn't so bright that it's likely to get 10X compression
> on typical data.

Well, it's certainly not all zeros, but those big rows are the same message repeated (~5k) until perl detected endless recursion and died... So it's not exactly typical or *useful* data either.

The other 51,971 rows (rows smaller than 1024*1024 bytes) account for 174MB, while those 39 account for the other 828MB.

Mainly I was just alarmed by the size increase. But I think the pgbench numbers make me happy even if I never do a substring on that data...

------------------
# 8.3
select pg_size_pretty(pg_column_size(debug)) from dies where die_id = 295;
 pg_size_pretty
----------------
 1256 kB

# 8.4
select pg_size_pretty(pg_column_size(debug)) from dies where die_id = 295;
 pg_size_pretty
----------------
 22 MB

select pg_size_pretty(sum(row_size)) from (
  select coalesce(pg_column_size(action), 0) + coalesce(pg_column_size(die_id), 0) +
         coalesce(pg_column_size(cparam), 0) + coalesce(pg_column_size(date_created), 0) +
         coalesce(pg_column_size(db_profile), 0) + coalesce(pg_column_size(debug), 0) +
         coalesce(pg_column_size(defunct), 0) + coalesce(pg_column_size(env), 0) +
         coalesce(pg_column_size(login), 0) + coalesce(pg_column_size(msg), 0) +
         coalesce(pg_column_size(open_user_id), 0) + coalesce(pg_column_size(page_load_id), 0) +
         coalesce(pg_column_size(session_id), 0) + coalesce(pg_column_size(state), 0) +
         coalesce(pg_column_size(state_action), 0) + coalesce(pg_column_size(user_id), 0) +
         coalesce(pg_column_size(whoops), 0) as row_size
  from dies
) as foo where foo.row_size < 1024*1024;
 pg_size_pretty
----------------
 174 MB

select pg_size_pretty(sum(row_size)) from (
  select coalesce(pg_column_size(action), 0) + coalesce(pg_column_size(die_id), 0) +
         coalesce(pg_column_size(cparam), 0) + coalesce(pg_column_size(date_created), 0) +
         coalesce(pg_column_size(db_profile), 0) + coalesce(pg_column_size(debug), 0) +
         coalesce(pg_column_size(defunct), 0) + coalesce(pg_column_size(env), 0) +
         coalesce(pg_column_size(login), 0) + coalesce(pg_column_size(msg), 0) +
         coalesce(pg_column_size(open_user_id), 0) + coalesce(pg_column_size(page_load_id), 0) +
         coalesce(pg_column_size(session_id), 0) + coalesce(pg_column_size(state), 0) +
         coalesce(pg_column_size(state_action), 0) + coalesce(pg_column_size(user_id), 0) +
         coalesce(pg_column_size(whoops), 0) as row_size
  from dies
) as foo where foo.row_size > 1024*1024;
 pg_size_pretty
----------------
 828 MB
On Fri, Jan 2, 2009 at 10:44, Robert Haas <robertmhaas@gmail.com> wrote: > Here, we have a case where the space savings are potentially much > larger, and the only argument against it is that someone might be > disappointed in the performance of substring operations, if they > happen to do any. What if they know that they don't want to do any > and want to get compression? Even if the benefit is only 1.5X on > their data rather than 10X, that seems like a pretty sane and useful > thing to want to do. It's easy to shut off compression if you don't > want it; if the system makes an arbitrary decision to disable it, how > do you get it back? I think we could just add another toast storage type: alter table alter column set storage compress; ? It seems overkill to expose PGLZ_Strategy knobs per column...
On Fri, Jan 2, 2009 at 11:44, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Robert Haas" <robertmhaas@gmail.com> writes: >> I've seen gzip approach 10X on what was basically a large >> tab-separated values file, but I agree that some more experimentation >> to determine the real cause of the problem would be useful. > > If I'm counting on my fingers correctly, you'd need to assume about > 23X compression to get the reported size change... > > An easy way to prove or disprove the point would be to go into > src/backend/utils/adt/pg_lzcompress.c, and change the second entry > in strategy_default_data from "1024 * 1024" to "INT_MAX", then > re-load the table with the modified backend (or you could even > just do "CREATE TABLE foo AS SELECT * FROM existing_table" and > then compare sizes). If that puts the toast table size back where > it had been, then Alex correctly diagnosed the cause. And the toast file size is *drum roll* 167M.
Alex Hunsaker wrote:
> I think we could just add another toast storage type: alter table
> alter column set storage compress; ? It seems overkill to expose
> PGLZ_Strategy knobs per column...

Three things:
a. Shouldn't it in theory be possible to have a decompression algorithm which is IO-bound because it decompresses faster than the disk can supply the data? (On common current hardware.)
b. Has the current algorithm been carefully benchmarked and/or optimised and/or chosen to fit the IO-bound target as close as possible?
c. Are there any well-known pitfalls/objections which would prevent me from changing the algorithm to something more efficient (read: IO-bound)?
-- 
Sincerely,
Stephen R. van den Berg.
"Always remember that you are unique. Just like everyone else."
On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <srb@cuci.nl> wrote: > Alex Hunsaker wrote: >>I think we could just add another toast storage type: alter table >>alter column set storage compress; ? It seems overkill to expose >>PGLZ_Strategy knobs per column... > > Three things: > a. Shouldn't it in theory be possible to have a decompression algorithm > which is IO-bound because it decompresses faster than the disk can > supply the data? (On common current hardware). > b. Has the current algorithm been carefully benchmarked and/or optimised > and/or chosen to fit the IO-bound target as close as possible? > c. Are there any well-known pitfalls/objections which would prevent me from > changing the algorithm to something more efficient (read: IO-bound)? Any compression algorithm is going to require you to decompress the entire string before extracting a substring at a given offset. When the data is uncompressed, you can jump directly to the offset you want to read. Even if the compression algorithm requires no overhead at all, it's going to make the location of the data nondeterministic, and therefore force additional disk reads. ...Robert
"Stephen R. van den Berg" <srb@cuci.nl> writes:
> Alex Hunsaker wrote:
>> I think we could just add another toast storage type: alter table
>> alter column set storage compress; ? It seems overkill to expose
>> PGLZ_Strategy knobs per column...
>
> Three things:
> a. Shouldn't it in theory be possible to have a decompression algorithm
>    which is IO-bound because it decompresses faster than the disk can
>    supply the data? (On common current hardware).

We don't stream the data from disk through the decompressor. So whether it's i/o bound or not, the time spent decompressing is in addition to the time spent doing the i/o. The only way it wins is if the time saved on i/o due to the smaller data size is greater than the time spent decompressing.

I think the right value for this setting is going to depend on the environment. If the system is starved for cpu cycles then you won't want to compress large data. If it's starved for i/o bandwidth but has spare cpu cycles then you will.

If that's true then we really have to expose this parameter to users. There won't be a single value that is appropriate for everyone.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
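To put rough numbers on that trade-off, take the 22 MB value from upthread that compresses to about 1.26 MB: the compressed copy saves roughly 20.7 MB of I/O, which at an assumed 100 MB/s of sequential read bandwidth is about 0.2 s; if the decompressor emits output at an assumed 200 MB/s, decompressing the full 22 MB costs about 0.11 s, so compression wins on that box. Halve the CPU speed or double the disk bandwidth and the arithmetic flips, which is exactly why a single default is hard to pick.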
Robert Haas wrote:
> On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>> Three things:
>> a. Shouldn't it in theory be possible to have a decompression algorithm
>>    which is IO-bound because it decompresses faster than the disk can
>>    supply the data? (On common current hardware).
>> b. Has the current algorithm been carefully benchmarked and/or optimised
>>    and/or chosen to fit the IO-bound target as close as possible?
>> c. Are there any well-known pitfalls/objections which would prevent me from
>>    changing the algorithm to something more efficient (read: IO-bound)?

> Any compression algorithm is going to require you to decompress the
> entire string before extracting a substring at a given offset. When
> the data is uncompressed, you can jump directly to the offset you want
> to read. Even if the compression algorithm requires no overhead at
> all, it's going to make the location of the data nondeterministic, and
> therefore force additional disk reads.

That shouldn't be insurmountable:
- I currently have difficulty imagining applications that actually do lots of substring extractions from large compressible fields. The most likely operation would be a table which contains tsearch-indexed large text fields, but those are unlikely to participate in a lot of substring extractions.
- Even if substring operations would be likely, I could envision a compressed format which compresses in compressed chunks of say 64KB which can then be addressed randomly and independently.
-- 
Sincerely,
Stephen R. van den Berg.
"Always remember that you are unique. Just like everyone else."
On Fri, Jan 2, 2009 at 4:19 PM, Stephen R. van den Berg <srb@cuci.nl> wrote: > Robert Haas wrote: >>On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <srb@cuci.nl> wrote: >>> Three things: >>> a. Shouldn't it in theory be possible to have a decompression algorithm >>> which is IO-bound because it decompresses faster than the disk can >>> supply the data? (On common current hardware). >>> b. Has the current algorithm been carefully benchmarked and/or optimised >>> and/or chosen to fit the IO-bound target as close as possible? >>> c. Are there any well-known pitfalls/objections which would prevent me from >>> changing the algorithm to something more efficient (read: IO-bound)? > >>Any compression algorithm is going to require you to decompress the >>entire string before extracting a substring at a given offset. When >>the data is uncompressed, you can jump directly to the offset you want >>to read. Even if the compression algorithm requires no overhead at >>all, it's going to make the location of the data nondeterministic, and >>therefore force additional disk reads. > > That shouldn't be insurmountable: > - I currently have difficulty imagining applications that actually do > lots of substring extractions from large compressible fields. > The most likely operation would be a table which contains tsearch > indexed large textfields, but those are unlikely to participate in > a lot of substring extractions. I completely agree. If your large text field has interior structure with certain data items at certain positions, you'd presumably break it into multiple fixed-width fields. If it doesn't, what's the use case? > - Even if substring operations would be likely, I could envision a compressed > format which compresses in compressed chunks of say 64KB which can then > be addressed randomly independently. I think this would require some sort of indexing so that you could find the page that contains the first bit of any particular chunk you want to find, so it might be a bit complex to implement, and I expect it would reduce compression ratios as well. I'm sure it could be done, but I doubt it's worth the bother. If you're more concerned about the speed with which you can access your data than the size of it, you can and should turn compression off altogether. ...Robert
On Fri, Jan 02, 2009 at 03:35:18PM -0500, Robert Haas wrote: > Any compression algorithm is going to require you to decompress the > entire string before extracting a substring at a given offset. When > the data is uncompressed, you can jump directly to the offset you want > to read. Even if the compression algorithm requires no overhead at > all, it's going to make the location of the data nondeterministic, and > therefore force additional disk reads. So you compromise. You split the data into say 1MB blobs and compress each individually. Then if someone does a substring at offset 3MB you can find it quickly. This barely costs you anything in the compression ratio mostly. Implementation though, that's harder. The size of the blobs is tunable also. I imagine the optimal value will probably be around 100KB. (12 blocks uncompressed). Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
"Alex Hunsaker" <badalex@gmail.com> writes: > On Fri, Jan 2, 2009 at 11:44, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> An easy way to prove or disprove the point would be to go into >> src/backend/utils/adt/pg_lzcompress.c, and change the second entry >> in strategy_default_data from "1024 * 1024" to "INT_MAX", > And the toast file size is *drum roll* 167M. Hmmm ... so that's a lot closer to the original 145M, but it still seems like there's something else going on. It looks like the other thing we changed that might result in not compressing things was to increase the third entry (minimum compression rate) from 20% to 25%. Could you try it with that value also changed back? regards, tom lane
"Stephen R. van den Berg" <srb@cuci.nl> writes: > - I currently have difficulty imagining applications that actually do > lots of substring extractions from large compressible fields. The code that's in there to make this happen was written by people who needed the feature. They're going to be upset with you if you propose disabling it. regards, tom lane
Gregory Stark <stark@enterprisedb.com> writes:
> I think the right value for this setting is going to depend on the
> environment. If the system is starved for cpu cycles then you won't want to
> compress large data. If it's starved for i/o bandwidth but has spare cpu
> cycles then you will.

> If that's true then we really have to expose this parameter to users. There
> won't be a single value that is appropriate for everyone.

Yeah. The commit message for these changes commented

    There was some discussion in the earlier threads of exposing some
    of the compression knobs to users, perhaps even on a per-column
    basis. I have not done anything about that here. It seems to me
    that if we are changing around the parameters, we'd better get some
    experience and be sure we are happy with the design before we set
    things in stone by providing user-visible knobs.

and I'm still pretty worried about the longevity of any knob we put in here. But we might not have a lot of choice.

It would be fairly easy, I think, to add some reloption fields that would let these parameters be controlled on a per-table level. Per-column would be much more painful; do we really need that?

regards, tom lane
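If these did become reloptions, they would presumably ride on the existing ALTER TABLE ... SET (...) syntax; a purely hypothetical sketch (these option names do not exist, they just mirror the PGLZ_Strategy fields being discussed):

-- hypothetical reloptions; nothing like these exists as of this writing
ALTER TABLE dies SET (toast_compress_min_size = 32,
                      toast_compress_max_size = 1048576,
                      toast_compress_min_rate = 20);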
On Fri, Jan 2, 2009 at 18:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Alex Hunsaker" <badalex@gmail.com> writes:
>> On Fri, Jan 2, 2009 at 11:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> An easy way to prove or disprove the point would be to go into
>>> src/backend/utils/adt/pg_lzcompress.c, and change the second entry
>>> in strategy_default_data from "1024 * 1024" to "INT_MAX",
>
>> And the toast file size is *drum roll* 167M.
>
> Hmmm ... so that's a lot closer to the original 145M, but it still
> seems like there's something else going on. It looks like the other
> thing we changed that might result in not compressing things was to
> increase the third entry (minimum compression rate) from 20% to 25%.
> Could you try it with that value also changed back?

With it back to 20% it's now back to 145M.
On Fri, Jan 2, 2009 at 8:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Stephen R. van den Berg" <srb@cuci.nl> writes: >> - I currently have difficulty imagining applications that actually do >> lots of substring extractions from large compressible fields. > > The code that's in there to make this happen was written by people who > needed the feature. They're going to be upset with you if you propose > disabling it. Why didn't they just turn off compression for the relevant columns? ...Robert
"Robert Haas" <robertmhaas@gmail.com> writes: > On Fri, Jan 2, 2009 at 8:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "Stephen R. van den Berg" <srb@cuci.nl> writes: >>> - I currently have difficulty imagining applications that actually do >>> lots of substring extractions from large compressible fields. >> >> The code that's in there to make this happen was written by people who >> needed the feature. They're going to be upset with you if you propose >> disabling it. > Why didn't they just turn off compression for the relevant columns? They did --- with the pre-8.4 code, they had no choice, because the toast compressor would kick in if it could save even one byte on the total field size. That's clearly silly. We might have gone too far in the other direction with the current settings, but the point is that compression isn't always a good thing. One point that nobody seems to have focused on is whether Alex's less-compressed table is faster or slower to access than the original. I dunno if he has any easy way of investigating that for his typical query mix, but it's certainly a fair question to ask. regards, tom lane
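For what it's worth, the per-column storage choice is visible in pg_attribute, so it's easy to check what a given table is doing today; a quick sketch against the dies table ('x' = extended, 'e' = external, 'm' = main, 'p' = plain):

SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'dies'::regclass
  AND attnum > 0
  AND NOT attisdropped;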
>> Why didn't they just turn off compression for the relevant columns? > > They did --- with the pre-8.4 code, they had no choice, because the > toast compressor would kick in if it could save even one byte on the > total field size. That's clearly silly. We might have gone too far > in the other direction with the current settings, but the point is > that compression isn't always a good thing. I agree with all of that. It seems to me that categorically refusing to compress anything over 1M, as Alex seems to think the current settings are doing, is clearly silly in the opposite direction. What we want to avoid is trying to compress data that's already been compressed - the early-failure path you added seems like the right general idea, though perhaps a bit too simplistic. But the size of the data is not evidence of anything, so I'm unclear why we think that's relevant. It could also lead to some awfully strange behavior if you have, say, a table with highly compressible data whose rows are gradually updated with longer values over time. When they hit 1MB, the storage requirements of the database will suddenly balloon for no reason that will be obvious to the DBA. > One point that nobody seems to have focused on is whether Alex's > less-compressed table is faster or slower to access than the original. > I dunno if he has any easy way of investigating that for his typical > query mix, but it's certainly a fair question to ask. Sure, but that's largely an orthogonal issue. Compression is generally bad for performance, though there are certainly exceptions. What it is good for is saving disk space, and that is why people use it. If that's not why we're using it, then I'm puzzled. ...Robert
Robert Haas wrote:
> Compression is
> generally bad for performance, though there are certainly exceptions.
> What it is good for is saving disk space, and that is why people use
> it.

I don't think disk space is still the primary reason for using compression; shoot... I can buy a terabyte drive for $150 on newegg, and 1.5TB drives are out. I think the primary reason has been slowly moving towards performance.

Fast downloads, compressed web sites, or simply reading/writing less to/from disk are very common use cases for compression; all increase performance. Basically, compression commonly offers faster data delivery. As long as you utilize a fast compressor, like lzo or zlib "NOT level 9", it's a performance win.

-- 
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
On Fri, Jan 2, 2009 at 20:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> One point that nobody seems to have focused on is whether Alex's
> less-compressed table is faster or slower to access than the original.
> I dunno if he has any easy way of investigating that for his typical
> query mix, but it's certainly a fair question to ask.

Other than the quick pgbench numbers I posted upthread, where 8.4 blew 8.3 out of the water with a substring? Not really; this table is mainly insert. A few times a day everything inserted that day gets selected. So while I'm almost positive 8.4 is faster, it's probably not really noticeable in my workload.

That being said, here are some quick numbers (see attached q.sql for how uninteresting the query is; also this is so slow mainly due to the lack of it using an index, it seq-scans the entire table :():

./pgbench -T600 -n -f q.sql
8.4 with 8.3 TOAST: 6.250298
8.4:                6.460312

(note I don't actually use substring on this table...)

./pgbench -T60 -n -f substring.sql
8.4 with 8.3 TOAST: 12.613394
8.4:                6347.456596
On Fri, Jan 2, 2009 at 18:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> It would be fairly easy, I think, to add some reloption fields that
> would let these parameters be controlled on a per-table level.

+1

Or something easier that just lets you use the PGLZ_Strategy strategy_always_data:

ALTER TABLE SET STORAGE ALWAYS_COMPRESS;

Or is that way too stop-gap?

> Per-column would be much more painful; do we really need that?

Obviously not for 8.4 this late in the game.
On Fri, 2009-01-02 at 19:27 -0700, Alex Hunsaker wrote:
> On Fri, Jan 2, 2009 at 18:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > "Alex Hunsaker" <badalex@gmail.com> writes:
> >> On Fri, Jan 2, 2009 at 11:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> An easy way to prove or disprove the point would be to go into
> >>> src/backend/utils/adt/pg_lzcompress.c, and change the second entry
> >>> in strategy_default_data from "1024 * 1024" to "INT_MAX",
> >
> >> And the toast file size is *drum roll* 167M.
> >
> > Hmmm ... so that's a lot closer to the original 145M, but it still
> > seems like there's something else going on. It looks like the other
> > thing we changed that might result in not compressing things was to
> > increase the third entry (minimum compression rate) from 20% to 25%.
> > Could you try it with that value also changed back?
>
> With it back to 20% it's now back to 145M.

Perspective on this is that the numbers don't sound too bad if we put a big M behind them, but let's imagine that's a G or even a T. Those numbers look pretty sad then. We must retain the option to compress and, even better, options to control the compression.

Please, please remember that the world genuinely does wish to store multiple terabytes of data, and they won't do it with Postgres unless we act sensibly on this point. Our open source software *enables* massive deployment of database technology. Oracle charge money for their Advanced Compression option, so let's avoid anything that will immediately justify that cost.
-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Tom Lane wrote:
> and I'm still pretty worried about the longevity of any knob we put in
> here. But we might not have a lot of choice.

> It would be fairly easy, I think, to add some reloption fields that
> would let these parameters be controlled on a per-table level.
> Per-column would be much more painful; do we really need that?

+1

Per table sounds fine for now. Per column would be nice, but can be worked around if absolutely necessary by splitting tables.

To avoid having to add another parameter later, I *would* suggest using something like:

ALTER TABLE mytable SET COMPRESSIONLEVEL = 9;

where it can range from 0 (= no compression) to 9 (= maximum compression). The current algorithm could then be as simplistic as kicking in anytime COMPRESSIONLEVEL >= 1 and not compressing when COMPRESSIONLEVEL == 0. More advanced algorithms and decisions can be implemented later.

Obviously the algorithm should ideally use the one-dimensional knob to more or less deliver IO-bound (de)compression at level one, and CPU-bound (de)compression at level nine. This kind of one-dimensional knob is well understood by many compression tools, libraries and users, so it'd make sense to provide something similar to the DBA.
-- 
Sincerely,
Stephen R. van den Berg.
Expect the unexpected!
On Friday 02 January 2009 22:23:13 Stephen R. van den Berg wrote:
> Three things:
> a. Shouldn't it in theory be possible to have a decompression algorithm
>    which is IO-bound because it decompresses faster than the disk can
>    supply the data? (On common current hardware).
> b. Has the current algorithm been carefully benchmarked and/or optimised
>    and/or chosen to fit the IO-bound target as close as possible?
> c. Are there any well-known pitfalls/objections which would prevent me from
>    changing the algorithm to something more efficient (read: IO-bound)?

Copyright licenses and patents.

Which doesn't mean changing anything is impossible, but it is tricky in those nontechnical ways.
On Saturday 03 January 2009 03:36:16 Tom Lane wrote:
> "Stephen R. van den Berg" <srb@cuci.nl> writes:
> > - I currently have difficulty imagining applications that actually do
> > lots of substring extractions from large compressible fields.
>
> The code that's in there to make this happen was written by people who
> needed the feature. They're going to be upset with you if you propose
> disabling it.

I think what he is saying is that it is the less likely use case and should therefore tend not to be the default.

Also note that the code in there was written about 8 years ago, when dealing with "large" data was an entirely different game. People were happy to access more than 8 kB then.

I would in fact imagine that substring operations are more likely to happen with data smaller than 1 MB, and less likely with data larger than 1 MB, instead of the other way around, which is currently implemented. The main sensible way to access text fields larger than 1 MB is with text search, as was pointed out. And large bytea fields are probably media files that are probably already compressed and have no sensible use for substring operations.
Tom Lane wrote:
> It would be fairly easy, I think, to add some reloption fields that
> would let these parameters be controlled on a per-table level.
> Per-column would be much more painful; do we really need that?

Another +1 on the per-table setting. Or a config file setting to disable this for the instance.

We have a 200GB DB that is mostly large text (>1MB) that is not searched with substr. If we see a blowout in size of even 3x, we will not be able to upgrade due to disk space limitations (at least without paying for a lot of disks on mirror servers and hot-standby servers).
On Sat, Jan 3, 2009 at 20:47, Philip Warner <pjw@rhyme.com.au> wrote:
> Tom Lane wrote:
>> It would be fairly easy, I think, to add some reloption fields that
>> would let these parameters be controlled on a per-table level.
>> Per-column would be much more painful; do we really need that?
>
> Another +1 on the per-table setting. Or a config file setting to disable
> this for the instance.
>
> We have a 200GB DB that is mostly large text (>1MB) that is not searched
> with substr. If we see a blowout in size of even 3x, we will not be able
> to upgrade due to disk space limitations (at least without paying for a
> lot of disks on mirror servers and hot-standby servers).

Well, unless your text is extremely redundant, I *really* doubt you will see a large increase, if any. Even if you don't search by substr, fetching the data could be quite a bit faster, depending on how beefy the cpus on the machine are. A quick benchmark here says by as much as 200x! (30tps vs 6000tps). That's just a simple select on a dual 2GHz opteron.

For the record, I just imported a production database that sits at about ~20G right now with *zero* size increase (rounding to the nearest gigabyte). That's with basically the exact same schema, just different data.

I don't suppose you could export some random rows and see if you see any size increase for your data? My gut says you won't see an increase.
Alex Hunsaker wrote: > For the record I just imported a production database that sits at > about ~20G right now with *zero* size increase (rounding to the > nearest gigabyte). That's with basically the exact same schema just > different data. > > Guessing you don't have many plain text rows > 1M. > I don't suppose you could export some random rows and see if you see > any size increase for your data? My gut says you wont see an > increase. > Will see what I can do.
On Sat, Jan 3, 2009 at 21:56, Philip Warner <pjw@rhyme.com.au> wrote:
> Alex Hunsaker wrote:
>> For the record, I just imported a production database that sits at
>> about ~20G right now with *zero* size increase (rounding to the
>> nearest gigabyte). That's with basically the exact same schema, just
>> different data.
>
> Guessing you don't have many plain text rows > 1M.

Probably not.

>> I don't suppose you could export some random rows and see if you see
>> any size increase for your data? My gut says you won't see an
>> increase.
>
> Will see what I can do.

Actually, assuming they don't have any multibyte chars, you should just be able to do something like the below on your existing database.

-- show anything we save a megabyte on
select die_id, pg_size_pretty(savings) from (
  select length(debug) - pg_column_size(debug) as savings, die_id
  from dies
) as foo
where savings > 1024*1024
order by savings desc;
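If multibyte text is a worry, octet_length() counts bytes rather than characters, so a variant along these lines should sidestep that caveat (still assuming a large column named debug, as in the example above):

-- octet_length reports bytes even for multibyte text
select die_id,
       pg_size_pretty(octet_length(debug) - pg_column_size(debug)) as savings
from dies
where octet_length(debug) - pg_column_size(debug) > 1024*1024
order by octet_length(debug) - pg_column_size(debug) desc;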
Peter Eisentraut wrote: >> c. Are there any well-known pitfalls/objections which would prevent me from >> changing the algorithm to something more efficient (read: IO-bound)? >> > > copyright licenses and patents > Would it be possible to have a plugin facility? I guess the most likely candidate is the LZJB mechanism in ZFS which is CDDL licensed. Would that be compatible in contrib, if not in the main source? James
James Mansion wrote:
> Peter Eisentraut wrote:
>>> c. Are there any well-known pitfalls/objections which would prevent me
>>> from changing the algorithm to something more efficient (read: IO-bound)?
>> copyright licenses and patents
> Would it be possible to have a plugin facility?
> I guess the most likely candidate is the LZJB mechanism in ZFS which is
> CDDL licensed.

The most likely candidate for a speedy algorithm seems to be QuickLZ; the author is willing to accommodate the licensing.
-- 
Sincerely,
Stephen R. van den Berg.
Climate is what you expect. Weather is what you get.
On Sat, Jan 3, 2009 at 1:32 AM, Alex Hunsaker <badalex@gmail.com> wrote: > On Fri, Jan 2, 2009 at 18:46, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> It would be fairly easy, I think, to add some reloption fields that >> would let these parameters be controlled on a per-table level. > > +1 > > Or something easier that just lets you use PGLZ_Strategy strategy_always_data. > > ALTER TABLE SET STORAGE ALWAYS_COMPRESS; > > Or is that way to stop gap? > >> Per-column would be much more painful; do we really need that? Personally, I think the whole built-in compressor framework should be deprecated (it's confusing and unintuitive) and replaced with a pluggable compression family in the vein of pg_crypto. This solves a number of issues simultaneously...we no longer have to 'guess' if the user wants a bytea compressed or not, everything is controlled from the client, and we can make it easier for users to use non license compatible libraries (like lzo) in a standard way. merlin
James Mansion wrote: > Peter Eisentraut wrote: >>> c. Are there any well-known pitfalls/objections which would prevent >>> me from >>> changing the algorithm to something more efficient (read: IO-bound)? >>> >> >> copyright licenses and patents >> > Would it be possible to have a plugin facility? Well, before we consider that, we'd probably want to see proof about the effectiveness of other compression methods.
Peter Eisentraut escribió: > James Mansion wrote: >> Peter Eisentraut wrote: >>>> c. Are there any well-known pitfalls/objections which would prevent >>>> me from >>>> changing the algorithm to something more efficient (read: IO-bound)? >>> >>> copyright licenses and patents >>> >> Would it be possible to have a plugin facility? > > Well, before we consider that, we'd probably want to see proof about the > effectiveness of other compression methods. I did some measurements months ago, and it was very clear that libz compression was a lot tighter than the PGLZ code. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Mon, Jan 5, 2009 at 11:45 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Peter Eisentraut escribió: >> James Mansion wrote: >>> Peter Eisentraut wrote: >>>>> c. Are there any well-known pitfalls/objections which would prevent >>>>> me from >>>>> changing the algorithm to something more efficient (read: IO-bound)? >>>> >>>> copyright licenses and patents >>>> >>> Would it be possible to have a plugin facility? >> >> Well, before we consider that, we'd probably want to see proof about the >> effectiveness of other compression methods. > > I did some measurements months ago, and it was very clear that libz > compression was a lot tighter than the PGLZ code. we have seen amazing results with lzo compression...2-3x faster compression times with only 10-15% less compression: There are tons of supporting examples online, for example: http://mail.jabber.org/pipermail/standards/2005-October/008768.html I think, if the database is automatically compressing things (which, IMO, it shouldn't), a low cpu overhead algorithm should be favored. merlin
On Monday 05 January 2009 18:45:49 Alvaro Herrera wrote: > I did some measurements months ago, and it was very clear that libz > compression was a lot tighter than the PGLZ code. Back to the issue at hand. The question at the top of the thread was which of the following behaviors we'd like by default: (1) Compress everything within reason by default, causing slower retrieval, do not offer substr optimization. [<= 8.3] (2) Compress only up to 1 MB, causing faster retrieval, supporting substr optimization. [8.4devel] I am personally completely puzzled by option number 2. Is there even a single use case for that?
Peter Eisentraut wrote: >(1) Compress everything within reason by default, causing slower retrieval, do >not offer substr optimization. [<= 8.3] >(2) Compress only up to 1 MB, causing faster retrieval, supporting substr >optimization. [8.4devel] >I am personally completely puzzled by option number 2. Is there even a single >use case for that? I can't imagine one, and (in this thread at least) noone has demonstrated such; Tom hinted at one, but he didn't elaborate. -- Sincerely, Stephen R. van den Berg. "Very funny, Mr. Scott. Now beam down my clothes!"
On Tue, Jan 6, 2009 at 12:57, Stephen R. van den Berg <srb@cuci.nl> wrote: > Peter Eisentraut wrote: >>(1) Compress everything within reason by default, causing slower retrieval, do >>not offer substr optimization. [<= 8.3] > >>(2) Compress only up to 1 MB, causing faster retrieval, supporting substr >>optimization. [8.4devel] > >>I am personally completely puzzled by option number 2. Is there even a single >>use case for that? > > I can't imagine one, and (in this thread at least) noone has demonstrated > such; Tom hinted at one, but he didn't elaborate. Well that check got removed today anyway see: http://archives.postgresql.org/pgsql-committers/2009-01/msg00069.php
On Fri, Jan 2, 2009 at 5:48 PM, Martijn van Oosterhout <kleptog@svana.org> wrote: > So you compromise. You split the data into say 1MB blobs and compress > each individually. Then if someone does a substring at offset 3MB you > can find it quickly. This barely costs you anything in the compression > ratio mostly. > > Implementation though, that's harder. The size of the blobs is tunable > also. I imagine the optimal value will probably be around 100KB. (12 > blocks uncompressed). Or have the database do that internally: With the available fast compression algorithms (zlib; lzo; lzf; etc) the diminishing return from larger compression block sizes kicks in rather quickly. Other algos like LZMA or BZIP gain more from bigger block sizes, but I expect all of them are too slow to ever consider using in PostgreSQL. So, I expect that the compression loss from compressing in chunks of 64kbytes would be minimal. The database could then include a list of offsets for the 64kbyte chunks at the beginning of the field, or something like that. A short substring would then require decompressing just one or two blocks, far less overhead then decompressing everything. It would probably be worthwhile to graph compression ratio vs block size for some reasonable input. I'd offer to do it; but I doubt I have a reasonable test set for this.