Thread: Improve compression speeds in pg_lzcompress.c
Hi hackers,

The attached is a patch to improve compression speed, at the cost of compression ratio, in backend/utils/adt/pg_lzcompress.c. Recent compression techniques such as Google's LZ4 and Snappy inspired me to write this patch.

There are two points to the patch:

1. Skip at most 255 literals that are likely incompressible during pattern matching for LZ compression.
2. Update the hash table only every PGLZ_HASH_GAP literals.

A sequence of literals is typically a mix of compressible and incompressible parts, so IMHO it is reasonable to skip PGLZ_SKIP_SIZE literals each time a match is not found. The skipped literals are simply copied to the output buffer, so pglz_out_literal() is rewritten (and renamed pglz_out_literals) to copy multiple bytes rather than a single byte. Also, the current implementation updates the hash table for every single literal; since those updates eat a lot of processor time, skipping some of them dynamically improves compression speed.

I've done quick comparison tests on a Xeon 5670 processor. Apache Hadoop log sequences and TREC GOV2 web data were used as test sets. The former is highly compressible (low entropy) and the latter is difficult to compress (high entropy).

*******************
Compression Speed (Ratio)

Apache Hadoop logs:
  gzip                            78.22MiB/s ( 5.31%)
  bzip2                            3.34MiB/s ( 3.04%)
  lz4                            939.45MiB/s ( 9.17%)
  pg_lzcompress (original)        37.80MiB/s (11.76%)
  pg_lzcompress (patch applied)   99.42MiB/s (14.19%)

TREC GOV2 web data:
  gzip                            21.22MiB/s (32.66%)
  bzip2                            8.61MiB/s (27.86%)
  lz4                            250.98MiB/s (49.82%)
  pg_lzcompress (original)        20.44MiB/s (50.09%)
  pg_lzcompress (patch applied)   48.67MiB/s (61.87%)
*******************

Both the compression ratio and the speed of the current implementation are clearly inferior to gzip. My patch loses to gzip and bzip2 on compression ratio, but its compression speed beats both of them. In any case, lz4's compression speed is very fast, so in my opinion there is room to improve the current implementation in pg_lzcompress.

regards,
--
----
Takeshi Yamamuro
NTT Cyber Communications Laboratory Group
Software Innovation Center (Open Source Software Center)
Tel: +81-3-5860-5057 Fax: +81-3-5463-5490
Mail:yamamuro.takeshi@lab.ntt.co.jp
Attachment
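To make the two ideas concrete, here is a simplified sketch of the modified compression loop. PGLZ_SKIP_SIZE and PGLZ_HASH_GAP are the knobs described above; the helper functions (find_match, emit_tag, emit_literals, update_hash) are hypothetical stand-ins for the pg_lzcompress internals, not the actual routines in the patch:

#include <stddef.h>

#define PGLZ_SKIP_SIZE 3   /* literals to copy/skip after a failed match */
#define PGLZ_HASH_GAP  8   /* update the hash table every N input bytes */

/* Hypothetical helpers standing in for the real pg_lzcompress internals. */
int  find_match(const char *dp, size_t *len, size_t *off);
void emit_tag(size_t len, size_t off);
void emit_literals(const char *dp, size_t n);
void update_hash(const char *dp);

static void
compress_sketch(const char *dp, const char *dend)
{
    size_t since_hash_update = 0;

    while (dp < dend)
    {
        size_t len, off;

        if (find_match(dp, &len, &off))
        {
            emit_tag(len, off);              /* back-reference into history */
            dp += len;
            since_hash_update += len;
        }
        else
        {
            /* Idea 1: on a miss, copy a short run of literals at once
             * (pglz_out_literals) instead of one byte per iteration. */
            size_t n = (size_t) (dend - dp);

            if (n > PGLZ_SKIP_SIZE)
                n = PGLZ_SKIP_SIZE;
            emit_literals(dp, n);
            dp += n;
            since_hash_update += n;
        }

        /* Idea 2: refresh the hash table only every PGLZ_HASH_GAP bytes
         * instead of once per input byte. */
        if (since_hash_update >= PGLZ_HASH_GAP)
        {
            update_hash(dp);
            since_hash_update = 0;
        }
    }
}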
On 7 January 2013 07:29, Takeshi Yamamuro <yamamuro.takeshi@lab.ntt.co.jp> wrote: > Anyway, the compression speed in lz4 is very fast, so in my > opinion, there is a room to improve the current implementation > in pg_lzcompress. So why don't we use LZ4? -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 1/7/2013 1:10 AM, Simon Riggs wrote: > On 7 January 2013 07:29, Takeshi Yamamuro > <yamamuro.takeshi@lab.ntt.co.jp> wrote: > >> >Anyway, the compression speed in lz4 is very fast, so in my >> >opinion, there is a room to improve the current implementation >> >in pg_lzcompress. > So why don't we use LZ4? what will changing compression formats do for compatibility? this is for the compressed data in pg_toast storage or something? will this break pg_upgrade style operations?
On 7 January 2013 09:19, John R Pierce <pierce@hogranch.com> wrote: > On 1/7/2013 1:10 AM, Simon Riggs wrote: >> >> On 7 January 2013 07:29, Takeshi Yamamuro >> <yamamuro.takeshi@lab.ntt.co.jp> wrote: >> >>> >Anyway, the compression speed in lz4 is very fast, so in my >>> >opinion, there is a room to improve the current implementation >>> >in pg_lzcompress. >> >> So why don't we use LZ4? > > what will changing compression formats do for compatability? > > this is for the compressed data in pg_toast storage or something? will this > break pg_upgrade style operations? Anything that changes on-disk format would need to consider how to do pg_upgrade. It's the major blocker in that area. For this, it would be possible to have a new format and old format coexist, but that will take more time to think through than we have for this release, so this is a nice idea for further investigation in 9.4. Thanks for raising that point. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2013-01-07 09:57:58 +0000, Simon Riggs wrote: > On 7 January 2013 09:19, John R Pierce <pierce@hogranch.com> wrote: > > On 1/7/2013 1:10 AM, Simon Riggs wrote: > >> > >> On 7 January 2013 07:29, Takeshi Yamamuro > >> <yamamuro.takeshi@lab.ntt.co.jp> wrote: > >> > >>> >Anyway, the compression speed in lz4 is very fast, so in my > >>> >opinion, there is a room to improve the current implementation > >>> >in pg_lzcompress. > >> > >> So why don't we use LZ4? > > > > what will changing compression formats do for compatability? > > > > this is for the compressed data in pg_toast storage or something? will this > > break pg_upgrade style operations? > > Anything that changes on-disk format would need to consider how to do > pg_upgrade. It's the major blocker in that area. > > For this, it would be possible to have a new format and old format > coexist, but that will take more time to think through than we have > for this release, so this is a nice idea for further investigation in > 9.4. Thanks for raising that point. I think there should be enough bits available in the toast pointer to indicate the type of compression. I seem to remember somebody even posting a patch to that effect? I agree that it's probably too late in the 9.3 cycle to start with this. Greetings, Andres Freund --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 1/7/2013 2:05 AM, Andres Freund wrote: > I think there should be enough bits available in the toast pointer to > indicate the type of compression. I seem to remember somebody even > posting a patch to that effect? > I agree that it's probably too late in the 9.3 cycle to start with this. so an upgraded database would have old toasted values in the old compression format, and new toasted values in the new format in an existing table? that's kind of ugly.
On 2013-01-07 02:21:26 -0800, John R Pierce wrote: > On 1/7/2013 2:05 AM, Andres Freund wrote: > >I think there should be enough bits available in the toast pointer to > >indicate the type of compression. I seem to remember somebody even > >posting a patch to that effect? > >I agree that it's probably too late in the 9.3 cycle to start with this. > > so an upgraded database would have old toasted values in the old compression > format, and new toasted values in the new format in an existing table? > that's kind of ugly. Well, ISTM that's just life. What would you prefer? Converting all toast values during pg_upgrade kinda goes against the aim of quick upgrades. Greetings, Andres Freund --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 7, 2013 at 10:21 AM, John R Pierce <pierce@hogranch.com> wrote: > On 1/7/2013 2:05 AM, Andres Freund wrote: >> >> I think there should be enough bits available in the toast pointer to >> indicate the type of compression. I seem to remember somebody even >> posting a patch to that effect? >> I agree that it's probably too late in the 9.3 cycle to start with this. > > > so an upgraded database would have old toasted values in the old compression > format, and new toasted values in the new format in an existing table? > that's kind of ugly. I haven't looked at the patch. It's not obvious to me from the description that the output isn't backwards compatible. The way the LZ toast compression works the output is self-describing. There are many different outputs that would decompress to the same thing and the compressing code can choose how hard to look for earlier matches and when to just copy bytes wholesale but the decompression will work regardless. -- greg
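Greg's point — that the stored stream is a self-describing sequence of "literal byte" and "copy length/offset" items, so any compressor strategy that emits valid items decompresses identically — can be illustrated with a generic LZ77-style decode loop. This is only a sketch of the general idea, not the actual pglz tag layout:

#include <stddef.h>

typedef struct
{
    int           is_match;  /* 0 = literal byte, 1 = back-reference */
    unsigned char literal;
    size_t        off;       /* distance back into already-written output */
    size_t        len;       /* number of bytes to copy */
} lz_item;

/* Decode a stream of items into out (assumed large enough). Decompression
 * only follows what the compressor emitted; it never cares how hard the
 * compressor searched for matches. */
static size_t
lz_decode(const lz_item *items, size_t nitems, unsigned char *out)
{
    size_t dpos = 0;

    for (size_t i = 0; i < nitems; i++)
    {
        if (!items[i].is_match)
            out[dpos++] = items[i].literal;
        else
        {
            /* byte-by-byte so overlapping (RLE-like) matches work */
            for (size_t k = 0; k < items[i].len; k++, dpos++)
                out[dpos] = out[dpos - items[i].off];
        }
    }
    return dpos;
}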
On 7 January 2013 13:36, Greg Stark <stark@mit.edu> wrote: > On Mon, Jan 7, 2013 at 10:21 AM, John R Pierce <pierce@hogranch.com> wrote: >> On 1/7/2013 2:05 AM, Andres Freund wrote: >>> >>> I think there should be enough bits available in the toast pointer to >>> indicate the type of compression. I seem to remember somebody even >>> posting a patch to that effect? >>> I agree that it's probably too late in the 9.3 cycle to start with this. >> >> >> so an upgraded database would have old toasted values in the old compression >> format, and new toasted values in the new format in an existing table? >> that's kind of ugly. > > I haven't looked at the patch. It's not obvious to me from the > description that the output isn't backwards compatible. The way the LZ > toast compression works the output is self-describing. There are many > different outputs that would decompress to the same thing and the > compressing code can choose how hard to look for earlier matches and > when to just copy bytes wholesale but the decompression will work > regardless. Good point, and a great reason to use this patch rather than LZ4 for 9.3 We could even have tuning parameters for toast compression, as long as we keep the on disk format identical. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 07, 2013 at 09:10:31AM +0000, Simon Riggs wrote: > On 7 January 2013 07:29, Takeshi Yamamuro > <yamamuro.takeshi@lab.ntt.co.jp> wrote: > > > Anyway, the compression speed in lz4 is very fast, so in my > > opinion, there is a room to improve the current implementation > > in pg_lzcompress. > > So why don't we use LZ4? > +1 Regards, Ken
On Mon, Jan 07, 2013 at 01:36:33PM +0000, Greg Stark wrote: > On Mon, Jan 7, 2013 at 10:21 AM, John R Pierce <pierce@hogranch.com> wrote: > > On 1/7/2013 2:05 AM, Andres Freund wrote: > >> > >> I think there should be enough bits available in the toast pointer to > >> indicate the type of compression. I seem to remember somebody even > >> posting a patch to that effect? > >> I agree that it's probably too late in the 9.3 cycle to start with this. > > > > > > so an upgraded database would have old toasted values in the old compression > > format, and new toasted values in the new format in an existing table? > > that's kind of ugly. > > I haven't looked at the patch. It's not obvious to me from the > description that the output isn't backwards compatible. The way the LZ > toast compression works the output is self-describing. There are many > different outputs that would decompress to the same thing and the > compressing code can choose how hard to look for earlier matches and > when to just copy bytes wholesale but the decompression will work > regardless. > I think this comment refers to the lz4 option. I do agree that the patch that was posted to improve the current compression speed should be able to be implemented to allow the current results to be decompressed as well. Regards, Ken
Hi, It seems worth rereading the thread around http://archives.postgresql.org/message-id/CAAZKuFb59sABSa7gCG0vnVnGb-mJCUBBbrKiyPraNXHnis7KMw%40mail.gmail.com for people wanting to work on this. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Takeshi Yamamuro <yamamuro.takeshi@lab.ntt.co.jp> writes: > The attached is a patch to improve compression speeds with loss of > compression ratios in backend/utils/adt/pg_lzcompress.c. Why would that be a good tradeoff to make? Larger stored values require more I/O, which is likely to swamp any CPU savings in the compression step. Not to mention that a value once written may be read many times, so the extra I/O cost could be multiplied many times over later on. Another thing to keep in mind is that the compression area in general is a minefield of patents. We're fairly confident that pg_lzcompress as-is doesn't fall foul of any, but any significant change there would probably require more research. regards, tom lane
On Mon, Jan 7, 2013 at 10:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Takeshi Yamamuro <yamamuro.takeshi@lab.ntt.co.jp> writes: >> The attached is a patch to improve compression speeds with loss of >> compression ratios in backend/utils/adt/pg_lzcompress.c. > > Why would that be a good tradeoff to make? Larger stored values require > more I/O, which is likely to swamp any CPU savings in the compression > step. Not to mention that a value once written may be read many times, > so the extra I/O cost could be multiplied many times over later on. I disagree. pg compression is so awful it's almost never a net win. I turn it off. > Another thing to keep in mind is that the compression area in general > is a minefield of patents. We're fairly confident that pg_lzcompress > as-is doesn't fall foul of any, but any significant change there would > probably require more research. A minefield of *expired* patents. Fast lz based compression is used all over the place -- for example by Lucene and lz4. merlin
Merlin Moncure <mmoncure@gmail.com> writes: > On Mon, Jan 7, 2013 at 10:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Takeshi Yamamuro <yamamuro.takeshi@lab.ntt.co.jp> writes: >>> The attached is a patch to improve compression speeds with loss of >>> compression ratios in backend/utils/adt/pg_lzcompress.c. >> Why would that be a good tradeoff to make? Larger stored values require >> more I/O, which is likely to swamp any CPU savings in the compression >> step. Not to mention that a value once written may be read many times, >> so the extra I/O cost could be multiplied many times over later on. > I disagree. pg compression is so awful it's almost never a net win. > I turn it off. One report doesn't make it useless, but even if it is so on your data, why would making it even less effective be a win? >> Another thing to keep in mind is that the compression area in general >> is a minefield of patents. We're fairly confident that pg_lzcompress >> as-is doesn't fall foul of any, but any significant change there would >> probably require more research. > A minefield of *expired* patents. Fast lz based compression is used > all over the place -- for example by the lucene. The patents that had to be dodged for original LZ compression are gone, true, but what's your evidence for saying that newer versions don't have newer patents? regards, tom lane
On Mon, Jan 7, 2013 at 11:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Why would that be a good tradeoff to make? Larger stored values require > more I/O, which is likely to swamp any CPU savings in the compression > step. Not to mention that a value once written may be read many times, > so the extra I/O cost could be multiplied many times over later on. I agree with this analysis, but I note that the test results show it actually improving things along both parameters. I'm not sure how general that result is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jan 7, 2013 at 11:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Why would that be a good tradeoff to make? Larger stored values require >> more I/O, which is likely to swamp any CPU savings in the compression >> step. Not to mention that a value once written may be read many times, >> so the extra I/O cost could be multiplied many times over later on. > I agree with this analysis, but I note that the test results show it > actually improving things along both parameters. Hm ... one of us is reading those results backwards, then. regards, tom lane
On Mon, Jan 7, 2013 at 2:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> On Mon, Jan 7, 2013 at 10:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Takeshi Yamamuro <yamamuro.takeshi@lab.ntt.co.jp> writes: >>>> The attached is a patch to improve compression speeds with loss of >>>> compression ratios in backend/utils/adt/pg_lzcompress.c. > >>> Why would that be a good tradeoff to make? Larger stored values require >>> more I/O, which is likely to swamp any CPU savings in the compression >>> step. Not to mention that a value once written may be read many times, >>> so the extra I/O cost could be multiplied many times over later on. > >> I disagree. pg compression is so awful it's almost never a net win. >> I turn it off. > > One report doesn't make it useless, but even if it is so on your data, > why would making it even less effective be a win? That's a fair point. I'm neutral on the OP's proposal -- it's just moving spots around the dog. If we didn't have better options, maybe offering options to tune what we have would be worth implementing... but by your standard ISTM we can't even do *that*. >>> Another thing to keep in mind is that the compression area in general >>> is a minefield of patents. We're fairly confident that pg_lzcompress >>> as-is doesn't fall foul of any, but any significant change there would >>> probably require more research. > >> A minefield of *expired* patents. Fast lz based compression is used >> all over the place -- for example by the lucene. > > The patents that had to be dodged for original LZ compression are gone, > true, but what's your evidence for saying that newer versions don't have > newer patents? That's impossible (at least for a non-attorney) to do because the patents are still flying (for example: http://www.google.com/patents/US7650040). That said, you've framed the debate so that any improvement to postgres compression requires an IP lawyer. That immediately raises some questions: *) why hold only compression type features in postgres to that standard? Patents get mentioned here and there in the context of other features in the archives but only compression seems to require a proven clean pedigree. Why don't we require a patent search for other interesting features? What evidence do *you* offer that lz4 violates any patents? *) why is postgres the only FOSS project that cares about patentability of say, lz4? (google 'lz4 patent') merlin
On 01/07/2013 04:19 PM, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Jan 7, 2013 at 11:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Why would that be a good tradeoff to make? Larger stored values require >>> more I/O, which is likely to swamp any CPU savings in the compression >>> step. Not to mention that a value once written may be read many times, >>> so the extra I/O cost could be multiplied many times over later on. >> I agree with this analysis, but I note that the test results show it >> actually improving things along both parameters. > Hm ... one of us is reading those results backwards, then. > > I just went back and looked. Unless I'm misreading it he has about a 2.5 times speed improvement but about a 20% worse compression result. What would be interesting would be to see if the knobs he's tweaked could be tweaked a bit more to give us substantial speedup without significant space degradation. cheers andrew
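For reference, checking that reading against the numbers in the original post:

  Hadoop logs: 99.42 / 37.80 ≈ 2.6x faster; 14.19% vs 11.76% is about 21% larger output
  TREC GOV2:   48.67 / 20.44 ≈ 2.4x faster; 61.87% vs 50.09% is about 24% larger output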
On Mon, Jan 7, 2013 at 4:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hm ... one of us is reading those results backwards, then. *looks* It's me. Sorry for the noise. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi,

>>>> Why would that be a good tradeoff to make? Larger stored values require
>>>> more I/O, which is likely to swamp any CPU savings in the compression
>>>> step. Not to mention that a value once written may be read many times,
>>>> so the extra I/O cost could be multiplied many times over later on.
>>> I agree with this analysis, but I note that the test results show it
>>> actually improving things along both parameters.
>> Hm ... one of us is reading those results backwards, then.

I think it's a parameter-tuning issue. I added the two parameters, PGLZ_SKIP_SIZE and PGLZ_HASH_GAP, and set PGLZ_SKIP_SIZE=3 and PGLZ_HASH_GAP=8 for the quick tests. I also found that the performance of my patch was nearly equal to that of the current implementation when PGLZ_SKIP_SIZE=1 and PGLZ_HASH_GAP=1.

Apart from my patch, what concerns me is that the current implementation may be slow relative to I/O. For example, when compressing and writing large values, compressing the data (20-40MiB/s) can be a drag on writing the data to disk (50-80MiB/s). Moreover, IMHO modern (and very fast) I/O subsystems such as SSDs make this an even bigger issue. So I think it's worth continuing the discussion on improving the compression code for 9.4 or later.

> Another thing to keep in mind is that the compression area in general
> is a minefield of patents. We're fairly confident that pg_lzcompress
> as-is doesn't fall foul of any, but any significant change there would
> probably require more research.

Agreed, and we know ... we need patent-free ideas to improve the compression code. For example, a pluggable compression interface, or something along those lines.

> I just went back and looked. Unless I'm misreading it he has about a 2.5
> times speed improvement but about a 20% worse compression result.
>
> What would be interesting would be to see if the knobs he's tweaked
> could be tweaked a bit more to give us substantial speedup without
> significant space degradation.

Yes, you're right, though these results depend heavily on the data sets.

regards,
--
----
Takeshi Yamamuro
NTT Cyber Communications Laboratory Group
Software Innovation Center (Open Source Software Center)
Tel: +81-3-5860-5057 Fax: +81-3-5463-5490
Mail:yamamuro.takeshi@lab.ntt.co.jp
>> So why don't we use LZ4? >> > +1 Agreed, though I think there are still patent issues there. regards, -- ---- Takeshi Yamamuro NTT Cyber Communications Laboratory Group Software Innovation Center (Open Source Software Center) Tel: +81-3-5860-5057 Fax: +81-3-5463-5490 Mail:yamamuro.takeshi@lab.ntt.co.jp
Hi, (2013/01/07 22:36), Greg Stark wrote: > On Mon, Jan 7, 2013 at 10:21 AM, John R Pierce<pierce@hogranch.com> wrote: >> On 1/7/2013 2:05 AM, Andres Freund wrote: >>> >>> I think there should be enough bits available in the toast pointer to >>> indicate the type of compression. I seem to remember somebody even >>> posting a patch to that effect? >>> I agree that it's probably too late in the 9.3 cycle to start with this. >> >> >> so an upgraded database would have old toasted values in the old compression >> format, and new toasted values in the new format in an existing table? >> that's kind of ugly. > > I haven't looked at the patch. It's not obvious to me from the > description that the output isn't backwards compatible. The way the LZ > toast compression works the output is self-describing. There are many > different outputs that would decompress to the same thing and the > compressing code can choose how hard to look for earlier matches and > when to just copy bytes wholesale but the decompression will work > regardless. My patch is not backwards compatible, so we need some feature for switching between the old and new disk formats. I think the discussion below is helpful here; that is, PGLZ_Header could be used for this purpose. http://archives.postgresql.org/pgsql-hackers/2012-03/msg00971.php regards, -- ---- Takeshi Yamamuro NTT Cyber Communications Laboratory Group Software Innovation Center (Open Source Software Center) Tel: +81-3-5860-5057 Fax: +81-3-5463-5490 Mail:yamamuro.takeshi@lab.ntt.co.jp
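The general shape of that approach — spending a few header bits on a compression-method tag and dispatching on them at decompression time — could look something like the sketch below. The structure and names here are illustrative assumptions only, not the actual PGLZ_Header or TOAST pointer layout:

/* Illustrative only: not PostgreSQL's real on-disk structures. */
typedef struct
{
    unsigned int vl_len_;             /* varlena length word, as usual */
    unsigned int rawsize_and_method;  /* top 2 bits: method, rest: raw size */
} sketch_compressed_header;

#define SKETCH_METHOD_SHIFT   30
#define SKETCH_METHOD_PGLZ_V1 0u      /* existing format */
#define SKETCH_METHOD_PGLZ_V2 1u      /* e.g. the format produced by the patch */

static unsigned int
sketch_get_method(const sketch_compressed_header *h)
{
    return h->rawsize_and_method >> SKETCH_METHOD_SHIFT;
}

static unsigned int
sketch_get_rawsize(const sketch_compressed_header *h)
{
    return h->rawsize_and_method & ((1u << SKETCH_METHOD_SHIFT) - 1);
}

/* Decompression would switch on sketch_get_method(), so values written
 * before an upgrade keep decompressing through the old code path. */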
On 01/08/2013 10:19 AM, Takeshi Yamamuro wrote: > Hi, > > (2013/01/07 22:36), Greg Stark wrote: >> On Mon, Jan 7, 2013 at 10:21 AM, John R Pierce<pierce@hogranch.com> >> wrote: >>> On 1/7/2013 2:05 AM, Andres Freund wrote: >>>> >>>> I think there should be enough bits available in the toast pointer to >>>> indicate the type of compression. I seem to remember somebody even >>>> posting a patch to that effect? >>>> I agree that it's probably too late in the 9.3 cycle to start with >>>> this. >>> >>> >>> so an upgraded database would have old toasted values in the old >>> compression >>> format, and new toasted values in the new format in an existing table? >>> that's kind of ugly. >> >> I haven't looked at the patch. It's not obvious to me from the >> description that the output isn't backwards compatible. The way the LZ >> toast compression works the output is self-describing. There are many >> different outputs that would decompress to the same thing and the >> compressing code can choose how hard to look for earlier matches and >> when to just copy bytes wholesale but the decompression will work >> regardless. > > My patch is not backwards compatible, so we need some features > to switch these old and new disk formats. Is it a property of our compressed format that makes it hard to keep this backwards compatible? Only decompression needs to keep working anyway, as we have never supported physical compatibility in the other direction in our other tools. That is, we don't have pg_downgrade :) Hannu > > I think the discussion below is helpful in this use. > That is, PGLZ_Header is used as this purpose. > http://archives.postgresql.org/pgsql-hackers/2012-03/msg00971.php > > regards,
On Tue, Jan 8, 2013 at 4:04 AM, Takeshi Yamamuro <yamamuro.takeshi@lab.ntt.co.jp> wrote: > Apart from my patch, what I care is that the current one might > be much slow against I/O. For example, when compressing > and writing large values, compressing data (20-40MiB/s) might be > a dragger against writing data in disks (50-80MiB/s). Moreover, > IMHO modern (and very fast) I/O subsystems such as SSD make a > bigger issue in this case. What about just turning compression off? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jan 8, 2013 at 10:20 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jan 8, 2013 at 4:04 AM, Takeshi Yamamuro > <yamamuro.takeshi@lab.ntt.co.jp> wrote: >> Apart from my patch, what I care is that the current one might >> be much slow against I/O. For example, when compressing >> and writing large values, compressing data (20-40MiB/s) might be >> a dragger against writing data in disks (50-80MiB/s). Moreover, >> IMHO modern (and very fast) I/O subsystems such as SSD make a >> bigger issue in this case. > > What about just turning compression off? I've been relying on compression for some big serialized blob fields for some time now. I bet I'm not alone, lots of people save serialized data to text fields. So rather than removing it, I'd just change the default to off (if that was the decision). However, it might be best to evaluate some of the modern fast compression schemes like snappy/lz4 (250MB/s per core sounds pretty good), and implement pluggable compression schemes instead. Snappy wasn't designed for nothing, it was most likely because it was necessary. Cassandra (just to name a system I'm familiar with) started without compression, and then it was deemed necessary to the point they invested considerable time into it. I've always found the fact that pg does compression of toast tables quite forward-thinking, and I'd say the feature has to remain there, extended and modernized, maybe off by default, but there.
On Tue, Jan 8, 2013 at 9:51 AM, Claudio Freire <klaussfreire@gmail.com> wrote: > On Tue, Jan 8, 2013 at 10:20 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Jan 8, 2013 at 4:04 AM, Takeshi Yamamuro >> <yamamuro.takeshi@lab.ntt.co.jp> wrote: >>> Apart from my patch, what I care is that the current one might >>> be much slow against I/O. For example, when compressing >>> and writing large values, compressing data (20-40MiB/s) might be >>> a dragger against writing data in disks (50-80MiB/s). Moreover, >>> IMHO modern (and very fast) I/O subsystems such as SSD make a >>> bigger issue in this case. >> >> What about just turning compression off? > > I've been relying on compression for some big serialized blob fields > for some time now. I bet I'm not alone, lots of people save serialized > data to text fields. So rather than removing it, I'd just change the > default to off (if that was the decision). > > However, it might be best to evaluate some of the modern fast > compression schemes like snappy/lz4 (250MB/s per core sounds pretty > good), and implement pluggable compression schemes instead. Snappy > wasn't designed for nothing, it was most likely because it was > necessary. Cassandra (just to name a system I'm familiar with) started > without compression, and then it was deemed necessary to the point > they invested considerable time into it. I've always found the fact > that pg does compression of toast tables quite forward-thinking, and > I'd say the feature has to remain there, extended and modernized, > maybe off by default, but there. I'm not offering any opinion on whether we should have compression as a general matter. Maybe yes, maybe no, but my question was about the OP's use case. If he's willing to accept less efficient compression in order to get faster compression, perhaps he should just not use compression at all. Personally, my biggest gripe about the way we do compression is that it's easy to detoast the same object lots of times. More generally, our in-memory representation of user data values is pretty much a mirror of our on-disk representation, even when that leads to excess conversions. Beyond what we do for TOAST, there's stuff like numeric where we not only detoast but then post-process the results into yet another internal form before performing any calculations - and then of course we have to convert back before returning from the calculation functions. And for things like XML, JSON, and hstore we have to repeatedly parse the string, every time someone wants to do anything to it. Of course, solving this is a very hard problem, and not solving it isn't a reason not to have more compression options - but more compression options will not solve the problems that I personally have in this area, by and large. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> Personally, my biggest gripe about the way we do compression is that
> it's easy to detoast the same object lots of times. More generally,
> our in-memory representation of user data values is pretty much a
> mirror of our on-disk representation, even when that leads to excess
> conversions. Beyond what we do for TOAST, there's stuff like numeric
> where we not only detoast but then post-process the results into yet
> another internal form before performing any calculations - and then of
> course we have to convert back before returning from the calculation
> functions. And for things like XML, JSON, and hstore we have to
> repeatedly parse the string, every time someone wants to do anything
> to it. Of course, solving this is a very hard problem, and not
> solving it isn't a reason not to have more compression options - but
> more compression options will not solve the problems that I personally
> have in this area, by and large.
At the risk of saying something totally obvious and stupid, as I haven't looked at the actual representation, this sounds like a memoisation problem. In OCaml terms:
type 'a rep =
  | On_disk_rep of bytes   (* raw on-disk form; using the built-in bytes type here *)
  | In_memory_rep of 'a    (* already-converted in-memory form *)

type 'a t = 'a rep ref

(* Convert on first access, then cache the converted value. *)
let get_mem_rep t converter =
  match !t with
  | On_disk_rep seq ->
      let res = converter seq in
      t := In_memory_rep res;
      res
  | In_memory_rep x -> x
;;
... (if you need the other direction, that's straightforward too) ...
Translating this into C is relatively straightforward if you have the luxury of a fresh start
and don't have to be super efficient:
typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

typedef struct {
    rep_kind_t rep_kind;
    union {
        char *on_disk;      /* raw (still-serialized) bytes */
        void *in_memory;    /* converted in-memory representation */
    } rep;
} t;

/* Convert on first access and cache the result, so repeated accesses
 * don't have to detoast/parse the value again. */
void *get_mem_rep(t *val, void *(*converter)(char *)) {
    void *res;
    switch (val->rep_kind) {
    case ON_DISK_REP:
        res = converter(val->rep.on_disk);
        val->rep.in_memory = res;
        val->rep_kind = IN_MEMORY_REP;
        return res;
    case IN_MEMORY_REP:
        return val->rep.in_memory;
    }
    return NULL;    /* not reached */
}
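A minimal usage sketch (the parse_json converter and the literal input are hypothetical, just to show the call pattern; the first call converts and caches, later calls return the cached form):

/* hypothetical converter from the raw on-disk bytes to a parsed form */
void *parse_json(char *raw);

void example(void)
{
    t doc;

    doc.rep_kind = ON_DISK_REP;
    doc.rep.on_disk = "{\"k\": 1}";            /* pretend this came from disk */

    void *p1 = get_mem_rep(&doc, parse_json);  /* converts and caches */
    void *p2 = get_mem_rep(&doc, parse_json);  /* returns the cached pointer */
    (void) p1;
    (void) p2;
}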
Now of course fitting this into the existing types, and ensuring that there is neither premature freeing of memory nor memory leaks nor other bugs, is probably a nightmare, which is why you said that this is a hard problem.
Cheers,
Bene