Thread: Is a plan for lmza commpression in pg_dump
Hi.<br /><br /> Is it in todo or in a plan to implement lmza commpression in pg_dump backups?<br /><br /> Thanks<br /><br/> Stano<br />-- <br />Mgr. Stano LACKO, Space Systems, s.r.o.<br />
Stanislav Lacko wrote: > Hi. > > Is it in todo or in a plan to implement lmza commpression in pg_dump > backups? Nope, never heard anything about it. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Bruce Momjian > Sent: Wednesday, February 04, 2009 3:28 PM > To: Stanislav Lacko > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Is a plan for lmza commpression in pg_dump > > Stanislav Lacko wrote: > > Hi. > > > > Is it in todo or in a plan to implement lmza commpression in pg_dump > > backups? > > Nope, never heard anything about it. In case the PG group does get interested in insertion of compression algorithms into PostgreSQL {it seems it could be useful in many different areas}, the 7zip format seems to be excellent in a number of ways. Here is an interesting benchmark that shows 7z format winning a large area of the "optimal compressors" performance graph: http://users.elis.ugent.be/~wheirman/compression/ The LZMA SDK is granted to the public domain: http://www.7-zip.org/sdk.html Unfortunately LZOP (which wins the top half of the "optimal compressors" graph where the compression and decompression speed is more important than amount of compression) does not have a liberal license. http://www.lzop.org/
Dann Corbit wrote: > > The LZMA SDK is granted to the public domain: > http://www.7-zip.org/sdk.html > I played with this but found the SDK extremely confusing and flat out horrible. One personal dislike was the unnecessary use of C++, although it was the horrible API that turned me off. I'm not even sure if I ever got a test program working. LZO (http://www.oberhumer.com/opensource/lzo/) is a great algorithm with an easy API and many variants; my fav is LZO1X-1(15). It's known for its compression and decompression speeds ... it's blazing fast. zlib typically gets 5-8% more compression. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
On Wed, Feb 04, 2009 at 10:23:17PM -0500, Andrew Chernow wrote: > Dann Corbit wrote: > > > >The LZMA SDK is granted to the public domain: > >http://www.7-zip.org/sdk.html > > > > I played with this but found the SDK extremely confusing and flat out > horrible. One personal dislike was the unnecessary use of C++; although it > was the horrible API that turned me off. I'm not even sure if I ever got a > test program working. > > LZO (http://www.oberhumer.com/opensource/lzo/) is a great algorithm, easy > API with many variants; my fav is LZO1X-1(15). Its known for its > compresison and decompresison speeds ... its blazing fast. zlib typically > gets 5-8% more compression. LZO rocks. I wonder if the lzo developer would consider a license exception so that postgresql could use it? What would we need? -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
daveg wrote: > On Wed, Feb 04, 2009 at 10:23:17PM -0500, Andrew Chernow wrote: > >> Dann Corbit wrote: >> >>> The LZMA SDK is granted to the public domain: >>> http://www.7-zip.org/sdk.html >>> >>> >> I played with this but found the SDK extremely confusing and flat out >> horrible. One personal dislike was the unnecessary use of C++; although it >> was the horrible API that turned me off. I'm not even sure if I ever got a >> test program working. >> >> LZO (http://www.oberhumer.com/opensource/lzo/) is a great algorithm, easy >> API with many variants; my fav is LZO1X-1(15). Its known for its >> compresison and decompresison speeds ... its blazing fast. zlib typically >> gets 5-8% more compression. >> > > LZO rocks. I wonder if the lzo developer would consider a license exception > so that postgresql could use it? What would we need? > > > Probably a BSD license or a clean room implementation which we could BSD license. cheers andrew
daveg wrote: > On Wed, Feb 04, 2009 at 10:23:17PM -0500, Andrew Chernow wrote: > > Dann Corbit wrote: > > > > > >The LZMA SDK is granted to the public domain: > > >http://www.7-zip.org/sdk.html > > > > > > > I played with this but found the SDK extremely confusing and flat out > > horrible. One personal dislike was the unnecessary use of C++; although it > > was the horrible API that turned me off. I'm not even sure if I ever got a > > test program working. > > > > LZO (http://www.oberhumer.com/opensource/lzo/) is a great algorithm, easy > > API with many variants; my fav is LZO1X-1(15). Its known for its > > compresison and decompresison speeds ... its blazing fast. zlib typically > > gets 5-8% more compression. > > LZO rocks. I wonder if the lzo developer would consider a license exception > so that postgresql could use it? What would we need? The chance of us using anything but zlib is near zero, so please do not pursue this; this discussion comes up much too often. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Sat, Feb 07, 2009 at 02:47:05PM -0500, Bruce Momjian wrote: > daveg wrote: > > On Wed, Feb 04, 2009 at 10:23:17PM -0500, Andrew Chernow wrote: > > > Dann Corbit wrote: > > > > > > > >The LZMA SDK is granted to the public domain: > > > >http://www.7-zip.org/sdk.html > > > > > > > > > > I played with this but found the SDK extremely confusing and flat out > > > horrible. One personal dislike was the unnecessary use of C++; although it > > > was the horrible API that turned me off. I'm not even sure if I ever got a > > > test program working. > > > > > > LZO (http://www.oberhumer.com/opensource/lzo/) is a great algorithm, easy > > > API with many variants; my fav is LZO1X-1(15). Its known for its > > > compresison and decompresison speeds ... its blazing fast. zlib typically > > > gets 5-8% more compression. > > > > LZO rocks. I wonder if the lzo developer would consider a license exception > > so that postgresql could use it? What would we need? > > The chance of us using anything but one zlib is near zero so please do > not persue this; this discussion comes up much too often. That this comes up "much too often" suggests that there is more than near zero interest. Why can only one compression library be considered? We use multiple readline implementations, for better or worse. I think the context here is for pg_dump only, and in that context a faster compression library makes a lot of sense. I'd be happy to prepare a patch if the license issue can be accommodated. Hence my question: what sort of license accommodation would we need to be able to use this library? -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
On 7 Feb 2009, at 21:08, daveg wrote: >> > > That this comes up "much to often" suggests that there is more than > near > zero interest. Why can only one compression library can be > considered? > We use multiple readline implementations, for better or worse. I don't see anything wrong with using standard Unix pipes... and doing it in a truly Unix and scalable way!
> That this comes up "much to often" suggests that there is more than near > zero interest. Why can only one compression library can be considered? > We use multiple readline implementations, for better or worse. > > I think the context here is for pg_dump only and in that context a faster > compression library makes a lot of sense. I'd be happy to prepare a patch > if the license issue can be accomodated. Hence my question, what sort of > licence accomodation would we need to be able to use this library? Based on previous discussions, I suspect that the answer here is "complete relicensing as BSD". I think pursuing any sort of licensing exception is completely futile as there will still be restrictions that will be unacceptable to many in the community. But if someone had an actual BSD-LICENSED compression library that was better than what we have now, I'm not sure why Bruce (or anyone) should be opposed to incorporating it. It's just that all of the proposals that come up for this sort of thing aren't that. ...Robert
Robert Haas wrote: > > That this comes up "much to often" suggests that there is more than near > > zero interest. Why can only one compression library can be considered? > > We use multiple readline implementations, for better or worse. > > > > I think the context here is for pg_dump only and in that context a faster > > compression library makes a lot of sense. I'd be happy to prepare a patch > > if the license issue can be accomodated. Hence my question, what sort of > > licence accomodation would we need to be able to use this library? > > Based on previous discussions, I suspect that the answer here is > "complete relicensing as BSD". I think pursuing any sort of licensing > exception is completely futile as there will still be restrictions > that will be unacceptable to many in the community. > > But if someone had an actual BSD-LICENSED compression library that was > better than what we have now, I'm not sure why Bruce (or anyone) > should be opposed to incorporating it. It's just that all of the > proposals that come up for this sort of thing aren't that. You can bet I would oppose it. It is not efficient for us to support every compression-of-the-month project that comes along. If something was BSD, well tested, and clearly superior, we might consider it, but I have seen nothing like that for 10 years and I doubt I will see something in the next 5. I am thinking we need to add this to the "Features we do not want" section of our todo list. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Feb 7, 2009, at 4:53 PM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >>> That this comes up "much to often" suggests that there is more >>> than near >>> zero interest. Why can only one compression library can be >>> considered? >>> We use multiple readline implementations, for better or worse. >>> >>> I think the context here is for pg_dump only and in that context a >>> faster >>> compression library makes a lot of sense. I'd be happy to prepare >>> a patch >>> if the license issue can be accomodated. Hence my question, what >>> sort of >>> licence accomodation would we need to be able to use this library? >> >> Based on previous discussions, I suspect that the answer here is >> "complete relicensing as BSD". I think pursuing any sort of >> licensing >> exception is completely futile as there will still be restrictions >> that will be unacceptable to many in the community. >> >> But if someone had an actual BSD-LICENSED compression library that >> was >> better than what we have now, I'm not sure why Bruce (or anyone) >> should be opposed to incorporating it. It's just that all of the >> proposals that come up for this sort of thing aren't that. > > You can be I would oppose it. It is not efficient for us to support > every compression-of-the-month project that comes along. If something > was BSD, well tested, and clearly superior, we might consider it, > but I Well that's pretty much what I said. > have seen nothing like that for 10 years and I doubt I will see > something the next 5. I am thinking I am doubtful too. > we need to add this to the > "Features we do not want" section of our todo list. "Proprietary compression algorithms, even with Postgresql-specific license exceptions"? ...Robert
Robert Haas wrote: > > have seen nothing like that for 10 years and I doubt I will see > > something the next 5. I am thinking > > I am doubtful too. > > > we need to add this to the > > "Features we do not want" section of our todo list. > > "Proprietary compression algorithms, even with Postgresql-specific > license exceptions"? Yep. Does it make sense to make our license more complex to get 1% better compression in certain cases? Probably not. Also consider the code maintenance, patents, larger tarball, bugs, etc. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Sat, Feb 07, 2009 at 08:49:29PM -0500, Robert Haas wrote: > On Feb 7, 2009, at 4:53 PM, Bruce Momjian <bruce@momjian.us> wrote: > >> we need to add this to the "Features we do not want" section of our >> todo list. > > "Proprietary compression algorithms, even with Postgresql-specific > license exceptions"? Considering that the entire project ships with a BSD license, which very specifically allows use of all or any tiniest part of it with (skipping some legalese) two restrictions: mention PGDG in the copyright list, and don't sue us no matter what happens, any "Postgresql-specific license exceptions" are equivalent to "that algorithm is no longer proprietary" because any project could simply use PostgreSQL's version and have done. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Sat, Feb 07, 2009 at 08:49:29PM -0500, Robert Haas wrote: > "Proprietary compression algorithms, even with Postgresql-specific > license exceptions"? To be fair, lzo is GPL, which is a stretch to consider proprietary. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
On Sat, Feb 07, 2009 at 08:31:23PM -0800, David Fetter wrote: > Considering that the entire project ships with a BSD license, which > very specifically allows use of all or any tiniest part of it with > (skipping some legalese) two restrictions: mention PGDG in the > copyright list, and don't sue us no matter what happens, any > "Postgresql-specific license exceptions" are equivalent to "that > algorithm is no longer proprietary" because any project could simply > use PostgreSQL's version and have done. Why don't we just add an option to pg_dump --use-compress-program, just like tar and then people can use their "compression algorithm of the week" and we don't need to care about the licence or anything. It's not like the case of TOAST where it actually needs to be builtin. Tar doesn't have any compression builtin, yet you don't see many uncompressed tar files... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
On 8 Feb 2009, at 02:49, Robert Haas <robertmhaas@gmail.com> wrote: > On Feb 7, 2009, at 4:53 PM, Bruce Momjian <bruce@momjian.us> wrote: >>>> >>> >>> >> >> we need to add this to the >> "Features we do not want" section of our todo list. > > "Proprietary compression algorithms, even with Postgresql-specific > license exceptions"? Now that I would agree about. We would have to explain that we're bsd licenced *because* we want people to be able to reuse our code outside postgres including commercial projects
> > Why don't we just add an option to pg_dump --use-compress-program, just > like tar and then people can use their "compression algorithm of the > week" and we don't need to care about the licence or anything. Can't this be done already? pg_dump -Z 0 | compression_binary >mydump If -Z is unspecified, I think it won't compress? Maybe you can just drop the -Z. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
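The pipe approach suggested above can also be scripted. What follows is only a rough sketch under stated assumptions: `pipe_through` is a hypothetical helper name, and the `pg_dump mydb` and `xz` commands in the commented usage line assume a database called `mydb` and an installed `xz` binary. Nothing here is a pg_dump feature; it just connects two processes the way a shell pipe would.

```python
import subprocess


def pipe_through(producer_cmd, compressor_cmd, out_path):
    """Stream producer_cmd's stdout into compressor_cmd and save the result.

    Hypothetical helper: equivalent in spirit to
    `producer | compressor > out_path` in a shell.
    """
    with open(out_path, "wb") as out:
        producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
        compressor = subprocess.Popen(
            compressor_cmd, stdin=producer.stdout, stdout=out
        )
        # Close our copy of the pipe so the producer sees a broken pipe
        # if the compressor exits early.
        producer.stdout.close()
        compressor_rc = compressor.wait()
        producer_rc = producer.wait()
    if producer_rc != 0 or compressor_rc != 0:
        raise RuntimeError(
            f"pipeline failed: producer={producer_rc}, compressor={compressor_rc}"
        )


# Assumed usage (database name and compressor are placeholders):
# pipe_through(["pg_dump", "mydb"], ["xz"], "mydb.sql.xz")
```

Checking both exit codes matters here, since a failed dump piped into a healthy compressor would otherwise leave a valid-looking but incomplete file.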
Martijn van Oosterhout wrote: > > Why don't we just add an option to pg_dump --use-compress-program, just > like tar and then people can use their "compression algorithm of the > week" and we don't need to care about the licence or anything. > > It's not like the case of TOAST where it actually needs to be builtin. > Tar doesn't have any compression builtin, yet you don't see many > uncompressed tar files... > > > tar compresses/decompresses the whole archive via a single pipe. pg_dump compresses individual data members. If the compression isn't builtin it will make life much more difficult, and probably make parallel restore as well as some other operations well nigh impossible. cheers andrew
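The difference between compressing the whole archive and compressing individual members can be shown with a toy example. This is a sketch only: the dictionary "archive" and the function names are hypothetical and do not reflect pg_dump's actual archive format. It uses zlib from the Python standard library.

```python
import zlib


def pack_whole(members):
    """tar-style: concatenate every member, then compress one stream."""
    return zlib.compress(b"".join(members.values()))


def pack_per_member(members):
    """Compress each member on its own, keyed by name."""
    return {name: zlib.compress(data) for name, data in members.items()}


def read_member(archive, name):
    """Decompress a single member without touching the others."""
    return zlib.decompress(archive[name])


# Made-up table data, purely for illustration.
members = {
    "table_a": b"1\talpha\n" * 1000,
    "table_b": b"2\tbeta\n" * 1000,
}

whole = pack_whole(members)
per_member = pack_per_member(members)

# With per-member compression, table_b can be restored by itself,
# which is what separate restore workers would need.
table_b = read_member(per_member, "table_b")

# With a single stream, reaching table_b means decompressing
# everything that precedes it as well.
everything = zlib.decompress(whole)
```

An external compressor on a pipe only ever sees the single-stream case, which is the limitation described above.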
daveg wrote: > I think the context here is for pg_dump only and in that context a faster > compression library makes a lot of sense. I'd be happy to prepare a patch > if the license issue can be accomodated. Some kind of performance data (space and time) would be required to support any change in this area. Notice that the thread originally called for lzma support, which is completely at the opposite end of the spectrum of compression algorithms in terms of space and time, compared to lzo. So it's not really clear what the requirements are in the first place.
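As a starting point for the kind of space and time data asked for here, a small measurement harness could look like the sketch below. It compares zlib and LZMA because both ship in the Python standard library; LZO is not included since it has no standard-library binding. The function names and the sample data are assumptions, and synthetic rows like these say little about real dump contents.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the size ratio."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise ValueError(f"{name}: round trip did not reproduce the input")
    return {
        "name": name,
        "ratio": len(packed) / len(data),  # smaller means better compression
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


def compare(data):
    """Run the same input through zlib (level 6) and LZMA (default preset)."""
    return [
        measure("zlib", lambda d: zlib.compress(d, 6), zlib.decompress, data),
        measure("lzma", lzma.compress, lzma.decompress, data),
    ]


if __name__ == "__main__":
    # Placeholder input; replace with bytes read from a real dump file.
    sample = b"42\tsome text value\t2009-02-07\n" * 50000
    for row in compare(sample):
        print(row)
```

Running it against an actual uncompressed dump, rather than the placeholder sample, would be needed before drawing any conclusion.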
Peter Eisentraut wrote: > > Notice that the thread originally called for lzma support, which is > completely at the opposite end of the spectrum of compression algorithms > in terms of space and time, compared to lzo. So it's not really clear > what the requirements are in the first place. > > Instead of trying to figure out the needs/wants of a DBA and build a general-purpose solution, it might be better to figure out how to make the compression choice user-driven. Maybe the requirement should be to make this the user's decision; piping the output to the compressor of choice seems to be the simplest approach. There are cases where the highest compression is desired even if it takes forever, and cases for just the opposite. Not sure why this has to be builtin or why it must use zlib, other than that this is the current method. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/