Thread: Is a plan for lmza commpression in pg_dump

Is a plan for lmza commpression in pg_dump

From
Stanislav Lacko
Date:
Hi.

Is implementing LZMA compression in pg_dump backups on the TODO list, or otherwise planned?

Thanks

Stano
-- 
Mgr. Stano LACKO
Mobile: +421 908 175 753
Fax: +421 2 555 72 676
E-mail: lacko@spacesystems.sk

Space Systems, s.r.o.
Zámocká 30
811 01 Bratislava
www.spacesystems.sk

Re: Is a plan for lmza commpression in pg_dump

From
Bruce Momjian
Date:
Stanislav Lacko wrote:
> Hi.
> 
> Is implementing LZMA compression in pg_dump backups on the TODO list,
> or otherwise planned?

Nope, never heard anything about it.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Is a plan for lmza commpression in pg_dump

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Bruce Momjian
> Sent: Wednesday, February 04, 2009 3:28 PM
> To: Stanislav Lacko
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Is a plan for lmza commpression in pg_dump
>
> Stanislav Lacko wrote:
> > Hi.
> >
> > Is implementing LZMA compression in pg_dump backups on the TODO list,
> > or otherwise planned?
>
> Nope, never heard anything about it.

If the PG group does become interested in adding compression
algorithms to PostgreSQL (it seems they could be useful in many
different areas), the 7-Zip format seems to be excellent in a number of
ways.

Here is an interesting benchmark that shows the 7z format winning a large
area of the "optimal compressors" performance graph:
http://users.elis.ugent.be/~wheirman/compression/

The LZMA SDK is granted to the public domain:
http://www.7-zip.org/sdk.html

Unfortunately LZOP (which wins the top half of the "optimal compressors"
graph, where compression and decompression speed matters more than the
amount of compression) does not have a liberal license.
http://www.lzop.org/
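The "optimal compressors" graph on a benchmark page like this is presumably a Pareto frontier: a compressor appears on it only if no other tested compressor is both faster and compresses better. A minimal Python sketch of that selection follows, assuming each measurement is a (name, speed, ratio) tuple where higher is better on both axes. `Result`, `dominates`, and `pareto_frontier` are hypothetical names, and the sample figures are placeholders to exercise the code, not benchmark data.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One benchmark measurement; field names are illustrative."""
    name: str
    speed: float  # throughput, e.g. MB/s (higher is better)
    ratio: float  # original size / compressed size (higher is better)


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    return (a.speed >= b.speed and a.ratio >= b.ratio
            and (a.speed > b.speed or a.ratio > b.ratio))


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep only results that no other result dominates, fastest first."""
    frontier = [r for r in results
                if not any(dominates(other, r) for other in results)]
    return sorted(frontier, key=lambda r: r.speed, reverse=True)


if __name__ == "__main__":
    # Placeholder figures purely to exercise the code; not measured data.
    sample = [
        Result("fast", 200.0, 2.0),
        Result("balanced", 50.0, 3.0),
        Result("slow-but-weak", 40.0, 2.5),
        Result("strong", 5.0, 4.0),
    ]
    for r in pareto_frontier(sample):
        print(r.name)  # prints fast, balanced, strong (one per line)
```

Fed real speed and ratio measurements, the returned list traces the curve from the fastest worthwhile compressor to the strongest one; anything left off it is beaten on both axes by something on it.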



Re: Is a plan for lmza commpression in pg_dump

From
Andrew Chernow
Date:
Dann Corbit wrote:
> 
> The LZMA SDK is granted to the public domain:
> http://www.7-zip.org/sdk.html
> 

I played with this but found the SDK extremely confusing and flat out horrible.  One personal dislike was the
unnecessary use of C++, although it was the horrible API that turned me off.  I'm not even sure if I ever got a
test program working.

LZO (http://www.oberhumer.com/opensource/lzo/) is a great algorithm, easy API
with many variants; my fav is LZO1X-1(15).  It's known for its compression and
decompression speeds ... it's blazing fast.  zlib typically gets 5-8% more
compression.

-- 
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


Re: Is a plan for lmza commpression in pg_dump

From
daveg
Date:
On Wed, Feb 04, 2009 at 10:23:17PM -0500, Andrew Chernow wrote:
> Dann Corbit wrote:
> >
> >The LZMA SDK is granted to the public domain:
> >http://www.7-zip.org/sdk.html
> >
> 
> I played with this but found the SDK extremely confusing and flat out
> horrible. One personal dislike was the unnecessary use of C++, although it
> was the horrible API that turned me off.  I'm not even sure if I ever got a
> test program working.
>
> LZO (http://www.oberhumer.com/opensource/lzo/) is a great algorithm, easy
> API with many variants; my fav is LZO1X-1(15).  It's known for its
> compression and decompression speeds ... it's blazing fast.  zlib typically
> gets 5-8% more compression.

LZO rocks. I wonder if the lzo developer would consider a license exception
so that postgresql could use it?  What would we need?

-dg

-- 
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.


Re: Is a plan for lmza commpression in pg_dump

From
Andrew Dunstan
Date:

daveg wrote:
> On Wed, Feb 04, 2009 at 10:23:17PM -0500, Andrew Chernow wrote:
>   
>> Dann Corbit wrote:
>>     
>>> The LZMA SDK is granted to the public domain:
>>> http://www.7-zip.org/sdk.html
>>>
>>>       
>> I played with this but found the SDK extremely confusing and flat out
>> horrible. One personal dislike was the unnecessary use of C++, although it
>> was the horrible API that turned me off.  I'm not even sure if I ever got a
>> test program working.
>>
>> LZO (http://www.oberhumer.com/opensource/lzo/) is a great algorithm, easy
>> API with many variants; my fav is LZO1X-1(15).  It's known for its
>> compression and decompression speeds ... it's blazing fast.  zlib typically
>> gets 5-8% more compression.
>>     
>
> LZO rocks. I wonder if the lzo developer would consider a license exception
> so that postgresql could use it?  What would we need?
>
>
>   

Probably a BSD license or a clean room implementation which we could BSD 
license.

cheers

andrew


Re: Is a plan for lmza commpression in pg_dump

From
Bruce Momjian
Date:
daveg wrote:
> On Wed, Feb 04, 2009 at 10:23:17PM -0500, Andrew Chernow wrote:
> > Dann Corbit wrote:
> > >
> > >The LZMA SDK is granted to the public domain:
> > >http://www.7-zip.org/sdk.html
> > >
> > 
> > I played with this but found the SDK extremely confusing and flat out
> > horrible. One personal dislike was the unnecessary use of C++, although it
> > was the horrible API that turned me off.  I'm not even sure if I ever got a
> > test program working.
> >
> > LZO (http://www.oberhumer.com/opensource/lzo/) is a great algorithm, easy
> > API with many variants; my fav is LZO1X-1(15).  It's known for its
> > compression and decompression speeds ... it's blazing fast.  zlib typically
> > gets 5-8% more compression.
> 
> LZO rocks. I wonder if the lzo developer would consider a license exception
> so that postgresql could use it?  What would we need?

The chance of us using anything but zlib is near zero, so please do
not pursue this; this discussion comes up much too often.



Re: Is a plan for lmza commpression in pg_dump

From
daveg
Date:
On Sat, Feb 07, 2009 at 02:47:05PM -0500, Bruce Momjian wrote:
> daveg wrote:
> > On Wed, Feb 04, 2009 at 10:23:17PM -0500, Andrew Chernow wrote:
> > > Dann Corbit wrote:
> > > >
> > > >The LZMA SDK is granted to the public domain:
> > > >http://www.7-zip.org/sdk.html
> > > >
> > > 
> > > I played with this but found the SDK extremely confusing and flat out
> > > horrible. One personal dislike was the unnecessary use of C++, although it
> > > was the horrible API that turned me off.  I'm not even sure if I ever got a
> > > test program working.
> > >
> > > LZO (http://www.oberhumer.com/opensource/lzo/) is a great algorithm, easy
> > > API with many variants; my fav is LZO1X-1(15).  It's known for its
> > > compression and decompression speeds ... it's blazing fast.  zlib typically
> > > gets 5-8% more compression.
> > 
> > LZO rocks. I wonder if the lzo developer would consider a license exception
> > so that postgresql could use it?  What would we need?
> 
> The chance of us using anything but zlib is near zero, so please do
> not pursue this; this discussion comes up much too often.

That this comes up "much too often" suggests that there is more than near
zero interest.  Why can only one compression library be considered?
We use multiple readline implementations, for better or worse.

I think the context here is for pg_dump only and in that context a faster
compression library makes a lot of sense. I'd be happy to prepare a patch
if the license issue can be accommodated. Hence my question, what sort of
license accommodation would we need to be able to use this library?

-dg



Re: Is a plan for lmza commpression in pg_dump

From
Grzegorz Jaskiewicz
Date:
On 7 Feb 2009, at 21:08, daveg wrote:
> That this comes up "much too often" suggests that there is more than near
> zero interest.  Why can only one compression library be considered?
> We use multiple readline implementations, for better or worse.

I don't see anything wrong with using standard Unix pipes... and doing it
in a truly Unix, scalable way!



Re: Is a plan for lmza commpression in pg_dump

From
Robert Haas
Date:
> That this comes up "much too often" suggests that there is more than near
> zero interest.  Why can only one compression library be considered?
> We use multiple readline implementations, for better or worse.
>
> I think the context here is for pg_dump only and in that context a faster
> compression library makes a lot of sense. I'd be happy to prepare a patch
> if the license issue can be accommodated. Hence my question, what sort of
> license accommodation would we need to be able to use this library?

Based on previous discussions, I suspect that the answer here is
"complete relicensing as BSD".  I think pursuing any sort of licensing
exception is completely futile as there will still be restrictions
that will be unacceptable to many in the community.

But if someone had an actual BSD-LICENSED compression library that was
better than what we have now, I'm not sure why Bruce (or anyone)
should be opposed to incorporating it.  It's just that all of the
proposals that come up for this sort of thing aren't that.

...Robert


Re: Is a plan for lmza commpression in pg_dump

From
Bruce Momjian
Date:
Robert Haas wrote:
> > That this comes up "much too often" suggests that there is more than near
> > zero interest.  Why can only one compression library be considered?
> > We use multiple readline implementations, for better or worse.
> >
> > I think the context here is for pg_dump only and in that context a faster
> > compression library makes a lot of sense. I'd be happy to prepare a patch
> > if the license issue can be accommodated. Hence my question, what sort of
> > license accommodation would we need to be able to use this library?
> 
> Based on previous discussions, I suspect that the answer here is
> "complete relicensing as BSD".  I think pursuing any sort of licensing
> exception is completely futile as there will still be restrictions
> that will be unacceptable to many in the community.
> 
> But if someone had an actual BSD-LICENSED compression library that was
> better than what we have now, I'm not sure why Bruce (or anyone)
> should be opposed to incorporating it.  It's just that all of the
> proposals that come up for this sort of thing aren't that.

You can bet I would oppose it.  It is not efficient for us to support
every compression-of-the-month project that comes along.  If something
was BSD, well tested, and clearly superior, we might consider it, but I
have seen nothing like that for 10 years and I doubt I will see
something in the next 5.  I am thinking we need to add this to the
"Features we do not want" section of our todo list.



Re: Is a plan for lmza commpression in pg_dump

From
Robert Haas
Date:
On Feb 7, 2009, at 4:53 PM, Bruce Momjian <bruce@momjian.us> wrote:

> Robert Haas wrote:
>>> That this comes up "much too often" suggests that there is more than near
>>> zero interest.  Why can only one compression library be considered?
>>> We use multiple readline implementations, for better or worse.
>>>
>>> I think the context here is for pg_dump only and in that context a faster
>>> compression library makes a lot of sense. I'd be happy to prepare a patch
>>> if the license issue can be accommodated. Hence my question, what sort of
>>> license accommodation would we need to be able to use this library?
>>
>> Based on previous discussions, I suspect that the answer here is
>> "complete relicensing as BSD".  I think pursuing any sort of licensing
>> exception is completely futile as there will still be restrictions
>> that will be unacceptable to many in the community.
>>
>> But if someone had an actual BSD-LICENSED compression library that was
>> better than what we have now, I'm not sure why Bruce (or anyone)
>> should be opposed to incorporating it.  It's just that all of the
>> proposals that come up for this sort of thing aren't that.
>
> You can bet I would oppose it.  It is not efficient for us to support
> every compression-of-the-month project that comes along.  If something
> was BSD, well tested, and clearly superior, we might consider it, but I

Well that's pretty much what I said.

> have seen nothing like that for 10 years and I doubt I will see
> something in the next 5.  I am thinking

I am doubtful too.

> we need to add this to the
> "Features we do not want" section of our todo list.

"Proprietary compression algorithms, even with Postgresql-specific  
license exceptions"?

...Robert 


Re: Is a plan for lmza commpression in pg_dump

From
Bruce Momjian
Date:
Robert Haas wrote:
> > have seen nothing like that for 10 years and I doubt I will see
> > something in the next 5.  I am thinking
> 
> I am doubtful too.
> 
> > we need to add this to the
> > "Features we do not want" section of our todo list.
> 
> "Proprietary compression algorithms, even with Postgresql-specific  
> license exceptions"?

Yep.  Does it make sense to make our license more complex to get 1%
better compression in certain cases?  Probably not.  Also
consider the code maintenance, patents, larger tarball, bugs, etc.



Re: Is a plan for lmza commpression in pg_dump

From
David Fetter
Date:
On Sat, Feb 07, 2009 at 08:49:29PM -0500, Robert Haas wrote:
> On Feb 7, 2009, at 4:53 PM, Bruce Momjian <bruce@momjian.us> wrote:
>
>> we need to add this to the "Features we do not want" section of our
>> todo list.
>
> "Proprietary compression algorithms, even with Postgresql-specific
> license exceptions"?

Considering that the entire project ships with a BSD license, which
very specifically allows use of all or even the tiniest part of it with
(skipping some legalese) two restrictions: mention PGDG in the
copyright list, and don't sue us no matter what happens, any
"Postgresql-specific license exceptions" are equivalent to "that
algorithm is no longer proprietary", because any project could simply
use PostgreSQL's version and be done with it.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Is a plan for lmza commpression in pg_dump

From
daveg
Date:
On Sat, Feb 07, 2009 at 08:49:29PM -0500, Robert Haas wrote:
> "Proprietary compression algorithms, even with Postgresql-specific  
> license exceptions"?

To be fair, lzo is GPL, which is a stretch to consider proprietary.

-dg



Re: Is a plan for lmza commpression in pg_dump

From
Martijn van Oosterhout
Date:
On Sat, Feb 07, 2009 at 08:31:23PM -0800, David Fetter wrote:
> Considering that the entire project ships with a BSD license, which
> very specifically allows use of all or any tiniest part of it with
> (skipping some legalese) two restrictions: mention PGDG in the
> copyright list, and don't sue us no matter what happens, any
> "Postgresql-specific license exceptions" are equivalent to "that
> algorithm is no longer proprietary" because any project could simply
> use PostgreSQL's version and have done.

Why don't we just add an option to pg_dump --use-compress-program, just
like tar and then people can use their "compression algorithm of the
week" and we don't need to care about the licence or anything.

It's not like the case of TOAST where it actually needs to be builtin.
Tar doesn't have any compression builtin, yet you don't see many
uncompressed tar files...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: Is a plan for lmza commpression in pg_dump

From
Greg Stark
Date:
On 8 Feb 2009, at 02:49, Robert Haas <robertmhaas@gmail.com> wrote:

> On Feb 7, 2009, at 4:53 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> we need to add this to the
>> "Features we do not want" section of our todo list.
>
> "Proprietary compression algorithms, even with Postgresql-specific  
> license exceptions"?

Now that I would agree with. We would have to explain that we're BSD
licensed *because* we want people to be able to reuse our code outside
Postgres, including in commercial projects.


Re: Is a plan for lmza commpression in pg_dump

From
Andrew Chernow
Date:
> 
> Why don't we just add an option to pg_dump --use-compress-program, just
> like tar and then people can use their "compression algorithm of the
> week" and we don't need to care about the licence or anything.

Can't this be done already?

pg_dump -Z 0 | compression_binary >mydump

If -Z is unspecified, I think it won't compress?  Maybe you can just drop the -Z.
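The `compression_binary` in that pipeline stands for whatever compressor the user prefers (`gzip`, `bzip2`, `xz`, ...). As a rough illustration of what such a filter does, here is a Python sketch, assuming only the standard-library `lzma` module, that compresses a byte stream incrementally in fixed-size chunks so the whole dump never has to fit in memory. The module name `xzfilter`, the chunk size, and both function names are hypothetical; in practice the stock `xz` binary would normally fill this role.

```python
import io
import lzma
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # bytes per read; an arbitrary illustrative value


def compress_stream(src: BinaryIO, dst: BinaryIO, preset: int = 6) -> None:
    """Read src to EOF in chunks and write one xz-format stream to dst."""
    compressor = lzma.LZMACompressor(format=lzma.FORMAT_XZ, preset=preset)
    while True:
        chunk = src.read(CHUNK_SIZE)
        if not chunk:
            break
        # compress() may return b"" while it buffers input internally.
        dst.write(compressor.compress(chunk))
    # flush() emits any buffered data plus the xz stream footer.
    dst.write(compressor.flush())


def compress_bytes(data: bytes, preset: int = 6) -> bytes:
    """In-memory convenience wrapper around compress_stream."""
    out = io.BytesIO()
    compress_stream(io.BytesIO(data), out, preset)
    return out.getvalue()
```

Saved as `xzfilter.py`, it could sit at the end of the pipeline as `pg_dump -Z 0 | python3 -c 'import sys, xzfilter; xzfilter.compress_stream(sys.stdin.buffer, sys.stdout.buffer)' >mydump.xz`, and the result should decompress with `xz -d` or `lzma.decompress()`.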



Re: Is a plan for lmza commpression in pg_dump

From
Andrew Dunstan
Date:

Martijn van Oosterhout wrote:
>
> Why don't we just add an option to pg_dump --use-compress-program, just
> like tar and then people can use their "compression algorithm of the
> week" and we don't need to care about the licence or anything.
>
> It's not like the case of TOAST where it actually needs to be builtin.
> Tar doesn't have any compression builtin, yet you don't see many
> uncompressed tar files...
>
>
>   

tar compresses/decompresses the whole archive via a single pipe. pg_dump 
compresses individual data members. If the compression isn't builtin it 
will make life much more difficult, and probably make parallel restore 
as well as some other operations well nigh impossible.

cheers

andrew
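Andrew Dunstan's distinction can be made concrete with a toy example. If each data member is compressed as its own stream and the archive records where each one starts, a restore can seek straight to one table's data and decompress only that, which is what allows several workers to restore different tables at once; a single external pipe, as with tar, produces one opaque stream that has to be decompressed from the start. The Python sketch below uses the standard-library `zlib` module. The file layout, `write_archive`, and `read_member` are hypothetical and are not pg_dump's actual custom-format structure.

```python
import io
import json
import struct
import zlib
from typing import BinaryIO


def write_archive(out: BinaryIO, members: dict[str, bytes], level: int = 6) -> None:
    """Compress each member independently, then append a table of contents.

    Toy layout: [member blobs...][TOC as JSON][8-byte big-endian TOC offset]
    """
    toc = {}
    for name, data in members.items():
        blob = zlib.compress(data, level)  # each member is its own zlib stream
        toc[name] = (out.tell(), len(blob))  # (offset, compressed length)
        out.write(blob)
    toc_offset = out.tell()
    out.write(json.dumps(toc).encode("utf-8"))
    out.write(struct.pack(">Q", toc_offset))


def read_member(f: BinaryIO, name: str) -> bytes:
    """Seek directly to one member and decompress only that member."""
    f.seek(-8, io.SEEK_END)
    (toc_offset,) = struct.unpack(">Q", f.read(8))
    f.seek(toc_offset)
    toc = json.loads(f.read()[:-8])  # TOC bytes, minus the 8-byte trailer
    offset, length = toc[name]
    f.seek(offset)
    return zlib.decompress(f.read(length))
```

Because `read_member` touches only the trailer, the table of contents, and the requested blob, separate processes could each open the file and pull out different members in parallel.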




Re: Is a plan for lmza commpression in pg_dump

From
Peter Eisentraut
Date:
daveg wrote:
> I think the context here is for pg_dump only and in that context a faster
> compression library makes a lot of sense. I'd be happy to prepare a patch
> if the license issue can be accomodated.

Some kind of performance data (space and time) would be required to 
support any change in this area.

Notice that the thread originally called for lzma support, which is 
completely at the opposite end of the spectrum of compression algorithms 
in terms of space and time, compared to lzo.  So it's not really clear 
what the requirements are in the first place.
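Peter's request for space and time figures is easy to start on with one's own dump files. A minimal Python harness follows, assuming only the standard-library `zlib` and `lzma` modules (LZO has no standard-library binding, so it is not included); it reports compressed size, ratio, and compression and decompression time for a few settings. The settings listed and the `measure` helper are illustrative choices, and since results depend heavily on the data and the hardware, no figures are quoted here.

```python
import lzma
import sys
import time
import zlib
from typing import Callable

Codec = tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

# Illustrative settings spanning fast/weak to slow/strong; not a recommendation.
CODECS: dict[str, Codec] = {
    "zlib-1": (lambda d: zlib.compress(d, 1), zlib.decompress),
    "zlib-6": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "zlib-9": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "lzma-0": (lambda d: lzma.compress(d, preset=0), lzma.decompress),
    "lzma-6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}


def measure(data: bytes) -> list[tuple[str, int, float, float, float]]:
    """Return (codec, compressed size, ratio, compress s, decompress s) per codec."""
    rows = []
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        unpacked = decompress(packed)
        t2 = time.perf_counter()
        if unpacked != data:
            raise ValueError(f"{name}: round trip failed")
        rows.append((name, len(packed), len(data) / len(packed), t1 - t0, t2 - t1))
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 bench.py mydump.sql
    with open(sys.argv[1], "rb") as f:
        payload = f.read()
    for name, size, ratio, c_s, d_s in measure(payload):
        print(f"{name:8} {size:>12} bytes  ratio {ratio:5.2f}  "
              f"compress {c_s:7.3f}s  decompress {d_s:7.3f}s")
```

Decompression time is measured separately because it is what a restore pays; a third-party LZO binding could be slotted into `CODECS` the same way if someone wanted the comparison daveg is after.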


Re: Is a plan for lmza commpression in pg_dump

From
Andrew Chernow
Date:
Peter Eisentraut wrote:
> 
> Notice that the thread originally called for lzma support, which is 
> completely at the opposite end of the spectrum of compression algorithms 
> in terms of space and time, compared to lzo.  So it's not really clear 
> what the requirements are in the first place.
> 
> 

Instead of trying to anticipate the needs and wants of every DBA with a
general-purpose solution, it might be better to figure out how to make the
compression choice user-driven.  Maybe the requirement should be to leave
this to the user; piping the output to the compressor of choice seems to be
the simplest approach.

There are cases where the highest compression is desired even if it takes
forever, and cases where just the opposite is true.  I'm not sure why this
has to be built in or why it must use zlib, other than that this is the
current method.
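One way to make the choice user-driven without giving up built-in compression (which Andrew Dunstan notes matters for per-member access) is to let the user name a codec and record that choice in the output, so the restore side can pick the matching decompressor without being told. A minimal Python sketch, assuming the standard-library `zlib`, `bz2`, and `lzma` modules; the one-byte tag scheme and the `pack`/`unpack` names are hypothetical, not anything pg_dump does.

```python
import bz2
import lzma
import zlib
from typing import Callable

# Hypothetical registry: one-byte tag -> (name, compress, decompress).
# Tag 0 means "stored uncompressed", e.g. when the user pipes output elsewhere.
_CODECS: dict[int, tuple[str, Callable[[bytes], bytes], Callable[[bytes], bytes]]] = {
    0: ("none", lambda d: d, lambda d: d),
    1: ("zlib", zlib.compress, zlib.decompress),
    2: ("bzip2", bz2.compress, bz2.decompress),
    3: ("xz", lzma.compress, lzma.decompress),
}
_TAG_BY_NAME = {name: tag for tag, (name, _, _) in _CODECS.items()}


def pack(data: bytes, codec: str) -> bytes:
    """Compress with the user-chosen codec and prefix a one-byte tag."""
    tag = _TAG_BY_NAME[codec]  # raises KeyError for an unknown codec name
    _, compress, _ = _CODECS[tag]
    return bytes([tag]) + compress(data)


def unpack(blob: bytes) -> bytes:
    """Read the tag and decompress with whatever codec the writer chose."""
    _, _, decompress = _CODECS[blob[0]]
    return decompress(blob[1:])
```

The writer's choice travels with the data, so a member compressed with xz restores the same way as one compressed with zlib; tag 0 covers the case where the user would rather pipe uncompressed output to an external tool.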
