Thread: Documenting pglesslog

Documenting pglesslog

From
Bruce Momjian
Date:
In thinking about how to communicate to users about reducing continuous
archiving storage requirements, I realized we don't mention pglesslog in
our official documentation.

The attached patch documents how to use pglesslog and gzip/gunzip to
reduce storage requirements.  Comments?

Also, I assume pglesslog removes the padding we use to make all WAL
files 16MB, effectively doing the function of clearxlogtail too, right?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/backup.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/backup.sgml,v
retrieving revision 2.121
diff -c -c -r2.121 backup.sgml
*** doc/src/sgml/backup.sgml    9 Nov 2008 17:51:15 -0000    2.121
--- doc/src/sgml/backup.sgml    11 Jan 2009 01:41:12 -0000
***************
*** 1337,1342 ****
--- 1337,1359 ----
        WAL files are part of the same <application>tar</> file.
        Please remember to add error handling to your backup scripts.
       </para>
+
+      <para>
+       If archive storage size is a concern, use <application>pg_compresslog</>,
+       <ulink url="http://pglesslog.projects.postgresql.org"></ulink>, to
+       remove unnecessary <xref linkend="guc-full-page-writes"> and trailing
+       space from the WAL files.  You can then use
+       <application>gzip</application> to further compress the output of
+       <application>pg_compresslog</>:
+ <programlisting>
+ archive_command = 'pg_compresslog %p - | gzip > /var/lib/pgsql/archive/%f'
+ </programlisting>
+       You will then need to use <application>gunzip</> and
+       <application>pg_decompresslog</> during recovery:
+ <programlisting>
+ restore_command = 'gunzip < /mnt/server/archivedir/%f | pg_decompresslog - %p'
+ </programlisting>
+      </para>
      </sect3>

      <sect3 id="backup-scripts">
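
For readers who want the error handling the patch text asks for, here is a minimal Python sketch of the gzip half of the two commands above. It is not part of the patch. The function names archive_segment and restore_segment, the .gz suffix, and the temporary-file rename are assumptions made for illustration. The pg_compresslog and pg_decompresslog steps are left out, because only the invocations shown in the patch are known here.

```python
import gzip
import os
import shutil


def archive_segment(wal_path, archive_dir):
    """Gzip one WAL segment into archive_dir; return 0 on success, 1 on failure."""
    target = os.path.join(archive_dir, os.path.basename(wal_path) + ".gz")
    if os.path.exists(target):
        # Never overwrite a file that is already in the archive.
        return 1
    tmp = target + ".tmp"
    try:
        with open(wal_path, "rb") as src, gzip.open(tmp, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.rename(tmp, target)  # publish only a completely written file
        return 0
    except OSError:
        if os.path.exists(tmp):
            os.remove(tmp)
        return 1


def restore_segment(archive_dir, wal_name, dest_path):
    """Gunzip an archived segment to dest_path; return 0 on success, 1 on failure."""
    source = os.path.join(archive_dir, wal_name + ".gz")
    try:
        with gzip.open(source, "rb") as src, open(dest_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return 0
    except (OSError, EOFError):
        # Missing, unreadable, or truncated archive file.
        return 1
```

A wrapper built on functions like these would pass the return value on as its exit status, since archive_command and restore_command report success to the server only through a zero exit status.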

Re: Documenting pglesslog

From
Simon Riggs
Date:
On Sat, 2009-01-10 at 21:09 -0500, Bruce Momjian wrote:

> Comments?

If this is for backpatching, it makes sense. We should at least wait
until sync rep is accepted or rejected and docs written.

In general I don't think we should refer/link to other companies'
copyrighted materials in our documentation. That could cause
difficulties.

If you're going to do this, then I think you should go through the docs
and refer directly to many other commonly used tools that are also on
pg_foundry. 

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Documenting pglesslog

From
Bruce Momjian
Date:
Simon Riggs wrote:
> 
> On Sat, 2009-01-10 at 21:09 -0500, Bruce Momjian wrote:
> 
> > Comments?
> 
> If this is for backpatching, it makes sense. We should at least wait
> until sync rep is accepted or rejected and docs written.

No, it is not for backpatching.

> In general I don't think we should refer/link to other companies'
> copyrighted materials in our documentation. That could cause
> difficulties.

It is BSD licensed.  I don't see any copyright issues:
http://pglesslog.projects.postgresql.org/

> If you're going to do this, then I think you should go through the docs
> and refer directly to many other commonly used tools that are also on
> pg_foundry. 

I add items where they fit logically.



Re: Documenting pglesslog

From
Simon Riggs
Date:
On Sat, 2009-01-10 at 23:38 -0500, Bruce Momjian wrote:

> It is BSD licensed.  I don't see any copyright issues:
> 
>     http://pglesslog.projects.postgresql.org/

A licence and copyright are different things. Why do we insist on
changing copyright on our code if it is unimportant?




Re: Documenting pglesslog

From
"Joshua D. Drake"
Date:
On Sun, 2009-01-11 at 03:12 +0000, Simon Riggs wrote:
> On Sat, 2009-01-10 at 21:09 -0500, Bruce Momjian wrote:
> 
> > Comments?
> 
> If this is for backpatching, it makes sense. We should at least wait
> until sync rep is accepted or rejected and docs written.

Why? Even if sync rep is accepted, pglesslog would still be useful for
those who aren't using WAL archiving, right?

> 
> In general I don't think we should refer/link to other companies'
> copyrighted materials in our documentation. That could cause
> difficulties.

What? That seems a bit odd. I see zero problem with linking to the page,
especially considering it is an open source project hosted on a
postgresql project service.

> 
> If you're going to do this, then I think you should go through the docs
> and refer directly to many other commonly used tools that are also on
> pg_foundry. 
> 

Well, having some more information would certainly be useful, but in this
particular case I don't know of anything else on pgfoundry that would
actually help with the problem Bruce is trying to solve.

Joshua D. Drake



-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997



Re: Documenting pglesslog

From
Bruce Momjian
Date:
Simon Riggs wrote:
> 
> On Sat, 2009-01-10 at 23:38 -0500, Bruce Momjian wrote:
> 
> > It is BSD licensed.  I don't see any copyright issues:
> > 
> >     http://pglesslog.projects.postgresql.org/
> 
> A licence and copyright are different things. Why do we insist on
> changing copyright on our code if it is unimportant?

Because the BSD copyright has no enforcement, I don't think the
copyright holder is important --- look at all the companies that take
our code and use it in their commercial products.  We rebrand code we
accept so it is clear who maintains it;  note we take NetBSD code and
add our copyright name to theirs and distribute it as our own.



Re: Documenting pglesslog

From
Simon Riggs
Date:
On Sun, 2009-01-11 at 09:47 -0500, Bruce Momjian wrote:
> Simon Riggs wrote:
> > 
> > On Sat, 2009-01-10 at 23:38 -0500, Bruce Momjian wrote:
> > 
> > > It is BSD licensed.  I don't see any copyright issues:
> > > 
> > A licence and copyright are different things. Why do we insist on
> > changing copyright on our code if it is unimportant?
> 
> Because the BSD copyright has no enforcement

AFAIK there is no such thing as a BSD copyright. There is Copyright and
there is a BSD licence, issued by the copyright holder.

In general, IMHO, I don't think it's a good direction to go in to
include links to works of other copyright holders.

Specifically, I have nothing but good to say about pglesslog and its
authors.




Re: Documenting pglesslog

From
"Robert Haas"
Date:
> In general, IMHO, I don't think it's a good direction to go in to
> include links to works of other copyright holders.

I think it's a great idea.  IMHO, one of the major selling points of
PostgreSQL is its awesome documentation.  However, one of its
weaknesses is that contrib modules, pgfoundry projects, etc. are often
not mentioned in the parts of the main documentation to which they
relate.  While I certainly don't want to go in the direction of
telling people "Don't worry about the fact that we handle X poorly
because there is a 5-year old, unmaintained pgfoundry module that
fixes it", giving people references to tools that the community thinks
are good and useful seems very helpful to me.

I am completely mystified as to what linking to "other copyright holders"
has to do with it.  That seems to imply that you fear some sort of
legal entanglement, but I can't imagine what it could possibly be.
Admittedly, IANAL.

...Robert


Re: Documenting pglesslog

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> In thinking about how to communicate to users about reducing continuous
> archiving storage requirements, I realized we don't mention pglesslog in
> our official documentation.
> 
> The attached patch documents how to use pglesslog and gzip/gunzip to
> reduce storage requirements.  Comments?
> 
> Also, I assume pglesslog removes the padding we use to make all WAL
> files 16MB, effectively doing the function of clearxlogtail too, right?

Applied.



Re: Documenting pglesslog

From
"Koichi Suzuki"
Date:
Hi,

I have no intention of making pglesslog conflict with the PostgreSQL
license.  Any advice on making pglesslog available without any
license concerns is welcome.

I have a question and some ideas.

Bruce's modification points directly to my pgfoundry page.  I'm not
sure what that means.  Does it mean that I have to maintain the page for
a while?  If pglesslog helps for future releases, can it become part of
the PostgreSQL release, as a contrib module, so that all the documentation
on pgfoundry (although very simple) is included in the release material?

As many hackers know, I've posted other code to speed up PITR when
full-page writes (FPW) have been removed, which works with 8.3 as an
external module (pg_readahead).  I'm now working on making it work with
synchronous replication.  Maybe it's a good idea to use pglesslog with
pg_readahead.  Although I'm not sure whether pg_readahead integration
with synchronous replication will be done within the 8.4 development
period, I'm ready to post pg_readahead for 8.4, similar to that
for 8.3, which could also be a contrib module.

Looking forward to inputs.




-- 
------
Koichi Suzuki


Re: Documenting pglesslog

From
Simon Riggs
Date:
On Tue, 2009-01-13 at 13:21 +0900, Koichi Suzuki wrote:

> I have no intention of making pglesslog conflict with the PostgreSQL
> license.  Any advice on making pglesslog available without any
> license concerns is welcome.

I understand; no part of my comments was directed against you or your work.

> I have a question and some ideas.
> 
> Bruce's modification points directly to my pgfoundry page.  I'm not
> sure what that means.  Does it mean that I have to maintain the page for
> a while?  If pglesslog helps for future releases, can it become part of
> the PostgreSQL release, as a contrib module, so that all the documentation
> on pgfoundry (although very simple) is included in the release material?

I think it would be better to create a Wiki page that is directly
controlled by the project, which describes additions to PITR (or other
aspects of the project) and contains links. I think everyone accepts
that the Wiki can have off-project links.

That way people can submit their work without needing to make
off-project links permanent from the docs. If people then change their
site content in future we can more easily change the link.

For example, Josh can make contributions there as well.




Re: Documenting pglesslog

From
Bruce Momjian
Date:
Koichi Suzuki wrote:
> Hi,
> 
> I have no intention of making pglesslog conflict with the PostgreSQL
> license.  Any advice on making pglesslog available without any
> license concerns is welcome.

I certainly have no concerns.

> I have a question and some ideas.
> 
> Bruce's modification points directly to my pgfoundry page.  I'm not
> sure what that means.  Does it mean that I have to maintain the page for
> a while?  If pglesslog helps for future releases, can it become part of
> the PostgreSQL release, as a contrib module, so that all the documentation
> on pgfoundry (although very simple) is included in the release material?

I think eventually we should put pglesslog into /contrib, and if we ever
do that, we would update your web page.  I have not heard any mention of
it being moved into /contrib for 8.4 though.

If you would like me to point to another URL, please let me know.

I think there is definitely demand for pglesslog: not only does it
truncate dead space from the WAL file, it also removes full-page
write images.  That work is best done in archive_command, and hence
externally, as your tool does.
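
To give a rough feel for why the dead space matters, here is a small Python sketch. It does not show how pglesslog or clearxlogtail work. It assumes a 16MB segment whose unused tail is zero-filled (which may not hold for real, recycled segments), uses random bytes as a stand-in for WAL records, and the helper names are made up for the example. Removing full-page write images is not modeled at all.

```python
import gzip
import os

SEGMENT_SIZE = 16 * 1024 * 1024  # fixed WAL segment size mentioned in the thread


def make_padded_segment(used_bytes):
    """Build a fake segment: random 'used' bytes followed by zero padding."""
    return os.urandom(used_bytes) + bytes(SEGMENT_SIZE - used_bytes)


def strip_trailing_zeros(segment):
    """Drop zero bytes at the end (a crude stand-in for tail truncation)."""
    return segment.rstrip(b"\x00")


segment = make_padded_segment(used_bytes=1024 * 1024)
# Sizes in bytes: full segment, segment without its zero tail, gzip of the full segment.
print(len(segment), len(strip_trailing_zeros(segment)), len(gzip.compress(segment)))
```

A real tool has to find the end of valid WAL from the record structure rather than by looking for zero bytes, so this only illustrates the size effect.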

> As many hackers know, I've posted other code to speed up PITR when
> full-page writes (FPW) have been removed, which works with 8.3 as an
> external module (pg_readahead).  I'm now working on making it work with
> synchronous replication.  Maybe it's a good idea to use pglesslog with
> pg_readahead.  Although I'm not sure whether pg_readahead integration
> with synchronous replication will be done within the 8.4 development
> period, I'm ready to post pg_readahead for 8.4, similar to that
> for 8.3, which could also be a contrib module.

Sorry, I don't know enough about pg_readahead.



Re: Documenting pglesslog

From
"Koichi Suzuki"
Date:
Pg_readahead is a tool that prefetches data pages before redo, based on
the contents of the archived/active WAL segment.  For 8.3 and 8.4 without
sync.rep, it works together with restore_command.  Pg_readahead
analyzes the WAL segment, then schedules and issues posix_fadvise()
calls to prefetch data pages before redo.
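
The prefetch step described above can be sketched in Python as follows. This is only an illustration of issuing posix_fadvise() hints, not pg_readahead's code: the WAL analysis is not shown, the list of block numbers is a hypothetical input, the 8192-byte block size is PostgreSQL's default and an assumption here, and prefetch_blocks is a made-up name.

```python
import os

BLOCK_SIZE = 8192  # PostgreSQL's default page size; an assumption of this sketch


def prefetch_blocks(path, block_numbers):
    """Hint the kernel to read the given blocks of one file; return hints issued."""
    if not hasattr(os, "posix_fadvise"):
        return 0  # platform without posix_fadvise: do nothing
    issued = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # De-duplicate and order the requests before issuing them.
        for block in sorted(set(block_numbers)):
            os.posix_fadvise(fd, block * BLOCK_SIZE, BLOCK_SIZE,
                             os.POSIX_FADV_WILLNEED)
            issued += 1
    finally:
        os.close(fd)
    return issued
```

The hint is advisory: the call returns at once and the kernel may read the pages in the background, so a later read by the redo process is more likely to find them already cached.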

Discussions and materials will be found at

http://archives.postgresql.org/pgsql-hackers/2008-10/msg01372.php

So far, the external-command implementation speeds up PITR by up to six
times!  As a result, overall recovery time is only a little longer than
that with FPW.




Re: Documenting pglesslog

From
Bruce Momjian
Date:
Koichi Suzuki wrote:
> Pg_readahead is a tool that prefetches data pages before redo, based on
> the contents of the archived/active WAL segment.  For 8.3 and 8.4 without
> sync.rep, it works together with restore_command.  Pg_readahead
> analyzes the WAL segment, then schedules and issues posix_fadvise()
> calls to prefetch data pages before redo.
> 
> Discussions and materials will be found at
> 
> http://archives.postgresql.org/pgsql-hackers/2008-10/msg01372.php
> 
> So far, the external-command implementation speeds up PITR by up to six
> times!  As a result, overall recovery time is only a little longer than
> that with FPW.

Now that 8.4 is using posix_fadvise(), this sounds like something that
should be integrated into the core code, rather than added as a /contrib
module.



Re: Documenting pglesslog

From
"Koichi Suzuki"
Date:
Pg_readahead uses posix_fadvise, which is included in Greg's patch, and
I've already posted a pg_readahead patch integrated into the core.

Integration with sync.rep will be a separate patch, which will be
posted in a couple of days.
