Thread: pg_start_backup and pg_stop_backup Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct

On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> This doesn't contain any changes to pg_start_backup() yet, that's a
> separate issue and still under discussion.

I'm thinking of changing pg_start_backup and pg_stop_backup so that
they just check that wal_level >= 'archive', and changing pg_stop_backup
so that it doesn't wait for archiving when archive_mode is OFF.

This change is very simple and enables us to take a base backup for SR
even if archive_mode is OFF. Thought?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Wed, 2010-04-28 at 19:40 +0900, Fujii Masao wrote:
> On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> > This doesn't contain any changes to pg_start_backup() yet, that's a
> > separate issue and still under discussion.
> 
> I'm thinking of changing pg_start_backup and pg_stop_backup so that
> they just check that wal_level >= 'archive', and changing pg_stop_backup
> so that it doesn't wait for archiving when archive_mode is OFF.
> 
> This change is very simple and enables us to take a base backup for SR
> even if archive_mode is OFF. Thought?

Makes sense.

I'm wondering whether this could cause problems with people taking hot
backups that aren't aimed at SR. Perhaps we could have 2 new functions
whose names are more closely linked to the exact purpose:
pg_start_replication_copy() etc..
which then act exactly as you suggest.

-- Simon Riggs           www.2ndQuadrant.com



On Wed, Apr 28, 2010 at 6:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, 2010-04-28 at 19:40 +0900, Fujii Masao wrote:
>> On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>> > This doesn't contain any changes to pg_start_backup() yet, that's a
>> > separate issue and still under discussion.
>>
>> I'm thinking of changing pg_start_backup and pg_stop_backup so that
>> they just check that wal_level >= 'archive', and changing pg_stop_backup
>> so that it doesn't wait for archiving when archive_mode is OFF.
>>
>> This change is very simple and enables us to take a base backup for SR
>> even if archive_mode is OFF. Thought?
>
> Makes sense.
>
> I'm wondering whether this could cause problems with people taking hot
> backups that aren't aimed at SR. Perhaps we could have 2 new functions
> whose names are more closely linked to the exact purpose:
> pg_start_replication_copy() etc..
> which then act exactly as you suggest.

Hmm.  That seems a bit complicated.  Why can't we just let people use
the existing functions the way they always have?

...Robert


On Wed, 2010-04-28 at 06:56 -0400, Robert Haas wrote:
> On Wed, Apr 28, 2010 at 6:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Wed, 2010-04-28 at 19:40 +0900, Fujii Masao wrote:
> >> On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas
> >> <heikki.linnakangas@enterprisedb.com> wrote:
> >> > This doesn't contain any changes to pg_start_backup() yet, that's a
> >> > separate issue and still under discussion.
> >>
> >> I'm thinking of changing pg_start_backup and pg_stop_backup so that
> >> they just check that wal_level >= 'archive', and changing pg_stop_backup
> >> so that it doesn't wait for archiving when archive_mode is OFF.
> >>
> >> This change is very simple and enables us to take a base backup for SR
> >> even if archive_mode is OFF. Thought?
> >
> > Makes sense.
> >
> > I'm wondering whether this could cause problems with people taking hot
> > backups that aren't aimed at SR. Perhaps we could have 2 new functions
> > whose names are more closely linked to the exact purpose:
> > pg_start_replication_copy() etc..
> > which then act exactly as you suggest.
> 
> Hmm.  That seems a bit complicated.  Why can't we just let people use
> the existing functions the way they always have?

We can, but I already gave a reason why we should not. 

IIRC it was you that suggested changing the names of things if the
behaviour changes.

-- Simon Riggs           www.2ndQuadrant.com



Robert Haas wrote:
> On Wed, Apr 28, 2010 at 6:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Wed, 2010-04-28 at 19:40 +0900, Fujii Masao wrote:
>>> On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas
>>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>> This doesn't contain any changes to pg_start_backup() yet, that's a
>>>> separate issue and still under discussion.
>>> I'm thinking of changing pg_start_backup and pg_stop_backup so that
>>> they just check that wal_level >= 'archive', and changing pg_stop_backup
>>> so that it doesn't wait for archiving when archive_mode is OFF.
>>>
>>> This change is very simple and enables us to take a base backup for SR
>>> even if archive_mode is OFF. Thought?
>> Makes sense.
>>
>> I'm wondering whether this could cause problems with people taking hot
>> backups that aren't aimed at SR. Perhaps we could have 2 new functions
>> whose names are more closely linked to the exact purpose:
>> pg_start_replication_copy() etc..
>> which then act exactly as you suggest.
> 
> Hmm.  That seems a bit complicated.  Why can't we just let people use
> the existing functions the way they always have?

Well, it would be nice to allow using pg_start_backup() on the primary
when streaming replication is enabled, even if archiving isn't.
Otherwise the only way to get the base backup for the standby is to shut
down primary first, or use filesystem snapshot etc.

The straightforward way to enable that would be to allow
pg_start_backup() when wal_level >= 'archive', regardless of
archive_mode. However, I'm worried that someone might take an online
backup without archiving (and replication), not realizing that it's not
safe.

That risk is there already, though, if you restore from an online backup
and forget to create recovery.conf. It will start up in inconsistent
state. The proposed change would make it easier to make that mistake.
I'm not sure what to do about it, maybe throw a warning if you start up
a database and there's a backup_label file in the data directory.
Something like:

WARNING: database system was interrupted while backup was in progress
HINT: If you are restoring from an online backup, you must use a WAL
archive for the restore, or the database can be in inconsistent state

That would also occur if the primary database crashes while a backup is
being taken, in which case the warning can be ignored.

Or maybe we should check in pg_start_backup() that either archive_mode
or streaming replication (max_wal_senders > 0) is enabled.
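
A minimal sketch of the two checks being weighed here, purely for
illustration. The names WAL_LEVELS, Settings and backup_allowed are
hypothetical and do not correspond to the actual backend code:

```python
from dataclasses import dataclass

# Ordering of wal_level values as discussed in this thread.
WAL_LEVELS = ["minimal", "archive", "hot_standby"]


@dataclass
class Settings:
    wal_level: str = "minimal"
    archive_mode: bool = False
    max_wal_senders: int = 0


def backup_allowed(s: Settings, strict: bool = True) -> bool:
    """Model of whether pg_start_backup() would be permitted.

    strict=False: only require wal_level >= 'archive'.
    strict=True:  additionally require archiving or streaming replication.
    """
    level_ok = WAL_LEVELS.index(s.wal_level) >= WAL_LEVELS.index("archive")
    if not strict:
        return level_ok
    return level_ok and (s.archive_mode or s.max_wal_senders > 0)
```

With strict=True, a server that has wal_level='archive' but neither
archiving nor any WAL senders configured is refused, which is the case
the extra check is meant to catch.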

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Wed, Apr 28, 2010 at 8:28 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Or maybe we should check in pg_start_backup() that either archive_mode
> or streaming replication (max_wal_senders > 0) is enabled.

I agree that pg_start_backup should check not only wal_level but also that.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Wed, Apr 28, 2010 at 7:22 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, 2010-04-28 at 06:56 -0400, Robert Haas wrote:
>> On Wed, Apr 28, 2010 at 6:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> > On Wed, 2010-04-28 at 19:40 +0900, Fujii Masao wrote:
>> >> On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas
>> >> <heikki.linnakangas@enterprisedb.com> wrote:
>> >> > This doesn't contain any changes to pg_start_backup() yet, that's a
>> >> > separate issue and still under discussion.
>> >>
>> >> I'm thinking of changing pg_start_backup and pg_stop_backup so that
>> >> they just check that wal_level >= 'archive', and changing pg_stop_backup
>> >> so that it doesn't wait for archiving when archive_mode is OFF.
>> >>
>> >> This change is very simple and enables us to take a base backup for SR
>> >> even if archive_mode is OFF. Thought?
>> >
>> > Makes sense.
>> >
>> > I'm wondering whether this could cause problems with people taking hot
>> > backups that aren't aimed at SR. Perhaps we could have 2 new functions
>> > whose names are more closely linked to the exact purpose:
>> > pg_start_replication_copy() etc..
>> > which then act exactly as you suggest.
>>
>> Hmm.  That seems a bit complicated.  Why can't we just let people use
>> the existing functions the way they always have?
>
> We can, but I already gave a reason why we should not.
>
> IIRC it was you that suggested changing the names of things if the
> behaviour changes.

Absolutely, but I'm arguing that we shouldn't change the behavior in
the first place.  At least as I understand it, even when not using
archive_mode, streaming replication, or hot standby, it's still
perfectly legal to use pg_start_backup() to take a hot backup.  I
don't see why we would either (a) break that use case or (b) create
another function that does the same thing but with one extra error
check.

...Robert


Robert Haas wrote:
> At least as I understand it, even when not using
> archive_mode, streaming replication, or hot standby, it's still
> perfectly legal to use pg_start_backup() to take a hot backup.

Nope. The correct procedure to take a hot backup is described in
http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html#BACKUP-TIPS.
It involves setting archive_mode=on, and archive_command to a shell
command that normally just returns true, except when backup is in
progress. You can't take a hot backup without archiving (or streaming)
at least temporarily. (except with filesystem-level snapshot capabilities).
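
The "returns true unless a backup is in progress" idea can be sketched
as a small helper that an archive_command script might call. This is
only an illustration: the function name archive_segment and the
flag-file convention are assumptions, not something PostgreSQL provides:

```python
import os
import shutil


def archive_segment(src_path: str, file_name: str,
                    archive_dir: str, flag_file: str) -> int:
    """Return 0 on success and non-zero on failure, like a shell command."""
    # No backup in progress: report success without keeping a copy,
    # so the server is free to recycle the segment.
    if not os.path.exists(flag_file):
        return 0
    dest = os.path.join(archive_dir, file_name)
    # Refuse to overwrite a segment that was already archived.
    if os.path.exists(dest):
        return 1
    shutil.copy(src_path, dest)
    return 0
```

In postgresql.conf this would be wired up through archive_command, which
passes the segment path as %p and the file name as %f; a non-zero exit
status tells the server to retry later. The flag file would be created
before pg_start_backup() and removed after pg_stop_backup().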

Which is unfortunate, really. I wish we had a mode where the server
simply refrained from removing/recycling WAL segments while the backup
is running. You could then just:

1. pg_start_backup()
2. tar the data directory, except for pg_xlog
3. tar pg_xlog
4. pg_stop_backup().
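
Those four steps might look roughly like this, assuming the wished-for
mode existed and the server kept every WAL segment while the backup
runs. start_backup and stop_backup are placeholders for whatever issues
pg_start_backup()/pg_stop_backup(); they are not a real API:

```python
import os
import tarfile


def hot_backup(data_dir, out_dir, start_backup, stop_backup):
    """Tar data_dir in two pieces: everything except pg_xlog, then pg_xlog."""
    def skip_xlog(info):
        # Returning None drops the member; for a directory this also
        # skips everything underneath it.
        return None if info.name == "data/pg_xlog" else info

    start_backup()                                            # step 1
    try:
        with tarfile.open(os.path.join(out_dir, "base.tar"), "w") as tar:
            tar.add(data_dir, arcname="data", filter=skip_xlog)   # step 2
        with tarfile.open(os.path.join(out_dir, "pg_xlog.tar"), "w") as tar:
            tar.add(os.path.join(data_dir, "pg_xlog"),
                    arcname="pg_xlog")                        # step 3
    finally:
        stop_backup()                                         # step 4
```

The try/finally only makes sure step 4 runs even if tar fails; it says
nothing about whether the WAL copied in step 3 is sufficient.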

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Which is unfortunate, really. I wish we had a mode where the server
> simply refrained from removing/recycling WAL segments while the backup
> is running. You could then just:

> 1. pg_start_backup()
> 2. tar the data directory, except for pg_xlog
> 3. tar pg_xlog
> 4. pg_stop_backup().

I think there's a termination issue there --- the safe stop point
would (appear to be) past whatever WAL you'd copied during step 3.

Still, the possibility of adding modes such as this seems to me to be a
good argument for not inventing a new version of pg_start_backup/
pg_stop_backup every time.
        regards, tom lane


On Wed, 2010-04-28 at 11:10 -0400, Robert Haas wrote:
> >
> > IIRC it was you that suggested changing the names of things if the
> > behaviour changes.
> 
> Absolutely, but I'm arguing that we shouldn't change the behavior in
> the first place.  At least as I understand it...

I feel like you're just arguing against whatever I say - your reasoning
makes no sense. Masao would not have proposed it as a change if it
already worked like that, would he? Just reading the thread would tell
you that much. Plus, you clearly don't know how it works now, so not
sure why you're commenting at all, it's just minor stuff and a few ideas.

-- Simon Riggs           www.2ndQuadrant.com



On Wed, Apr 28, 2010 at 11:25 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Robert Haas wrote:
>> At least as I understand it, even when not using
>> archive_mode, streaming replication, or hot standby, it's still
>> perfectly legal to use pg_start_backup() to take a hot backup.
>
> Nope. The correct procedure to take a hot backup is described in
> http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html#BACKUP-TIPS.
> It involves setting archive_mode=on, and archive_command to a shell
> command that normally just returns true, except when backup is in
> progress. You can't take a hot backup without archiving (or streaming)
> at least temporarily. (except with filesystem-level snapshot capabilities).

Oh.  Well, in that case the proposed change seems reasonable... but
what do you mean by "except with filesystem-level snapshot
capabilities"?

...Robert


On Wed, 2010-04-28 at 12:44 -0400, Robert Haas wrote:
> On Wed, Apr 28, 2010 at 11:25 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> > Robert Haas wrote:
> >> At least as I understand it, even when not using
> >> archive_mode, streaming replication, or hot standby, it's still
> >> perfectly legal to use pg_start_backup() to take a hot backup.
> >
> > Nope. The correct procedure to take a hot backup is described in
> > http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html#BACKUP-TIPS.
> > It involves setting archive_mode=on, and archive_command to a shell
> > command that normally just returns true, except when backup is in
> > progress. You can't take a hot backup without archiving (or streaming)
> > at least temporarily. (except with filesystem-level snapshot capabilities).
>
> Oh.  Well, in that case the proposed change seems reasonable... but
> what do you mean by "except with filesystem-level snapshot
> capabilities"?

Like LVM, SANs or ZFS.

Joshua D. Drake

>
> ...Robert
>


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering


Robert Haas wrote:
> but
> what do you mean by "except with filesystem-level snapshot
> capabilities"?

If you have a filesystem that supports atomic snapshots, you can take a
snapshot of the filesystem the data directory resides on, and then copy
the data directory from the snapshot at your leisure, without
pg_start/stop_backup(). It is entirely invisible to PostgreSQL and works
just like copying the data directory after an immediate shutdown. The
server will perform crash recovery after restore.

Virtualization software, logical volume managers and SANs tend to have
such features, in addition to filesystems.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Well, it would be nice to allow using pg_start_backup() on the primary
> when streaming replication is enabled, even if archiving isn't.
> Otherwise the only way to get the base backup for the standby is to shut
> down primary first, or use filesystem snapshot etc.

I think I must be missing something: exactly how would you fire up a new
standby from such a base backup, if you weren't running archiving?
If you aren't archiving then there's no guarantee that you'll still have
a continuous WAL series starting from the start of the backup.

IOW I think that the requirement in pg_start_backup shouldn't be relaxed
without some more thought/work.
        regards, tom lane


> IOW I think that the requirement in pg_start_backup shouldn't be relaxed
> without some more thought/work.

Yeah, I was talking to Bruce about that this AM, and it seems like a
feature we *need* to have ... for 9.1.

I'm sufficiently concerned about the amount of flux HS/SR is in right
now that I'd like to declare it "good enough" and move towards release.
Otherwise we'll tinker with it forever and there will be no 9.0.

"Release early, release often" *is* the OSS mantra, after all.  The
question now isn't "Is binary replication perfect" but "is it *good
enough* for some substantial portion of our users".   And I think the
answer to the latter question is, at this point, yes.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Well, it would be nice to allow using pg_start_backup() on the primary
>> when streaming replication is enabled, even if archiving isn't.
>> Otherwise the only way to get the base backup for the standby is to shut
>> down primary first, or use filesystem snapshot etc.
> 
> I think I must be missing something: exactly how would you fire up a new
> standby from such a base backup, if you weren't running archiving?

I was replying to Robert's thought on using pg_start/stop_backup() for
taking a hot backup. Not for bootstrapping a standby.

> If you aren't archiving then there's no guarantee that you'll still have
> a continuous WAL series starting from the start of the backup.

I wasn't really thinking of this use case, but you could set
wal_keep_segments "high enough". Not a configuration I would recommend
for high availability, but should be fine for setting up a streaming
replication standby for testing etc. If we don't allow
pg_start/stop_backup() with archive_mode=off and max_wal_senders>0,
there's no way to bootstrap a streaming replication standby without
archiving.
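
To put a rough number on "high enough": a back-of-the-envelope
estimate, assuming the default 16 MB WAL segment size and a fairly
steady WAL rate. The function name and the safety factor are
illustrative assumptions, nothing more:

```python
import math

SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size


def keep_segments_for(wal_bytes_per_sec: float, backup_seconds: float,
                      safety_factor: float = 2.0) -> int:
    """Estimate a wal_keep_segments value that covers a whole base backup."""
    wal_bytes = wal_bytes_per_sec * backup_seconds * safety_factor
    return math.ceil(wal_bytes / SEGMENT_BYTES)
```

For example, 1 MB/s of WAL during a one-hour backup with the default
safety factor of 2 gives keep_segments_for(1024 * 1024, 3600) == 450.
A burst of WAL beyond that margin still breaks the backup, which is why
this is not a guarantee.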

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Wed, 2010-04-28 at 11:11 -0700, Josh Berkus wrote:
> > IOW I think that the requirement in pg_start_backup shouldn't be relaxed
> > without some more thought/work.
> 
> Yeah, I was talking to Bruce about that this AM, and it seems like a
> feature we *need* to have ... for 9.1.
> 
> I'm sufficiently concerned about the amount of flux HS/SR is in right
> now that I'd like to declare it "good enough" and move towards release.
>  Otherwise we'll tinker with it forever and there will be no 9.0.
> 
> "Release early, release often" *is* the OSS mantra, after all.  The
> question now isn't "Is binary replication perfect" but "is it *good
> enough* for some substantial portion of our users".   And I think the
> answer to the latter question is, at this point, yes.

As of exactly today, my answer, for my piece of this is also "yes". 

I'm not convinced that the same is true across the board. Some important
changes have happened in last few days and I see more coming.

-- Simon Riggs           www.2ndQuadrant.com



Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> If you aren't archiving then there's no guarantee that you'll still have
>> a continuous WAL series starting from the start of the backup.

> I wasn't really thinking of this use case, but you could set
> wal_keep_segments "high enough".

Ah.  Okay, that seems like a workable approach, at least for people with
reasonably predictable WAL loads.  We could certainly improve on it
later to make it more bulletproof, but it's usable now --- if we relax
the error checks.

(wal_keep_segments can be changed without restarting, right?)

> Not a configuration I would recommend
> for high availability, but should be fine for setting up a streaming
> replication standby for testing etc. If we don't allow
> pg_start/stop_backup() with archive_mode=off and max_wal_senders>0,
> there's no way to bootstrap a streaming replication standby without
> archiving.

Right.  +1 for weakening the tests, then.  Is there any use in looking
at wal_keep_segments as part of this test?
        regards, tom lane


Heikki Linnakangas wrote:
> Tom Lane wrote:
>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>> Well, it would be nice to allow using pg_start_backup() on the primary
>>> when streaming replication is enabled, even if archiving isn't.
>>> Otherwise the only way to get the base backup for the standby is to shut
>>> down primary first, or use filesystem snapshot etc.
>> I think I must be missing something: exactly how would you fire up a new
>> standby from such a base backup, if you weren't running archiving?
> 
> I was replying to Robert's thought on using pg_start/stop_backup() for
> taking a hot backup. Not for bootstrapping a standby.

Scratch that, I just reread what I wrote, and starting a streaming
replication standby from such a backup was exactly what I was describing..

>> If you aren't archiving then there's no guarantee that you'll still have
>> a continuous WAL series starting from the start of the backup.
> 
> I wasn't really thinking of this use case, but you could set
> wal_keep_segments "high enough". Not a configuration I would recommend
> for high availability, but should be fine for setting up a streaming
> replication standby for testing etc. If we don't allow
> pg_start/stop_backup() with archive_mode=off and max_wal_senders>0,
> there's no way to bootstrap a streaming replication standby without
> archiving.

This still makes sense.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> If you aren't archiving then there's no guarantee that you'll still have
>>> a continuous WAL series starting from the start of the backup.
> 
>> I wasn't really thinking of this use case, but you could set
>> wal_keep_segments "high enough".
> 
> Ah.  Okay, that seems like a workable approach, at least for people with
> reasonably predictable WAL loads.  We could certainly improve on it
> later to make it more bulletproof, but it's usable now --- if we relax
> the error checks.

Yeah, wal_keep_segments is wishy-washy in general, not only with backups.

> (wal_keep_segments can be changed without restarting, right?)

It's PG_SIGHUP.

>> Not a configuration I would recommend
>> for high availability, but should be fine for setting up a streaming
>> replication standby for testing etc. If we don't allow
>> pg_start/stop_backup() with archive_mode=off and max_wal_senders>0,
>> there's no way to bootstrap a streaming replication standby without
>> archiving.
> 
> Right.  +1 for weakening the tests, then.  Is there any use in looking
> at wal_keep_segments as part of this test?

I don't think so. There's no safe setting that would guarantee anything.
We could check for wal_keep_segments>0, but any small number amounts to
the same thing in practice. We don't insist on wal_keep_segments>0 to
allow WAL streaming
without archival in general, let's not treat taking the base backup
differently.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Wed, 2010-04-28 at 14:21 -0400, Tom Lane wrote:
> Is there any use in looking
> at wal_keep_segments as part of this test?

I would hope that pg_stop_backup() will have a conditional ERROR message
to say

ERROR backup inconsistent and cannot be used for SR
HINT increase wal_keep_segments or enable archiving for your base backup

I think it would also be useful to add a NOTICE to pg_start_backup()

NOTICE archiving is not enabled. If we exceed wal_keep_segments
WAL files then the backup will be invalidated. Expected time for this to
happen is X (using linear extrapolation of WAL creation rate since last
checkpoint)
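
The X in that NOTICE could be computed along these lines. This is just
a sketch of the linear extrapolation; the function and parameter names
are made up for illustration:

```python
def seconds_until_exceeded(wal_keep_segments: int,
                           segments_since_checkpoint: int,
                           seconds_since_checkpoint: float) -> float:
    """Extrapolate the WAL creation rate since the last checkpoint."""
    if segments_since_checkpoint <= 0 or seconds_since_checkpoint <= 0:
        # No WAL activity observed, so no meaningful estimate.
        return float("inf")
    # Time to write wal_keep_segments more segments at the observed rate.
    return (wal_keep_segments * seconds_since_checkpoint
            / segments_since_checkpoint)
```

With wal_keep_segments = 100 and 10 segments written in the last 60
seconds, the estimate is 600 seconds.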

-- Simon Riggs           www.2ndQuadrant.com



Simon Riggs wrote:
> On Wed, 2010-04-28 at 14:21 -0400, Tom Lane wrote:
>> Is there any use in looking
>> at wal_keep_segments as part of this test?
> 
> I would hope that pg_stop_backup() will have a conditional ERROR message
> to say
> 
> ERROR backup inconsistent and cannot be used for SR
> HINT increase wal_keep_segments or enable archiving for your base backup

Hmm, you could start streaming the WAL before you start the backup, so
the fact that you've already removed some segments that are needed to
restore from the backup by the time pg_stop_backup() is called doesn't
necessarily mean that the backup is useless.

You'd need a stand-alone tool to do the streaming in that case, and no
such tool exists yet, but I would be surprised if one doesn't appear on
pgfoundry sooner or later :-).

In case it's not clear to casual readers out there:
You will get an error as soon as you try to start the standby,
complaining that it can't find the WAL segment it needs in the primary
anymore. Not silent corruption.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


* Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> [100428 14:49]:
> You'd need a stand-alone tool to do the streaming in that case, and no
> such tool exists yet, but I would be surprised if one doesn't appear on
> pgfoundry sooner or later :-).

And this tool is something I will eventually be interested in working on
or collaborating on...  I'm hoping to be able to build a tool that:

1) Connects to PG walsender (a la walreceiver)
2) Streams WAL from pg master
3) Saves WAL into "files" (a la archive)...

i.e. I'm looking to keep a more-up-to-date PITR archive than waiting for
traditional WAL file archiving...
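
Step 3 could be sketched as below. This deliberately ignores the
walsender protocol and real WAL file naming: chunks stands in for
whatever bytes arrive from the stream, and the numbered file names are
hypothetical:

```python
import os


def save_stream(chunks, out_dir, segment_bytes=16 * 1024 * 1024):
    """Write an iterable of byte chunks into fixed-size, numbered files."""
    seg_no, used, out = 0, 0, None
    written = []
    for chunk in chunks:
        while chunk:
            if out is None:
                # Hypothetical naming; real WAL segments are named differently.
                path = os.path.join(out_dir, "%08X" % seg_no)
                out = open(path, "wb")
                written.append(path)
            take = chunk[:segment_bytes - used]
            out.write(take)
            used += len(take)
            chunk = chunk[len(take):]
            if used == segment_bytes:
                # Segment is full: close it and move on to the next one.
                out.close()
                out, used, seg_no = None, 0, seg_no + 1
    if out is not None:
        out.close()
    return written
```

The last file may be partially filled, which is exactly the part a
traditional archive would not have yet.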

And eventually (9.1+) I'm hoping that walsender will have grown enough
to allow me to configure  PG to wait on the commit until the master has
both sync'ed the WAL file, and received a "sync ack" from my
wal-stream-save-to-file tool...

Because then I'll have a situation where I can easily have a
synchronous, separate machine copy of all my WAL without having to jump
through hoops with stuff like drbd or MD+nbd, etc as my WAL disk...

And yes, I don't personally care about streaming replication replaying
WAL as it comes, or running queries in recovery... I'm looking towards
PG not saying my transaction is committed unless it's safely on that
machine's disks (or BBcache) *and* another machine...  That's the type of
replication a paranoid guy like me waits for...  Yes, that's possible
now with exotic os/net/fs configuration, but imagine how nice it will be
when it can all be done in userspace with just PG (and pg-compatible)
tool, etc...

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Hmm, you could start streaming the WAL before you start the backup, so
> the fact that you've already removed some segments that are needed to
> restore from the backup by the time pg_stop_backup() is called doesn't
> necessarily mean that the backup is useless.

> You'd need a stand-alone tool to do the streaming in that case, and no
> such tool exists yet, but I would be surprised if one doesn't appear on
> pgfoundry sooner or later :-).

Yeah.  ISTM the real bottom line here is that we have only a weak grasp
on how these features will end up being used; or for that matter what
the common error scenarios will be.  I think that for the time being
we should err on the side of being permissive.  We can tighten things
up and add more nanny-ism in the warnings later on, when we have
more field experience.
        regards, tom lane


Aidan Van Dyk <aidan@highrise.ca> wrote:
> I'm hoping to be able to build a tool that:
> 
> 1) Connects to PG walsender (a la walreceiver)
> 2) Streams WAL from pg master
> 3) Saves WAL into "files" (a la archive)...
> 
> i.e. I'm looking to keep a more-up-to-date PITR archive than
> waiting for traditional WAL file archiving...

I'm interested in that, too.

> I don't personally care about streaming replication replaying WAL
> as it comes, or running queries in recovery...

I'm with you that far, but I wouldn't want the sender to wait for
remote persistence.

-Kevin


"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Aidan Van Dyk <aidan@highrise.ca> wrote:
>  
>> I'm hoping to be able to build a tool that:
>> 
>> 1) Connects to PG walsender (a la walreceiver)
>> 2) Streams WAL from pg master
>> 3) Saves WAL into "files" (a la archive)...
>> 
>> i.e. I'm looking to keep a more-up-to-date PITR archive than
>> waiting for traditional WAL file archiving...
>  
> I'm interested in that, too.

It looks like we'll have that integrated into walreceiver the day we have
cascading support, right? Or maybe we need a special mode of operation
where the receiver is (talking to) an archiver.

>> I don't personally care about streaming replication replaying WAL
>> as it comes, or running queries in recovery...
>  
> I'm with you that far, but I wouldn't want the sender to wait for
> remote persistence.

That's synchronous replication and its set of synchronicity settings,
ranging from sent on the network to the slave, fsync()ed at the slave
and applied already on the slave.

IMO the real fun begins when we talk about multi-slaves support and
their roles (a failover slave wants the master to wait for it to have
applied the WAL before to commit, a reporting slave not so much). So
you'd set the Availability level on each slave and wouldn't commit on
the master until each slave got what it's configured for, or something
like that.

SyncRep in 9.1 already sounds darn interesting :)

Regards,
-- 
dim


* Kevin Grittner <Kevin.Grittner@wicourts.gov> [100428 15:51]: 
> > I don't personally care about streaming replication replaying WAL
> > as it comes, or running queries in recovery...
>  
> I'm with you that far, but I wouldn't want the sender to wait for
> remote persistence.

I remember a presentation at pgcon a while ago, it was probably Fujii
(from NTT?) about their log streaming, and at that time, they talked
about different "sync" options...  So I'd love to be able to have
commits be:
  async (like current option)
  local wal sync (like current)
  local wal sync + walsender sent
  local wal sync + walsender confirmed

And ideally, the "walsender sent/confirmed" would even allow making sure
it was sent/confirmed to $X connections...  I want to be able to
guarantee it's on 2 machines, not that if my slave was connected it
would be on there, but something happened and my "slave"
has disconnected, so it's only got local WAL... 

And then on whatever "tool" is receiving the log streaming, it can be
set to confirm when either:
  received buffer
  write buffer to file
  write buffer to file + sync
  write buffer to file + sync + replay

That should give you all the sync levels they talked about in their
presentation...

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.





Aidan Van Dyk wrote:
> I remember a presentation at pgcon a while ago, it was probably Fujii
> (from NTT?) about their log streaming, and at that time, they talked
> about different "sync" options...

It's all outlined at 
http://wiki.postgresql.org/wiki/Streaming_Replication#Synchronization_capability

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Dimitri Fontaine wrote:
> IMO the real fun begins when we talk about multi-slaves support and
> their roles (a failover slave wants the master to wait for it to have
> applied the WAL before to commit, a reporting slave not so much). So
> you'd set the Availability level on each slave and wouldn't commit on
> the master until each slave got what it's configured for, or something
> like that.
>   

Ultimately the commit is stuck waiting for the slowest committing sync 
operation on the list; it's the bottleneck.  Let's presume that the 
commit waits can be done in parallel, after sending the transaction to 
every slave.  Given that and the situation you describe, having per-node 
sync levels only turns out to be a useful optimization if the reporting 
slave commits slower than the failover slave does.  The master is going 
to be stuck waiting for the slowest one of the batch regardless of 
whether you've optimized them individually.

There is a related situation that I think a per-node sync option would 
be more obviously useful for:  local failover slave, remote disaster 
recovery slave over a WAN, where you accept that a serious disaster 
taking out a whole data center will lose some transactions.  In that 
situation, you'd probably want fsync for the local slave, while going 
async for the remote datacenter.

If the commits are done in a serial fashion, tuning sync per-node would 
be much more valuable in many use cases.
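
To make the parallel versus serial distinction concrete, here is a minimal sketch in C. It is only an illustration under stated assumptions: the function names and the idea of a fixed per-standby acknowledgement time in milliseconds are hypothetical, not part of any PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: ack_ms[i] is the time standby i takes to
 * acknowledge a commit. None of these names exist in PostgreSQL.
 */

/* All standbys are waited on at once: the slowest one dominates. */
double parallel_commit_wait(const double *ack_ms, size_t n)
{
    assert(ack_ms != NULL || n == 0);
    double worst = 0.0;
    for (size_t i = 0; i < n; i++)
        if (ack_ms[i] > worst)
            worst = ack_ms[i];
    return worst;
}

/* Standbys are waited on one after another: the times add up. */
double serial_commit_wait(const double *ack_ms, size_t n)
{
    assert(ack_ms != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += ack_ms[i];
    return total;
}
```

With a local standby at 2 ms and a WAN standby at 40 ms, the parallel wait is governed by the 40 ms node alone, while the serial wait pays for both, which is why per-node tuning matters more in the serial case.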

Regardless, I wouldn't want to burden the first sync rep version with 
this requirement.  Let's wait until the current scope is cleared before 
trying to move the goalposts for the people working on that.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



On Wed, 2010-04-28 at 22:17 +0200, Dimitri Fontaine wrote:

> IMO the real fun begins when we talk about multi-slaves support and
> their roles (a failover slave wants the master to wait for it to have
> applied the WAL before to commit, a reporting slave not so much). So
> you'd set the Availability level on each slave and wouldn't commit on
> the master until each slave got what it's configured for, or something
> like that.

Just for the record, I outlined desirable semantics for this on hackers
in 2008 and want to keep those ideas on the table.
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01001.php

My view is that it should be up to the master what happens on master. An
additional standby connection should not have the ability to make
transactions on the master wait. If we give control to the master rather
than the standby, we are then able to allow transactions on the master
to choose how robust they should be, just as we do with synchronous_commit.
IMHO that is extremely important, since we already know that sync rep
performs poorly and applications need to mitigate that in some way.

Those are the objectives; the parameters to do that are a different
story and we might expect much debate. One way of doing this would be to
have a parameter called synchronous_replication = N, which would cause
the transaction on primary to wait for at least N standbys to reply that
they have the data. This would allow settings like
synchronous_replication = 0    --async
synchronous_replication = 1    --first reply wins == max performance
synchronous_replication = 2    --multiple replies needed == max availability
...
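
The "wait for at least N standbys" rule can be sketched as a single predicate. This is a hedged illustration only: synchronous_replication is the parameter name proposed above, while the function and its reply counter are hypothetical and say nothing about how the replies would actually be collected.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of "synchronous_replication = N": a commit on the
 * primary may return once at least N standbys have replied that they
 * have the data. N = 0 behaves as asynchronous replication.
 */
bool commit_may_return(int synchronous_replication, int replies_received)
{
    assert(synchronous_replication >= 0 && replies_received >= 0);
    return replies_received >= synchronous_replication;
}
```

The decision stays on the master: a standby that connects later adds a potential reply but cannot by itself make the transaction wait longer.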

-- Simon Riggs           www.2ndQuadrant.com



Tom Lane wrote:
> Yeah.  ISTM the real bottom line here is that we have only a weak grasp
> on how these features will end up being used; or for that matter what
> the common error scenarios will be.  I think that for the time being
> we should err on the side of being permissive.  We can tighten things
> up and add more nanny-ism in the warnings later on, when we have
> more field experience.

Ok, here's a proposed patch. Per discussion, it relaxes the checks in
pg_start/stop_backup() so that they can be used as long as wal_level >=
'archive', even if archiving is disabled.

If archiving is not enabled, it can't wait for the files to be archived.
Instead, it prints a notice:

NOTICE:  WAL archiving is not enabled, you must ensure that all required
WAL segments are streamed or copied through other means to restore the
backup

That is instead of the usual notice when archiving is enabled:

NOTICE: pg_stop_backup complete, all required WAL segments have been
archived
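
As a rough summary of the behaviour described above, the following stand-alone C sketch models the decision that pg_stop_backup() makes with the patch applied. It is not the patch itself: the enum and function names are made up for illustration, and the real code uses XLogIsNeeded() and XLogArchivingActive() as shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* wal_level settings in increasing order of WAL detail. */
typedef enum { WAL_MINIMAL, WAL_ARCHIVE, WAL_HOT_STANDBY } WalLevel;

/* Hypothetical labels for the three possible outcomes. */
typedef enum {
    STOP_ERROR,            /* wal_level too low: raise an error */
    STOP_WAIT_FOR_ARCHIVE, /* archiving enabled: wait for WAL to be archived */
    STOP_NOTICE_ONLY       /* archiving disabled: only print the notice */
} StopBackupAction;

StopBackupAction stop_backup_action(WalLevel wal_level, bool archiving_active)
{
    assert(wal_level <= WAL_HOT_STANDBY);
    if (wal_level < WAL_ARCHIVE)
        return STOP_ERROR;
    return archiving_active ? STOP_WAIT_FOR_ARCHIVE : STOP_NOTICE_ONLY;
}
```

Only the first check applies to pg_start_backup(); the wait-or-notice choice exists only at the end of the backup.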

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 8200,8217 **** pg_start_backup(PG_FUNCTION_ARGS)
                   errmsg("recovery is in progress"),
                   errhint("WAL control functions cannot be executed during recovery.")));

!     if (!XLogArchivingActive())
!         ereport(ERROR,
!                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("WAL archiving is not active"),
!                  errhint("archive_mode must be enabled at server start.")));
!
!     if (!XLogArchiveCommandSet())
          ereport(ERROR,
                  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("WAL archiving is not active"),
!                  errhint("archive_command must be defined before "
!                          "online backups can be made safely.")));

      backupidstr = text_to_cstring(backupid);

--- 8200,8210 ----
                   errmsg("recovery is in progress"),
                   errhint("WAL control functions cannot be executed during recovery.")));

!     if (!XLogIsNeeded())
          ereport(ERROR,
                  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("WAL level not sufficient for making an online backup"),
!                  errhint("wal_level must be set to 'archive' or 'hot_standby' at server start.")));

      backupidstr = text_to_cstring(backupid);

***************
*** 8399,8409 **** pg_stop_backup(PG_FUNCTION_ARGS)
                   errmsg("recovery is in progress"),
                   errhint("WAL control functions cannot be executed during recovery.")));

!     if (!XLogArchivingActive())
          ereport(ERROR,
                  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("WAL archiving is not active"),
!                  errhint("archive_mode must be enabled at server start.")));

      /*
       * OK to clear forcePageWrites
--- 8392,8402 ----
                   errmsg("recovery is in progress"),
                   errhint("WAL control functions cannot be executed during recovery.")));

!     if (!XLogIsNeeded())
          ereport(ERROR,
                  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("WAL level not sufficient for making an online backup"),
!                  errhint("wal_level must be set to 'archive' or 'hot_standby' at server start.")));

      /*
       * OK to clear forcePageWrites
***************
*** 8511,8526 **** pg_stop_backup(PG_FUNCTION_ARGS)
      CleanupBackupHistory();

      /*
!      * Wait until both the last WAL file filled during backup and the history
!      * file have been archived.  We assume that the alphabetic sorting
!      * property of the WAL files ensures any earlier WAL files are safely
!      * archived as well.
       *
       * We wait forever, since archive_command is supposed to work and we
       * assume the admin wanted his backup to work completely. If you don't
       * wish to wait, you can set statement_timeout.  Also, some notices are
       * issued to clue in anyone who might be doing this interactively.
       */
      XLByteToPrevSeg(stoppoint, _logId, _logSeg);
      XLogFileName(lastxlogfilename, ThisTimeLineID, _logId, _logSeg);

--- 8504,8530 ----
      CleanupBackupHistory();

      /*
!      * If archiving is enabled, wait for all the required WAL files to be
!      * archived before returning. If archiving isn't enabled, the required
!      * WAL needs to be transported via streaming replication (hopefully
!      * with wal_keep_segments set high enough), or some more exotic
!      * mechanism like polling and copying files from pg_xlog with script.
!      * We have no control over those mechanisms, so it's up to the user to
!      * ensure that he gets all the required WAL.
!      *
!      * We wait until both the last WAL file filled during backup and the
!      * history file have been archived, and assume that the alphabetic
!      * sorting property of the WAL files ensures any earlier WAL files are
!      * safely archived as well.
       *
       * We wait forever, since archive_command is supposed to work and we
       * assume the admin wanted his backup to work completely. If you don't
       * wish to wait, you can set statement_timeout.  Also, some notices are
       * issued to clue in anyone who might be doing this interactively.
       */
+     if (XLogArchivingActive())
+     {
+         /* XXX: fix indentation before committing */
      XLByteToPrevSeg(stoppoint, _logId, _logSeg);
      XLogFileName(lastxlogfilename, ThisTimeLineID, _logId, _logSeg);

***************
*** 8559,8564 **** pg_stop_backup(PG_FUNCTION_ARGS)
--- 8563,8572 ----

      ereport(NOTICE,
              (errmsg("pg_stop_backup complete, all required WAL segments have been archived")));
+     }
+     else
+         ereport(NOTICE,
+                 (errmsg("WAL archiving is not enabled, you must ensure that all required WAL segments are streamed or copied through other means to restore the backup")));

      /*
       * We're done.  As a convenience, return the ending WAL location.

On Thu, Apr 29, 2010 at 5:38 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> NOTICE:  WAL archiving is not enabled, you must ensure that all required
> WAL segments are streamed or copied through other means to restore the
> backup

I might think about dropping the words "through other means" from this sentence.

...Robert


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> Yeah.  ISTM the real bottom line here is that we have only a weak grasp
>> on how these features will end up being used; or for that matter what
>> the common error scenarios will be.  I think that for the time being
>> we should err on the side of being permissive.  We can tighten things
>> up and add more nanny-ism in the warnings later on, when we have
>> more field experience.

> Ok, here's a proposed patch. Per discussion, it relaxes the checks in
> pg_start/stop_backup() so that they can be used as long as wal_level >=
> 'archive', even if archiving is disabled.

This patch seems reasonably noncontroversial (except possibly for
message wording, which we can fine-tune later anyway).  Please apply.
9.0beta1 is going to get wrapped in only a few hours.

BTW, the documentation for these functions might need a bit of adjustment.
        regards, tom lane


Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > Tom Lane wrote:
> >> If you aren't archiving then there's no guarantee that you'll still have
> >> a continuous WAL series starting from the start of the backup.
> 
> > I wasn't really thinking of this use case, but you could set
> > wal_keep_segments "high enough".
> 
> Ah.  Okay, that seems like a workable approach, at least for people with
> reasonably predictable WAL loads.  We could certainly improve on it
> later to make it more bulletproof, but it's usable now --- if we relax
> the error checks.
> 
> (wal_keep_segments can be changed without restarting, right?)

Should we allow -1 to mean "keep all segments"?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com


On Fri, Apr 30, 2010 at 12:22 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Tom Lane wrote:
>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> > Tom Lane wrote:
>> >> If you aren't archiving then there's no guarantee that you'll still have
>> >> a continuous WAL series starting from the start of the backup.
>>
>> > I wasn't really thinking of this use case, but you could set
>> > wal_keep_segments "high enough".
>>
>> Ah.  Okay, that seems like a workable approach, at least for people with
>> reasonably predictable WAL loads.  We could certainly improve on it
>> later to make it more bulletproof, but it's usable now --- if we relax
>> the error checks.
>>
>> (wal_keep_segments can be changed without restarting, right?)
>
> Should we allow -1 to mean "keep all segments"?

If that's what you want to do, use archive_mode.

...Robert


Robert Haas wrote:
> On Fri, Apr 30, 2010 at 12:22 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Tom Lane wrote:
> >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> >> > Tom Lane wrote:
> >> >> If you aren't archiving then there's no guarantee that you'll still have
> >> >> a continuous WAL series starting from the start of the backup.
> >>
> >> > I wasn't really thinking of this use case, but you could set
> >> > wal_keep_segments "high enough".
> >>
> > >> Ah.  Okay, that seems like a workable approach, at least for people with
> > >> reasonably predictable WAL loads.  We could certainly improve on it
> >> later to make it more bulletproof, but it's usable now --- if we relax
> >> the error checks.
> >>
> >> (wal_keep_segments can be changed without restarting, right?)
> >
> > Should we allow -1 to mean "keep all segments"?
> 
> If that's what you want to do, use archive_mode.

Uh, I assume that will require me to store the WAL files somewhere else,
rather than keeping them in /pg_xlog, which I thought was the goal.  Am
I missing something?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com


On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote:
> > 
> > (wal_keep_segments can be changed without restarting, right?)
> 
> Should we allow -1 to mean "keep all segments"?

Why is that not called "max_wal_segments"? wal_keep_segments sounds like
it's been through Google translate.

-- Simon Riggs           www.2ndQuadrant.com



Simon Riggs wrote:
> On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote:
> > > 
> > > (wal_keep_segments can be changed without restarting, right?)
> > 
> > Should we allow -1 to mean "keep all segments"?
> 
> Why is that not called "max_wal_segments"? wal_keep_segments sounds like
> its been through Google translate.

LOL, good one.

I assume it was done so it would start with 'wal', but I see
'max_wal_senders', which doesn't start with 'wal' and would match your
suggestion exactly.  I think we should either rename 'wal_keep_segments'
or 'max_wal_senders'.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com


On Fri, Apr 30, 2010 at 1:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote:
>> >
>> > (wal_keep_segments can be changed without restarting, right?)
>>
>> Should we allow -1 to mean "keep all segments"?
>
> Why is that not called "max_wal_segments"? wal_keep_segments sounds like
> its been through Google translate.

Because it's not a maximum?

...Robert


On Fri, Apr 30, 2010 at 1:39 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> On Fri, Apr 30, 2010 at 12:22 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> > Tom Lane wrote:
>> >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> >> > Tom Lane wrote:
>> >> >> If you aren't archiving then there's no guarantee that you'll still have
>> >> >> a continuous WAL series starting from the start of the backup.
>> >>
>> >> > I wasn't really thinking of this use case, but you could set
>> >> > wal_keep_segments "high enough".
>> >>
>> >> Ah.  Okay, that seems like a workable approach, at least for people with
>> >> reasonably predictable WAL loads.  We could certainly improve on it
>> >> later to make it more bulletproof, but it's usable now --- if we relax
>> >> the error checks.
>> >>
>> >> (wal_keep_segments can be changed without restarting, right?)
>> >
>> > Should we allow -1 to mean "keep all segments"?
>>
>> If that's what you want to do, use archive_mode.
>
> Uh, I assume that will require me to store the WAL files somewhere else,
> rather than keeping them in /pg_xlog, which I thought was the goal.  Am
> I missing something?

Well, one of us is.  Why would you want to retain all of your WAL logs
in pg_xlog forever?

...Robert


On 04/30/2010 01:53 PM, Robert Haas wrote:
>
> Well, one of us is.  Why would you want to retain all of your WAL logs
> in pg_xlog forever?
>
> ...Robert
>

To create or re-synchronize SR slaves, one could change 
wal_keep_segments to -1, run a backup, wait for the slaves to catch up, 
and change it back to the default. This way no segments would be deleted 
until the system has reached a stable state.

-- m. tharp


Robert Haas wrote:
> On Fri, Apr 30, 2010 at 1:39 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Robert Haas wrote:
> >> On Fri, Apr 30, 2010 at 12:22 PM, Bruce Momjian <bruce@momjian.us> wrote:
> >> > Tom Lane wrote:
> >> >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> >> >> > Tom Lane wrote:
> >> >> >> If you aren't archiving then there's no guarantee that you'll still have
> >> >> >> a continuous WAL series starting from the start of the backup.
> >> >>
> >> >> > I wasn't really thinking of this use case, but you could set
> >> >> > wal_keep_segments "high enough".
> >> >>
> >> >> Ah.  Okay, that seems like a workable approach, at least for people with
> >> >> reasonably predictable WAL loads.  We could certainly improve on it
> >> >> later to make it more bulletproof, but it's usable now --- if we relax
> >> >> the error checks.
> >> >>
> >> >> (wal_keep_segments can be changed without restarting, right?)
> >> >
> >> > Should we allow -1 to mean "keep all segments"?
> >>
> >> If that's what you want to do, use archive_mode.
> >
> > Uh, I assume that will require me to store the WAL files somewhere else,
> > rather than keeping them in /pg_xlog, which I thought was the goal.  Am
> > I missing something?
> 
> Well, one of us is.  Why would you want to retain all of your WAL logs
> in pg_xlog forever?

Well, this email thread mentioned a case where you needed to increase
wal_keep_segments to a sufficiently-high value, and of course figuring
out such a value is harder than just having a way of turning off
recycling with -1.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com


On Fri, 2010-04-30 at 13:52 -0400, Robert Haas wrote:
> On Fri, Apr 30, 2010 at 1:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote:
> >> >
> >> > (wal_keep_segments can be changed without restarting, right?)
> >>
> >> Should we allow -1 to mean "keep all segments"?
> >
> > Why is that not called "max_wal_segments"? wal_keep_segments sounds like
> > its been through Google translate.
> 
> Because it's not a maximum?

I see the thinking, but why would you ever set it to be something that
is *less* than the existing numbers? That would be pointless and indeed,
does nothing. The only time you touch it at all is when you set it to be
a value higher than the number of files that would normally be kept, and
when that is the case it *will* be the maximum.

So I say, max_wal_segments = 0 (default) meaning no limit, we just
rotate as needed. We put a comment in the docs to say that if a value is
selected less than 2*checkpoint_segments+1 then the value is overridden.

-- Simon Riggs           www.2ndQuadrant.com



On Fri, 2010-04-30 at 13:58 -0400, Bruce Momjian wrote:
> Robert Haas wrote:
> > On Fri, Apr 30, 2010 at 1:39 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > > Robert Haas wrote:
> > >> On Fri, Apr 30, 2010 at 12:22 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > >> > Tom Lane wrote:
> > >> >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > >> >> > Tom Lane wrote:
> > >> >> >> If you aren't archiving then there's no guarantee that you'll still have
> > >> >> >> a continuous WAL series starting from the start of the backup.
> > >> >>
> > >> >> > I wasn't really thinking of this use case, but you could set
> > >> >> > wal_keep_segments "high enough".
> > >> >>
> > >> >> Ah.  Okay, that seems like a workable approach, at least for people with
> > >> >> reasonably predictable WAL loads.  We could certainly improve on it
> > >> >> later to make it more bulletproof, but it's usable now --- if we relax
> > >> >> the error checks.
> > >> >>
> > >> >> (wal_keep_segments can be changed without restarting, right?)
> > >> >
> > >> > Should we allow -1 to mean "keep all segments"?
> > >>
> > >> If that's what you want to do, use archive_mode.
> > >
> > > Uh, I assume that will require me to store the WAL files somewhere else,
> > > rather than keeping them in /pg_xlog, which I thought was the goal.  Am
> > > I missing something?
> > 
> > Well, one of us is.  Why would you want to retain all of your WAL logs
> > in pg_xlog forever?
> 
> Well, this email thread mentioned a case where you needed to increase
> wal_keep_segments to a sufficiently-high value, and of course figuring
> out such a value is harder than just having a way of turning off
> recycling with -1.

I think the only sensible setting is "as big as my (available) disk
space". Any higher and you're going to crash, any lower and you'll
invalidate your backup for no reason.

-1 emulates current behaviour, BTW

Still think we should rename it, in which case 0 is same as "no
maximum".

-- Simon Riggs           www.2ndQuadrant.com



Robert Haas wrote:
> On Fri, Apr 30, 2010 at 1:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote:
>>>> (wal_keep_segments can be changed without restarting, right?)
>>> Should we allow -1 to mean "keep all segments"?
>> Why is that not called "max_wal_segments"? wal_keep_segments sounds like
>> its been through Google translate.
> 
> Because it's not a maximum?

Yeah, min_wal_segments or something would make sense. It sounds about as
good or bad as wal_keep_segments to me.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Bruce Momjian wrote:
> Tom Lane wrote:
>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>> Tom Lane wrote:
>>>> If you aren't archiving then there's no guarantee that you'll still have
>>>> a continuous WAL series starting from the start of the backup.
>>> I wasn't really thinking of this use case, but you could set
>>> wal_keep_segments "high enough".
>> Ah.  Okay, that seems like a workable approach, at least for people with
>> reasonably predictable WAL loads.  We could certainly improve on it
>> later to make it more bulletproof, but it's usable now --- if we relax
>> the error checks.
>>
>> (wal_keep_segments can be changed without restarting, right?)
> 
> Should we allow -1 to mean "keep all segments"?

Umm, you can't keep all segments around forever, can you? Surely you
have to recycle them sooner or later or you will run out of disk space.

I guess you could move that responsibility to a user-written script, but
we haven't traditionally encouraged or supported people to mess with the
contents of pg_xlog. That would require some more thinking IMHO, not 9.0
material.

In practice, you can just set wal_keep_segments to some ridiculously
high value to achieve the same result.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> Yeah, min_wal_segments or something would make sense.
Surely it would confuse people to see they have fewer than
min_wal_segments WAL segments.
-Kevin


Heikki Linnakangas wrote:
> Robert Haas wrote:
> > On Fri, Apr 30, 2010 at 1:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote:
> >>>> (wal_keep_segments can be changed without restarting, right?)
> >>> Should we allow -1 to mean "keep all segments"?
> >> Why is that not called "max_wal_segments"? wal_keep_segments sounds like
> >> its been through Google translate.
> > 
> > Because it's not a maximum?
> 
> Yeah, min_wal_segments or something would make sense. It sounds about as
> good or bad as wal_keep_segments to me.

I admit I never liked "keep" but couldn't think of better wording.  I do
like the proposed wording better.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com


Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Apr 30, 2010 at 1:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Why is that not called "max_wal_segments"? wal_keep_segments sounds like
>> its been through Google translate.

> Because it's not a maximum?

Indeed.  It would really be more like min_wal_segments, if we wanted to
name it that way.
        regards, tom lane


Heikki Linnakangas wrote:
> Bruce Momjian wrote:
> > Tom Lane wrote:
> >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> >>> Tom Lane wrote:
> >>>> If you aren't archiving then there's no guarantee that you'll still have
> >>>> a continuous WAL series starting from the start of the backup.
> >>> I wasn't really thinking of this use case, but you could set
> >>> wal_keep_segments "high enough".
> >> Ah.  Okay, that seems like a workable approach, at least for people with
> >> reasonably predictable WAL loads.  We could certainly improve on it
> >> later to make it more bulletproof, but it's usable now --- if we relax
> >> the error checks.
> >>
> >> (wal_keep_segments can be changed without restarting, right?)
> > 
> > Should we allow -1 to mean "keep all segments"?
> 
> Umm, you can't keep all segments around forever, can you? Surely you
> have to recycle them sooner or later or you will run out of disk space.
> 
> I guess you could move that responsibility to a user-written script, but
> we haven't traditionally encouraged or supported people to mess with the
> contents of pg_xlog. That would require some more thinking IMHO, not 9.0
> material.
> 
> In practice, you can just set wal_keep_segments to some ridiculously
> high value to achieve the same result.

Which is where my 'wal_keep_segments = -1' idea came from.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com


On Fri, Apr 30, 2010 at 2:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, 2010-04-30 at 13:52 -0400, Robert Haas wrote:
>> On Fri, Apr 30, 2010 at 1:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> > On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote:
>> >> >
>> >> > (wal_keep_segments can be changed without restarting, right?)
>> >>
>> >> Should we allow -1 to mean "keep all segments"?
>> >
>> > Why is that not called "max_wal_segments"? wal_keep_segments sounds like
>> > its been through Google translate.
>>
>> Because it's not a maximum?
>
> I see the thinking, but why would you ever set it to be something that
> is *less* than the existing numbers? That would be pointless and indeed,
> does nothing. The only time you touch it at all is when you set it to be
> a value higher than the number of files that would normally be kept, and
> when that is the case it *will* be the maximum.
>
> So I say, max_wal_segments = 0 (default) meaning no limit, we just
> rotate as needed. We put a comment in the docs to say that if a value is
> selected less than 2*checkpoint_segments+1 then the value is overridden.

As you were quick to point out to me earlier this week, I am not an
expert on our write-ahead logging system; however, I think you are
mistaken.   Perhaps Heikki could speak to the point more definitively,
but I believe that the number of segments that the system retains for
WAL archiving or crash recovery is variable.  The purpose of this
variable is to put a floor under the number of segments that are
retained so that SR slaves can catch up if they fall behind.  Of
course, if archiving is configured, they can do that anyway using
restore_command, but you might be running SR without archiving, or you
might just want to set this to a small value so that the slaves don't
have to keep switching between SR and archive recovery if segments get
archived or checkpointed away at inconvenient times.

It doesn't make a whole lot of sense to set the floor on the number of
segments retained to positive infinity, except in one specific case:
archiving is disabled, and you're trying to hang on to enough segments
in pg_xlog to take a hot backup.   As Tom said, it would be nice to
have a more elegant solution to that problem, but we can do that in a
future release; it's not really the primary purpose of
wal_keep_segments, anyway.  It certainly would not be a good idea to
make the default configuration "retain all WAL forever".  If you did
that, a user who sets up PostgreSQL and is not using SR or HS or hot
backups will eventually and inevitably fill up their hard disk.
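
The "floor" reading of wal_keep_segments can be sketched as follows. This is a simplified model under stated assumptions, not the server's actual retention code: segments are treated as plain increasing integers, and the function and parameter names other than wal_keep_segments are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: returns the oldest segment number that must stay
 * in pg_xlog.
 *   needed_from       - oldest segment still needed for crash recovery
 *                       or archiving (this varies at run time)
 *   current           - segment currently being written
 *   wal_keep_segments - extra floor on the number of retained segments
 */
uint64_t oldest_segment_to_keep(uint64_t needed_from, uint64_t current,
                                uint64_t wal_keep_segments)
{
    assert(needed_from <= current);
    /* Clamp at zero so a huge setting cannot wrap around. */
    uint64_t floor_from = current > wal_keep_segments
                              ? current - wal_keep_segments
                              : 0;
    /* The setting can only extend retention, never shorten it. */
    return floor_from < needed_from ? floor_from : needed_from;
}
```

In this model a very large setting simply means that nothing is ever removed, which matches the "ridiculously high value" workaround and also shows why it would be a poor default.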

...Robert


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Bruce Momjian wrote:
>> Should we allow -1 to mean "keep all segments"?

> Umm, you can't keep all segments around forever, can you? Surely you
> have to recycle them sooner or later or you will run out of disk space.

You couldn't use that as a permanent setting, but it can make sense
as a transient setting, rather than having to guess how much WAL you'll
need to keep while setting up a new standby.

> In practice, you can just set wal_keep_segments to some ridiculously
> high value to achieve the same result.

True.
        regards, tom lane


Bruce Momjian wrote:

> Which is where my 'wal_keep_segments = -1' idea came from.

Are you suggesting that -1 should mean "keep all segments that fit on
disk, but if creating a new segment fails with ENOSPC, recycle the
oldest one"?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Michael Tharp wrote:
> On 04/30/2010 01:53 PM, Robert Haas wrote:
>>
>> Well, one of us is.  Why would you want to retain all of your WAL logs
>> in pg_xlog forever?
> 
> To create or re-synchronize SR slaves, one could change
> wal_keep_segments to -1, run a backup, wait for the slaves to catch up,
> and change it back to the default. This way no segments would be deleted
> until the system has reached a stable state.

A slave can fall behind at any time, though. You would have to know to
set wal_keep_segments to -1 before that happens.

I've been thinking that in the future (read 9.1 or above), we would have
a system for registering slaves in the primary server. The primary would
keep track of how far each slave is, and refrain from removing WAL
segments that it knows to be still needed by a slave. On the flip-side,
the master wouldn't need to keep WAL around that it knows is no longer
needed by any slaves.
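
A minimal sketch of that bookkeeping, assuming each registered slave reports the segment number it has reached. Everything here is hypothetical (no such registry exists at this point), and segment numbers are modelled as plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: given the segment each registered slave is
 * currently at, return the oldest segment the primary must retain.
 * With no slaves registered, nothing older than the current segment
 * is needed on their behalf.
 */
uint64_t oldest_segment_needed_by_slaves(const uint64_t *slave_segno,
                                         size_t n, uint64_t current_segno)
{
    assert(slave_segno != NULL || n == 0);
    uint64_t oldest = current_segno;
    for (size_t i = 0; i < n; i++)
        if (slave_segno[i] < oldest)
            oldest = slave_segno[i];
    return oldest;
}
```

The primary would remove or recycle only segments older than the returned value, so a lagging slave holds WAL back and a caught-up set of slaves releases it.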

If someone has the energy, it would be possible to write a stand-alone
application to do that too. It could serve old WAL files from the
archive and relay recent ones from the real master.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Alvaro Herrera <alvherre@commandprompt.com> writes:
> Bruce Momjian wrote:
>> Which is where my 'wal_keep_segments = -1' idea came from.

> Are you suggesting that -1 should mean "keep all segments that fit on
> disk, but if creating a new segment fails with ENOSPC, recycle the
> oldest one"?

No, keep means keep.  Even if there were some arguable use for "keep if
you can", a scheme like that would render the machine unusable ---
everything else on the same filesystem would be falling over.
        regards, tom lane


On Fri, 2010-04-30 at 14:42 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Fri, Apr 30, 2010 at 1:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> Why is that not called "max_wal_segments"? wal_keep_segments sounds like
> >> its been through Google translate.
> 
> > Because it's not a maximum?
> 
> Indeed.  It would really be more like min_wal_segments, if we wanted to
> name it that way.

Yeh, agreed: min_wal_segments. I realised while having dinner it was the
opposite, so I'm pleased everybody else got there at the same time.

-- Simon Riggs           www.2ndQuadrant.com



Kevin Grittner wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
>  
>> Yeah, min_wal_segments or something would make sense.
>  
> Surely it would confuse people to see they have fewer than
> min_wal_segments WAL segments.

Umm, they wouldn't see that; that's the point of the setting. The
segments are not removed/recycled until there are min_wal_segments
segments in pg_xlog. The exception is right after you set or increase
the setting, when that many segments haven't been generated yet.
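
A minimal sketch of that behaviour, under stated assumptions: segment numbers are modelled as plain integers and the function name is made up for the example, so this is an illustration of the semantics rather than the server's actual logic.

```python
# Illustrative model only; segment numbers are plain integers here,
# which is a simplification of how the server identifies WAL files.

def segments_to_remove(existing, current_segno, checkpoint_segno,
                       wal_keep_segments):
    """Return the segments a checkpoint could remove or recycle."""
    # Oldest segment still needed for crash recovery.
    keep_from = checkpoint_segno
    # Hold back a further wal_keep_segments behind the current segment.
    keep_from = min(keep_from, current_segno - wal_keep_segments)
    return sorted(s for s in existing if s < keep_from)
```

Shortly after the setting is raised, fewer segments exist than were asked for and nothing is removed; once enough have been generated, at least that many stay in pg_xlog.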

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> Kevin Grittner wrote:
>> Surely it would confuse people to see they have fewer than
>> min_wal_segments WAL segments.
> 
> they wouldn't see that, that's the point of the setting.

I was thinking, in particular, about beginners poking around to see
how things look after an initdb.  Perhaps that state is too
transient to matter, but it struck me that you'd have fewer than the
minimum at the precise time a beginner might be likely to take a
look.  Unless on startup (and reload?) we created min_wal_segments
WAL segments if they didn't already exist.
-Kevin


On Fri, 2010-04-30 at 13:41 -0500, Kevin Grittner wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
>  
> > Yeah, min_wal_segments or something would make sense.
>  
> Surely it would confuse people to see they have fewer than
> min_wal_segments WAL segments.

That does sound like a reasonable argument, though it also applies to
wal_keep_segments, so isn't an argument either way. The user will be
equally confused to see fewer WAL files than they have asked to "keep".

min_wal_segments is much clearer, IMHO.

-- Simon Riggs           www.2ndQuadrant.com



Simon Riggs <simon@2ndQuadrant.com> wrote:
> On Fri, 2010-04-30 at 13:41 -0500, Kevin Grittner wrote:
>> Surely it would confuse people to see they have fewer than
>> min_wal_segments WAL segments.
> 
> That does sound like a reasonable argument, though it also applies
> to wal_keep_segments, so isn't an argument either way. The user
> will be equally confused to see fewer WAL files than they have
> asked to "keep".

The definitions of "keep" in my dictionary include "to restrain from
removal" and "to retain in one's possession".  It defines "minimum"
as "the least quantity assignable, admissible, or possible".  If I'm
understanding the semantics of this GUC (which I'll grant is not a
sure thing), "keep" does a better job of conveying the meaning,
since fewer than that are initially possible, but at least that many
will be *kept* once they exist.

I'm sure I'll figure it out at need, but the assertion that
"minimum" more clearly defines the purpose is shaking *my*
confidence that I understand what the GUC is for.
-Kevin


On Mon, May 3, 2010 at 2:54 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Simon Riggs <simon@2ndQuadrant.com> wrote:
>> On Fri, 2010-04-30 at 13:41 -0500, Kevin Grittner wrote:
>
>>> Surely it would confuse people to see they have fewer than
>>> min_wal_segments WAL segments.
>>
>> That does sound like a reasonable argument, though it also applies
>> to wal_keep_segments, so isn't an argument either way. The user
>> will be equally confused to see fewer WAL files than they have
>> asked to "keep".
>
> The definitions of "keep" in my dictionary include "to restrain from
> removal" and "to retain in one's possession".  It defines "minimum"
> as "the least quantity assignable, admissible, or possible".

It's really both of those things, so we could call it
wal_min_keep_segments, but I think an even better name would be
bikeshed_segments.

...Robert


> It's really both of those things, so we could call it
> wal_min_keep_segments, but I think an even better name would be
> bikeshed_segments.

Speaking from my UI perspective, I don't think users will care what we
call it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
 


Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote:
> > > > 
> > > > (wal_keep_segments can be changed without restarting, right?)
> > > 
> > > Should we allow -1 to mean "keep all segments"?
> > 
> > Why is that not called "max_wal_segments"? wal_keep_segments sounds like
> > > it's been through Google translate.
> 
> LOL, good one.
> 
> I assume it was done so it would start with 'wal', but I see
> 'max_wal_senders', which doesn't start with 'wal' and would match your
> suggestion exactly.  I think we should either rename 'wal_keep_segments'
> or 'max_wal_senders'.

Uh, did we decide that 'wal_keep_segments' was the best name for this
GUC setting?  I know we shipped beta1 using that name.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com


Bruce Momjian <bruce@momjian.us> writes:
> Uh, did we decide that 'wal_keep_segments' was the best name for this
> GUC setting?  I know we shipped beta1 using that name.

I thought min_wal_segments was a reasonable proposal, but it wasn't
clear if there was consensus or not.
        regards, tom lane


On Sat, May 8, 2010 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> Uh, did we decide that 'wal_keep_segments' was the best name for this
>> GUC setting?  I know we shipped beta1 using that name.
>
> I thought min_wal_segments was a reasonable proposal, but it wasn't
> clear if there was consensus or not.

I think most people thought it was another reasonable choice, but I
think the consensus position is probably something like "it's about
the same" rather than "it's definitely better".  We had one or two
people with stronger opinions than that on either side, I believe.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


On Sat, 2010-05-08 at 23:55 -0400, Robert Haas wrote:
> On Sat, May 8, 2010 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> >> Uh, did we decide that 'wal_keep_segments' was the best name for this
> >> GUC setting?  I know we shipped beta1 using that name.
> >
> > I thought min_wal_segments was a reasonable proposal, but it wasn't
> > clear if there was consensus or not.
> 
> I think most people thought it was another reasonable choice, but I
> think the consensus position is probably something like "it's about
> the same" rather than "it's definitely better".  We had one or two
> people with stronger opinions than that on either side, I believe.

It's only a name and not worth a long discussion on.

-- Simon Riggs           www.2ndQuadrant.com



Robert Haas wrote:
> On Sat, May 8, 2010 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> >> Uh, did we decide that 'wal_keep_segments' was the best name for this
> >> GUC setting?  I know we shipped beta1 using that name.
> >
> > I thought min_wal_segments was a reasonable proposal, but it wasn't
> > clear if there was consensus or not.
> 
> I think most people thought it was another reasonable choice, but I
> think the consensus position is probably something like "it's about
> the same" rather than "it's definitely better".  We had one or two
> people with stronger opinions than that on either side, I believe.

Agreed the current name seems OK.  However, was there agreement that
wal_keep_segments = -1 should keep all WAL segments?  I can see that as
useful for cases where you are doing a dump to be transferred to the
slave, and not using archive_command.  This avoids the need for the "set
a huge value" solution.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +



Allow wal_keep_segments to keep all segments

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Robert Haas wrote:
> > On Sat, May 8, 2010 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > Bruce Momjian <bruce@momjian.us> writes:
> > >> Uh, did we decide that 'wal_keep_segments' was the best name for this
> > >> GUC setting?  I know we shipped beta1 using that name.
> > >
> > > I thought min_wal_segments was a reasonable proposal, but it wasn't
> > > clear if there was consensus or not.
> >
> > I think most people thought it was another reasonable choice, but I
> > think the consensus position is probably something like "it's about
> > the same" rather than "it's definitely better".  We had one or two
> > people with stronger opinions than that on either side, I believe.
>
> Agreed the current name seems OK.  However, was there agreement that
> wal_keep_segments = -1 should keep all WAL segments?  I can see that as
> useful for cases where you are doing a dump to be transferred to the
> slave, and not using archive_command.  This avoids the need for the "set
> a huge value" solution.

The attached patch allows wal_keep_segments = -1 to keep all segments;
this is particularly useful for taking a base backup, where you need all
the WAL files during startup of the standby.  I have documented this
usage in the patch as well.

I am thinking of applying this after 9.0 beta2 if there is no objection.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +
Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.280
diff -c -c -r1.280 config.sgml
*** doc/src/sgml/config.sgml    31 May 2010 15:50:48 -0000    1.280
--- doc/src/sgml/config.sgml    2 Jun 2010 19:19:18 -0000
***************
*** 1887,1893 ****
          Specifies the number of past log file segments kept in the
          <filename>pg_xlog</>
          directory, in case a standby server needs to fetch them for streaming
!         replication. Each segment is normally 16 megabytes. If a standby
          server connected to the primary falls behind by more than
          <varname>wal_keep_segments</> segments, the primary might remove
          a WAL segment still needed by the standby, in which case the
--- 1887,1893 ----
          Specifies the number of past log file segments kept in the
          <filename>pg_xlog</>
          directory, in case a standby server needs to fetch them for streaming
!         replication.  Each segment is normally 16 megabytes. If a standby
          server connected to the primary falls behind by more than
          <varname>wal_keep_segments</> segments, the primary might remove
          a WAL segment still needed by the standby, in which case the
***************
*** 1901,1908 ****
          is zero (the default), the system doesn't keep any extra segments
          for standby purposes, and the number of old WAL segments available
          for standbys is determined based only on the location of the previous
!         checkpoint and status of WAL archiving.
!         This parameter can only be set in the <filename>postgresql.conf</>
          file or on the server command line.
         </para>
         </listitem>
--- 1901,1909 ----
          is zero (the default), the system doesn't keep any extra segments
          for standby purposes, and the number of old WAL segments available
          for standbys is determined based only on the location of the previous
!         checkpoint and status of WAL archiving.  If <literal>-1</> is
!         specified, log file segments are kept indefinitely. This
!         parameter can only be set in the <filename>postgresql.conf</>
          file or on the server command line.
         </para>
         </listitem>
Index: doc/src/sgml/high-availability.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/high-availability.sgml,v
retrieving revision 1.70
diff -c -c -r1.70 high-availability.sgml
*** doc/src/sgml/high-availability.sgml    29 May 2010 09:01:10 -0000    1.70
--- doc/src/sgml/high-availability.sgml    2 Jun 2010 19:19:19 -0000
***************
*** 750,756 ****
      If you use streaming replication without file-based continuous
      archiving, you have to set <varname>wal_keep_segments</> in the master
      to a value high enough to ensure that old WAL segments are not recycled
!     too early, while the standby might still need them to catch up. If the
      standby falls behind too much, it needs to be reinitialized from a new
      base backup. If you set up a WAL archive that's accessible from the
      standby, wal_keep_segments is not required as the standby can always
--- 750,760 ----
      If you use streaming replication without file-based continuous
      archiving, you have to set <varname>wal_keep_segments</> in the master
      to a value high enough to ensure that old WAL segments are not recycled
!     too early, while the standby might still need them to catch up. This
!     is particularly important when performing a base backup because the
!     standby will need all WAL segments generated since the start of the
!     backup;  consider setting <varname>wal_keep_segments</> to
!     <literal>-1</> temporarily in such cases.  If the
      standby falls behind too much, it needs to be reinitialized from a new
      base backup. If you set up a WAL archive that's accessible from the
      standby, wal_keep_segments is not required as the standby can always
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.414
diff -c -c -r1.414 xlog.c
*** src/backend/access/transam/xlog.c    27 May 2010 00:38:39 -0000    1.414
--- src/backend/access/transam/xlog.c    2 Jun 2010 19:19:20 -0000
***************
*** 7339,7345 ****
       * Delete old log files (those no longer needed even for previous
       * checkpoint or the standbys in XLOG streaming).
       */
!     if (_logId || _logSeg)
      {
          /*
           * Calculate the last segment that we need to retain because of
--- 7339,7345 ----
       * Delete old log files (those no longer needed even for previous
       * checkpoint or the standbys in XLOG streaming).
       */
!     if ((_logId || _logSeg) && wal_keep_segments != -1)
      {
          /*
           * Calculate the last segment that we need to retain because of
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.554
diff -c -c -r1.554 guc.c
*** src/backend/utils/misc/guc.c    2 May 2010 02:10:33 -0000    1.554
--- src/backend/utils/misc/guc.c    2 Jun 2010 19:19:22 -0000
***************
*** 1661,1667 ****
              NULL
          },
          &wal_keep_segments,
!         0, 0, INT_MAX, NULL, NULL
      },

      {
--- 1661,1667 ----
              NULL
          },
          &wal_keep_segments,
!         0, -1, INT_MAX, NULL, NULL
      },

      {

Re: Allow wal_keep_segments to keep all segments

From
Robert Haas
Date:
On Wed, Jun 2, 2010 at 3:20 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Bruce Momjian wrote:
>> Robert Haas wrote:
>> > On Sat, May 8, 2010 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > > Bruce Momjian <bruce@momjian.us> writes:
>> > >> Uh, did we decide that 'wal_keep_segments' was the best name for this
>> > >> GUC setting?  I know we shipped beta1 using that name.
>> > >
>> > > I thought min_wal_segments was a reasonable proposal, but it wasn't
>> > > clear if there was consensus or not.
>> >
>> > I think most people thought it was another reasonable choice, but I
>> > think the consensus position is probably something like "it's about
>> > the same" rather than "it's definitely better".  We had one or two
>> > people with stronger opinions than that on either side, I believe.
>>
>> Agreed the current name seems OK.  However, was there agreement that
>> wal_keep_segments = -1 should keep all WAL segments?  I can see that as
>> useful for cases where you are doing a dump to be transferred to the
>> slave, and not using archive_command.  This avoids the need for the "set
>> a huge value" solution.
>
> The attached patch allows wal_keep_segments = -1 to keep all segments;
> this is particularly useful for taking a base backup, where you need all
> the WAL files during startup of the standby.  I have documented this
> usage in the patch as well.
>
> I am thinking of applying this after 9.0 beta2 if there is no objection.

+1 for the patch, but why wait until after beta2?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Allow wal_keep_segments to keep all segments

From
Simon Riggs
Date:
On Wed, 2010-06-02 at 15:20 -0400, Bruce Momjian wrote:

> The attached patch allows wal_keep_segments = -1 to keep all segments;
> this is particularly useful for taking a base backup, where you need all
> the WAL files during startup of the standby.  I have documented this
> usage in the patch as well.
> 
> I am thinking of applying this after 9.0 beta2 if there is no objection.

It's not clear to me why "keep all files until server breaks" is a good
setting. Surely you would set this parameter to the size of your disk.
Why allow it to go higher?

-- Simon Riggs           www.2ndQuadrant.com



Re: Allow wal_keep_segments to keep all segments

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Wed, 2010-06-02 at 15:20 -0400, Bruce Momjian wrote:
> 
> > The attached patch allows wal_keep_segments = -1 to keep all segments;
> > this is particularly useful for taking a base backup, where you need all
> > the WAL files during startup of the standby.  I have documented this
> > usage in the patch as well.
> > 
> > I am thinking of applying this after 9.0 beta2 if there is no objection.
> 
> It's not clear to me why "keep all files until server breaks" is a good
> setting. Surely you would set this parameter to the size of your disk.
> Why allow it to go higher?

Well, the -1 allows them to set it temporarily without having to compute
their free disk space.  Frankly, because the disk space varies, it is
impossible to know exactly how large the disk is at the time it would
fill up.

I think the normal computation would be:

    1) How long is my file system backup and restore to standby
       going to take
    2) How often do I generate a 16MB WAL file

You would do some computation to figure that out, then maybe multiply it
by 10x and set that for wal_keep_segments.  I figured allowing a simple
-1 would be easier.
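
The computation described above can be sketched as follows. This is only an illustration under assumptions: the function name and its inputs are hypothetical, the 10x factor is the rough margin mentioned above, and 16MB is the normal segment size.

```python
import math

# Normal WAL segment size; assumed fixed at 16MB for this sketch.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def estimate_wal_keep_segments(backup_and_restore_secs, secs_per_segment,
                               safety_factor=10):
    """Estimate wal_keep_segments from the two questions above.

    backup_and_restore_secs: how long the file system backup and the
                             restore to the standby are expected to take.
    secs_per_segment: how often a new 16MB WAL file is generated.
    safety_factor: margin multiplier (10x, as suggested above).
    """
    if secs_per_segment <= 0:
        raise ValueError("secs_per_segment must be positive")
    generated = math.ceil(backup_and_restore_secs / secs_per_segment)
    return generated * safety_factor


def disk_space_needed(segments):
    """Bytes of pg_xlog space that many retained segments would occupy."""
    return segments * WAL_SEGMENT_BYTES
```

For example, a two-hour backup and restore with one new segment per minute gives 120 segments, or 1200 after the 10x margin, which is a little under 19GB of pg_xlog.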

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +


Re: Allow wal_keep_segments to keep all segments

From
Bruce Momjian
Date:
Robert Haas wrote:
> > The attached patch allows wal_keep_segments = -1 to keep all segments;
> > this is particularly useful for taking a base backup, where you need all
> > the WAL files during startup of the standby.  I have documented this
> > usage in the patch as well.
> >
> > I am thinking of applying this after 9.0 beta2 if there is no objection.
> 
> +1 for the patch, but why wait until after beta2?

I wanted to give people enough time to review/discuss it.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +


Re: Allow wal_keep_segments to keep all segments

From
Simon Riggs
Date:
On Wed, 2010-06-02 at 20:28 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Wed, 2010-06-02 at 15:20 -0400, Bruce Momjian wrote:
> > 
> > > The attached patch allows wal_keep_segments = -1 to keep all segments;
> > > this is particularly useful for taking a base backup, where you need all
> > > the WAL files during startup of the standby.  I have documented this
> > > usage in the patch as well.
> > > 
> > > I am thinking of applying this after 9.0 beta2 if there is no objection.
> > 
> > It's not clear to me why "keep all files until server breaks" is a good
> > setting. Surely you would set this parameter to the size of your disk.
> > Why allow it to go higher?
> 
> Well, the -1 allows them to set it temporarily without having to compute
> their free disk space.  Frankly, because the disk space varies, it is
> impossible to know exactly how large the disk is at the time it would
> fill up.
> 
> I think the normal computation would be:
> 
>     1) How long is my file system backup and restore to standby
>        going to take
>     2) How often do I generate a 16MB WAL file
> 
> You would do some computation to figure that out, then maybe multiply it
> by 10x and set that for wal_keep_segments.  I figured allowing a simple
> -1 would be easier.

I think it's much easier to find out your free disk space than it is to
calculate how much WAL might be generated during backup. Disk space
doesn't vary significantly on a production database.

If we encourage that laziness then we will get reports that replication
doesn't work and Postgres crashes.
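
A minimal sketch of sizing the setting from free disk space instead. The function names are hypothetical, and the 20 percent reserve is an assumption added for the example, not something proposed in this thread; `shutil.disk_usage` is the standard-library call used to read free space.

```python
import shutil

# Normal WAL segment size; assumed fixed at 16MB for this sketch.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def max_segments_for_free_space(free_bytes, reserve_percent=20):
    """Largest wal_keep_segments that fits in the given free space.

    reserve_percent is an assumed headroom left for everything else
    on the same filesystem; integer arithmetic avoids rounding surprises.
    """
    usable = free_bytes * (100 - reserve_percent) // 100
    return usable // WAL_SEGMENT_BYTES


def max_segments_for_path(path, reserve_percent=20):
    """Same calculation, reading free space for the filesystem at path."""
    free_bytes = shutil.disk_usage(path).free
    return max_segments_for_free_space(free_bytes, reserve_percent)
```

With 100GB free and the assumed reserve, this yields 5120 segments; pointing `max_segments_for_path` at the directory holding pg_xlog gives the figure for a real system.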

-- Simon Riggs           www.2ndQuadrant.com



Re: Allow wal_keep_segments to keep all segments

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Wed, 2010-06-02 at 20:28 -0400, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > On Wed, 2010-06-02 at 15:20 -0400, Bruce Momjian wrote:
> > > 
> > > > The attached patch allows wal_keep_segments = -1 to keep all segments;
> > > > this is particularly useful for taking a base backup, where you need all
> > > > the WAL files during startup of the standby.  I have documented this
> > > > usage in the patch as well.
> > > > 
> > > > I am thinking of applying this after 9.0 beta2 if there is no objection.
> > > 
> > > It's not clear to me why "keep all files until server breaks" is a good
> > > setting. Surely you would set this parameter to the size of your disk.
> > > Why allow it to go higher?
> > 
> > Well, the -1 allows them to set it temporarily without having to compute
> > their free disk space.  Frankly, because the disk space varies, it is
> > impossible to know exactly how large the disk is at the time it would
> > fill up.
> > 
> > I think the normal computation would be:
> > 
> >     1) How long is my file system backup and restore to standby
> >        going to take
> >     2) How often do I generate a 16MB WAL file
> > 
> > You would do some computation to figure that out, then maybe multiply it
> > by 10x and set that for wal_keep_segments.  I figured allowing a simple
> > -1 would be easier.
> 
> I think it's much easier to find out your free disk space than it is to
> calculate how much WAL might be generated during backup. Disk space
> doesn't vary significantly on a production database.
> 
> If we encourage that laziness then we will get reports that replication
> doesn't work and Postgres crashes.

Well, we don't clean out the archive directory so I don't see this as
anything new.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +


Re: Allow wal_keep_segments to keep all segments

From
Heikki Linnakangas
Date:
On 03/06/10 15:15, Bruce Momjian wrote:
> Simon Riggs wrote:
>> I think it's much easier to find out your free disk space than it is to
>> calculate how much WAL might be generated during backup. Disk space
>> doesn't vary significantly on a production database.
>>
>> If we encourage that laziness then we will get reports that replication
>> doesn't work and Postgres crashes.
>
> Well, we don't clean out the archive directory so I don't see this as
> anything new.

We leave that up to the DBA to clean out one way or another. We provide 
restartpoint_command and the %r option in restore_command to help with that.

Surely we don't expect DBAs to delete old files in pg_xlog? I agree with 
Simon here, I think it would be better to not provide -1 as an option 
here. At least you better document well that you should only do that 
temporarily or you will eventually run out of disk space.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Allow wal_keep_segments to keep all segments

From
Bruce Momjian
Date:
Heikki Linnakangas wrote:
> On 03/06/10 15:15, Bruce Momjian wrote:
> > Simon Riggs wrote:
> >> I think it's much easier to find out your free disk space than it is to
> >> calculate how much WAL might be generated during backup. Disk space
> >> doesn't vary significantly on a production database.
> >>
> >> If we encourage that laziness then we will get reports that replication
> >> doesn't work and Postgres crashes.
> >
> > Well, we don't clean out the archive directory so I don't see this as
> > anything new.
>
> We leave that up to the DBA to clean out one way or another. We provide
> restartpoint_command and the %r option in restore_command to help with that.
>
> Surely we don't expect DBAs to delete old files in pg_xlog? I agree with
> Simon here, I think it would be better to not provide -1 as an option
> here. At least you better document well that you should only do that
> temporarily or you will eventually run out of disk space.

Using this only temporarily is mentioned in the doc patch.  Do I need
more?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

Re: Allow wal_keep_segments to keep all segments

From
Bruce Momjian
Date:
Heikki Linnakangas wrote:
> On 03/06/10 15:15, Bruce Momjian wrote:
> > Simon Riggs wrote:
> >> I think it's much easier to find out your free disk space than it is to
> >> calculate how much WAL might be generated during backup. Disk space
> >> doesn't vary significantly on a production database.
> >>
> >> If we encourage that laziness then we will get reports that replication
> >> doesn't work and Postgres crashes.
> >
> > Well, we don't clean out the archive directory so I don't see this as
> > anything new.
>
> We leave that up to the DBA to clean out one way or another. We provide
> restartpoint_command and the %r option in restore_command to help with that.
>
> Surely we don't expect DBAs to delete old files in pg_xlog? I agree with
> Simon here, I think it would be better to not provide -1 as an option
> here. At least you better document well that you should only do that
> temporarily or you will eventually run out of disk space.

I have updated the doc text to mention "temporarily" everywhere '-1' is
mentioned.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +
Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.280
diff -c -c -r1.280 config.sgml
*** doc/src/sgml/config.sgml    31 May 2010 15:50:48 -0000    1.280
--- doc/src/sgml/config.sgml    3 Jun 2010 14:05:21 -0000
***************
*** 1901,1908 ****
          is zero (the default), the system doesn't keep any extra segments
          for standby purposes, and the number of old WAL segments available
          for standbys is determined based only on the location of the previous
!         checkpoint and status of WAL archiving.
!         This parameter can only be set in the <filename>postgresql.conf</>
          file or on the server command line.
         </para>
         </listitem>
--- 1901,1909 ----
          is zero (the default), the system doesn't keep any extra segments
          for standby purposes, and the number of old WAL segments available
          for standbys is determined based only on the location of the previous
!         checkpoint and status of WAL archiving.  To temporarily keep
!         all log file segments, use the value <literal>-1</>. This
!         parameter can only be set in the <filename>postgresql.conf</>
          file or on the server command line.
         </para>
         </listitem>
Index: doc/src/sgml/high-availability.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/high-availability.sgml,v
retrieving revision 1.70
diff -c -c -r1.70 high-availability.sgml
*** doc/src/sgml/high-availability.sgml    29 May 2010 09:01:10 -0000    1.70
--- doc/src/sgml/high-availability.sgml    3 Jun 2010 14:05:21 -0000
***************
*** 750,756 ****
      If you use streaming replication without file-based continuous
      archiving, you have to set <varname>wal_keep_segments</> in the master
      to a value high enough to ensure that old WAL segments are not recycled
!     too early, while the standby might still need them to catch up. If the
      standby falls behind too much, it needs to be reinitialized from a new
      base backup. If you set up a WAL archive that's accessible from the
      standby, wal_keep_segments is not required as the standby can always
--- 750,760 ----
      If you use streaming replication without file-based continuous
      archiving, you have to set <varname>wal_keep_segments</> in the master
      to a value high enough to ensure that old WAL segments are not recycled
!     too early, while the standby might still need them to catch up. This
!     is particularly important when performing a base backup because the
!     standby will need all WAL segments generated since the start of the
!     backup;  consider setting <varname>wal_keep_segments</> to
!     <literal>-1</> temporarily in such cases.  If the
      standby falls behind too much, it needs to be reinitialized from a new
      base backup. If you set up a WAL archive that's accessible from the
      standby, wal_keep_segments is not required as the standby can always
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.415
diff -c -c -r1.415 xlog.c
*** src/backend/access/transam/xlog.c    2 Jun 2010 09:28:44 -0000    1.415
--- src/backend/access/transam/xlog.c    3 Jun 2010 14:05:22 -0000
***************
*** 7337,7343 ****
       * Delete old log files (those no longer needed even for previous
       * checkpoint or the standbys in XLOG streaming).
       */
!     if (_logId || _logSeg)
      {
          /*
           * Calculate the last segment that we need to retain because of
--- 7337,7343 ----
       * Delete old log files (those no longer needed even for previous
       * checkpoint or the standbys in XLOG streaming).
       */
!     if ((_logId || _logSeg) && wal_keep_segments != -1)
      {
          /*
           * Calculate the last segment that we need to retain because of
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.554
diff -c -c -r1.554 guc.c
*** src/backend/utils/misc/guc.c    2 May 2010 02:10:33 -0000    1.554
--- src/backend/utils/misc/guc.c    3 Jun 2010 14:05:25 -0000
***************
*** 1661,1667 ****
              NULL
          },
          &wal_keep_segments,
!         0, 0, INT_MAX, NULL, NULL
      },

      {
--- 1661,1667 ----
              NULL
          },
          &wal_keep_segments,
!         0, -1, INT_MAX, NULL, NULL
      },

      {

Re: Allow wal_keep_segments to keep all segments

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Heikki Linnakangas wrote:
>> Surely we don't expect DBAs to delete old files in pg_xlog? I agree with 
>> Simon here, I think it would be better to not provide -1 as an option 
>> here. At least you better document well that you should only do that 
>> temporarily or you will eventually run out of disk space.

> I have updated the doc text to mention "temporarily" everywhere '-1' is
> mentioned.

FWIW, I've come to agree with Simon.  Allowing -1 doesn't do anything
that you can't do with a large positive setting, and what it does do
is to encourage people to set the variable to an unsafe value as a
substitute for thinking.
        regards, tom lane
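
To put rough numbers on the "large positive setting" alternative, here is a minimal sketch of the arithmetic. It assumes the default 16 MB WAL segment size. The function names are hypothetical and are not part of PostgreSQL. It only estimates the space that wal_keep_segments asks the server to retain, not the total size of pg_xlog.

```c
#include <limits.h>
#include <stdint.h>

/* Assumption: the default WAL segment size of 16 MB. */
#define WAL_SEGMENT_BYTES ((uint64_t) 16 * 1024 * 1024)

/*
 * Hypothetical helper: bytes of WAL that a given wal_keep_segments
 * value asks the server to retain. Zero or negative values retain
 * nothing extra in this model.
 */
uint64_t
kept_wal_bytes(int wal_keep_segments)
{
    if (wal_keep_segments <= 0)
        return 0;
    return (uint64_t) wal_keep_segments * WAL_SEGMENT_BYTES;
}

/*
 * Hypothetical helper: the largest wal_keep_segments value whose
 * retained WAL still fits within a disk budget given in bytes.
 */
int
keep_segments_for_budget(uint64_t budget_bytes)
{
    uint64_t n = budget_bytes / WAL_SEGMENT_BYTES;

    return (n > (uint64_t) INT_MAX) ? INT_MAX : (int) n;
}
```

Under these assumptions a setting of 1000 retains about 15.6 GiB, and a 10 GiB budget corresponds to a setting of 640. Sizing the value from free disk space in this way gives a bounded setting rather than an unbounded one.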


Re: Allow wal_keep_segments to keep all segments

From
Alvaro Herrera
Date:
Excerpts from Bruce Momjian's message of Thu Jun 03 08:36:28 -0400 2010:

> Using this only temporarily is mentioned in the doc patch.  Do I need
> more?

Yeah, it's far too easy to miss.  Besides, I think the wording you used
is ambiguous -- it can be read as "the server will temporarily keep all
segments if you set it to -1", which is not the same thing at all.  If
you can't add a 20-point-font red blinking warning with a pink dancing
elephant in a tutu, maybe it's best to not offer the dangerous setting
in the first place.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Allow wal_keep_segments to keep all segments

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> Excerpts from Bruce Momjian's message of Thu Jun 03 08:36:28 -0400 2010:
> 
> > Using this only temporarily is mentioned in the doc patch.  Do I need
> > more?
> 
> Yeah, it's far too easy to miss.  Besides, I think the wording you used
> is ambiguous -- it can be read as "the server will temporarily keep all
> segments if you set it to -1", which is not the same thing at all.  If
> you can't add a 20-point-font red blinking warning with a pink dancing
> elephant in a tutu, maybe it's best to not offer the dangerous setting
> in the first place.

Well, it seems enough people don't want this feature that I am not
going to add it.  If we decide we want it later, we can add it.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +


Re: Allow wal_keep_segments to keep all segments

From
Andrew Dunstan
Date:

Heikki Linnakangas wrote:
>
> We leave that up to the DBA to clean out one way or another. We 
> provide restartpoint_command and the %r option in restore_command to 
> help with that.
>
>

I was in fact just looking into this, and I see that there is no example 
restartpoint_command script given in the docs, nor in the wiki.

A sample of such a command would be useful. This is all going to feel a 
bit strange to lots of users, and the more we can hold their hands the 
better off we and they will be.

cheers

andrew
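
No such sample appears in this thread. As a rough sketch of the core decision a cleanup command has to make, the code below assumes that %r expands to the name of the oldest WAL file that must still be kept, and that WAL segment files have 24-character uppercase hexadecimal names whose first 8 characters are the timeline ID. The function names are hypothetical. A real command would also need to walk the archive directory and delete the matching files.

```c
#include <stdbool.h>
#include <string.h>

#define WAL_NAME_LEN   24   /* timeline (8) + log (8) + segment (8) hex digits */
#define TLI_PREFIX_LEN 8    /* leading timeline ID, ignored when comparing */

/* Hypothetical helper: true if name looks like a WAL segment file name. */
bool
is_wal_segment_name(const char *name)
{
    return strlen(name) == WAL_NAME_LEN &&
           strspn(name, "0123456789ABCDEF") == WAL_NAME_LEN;
}

/*
 * Hypothetical helper: true if candidate is a WAL segment that sorts
 * strictly before oldest_kept (the value substituted for %r).
 * The timeline prefix is skipped, so only the log and segment parts
 * are compared. Anything that is not a plain segment name (history
 * files, backup labels) is never reported as removable.
 */
bool
wal_file_is_removable(const char *candidate, const char *oldest_kept)
{
    if (!is_wal_segment_name(candidate) || !is_wal_segment_name(oldest_kept))
        return false;

    /* Uppercase hex sorts in numeric order under strcmp in ASCII. */
    return strcmp(candidate + TLI_PREFIX_LEN,
                  oldest_kept + TLI_PREFIX_LEN) < 0;
}
```

The comparison is strict on purpose: the file named by %r itself must be kept, so only older segments qualify. Refusing anything that is not a plain segment name is the conservative choice for a sketch like this.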