Re: pg_start_backup('label',true) why do I need 2nd parameter? - Mailing list pgsql-general

From Bill Moran
Subject Re: pg_start_backup('label',true) why do I need 2nd parameter?
Date
Msg-id 20131106070305.8e2975540ab1d686cc40147b@potentialtech.com
In response to Re: pg_start_backup('label',true) why do I need 2nd parameter?  (David Johnston <polobo@yahoo.com>)
List pgsql-general
On Tue, 5 Nov 2013 19:27:52 -0800 (PST)
David Johnston <polobo@yahoo.com> wrote:

> Bill Moran wrote
> > How long that takes is a factor of other settings (as David mentioned) and
> > also dependent on what other transactions may be running.
>
> While I am inclined to believe this is true the documentation is unclear
> that "other transactions" have any bearing on the delay.  All the
> documentation says is that the checkpoint I/O will be spread out over time.
> Period.  I could see where if there is no pending checkpoint I/O to perform
> that it will return immediately but does having 100MB of I/O to perform,
> versus 10MB of I/O to perform, cause the delay to increase 9-fold up to a
> maximum of whatever timeframe is configured?
>
> The wording implies that the delay, say 2.5 minutes by default (if I am
> reading that right), will be used regardless so the system will incur a rate
> of 4MB/min of checkpoint I/O in the better case and 40MB/min of checkpoint
> I/O for the worse case.
>
> The other possibility is that there is a floor of 10MB/min of checkpoint I/O
> so the first example only takes 1 minute to return (not 2.5) while the
> second uses the entire allotted time and also must increase the I/O rate.
>
> I'm not sure the precise algorithm needs to be documented but "can take a
> long time to finish" seems to be taking it to the other extreme.  Assuming one
> of the two examples above is correct including such an example in the
> documentation (i.e., comparing 0MB, 10MB, and 100MB of pending checkpoint
> I/O) is a thought.  Also, is there a way to query how much checkpoint I/O
> currently is outstanding?  If so, and the value is meaningful to this
> determination, a cross-reference to said information would be useful.
>
> Also, assuming the algorithm is fairly constant having it documented at this
> level, with maybe an example query, would allow people to calculate roughly
> the amount of time the "false" call will take to return.
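
To put numbers on the two readings quoted above, here is a minimal sketch of
the arithmetic only. It is not PostgreSQL's checkpoint scheduling, and neither
model is confirmed by the documentation. The 2.5-minute window and the
10MB/min floor come straight from the quoted examples; the function names
are made up for illustration.

```python
# Two hypothetical models of how long a spread-out checkpoint might take.
# Both are assumptions taken from the quoted examples, not PostgreSQL behaviour.

WINDOW_MIN = 2.5          # spread window from the quoted example, in minutes
FLOOR_MB_PER_MIN = 10.0   # hypothetical minimum I/O rate from the second example


def fixed_window(pending_mb, window_min=WINDOW_MIN):
    """Model 1: always use the whole window, so the rate scales with the I/O.

    Returns (minutes_to_return, mb_per_minute).
    """
    if pending_mb <= 0:
        return 0.0, 0.0  # nothing pending: returns immediately
    return window_min, pending_mb / window_min


def rate_floor(pending_mb, window_min=WINDOW_MIN, floor=FLOOR_MB_PER_MIN):
    """Model 2: never write slower than `floor`, never exceed the window.

    Returns (minutes_to_return, mb_per_minute).
    """
    if pending_mb <= 0:
        return 0.0, 0.0
    rate = max(floor, pending_mb / window_min)
    return pending_mb / rate, rate


if __name__ == "__main__":
    for mb in (0, 10, 100):
        print(mb, "MB pending:", fixed_window(mb), rate_floor(mb))
```

Under the first model both 10MB and 100MB take the full 2.5 minutes (at 4 and
40MB/min); under the second, 10MB returns after 1 minute while 100MB still
takes 2.5. Either way this only covers the checkpoint's own pending I/O and
says nothing about whatever else the server is doing at the time.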

Obviously, opinions may differ, but ...

I don't understand, in the slightest, your focus on this.  In my experience,
if you're running backups, you're willing to wait a little while.  Especially
if you're handling the extra administrative overhead of doing WAL-logged
backups, you're probably waiting hours for them to complete, so what's an
extra minute or two?  I have trouble believing that the second parameter is
ever necessary at all.

That being said, details of the algorithm alone are not going to tell anyone
how long it's going to take.  On a quiet server with fast disks, the wait
is (in my experience) not noticeable.  Of course, at any time a running
transaction could cause it to be very noticeable, and a busy server could
cause it to always take a while.

But the question you pose sounds frighteningly like questions I get from
developers on a regular basis: why does my query run fast when I test it
but slow in production?  The answer is complicated and is actually different
every time it's asked, but the high-level answer that nobody seems to
accept is this: on a computer system that is doing many tasks, the
different tasks impact each other, and it's frequently difficult to
understand the nature of those interactions and their impact.

I mean, who makes a backup and sits and watches it?  I "make" dozens of
backups each day, and I never bother to watch any of them -- I get an
email if one of them fails and once a quarter we run through a recovery
drill to make sure everything is working for really-reals.  These are
the kinds of things that (in my experience) are important and deserve
my attention ... how long it takes for the backup process to wind up is
not, unless it's such an incredibly long time (on the order of hours)
that it interferes with scheduling.

I guess what I'm saying is that from my standpoint I can't imagine a
way of improving the documentation that wouldn't either become horribly
wordy without actually helping, or even more confusing.  Perhaps someone
smarter than me can come up with something, though.

--
Bill Moran <wmoran@potentialtech.com>

