Thread: pg_start_backup('label',true) why do I need 2nd parameter?

pg_start_backup('label',true) why do I need 2nd parameter?

From
AI Rumman
Date:
Hi all,

A few days back, I faced a problem where I pg_start_backup('label') was hang in the server forever.
I stopped the process and then used pg_start_backup('label',true) and  it worked.

Now I am trying to investigate why I need to use true as second parameter and read the doc
"There is an optional second parameter of type boolean. If true, it specifies executing pg_start_backup as quickly as possible. This forces an immediate checkpoint which will cause a spike in I/O operations, slowing any concurrently executing queries." 

I tried to regenerate the scenario where  pg_start_backup('label')  hanged and I failed.
Any idea, how can I regenerate that issue to investigate.

I am using Postgresql 9.2.

Thanks.

Re: pg_start_backup('label',true) why do I need 2nd parameter?

From
David Johnston
Date:
rummandba wrote
> A few days back, I faced a problem where I *pg_start_backup('label') *was
> hang in the server forever.

so, the server is still hung?  can you define what you mean by "hung" - what
did and did not work?


> I stopped the process and then used *pg_start_backup('label',true) *and
> it
> worked.

Ok, so not forever :)  How long then?  Any guess you give will be more
accurate than "forever".

So when you told it to "start now" it did but when you told it "start
whenever you are ready" is said "ok, I'll get back to ya." ...


> This forces an
> immediate checkpoint which will cause a spike in I/O operations, slowing
> any concurrently executing queries."

From the same documentation (section 24.3.3):


> By default, pg_start_backup can take a long time to finish. This is
> because it performs a checkpoint, and the I/O required for the checkpoint
> will be spread out over a significant period of time, by default half your
> inter-checkpoint interval (see the configuration parameter
> checkpoint_completion_target). This is usually what you want, because it
> minimizes the impact on query processing. If you want to start the backup
> as soon as possible, use: [the true parameter]

So assuming defaults you would be expected to wait about 2.5 minutes:

5 minutes default checkpoint_timeout
times .5 default checkpoint_completion_target

[note, I am confused by the wording above if there should be another .50
multiplier]

That is significantly less than forever but it may be more time that you
gave it before cancelling the process...and that assumes you are using
defaults.

Without knowing the setting for "checkpoint_timeout" (and completion target,
but you are less likely to have changed that) it is impossible to know
whether your finite wait was longer than could be expected.

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/pg-start-backup-label-true-why-do-I-need-2nd-parameter-tp5777145p5777153.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: pg_start_backup('label',true) why do I need 2nd parameter?

From
Bill Moran
Date:
On Tue, 5 Nov 2013 15:30:19 -0800 AI Rumman <rummandba@gmail.com> wrote:

> Hi all,
>
> A few days back, I faced a problem where I *pg_start_backup('label') *was
> hang in the server forever.
> I stopped the process and then used *pg_start_backup('label',true) *and  it
> worked.
>
> Now I am trying to investigate why I need to use true as second parameter
> and read the doc
> "There is an optional second parameter of type boolean. If true, it
> specifies executing pg_start_backup as quickly as possible. This forces an
> immediate checkpoint which will cause a spike in I/O operations, slowing
> any concurrently executing queries."

To add to what David said in his email ... essentially with the second
parameter at false (which is the same as omitting it) you are telling
the system to not interfere with other concurrently running processes
and take as much time as necessary to prep the system without interfering.
How long that takes is a factor of other settings (as David mentioned) and
also depedent on what other transactions may be running.

With the second parameter at true, you're telling Postgres to execute the
start backup as fast as possible and if other queries are temporarily
slow or unresponsive while it's getting there, that's OK.

As far as exactly replicating the scenerio, you would need to know what
else was running on the server at the time to exactly replicate it.

>
> I tried to regenerate the scenario where  *pg_start_backup('label')
> *hanged and
> I failed.
> Any idea, how can I regenerate that issue to investigate.
>
> I am using Postgresql 9.2.
>
> Thanks.


--
Bill Moran <wmoran@potentialtech.com>


Re: pg_start_backup('label',true) why do I need 2nd parameter?

From
David Johnston
Date:
Bill Moran wrote
> How long that takes is a factor of other settings (as David mentioned) and
> also depedent on what other transactions may be running.

While I am inclined to believe this is true the documentation is unclear
that "other transactions" have any bearing on the delay.  All the
documentation says is that the checkpoint I/O will be spread out over time.
Period.  I could see where if there is no pending checkpoint I/O to perform
that it will return immediately but does having 100MB of I/O to perform,
versus 10MB of I/O to perform, cause the delay to increase 9-fold up to a
maximum of whatever timeframe is configured?

The wording implies that the delay, say 2.5 minutes by default (if I am
reading that right), will be used regardless so the system will incur a rate
of 4MB/min of checkpoint I/O in the better case and 40MB/min of checkpoint
I/O for the worse case.

The other possibility is that there is a floor of 10MB/min of checkpoint I/O
so the first example only takes 1 minute to return (not 2.5) while the
second uses the entire allotted time and also must increase the I/O rate.

I'm not sure the precise algorithm needs to be documented but "can take a
long time to finish" seems to be taking to the other extreme.  Assuming one
of the two examples above is correct including such an example in the
documentation (i.e., comparing 0MB, 10MB, and 100MB of pending checkpoint
I/O) is a thought.  Also, is there a way to query how much checkpoint I/O
currently is outstanding?  If so, and the value is meaningful to this
determination, a cross-reference to said information would be useful.

Also, assuming the algorithm is fairly constant having it documented at this
level, with maybe an example query, would allow people to calculate roughly
the amount of time the "false" call will take to return.

And yes, I am getting a little carried away here...

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/pg-start-backup-label-true-why-do-I-need-2nd-parameter-tp5777145p5777173.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: pg_start_backup('label',true) why do I need 2nd parameter?

From
Bill Moran
Date:
On Tue, 5 Nov 2013 19:27:52 -0800 (PST)
David Johnston <polobo@yahoo.com> wrote:

> Bill Moran wrote
> > How long that takes is a factor of other settings (as David mentioned) and
> > also depedent on what other transactions may be running.
>
> While I am inclined to believe this is true the documentation is unclear
> that "other transactions" have any bearing on the delay.  All the
> documentation says is that the checkpoint I/O will be spread out over time.
> Period.  I could see where if there is no pending checkpoint I/O to perform
> that it will return immediately but does having 100MB of I/O to perform,
> versus 10MB of I/O to perform, cause the delay to increase 9-fold up to a
> maximum of whatever timeframe is configured?
>
> The wording implies that the delay, say 2.5 minutes by default (if I am
> reading that right), will be used regardless so the system will incur a rate
> of 4MB/min of checkpoint I/O in the better case and 40MB/min of checkpoint
> I/O for the worse case.
>
> The other possibility is that there is a floor of 10MB/min of checkpoint I/O
> so the first example only takes 1 minute to return (not 2.5) while the
> second uses the entire allotted time and also must increase the I/O rate.
>
> I'm not sure the precise algorithm needs to be documented but "can take a
> long time to finish" seems to be taking to the other extreme.  Assuming one
> of the two examples above is correct including such an example in the
> documentation (i.e., comparing 0MB, 10MB, and 100MB of pending checkpoint
> I/O) is a thought.  Also, is there a way to query how much checkpoint I/O
> currently is outstanding?  If so, and the value is meaningful to this
> determination, a cross-reference to said information would be useful.
>
> Also, assuming the algorithm is fairly constant having it documented at this
> level, with maybe an example query, would allow people to calculate roughly
> the amount of time the "false" call will take to return.

Obviously, opinions may differ, but ...

I don't understand, in the slightest, your focus on this.  In my experience,
if you're running backups, you're willing to wait a little while.  Especially
if you're handling the extra administrative overhead of doing wal-logged
backups, you're probably waiting hours for them to complete, so what's an
extra minute or two?  I have trouble believing that the second parameter is
ever necessary at all.

That being said, details of the algorithm alone is not going to tell anyone
how long it's going to take.  On a quiet server with fast disks, the wait
is (in my experience) not noticable.  Of course, at any time a transaction
running could cause it to be very noticable, and a busy server could cause
it to always take a while.

But the question you post sounds frighteningly like questions I get from
developers on a regular basis: why does my query run fast when I test it
but slow in production.  The answer is complicated and is actually different
every time it's asked, but the high-level answer that nobody seems to
accept is this: on a computer system that is doing many tasks, the
different tasks impact each other, and it's frequently difficult to
understand the nature of those interactions and their impact.

I mean, who makes a backup and sits and watches it?  I "make" dozens of
backups each day, and I never bother to watch any of them -- I get an
email if one of them fails and once a quarter we run through a recovery
drill to make sure everything is working for really-reals.  These are
the kinds of things that (in my experience) are important and deserve
my attention ... how long it takes for the backup process to windup is
not, unless it's such an incredibly long time (on the order of hours)
that it interferes with scheduling.

I guess what I'm saying, is that from my standpoint I can't imagine a
way of improving the documentation that wouldn't either become horribly
wordy without actually helping, or even more confusing.  Perhaps someone
smarter than me can come up with something, though.

--
Bill Moran <wmoran@potentialtech.com>