Re: pg_start_backup('label',true) why do I need 2nd parameter? - Mailing list pgsql-general
From | Bill Moran |
---|---|
Subject | Re: pg_start_backup('label',true) why do I need 2nd parameter? |
Date | |
Msg-id | 20131106070305.8e2975540ab1d686cc40147b@potentialtech.com |
In response to | Re: pg_start_backup('label',true) why do I need 2nd parameter? (David Johnston <polobo@yahoo.com>) |
List | pgsql-general |
On Tue, 5 Nov 2013 19:27:52 -0800 (PST) David Johnston <polobo@yahoo.com> wrote:

> Bill Moran wrote
> > How long that takes is a factor of other settings (as David mentioned) and
> > also dependent on what other transactions may be running.
>
> While I am inclined to believe this is true the documentation is unclear
> that "other transactions" have any bearing on the delay. All the
> documentation says is that the checkpoint I/O will be spread out over time.
> Period. I could see where if there is no pending checkpoint I/O to perform
> that it will return immediately but does having 100MB of I/O to perform,
> versus 10MB of I/O to perform, cause the delay to increase 9-fold up to a
> maximum of whatever timeframe is configured?
>
> The wording implies that the delay, say 2.5 minutes by default (if I am
> reading that right), will be used regardless so the system will incur a rate
> of 4MB/min of checkpoint I/O in the better case and 40MB/min of checkpoint
> I/O for the worse case.
>
> The other possibility is that there is a floor of 10MB/min of checkpoint I/O
> so the first example only takes 1 minute to return (not 2.5) while the
> second uses the entire allotted time and also must increase the I/O rate.
>
> I'm not sure the precise algorithm needs to be documented but "can take a
> long time to finish" seems to be taking to the other extreme. Assuming one
> of the two examples above is correct including such an example in the
> documentation (i.e., comparing 0MB, 10MB, and 100MB of pending checkpoint
> I/O) is a thought. Also, is there a way to query how much checkpoint I/O
> currently is outstanding? If so, and the value is meaningful to this
> determination, a cross-reference to said information would be useful.
>
> Also, assuming the algorithm is fairly constant having it documented at this
> level, with maybe an example query, would allow people to calculate roughly
> the amount of time the "false" call will take to return.

Obviously, opinions may differ, but ...

I don't understand, in the slightest, your focus on this. In my experience, if you're running backups, you're willing to wait a little while. Especially if you're handling the extra administrative overhead of doing WAL-logged backups, you're probably waiting hours for them to complete, so what's an extra minute or two? I have trouble believing that the second parameter is ever necessary at all.

That being said, details of the algorithm alone are not going to tell anyone how long it's going to take. On a quiet server with fast disks, the wait is (in my experience) not noticeable. Of course, at any time a running transaction could cause it to be very noticeable, and a busy server could cause it to always take a while.

But the question you pose sounds frighteningly like questions I get from developers on a regular basis: why does my query run fast when I test it but slow in production? The answer is complicated and is actually different every time it's asked, but the high-level answer that nobody seems to accept is this: on a computer system that is doing many tasks, the different tasks impact each other, and it's frequently difficult to understand the nature of those interactions and their impact.

I mean, who makes a backup and sits and watches it? I "make" dozens of backups each day, and I never bother to watch any of them -- I get an email if one of them fails and once a quarter we run through a recovery drill to make sure everything is working for really-reals. These are the kinds of things that (in my experience) are important and deserve my attention ... how long it takes for the backup process to wind up is not, unless it's such an incredibly long time (on the order of hours) that it interferes with scheduling.

I guess what I'm saying is that from my standpoint I can't imagine a way of improving the documentation that wouldn't either become horribly wordy without actually helping, or even more confusing.
Perhaps someone smarter than me can come up with something, though.

-- 
Bill Moran <wmoran@potentialtech.com>
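
For readers following the arithmetic in David's quoted question, the two readings he describes can be sketched as a small calculation. This is only a back-of-the-envelope model of the numbers in the quote (a 2.5-minute window, a 10MB/min floor, and 0MB, 10MB, and 100MB of pending I/O); the function names are hypothetical, and nothing here is taken from PostgreSQL's actual checkpoint code. As the reply above argues, the real wait also depends on whatever else the server is doing, so these figures should not be read as predictions.

```python
# Hypothetical model of the two readings in the quoted question.
# Assumptions (from the quote, not from PostgreSQL source): a 2.5-minute
# spread window and, for the second reading, a 10 MB/min floor rate.

def fixed_window_rate(pending_mb, window_min=2.5):
    """Reading 1: the whole window is always used, so only the rate changes."""
    return pending_mb / window_min


def floor_rate_duration(pending_mb, window_min=2.5, floor_mb_per_min=10.0):
    """Reading 2: I/O runs at no less than the floor rate, capped by the window.

    Returns (duration in minutes, effective rate in MB/min).
    """
    if pending_mb == 0:
        return 0.0, 0.0  # nothing to write, so the call returns immediately
    duration = min(pending_mb / floor_mb_per_min, window_min)
    return duration, pending_mb / duration


if __name__ == "__main__":
    for pending in (0, 10, 100):
        rate1 = fixed_window_rate(pending)
        duration2, rate2 = floor_rate_duration(pending)
        print(f"{pending} MB pending: reading 1 = {rate1} MB/min over the full window; "
              f"reading 2 = {duration2} min at {rate2} MB/min")
```

Under the first reading, 10MB and 100MB both take the full window, at 4MB/min and 40MB/min respectively; under the second, 10MB finishes in 1 minute while 100MB still uses the whole 2.5 minutes at 40MB/min.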