Thread: pg_start_backup() takes too long
Hello, What is the reason for select pg_start_backup('label'); taking 10 minutes on not so loaded system even right after manual checkpoint?
Ivan Zolotukhin wrote: > Hello, > > What is the reason for > > select pg_start_backup('label'); > > taking 10 minutes on not so loaded system even right after manual checkpoint? No idea; something is seriously wrong if that is happening. Do the database server logs or kernel logs show anything unusual? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Hello, Nothing bad both in system and postgres logs :( No serious activity during backup. I've had to change statement_timeout for backup user to make it work. But I cannot reproduce this case unfortunately. Regards, Ivan On Tue, Sep 23, 2008 at 6:18 AM, Bruce Momjian <bruce@momjian.us> wrote: > Ivan Zolotukhin wrote: >> Hello, >> >> What is the reason for >> >> select pg_start_backup('label'); >> >> taking 10 minutes on not so loaded system even right after manual checkpoint? > > No idea; something is seriously wrong if that is happening. Do the > database server logs or kernel logs show anything unusual? > > -- > Bruce Momjian <bruce@momjian.us> http://momjian.us > EnterpriseDB http://enterprisedb.com > > + If your life is a hard drive, Christ can be your backup. + >
Ivan Zolotukhin wrote: > Hello, > > Nothing bad both in system and postgres logs :( No serious activity > during backup. I've had to change statement_timeout for backup user to > make it work. But I cannot reproduce this case unfortunately. This is actually not uncommon and PostgreSQL shows exactly nothing in terms of why it is taking so long. The only assumption I have come up with is that start_backup does cause a checkpoint. Sincerely, Joshua D. Drake
On Sun, 2008-09-28 at 08:35 -0700, Joshua D. Drake wrote: > Ivan Zolotukhin wrote: > > Hello, > > > > Nothing bad both in system and postgres logs :( No serious activity > > during backup. I've had to change statement_timeout for backup user to > > make it work. But I cannot reproduce this case unfortunately. > > This is actually not uncommon and PostgreSQL shows exactly nothing in > terms of why it is taking so long. The only assumption I have come up > with is that start_backup does cause a checkpoint. Yes, it does a normal checkpoint and writes a file. No reason for it to take longer than any other checkpoint. At 8.2 and below checkpoints were frequently delayed on busy systems. This was because of lwlock starvation during commit phase of transactions. That was fixed in 8.3. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Guys, This is all not about checkpoints. As I've mentioned in the first message, even right after manual run of CHECKPOINT command in psql pg_start_backup() takes same time (~10 minutes). Regards, Ivan On Sun, Sep 28, 2008 at 8:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Sun, 2008-09-28 at 08:35 -0700, Joshua D. Drake wrote: >> Ivan Zolotukhin wrote: >> > Hello, >> > >> > Nothing bad both in system and postgres logs :( No serious activity >> > during backup. I've had to change statement_timeout for backup user to >> > make it work. But I cannot reproduce this case unfortunately. >> >> This is actually not uncommon and PostgreSQL shows exactly nothing in >> terms of why it is taking so long. The only assumption I have come up >> with is that start_backup does cause a checkpoint. > > Yes, it does a normal checkpoint and writes a file. No reason for it to > take longer than any other checkpoint. > > At 8.2 and below checkpoints were frequently delayed on busy systems. > This was because of lwlock starvation during commit phase of > transactions. That was fixed in 8.3. > > -- > Simon Riggs www.2ndQuadrant.com > PostgreSQL Training, Services and Support > >
On Mon, 2008-09-29 at 13:39 +0400, Ivan Zolotukhin wrote: > This is all not about checkpoints. As I've mentioned in the first > message, even right after manual run of CHECKPOINT command in psql > pg_start_backup() takes same time (~10 minutes). As explained, there's not very much going on apart from the checkpoint and that can be problematic. What version are you running? What are your checkpoint_timeout and checkpoint_completion_target settings? My guesses are 8.3, 20 minutes, and default. pg_start_backup() doesn't do an immediate checkpoint, it does a smooth one, so doing a CHECKPOINT beforehand should make no difference in 8.3. (Looks at code...) I'm surprised that checkpoint smoothing moves slowly even when it has so little to do. ISTM checkpoint completion target should set its write rate according to the thought that if shared_buffers were all dirty it would write them out in checkpoint_timeout * checkpoint_completion_target seconds. However, what it does is write them *all* out in that time, no matter how many dirty blocks there are. If there is just a few blocks to write, we take the *same* time to write them as if it were all dirty. Which looks fairly ludicrous to me, but the only time that is possible in current code is pg_start_backup() since no other code requests a checkpoint exactly when you ask, but slowly. It makes more sense to have a constant write rate during checkpoint, or at very least a minimum rate during checkpoint. IMHO the checkpoint smoothing code is wrong, but since it only shows itself for pg_start_backup() I think people will say we should change that instead. If we do, then we'll get people saying "how come pg_start_backup() causes such a performance drop?" because we start doing an immediate checkpoint. The idea of this is that "online backup" should have as little effect as possible on normal running. So I suggest we change the checkpoint code instead. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes: > I'm surprised that checkpoint smoothing moves slowly even when it has so > little to do. AFAIK that's operating as designed. The point being that we shouldn't create any more I/O load than we absolutely have to. It's not clear to me that it's a bug for pg_start_backup to take awhile. If it is a bug then I'd vote for just making it do an immediate checkpoint --- that might cause big I/O load but it's hardly likely to be worse than what will happen when you start taking the subsequent filesystem backup. The checkpoint code is too complicated already; I don't want to make it support a third behavior. And I'd vote against taking out the current default behavior. regards, tom lane
On Mon, 2008-09-29 at 08:35 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > I'm surprised that checkpoint smoothing moves slowly even when it has so > > little to do. > > AFAIK that's operating as designed. The point being that we shouldn't > create any more I/O load than we absolutely have to. > > It's not clear to me that it's a bug for pg_start_backup to take awhile. > If it is a bug then I'd vote for just making it do an immediate > checkpoint --- that might cause big I/O load but it's hardly likely to > be worse than what will happen when you start taking the subsequent > filesystem backup. It was a clear intention for it to *not* cause a spike if we could avoid it. The idea was if you wanted it to happen quickly then you could do a checkpoint command first... oh well. People might want to I/O limit the backup also, which they can do without needing to let us know. I'm happy to put an option in for this, so we have another function: pg_start_backup(label text, immediate_chkpt boolean). I'll not be rushing to do this though given my current TODO. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Mon, 29 Sep 2008, Simon Riggs wrote: > I'm surprised that checkpoint smoothing moves slowly even when it has so > little to do. ISTM checkpoint completion target should set its write > rate according to the thought that if shared_buffers were all dirty it > would write them out in checkpoint_timeout * > checkpoint_completion_target seconds. However, what it does is write > them *all* out in that time, no matter how many dirty blocks there are. > If there is just a few blocks to write, we take the *same* time to write > them as if it were all dirty. The checkpoint smoothing code that made it into 8.3 missed a couple of nice to have features that just didn't make the schedule cut-off. Enforcing a minimum rate was one, another was smoothing fsync calls. Back when we were talking about the patch to sort writes at checkpoint time, someone (I think you actually) commented that it might be worthwile to create some sort of hook for making behavior of checkpoint-time dirty buffer processing easy to change with a custom strategy. The sorted behavior would then be the first such strategy available. Another one I was thinking of was something that specified min+max write writes, which would make this problem go away--might even auto-tune checkpoint_segments or replace it altogether with an implementation based on those inputs. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Simon Riggs wrote: > > If it is a bug then I'd vote for just making it do an immediate > > checkpoint --- that might cause big I/O load but it's hardly likely to > > be worse than what will happen when you start taking the subsequent > > filesystem backup. > > It was a clear intention for it to *not* cause a spike if we could avoid > it. The idea was if you wanted it to happen quickly then you could do a > checkpoint command first... oh well. > > People might want to I/O limit the backup also, which they can do > without needing to let us know. > > I'm happy to put an option in for this, so we have another function: > pg_start_backup(label text, immediate_chkpt boolean). I'll not be > rushing to do this though given my current TODO. I agree with Tom; either we make the pg_start_backup() checkpoint immediate or leave the behavior unchanged. Personally I think immediate makes more sense because issuing pg_start_backup() seems like it should behave like a manual CHECKPOINT command. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Mon, 29 Sep 2008 19:06:46 -0400 (EDT) Bruce Momjian <bruce@momjian.us> wrote: > I agree with Tom; either we make the pg_start_backup() checkpoint > immediate or leave the behavior unchanged. > > Personally I think immediate makes more sense because issuing > pg_start_backup() seems like it should behave like a manual CHECKPOINT > command. > I am not actually that concerned that it takes a long time now that I know why. I am more concerned that it isn't obvious why it takes so long. Something like this would be more than sufficient: elog(NOTICE, "pg_start_backup called, issuing CHECKPOINT"); RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT); Sincerely, Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/
On Mon, Sep 29, 2008 at 2:12 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Mon, 2008-09-29 at 13:39 +0400, Ivan Zolotukhin wrote: > >> This is all not about checkpoints. As I've mentioned in the first >> message, even right after manual run of CHECKPOINT command in psql >> pg_start_backup() takes same time (~10 minutes). > > As explained, there's not very much going on apart from the checkpoint > and that can be problematic. > > What version are you running? > What are your checkpoint_timeout and checkpoint_completion_target > settings? > > My guesses are 8.3, 20 minutes, and default. You're dead right: 8.3.3, 30 minutes (it's intended), and default. > pg_start_backup() doesn't do an immediate checkpoint, it does a smooth > one, so doing a CHECKPOINT beforehand should make no difference in 8.3. Aham, now I see. Just a few points on pg_start_backup() from user point of view. I personally would prefer to have some control over the process, e.g. it would be nice to have proposed pg_start_backup(label text, immediate_chkpt boolean). But if that's not what you want by some other reason, it would be nice to add at least log notice telling that system is going to make a spreaded checkpoint and it's gonna take this approximate time (one can estimate a time limit from config constants, right?).
On Tue, 2008-09-30 at 12:58 +0400, Ivan Zolotukhin wrote: > Just a few points on pg_start_backup() from user point of view. I > personally would prefer to have some control over the process, e.g. it > would be nice to have proposed pg_start_backup(label text, > immediate_chkpt boolean). I've added this function to Infrastructure Changes for Recovery (v8) patch, with a NOTICE message as suggested also. Seemed easier to work on it while I was there. No changes to bgwriter smoothing. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support