Thread: WAL archiving failure and load increase.
Honorable members of the list,

I would like to share with you a side effect that I discovered today on our PostgreSQL 8.1 server. We've been running this instance with PITR for two months now without any problems. The WAL files are copied to a remote machine with the archive_command, and locally to another directory.

For unrelated reasons we made the remote machine unreachable for some hours. The archive_command returned a failure value, as expected.

Now to what puzzles me: the load on the box, which normally stays between 0.7 and 1.5, suddenly rose to 4.5-5.5 and process responsiveness got bad. The pg_xlog directory has plenty of space to keep several days of WAL files, and there were no unfinished backups or anything else that could obviously have slowed the machine that much.

So the question is: is there a correlation between the WAL files not getting archived and this "massive" load growth? In my understanding, since the engine has nothing more to do with a filled-up log except make sure it is archived correctly, there should not be any significant load increase for this reason. Looking at the logs, the engine tried approximately every 3 minutes to archive the WAL files.

Is this behaviour expected? If it is, is it reasonable to burden an engine that is already in an unexpected situation with some, IMHO, unnecessary extra load?

Your thoughts are welcome,

Cedric
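For context, a setup like the one described above is usually wired up through archive_command in postgresql.conf. The sketch below is a hypothetical example of such a dual-destination configuration; the script name, local directory and remote host are illustrative assumptions, not details from the report:

    # postgresql.conf (8.1), hypothetical PITR archiving setup:
    #   archive_command = '/usr/local/bin/archive_wal.py %p %f'

    #!/usr/bin/env python
    # Hypothetical archive script: keep a local copy and push to a remote host.
    # Exit status 0 tells the server the segment is safely archived; anything
    # else makes the archiver retry later.
    import shutil, subprocess, sys

    LOCAL_DIR = "/var/lib/pgsql/wal_archive"   # assumed local archive directory
    REMOTE    = "backup:/srv/wal"              # assumed remote destination

    wal_path, wal_name = sys.argv[1], sys.argv[2]        # %p and %f from the server
    shutil.copy(wal_path, "%s/%s" % (LOCAL_DIR, wal_name))
    rc = subprocess.call(["scp", wal_path, "%s/%s" % (REMOTE, wal_name)])
    sys.exit(rc)   # non-zero when the remote copy fails, e.g. host unreachable

With a script like this, an unreachable remote host makes every archive attempt return non-zero, which is the failure mode the thread is about.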
Cedric Boudin <cedric@dreamgnu.com> writes:
> So the question is: is there a correlation between the WAL files not
> getting archived and this "massive" load growth?

Shouldn't be. Do you want to force the condition again and try to see *where* the cycles are going? "High load factor" alone is a singularly useless report. Also, how many unarchived WAL files were there?

			regards, tom lane
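One way to answer Tom's last question on an 8.1 server: segments still waiting to be archived are flagged by .ready files under pg_xlog/archive_status, so counting those gives the backlog. A minimal sketch, assuming a stock data directory location:

    # Count WAL segments still waiting to be archived (8.1-era layout assumed).
    import glob
    ready = glob.glob("/var/lib/pgsql/data/pg_xlog/archive_status/*.ready")
    print("%d WAL segments waiting to be archived" % len(ready))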
On Thu, 2006-09-28 at 21:41 +0200, Cedric Boudin wrote:
> I would like to share with you a side effect that I discovered today on
> our PostgreSQL 8.1 server. We've been running this instance with PITR
> for two months now without any problems. The WAL files are copied to a
> remote machine with the archive_command, and locally to another
> directory.
>
> For unrelated reasons we made the remote machine unreachable for some
> hours. The archive_command returned a failure value, as expected.
>
> Now to what puzzles me: the load on the box, which normally stays
> between 0.7 and 1.5, suddenly rose to 4.5-5.5 and process responsiveness
> got bad. The pg_xlog directory has plenty of space to keep several days
> of WAL files, and there were no unfinished backups or anything else that
> could obviously have slowed the machine that much.
>
> So the question is: is there a correlation between the WAL files not
> getting archived and this "massive" load growth? In my understanding,
> since the engine has nothing more to do with a filled-up log except make
> sure it is archived correctly, there should not be any significant load
> increase for this reason. Looking at the logs, the engine tried
> approximately every 3 minutes to archive the WAL files.
>
> Is this behaviour expected? If it is, is it reasonable to burden an
> engine that is already in an unexpected situation with some, IMHO,
> unnecessary extra load?

The archiver will attempt to run archive_command 3 times before it fails. Success or failure should be visible in the logs. The archiver will try this a *minimum* of every 60 seconds, so if there is a delay of 3 minutes then I'm guessing the archive_command itself has some kind of timeout on it before failure. That should be investigated.

If archive_command succeeds then the archiver will process all outstanding files. If it fails then it stops trying - it doesn't retry *every* outstanding file, so the retries themselves do not grow in cost as the number of outstanding files increases. So IMHO the archiver itself is not the source of any issues.

There is one negative effect from having outstanding unarchived files: every time we switch xlogs we would normally reuse an existing file. When those files are locked because of pending archive operations we are unable to do that, so we must create a new xlog file, zero it and fsync it - look at xlog.c:XLogFileInit(). While that occurs, all WAL write operations are halted, and the log jam that results probably slows the server down somewhat, since we perform those actions with WALWriteLock held.

We could improve that situation by:

1. (server change) notifying bgwriter that we have an archiver failure situation and allowing new xlogs to be created as a background task. We discussed putting PreallocXlogFiles() in bgwriter once before, but I think last time we discussed that idea it was rejected, IIRC.

2. (server or manual change) preallocating more xlog files.

3. (user change) enhancing the archive_command script so that it lets the server begin reusing files once archiving has been failing for a certain length of time, or once the xlog directory grows past a certain size. You can do this by having the script try the archive operation and, if it fails (and has been failing), return a "success" value to the server so that it can reuse files. That means you start dropping WAL data and hence would prevent a recovery from going past the point you started dropping files - I'd never do that, but some have argued previously that it might be desirable. (A sketch of such a wrapper follows this message.)

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
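A variant of the earlier archive script sketch, showing the kind of wrapper option 3 describes. The marker file, remote host and two-hour give-up threshold are hypothetical, and, as Simon warns, returning success here deliberately throws WAL away and caps how far a PITR recovery can go:

    #!/usr/bin/env python
    # Hypothetical archive_command wrapper: copy the WAL segment to a remote
    # host, but once archiving has been failing for long enough, report
    # "success" so the server can recycle segments again.
    import os, subprocess, sys, time

    REMOTE        = "backup:/srv/wal"                        # hypothetical destination
    STAMP         = "/var/lib/pgsql/archive_failing_since"   # hypothetical marker file
    GIVE_UP_AFTER = 2 * 3600                                 # give up after two hours of failures

    def main():
        wal_path, wal_name = sys.argv[1], sys.argv[2]        # %p and %f from the server
        rc = subprocess.call(["scp", wal_path, "%s/%s" % (REMOTE, wal_name)])
        if rc == 0:
            if os.path.exists(STAMP):
                os.unlink(STAMP)                             # archiving works again
            return 0
        # Archiving failed: remember when the failures started.
        if not os.path.exists(STAMP):
            open(STAMP, "w").write(str(time.time()))
            return rc
        failing_since = float(open(STAMP).read())
        if time.time() - failing_since > GIVE_UP_AFTER:
            return 0        # lie to the server so it can reuse the segment
        return rc

    if __name__ == "__main__":
        sys.exit(main())

The marker file is just one way of tracking how long the failures have lasted; any variant that returns success while the archive destination is unreachable has the same trade-off of dropping WAL data.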
Simon Riggs <simon@2ndquadrant.com> writes:
> We discussed putting PreallocXlogFiles() in bgwriter once before, but I
> think last time we discussed that idea it was rejected, IIRC.

We already do that: it's called a checkpoint. If the rate of WAL generation was more than checkpoint_segments per checkpoint_timeout, then indeed there would be a problem with foreground processes having to manufacture WAL segment files for themselves, but it would be a bursty thing (ie, problem goes away after a checkpoint, then comes back).

It's a good thought but I don't think the theory holds water for explaining Cedric's problem, unless there was *also* some effect preventing checkpoints from completing ... which would be a much more serious problem than the archiver failing.

			regards, tom lane
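To put a rough number on "checkpoint_segments per checkpoint_timeout": with the 8.1 defaults (assumed here) of checkpoint_segments = 3, 16 MB segments and checkpoint_timeout = 300 s, the threshold Tom describes works out as follows:

    # Rough threshold for Tom's condition, using assumed 8.1 defaults.
    segment_mb          = 16     # WAL segment size
    checkpoint_segments = 3      # default in 8.1
    checkpoint_timeout  = 300    # seconds, default in 8.1

    wal_per_cycle_mb = checkpoint_segments * segment_mb               # 48 MB
    sustained_rate   = wal_per_cycle_mb * 1024.0 / checkpoint_timeout
    print("~%.0f kB/s of WAL before backends must create segments" % sustained_rate)
    # ~164 kB/s; above roughly that sustained rate, foreground backends hit
    # XLogFileInit() themselves between checkpoints.

This is only an illustration of the stated condition; the exact recycling behaviour also depends on how many segments the previous checkpoint kept around.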
On Fri, 2006-09-29 at 10:29 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > We discussed putting PreallocXlogFiles() in bgwriter once before, but I
> > think last time we discussed that idea it was rejected, IIRC.
>
> We already do that: it's called a checkpoint.

Yes, but not enough. PreallocXlogFiles() adds only a *single* xlog file, sometimes. On a busy system that would be used up too quickly to make a difference, and after that the cost of creating new files in the foreground would continue as described. If it added more than one, it might work better for this case.

> It's a good thought but I don't think the theory holds water for
> explaining Cedric's problem, unless there was *also* some effect
> preventing checkpoints from completing ... which would be a much more
> serious problem than the archiver failing.

Still the best explanation for me.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes:
> PreallocXlogFiles() adds only a *single* xlog file, sometimes.

Hm, you are right. I wonder why it's so unaggressive ... perhaps because under normal circumstances we soon settle into a steady state where each checkpoint recycles the right number of files.

			regards, tom lane
On Fri, 2006-09-29 at 11:55 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > PreallocXlogFiles() adds only a *single* xlog file, sometimes.
>
> Hm, you are right. I wonder why it's so unaggressive ... perhaps
> because under normal circumstances we soon settle into a steady
> state where each checkpoint recycles the right number of files.

That is normally the case, yes. But only for people who have correctly judged (or massively overestimated) what checkpoint_segments should be set to.

Currently, when we don't have enough segments we add one, maybe. When we have too many we truncate right back to checkpoint_segments as quickly as possible.

Seems like we should try to automate that completely for 8.3 (a sketch of the sizing policy follows this message):

- calculate the number required by keeping a running average which ignores a single peak value, yet takes 5 consistently high values as the new average
- add more segments with increasing aggressiveness: 1, 1, 2, 3, 5, 8 segments at a time, as required
- handle out-of-space errors fairly gracefully by waking up the archiver, complaining to the logs, and eventually preventing transactions from writing to the logs rather than taking the server down
- shrink back more slowly, by halving the difference between the overlimit and the typical value
- get rid of the checkpoint_segments GUC

That should handle peaks caused by data loads, archiving interruptions or other peak loadings.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
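Purely as an illustration of the policy Simon sketches (not existing server code), the grow/shrink logic might look something like this. The "5 consistently high samples" window, the 1, 1, 2, 3, 5, 8 growth steps and the halving shrink come from the description above; the class, its method names and the sampling interval are assumptions:

    # Sketch of the proposed adaptive xlog-segment target (illustrative only).

    GROWTH_STEPS = [1, 1, 2, 3, 5, 8]    # increasingly aggressive additions

    class SegmentTarget:
        def __init__(self, initial=3):
            self.target = initial        # segments we aim to keep preallocated
            self.high_samples = 0        # consecutive samples above the target
            self.growth_idx = 0

        def observe(self, segments_in_use):
            """Feed one periodic sample of how many segments are in use."""
            if segments_in_use > self.target:
                self.high_samples += 1
                # A single peak is ignored; 5 consistently high samples raise the target.
                if self.high_samples >= 5:
                    step = GROWTH_STEPS[min(self.growth_idx, len(GROWTH_STEPS) - 1)]
                    self.target += step
                    self.growth_idx += 1
                    self.high_samples = 0
            else:
                self.high_samples = 0
                self.growth_idx = 0
                # Shrink back slowly: halve the gap between the target and what is used.
                if segments_in_use < self.target:
                    self.target -= (self.target - segments_in_use) // 2

    # Example: feed one sample per checkpoint cycle.
    t = SegmentTarget(initial=3)
    for sample in [3, 9, 10, 11, 12, 13, 4, 3]:
        t.observe(sample)
    print(t.target)

A real implementation would sample inside the checkpoint or bgwriter path and would also need the out-of-space handling listed above; this sketch only shows the sizing policy itself.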