Thread: archive wal's failure and load increase.

From: Cedric Boudin
Honorable members of the list,


I would like to share with you a side effect that I discovered today on
our PostgreSQL 8.1 server.
We've been running this instance with PITR for two months now without
any problems.
The WALs are copied to a remote machine with the archive_command and
locally to some other directory.
For some independent reasons we made the remote machine unreachable for
some hours. The archive_command returned, as expected, a failure value.

Now to what puzzles me:
the load on the box, which normally stays between 0.7 and 1.5, suddenly
rose to 4.5-5.5 and process responsiveness got bad.
The pg_xlog directory has plenty of space to keep several days of WALs,
and there were no unfinished backups or anything else that could
apparently have slowed the machine that much.

So the question is: is there a correlation between not getting the WALs
archived and this "massive" load growth?
In my understanding, since the engine has nothing more to do with a
filled-up log except make sure it is archived correctly, there should
not be any significant load increase for this reason. Looking at the
logs, the engine tried approximately every 3 minutes to archive the
WALs.
Is this behaviour expected? If it is, is it reasonable to burden an
engine that is already in an unexpected situation with this IMHO
unnecessary load increase?

Your thoughts are welcome.

Cedric



Re: archive wal's failure and load increase.

From: Tom Lane
Cedric Boudin <cedric@dreamgnu.com> writes:
> So the question is: is there a correlation between not getting the WALs
> archived and this "massive" load growth?

Shouldn't be.  Do you want to force the condition again and try to see
*where* the cycles are going?  "High load factor" alone is a singularly
useless report.  Also, how many unarchived WAL files were there?

            regards, tom lane

Re: archive wal's failure and load increase.

From: Simon Riggs
On Thu, 2006-09-28 at 21:41 +0200, Cedric Boudin wrote:

> I would like to share with you a side effect that I discovered today on
> our PostgreSQL 8.1 server.
> We've been running this instance with PITR for two months now without
> any problems.
> The WALs are copied to a remote machine with the archive_command and
> locally to some other directory.
> For some independent reasons we made the remote machine unreachable for
> some hours. The archive_command returned, as expected, a failure value.
>
> Now to what puzzles me:
> the load on the box, which normally stays between 0.7 and 1.5, suddenly
> rose to 4.5-5.5 and process responsiveness got bad.
> The pg_xlog directory has plenty of space to keep several days of WALs,
> and there were no unfinished backups or anything else that could
> apparently have slowed the machine that much.
>
> So the question is: is there a correlation between not getting the WALs
> archived and this "massive" load growth?
> In my understanding, since the engine has nothing more to do with a
> filled-up log except make sure it is archived correctly, there should
> not be any significant load increase for this reason. Looking at the
> logs, the engine tried approximately every 3 minutes to archive the
> WALs.
> Is this behaviour expected? If it is, is it reasonable to burden an
> engine that is already in an unexpected situation with this IMHO
> unnecessary load increase?

The archiver will attempt to run archive_command 3 times before it
fails. Success or failure should be visible in the logs. The archiver
will try this a *minimum* of every 60 seconds, so if there is a delay of
3 minutes then I'm guessing the archive_command itself has some kind of
timeout on it before failure. That should be investigated.

If archive_command succeeds, the archiver will process all outstanding
files. If it fails, it stops trying - it doesn't retry *every*
outstanding file, so the retries themselves do not grow in cost as the
number of outstanding files increases. So IMHO the archiver itself is
not the source of any issues.
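
For illustration, here is a hedged Python sketch of that retry
behaviour (the real archiver is C code in pgarch.c; ARCHIVE_RETRIES,
CYCLE_TIME and the paths below are simplifications, not the actual
implementation):

    import os
    import subprocess
    import time

    ARCHIVE_RETRIES = 3   # attempts per file before the archiver gives up
    CYCLE_TIME = 60       # seconds between wakeups

    def archive_ready_files(status_dir, command_template):
        # Process every segment flagged ".ready", oldest first.
        for name in sorted(f for f in os.listdir(status_dir)
                           if f.endswith(".ready")):
            segment = name[:-len(".ready")]
            for attempt in range(ARCHIVE_RETRIES):
                rc = subprocess.call(command_template.format(f=segment),
                                     shell=True)
                if rc == 0:
                    # Mark the segment archived (.ready -> .done).
                    os.rename(os.path.join(status_dir, name),
                              os.path.join(status_dir, segment + ".done"))
                    break
                time.sleep(1)
            else:
                # All retries failed: stop for this cycle. The *remaining*
                # files are not retried, so the cost of a failing archiver
                # does not grow with the size of the backlog.
                return False
        return True

    while True:
        archive_ready_files("pg_xlog/archive_status",
                            "cp pg_xlog/{f} /mnt/archive/{f}")
        time.sleep(CYCLE_TIME)  # retry no more often than once a minute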

There is one negative effect from having files waiting to be archived:
every time we switch xlogs we would normally reuse an existing file.
When those files are held back by pending archive operations we are
unable to do that, so we must create a new xlog file, zero it and fsync
it - look at xlog.c:XLogFileInit(). While that occurs, all WAL write
operations are halted, and the log jam that results probably slows the
server down somewhat, since we perform those actions with WALWriteLock
held.
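
A minimal Python sketch of what that foreground step costs
(illustrative only, not the real C code; 16MB is the default segment
size):

    import os

    XLOG_SEG_SIZE = 16 * 1024 * 1024   # default WAL segment size
    XLOG_BLCKSZ = 8192                 # zero-fill in block-sized chunks

    def init_xlog_file(path):
        # Conceptually this all happens with WALWriteLock held, so every
        # backend that wants to write WAL must wait for it to finish.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            zeros = b"\0" * XLOG_BLCKSZ
            for _ in range(XLOG_SEG_SIZE // XLOG_BLCKSZ):
                os.write(fd, zeros)   # 16MB of writes on the foreground path...
            os.fsync(fd)              # ...plus an fsync before WAL writing resumes
        finally:
            os.close(fd)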

We could improve that situation by:
1. (server change) notifying bgwriter that we have an archiver-failure
situation and allowing new xlogs to be created as a background task. We
discussed putting PreallocXlogFiles() in bgwriter once before, but IIRC
the idea was rejected.

2. (server or manual change) preallocating more xlog files

3. (user change) enhancing the archive_command script so that it begins
allowing file reuse once archiving has been failing for a certain length
of time, or once the xlog directory reaches a certain size. You can do
this by having the script try the archive operation and, if it fails
(and has been failing for a while), return a "success" value to the
server so that it can reuse files. That means you start dropping WAL
data and hence would prevent a recovery from going past the point where
you started dropping files - I'd never do that, but some have argued
previously that it might be desirable. A sketch of such a wrapper
follows.
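
For illustration only, a hedged Python sketch of such a wrapper (the
marker file, MAX_FAILURE_AGE and the scp destination are all invented;
deliberately returning success here sacrifices the ability to recover
past that point):

    #!/usr/bin/env python
    # Usage as archive_command: archive_wrapper.py %p %f
    import os
    import subprocess
    import sys
    import time

    MARKER = "/var/lib/pgsql/archive_failing_since"  # records first failure
    MAX_FAILURE_AGE = 6 * 3600                       # after 6 hours, give up

    def main(wal_path, wal_name):
        rc = subprocess.call(["scp", wal_path, "backup:/wal/" + wal_name])
        if rc == 0:
            if os.path.exists(MARKER):
                os.unlink(MARKER)      # archiving works again; reset the clock
            return 0
        if not os.path.exists(MARKER):
            open(MARKER, "w").close()  # remember when the trouble started
            return rc                  # honest failure: server keeps the file
        if time.time() - os.path.getmtime(MARKER) < MAX_FAILURE_AGE:
            return rc                  # still within the grace window
        # Grace window exceeded: report success so the server can recycle
        # the segment. From here on, PITR cannot cross this point.
        return 0

    if __name__ == "__main__":
        sys.exit(main(sys.argv[1], sys.argv[2]))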

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Re: archive wal's failure and load increase.

From: Tom Lane
Simon Riggs <simon@2ndquadrant.com> writes:
> We discussed putting PreallocXlogFiles() in bgwriter once before, but
> IIRC the idea was rejected.

We already do that: it's called a checkpoint.  If the rate of WAL
generation were more than checkpoint_segments per checkpoint_timeout,
then indeed there would be a problem with foreground processes having to
manufacture WAL segment files for themselves, but it would be a bursty
thing (i.e., the problem goes away after a checkpoint, then comes back).
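
To put illustrative numbers on that threshold (assumed example
settings, not figures from Cedric's system):

    # With 16MB segments, a checkpoint can recycle very roughly
    # checkpoint_segments' worth of files for reuse. Sustained WAL
    # generation above that volume per checkpoint_timeout leaves
    # foreground processes creating fresh segments until the next
    # checkpoint catches up.
    seg_size_mb = 16            # default WAL segment size
    checkpoint_segments = 3     # 8.1 default
    checkpoint_timeout = 300    # seconds, 8.1 default
    threshold = checkpoint_segments * seg_size_mb * 1024.0 / checkpoint_timeout
    print("sustained WAL rate threshold: %.0f KB/s" % threshold)  # ~164 KB/s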

It's a good thought but I don't think the theory holds water for
explaining Cedric's problem, unless there was *also* some effect
preventing checkpoints from completing ... which would be a much more
serious problem than the archiver failing.

            regards, tom lane

Re: archive wal's failure and load increase.

From: Simon Riggs
On Fri, 2006-09-29 at 10:29 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > We discussed putting PreallocXlogFiles() in bgwriter once before,
> > but IIRC the idea was rejected.
>
> We already do that: it's called a checkpoint.

Yes, but not enough.

PreallocXlogFiles() adds only a *single* xlog file, sometimes.

On a busy system, that one file would be used up too quickly to make a
difference; after that, foreground processes would be back to creating
new files themselves, as described above.

If it added more than one... it might work better for this case.

> It's a good thought but I don't think the theory holds water for
> explaining Cedric's problem, unless there was *also* some effect
> preventing checkpoints from completing ... which would be a much more
> serious problem than the archiver failing.

Still the best explanation for me.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Re: archive wal's failure and load increase.

From: Tom Lane
Simon Riggs <simon@2ndquadrant.com> writes:
> PreallocXlogFiles() adds only a *single* xlog file, sometimes.

Hm, you are right.  I wonder why it's so unaggressive ... perhaps
because under normal circumstances we soon settle into a steady
state where each checkpoint recycles the right number of files.

            regards, tom lane

Re: archive wal's failure and load increase.

From: Simon Riggs
On Fri, 2006-09-29 at 11:55 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > PreallocXlogFiles() adds only a *single* xlog file, sometimes.
>
> Hm, you are right.  I wonder why it's so unaggressive ... perhaps
> because under normal circumstances we soon settle into a steady
> state where each checkpoint recycles the right number of files.

That is normally the case, yes. But only for people who have correctly
judged (or massively overestimated) what checkpoint_segments should be
set to.

Currently, when we don't have enough, we add one, maybe. When we have
too many, we truncate right back to checkpoint_segments as quickly as
possible.

It seems like we should try to automate that completely for 8.3:
- calculate the number of segments required by keeping a running average
that ignores a single peak value, yet takes 5 consistently high values
as the new average
- add more segments with increasing aggressiveness - 1, 1, 2, 3, 5, 8
segments at a time - when required
- handle out-of-space errors fairly gracefully by waking up the
archiver, complaining to the logs, and eventually preventing
transactions from writing WAL rather than taking the server down
- shrink back more slowly by halving the difference between the
overlimit and the typical value
- get rid of the checkpoint_segments GUC

That should handle peaks caused by data loads, archiving interruptions,
or other peak loads. A sketch of such a controller is below.
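
To make the proposal concrete, here is a minimal Python sketch of that
policy (the class and every name in it are invented; only the growth
sequence and the halving rule come from the list above):

    GROWTH = [1, 1, 2, 3, 5, 8]  # segments to add per successive shortfall

    class SegmentPool:
        # Hypothetical controller for the preallocated-segment count.
        def __init__(self, initial=3):
            self.target = initial    # current estimate of segments needed
            self.highs = []          # consecutive above-target observations
            self.grow_step = 0       # index into GROWTH

        def observe(self, used):
            # A single spike is ignored; 5 consistently high observations
            # become the new typical value.
            if used > self.target:
                self.highs.append(used)
                if len(self.highs) >= 5:
                    self.target = sum(self.highs) // len(self.highs)
                    self.highs = []
            else:
                self.highs = []

        def on_shortfall(self):
            # Ran out of preallocated segments: grow with increasing
            # aggressiveness (1, 1, 2, 3, 5, 8 at a time).
            add = GROWTH[min(self.grow_step, len(GROWTH) - 1)]
            self.grow_step += 1
            return add

        def on_surplus(self, current):
            # Too many segments: halve the difference between the overlimit
            # and the typical value rather than truncating immediately.
            self.grow_step = 0
            return max((current - self.target) // 2, 0)

    pool = SegmentPool()
    for used in [4, 5, 6, 6, 7]:     # five consistently high observations
        pool.observe(used)
    print(pool.target)               # -> 5: the new typical value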

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com