Home > mailing lists

Thread: Weird WAL problem - 9.0.3

Re: Weird WAL problem - 9.0.3

From

Adrian Klaver

Date:

13 April 2011, 10:29:02

On Wednesday, April 13, 2011 6:09:25 am Rafael Martinez wrote:

> Hello

> Yesterday we had a weird problem with the pg_xlog partition in one of

> our servers:

> - The amount of WAL files was much higher than (2*checkpoint_segments)+1

> (over 360 WAL files)

Might want to take a look at:

http://www.postgresql.org/docs/9.0/interactive/wal-configuration.html

In particular:

"There will always be at least one WAL segment file, and will normally not be more files than the higher of wal_keep_segments or (2 + checkpoint_completion_target) * checkpoint_segments + 1. Each segment file is normally 16 MB (though this size can be altered when building the server). You can use this to estimate space requirements for WAL. Ordinarily, when old log segment files are no longer needed, they are recycled (renamed to become the next segments in the numbered sequence). If, due to a short-term peak of log output rate, there are more than 3 * checkpoint_segments + 1 segment files, the unneeded segment files will be deleted instead of recycled until the system gets back under this limit."

Adrian Klaver

adrian.klaver@gmail.com

Re: Weird WAL problem - 9.0.3

From

Rafael Martinez

Date:

13 April 2011, 10:47:09

On Wed, 2011-04-13 at 06:28 -0700, Adrian Klaver wrote:
> On Wednesday, April 13, 2011 6:09:25 am Rafael Martinez wrote:

>
> Might want to take a look at:
>
[......]
>  sequence). If, due to a short-term peak of log output rate, there are
> more than 3 * checkpoint_segments + 1 segment files, the unneeded
> segment files will be deleted instead of recycled until the system
> gets back under this limit."
>

Thank you.

This explains the number of WAL files generated, I was not aware of the
(3*checkpoint_segments + 1) limit (I will RTBM better next time)

But this doesn't explain the WAL files not been created/recycled
time-ordered. I wonder if this happened because the partition got full
while the WALs were created/recycled?

regards,
--
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/

Attachment

signature.asc

Re: Weird WAL problem - 9.0.3

From

Tom Lane

Date:

13 April 2011, 13:59:14

Rafael Martinez <r.m.guerrero@usit.uio.no> writes:
> But this doesn't explain the WAL files not been created/recycled
> time-ordered. I wonder if this happened because the partition got full
> while the WALs were created/recycled?

When a checkpoint finishes, it scans the pg_xlog directory to find WAL
files that are no longer needed (because they're before the checkpoint's
WAL replay point).  It will either rename them "forward" to become ready
for future use, or delete them if there are already enough future WAL
files present (as determined by checkpoint_segments).  The order in
which old segments get renamed to be future ones is basically chance,
because it's determined by the order in which readdir() visits them.
So there's no reason to think that their file timestamps will be in
order.

I would expect WAL files that are *behind* the current write point to
have increasing write timestamps.  But not those ahead.

            regards, tom lane