Thread: Weird WAL problem - 9.0.3
Hello Yesterday we had a weird problem with the pg_xlog partition in one of our servers: - The amount of WAL files was much higher than (2*checkpoint_segments)+1 (over 360 WAL files) - The WAL files were not created/recycle time-ordered. Here is an example: ..................... 16777216 Apr 12 17:49 000000010000000D0000001C 16777216 Apr 12 17:49 000000010000000D0000001D 16777216 Apr 12 17:49 000000010000000D0000001E 16777216 Apr 12 17:52 000000010000000D0000001F 16777216 Apr 12 17:50 000000010000000D00000020 16777216 Apr 12 17:51 000000010000000D00000021 16777216 Apr 12 17:49 000000010000000D00000022 16777216 Apr 12 17:49 000000010000000D00000023 16777216 Apr 12 17:49 000000010000000D00000024 16777216 Apr 12 17:51 000000010000000D00000025 ..................... This is the first time I see this behavior with the result of a full pg_xlog partition. This happened when testing an upgrade procedure and moving on the fly with "pg_dumpall | psql" around 30 databases (ca.140GB) from a 8.3 server to a 9.0 one. Is this normal? If it is, how can we find out the max.number of WAL files a 9.0 system can generate in the worst case scenario? Some relevant information about this system: ------------------------------------------------ PostgreSQL 9.0.3 - ext4 - RHEL5.6 - 2.6.18-238.5.1.el5 - x86_64 checkpoint_segments: 128 wal_buffers: 512kB wal_level: archive wal_sync_method: fdatasync shared_buffers: 10GB ------------------------------------------------ regards, -- Rafael Martinez Guerrero Center for Information Technology University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/
Attachment
On Wednesday, April 13, 2011 6:09:25 am Rafael Martinez wrote:
> Hello
>
> Yesterday we had a weird problem with the pg_xlog partition in one of
> our servers:
>
> - The amount of WAL files was much higher than (2*checkpoint_segments)+1
> (over 360 WAL files)
>
Might want to take a look at:
http://www.postgresql.org/docs/9.0/interactive/wal-configuration.html
In particular:
"There will always be at least one WAL segment file, and will normally not be more files than the higher of wal_keep_segments or (2 + checkpoint_completion_target) * checkpoint_segments + 1. Each segment file is normally 16 MB (though this size can be altered when building the server). You can use this to estimate space requirements for WAL. Ordinarily, when old log segment files are no longer needed, they are recycled (renamed to become the next segments in the numbered sequence). If, due to a short-term peak of log output rate, there are more than 3 * checkpoint_segments + 1 segment files, the unneeded segment files will be deleted instead of recycled until the system gets back under this limit."
--
Adrian Klaver
adrian.klaver@gmail.com
On Wed, 2011-04-13 at 06:28 -0700, Adrian Klaver wrote: > On Wednesday, April 13, 2011 6:09:25 am Rafael Martinez wrote: > > Might want to take a look at: > [......] > sequence). If, due to a short-term peak of log output rate, there are > more than 3 * checkpoint_segments + 1 segment files, the unneeded > segment files will be deleted instead of recycled until the system > gets back under this limit." > Thank you. This explains the number of WAL files generated, I was not aware of the (3*checkpoint_segments + 1) limit (I will RTBM better next time) But this doesn't explain the WAL files not been created/recycled time-ordered. I wonder if this happened because the partition got full while the WALs were created/recycled? regards, -- Rafael Martinez Guerrero Center for Information Technology University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/
Attachment
Rafael Martinez <r.m.guerrero@usit.uio.no> writes: > But this doesn't explain the WAL files not been created/recycled > time-ordered. I wonder if this happened because the partition got full > while the WALs were created/recycled? When a checkpoint finishes, it scans the pg_xlog directory to find WAL files that are no longer needed (because they're before the checkpoint's WAL replay point). It will either rename them "forward" to become ready for future use, or delete them if there are already enough future WAL files present (as determined by checkpoint_segments). The order in which old segments get renamed to be future ones is basically chance, because it's determined by the order in which readdir() visits them. So there's no reason to think that their file timestamps will be in order. I would expect WAL files that are *behind* the current write point to have increasing write timestamps. But not those ahead. regards, tom lane