BUG #7902: lazy cleanup of extraneous WAL files can cause out of disk issues - Mailing list pgsql-bugs

From jeff@pgexperts.com
Subject BUG #7902: lazy cleanup of extraneous WAL files can cause out of disk issues
Msg-id E1U91WW-0006rq-82@wrigleys.postgresql.org
Responses Re: BUG #7902: lazy cleanup of extraneous WAL files can cause out of disk issues  (Rafael Martinez Guerrero <r.m.guerrero@usit.uio.no>)
Re: BUG #7902: lazy cleanup of extraneous WAL files can cause out of disk issues  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      7902
Logged by:          Jeff Frost
Email address:      jeff@pgexperts.com
PostgreSQL version: 9.2.3
Operating system:   Ubuntu 12.04
Description:

While doing acceptance testing on a new Ubuntu 12.04 PostgreSQL server
running 9.2.3, we set checkpoint_segments = 128,
checkpoint_completion_target = 0.9 and placed pg_xlog on a separate 20G
partition. Also, archive_mode = off on this system.

According to the docs, you would expect the system to keep the number of
WAL segment files close to 3 * checkpoint_segments + 1. Unfortunately, that
does not appear to happen: a pgbench run fills the pg_xlog partition.
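For scale, here is that limit worked out for the settings above. This is a minimal sketch that assumes the default 16 MB WAL segment size; the variable names are illustrative only.

```bash
#!/bin/bash
# Sketch: expected WAL footprint for the settings in this report.
# Assumes the default 16 MB WAL segment size (a compile-time setting in 9.2).
checkpoint_segments=128
segment_mb=16

# Documented target that pg_xlog should stay close to
max_segments=$(( 3 * checkpoint_segments + 1 ))

echo "3 * checkpoint_segments + 1 = ${max_segments} segments"
echo "approx. $(( max_segments * segment_mb )) MB of WAL"
```

That comes to 385 segments, roughly 6 GB, which should fit comfortably on a 20G partition.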

The pgbench run script looks like this:

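#!/bin/bash

# Recreate the test database and initialize it at scale factor 1000
# (100,000 pgbench_accounts rows per unit of scale, so 100 million rows)
dropdb bench
createdb bench
pgbench -i -s 1000 bench
# Refresh planner statistics in all databases, then checkpoint before the run
vacuumdb -a --analyze-only
psql -c "checkpoint"
# 64 clients, 16 worker threads, per-statement latencies (-r), 600 seconds
pgbench -c 64 -j 16 -r -T 600 bench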

While the pgbench run does trigger many xlog-based checkpoints, they never
seem to remove more than a few files; pg_xlog often grows past 20G and the
PostgreSQL service falls over.
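One way to watch this happen during a run is to count segment files in pg_xlog periodically. The sketch below is hypothetical (the function name and the XLOG_DIR variable are not from any PostgreSQL tool); it relies only on WAL segment files being named with 24 uppercase hex digits, so archive_status/, .history and .backup files are skipped.

```bash
#!/bin/bash
# Sketch: count WAL segment files in a pg_xlog directory.
# Hypothetical helper; segment files are named with 24 uppercase hex digits,
# so subdirectories, *.history and *.backup files are not counted.
count_wal_segments() {
  local dir="$1" n=0 f name
  for f in "$dir"/*; do
    name="${f##*/}"
    if [[ -f "$f" && "$name" =~ ^[0-9A-F]{24}$ ]]; then
      n=$(( n + 1 ))
    fi
  done
  echo "$n"
}
```

Pointing XLOG_DIR at the server's pg_xlog (the path is installation-specific) and running `while sleep 10; do echo "$(date +%T) $(count_wal_segments "$XLOG_DIR")"; done` alongside pgbench shows whether the count climbs well past 3 * checkpoint_segments + 1.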

After moving pg_xlog to a larger partition, it appears to peak at about 22G.

A manual checkpoint after the run always brings it back down to ~ 4G in
size.
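Converting these sizes into segment counts makes the gap clearer. This is a rough sketch that assumes "G" here means 1024 MB and that segments are the default 16 MB; the helper name is made up for illustration.

```bash
#!/bin/bash
# Sketch: convert the pg_xlog sizes quoted in this report into segment counts.
# Assumes "G" means 1024 MB and the default 16 MB segment size.
gib_to_segments() {
  echo $(( $1 * 1024 / 16 ))
}

for size in 20 22 4; do
  echo "${size}G ~ $(gib_to_segments "$size") segments"
done
```

Under those assumptions the 22G peak is about 1408 segments. That is more than three and a half times the documented 3 * 128 + 1 = 385, while the ~4G left after a manual checkpoint (about 256 segments) is back under it.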

Interestingly, I was unable to reproduce this with 9.2.3 on our in-house
test system; however, that system has far less RAM and CPU, so this may only
be an issue on larger systems. The system that exhibits the issue has 128G
of RAM and 16 cores (32 with hyperthreading).

I also tested 9.2.2 on the affected system and it behaved the same way.

I hope to test 9.1.8 in the next few days.
