On Mar 26, 2014, at 9:04 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Mar 25, 2014 at 6:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tuesday, March 25, 2014, Steven Schlansker <steven@likeness.com> wrote:
> Hi everyone,
>
> I have a Postgres 9.3.3 database machine. Due to some intelligent work on the part of someone who shall remain
> nameless, the WAL archive command included a ‘> /dev/null 2>&1’ which masked archive failures until the disk entirely
> filled with 400GB of pg_xlog entries.
>
> PostgreSQL itself should be logging failures to the server log, regardless of whether those failures log themselves.
>
>
> I have fixed the archive command and can see WAL segments being shipped off of the server, however the xlog remains
> at a stable size and is not shrinking. In fact, it’s still growing at a (much slower) rate.
>
> The leading edge of the log files should be archived as soon as they fill up, and recycled/deleted two checkpoints
> later. The trailing edge should be archived upon checkpoints and then recycled or deleted. I think there is a throttle
> on how many off the trailing edge are archived each checkpoint. So issue a bunch of "CHECKPOINT;" commands for a
> while and see if that clears it up.
Indeed, forcing a bunch of CHECKPOINTS started to get things moving again.
>
> Actually my description is rather garbled, mixing up what I saw when wal_keep_segments was lowered, not when
> recovering from a long lasting archive failure. Nevertheless, checkpoints are what provoke the removal of excessive WAL
> files. Are you logging checkpoints? What do they say? Also, what is in pg_xlog/archive_status ?
>
I do log checkpoints, but most of them recycle and don’t remove:
Mar 26 16:09:36 prd-db1a postgres[29161]: [221-1] db=,user= LOG: checkpoint complete: wrote 177293 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 56 recycled; write=539.838 s, sync=0.049 s, total=539.909 s; sync files=342, longest=0.015s, average=0.000 s
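Recycled segments are renamed for reuse rather than deleted, so only the "removed" count actually frees disk. To track that across many checkpoints, the counters can be pulled out of each log line. This is only a rough Python sketch: it assumes the 9.3-style message format shown above and the default 16 MB WAL segment size, and the names checkpoint_counts and freed_mb are illustrative, not part of PostgreSQL.

```python
import re

# Matches the segment counters in a 9.3-style "checkpoint complete" message.
# The space inside "transaction log" is treated as optional.
CHECKPOINT_RE = re.compile(
    r"checkpoint complete:.*?(\d+)\s+transaction\s*log file\(s\) added, "
    r"(\d+) removed, (\d+) recycled"
)

SEGMENT_MB = 16  # assumption: default WAL segment size


def checkpoint_counts(line):
    """Return (added, removed, recycled), or None for any other log line."""
    m = CHECKPOINT_RE.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


def freed_mb(counts):
    """Disk space released by one checkpoint: only removed segments count."""
    _added, removed, _recycled = counts
    return removed * SEGMENT_MB
```

For the line above, checkpoint_counts gives 0 added, 0 removed and 56 recycled, so freed_mb reports nothing released by that checkpoint.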
That said, after letting the db run / checkpoint / archive overnight, the xlog did indeed start to slowly shrink. The
pace at which it is shrinking is somewhat unsatisfying, but at least we are making progress now!
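The remaining backlog can be gauged from pg_xlog/archive_status, which Jeff asked about: a segment with a .ready file is still waiting to be archived and cannot be removed yet, while a .done file marks one already archived. A minimal Python sketch for counting them follows; the function name archive_backlog is illustrative, and the directory path has to be supplied for the local install.

```python
import os


def archive_backlog(status_dir):
    """Count segments awaiting archiving (.ready) and already archived (.done)."""
    ready = done = 0
    for name in os.listdir(status_dir):
        if name.endswith(".ready"):
            ready += 1
        elif name.endswith(".done"):
            done += 1
    return ready, done
```

Assuming the default 16 MB segment size, the .ready count times 16 MB roughly approximates how much of pg_xlog is still pinned by the archiver.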
I guess if I had just been patient I could have saved some mailing list traffic. But patience is hard when your
production database system is running at 0% free disk :)
Thanks everyone for the help. If the log continues to shrink, I should be out of the woods now.
Best,
Steven