Pg_xlog increase due to postgres crash (disk full) - Mailing list pgsql-general

From Cliff de Carteret
Subject Pg_xlog increase due to postgres crash (disk full)
Date
Msg-id CAC+bnxkjsPHs6np68mVOfsVW8MApEa731w3ZjV6pWXHbSJqTCg@mail.gmail.com
Responses Re: Pg_xlog increase due to postgres crash (disk full)  (Adrian Klaver <adrian.klaver@gmail.com>)
List pgsql-general
My database crashed a couple of days ago during an upgrade, several seconds after committing a large transaction. We eventually found out that the disk was full because the transaction had created several gigabytes of data. A day or so later the disk filled up again and PostgreSQL crashed, this time because the pg_xlog directory was taking up all of the disk space. I have cleaned up the drive to free some extra space, which allows PostgreSQL to start again, but the xlogs are still increasing. I have two errors in my pg_log:

"WARNING: transaction log file "00000001000000A800000078" could not be archived: too many failures" and 

"LOG: archive command failed with exit code 1
DETAIL: The failed archive command was: test ! -f /opt/postgres/remote_pgsql/wal_archive/00000001000000A800000078 && cp pg_xlog/00000001000000A800000078 /opt/postgres/remote_pgsql/wal_archive/00000001000000A800000078"
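
Exit code 1 from this archive command is ambiguous: `test ! -f` returns 1 when the destination file already exists (in which case `cp` never runs), and `cp` itself usually returns 1 on failure. The following is a minimal Python sketch for telling the cases apart; the function name and the returned messages are hypothetical, and the "partial copy" reading is only a guess about what a full disk might have left behind.

```python
import os

def diagnose_archive(segment_path, archive_dir):
    """Guess why `test ! -f DEST && cp SRC DEST` would exit non-zero."""
    dest = os.path.join(archive_dir, os.path.basename(segment_path))
    if os.path.isfile(dest):
        # `test ! -f DEST` returns 1 here, so cp is never attempted.
        if (os.path.isfile(segment_path)
                and os.path.getsize(dest) < os.path.getsize(segment_path)):
            return "destination exists but is smaller than source (possibly a partial copy)"
        return "destination already exists"
    if not os.path.isfile(segment_path):
        return "source segment missing"
    if not os.path.isdir(archive_dir) or not os.access(archive_dir, os.W_OK):
        return "archive directory missing or not writable"
    return "no obvious problem"
```

It would be called with the full path of the segment under pg_xlog and the archive directory from the archive_command, run as the same OS user that runs Postgres so that the permission check is meaningful.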

Postgres version 9.0.3 conf:

    wal_level = hot_standby
    archive_mode = true
    archive_command = 'test ! -f /opt/postgres/remote_pgsql/wal_archive/%f && cp %p /opt/postgres/remote_pgsql/wal_archive/%f' # command to use to archive a logfile segment
    archive_timeout = 1800 
    max_wal_senders = 1
    max_standby_archive_delay = 900s
    max_standby_streaming_delay = 900s
    default_statistics_target = 50 # pgtune wizard 2010-11-18
    maintenance_work_mem = 480MB # pgtune wizard 2010-11-18
    constraint_exclusion = on # pgtune wizard 2010-11-18
    checkpoint_completion_target = 0.9 # pgtune wizard 2010-11-18
    effective_cache_size = 5632MB # pgtune wizard 2010-11-18
    work_mem = 48MB # pgtune wizard 2010-11-18
    wal_buffers = 8MB # pgtune wizard 2010-11-18
    checkpoint_segments = 16 # pgtune wizard 2010-11-18
    shared_buffers = 1920MB # pgtune wizard 2010-11-18
    max_connections = 80 # pgtune wizard 2010-11-18

I've tried stopping Postgres, deleting the 00000001000000A800000078.ready file, and starting Postgres again, but the file appears to be recreated instantly and the error is still in the log file.
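
With archive_mode on, a WAL segment stays in pg_xlog until it has been archived successfully, and each segment still waiting has a `.ready` file under pg_xlog/archive_status. A small sketch, assuming that layout (the function name is hypothetical), lists the backlog so its growth can be watched:

```python
import os

def pending_segments(xlog_dir):
    """Return the names of WAL segments still marked .ready for archiving."""
    status_dir = os.path.join(xlog_dir, "archive_status")
    suffix = ".ready"
    return sorted(
        name[:-len(suffix)]
        for name in os.listdir(status_dir)
        if name.endswith(suffix)
    )
```

Since the archiver works through segments in order, the first entry in the list should be the one it is stuck on, and the length of the list gives a rough idea of how much of pg_xlog is being held back.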

I've read about the pg_resetxlog command, but it would mean having to pg_dump our db, which contains a large amount of blobs, and restore it again. That is highly problematic, as pg_restore has struggled to restore it.

Will setting zero_damaged_pages (true) work in 9.0.1 and would this resolve the issue?

Would creating an empty file and replacing the offending xlog with it work? If so, would the file need to be a specific size?

Any ideas?
