Re: Vacuum full - disk space eaten by WAL logfiles - Mailing list pgsql-admin

From Lee Wu
Subject Re: Vacuum full - disk space eaten by WAL logfiles
Date
Msg-id ECAB83AA52BCC043A0E24BBC000010241114E4@mxhq-exch.corp.mxlogic.com
In response to Vacuum full - disk space eaten by WAL logfiles  ("Lee Wu" <Lwu@mxlogic.com>)
Responses Re: Vacuum full - disk space eaten by WAL logfiles  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-admin
Hi Tom,

1. shared_buffers                 | 32768
2. I/O bandwidth is not an issue, to the best of our knowledge
3. It is "vacuum full" as shown:
Jan  8 20:25:38 mybox postgres[8603]: [15] FATAL:  The database system is in recovery mode
Jan  8 20:25:38 mybox postgres[7284]: [14] LOG:  statement: vacuum full analyze the_35G_table
Jan  8 20:25:39 mybox postgres[8604]: [15] FATAL:  The database system is in recovery mode

Also, this error happened on the last 2 Saturdays, and both times it
matched the timing in our vacuum log:
20050108-194000         End vacuum full analyze on table1.
20050108-194000         Begin vacuum full analyze on the_35G_table.
WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost
20050108-202539         End vacuum full analyze on the_35G_table.
20050108-202539         Begin vacuum full analyze on table3.
psql: FATAL:  The database system is in recovery mode

We only run vacuum full on Saturdays. We have not seen this error at
any other time.

4. The PG upgrade is out of my (a DBA's) control
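
For reference, the run time of the failed vacuum in item 3 can be worked out from the Begin/End timestamps in the vacuum log. This is only a minimal sketch; the function name is made up for illustration, and the two timestamps are copied from the log excerpt for the_35G_table.

```python
from datetime import datetime

# Timestamp format used in the vacuum log: YYYYMMDD-HHMMSS.
LOG_TS_FORMAT = "%Y%m%d-%H%M%S"


def elapsed_seconds(begin: str, end: str) -> int:
    """Return whole seconds between two vacuum-log timestamps."""
    start = datetime.strptime(begin, LOG_TS_FORMAT)
    stop = datetime.strptime(end, LOG_TS_FORMAT)
    return int((stop - start).total_seconds())


# Begin/End values for the_35G_table, copied from the log excerpt.
secs = elapsed_seconds("20050108-194000", "20050108-202539")
print(f"{secs // 60} min {secs % 60} s")  # prints "45 min 39 s"
```

So the vacuum full had been running for about 45 minutes when the backend died, and the End timestamp lines up with the 20:25:38 and 20:25:39 syslog entries.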

Thanks,

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, January 10, 2005 2:27 PM
To: Lee Wu
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Vacuum full - disk space eaten by WAL logfiles

"Lee Wu" <Lwu@mxlogic.com> writes:
> When we do weekly "vacuum full", PG uses all space and causes PG down.

This implies that checkpoints aren't completing for some reason.
If they were, they'd be recycling WAL space.
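
One way to check whether WAL space is being recycled is to count the segment files in pg_xlog while the vacuum runs. The following is a rough sketch under assumptions, not something from this thread: the function names are hypothetical, the directory path has to be supplied by the caller, and the "2 * checkpoint_segments + 1" bound is the normal upper limit as I recall it from the documentation of that era.

```python
import os
import re
from typing import Tuple

# WAL segment file names consist only of hex digits:
# 16 of them in the 7.x series, 24 in later releases (assumption noted above).
SEGMENT_NAME = re.compile(r"^(?:[0-9A-F]{16}|[0-9A-F]{24})$")


def wal_usage(xlog_dir: str) -> Tuple[int, int]:
    """Return (segment_count, total_bytes) for WAL segment files in xlog_dir."""
    count = 0
    total = 0
    for entry in os.scandir(xlog_dir):
        # Skip subdirectories and anything that is not named like a segment.
        if entry.is_file() and SEGMENT_NAME.match(entry.name):
            count += 1
            total += entry.stat().st_size
    return count, total


def expected_max_segments(checkpoint_segments: int) -> int:
    """Normal upper bound on segment files when checkpoints complete."""
    return 2 * checkpoint_segments + 1
```

If the count returned by wal_usage() keeps climbing well past expected_max_segments() for the configured checkpoint_segments, that would be consistent with checkpoints not completing.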

I'm not aware of any problems in 7.3 that would block a checkpoint
indefinitely, but we have seen cases where it just took too darn long
to do the checkpoint --- implying either a ridiculously large
shared_buffers setting, or a drastic shortage of I/O bandwidth.

You might want to try strace'ing the checkpoint process to see if it
seems to be making progress or not.

Also, are you certain that this is happening during a VACUUM?  The
log messages you show refer to COPY commands.

>  PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96

Are you aware of the number and significance of post-7.3.2 bug fixes
in the 7.3 branch?  You really ought to be on 7.3.8, if you can't afford
to migrate to 7.4 right now.

            regards, tom lane
