Hello everyone,
my company has been using pg_standby as a replication solution for a
while and it has been working great for our needs. Unfortunately, about
once a month we get the following error on the standby bases:
vacuumdb: vacuuming of database "xxxx" failed: ERROR: could not access
status of transaction 3625953267
DETAIL: Could not open file "pg_clog/0D81": No such file or directory.
I've read your solutions to similar problems posted on this mailing list
but our situation is a bit different. First, we cannot lose any data so
creating a zero-filled clog is out of the question (we did try that
though, but the error came up again with older clogs missing). Second, I
don't believe this is a memory problem (as suggested in other posts)
because this happens on two different machines (we have one master and
two slaves). The important part is that the master database never had
this problem, it only happens on the slaves.
So my questions are:
1) how do the clogs relate to wal shipping based replication? Clearly
the master doesn't need that clog but the slave does.
2) could 'pg_clearxlogtail' in archive_command be a cause of this? This
is our archive_command:
'cat %p | pg_clearxlogtail | gzip -c |
ssh slavehost "cat > /var/lib/postgresql/replication/in/%f"'
3) is there a faster way to debug this problem? Clogs fill slowly. It
takes about a month on a very busy production server for a clog to be
removed by master DB.
More info:
PostgreSQL 8.2.14 64-bit (though this happened in older versions also)
pg_standby from PostgreSQL 8.3.6
I'd be grateful for any ideas on this problem. If needed I can provide
more setup info.
--
Patryk Sidzina