Patryk Sidzina wrote:
> 1) how do the clogs relate to wal shipping based replication? Clearly
> the master doesn't need that clog but the slave does.
>
They should just be kept in sync. There's some useful background on
this topic at
http://old.nabble.com/control-the-number-of-clog-files-and-xlog-files-td19173165.html
> 2) could 'pg_clearxlogtail' in archive_command be a cause of this? This
> is our archive_command:
> 'cat %p | pg_clearxlogtail | gzip -c |
> ssh slavehost "cat > /var/lib/postgresql/replication/in/%f"'
>
Kind of doubt that. If pg_clearxlogtail were mangling your data, I'd
expect a more dramatic failure to restore.
> 3) is there a faster way to debug this problem? Clogs fill slowly. It
> takes about a month on a very busy production server for a clog to be
> removed by master DB.
>
You could create a bunch of transactions and then freeze things,
following the ideas in the reference I suggested above.
> More info:
> PostgreSQL 8.2.14 64-bit (though this happened in older versions also)
> pg_standby from PostgreSQL 8.3.6
>
There was a bug in this area fixed in 8.2.10:
http://www.postgresql.org/docs/8.2/static/release-8-2-10.html
"Fix potential miscalculation of datfrozenxid (Alvaro)
*
This error may explain some recent reports of failure to remove
old pg_clog data."
If you were running this database with a version before that, I wonder
if maybe there's still some junk left behind from that old, buggy
version that's causing your issues. You might try doing some manual
VACUUM or VACUUM FREEZE work to remove any lingering issues and then
re-create your standby systems afterwards. I'm not quite familiar
enough with this specific bug to suggest a clearer resolution path, or
if in fact this is the same issue you're seeing. It sure seems possible
they're related though.
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us