On Fri, 2006-05-05 at 14:31 -0400, Tom Lane wrote:
> Rod Taylor <pg@rbt.ca> writes:
> > At some point it must have failed in copying the data across, aborted,
> > and restarted.
>
> Unless you had an actual backend crash, that's not an adequate
> explanation. Transaction abort does clean up created files.
I only noticed because pg_database_size() didn't match
sum(pg_total_relation_size()), and I was investigating what I thought
was a bug in one of those functions.
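
For anyone wanting to repeat that check, here is a minimal sketch of the comparison. It assumes only ordinary tables (relkind = 'r') need summing, since pg_total_relation_size() already includes each table's indexes and TOAST data. A small gap is expected either way, because pg_database_size() also counts sequences and a few non-relation files in the database directory.

```sql
-- Compare the on-disk size of the current database with the summed
-- size of its tables (each including indexes and TOAST).
SELECT pg_database_size(current_database()) AS database_bytes,
       (SELECT sum(pg_total_relation_size(oid))
          FROM pg_class
         WHERE relkind = 'r') AS relation_bytes;
```

A difference far larger than the sequences and housekeeping files can account for is what hints at files no catalog entry points to.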
I'm afraid we don't have all of the monitoring, logging, and change
control bits hooked up to the non-production DBs, so that is pretty much
all I have other than conjecture.
The only thing I can come up with is that perhaps someone forcefully
gave it a kick. SIGTERM is occasionally necessary to unwedge a stuck
db connection (killing the client script doesn't always stop the
backend immediately).
Slony holds open a transaction on the master while reindexing the slave,
so perhaps someone thought the slave needed help. Making a copy of the
master takes several weeks. They may have killed slony, found the
statements still working away, SIGTERM'd them both, then restarted
slony. It wouldn't be an unusual pattern of events, particularly since
they've not been taught about pg_cancel_backend() yet (no 8.1 training).
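
For reference, a sketch of the gentler route: pg_cancel_backend() asks a backend to cancel its current query (it sends SIGINT rather than SIGTERM), so the connection survives and the transaction aborts normally. The pid 12345 below is a placeholder, and the column names are the 8.1-era ones (procpid, current_query); releases from 9.2 on call them pid and query.

```sql
-- Find the backend running the stuck statement.
SELECT procpid, usename, current_query
  FROM pg_stat_activity;

-- Cancel only its current query. 12345 is a hypothetical pid
-- taken from the output of the query above.
SELECT pg_cancel_backend(12345);
```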
How about this?
BEGIN;
TRUNCATE;
COPY;
REINDEX;  -- SIGTERM arrives during the REINDEX
After the abort, pg_class still references the old files. Are the new
files, created by the aborted transaction, left behind on disk?
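
One rough way to test that theory, sketched under the assumption that everything lives in the default tablespace: list the file names pg_class currently points at and compare them by hand with the contents of the database's directory under $PGDATA/base. Numeric files there that match no relfilenode (ignoring segment suffixes such as .1) would be candidates for leftovers.

```sql
-- Name of this database's directory under $PGDATA/base.
SELECT oid FROM pg_database WHERE datname = current_database();

-- File names that pg_class currently references.
SELECT relfilenode, relname, relkind
  FROM pg_class
 WHERE relfilenode <> 0
 ORDER BY relfilenode;
```

On newer releases some system catalogs report relfilenode 0 and need pg_relation_filenode() instead, so treat any mismatch as a lead to investigate, not as proof.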