Thread: Data corruption / recover
Hi,
We're dealing with the following situation (PG version is 7.4.30): somehow, a DB has gone missing from pg_database (psql \l listing doesn't show it anymore). More, while you can still \c to it, a \d table listing only shows a fraction of the tables. A \d on the existing tables shows missing columns as well.. However, all the tables (still visible or not) may be queried by selects via psql and the actual data and columns in the result sets seem just fine.
We backed up the data folder and for the last 24h tried to recover the situation to a point where pg_dump works. Unfortunately we haven't been fully successful yet although we tried tons of suggestions in previous similar posts. Reindex didn't help, but a "VACUUM FULL FREEZE [ANALYZE]" did free 70% (14GB to 4) of the used disk space. Of course, the database wasn't vacuumed regularly as it should have been so:
WARNING: some databases have not been vacuumed in over 2 billion transactions
DETAIL: You may have already suffered transaction-wraparound data loss.
A full vacuum without the freeze option does make all databases, tables and users visible, but duplicated, i.e. \l then shows the database twice and a \d on the database lists all tables twice. The postgres user appears twice in pg_user, etc. pg_dump won't work if the database is not listed and won't either if the postgres user is duplicated..
Does this sound familiar to anyone? It's a pity that all the data seems to be there, but we're not able to recover and restore it to a clean DB. We'd appreciate any suggestions.
Thank you!
We're dealing with the following situation (PG version is 7.4.30): somehow, a DB has gone missing from pg_database (psql \l listing doesn't show it anymore). More, while you can still \c to it, a \d table listing only shows a fraction of the tables. A \d on the existing tables shows missing columns as well.. However, all the tables (still visible or not) may be queried by selects via psql and the actual data and columns in the result sets seem just fine.
We backed up the data folder and for the last 24h tried to recover the situation to a point where pg_dump works. Unfortunately we haven't been fully successful yet although we tried tons of suggestions in previous similar posts. Reindex didn't help, but a "VACUUM FULL FREEZE [ANALYZE]" did free 70% (14GB to 4) of the used disk space. Of course, the database wasn't vacuumed regularly as it should have been so:
WARNING: some databases have not been vacuumed in over 2 billion transactions
DETAIL: You may have already suffered transaction-wraparound data loss.
A full vacuum without the freeze option does make all databases, tables and users visible, but duplicated, i.e. \l then shows the database twice and a \d on the database lists all tables twice. The postgres user appears twice in pg_user, etc. pg_dump won't work if the database is not listed and won't either if the postgres user is duplicated..
Does this sound familiar to anyone? It's a pity that all the data seems to be there, but we're not able to recover and restore it to a clean DB. We'd appreciate any suggestions.
Thank you!
Daniel Paval <dani@scorpionsoftware.ro> writes: > We're dealing with the following situation (PG version is 7.4.30): 7.4? I sure hope you're going to migrate onto something less obsolete once you get out of this problem. In any still-in-support release series, autovacuum would have saved you from this. > somehow, a DB has gone missing from pg_database (psql \l listing doesn't > show it anymore). More, while you can still \c to it, a \d table listing > only shows a fraction of the tables. Transaction IDs in the system catalogs have wrapped around, so that some rows appear to be "in the future" and thus invisible. > A full vacuum _without_ the freeze option does make all databases, > tables and users visible, but duplicated, i.e. \l then shows the > database twice and a \d on the database lists all tables twice. Not too surprising. You'll need to examine each pair of duplicate catalog rows, figure out which version is newer, and then delete the older version. (Use the ctid column to disambiguate.) More than likely, there are some duplicates in your data as well as in the system catalogs, so I'm afraid you've got quite a bit of manual repair in front of you. regards, tom lane
On Thu, Feb 28, 2013 at 7:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Daniel Paval <dani@scorpionsoftware.ro> writes: >> We're dealing with the following situation (PG version is 7.4.30): > > 7.4? I sure hope you're going to migrate onto something less obsolete > once you get out of this problem. In any still-in-support release > series, autovacuum would have saved you from this. Also don't later versions refuse to start in anything other than single user mode once they get close to wrap around?
On 28.02.2013 18:48, Scott Marlowe wrote: > On Thu, Feb 28, 2013 at 7:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Daniel Paval <dani@scorpionsoftware.ro> writes: >>> We're dealing with the following situation (PG version is 7.4.30): >> 7.4? I sure hope you're going to migrate onto something less obsolete >> once you get out of this problem. In any still-in-support release >> series, autovacuum would have saved you from this. > Also don't later versions refuse to start in anything other than > single user mode once they get close to wrap around? > > Yes, I found out that newer versions play on the safe side and won't allow wrap-around at all.