Re: pg_dumpall problem - duplicated users - Mailing list pgsql-admin
| From | Tom Lane |
|---|---|
| Subject | Re: pg_dumpall problem - duplicated users |
| Date | |
| Msg-id | 13804.1125670088@sss.pgh.pa.us |
| In response to | pg_dumpall problem - duplicated users (Przemysław Nogaj <pn@tey.pl>) |
| List | pgsql-admin |
Bartosz Nowak <grubby@go2.pl> writes:
> Tom Lane wrote:
>>> Yeah. We need to look at the contents of the infomask for these rows.

> Item 1 -- Length: 95 Offset: 8096 (0x1fa0) Flags: USED
> XID: min (1) CMIN|XMAX: 50469 CMAX|XVAC: 0
> Block Id: 0 linp Index: 29 Attributes: 8 Size: 24
> infomask: 0x0501 (HASNULL|XMIN_COMMITTED|XMAX_COMMITTED)
> t_bits: [0]: 0x1f
> 1fa0: 01000000 25c50000 00000000 00000000 ....%...........
> 1fb0: 1d000800 0105181f 706f7374 67726573 ........postgres

Well, you are definitely suffering from transaction ID wraparound. Other rows in the table bear XIDs as high as 2559800779, so your current transaction counter is at least that much. Transaction 50469, which obsoleted this row, was a *long* time ago.

The reason that this row is suddenly being seen as valid, rather than deleted, is that the XID counter has wrapped around, and so 50469 is now seen as "in the future" rather than "in the past" --- that is, the code thinks the row is deleted, but by a concurrent transaction that started later than our own transaction. So it's valid, but you can't delete it, because it's already deleted.

I can't see any evidence of data corruption. I think you just forgot to vacuum pg_shadow for a very long time. This row should have been physically deleted long ago --- but evidently VACUUM was never run across pg_shadow, until it was too late.

While you could perhaps manually repair this one row with a hex editor, I don't think that will get you out of trouble. The odds are good that there are similar problems elsewhere, including both system catalogs and user tables (ie, if you weren't vacuuming pg_shadow, what else weren't you vacuuming?). I'm not sure that you are going to be able to get out of this without losing data. Here's what I'd try (but bear in mind that I just made up this procedure on the spot):

1. Stop the postmaster and take a physical dump of the $PGDATA tree, if you don't have one already. This will at least let you get back to where you are now if the rest doesn't work.

2. Restart the postmaster, but don't let anyone else in (for safety I'd suggest modifying pg_hba.conf to guarantee this). You're going to be down for a little bit :-(

3. Do a VACUUM FREEZE (database-wide) in *each* database, or at least each one you care about recovering. *Don't* use FULL.

4. Stop the postmaster again. Use pg_controldata to note the current transaction counter ("latest checkpoint's NextXID"). Then use pg_resetxlog with the -x switch to deliberately back up the XID counter to something less than 2 billion.

5. Restart the postmaster, and try to pg_dump everything. Also do whatever sanity tests you can think of --- look for missing and duplicated rows in particular.

6. If you think you have a good dump, initdb and reload the dump, and you're back in business. If it didn't work, well, revert to your last backup :-(

After you get out of your immediate problem, you had better look at your vacuuming procedures and find out why you are in this fix. See
http://www.postgresql.org/docs/7.3/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

			regards, tom lane
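[Editor's note: a rough shell sketch of steps 2-5 above, under stated assumptions; it is not part of the original message. It assumes a 7.3-era installation with $PGDATA set and the standard tools (pg_ctl, psql, pg_controldata, pg_resetxlog, pg_dumpall) on the PATH. The database names and the -x value are placeholders, not recommendations.]

    # 2. Restart with connections locked down (edit pg_hba.conf first so only you can connect).
    pg_ctl -D "$PGDATA" start

    # 3. VACUUM FREEZE (not FULL) in every database you care about recovering.
    for db in template1 mydb; do            # replace with your real database list
        psql -d "$db" -c "VACUUM FREEZE;"
    done

    # 4. Stop, note the current XID counter, then deliberately back it up
    #    to something well under 2 billion.
    pg_ctl -D "$PGDATA" stop
    pg_controldata "$PGDATA" | grep -i NextXID
    pg_resetxlog -x 1000000 "$PGDATA"       # 1000000 is only an example value

    # 5. Restart and take a dump to sanity-check (look for missing/duplicated rows).
    pg_ctl -D "$PGDATA" start
    pg_dumpall > /tmp/recovery-dump.sql

If the dump looks good, step 6 (initdb and reload) proceeds as described in the message.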