Thread: Proposed toast info extraction function for disaster recovery
I was just helping out some Red Hat colleagues who were trying to deal with a corrupted database in which pg_dump was failing like so: pg_dump: Error message from server: ERROR: missing chunk number 3 for toast value 205026 Frequently the best recovery action for this involves finding and deleting the row that references the TOAST value ... but there is no good way to find that row. AFAIK you have to engage in a very tedious binary search to see where you get the error and where you don't. On a large table this is just not fun at all; especially if you have or think you may have multiple corrupted rows. It strikes me that it'd be worth defining a function that could extract the toast OID of any toasted datum, along the lines of function pg_get_toast_id(anyelement) returns oid When faced with such a problem, you could do select ctid from my_giant_tablewhere pg_get_toast_id(a_toastable_column) = 205026 or pg_get_toast_id(another_toastable_column)= 205026 ...; This would involve a seqscan since there'd be no way to index it, but at least it'd be only one seqscan rather than repeated trials. I'd envision making the thing return NULL if given a varlena value that happened not to be toasted out-of-line. If given a non-varlena data type, it could either return NULL or raise an error, depending on whether you think that represents pilot error or not. Comments? Anyone else ever wished they had this? regards, tom lane
On Tue, Jun 07, 2005 at 05:48:31PM -0400, Tom Lane wrote: > Comments? Anyone else ever wished they had this? Hmm. There's another problem which shows up when the server throws an error about an invalid allocation request. This also happens when a varlena field is corrupted, but you don't have the toast id because the error message doesn't say. I guess one way to solve it would be to add errcontext() calls within toast realms. Now that I think about it, maybe my problem is not related to TOAST at all, but to a corrupted varlena field. So if the corruption does not involve toasting, I'm in the same position as before, i.e. I haven't found out what is the corrupted tuple. -- Alvaro Herrera (<alvherre[a]surnet.cl>) "Saca el libro que tu religión considere como el indicado para encontrar la oración que traiga paz a tu alma. Luego rebootea el computador y ve si funciona" (Carlos Duclós)
Alvaro Herrera <alvherre@surnet.cl> writes: > Now that I think about it, maybe my problem is not related to TOAST at > all, but to a corrupted varlena field. Right. > So if the corruption does not > involve toasting, I'm in the same position as before, i.e. I haven't > found out what is the corrupted tuple. Hmm. Maybe we need something more like a "lint check" for tables, ie run through and look for visibly corrupt data, such as obviously impossible lengths for varlena fields. (I thought about adding more checks to the standard code paths, such as heap_deformtuple, but aside from the speed penalty involved, most of those places don't actually have enough context to issue a useful error message. A "lint check" could reasonably expect to tell you by ctid which tuple has a problem.) Come to think of it, didn't someone already write something close to this a few years ago? regards, tom lane
Tom Lane wrote: > Hmm. Maybe we need something more like a "lint check" for tables, ie > run through and look for visibly corrupt data, such as obviously > impossible lengths for varlena fields. > Come to think of it, didn't someone already write something close to > this a few years ago? Sounds like pgfsck: http://svana.org/kleptog/pgsql/pgfsck.html Unfortunately it doesn't seem to have been updated to work with recent versions of PG (> 7.3) -Neil