Re: Somewhat automated method of cleaning table of corrupt records for pg_dump - Mailing list pgsql-general

From Martijn van Oosterhout
Subject Re: Somewhat automated method of cleaning table of corrupt records for pg_dump
Date
Msg-id 20121022203419.GA27021@svana.org
Whole thread Raw
In response to Re: Somewhat automated method of cleaning table of corrupt records for pg_dump  (Heiko Wundram <modelnine@modelnine.org>)
Responses Re: Somewhat automated method of cleaning table ofcorrupt records for pg_dump  (Heiko Wundram <modelnine@modelnine.org>)
List pgsql-general
On Mon, Oct 22, 2012 at 11:54:47AM +0200, Heiko Wundram wrote:
> If there's any other possibility of "out of the box" recovery -
> except writing myself a small script to walk all rows - I'd still be
> grateful for a hint.

Something that has worked for me in the past is:

$ SELECT ctid FROM table WHERE length(field) < 0;

This gives you a list of ctids (if it works) which you can delete. You
can also look for very large lengths. This works because length()
doesn't actually unpack the string, it just pulls the length.

It doesn't always work, it depends on the kind of corruption. You also
need to start at the leftmost text field and work forwards, because it
blows up while unpacking the tuples.

Otherwise you're back to doing things like:

$ SELECT sum(length(field || '')) FROM table OFFSET x LIMIT y;

And doing a bisect type algorithm to narrow down where it is. The
sum(length()) is so you throw away the output after checking field can
be extracted properly.  Once you get close you start printing the ctids
and take a stab at the ctid of the broken row and delete it.

If your table is large the OFFSET/LIMIT get slow. Unfortunatly Postgres
doesn't understand clauses like 'ctid > (page,tuple)' to start scanning
at a particular spot in the table.

It's not automated, though it might not be hard to do.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

Attachment

pgsql-general by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Plug-pull testing worked, diskchecker.pl failed
Next
From: Nikolas Everett
Date:
Subject: 9.1 to 9.2 requires a dump/reload?