Re: Somewhat automated method of cleaning table ofcorrupt records for pg_dump - Mailing list pgsql-general

From Heiko Wundram
Subject Re: Somewhat automated method of cleaning table ofcorrupt records for pg_dump
Date
Msg-id d987e7cb60484100b6f7363e85fedc4a@modelnine.org
Whole thread Raw
In response to Re: Somewhat automated method of cleaning table of corrupt records for pg_dump  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-general
Am 22.10.2012 22:34, schrieb Martijn van Oosterhout:
> Something that has worked for me in the past is:
>
> $ SELECT ctid FROM table WHERE length(field) < 0;

As the structure of the tables (about four were affected) isn't
something that I wanted to actually look at, I set off writing a small
script in Python which fetches ctids from all tables in the DB, then
tries to fetch the corresponding row, and in case that fails, removes
it. That worked beautifully for me (it removed somewhere around
eight/nine rows from the DB), and I can now dump the whole database
correctly. Running the script took around 48 hours for the complete
database.

It's not pretty, but works, and for anybody else finding themselves in
this situation, here's the code:

--------------------------------
# -*- coding: utf-8 -*-

# Imports
# -------

import psycopg2


# Open database
# -------------

conn = psycopg2.connect("dbname=<name of DB> user=postgres")


# Select tables
# -------------

cur = conn.cursor()
cur.execute("SELECT relname FROM pg_class WHERE relkind = 'r' AND
relnamespace = 2200")
TABS = cur.fetchall()
cur.close()


# Walk tables
# -----------

for tab in TABS:
     # Fetching from tab.
     print "Fetching from:", tab[0]

     # Fetch cursor.
     cur = conn.cursor()
     cur.execute("SELECT ctid FROM %s" % (tab[0],))
     stmt = "SELECT * FROM %s WHERE ctid = %%s" % (tab[0],)
     dstmt = "DELETE FROM %s WHERE ctid = %%s" % (tab[0],)
     todelete = []
     while True:
         # Fetch an element.
         ctid = cur.fetchone()
         if ctid is None:
             print "Done with:", tab[0]
             break

         # Fetch complete element.
         curr = conn.cursor()
         try:
             curr.execute(stmt, ctid)
             curr.fetchone()
             curr.close()
         except:
             # Failed to fetch, clean any pending transaction.
             print "Failed to fetch:", ctid[0]
             curr.close()
             cur.close()
             conn.rollback()

             # Create new cursor on connection.
             curr = conn.cursor()
             curr.execute(dstmt, ctid)
             conn.commit()
             curr.close()

             # Reset cursor.
             cur = conn.cursor()
             cur.execute("SELECT ctid FROM %s" % (tab[0],))
--------------------------------

Thanks again for all the hints!

--
--- Heiko.


pgsql-general by date:

Previous
From: Murray Cumming
Date:
Subject: Disable unix-domain sockets?
Next
From: Tom Lane
Date:
Subject: Re: Disable unix-domain sockets?