Re: cannot delete corrupted rows after DB corruption: tuple concurrently updated - Mailing list pgsql-general

From john gale
Subject Re: cannot delete corrupted rows after DB corruption: tuple concurrently updated
Date
Msg-id 1E61A5BF-2388-4B35-861D-BC7A7323161B@smadness.com
Whole thread Raw
In response to Re: cannot delete corrupted rows after DB corruption: tuple concurrently updated  ("Tomas Vondra" <tv@fuzzy.cz>)
List pgsql-general

On Feb 26, 2014, at 2:59 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 26 Únor 2014, 8:45, john gale wrote:

munin2=# delete from testruns where ctid = '(37069305,4)';
ERROR:  tuple concurrently updated

AFAIK this error is raised when a before trigger modifies the row that is
being deleted. Imagine a trigger that does this

  UPDATE testruns SET mycolumn = 1 WHERE id = OLD.id;
  RETURN OLD;

Given the way MVCC in postgres works (copying row when updating), the
error makes sense. In 9.0 this worked by silently skipping the DELETE
(incidentally, I had a few reports about tables that can't be deleted
because of this in the past month).

Anyway, do you have any triggers on the table? If yes, try to disable
them. I suspect the data are corrupted in a way that causes update on the
deleted row - either directly, or maybe because of a cascading effect.


There were a few triggers auto-created by a foreign key constraint but we removed the constraint, which removed the triggers:

munin2=# select * from pg_trigger;
 tgrelid | tgname | tgfoid | tgtype | tgenabled | tgisinternal | tgconstrrelid | tgconstrindid | tgconstraint | tgdeferrable | tginitdeferred | tgnargs | tgattr | tgargs | tgqual 
---------+--------+--------+--------+-----------+--------------+---------------+---------------+--------------+--------------+----------------+---------+--------+--------+--------
(0 rows)

The tuple error still exists, however:

munin2=# delete from testruns where ctid = '(37069305,4)';
ERROR:  tuple concurrently updated
munin2=# select id from testruns where ctid = '(37069305,4)';
    id     
-----------
 141908486
(1 row)

munin2=# delete from testruns where id = 141908486;
ERROR:  tuple concurrently updated
munin2=# select * from testruns where id = 141908486;
ERROR:  unexpected chunk number 0 (expected 1) for toast value 155900302 in pg_toast_16822


I'm wondering if it might be caused by RI triggers - maybe yes, but I'm
not aware of any RI trigger doing updates.

That being said, I think that what you're doing is wrong. If you think you
have a corrupted database, I'd strongly suggest doing dump/restore. Or how
do you know there's no other corruption lurking in the files, slowly
spreading to other parts of the database?


We're aware that we're shoveling dirt to patch a crack in a large highway.  However at the moment we value uptime rather than strict integrity of the data (141mil rows allows some wiggle room), and since we don't modify these rows after they're inserted, I can't imagine how this kind of row corruption can "slowly spread to other parts of the database".

~ john

pgsql-general by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Cancelling of autovacuums considered harmful
Next
From: Jeff Janes
Date:
Subject: Re: Cancelling of autovacuums considered harmful