Thread: ERROR: attempted to delete invisible tuple

ERROR: attempted to delete invisible tuple

From
Bryan Murphy
Date:
We had a hardware failure last week and had to switch over to our spare.  Unfortunately, at some point we managed to get some data corruption.  I've been going through the database table by table, record by record, trying to find the problems and fix them.  

This one has me stumped.  We have one table with about 13 million records in it.  I've identified 82 bad records.  When I try to query for the records, we get the following:

ERROR:  missing chunk number 0 for toast value 25692661 in pg_toast_25497233

That's fine.  I've run into that in a few other tables and have just been deleting the offending records.  However, in this particular table, when I try and delete the records, I get the following error message:

ERROR:  attempted to delete invisible tuple

I'm at a loss what to do next.  

Thanks,
Bryan

Re: ERROR: attempted to delete invisible tuple

From
Greg Stark
Date:
On Mon, Aug 17, 2009 at 4:23 PM, Bryan Murphy<bmurphy1976@gmail.com> wrote:
>  I've identified 82 bad records.  When I try to query for the records,
> we get the following:
> ERROR:  missing chunk number 0 for toast value 25692661 in pg_toast_25497233
> That's fine.  I've run into that in a few other tables and have just been
> deleting the offending records.  However, in this particular table, when I
> try and delete the records, I get the following error message:
> ERROR:  attempted to delete invisible tuple
> I'm at a loss what to do next.

Can you select xmin,xmax,ctid from the bad tuples?

And also the output of pg_resetxlog -n. I suspect you may have failed
to restore properly and have tuples from the "future" in your
database. Perhaps you're missing at the last WAL file in your
recovery?

If that's the case you may be able to get out of it with pg_resetxlog
and then cleaning up but it would be good to see the data first before
changing things.

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: ERROR: attempted to delete invisible tuple

From
Bryan Murphy
Date:
On Mon, Aug 17, 2009 at 11:35 AM, Greg Stark <gsstark@mit.edu> wrote:
On Mon, Aug 17, 2009 at 4:23 PM, Bryan Murphy<bmurphy1976@gmail.com> wrote:
>  I've identified 82 bad records.  When I try to query for the records,
> we get the following:
> ERROR:  missing chunk number 0 for toast value 25692661 in pg_toast_25497233
> That's fine.  I've run into that in a few other tables and have just been
> deleting the offending records.  However, in this particular table, when I
> try and delete the records, I get the following error message:
> ERROR:  attempted to delete invisible tuple
> I'm at a loss what to do next.

Can you select xmin,xmax,ctid from the bad tuples?

And also the output of pg_resetxlog -n. I suspect you may have failed
to restore properly and have tuples from the "future" in your
database. Perhaps you're missing at the last WAL file in your
recovery?

If that's the case you may be able to get out of it with pg_resetxlog
and then cleaning up but it would be good to see the data first before
changing things.

Here's the xmin/xmax/ctid for three problematic records:

prodpublic=# select xmin,xmax,ctid from items_extended where id in ('34537ed90d7546d78f2c172fc8eed687', '3e1d99b7124742b7aaf2f869f7637b0e', '499b464f141a48619c5ce0475cbe9150');
   xmin    |   xmax    |    ctid    
-----------+-----------+------------
 262232659 | 308810030 | (689496,3)
 262629744 | 308810034 | (692765,1)
 262643188 | 308810035 | (692920,9)
(3 rows)

I have not yet run pg_resetxlog.

Thanks,
Bryan

Re: ERROR: attempted to delete invisible tuple

From
Bryan Murphy
Date:
Could I run pg_resetxlog on a warm spare?  Would that give the same result?  Unfortunately, this is our production system and I simply cannot bring it down at the moment to run pg_resetxlog.  

Bryan

On Mon, Aug 17, 2009 at 11:35 AM, Greg Stark <gsstark@mit.edu> wrote:
On Mon, Aug 17, 2009 at 4:23 PM, Bryan Murphy<bmurphy1976@gmail.com> wrote:
>  I've identified 82 bad records.  When I try to query for the records,
> we get the following:
> ERROR:  missing chunk number 0 for toast value 25692661 in pg_toast_25497233
> That's fine.  I've run into that in a few other tables and have just been
> deleting the offending records.  However, in this particular table, when I
> try and delete the records, I get the following error message:
> ERROR:  attempted to delete invisible tuple
> I'm at a loss what to do next.

Can you select xmin,xmax,ctid from the bad tuples?

And also the output of pg_resetxlog -n. I suspect you may have failed
to restore properly and have tuples from the "future" in your
database. Perhaps you're missing at the last WAL file in your
recovery?

If that's the case you may be able to get out of it with pg_resetxlog
and then cleaning up but it would be good to see the data first before
changing things.

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: ERROR: attempted to delete invisible tuple

From
Tom Lane
Date:
Bryan Murphy <bmurphy1976@gmail.com> writes:
> Here's the xmin/xmax/ctid for three problematic records:

> prodpublic=# select xmin,xmax,ctid from items_extended where id in
> ('34537ed90d7546d78f2c172fc8eed687', '3e1d99b7124742b7aaf2f869f7637b0e',
> '499b464f141a48619c5ce0475cbe9150');
>    xmin    |   xmax    |    ctid
> -----------+-----------+------------
>  262232659 | 308810030 | (689496,3)
>  262629744 | 308810034 | (692765,1)
>  262643188 | 308810035 | (692920,9)
> (3 rows)

Hm, what's your current XID counter?  (pg_controldata would give an
approximate answer.)  I'm wondering if the xmax's are marked committed
but are in the future ...

            regards, tom lane

Re: ERROR: attempted to delete invisible tuple

From
Bryan Murphy
Date:
On Mon, Aug 17, 2009 at 12:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Bryan Murphy <bmurphy1976@gmail.com> writes:
> Here's the xmin/xmax/ctid for three problematic records:

> prodpublic=# select xmin,xmax,ctid from items_extended where id in
> ('34537ed90d7546d78f2c172fc8eed687', '3e1d99b7124742b7aaf2f869f7637b0e',
> '499b464f141a48619c5ce0475cbe9150');
>    xmin    |   xmax    |    ctid
> -----------+-----------+------------
>  262232659 | 308810030 | (689496,3)
>  262629744 | 308810034 | (692765,1)
>  262643188 | 308810035 | (692920,9)
> (3 rows)

Hm, what's your current XID counter?  (pg_controldata would give an
approximate answer.)  I'm wondering if the xmax's are marked committed
but are in the future ...

                       regards, tom lane

postgres@domU-12-31-36-00-38-C1:~$ /usr/lib/postgresql/8.3/bin/pg_controldata /srv/pg_data/8.3/main/
pg_control version number:            833
Catalog version number:               200711281
Database system identifier:           5184368167085382525
Database cluster state:               in production
pg_control last modified:             Mon 17 Aug 2009 05:16:50 PM UTC
Latest checkpoint location:           B4A/45E0DE38
Prior checkpoint location:            B4A/3690AC00
Latest checkpoint's REDO location:    B4A/36929DF0
Latest checkpoint's TimeLineID:       5
Latest checkpoint's NextXID:          0/308829887
Latest checkpoint's NextOID:          26030968
Latest checkpoint's NextMultiXactId:  5958225
Latest checkpoint's NextMultiOffset:  40968054
Time of latest checkpoint:            Mon 17 Aug 2009 04:56:38 PM UTC
Minimum recovery ending location:     A17/C6D9A98
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Maximum length of locale name:        128
LC_COLLATE:                           en_US.UTF-8

Thanks,
Bryan 

Re: ERROR: attempted to delete invisible tuple

From
Tom Lane
Date:
Bryan Murphy <bmurphy1976@gmail.com> writes:
> On Mon, Aug 17, 2009 at 12:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hm, what's your current XID counter?  (pg_controldata would give an
>> approximate answer.)  I'm wondering if the xmax's are marked committed
>> but are in the future ...

> Latest checkpoint's NextXID:          0/308829887

Hmm, well, they're in the past *now*, but by less than 20000
transactions.  You said this was a production DB so that might not
represent very much elapsed time.  Does the DELETE still fail?

            regards, tom lane

Re: ERROR: attempted to delete invisible tuple

From
Greg Stark
Date:
On Mon, Aug 17, 2009 at 6:17 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Bryan Murphy <bmurphy1976@gmail.com> writes:
>> Here's the xmin/xmax/ctid for three problematic records:
>
>> prodpublic=# select xmin,xmax,ctid from items_extended where id in
>> ('34537ed90d7546d78f2c172fc8eed687', '3e1d99b7124742b7aaf2f869f7637b0e',
>> '499b464f141a48619c5ce0475cbe9150');
>>    xmin    |   xmax    |    ctid
>> -----------+-----------+------------
>>  262232659 | 308810030 | (689496,3)
>>  262629744 | 308810034 | (692765,1)
>>  262643188 | 308810035 | (692920,9)

Hm, perhaps I should have asked for xvac as well, as that can also
trigger an "invisible" status. You can't select xvac directly but if
you query cmin it might be possible to tell what's going on.

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: ERROR: attempted to delete invisible tuple

From
Greg Stark
Date:
On Mon, Aug 17, 2009 at 6:17 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Hm, what's your current XID counter?  (pg_controldata would give an
> approximate answer.)  I'm wondering if the xmax's are marked committed
> but are in the future ...
>

FWIW that doesn't look right. That would result in
HeapTupleBeingUpdated. The only case where it could result in
"invisible" is if it's our own xid.

Excluding the cases where our own xid is in the tuple I think the
relevant cases are either

xmin aborted or in progress (or in future)
MOVED_OFF and xvac committed
MOVED_IN and xvac aborted or is in progress (or in future)

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: ERROR: attempted to delete invisible tuple

From
Bryan Murphy
Date:
On Mon, Aug 17, 2009 at 12:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Bryan Murphy <bmurphy1976@gmail.com> writes:
> On Mon, Aug 17, 2009 at 12:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hm, what's your current XID counter?  (pg_controldata would give an
>> approximate answer.)  I'm wondering if the xmax's are marked committed
>> but are in the future ...

> Latest checkpoint's NextXID:          0/308829887

Hmm, well, they're in the past *now*, but by less than 20000
transactions.  You said this was a production DB so that might not
represent very much elapsed time.  Does the DELETE still fail?

                       regards, tom lane

Wow, how crazy is that.  I was just able to delete those three records.  I tried the remaining, and two were left that I could delete.  Waited 30 seconds, tried again and now they're gone as well.  As far as I'm aware, that's all the bad records in that table.  

The database went down last Weds.  How could they have gotten that far into the future?

Bryan

Re: ERROR: attempted to delete invisible tuple

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Excluding the cases where our own xid is in the tuple I think the
> relevant cases are either

> xmin aborted or in progress (or in future)
> MOVED_OFF and xvac committed
> MOVED_IN and xvac aborted or is in progress (or in future)

Ah.  I hadn't bothered to check the code in detail before asking about
the current XID.  Given subsequent data, it seems that current XID must
have moved past xvac while we were wondering about it.  This could mean
either corrupted xvac values, or that the crash caused current XID to go
backwards (suggesting loss of both the current pg_control and a big
chunk of WAL).  Since multiple tuples on different pages were involved,
I'm inclined to believe the second theory.

            regards, tom lane

Re: ERROR: attempted to delete invisible tuple

From
Greg Stark
Date:
On Mon, Aug 17, 2009 at 7:59 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Greg Stark <gsstark@mit.edu> writes:
>> Excluding the cases where our own xid is in the tuple I think the
>> relevant cases are either
>
>> xmin aborted or in progress (or in future)
>> MOVED_OFF and xvac committed
>> MOVED_IN and xvac aborted or is in progress (or in future)
>
> Ah.  I hadn't bothered to check the code in detail before asking about
> the current XID.  Given subsequent data, it seems that current XID must
> have moved past xvac while we were wondering about it.  This could mean
> either corrupted xvac values, or that the crash caused current XID to go
> backwards (suggesting loss of both the current pg_control and a big
> chunk of WAL).  Since multiple tuples on different pages were involved,
> I'm inclined to believe the second theory.

I would think xmin would be the fewest entities possibility but we'll
never know.

For what it's worth at EDB I dealt with another case like this and I
imagine others have too. I think it's too easy to do things in the
wrong order or miss a step and end up with these kinds of problems.

I would really like to know what happened here which caused the
problem. Do you have records of how you created the slave? When you
took the initial image, did you use a cold backup or a hot backup? Did
you use pg_start_backup()/pg_stop_backup()?

When you failed over was there anything special happening? Was it
because of a failure on the master? Was a vacuum full running?

When the slave came up do you have the log messages saying it was
starting recovery and when it was finishing recovery and starting
normal operations?

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: ERROR: attempted to delete invisible tuple

From
Bryan Murphy
Date:


On Mon, Aug 17, 2009 at 4:02 PM, Greg Stark <gsstark@mit.edu> wrote:
For what it's worth at EDB I dealt with another case like this and I
imagine others have too. I think it's too easy to do things in the
wrong order or miss a step and end up with these kinds of problems.

I would really like to know what happened here which caused the
problem. Do you have records of how you created the slave? When you
took the initial image, did you use a cold backup or a hot backup? Did
you use pg_start_backup()/pg_stop_backup()?

I always use log shipping.   The original snapshot was taken like the following (many months ago): 

start log shipping
pg_start_backup()
tar snapshot to nfs volume
pg_stop_backup()
restore snap shot on another machine
startup in recovery mode

The machine we restored to was not the original warm spare.  It was built as a snapshot from the original warm spare.  Every few weeks (or when paranoia sets in) I replace old spares with new spares to verify that the backups are working and bring the old spares online, verify them, then throw them away.

xfs_freeze -f filesystem
file system snapshot (EBS snapshot)
xfs_freeze -u filesystem 
restore snapshot on another machine
startup in recovery mode

When you failed over was there anything special happening? Was it
because of a failure on the master? Was a vacuum full running?

We never run vacuum full, but autovacuum is always turned on.  

At the time of failure, best I can tell is that the hard drive which contained the root file system and our swap partition was failing.  Unfortunately, I was not at the office when this occurred and I'm the best one at diagnosing what's going on.  Also, they're Amazon EC2 instances, so when it went dead, it went really dead.  It's no longer available to us, and I had to quickly reclaim the EBS volumes it was using in order to get a new hot spare up and running.

So, we basically lost everything but the console output of the original database, and of course, I can't find where I put that.  I do remember it strongly indicated that we were facing imminent drive failure.

When the slave came up do you have the log messages saying it was
starting recovery and when it was finishing recovery and starting
normal operations?

This is tricky as well.  We failed over to our spare, only to find out our load had grown so much the spare was no longer able to carry us the way it was configured.  We had to quickly rebuild a new spare, and then shift over to that one.  

So, we basically shutdown the spare, shifted the volume over to the new machine, and then brought it back online.  I then rebuilt a new warm spare from the new machine.

Which leads me to the one big flaw in all of this, the log files were all going to the local drives and not the EBS volumes so I've lost them and am now kicking myself in the ass for it.  

Bryan