Thread: problems with table corruption continued

problems with table corruption continued

From
"Brian Hirt"
Date:
Okay, here's a follow up to my previous messages "ACK table corrupted,
unique index violated."

I've been trying to clean up the corruption that I mentioned earlier.  I
felt most comfortable shutting down all my application servers, restarting
postgres, doing a dump of my database and rebuilding it with a pginit and
complete reload.  So far so good.  I went to fix one of the corrupted tables
and had another strange experience.  I'm still looking into other
possibilities such as a hardware failure, but I thought this might be
interesting or helpful in the context of my previous post.  Basically, the
table with duplicate oid/id now has unique oids from the reload, so I'm going
to delete the duplicate rows and recreate the unique index on the identity
column.

basement=# select count(*),developer_aka_id from developer_aka group by
developer_aka_id having count(*) <> 1;
 count | developer_aka_id
-------+------------------
     2 |             9789
     2 |            10025
     2 |            40869
(3 rows)

basement=# select oid,* from developer_aka where developer_aka_id in
(9789,10025,40869);
  oid  | developer_id | developer_aka_id |    first_name     | last_name
-------+--------------+------------------+-------------------+-----------
 48390 |         1916 |             9789 | Chris             | Smith
 48402 |        35682 |            40869 | Donald "Squirral" | Fisk
 48425 |         4209 |            10025 | Mike              | Glosecki
 48426 |         1916 |             9789 | Chris             | Smith
 48427 |        35682 |            40869 | Donald "Squirral" | Fisk
 48428 |         4209 |            10025 | Mike              | Glosecki
(6 rows)

basement=# delete from developer_aka where oid in (48390,48402,48425);
DELETE 3
basement=# select count(*),developer_aka_id from developer_aka group by
developer_aka_id having count(*) <> 1;
 count | developer_aka_id
-------+------------------
(0 rows)

basement=# create unique index developer_aka_pkey on
developer_aka(developer_aka_id);
CREATE
basement=# VACUUM ANALYZE developer_aka;
ERROR:  Cannot insert a duplicate key into unique index developer_aka_pkey




Re: problems with table corruption continued

From
"Brian Hirt"
Date:
Tom & All:

I've been looking into this problem some more and I've been able to
consistently reproduce the error.  I've tested it on two different
machines, both running 7.1.3 on an i686-pc-linux-gnu configuration.  One
machine is RH7.2 and the other is RH6.2.

I still haven't been able to reproduce the problem with corrupted
indexes/tables (NUMBER OF TUPLES IS NOT THE SAME AS HEAP -- duplicate rows
with same oid & pkey), but I'm hopeful that these two problems are related.
Also, I wanted to let you know that the same two tables became corrupted on
12-15-2001, after my 12-12-2001 reload.

I've attached three files: a typescript and two sql files.  I found that if
the commands were combined into a single file and run in a single psql
session, the error does not occur -- so it's important to follow the
commands exactly as they are in the typescript.  If for some reason the
errors aren't reproducible on your machines, let me know and we will try to
find out what's unique about my setup.

Thanks.

----- Original Message -----
From: "Brian Hirt" <bhirt@mobygames.com>
To: "Postgres Hackers" <pgsql-hackers@postgresql.org>
Cc: "Brian A Hirt" <bhirt@berkhirt.com>
Sent: Wednesday, December 12, 2001 11:30 AM
Subject: [HACKERS] problems with table corruption continued


> Okay, here's a follow up to my previous messages "ACK table corrupted,
> unique index violated."
>
> I've been trying to clean up the corruption that I mentioned earlier.  I
> felt most comfortable shutting down all my application servers, restarting
> postgres, doing a dump of my database and rebuilding it with a pginit and
> complete reload.  So far so good.  I went to fix one of the corrupted tables
> and had another strange experience.  I'm still looking into other
> possibilities such as a hardware failure, but I thought this might be
> interesting or helpful in the context of my previous post.  Basically, the
> table with duplicate oid/id now has unique oids from the reload, so I'm going
> to delete the duplicate rows and recreate the unique index on the identity
> column.
>
> basement=# select count(*),developer_aka_id from developer_aka group by
> developer_aka_id having count(*) <> 1;
>  count | developer_aka_id
> -------+------------------
>      2 |             9789
>      2 |            10025
>      2 |            40869
> (3 rows)
>
> basement=# select oid,* from developer_aka where developer_aka_id in
> (9789,10025,40869);
>   oid  | developer_id | developer_aka_id |    first_name     | last_name
> -------+--------------+------------------+-------------------+-----------
>  48390 |         1916 |             9789 | Chris             | Smith
>  48402 |        35682 |            40869 | Donald "Squirral" | Fisk
>  48425 |         4209 |            10025 | Mike              | Glosecki
>  48426 |         1916 |             9789 | Chris             | Smith
>  48427 |        35682 |            40869 | Donald "Squirral" | Fisk
>  48428 |         4209 |            10025 | Mike              | Glosecki
> (6 rows)
>
> basement=# delete from developer_aka where oid in (48390,48402,48425);
> DELETE 3
> basement=# select count(*),developer_aka_id from developer_aka group by
> developer_aka_id having count(*) <> 1;
>  count | developer_aka_id
> -------+------------------
> (0 rows)
>
> basement=# create unique index developer_aka_pkey on
> developer_aka(developer_aka_id);
> CREATE
> basement=# VACUUM ANALYZE developer_aka;
> ERROR:  Cannot insert a duplicate key into unique index developer_aka_pkey
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

Re: problems with table corruption continued

From
Tom Lane
Date:
"Brian Hirt" <bhirt@mobygames.com> writes:
> I've attached three files: a typescript and two sql files.  I found that if
> the commands were combined into a single file and run in a single psql
> session, the error does not occur -- so it's important to follow the
> commands exactly as they are in the typescript.  If for some reason the
> errors aren't reproducible on your machines, let me know and we will try to
> find out what's unique about my setup.

Indeed, it reproduces just fine here --- not only that, but I get an
Assert failure when I try it in current sources.

Many, many thanks!  I shall start digging forthwith...
        regards, tom lane


Re: problems with table corruption continued

From
"Brian Hirt"
Date:
Great.  I'm also trying to create a reproducible test case for the original
problem I reported with duplicate rows/oids/pkeys.  Maybe both problems are
a result of the same bug; I don't know.

BTW, if you remove the index "developer_aka_search_idx" that is based on the
plpgsql function, the problem is not reproducible.  But you've
probably figured that out by now.

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Brian Hirt" <bhirt@mobygames.com>
Cc: "Postgres Hackers" <pgsql-hackers@postgresql.org>; "Brian A Hirt"
<bhirt@berkhirt.com>
Sent: Tuesday, December 18, 2001 10:52 AM
Subject: Re: [HACKERS] problems with table corruption continued


> "Brian Hirt" <bhirt@mobygames.com> writes:
> > I've attached three files: a typescript and two sql files.  I found that if
> > the commands were combined into a single file and run in a single psql
> > session, the error does not occur -- so it's important to follow the
> > commands exactly as they are in the typescript.  If for some reason the
> > errors aren't reproducible on your machines, let me know and we will try
> > to find out what's unique about my setup.
>
> Indeed, it reproduces just fine here --- not only that, but I get an
> Assert failure when I try it in current sources.
>
> Many, many thanks!  I shall start digging forthwith...
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>



Re: problems with table corruption continued

From
Tom Lane
Date:
"Brian Hirt" <bhirt@mobygames.com> writes:
> [ example case that creates a duplicate tuple ]

Got it.  The problem arises in this section of vacuum.c, near the bottom
of repair_frag:
                if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
                {
                    if ((TransactionId) tuple.t_data->t_cmin != myXID)
                        elog(ERROR, "Invalid XID in t_cmin (3)");
                    if (tuple.t_data->t_infomask & HEAP_MOVED_OFF)
                    {
                        itemid->lp_flags &= ~LP_USED;
                        num_tuples++;
                    }
                    else
                        elog(ERROR, "HEAP_MOVED_OFF was expected (2)");
                }

This is trying to get rid of the original copy of a tuple that's been
moved to another page.  The problem is that your index function causes a
table scan, which means that by the time control gets here, someone else
has looked at this tuple and marked it good --- so the initial test of
HEAP_XMIN_COMMITTED fails, and the tuple is never removed!

I would say that it's incorrect for vacuum.c to assume that
HEAP_XMIN_COMMITTED can't become set on HEAP_MOVED_OFF/HEAP_MOVED_IN
tuples during the course of vacuum's processing; after all, the xmin
definitely does refer to a committed xact, and we can't realistically
assume that we know what processing will be induced by user-defined
index functions.  Vadim, what do you think?  How should we fix this?

In the meantime, Brian, I think you ought to get rid of your index:

CREATE INDEX "developer_aka_search_idx" on "developer_aka" using btree
( developer_aka_search_name ("developer_aka_id") "varchar_ops" );

I do not think this index is well-defined: an index function ought
to have the property that its output depends solely on its input and
cannot change over time.  This function cannot make that claim.
(In 7.2, you can't even create the index unless you mark the function
iscachable, which is really a lie.)  I'm not even real sure what you
expect the index to do for you...

I do not know whether this effect explains all the reports of duplicate
tuples we've seen in the last few weeks.  We need to ask whether the
other complainants were using index functions that tried to do table
scans.
        regards, tom lane


Re: problems with table corruption continued

From
Tom Lane
Date:
"Brian Hirt" <bhirt@mobygames.com> writes:
> Great.  I'm also trying to create a reproducible test case for the original
> problem I reported with duplicate rows/oids/pkeys.  Maybe both problems are
> a result of the same bug; I don't know.

Were the duplicate rows all in tables that had functional indexes based
on functions similar to developer_aka_search_name?  The problem we're
seeing here seems to be due to VACUUM not being able to cope with the
side effects of the SELECT inside the index function.
        regards, tom lane


Re: problems with table corruption continued

From
"Brian Hirt"
Date:
Yes, both tables had similar functions, and the corruption was limited to
only those two tables.

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Brian Hirt" <bhirt@mobygames.com>
Cc: "Postgres Hackers" <pgsql-hackers@postgresql.org>; "Brian A Hirt"
<bhirt@berkhirt.com>
Sent: Tuesday, December 18, 2001 11:53 AM
Subject: Re: [HACKERS] problems with table corruption continued


> "Brian Hirt" <bhirt@mobygames.com> writes:
> > Great.  I'm also trying to create a reproducible test case for the
> > original problem I reported with duplicate rows/oids/pkeys.  Maybe both
> > problems are a result of the same bug; I don't know.
>
> Were the duplicate rows all in tables that had functional indexes based
> on functions similar to developer_aka_search_name?  The problem we're
> seeing here seems to be due to VACUUM not being able to cope with the
> side effects of the SELECT inside the index function.
>
> regards, tom lane
>



Re: problems with table corruption continued

From
"Brian Hirt"
Date:
Tom,

I'm glad you found the cause.  I already deleted the index a few days back,
after I strongly suspected it was related to the problem.  The
reason I created the index was to help locate a record based on the name of
a person with an index lookup instead of a sequential scan on a table with
several hundred thousand rows.

I was trying to avoid adding additional computed fields to the tables,
maintaining them with triggers, and indexing and searching on them.  The
index function seemed like an elegant solution to the problem, although at
the time I was completely unaware of 'iscachable'.

Do you think this might also explain the following errors I was seeing?

NOTICE:  Cannot rename init file
/moby/pgsql/base/156130/pg_internal.init.19833 to
/moby/pgsql/base/156130/pg_internal.init: No such file or directory

--brian

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Brian Hirt" <bhirt@mobygames.com>
Cc: "Vadim Mikheev" <vmikheev@sectorbase.com>; "Postgres Hackers"
<pgsql-hackers@postgresql.org>; "Brian A Hirt" <bhirt@berkhirt.com>
Sent: Tuesday, December 18, 2001 11:48 AM
Subject: Re: [HACKERS] problems with table corruption continued


> "Brian Hirt" <bhirt@mobygames.com> writes:
> > [ example case that creates a duplicate tuple ]
>
> Got it.  The problem arises in this section of vacuum.c, near the bottom
> of repair_frag:
>
>                 if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
>                 {
>                     if ((TransactionId) tuple.t_data->t_cmin != myXID)
>                         elog(ERROR, "Invalid XID in t_cmin (3)");
>                     if (tuple.t_data->t_infomask & HEAP_MOVED_OFF)
>                     {
>                         itemid->lp_flags &= ~LP_USED;
>                         num_tuples++;
>                     }
>                     else
>                         elog(ERROR, "HEAP_MOVED_OFF was expected (2)");
>                 }
>
> This is trying to get rid of the original copy of a tuple that's been
> moved to another page.  The problem is that your index function causes a
> table scan, which means that by the time control gets here, someone else
> has looked at this tuple and marked it good --- so the initial test of
> HEAP_XMIN_COMMITTED fails, and the tuple is never removed!
>
> I would say that it's incorrect for vacuum.c to assume that
> HEAP_XMIN_COMMITTED can't become set on HEAP_MOVED_OFF/HEAP_MOVED_IN
> tuples during the course of vacuum's processing; after all, the xmin
> definitely does refer to a committed xact, and we can't realistically
> assume that we know what processing will be induced by user-defined
> index functions.  Vadim, what do you think?  How should we fix this?
>
> In the meantime, Brian, I think you ought to get rid of your index:
>
> CREATE INDEX "developer_aka_search_idx" on "developer_aka" using btree
> ( developer_aka_search_name ("developer_aka_id") "varchar_ops" );
>
> I do not think this index is well-defined: an index function ought
> to have the property that its output depends solely on its input and
> cannot change over time.  This function cannot make that claim.
> (In 7.2, you can't even create the index unless you mark the function
> iscachable, which is really a lie.)  I'm not even real sure what you
> expect the index to do for you...
>
> I do not know whether this effect explains all the reports of duplicate
> tuples we've seen in the last few weeks.  We need to ask whether the
> other complainants were using index functions that tried to do table
> scans.
>
> regards, tom lane
>



Re: problems with table corruption continued

From
"Mikheev, Vadim"
Date:
> This is trying to get rid of the original copy of a tuple that's been
> moved to another page.  The problem is that your index 
> function causes a
> table scan, which means that by the time control gets here, 
> someone else
> has looked at this tuple and marked it good --- so the initial test of
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> HEAP_XMIN_COMMITTED fails, and the tuple is never removed!
> 
> I would say that it's incorrect for vacuum.c to assume that
> HEAP_XMIN_COMMITTED can't become set on HEAP_MOVED_OFF/HEAP_MOVED_IN
> tuples during the course of vacuum's processing; after all, the xmin
> definitely does refer to a committed xact, and we can't realistically
> assume that we know what processing will be induced by user-defined
> index functions.  Vadim, what do you think?  How should we fix this?

But it's incorrect for the table scan to mark the tuple as good, either.
Looks like we have to add checks for the case
TransactionIdIsCurrentTransactionId(tuple->t_cmin) when
there is HEAP_MOVED_OFF or HEAP_MOVED_IN in t_infomask to
all HeapTupleSatisfies* in tqual.c as we do in
HeapTupleSatisfiesDirty - note comments about uniq btree-s there.

Vadim


Re: problems with table corruption continued

From
Tom Lane
Date:
"Brian Hirt" <bhirt@mobygames.com> writes:
> I was trying to avoid adding additional computed fields to the tables and
> maintaining them with triggers, indexing and searching on them.  The index
> function seemed like an elegant solution to the problem

Understood, but can you write the index function in a way that avoids
having it do a SELECT to get at data that it hasn't been passed?  I'm
wondering if you can't define the function as just

    f(first_name, last_name) = upper(first_name || ' ' || last_name)

and create the index on f(first_name, last_name).  You haven't shown us
the queries you expect the index to be helpful for, so maybe this is not
workable...
        regards, tom lane


Re: problems with table corruption continued

From
Tom Lane
Date:
"Brian Hirt" <bhirt@mobygames.com> writes:
> Do you think this might also explain the following errors i was seeing?

> NOTICE:  Cannot rename init file
> /moby/pgsql/base/156130/pg_internal.init.19833 to
> /moby/pgsql/base/156130/pg_internal.init: No such file or directory

No, that error is not coming from anywhere near VACUUM; it's from
relcache startup (see write_irels in src/backend/utils/cache/relcache.c).
The rename source file has just been created in the very same routine,
so it's difficult to see how one would get a "No such file" failure from
rename().

It is possible that several backends create temp init files at the
same time and all try to rename their own temp files into place as
the pg_internal.init file.  However, that should work: the rename
man page says
    The rename() system call causes the source file to be renamed to
    target.  If target exists, it is first removed.  Both source and
    target must be of the same type (that is, either directories or
    nondirectories), and must reside on the same file system.

    If target can be created or if it existed before the call, rename()
    guarantees that an instance of target will exist, even if the system
    crashes in the midst of the operation.

so we should end up with the extra copies deleted and just one init file
remaining after the slowest backend renames its copy into place.

Do you by chance have your database directory mounted via NFS, or some
other strange filesystem where the normal semantics of concurrent rename
might not work quite right?

FWIW, I believe this condition is pretty harmless, but it would be nice
to understand exactly why you're seeing the message.
        regards, tom lane


Re: problems with table corruption continued

From
Tom Lane
Date:
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
>> I would say that it's incorrect for vacuum.c to assume that
>> HEAP_XMIN_COMMITTED can't become set on HEAP_MOVED_OFF/HEAP_MOVED_IN
>> tuples during the course of vacuum's processing; after all, the xmin
>> definitely does refer to a committed xact, and we can't realistically
>> assume that we know what processing will be induced by user-defined
>> index functions.  Vadim, what do you think?  How should we fix this?

> But it's incorrect for table scan to mark tuple as good neither.

Oh, that makes sense.

> Looks like we have to add checks for the case
> TransactionIdIsCurrentTransactionId(tuple->t_cmin) when
> there is HEAP_MOVED_OFF or HEAP_MOVED_IN in t_infomask to
> all HeapTupleSatisfies* in tqual.c as we do in
> HeapTupleSatisfiesDirty - note comments about uniq btree-s there.

Sounds like a plan.  Do you want to work on this, or shall I?
        regards, tom lane


Re: problems with table corruption continued

From
"Brian Hirt"
Date:
Tom,

I probably could have written the function as you suggest in this email,
and I probably will look into doing so.  This was the first time I tried
creating an index function, and one of the few times I've used plpgsql.  In
general I have very little experience with doing this kind of stuff in
postgres (I try to stick to standard SQL as much as possible), and it looks
like I've stumbled onto this problem because of a bad design decision on my
part and a lack of understanding of how index functions work.

Thanks for the suggestion.

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Brian Hirt" <bhirt@mobygames.com>
Cc: "Postgres Hackers" <pgsql-hackers@postgresql.org>; "Brian A Hirt"
<bhirt@berkhirt.com>
Sent: Tuesday, December 18, 2001 12:20 PM
Subject: Re: [HACKERS] problems with table corruption continued


> "Brian Hirt" <bhirt@mobygames.com> writes:
> > I was trying to avoid adding additional computed fields to the tables
and
> > maintaining them with triggers, indexing and searching on them.  The
index
> > function seemed like an elegant solution to the problem
>
> Understood, but can you write the index function in a way that avoids
> having it do a SELECT to get at data that it hasn't been passed?  I'm
> wondering if you can't define the function as just
> f(first_name, last_name) = upper(first_name || ' ' || last_name)
> and create the index on f(first_name, last_name).  You haven't shown us
> the queries you expect the index to be helpful for, so maybe this is not
> workable...
>
> regards, tom lane
>



Re: problems with table corruption continued

From
Tom Lane
Date:
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
>> Sounds like a plan. Do you want to work on this, or shall I?

> Sorry, I'm not able to do this fast enough for current pre-release stage -:(

Okay.  I'll work on a patch and send it to you for review, if you like.
        regards, tom lane


Re: problems with table corruption continued

From
"Brian Hirt"
Date:
Well, when my application starts, about 100 backend database connections
start up at the same time, so this fits the circumstance you
describe.  However, I'm just using a standard ext2 filesystem on a 2.2 Linux
kernel.

It's good to know that this error should be harmless.

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Brian Hirt" <bhirt@mobygames.com>
Cc: "Postgres Hackers" <pgsql-hackers@postgresql.org>; "Brian A Hirt"
<bhirt@berkhirt.com>
Sent: Tuesday, December 18, 2001 12:32 PM
Subject: Re: [HACKERS] problems with table corruption continued


> "Brian Hirt" <bhirt@mobygames.com> writes:
> > Do you think this might also explain the following errors I was seeing?
>
> > NOTICE:  Cannot rename init file
> > /moby/pgsql/base/156130/pg_internal.init.19833 to
> > /moby/pgsql/base/156130/pg_internal.init: No such file or directory
>
> No, that error is not coming from anywhere near VACUUM; it's from
> relcache startup (see write_irels in src/backend/utils/cache/relcache.c).
> The rename source file has just been created in the very same routine,
> so it's difficult to see how one would get a "No such file" failure from
> rename().
>
> It is possible that several backends create temp init files at the
> same time and all try to rename their own temp files into place as
> the pg_internal.init file.  However, that should work: the rename
> man page says
>
>      The rename() system call causes the source file to be renamed to
>      target.  If target exists, it is first removed.  Both source and
>      target must be of the same type (that is, either directories or
>      nondirectories), and must reside on the same file system.
>
>      If target can be created or if it existed before the call, rename()
>      guarantees that an instance of target will exist, even if the system
>      crashes in the midst of the operation.
>
> so we should end up with the extra copies deleted and just one init file
> remaining after the slowest backend renames its copy into place.
>
> Do you by chance have your database directory mounted via NFS, or some
> other strange filesystem where the normal semantics of concurrent rename
> might not work quite right?
>
> FWIW, I believe this condition is pretty harmless, but it would be nice
> to understand exactly why you're seeing the message.
>
> regards, tom lane
>



Re: problems with table corruption continued

From
"Mikheev, Vadim"
Date:
> > Looks like we have to add checks for the case
> > TransactionIdIsCurrentTransactionId(tuple->t_cmin) when
> > there is HEAP_MOVED_OFF or HEAP_MOVED_IN in t_infomask to
> > all HeapTupleSatisfies* in tqual.c as we do in
> > HeapTupleSatisfiesDirty - note comments about uniq btree-s there.
> 
> Sounds like a plan. Do you want to work on this, or shall I?

Sorry, I'm not able to do this fast enough for current pre-release stage -:(

Vadim


Re: problems with table corruption continued

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane
> 
> "Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
> >> I would say that it's incorrect for vacuum.c to assume that
> >> HEAP_XMIN_COMMITTED can't become set on HEAP_MOVED_OFF/HEAP_MOVED_IN
> >> tuples during the course of vacuum's processing; after all, the xmin
> >> definitely does refer to a committed xact, and we can't realistically
> >> assume that we know what processing will be induced by user-defined
> >> index functions.  Vadim, what do you think?  How should we fix this?
> 
> > But it's incorrect for table scan to mark tuple as good neither.
> 
> Oh, that makes sense.
> 
> > Looks like we have to add checks for the case
> > TransactionIdIsCurrentTransactionId(tuple->t_cmin) when
> > there is HEAP_MOVED_OFF or HEAP_MOVED_IN in t_infomask to
> > all HeapTupleSatisfies* in tqual.c as we do in
> > HeapTupleSatisfiesDirty - note comments about uniq btree-s there.

Should the change be TransactionIdIsInProgress(tuple->t_cmin) ?

The cause of Brian's issue was exactly what I was afraid of.
VACUUM is guarded by an AccessExclusive lock, but IMHO we
shouldn't rely on it too heavily.

regards,
Hiroshi Inoue


Re: problems with table corruption continued

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> Should the change be TransactionIdIsInProgress(tuple->t_cmin) ?

I'd be willing to do

    if (tuple->t_cmin is my XID)
        do something;
    Assert(!TransactionIdIsInProgress(tuple->t_cmin));

if that makes you feel better.  But anything that's scanning
a table exclusive-locked by another transaction is broken.
(BTW: contrib/pgstattuple is thus broken.  Will fix.)
        regards, tom lane


Re: problems with table corruption continued

From
Hiroshi Inoue
Date:
Tom Lane wrote:
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > Should the change be TransactionIdIsInProgress(tuple->t_cmin) ?
> 
> I'd be willing to do
>         if (tuple->t_cmin is my XID)
>                 do something;
>         Assert(!TransactionIdIsInProgress(tuple->t_cmin));
> if that makes you feel better. 

It's useless in hard-to-reproduce cases.
I've thought about proposing this change many times but
gave up, and my decision seems to have been right, because
I see only an *Assert* even after this issue.

> But anything that's scanning
> a table exclusive-locked by another transaction is broken.
> (BTW: contrib/pgstattuple is thus broken.  Will fix.)

It seems dangerous to me to rely only on developers'
carefulness.  There are some places where relations
are opened with NoLock; we had better change them.
I once examined them, but AFAIR there are few cases
where they are really opened with NoLock.  In most
cases they are already opened previously with other
lock modes.  I don't remember whether there was a
really dangerous place (a scan of a relation opened
with NoLock).

regards,
Hiroshi Inoue


Re: problems with table corruption continued

From
Tom Lane
Date:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> I don't remember whether there was a really dangerous place (a scan of
> a relation opened with NoLock).

I have just looked for that, and the only such place is in the new
contrib module pgstattuple.  This is clearly broken, since there is
nothing stopping someone from (eg) dropping the relation while
pgstattuple is scanning it.  I think it should get AccessShareLock
on the relation.

The ri_triggers code has a lot of places that open things NoLock,
but it only looks into the relcache entry and doesn't try to scan
the relation.  Nonetheless that code bothers me; we could be using
an obsolete relcache entry if someone has just committed an ALTER
TABLE on the relation.  Some of the cases may be safe because a lock
is held higher up (eg, on the table from which the trigger was fired)
but I doubt they all are.
        regards, tom lane


Re: problems with table corruption continued

From
Stephan Szabo
Date:
On Tue, 18 Dec 2001, Tom Lane wrote:

> The ri_triggers code has a lot of places that open things NoLock,
> but it only looks into the relcache entry and doesn't try to scan
> the relation.  Nonetheless that code bothers me; we could be using
> an obsolete relcache entry if someone has just committed an ALTER
> TABLE on the relation.  Some of the cases may be safe because a lock
> is held higher up (eg, on the table from which the trigger was fired)
> but I doubt they all are.

Probably not, since it looks like that's being done for the other table of
the constraint (not the one on which the trigger was fired).



Re: problems with table corruption continued

From
Hiroshi Inoue
Date:
Stephan Szabo wrote:
> 
> On Tue, 18 Dec 2001, Tom Lane wrote:
> 
> > The ri_triggers code has a lot of places that open things NoLock,
> > but it only looks into the relcache entry and doesn't try to scan
> > the relation.  Nonetheless that code bothers me; we could be using
> > an obsolete relcache entry if someone has just committed an ALTER
> > TABLE on the relation.  Some of the cases may be safe because a lock
> > is held higher up (eg, on the table from which the trigger was fired)
> > but I doubt they all are.
> 
> Probably not, since it looks like that's being done for the other table of
> the constraint (not the one on which the trigger was fired).

If a lock is held already, acquiring an AccessShareLock
would cause no additional conflict.  I don't see any reason
to walk a tightrope with NoLock intentionally.

regards,
Hiroshi Inoue


Re: problems with table corruption continued

From
Stephan Szabo
Date:
On Wed, 19 Dec 2001, Hiroshi Inoue wrote:

> Stephan Szabo wrote:
> >
> > On Tue, 18 Dec 2001, Tom Lane wrote:
> >
> > > The ri_triggers code has a lot of places that open things NoLock,
> > > but it only looks into the relcache entry and doesn't try to scan
> > > the relation.  Nonetheless that code bothers me; we could be using
> > > an obsolete relcache entry if someone has just committed an ALTER
> > > TABLE on the relation.  Some of the cases may be safe because a lock
> > > is held higher up (eg, on the table from which the trigger was fired)
> > > but I doubt they all are.
> >
> > Probably not, since it looks like that's being done for the other table of
> > the constraint (not the one on which the trigger was fired).
>
> If a lock is held already, acquiring an AccessShareLock
> would cause no additional conflict.  I don't see any reason
> to walk a tightrope with NoLock intentionally.

I don't know why NoLock was used there; I was just pointing out that the
odds of a lock being held higher up are probably low.



Re: problems with table corruption continued

From
Tom Lane
Date:
Okay, I've applied the attached patch to tqual.c.  This brings all the
variants of HeapTupleSatisfies up to speed: I have compared them
carefully and they now all make equivalent series of tests on the tuple
status (though they may interpret the results differently).

I decided that Hiroshi had a good point about testing
TransactionIdIsInProgress, so the code now does that before risking
a change to t_infomask, even in the VACUUM cases.
        regards, tom lane


*** src/backend/utils/time/tqual.c.orig    Mon Oct 29 15:31:08 2001
--- src/backend/utils/time/tqual.c    Wed Dec 19 12:09:32 2001
***************
*** 31,37 ****
  /*
   * HeapTupleSatisfiesItself
   *        True iff heap tuple is valid for "itself."
!  *        "{it}self" means valid as of everything that's happened
   *        in the current transaction, _including_ the current command.
   *
   * Note:
--- 31,37 ----
  /*
   * HeapTupleSatisfiesItself
   *        True iff heap tuple is valid for "itself."
!  *        "itself" means valid as of everything that's happened
   *        in the current transaction, _including_ the current command.
   *
   * Note:
 
***************
*** 53,59 ****
  bool
  HeapTupleSatisfiesItself(HeapTupleHeader tuple)
  {
- 
      if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
--- 53,58 ----
***************
*** 61,86 ****
          if (tuple->t_infomask & HEAP_MOVED_OFF)
          {
!             if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
!             {
!                 tuple->t_infomask |= HEAP_XMIN_INVALID;
                  return false;
              }
          }
          else if (tuple->t_infomask & HEAP_MOVED_IN)
          {
  
!             if (!TransactionIdDidCommit((TransactionId) tuple->t_cmin))
              {
!                 tuple->t_infomask |= HEAP_XMIN_INVALID;
!                 return false;
              }
          }
          else if (TransactionIdIsCurrentTransactionId(tuple->t_xmin))
          {
              if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid */
                  return true;
              if (tuple->t_infomask & HEAP_MARKED_FOR_UPDATE)
                  return true;
              return false;
          }
          else if (!TransactionIdDidCommit(tuple->t_xmin))
--- 60,102 ----
          if (tuple->t_infomask & HEAP_MOVED_OFF)
          {
!             if (TransactionIdIsCurrentTransactionId((TransactionId) tuple->t_cmin))
!                 return false;
+             if (!TransactionIdIsInProgress((TransactionId) tuple->t_cmin))
+             {
+                 if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
+                 {
+                     tuple->t_infomask |= HEAP_XMIN_INVALID;
+                     return false;
+                 }
+                 tuple->t_infomask |= HEAP_XMIN_COMMITTED;
+             }
          }
          else if (tuple->t_infomask & HEAP_MOVED_IN)
          {
  
!             if (!TransactionIdIsCurrentTransactionId((TransactionId) tuple->t_cmin))
              {
!                 if (TransactionIdIsInProgress((TransactionId) tuple->t_cmin))
!                     return false;
!                 if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
!                     tuple->t_infomask |= HEAP_XMIN_COMMITTED;
!                 else
!                 {
!                     tuple->t_infomask |= HEAP_XMIN_INVALID;
!                     return false;
!                 }
              }
          }
          else if (TransactionIdIsCurrentTransactionId(tuple->t_xmin))
          {
              if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid */
                  return true;
+ 
+             Assert(TransactionIdIsCurrentTransactionId(tuple->t_xmax));
+ 
              if (tuple->t_infomask & HEAP_MARKED_FOR_UPDATE)
                  return true;
+ 
              return false;
          }
          else if (!TransactionIdDidCommit(tuple->t_xmin))
***************
*** 89,97 ****
                  tuple->t_infomask |= HEAP_XMIN_INVALID; /* aborted */
              return false;
          }
!         tuple->t_infomask |= HEAP_XMIN_COMMITTED;
      }
! 
      /* the tuple was inserted validly */
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return true;
--- 105,115 ----
                  tuple->t_infomask |= HEAP_XMIN_INVALID; /* aborted */
              return false;
          }
  
!         else
!             tuple->t_infomask |= HEAP_XMIN_COMMITTED;
      }
! 
!     /* by here, the inserting transaction has committed */
      if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or aborted */
          return true;
 
***************
*** 117,123 ****
          return true;
      }
  
!     /* by here, deleting transaction has committed */
      tuple->t_infomask |= HEAP_XMAX_COMMITTED;
  
      if (tuple->t_infomask & HEAP_MARKED_FOR_UPDATE)
--- 135,141 ----
          return true;
      }
  
!     /* xmax transaction committed */
      tuple->t_infomask |= HEAP_XMAX_COMMITTED;
  
      if (tuple->t_infomask & HEAP_MARKED_FOR_UPDATE)
***************
*** 177,194 ****
          if (tuple->t_infomask & HEAP_MOVED_OFF)
          {
!             if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
!             {
!                 tuple->t_infomask |= HEAP_XMIN_INVALID;
                  return false;
              }
          }
          else if (tuple->t_infomask & HEAP_MOVED_IN)
          {
  
!             if (!TransactionIdDidCommit((TransactionId) tuple->t_cmin))
              {
!                 tuple->t_infomask |= HEAP_XMIN_INVALID;
!                 return false;
              }
          }
          else if (TransactionIdIsCurrentTransactionId(tuple->t_xmin))
--- 195,225 ----
          if (tuple->t_infomask & HEAP_MOVED_OFF)
          {
!             if (TransactionIdIsCurrentTransactionId((TransactionId) tuple->t_cmin))
!                 return false;
+             if (!TransactionIdIsInProgress((TransactionId) tuple->t_cmin))
+             {
+                 if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
+                 {
+                     tuple->t_infomask |= HEAP_XMIN_INVALID;
+                     return false;
+                 }
+                 tuple->t_infomask |= HEAP_XMIN_COMMITTED;
+             }
          }
          else if (tuple->t_infomask & HEAP_MOVED_IN)
          {
  
!             if (!TransactionIdIsCurrentTransactionId((TransactionId) tuple->t_cmin))
              {
!                 if (TransactionIdIsInProgress((TransactionId) tuple->t_cmin))
!                     return false;
!                 if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
!                     tuple->t_infomask |= HEAP_XMIN_COMMITTED;
!                 else
!                 {
!                     tuple->t_infomask |= HEAP_XMIN_INVALID;
!                     return false;
!                 }
              }
          }
          else if (TransactionIdIsCurrentTransactionId(tuple->t_xmin))
***************
*** 215,221 ****
                  tuple->t_infomask |= HEAP_XMIN_INVALID; /* aborted */
              return false;
          }
  
!         tuple->t_infomask |= HEAP_XMIN_COMMITTED;
      }
  
      /* by here, the inserting transaction has committed */
--- 246,253 ----
                  tuple->t_infomask |= HEAP_XMIN_INVALID; /* aborted */
              return false;
          }
  
!         else
!             tuple->t_infomask |= HEAP_XMIN_COMMITTED;
      }
  
      /* by here, the inserting transaction has committed */
***************
*** 257,348 ****
  }
  
  int
! HeapTupleSatisfiesUpdate(HeapTuple tuple)
  {
!     HeapTupleHeader th = tuple->t_data;
  
      if (AMI_OVERRIDE)
          return HeapTupleMayBeUpdated;
  
!     if (!(th->t_infomask & HEAP_XMIN_COMMITTED))
      {
!         if (th->t_infomask & HEAP_XMIN_INVALID) /* xid invalid or aborted */
              return HeapTupleInvisible;
  
!         if (th->t_infomask & HEAP_MOVED_OFF)
          {
!             if (TransactionIdDidCommit((TransactionId) th->t_cmin))
!             {
!                 th->t_infomask |= HEAP_XMIN_INVALID;
                  return HeapTupleInvisible;
              }
          }
!         else if (th->t_infomask & HEAP_MOVED_IN)
          {
!             if (!TransactionIdDidCommit((TransactionId) th->t_cmin))
              {
!                 th->t_infomask |= HEAP_XMIN_INVALID;
!                 return HeapTupleInvisible;
              }
          }
!         else if (TransactionIdIsCurrentTransactionId(th->t_xmin))
          {
!             if (CommandIdGEScanCommandId(th->t_cmin))
                  return HeapTupleInvisible;        /* inserted after scan
                                                     * started */
  
!             if (th->t_infomask & HEAP_XMAX_INVALID)        /* xid invalid */
                  return HeapTupleMayBeUpdated;
  
!             Assert(TransactionIdIsCurrentTransactionId(th->t_xmax));
  
!             if (th->t_infomask & HEAP_MARKED_FOR_UPDATE)
                  return HeapTupleMayBeUpdated;
  
!             if (CommandIdGEScanCommandId(th->t_cmax))
                  return HeapTupleSelfUpdated;    /* updated after scan
                                                   * started */
              else
                  return HeapTupleInvisible;        /* updated before scan
                                                     * started */
          }
!         else if (!TransactionIdDidCommit(th->t_xmin))
          {
!             if (TransactionIdDidAbort(th->t_xmin))
!                 th->t_infomask |= HEAP_XMIN_INVALID;    /* aborted */
              return HeapTupleInvisible;
          }
!         th->t_infomask |= HEAP_XMIN_COMMITTED;
      }
  
      /* by here, the inserting transaction has committed */
  
!     if (th->t_infomask & HEAP_XMAX_INVALID)        /* xid invalid or aborted */
          return HeapTupleMayBeUpdated;
  
!     if (th->t_infomask & HEAP_XMAX_COMMITTED)
      {
!         if (th->t_infomask & HEAP_MARKED_FOR_UPDATE)
              return HeapTupleMayBeUpdated;
          return HeapTupleUpdated;   /* updated by other */
      }
  
!     if (TransactionIdIsCurrentTransactionId(th->t_xmax))
      {
!         if (th->t_infomask & HEAP_MARKED_FOR_UPDATE)
              return HeapTupleMayBeUpdated;
!         if (CommandIdGEScanCommandId(th->t_cmax))
              return HeapTupleSelfUpdated;        /* updated after scan
                                                   * started */
          else
              return HeapTupleInvisible;    /* updated before scan started */
      }
  
!     if (!TransactionIdDidCommit(th->t_xmax))
      {
!         if (TransactionIdDidAbort(th->t_xmax))
          {
!             th->t_infomask |= HEAP_XMAX_INVALID;        /* aborted */
              return HeapTupleMayBeUpdated;
          }
          /* running xact */
--- 289,394 ----
  }
  
  int
! HeapTupleSatisfiesUpdate(HeapTuple htuple)
  {
!     HeapTupleHeader tuple = htuple->t_data;
  
      if (AMI_OVERRIDE)
          return HeapTupleMayBeUpdated;
  
!     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
!         if (tuple->t_infomask & HEAP_XMIN_INVALID)
              return HeapTupleInvisible;
  
!         if (tuple->t_infomask & HEAP_MOVED_OFF)
          {
!             if (TransactionIdIsCurrentTransactionId((TransactionId) tuple->t_cmin))
!                 return HeapTupleInvisible;
+             if (!TransactionIdIsInProgress((TransactionId) tuple->t_cmin))
+             {
+                 if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
+                 {
+                     tuple->t_infomask |= HEAP_XMIN_INVALID;
+                     return HeapTupleInvisible;
+                 }
+                 tuple->t_infomask |= HEAP_XMIN_COMMITTED;
+             }
          }
!         else if (tuple->t_infomask & HEAP_MOVED_IN)
          {
!             if (!TransactionIdIsCurrentTransactionId((TransactionId) tuple->t_cmin))
              {
!                 if (TransactionIdIsInProgress((TransactionId) tuple->t_cmin))
!                     return HeapTupleInvisible;
!                 if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
!                     tuple->t_infomask |= HEAP_XMIN_COMMITTED;
!                 else
!                 {
!                     tuple->t_infomask |= HEAP_XMIN_INVALID;
!                     return HeapTupleInvisible;
!                 }
              }
          }
!         else if (TransactionIdIsCurrentTransactionId(tuple->t_xmin))
          {
!             if (CommandIdGEScanCommandId(tuple->t_cmin))
                  return HeapTupleInvisible;        /* inserted after scan
                                                     * started */
  
!             if (tuple->t_infomask & HEAP_XMAX_INVALID)        /* xid invalid */
                  return HeapTupleMayBeUpdated;
  
!             Assert(TransactionIdIsCurrentTransactionId(tuple->t_xmax));
  
!             if (tuple->t_infomask & HEAP_MARKED_FOR_UPDATE)
                  return HeapTupleMayBeUpdated;
  
!             if (CommandIdGEScanCommandId(tuple->t_cmax))
                  return HeapTupleSelfUpdated;    /* updated after scan
                                                   * started */
              else
                  return HeapTupleInvisible;        /* updated before scan
                                                     * started */
          }
!         else if (!TransactionIdDidCommit(tuple->t_xmin))
          {
!             if (TransactionIdDidAbort(tuple->t_xmin))
!                 tuple->t_infomask |= HEAP_XMIN_INVALID;    /* aborted */
              return HeapTupleInvisible;
          }
  
!         else
!             tuple->t_infomask |= HEAP_XMIN_COMMITTED;
      }
  
      /* by here, the inserting transaction has committed */
  
!     if (tuple->t_infomask & HEAP_XMAX_INVALID)        /* xid invalid or aborted */
          return HeapTupleMayBeUpdated;
  
!     if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
      {
!         if (tuple->t_infomask & HEAP_MARKED_FOR_UPDATE)
              return HeapTupleMayBeUpdated;
          return HeapTupleUpdated;   /* updated by other */
      }
  
!     if (TransactionIdIsCurrentTransactionId(tuple->t_xmax))
      {
!         if (tuple->t_infomask & HEAP_MARKED_FOR_UPDATE)
              return HeapTupleMayBeUpdated;
!         if (CommandIdGEScanCommandId(tuple->t_cmax))
              return HeapTupleSelfUpdated;        /* updated after scan
                                                   * started */
          else
              return HeapTupleInvisible;    /* updated before scan started */
      }
  
!     if (!TransactionIdDidCommit(tuple->t_xmax))
      {
!         if (TransactionIdDidAbort(tuple->t_xmax))
          {
!             tuple->t_infomask |= HEAP_XMAX_INVALID;        /* aborted */
              return HeapTupleMayBeUpdated;
          }
          /* running xact */
***************
*** 350,358 ****
      }
  
      /* xmax transaction committed */
! 
!     th->t_infomask |= HEAP_XMAX_COMMITTED;
  
!     if (th->t_infomask & HEAP_MARKED_FOR_UPDATE)
          return HeapTupleMayBeUpdated;
  
      return HeapTupleUpdated;    /* updated by other */
--- 396,404 ----
      }
  
      /* xmax transaction committed */
! 
!     tuple->t_infomask |= HEAP_XMAX_COMMITTED;
  
!     if (tuple->t_infomask & HEAP_MARKED_FOR_UPDATE)
          return HeapTupleMayBeUpdated;
  
      return HeapTupleUpdated;    /* updated by other */
***************
*** 374,396 ****
          if (tuple->t_infomask & HEAP_MOVED_OFF)
          {
-             /*
-              * HeapTupleSatisfiesDirty is used by unique btree-s and so
-              * may be used while vacuuming.
-              */
              if (TransactionIdIsCurrentTransactionId((TransactionId) tuple->t_cmin))
                  return false;
  
!             if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
              {
!                 tuple->t_infomask |= HEAP_XMIN_INVALID;
!                 return false;
              }
-             tuple->t_infomask |= HEAP_XMIN_COMMITTED;
          }
          else if (tuple->t_infomask & HEAP_MOVED_IN)
          {
              if (!TransactionIdIsCurrentTransactionId((TransactionId) tuple->t_cmin))
              {
                  if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
                      tuple->t_infomask |= HEAP_XMIN_COMMITTED;
                  else
--- 420,443 ----
          if (tuple->t_infomask & HEAP_MOVED_OFF)
          {
              if (TransactionIdIsCurrentTransactionId((TransactionId) tuple->t_cmin))
                  return false;
  
!             if (!TransactionIdIsInProgress((TransactionId) tuple->t_cmin))
              {
!                 if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
!                 {
!                     tuple->t_infomask |= HEAP_XMIN_INVALID;
!                     return false;
!                 }
!                 tuple->t_infomask |= HEAP_XMIN_COMMITTED;
              }
          }
          else if (tuple->t_infomask & HEAP_MOVED_IN)
          {
              if (!TransactionIdIsCurrentTransactionId((TransactionId) tuple->t_cmin))
              {
+                 if (TransactionIdIsInProgress((TransactionId) tuple->t_cmin))
+                     return false;
                  if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
                      tuple->t_infomask |= HEAP_XMIN_COMMITTED;
                  else
***************
*** 416,425 ****
          {
              if (TransactionIdDidAbort(tuple->t_xmin))
              {
!                 tuple->t_infomask |= HEAP_XMIN_INVALID; /* aborted */
                  return false;
              }
              SnapshotDirty->xmin = tuple->t_xmin;
              return true;        /* in insertion by other */
          }
          else
--- 463,473 ----
          {
              if (TransactionIdDidAbort(tuple->t_xmin))
              {
!                 tuple->t_infomask |= HEAP_XMIN_INVALID;
                  return false;
              }
              SnapshotDirty->xmin = tuple->t_xmin;
  
+             /* XXX shouldn't we fall through to look at xmax? */
              return true;        /* in insertion by other */
          }
          else
***************
*** 474,479 ****
--- 522,528 ----
      if (AMI_OVERRIDE)
          return true;
  
+     /* XXX this is horribly ugly: */
      if (ReferentialIntegritySnapshotOverride)
          return HeapTupleSatisfiesNow(tuple);
***************
*** 484,501 ****
          if (tuple->t_infomask & HEAP_MOVED_OFF)
          {
!             if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
!             {
!                 tuple->t_infomask |= HEAP_XMIN_INVALID;
                  return false;
              }
          }
          else if (tuple->t_infomask & HEAP_MOVED_IN)
          {
  
!             if (!TransactionIdDidCommit((TransactionId) tuple->t_cmin))
              {
!                 tuple->t_infomask |= HEAP_XMIN_INVALID;
!                 return false;
              }
          }
          else if (TransactionIdIsCurrentTransactionId(tuple->t_xmin))
--- 533,563 ----
          if (tuple->t_infomask & HEAP_MOVED_OFF)
          {
!             if (TransactionIdIsCurrentTransactionId((TransactionId) tuple->t_cmin))
!                 return false;
+             if (!TransactionIdIsInProgress((TransactionId) tuple->t_cmin))
+             {
+                 if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
+                 {
+                     tuple->t_infomask |= HEAP_XMIN_INVALID;
+                     return false;
+                 }
+                 tuple->t_infomask |= HEAP_XMIN_COMMITTED;
+             }
          }
          else if (tuple->t_infomask & HEAP_MOVED_IN)
          {
  
!             if (!TransactionIdIsCurrentTransactionId((TransactionId) tuple->t_cmin))
              {
!                 if (TransactionIdIsInProgress((TransactionId) tuple->t_cmin))
!                     return false;
!                 if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
!                     tuple->t_infomask |= HEAP_XMIN_COMMITTED;
!                 else
!                 {
!                     tuple->t_infomask |= HEAP_XMIN_INVALID;
!                     return false;
!                 }
              }
          }
          else if (TransactionIdIsCurrentTransactionId(tuple->t_xmin))
***************
*** 519,528 ****
          else if (!TransactionIdDidCommit(tuple->t_xmin))
          {
              if (TransactionIdDidAbort(tuple->t_xmin))
!                 tuple->t_infomask |= HEAP_XMIN_INVALID; /* aborted */
              return false;
          }
  
!         tuple->t_infomask |= HEAP_XMIN_COMMITTED;
      }
  
      /*
--- 581,591 ----
          else if (!TransactionIdDidCommit(tuple->t_xmin))
          {
              if (TransactionIdDidAbort(tuple->t_xmin))
!                 tuple->t_infomask |= HEAP_XMIN_INVALID;
              return false;
          }
  
!         else
!             tuple->t_infomask |= HEAP_XMIN_COMMITTED;
      }
  
      /*
***************
*** 623,646 ****
              return HEAPTUPLE_DEAD;
          else if (tuple->t_infomask & HEAP_MOVED_OFF)
          {
              if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
              {
                  tuple->t_infomask |= HEAP_XMIN_INVALID;
                  return HEAPTUPLE_DEAD;
              }
-             /* Assume we can only get here if previous VACUUM aborted, */
-             /* ie, it couldn't still be in progress */
              tuple->t_infomask |= HEAP_XMIN_COMMITTED;
          }
          else if (tuple->t_infomask & HEAP_MOVED_IN)
          {
!             if (!TransactionIdDidCommit((TransactionId) tuple->t_cmin))
              {
-                 /* Assume we can only get here if previous VACUUM aborted */
                  tuple->t_infomask |= HEAP_XMIN_INVALID;
                  return HEAPTUPLE_DEAD;
              }
-             tuple->t_infomask |= HEAP_XMIN_COMMITTED;
          }
          else if (TransactionIdIsInProgress(tuple->t_xmin))
              return HEAPTUPLE_INSERT_IN_PROGRESS;
--- 686,715 ----
              return HEAPTUPLE_DEAD;
          else if (tuple->t_infomask & HEAP_MOVED_OFF)
          {
+             if (TransactionIdIsCurrentTransactionId((TransactionId) tuple->t_cmin))
+                 return HEAPTUPLE_DELETE_IN_PROGRESS;
+             if (TransactionIdIsInProgress((TransactionId) tuple->t_cmin))
+                 return HEAPTUPLE_DELETE_IN_PROGRESS;
              if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
              {
                  tuple->t_infomask |= HEAP_XMIN_INVALID;
                  return HEAPTUPLE_DEAD;
              }
              tuple->t_infomask |= HEAP_XMIN_COMMITTED;
          }
          else if (tuple->t_infomask & HEAP_MOVED_IN)
          {
!             if (TransactionIdIsCurrentTransactionId((TransactionId) tuple->t_cmin))
!                 return HEAPTUPLE_INSERT_IN_PROGRESS;
!             if (TransactionIdIsInProgress((TransactionId) tuple->t_cmin))
!                 return HEAPTUPLE_INSERT_IN_PROGRESS;
!             if (TransactionIdDidCommit((TransactionId) tuple->t_cmin))
!                 tuple->t_infomask |= HEAP_XMIN_COMMITTED;
!             else
              {
                  tuple->t_infomask |= HEAP_XMIN_INVALID;
                  return HEAPTUPLE_DEAD;
              }
          }
          else if (TransactionIdIsInProgress(tuple->t_xmin))
              return HEAPTUPLE_INSERT_IN_PROGRESS;
***************
*** 671,676 ****
--- 740,751 ----
      if (tuple->t_infomask & HEAP_XMAX_INVALID)
          return HEAPTUPLE_LIVE;
  
+     if (tuple->t_infomask & HEAP_MARKED_FOR_UPDATE)
+     {
+         /* "deleting" xact really only marked it for update */
+         return HEAPTUPLE_LIVE;
+     }
+ 
      if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
      {
          if (TransactionIdIsInProgress(tuple->t_xmax))
***************
*** 698,709 ****
      /*
       * Deleter committed, but check special cases.
       */
- 
-     if (tuple->t_infomask & HEAP_MARKED_FOR_UPDATE)
-     {
-         /* "deleting" xact really only marked it for update */
-         return HEAPTUPLE_LIVE;
-     }
  
      if (TransactionIdEquals(tuple->t_xmin, tuple->t_xmax))
      {
--- 773,778 ----