
From: Kevin Brown
Subject: Re: invalid tid errors in latest 7.3.4 stable.
Date:
Msg-id: 20030927100021.GC6073@filer
In response to: Re: invalid tid errors in latest 7.3.4 stable.  (Stephan Szabo <sszabo@megazone.bigpanda.com>)
List: pgsql-hackers
Stephan Szabo wrote:
> > The problem in the scenario you described should be solved if we mark any
> > rows that are selected with the "for update" option (either implicitly,
> > as with RI triggers, or explicitly) as having been modified by the
> > selecting transaction, the equivalent of (in the case of T2*) "update pk
> > set id=id where id=1" but without firing any of the ON MODIFY triggers.
> > A rollback would, of course, not have any effect on the data in those
> > rows since there weren't any real changes.  This "fix" won't work,
> > of course, if the serialization code is so broken that it doesn't work
> > properly even in the face of updates (something I'd find hard to believe).
> 
> That fixes the case above which will fix the ri constraints for right now
> (although they really have to stop using for update eventually), but
> doesn't really solve the serialization problem since it still exists
> AFAICS without for update. Without the for update, you still have T2*
> getting 1 row and T1* getting 0 which can't happen for either ordering of
> the transactions.  It gets worse if that select as a holder at the
> beginning of T1 was say select * from fk where id=1 because SQL tells us
> that the later select can't see a different set of rows from the earlier
> one, so T2 shouldn't be allowed to commit before T1.

That's what I was afraid of, and what I figured serialization really
meant: what you see is a snapshot of the database as it was at
transaction start time.

I can't think of any good way to implement proper serialization without
destroying a serializable transaction's read performance, because as far
as I can see the only way to implement it properly is to somehow record
on disk every row a serializable transaction visits, which means a
serializable transaction is going to be *much* slower than a
read-committed one.  You have to mark such rows because other
transactions (even read-committed ones) have to abort if they attempt
to modify a row a serializable transaction has seen, and the list of
such rows can grow far too large to keep in shared memory.  Worse, you
have to maintain a dynamic list of the serializable transactions that
have seen each row, and remove a transaction from the list once it
commits or rolls back, because a transaction changing a row only needs
to care when there's a currently-running serializable transaction that
has seen it.
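
In rough C (all of the names here are invented for illustration, not
anything from the actual backend), the write-side check I'm imagining
would look something like this:

    #include <stdbool.h>

    typedef unsigned int TransactionId;

    /*
     * Sketch of the write-side check: the writer must abort with a
     * serialization error if any still-running serializable
     * transaction has seen the row.  Assumes a per-row list of reader
     * XIDs and some way to ask whether a transaction is still in
     * progress -- both hypothetical.
     */
    static bool
    write_must_abort(TransactionId writer,
                     const TransactionId *readers, int nreaders,
                     bool (*xact_in_progress)(TransactionId))
    {
        for (int i = 0; i < nreaders; i++)
        {
            /* Readers that have committed or rolled back no longer
             * matter, which is why the list has to be pruned. */
            if (readers[i] != writer && xact_in_progress(readers[i]))
                return true;
        }
        return false;
    }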

We could use the MVCC mechanism to implement it: duplicate the row being
examined and stamp the duplicate with the reader's transaction ID, just
as if the reader had modified the row.  But you'd also have to somehow
flag the duplicate as being the result of a serializable read, so that
other serializable transactions that touch the row after the reader has
committed don't themselves throw a serialization error (without the
flag, they'd think the row had been modified by someone else during
their lifetime).
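
Something like this, with the flag living in the tuple header (again,
invented names, and ignoring XID wraparound and commit status to keep
the sketch small):

    #include <stdbool.h>

    typedef unsigned int TransactionId;

    /* Hypothetical tuple header for the sketch: xmin is the stamping
     * transaction, read_marker says this version exists only because
     * a serializable transaction read the row, not because anyone
     * actually changed it. */
    typedef struct
    {
        TransactionId xmin;
        bool          read_marker;
    } tuple_stub;

    /*
     * Should a serializable transaction touching this version throw a
     * serialization error?  Only if the version was stamped by a later
     * transaction *and* represents a real modification.
     */
    static bool
    version_conflicts(const tuple_stub *tup, TransactionId my_xid)
    {
        return tup->xmin > my_xid && !tup->read_marker;
    }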


The other situation you have to deal with is when you have two
transactions, 1 and 2, that start and commit in that order but which
have overlapping times of execution.  If transaction 1 modifies a row
after transaction 2 starts, then commits before transaction 2 reads it,
transaction 2 has to be able to detect that and throw a serialization
error.

The way around that problem is to assign a commit ID to each transaction
at commit time.  The commit ID is just the transaction ID that would be
assigned to the next transaction to start, and it might make sense for
the assignment of a commit ID to increment the transaction ID counter
the same way the assignment of a transaction ID does.  Anyway, if a
serializable transaction reads a row whose last writer has a commit ID
greater than the reader's transaction ID, it throws a serialization
error.  It's probably sufficient to store the commit ID alongside the
committer's transaction ID in the transaction log as well as in shared
memory, so that the commit ID can be looked up quickly from the
transaction ID.
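
The read-side test then becomes trivial.  In the same invented-names C
as above (InvalidCommitId marks a writer that hasn't committed yet, and
writer_commit_id is what the lookup from the row's XID returns):

    #include <stdbool.h>

    typedef unsigned int TransactionId;
    typedef TransactionId CommitId;

    #define InvalidCommitId 0

    /*
     * Sketch of the read-side check: a serializable reader must throw
     * a serialization error if the row's writer committed after the
     * reader started, i.e. received a commit ID greater than the
     * reader's own transaction ID.
     */
    static bool
    read_must_abort(CommitId writer_commit_id, TransactionId reader_xid)
    {
        return writer_commit_id != InvalidCommitId
            && writer_commit_id > reader_xid;
    }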



Maybe there's a better way around all this, but I certainly can't think
of one at the moment.  :-(



-- 
Kevin Brown                          kevin@sysexperts.com

