Re: [SQL] Duplicate tuples with unique index - Mailing list pgsql-sql

From Tom Lane
Subject Re: [SQL] Duplicate tuples with unique index
Date
Msg-id 16091.948865123@sss.pgh.pa.us
Whole thread Raw
In response to Re: [SQL] Duplicate tuples with unique index  (Palle Girgensohn <girgen@partitur.se>)
Responses Re: [SQL] Duplicate tuples with unique index  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-sql
Palle Girgensohn <girgen@partitur.se> writes:
>>>> Nope. pg_upgrade was "disabled in this release because the
>>>> internal blahblahblah disk layout changed from previous versions".
>> 
>> Sorry, you have to edit the script to reenable it.

> That's OK. I didn't bother to check. the dump/restore sequence was pretty quick.
> The layout hasn't really changed since 6.5, right?

No; the disk layout is the same.  There is room for trouble nonetheless.
There is some doubt about whether pg_upgrade will work right under MVCC
semantics: in MVCC, whether a tuple stored in a table is considered
valid or not will depend on whether its creator transaction is marked
committed in pg_log (and, if it is marked with a deletor transaction,
whether the deletor is NOT committed).

pg_upgrade tries to deal with this by copying the old database
installation's pg_log into the new.  OK, that ensures that all the
user-table tuples preserve their commit state; but what about tuples
in the system tables?  The idea behind pg_upgrade is to paste together
a set of user tables with a set of system tables --- but unless the
transaction number history of the user tables is exactly the same as
the transaction number history of the system tables, there is clearly
a risk that committed tuples will suddenly be considered not-committed
or vice versa.  And ordinarily those histories will *not* be identical.

The only reason pg_upgrade has any chance at all of working is that
for efficiency reasons we don't want to go back and consult pg_log
for every single tuple we read.  So, there are really six states of
a tuple on disk, which may be described as:1. I was created by transaction N, but I dunno if it committed2. I was
createdby a transaction that definitely committed,   so I'm good; no need to look at pg_log3. I was created by a
transactionthat definitely aborted,   so I'm dead; no need to look at pg_log4. I was deleted by transaction N, but I
dunnoif it committed5. I was deleted by a transaction that definitely committed,   so I'm dead; no need to look at
pg_log6.I was deleted by a transaction that definitely aborted,   so I'm still good; no need to look at pg_log
 
(I'm fuzzing over some fine points that arise when multiple transactions
try to delete the same tuple, but this level of detail will do for now.)
Whenever any backend examines a tuple in state 1 or 4, it will consult
pg_log to discover the state of the source transaction.  If the source
transaction has now committed or aborted, the tuple will be rewritten
with the new state (2,3,5,6 as appropriate) so that future readers of
the tuple don't have to look at pg_log again.

Now states 2,3,5,6 do not depend on what pg_log says.  Therefore,
pg_upgrade's copy of an old pg_log file into a new database will
work *if and only if* all the tuples in the system tables are in
one of these states, and none of them are in states 1 or 4.  We
can survive with user-table tuples that are in the uncertain states,
because the associated pg_log info will be copied over with them.
We cannot afford to have any uncertainly-committed system-table
tuples, because we will overwrite the pg_log data about their status.

Bruce thinks that the pg_upgrade script will ensure that the system-
table tuples are all in frozen states (by VACUUMing them).  I don't
trust it worth a dime, myself.  Maybe it will work, but it hasn't been
proven in the field.  So, if you'd like to try it, by all means do so
--- but make a pg_dump backup first!  And let us know whether you have
problems or not!
        regards, tom lane


pgsql-sql by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [SQL] Duplicate tuples with unique index
Next
From: Bruce Momjian
Date:
Subject: Re: [SQL] Duplicate tuples with unique index