Re: CREATE INDEX and HOT - revised design - Mailing list pgsql-hackers
From | Hannu Krosing |
---|---|
Subject | Re: CREATE INDEX and HOT - revised design |
Date | |
Msg-id | 1174636236.3826.20.camel@localhost.localdomain Whole thread Raw |
In response to | Re: CREATE INDEX and HOT - revised design ("Pavan Deolasee" <pavan.deolasee@gmail.com>) |
Responses |
Re: CREATE INDEX and HOT - revised design
Re: CREATE INDEX and HOT - revised design |
List | pgsql-hackers |
Ühel kenal päeval, N, 2007-03-22 kell 23:30, kirjutas Pavan Deolasee: > > > On 3/22/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > > When CREATE INDEX starts, it acquires ShareLock on the > table. > > At this point we may have one or more HOT-update chains in > the > > table. Tuples in this chain may be visible to one or more > running > > transactions. The fact that we have ShareLock on the table > means > > that all tuples in the chain except the one at the head > either > > RECENTLY_DEAD or were UPDATEd by the same transaction > > that is now running CREATE INDEX. > > This logic seems pretty questionable to me in view of the > recent VACUUM > FULL bug fixes. We now know that an update chain can contain > tuples that > appear DEAD but are later than ones that are > RECENTLY_DEAD. How are > you defining a HOT chain exactly --- will it be just a part of > an > update chain that consists only of contiguous tuples that > appear live or > RECENTLY_DEAD to the CREATE INDEX transaction? > > > HOT-chain is something which contains contiguous tuples that share > the same index keys for all the existing indexes and share a single > index entry. It does not really matter if there are RECENTLY_DEAD > tuples that appear before a DEAD tuple because from VACUUM FULL > bug fix we know that they are DEAD too. > > My argument is that its enough to index only the LIVE tuple which > is at the end of the chain if we don't use the new index for queries > in transactions which were started before CREATE INDEX. You mean, which were started before CREATE INDEX completes ? Just wait for all concurrent transactions to complete before marking the index as usable in plans. > I am > proposing to do that by storing an xid in the pg_index row. I don't think it is a good idea to store xid's anywhere but in xmin/xmax columns, as doing so would cause nasty xid wraparound problems. Instead you should wait, after completeing the index , for all concurrent transactions to end before you mark the index as "usable for queries", similar to the way CREATE INDEX CONCURRENTLY does. > A > special case is where a tuple is UPDATEd multiple times by > the same transaction which is also creating the index, in which case > there are more than one LIVE versions of the tuple. But again > we are safe by indexing only the latest version because all other > versions would be invisible (even to us) once CREATE INDEX commits. > > > > In fact, the serializable transactions started before CREATE > INDEX > > can not anyway see the index so all this is done to handle > > read-committed transactions. > > You are laboring under an illusion that system catalog > accesses are MVCC. > SnapshotNow does not behave that way: the system can see the > new index > as soon as it's committed. (It had better, since it has to > start > updating the index immediately, whether it's safe to scan it > or not.) > I'm not sure whether that's fundamental to your argument or > not, but > it's certainly wrong. > > > Oh, thanks for pointing that out. But thats certainly not fundamental > to the argument as you probably already guessed. The xid still > controls the usage of index for query planning, somewhat similar to > "isindvalid" flag for CREATE INDEX CONCURRENTLY. Xids are unstable and will come back to bite you after 2G transactions. Why not just use the "isindvalid" flag ? -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
pgsql-hackers by date: