Thread: Indexing dead tuples
During CREATE INDEX we include all tuples, even if they are already dead when we build an index. What purpose does this serve? A pre-existing transaction can't see the index, so there is no danger that it can use the index and unknowingly avoid touching a valid row. (If it *can* see the index, is there some benefit in that behaviour?) I suggest that we should not include dead rows in an index when it is created, to increase the speed of index creation in certain cases. This also simplifies a number of the index build routines, which actually contain special logic for handling dead rows. Comments? Best Regards, Simon Riggs
On 2005-08-31, Simon Riggs <simon@2ndquadrant.com> wrote: > During CREATE INDEX we include all tuples, even if they are already dead > when we build an index. > > What purpose does this serve? > > A pre-existing transaction can't see the index, Yes, it can; the catalog is read in SnapshotNow rather than in the query snapshot. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
Andrew - Supernews <andrew+nonews@supernews.com> writes: > On 2005-08-31, Simon Riggs <simon@2ndquadrant.com> wrote: >> During CREATE INDEX we include all tuples, even if they are already dead >> when we build an index. >> >> What purpose does this serve? >> >> A pre-existing transaction can't see the index, > Yes, it can; the catalog is read in SnapshotNow rather than in the query > snapshot. In fact, it had better be able to, since once the CREATE INDEX commits, pre-existing xacts are responsible to insert index entries for anything they insert into the table. regards, tom lane
On Wed, 2005-08-31 at 19:06 -0400, Tom Lane wrote: > Andrew - Supernews <andrew+nonews@supernews.com> writes: > > On 2005-08-31, Simon Riggs <simon@2ndquadrant.com> wrote: > >> During CREATE INDEX we include all tuples, even if they are already dead > >> when we build an index. > >> > >> What purpose does this serve? > >> > >> A pre-existing transaction can't see the index, > > > Yes, it can; the catalog is read in SnapshotNow rather than in the query > > snapshot. Thanks Andrew, didn't see your post to me. I suspected that was the case, but wasn't sure why... though Tom explains this. > In fact, it had better be able to, since once the CREATE INDEX commits, > pre-existing xacts are responsible to insert index entries for anything > they insert into the table. So would it be possible to have CREATE INDEX call GetOldestXmin, just as VACUUM does, so it can work out which rows to ignore? The overhead of that is fairly low and could actually speed up many index builds by reducing the number of rows needing to be sorted/manipulated. (The call to GetOldestXmin would only scan procs for the current databaseid). Perhaps this could apply only for larger tables, where the sort cost is likely to be pretty high? That way having the CREATE INDEX ignore dead tuples would always be cheaper than doing a VACUUM + CREATE INDEX. Why do two scans when we can do one? Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes: > On Wed, 2005-08-31 at 19:06 -0400, Tom Lane wrote: >> In fact, it had better be able to, since once the CREATE INDEX commits, >> pre-existing xacts are responsible to insert index entries for anything >> they insert into the table. > So would it be possible to have CREATE INDEX call GetOldestXmin, just as > VACUUM does, so it can work out which rows to ignore? It already does. See IndexBuildHeapScan(). regards, tom lane