Thread: Indexing dead tuples

Indexing dead tuples

From
Simon Riggs
Date:
During CREATE INDEX we include all tuples, even if they are already dead
when we build an index.

What purpose does this serve?

A pre-existing transaction can't see the index, so there is no danger
that it can use the index and unknowingly avoid touching a valid row.
(If it *can* see the index, is there some benefit in that behaviour?)

I suggest that we should not include dead rows in an index when it is
created, to increase the speed of index creation in certain cases. This
also simplifies a number of the index build routines, which actually
contain special logic for handling dead rows.

Comments?

Best Regards, Simon Riggs




Re: Indexing dead tuples

From
Andrew - Supernews
Date:
On 2005-08-31, Simon Riggs <simon@2ndquadrant.com> wrote:
> During CREATE INDEX we include all tuples, even if they are already dead
> when we build an index.
>
> What purpose does this serve?
>
> A pre-existing transaction can't see the index,

Yes, it can; the catalog is read in SnapshotNow rather than in the query
snapshot.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: Indexing dead tuples

From
Tom Lane
Date:
Andrew - Supernews <andrew+nonews@supernews.com> writes:
> On 2005-08-31, Simon Riggs <simon@2ndquadrant.com> wrote:
>> During CREATE INDEX we include all tuples, even if they are already dead
>> when we build an index.
>> 
>> What purpose does this serve?
>> 
>> A pre-existing transaction can't see the index,

> Yes, it can; the catalog is read in SnapshotNow rather than in the query
> snapshot.

In fact, it had better be able to, since once the CREATE INDEX commits,
pre-existing xacts are responsible to insert index entries for anything
they insert into the table.
        regards, tom lane


Re: Indexing dead tuples

From
Simon Riggs
Date:
On Wed, 2005-08-31 at 19:06 -0400, Tom Lane wrote:
> Andrew - Supernews <andrew+nonews@supernews.com> writes:
> > On 2005-08-31, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> During CREATE INDEX we include all tuples, even if they are already dead
> >> when we build an index.
> >> 
> >> What purpose does this serve?
> >> 
> >> A pre-existing transaction can't see the index,
> 
> > Yes, it can; the catalog is read in SnapshotNow rather than in the query
> > snapshot.

Thanks Andrew, didn't see your post to me. I suspected that was the
case, but wasn't sure why... though Tom explains this.

> In fact, it had better be able to, since once the CREATE INDEX commits,
> pre-existing xacts are responsible to insert index entries for anything
> they insert into the table.

So would it be possible to have CREATE INDEX call GetOldestXmin, just as
VACUUM does, so it can work out which rows to ignore? The overhead of
that is fairly low and could actually speed up many index builds by
reducing the number of rows needing to be sorted/manipulated. (The call
to GetOldestXmin would only scan procs for the current databaseid).

Perhaps this could apply only for larger tables, where the sort cost is
likely to be pretty high? That way having the CREATE INDEX ignore dead
tuples would always be cheaper than doing a VACUUM + CREATE INDEX. Why
do two scans when we can do one?

Best Regards, Simon Riggs






Re: Indexing dead tuples

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Wed, 2005-08-31 at 19:06 -0400, Tom Lane wrote:
>> In fact, it had better be able to, since once the CREATE INDEX commits,
>> pre-existing xacts are responsible to insert index entries for anything
>> they insert into the table.

> So would it be possible to have CREATE INDEX call GetOldestXmin, just as
> VACUUM does, so it can work out which rows to ignore?

It already does.  See IndexBuildHeapScan().
        regards, tom lane