From: Tom Lane
Subject: Re: proposed TODO: non-locking CREATE INDEX / REINDEX
Msg-id: 11658.1118507132@sss.pgh.pa.us
In response to: Re: proposed TODO: non-locking CREATE INDEX / REINDEX (Hannu Krosing <hannu@tm.ee>)
List: pgsql-hackers
Hannu Krosing <hannu@tm.ee> writes:
> On Fri, 2005-06-10 at 12:12 -0400, Tom Lane wrote:
>> Have you forgotten Zeno's paradox?  I don't see a
>> reason to assume the indexer can *ever* catch up --- it's entirely
>> likely that adding a new unindexed row is faster than adding an index
>> entry for it.

> The same is true of doing a lazy vacuum over a table where tuples are
> constantly added -  there is no guarantee that the vacuum will ever
> finish.

No, there definitely is such a guarantee: the vacuum only scans as many
blocks as were in the relation when it started.  The vacuum need not
worry about tuples added after it starts, because it couldn't delete
them under MVCC rules.  And there is no logical-consistency requirement
for it to promise to scan every tuple, anyway.
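
To make the bounded-scan point concrete, here is a minimal C sketch (a
toy model only --- the relation, writer, and vacuum loop are invented
for illustration and bear no resemblance to the real lazy-vacuum code).
The scan limit is captured once at start, so writers extending the
relation concurrently cannot keep the vacuum from terminating:

    #include <stdio.h>

    /* Toy model: a relation is just a count of blocks; writers extend it. */
    struct relation { int nblocks; };

    static void writer_extends(struct relation *rel) { rel->nblocks++; }

    /* The vacuum scans only the blocks that existed when it started, so
     * it terminates even if writers keep extending the relation.  (Tuples
     * added later could not be dead to this vacuum under MVCC anyway.) */
    static void lazy_vacuum(struct relation *rel)
    {
        int scan_limit = rel->nblocks;  /* captured once, at start */
        for (int blk = 0; blk < scan_limit; blk++)
        {
            printf("vacuuming block %d of %d\n", blk, scan_limit);
            writer_extends(rel);        /* concurrent insert grows the heap */
        }
        printf("vacuum done; relation now has %d blocks\n", rel->nblocks);
    }

    int main(void)
    {
        struct relation rel = { .nblocks = 3 };
        lazy_vacuum(&rel);   /* finishes after 3 blocks despite growth */
        return 0;
    }

That captured limit is the whole guarantee: nothing the writers do can
raise it once the vacuum has started, which is exactly what the indexer
lacks.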

>> This implies that you are hoping for an asynchronous change in the
>> behavior of other processes, which you are not going to get without
>> taking out locks, which is what you wanted to avoid.

> One way to avoid locking, is to allow the "add tuple to index" routine
> silently succeed if the index already has it.

... thereby silently breaking unique-index checking, you mean?
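
To spell that out with a toy C sketch (invented structures, not the
real btree code): once inserting an already-present key is defined to
succeed silently, a genuine duplicate arriving from another backend
"succeeds" too, and no duplicate-key error can ever be raised.

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy unique index: a fixed-size set of keys. */
    #define MAXKEYS 16
    static int keys[MAXKEYS];
    static int nkeys = 0;

    static bool key_exists(int key)
    {
        for (int i = 0; i < nkeys; i++)
            if (keys[i] == key)
                return true;
        return false;
    }

    /* Normal rule: a duplicate in a unique index is an error. */
    static bool insert_unique(int key)
    {
        if (key_exists(key))
            return false;           /* duplicate key violation */
        keys[nkeys++] = key;
        return true;
    }

    /* Proposed rule: silently succeed if the key is already present.
     * Now a second backend inserting the same key "succeeds" too, and
     * the uniqueness violation is never reported to anyone. */
    static bool insert_silent(int key)
    {
        if (!key_exists(key))
            keys[nkeys++] = key;
        return true;                /* always "succeeds" */
    }

    int main(void)
    {
        insert_unique(42);                       /* indexer inserts key 42 */
        if (!insert_silent(42))                  /* other backend, same key */
            printf("duplicate key violation\n"); /* never reached */
        else
            printf("duplicate accepted silently\n");
        return 0;
    }

The silent path cannot distinguish the indexer re-inserting its own
entry from a second transaction violating the constraint.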

> Then we can announce the change in behaviour to running backends, wait
> for all backends to confirm they have learned about it and only then
> record CTID_INDEX_MAX.

You can't wait for confirmation from all other backends without
expecting to create deadlock issues left and right.  And what is it you
are waiting for, anyway?  For a backend to confirm that it is prepared
to insert into an index that it can't even see yet (because the CREATE
INDEX hasn't committed)?  In the case of REINDEX, are you expecting
that backends will be prepared to insert into *both* old and new
versions of the index?  They'd better, since there's still every
prospect of the REINDEX failing and rolling back, leaving the old
version as the live copy.  Speaking of rollback, what happens when
those backends try to insert into the new copy just after the REINDEX
has failed and rolled back and deleted the new copy?  Or equivalently,
what happens when they are still trying to insert into the old copy
just after the REINDEX commits and deletes that one?
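
Here is a toy C model of that window (all names hypothetical; this is a
sketch of the proposed scheme, not backend code).  A writer picks up
both index copies, the REINDEX resolves and deletes one of them in
between, and an insert lands on a copy that no longer exists, with no
lock ever having closed the gap:

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model of the two index copies that coexist while a
     * non-locking REINDEX is in flight.  Either may end up live: the
     * old one if the REINDEX aborts, the new one if it commits, so
     * every writer must insert into both for the whole duration. */
    struct index_copy
    {
        const char *name;
        bool        dropped;    /* set when commit/abort deletes it */
    };

    static struct index_copy old_copy = { "old index", false };
    static struct index_copy new_copy = { "new index", false };

    static void backend_insert(struct index_copy *idx, int key)
    {
        if (idx->dropped)
            printf("ERROR: key %d inserted into %s, which was just "
                   "deleted -- no lock closed this window\n",
                   key, idx->name);
        else
            printf("inserted key %d into %s\n", key, idx->name);
    }

    int main(void)
    {
        /* A writer resolves to insert into both copies... */
        struct index_copy *targets[] = { &old_copy, &new_copy };

        /* ...but between that decision and the inserts, the REINDEX
         * fails, rolls back, and deletes the new copy.  (A commit
         * symmetrically deletes the old one.) */
        new_copy.dropped = true;

        for (int i = 0; i < 2; i++)
            backend_insert(targets[i], 7);
        return 0;
    }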

The complexity and fragility of what you are proposing vastly outweighs
any potential benefit, even if it could be made to work at all, which I
continue to doubt.
        regards, tom lane

