Alvaro Herrera <alvherre@atentus.com> writes:
> Now I'm thinking about concurrency: suppose table A and B have
> indisclustered set on indexes ind_a and ind_b respectively. The
> user fires a CLUSTER without arguments; the backend will begin
> clustering table A on ind_a.
> Now, while this is going on, the user fires a CLUSTER on table B on
> index ind_b_2, on a separate session. This table is shorter than table
> A and finishes first.
> When the first cluster finishes clustering table A, it will start
> clustering table B on ind_b. ...
> So the info saved about table B is old and overrides the new
> cluster that the user has done on another session.
You must acquire exclusive lock on a table before you even look to
see if it has a clusterable index, I think. Otherwise there's too
much risk of the state changing underneath you.
The bigger problem with implementing CLUSTER ALL this way is that it's
going to try to get exclusive lock on a large number of tables, which
is going to lead to very high risk of deadlock --- even if other
transactions are not doing CLUSTERs, but only ordinary table accesses.
I think the only practical way to do CLUSTER ALL (or REINDEX ALL for
that matter) is to make it work the way VACUUM does: run a separate
transaction for each table to be processed. In this way you can
release the lock on each table as you finish with it, and avoid
deadlock problems.
If you study the VACUUM code you will also notice that it is prepared
for tables to "go away" before it reaches them; CLUSTER ALL will have
the same issue, along with the issue about clustering status changing.
So what you need is something like:
* make preliminary list of things to cluster
* END starting transaction
* for each item in list:
START new transaction
Attempt to open and exclusive-lock target table
If successful, *and* index still is the clustered index, CLUSTER
END transaction
* START closing transaction so that we return with an open transaction
regards, tom lane