Re: CLUSTER all tables - Mailing list pgsql-patches

From Alvaro Herrera
Subject Re: CLUSTER all tables
Date
Msg-id 20020901013021.738fd248.alvherre@atentus.com
Whole thread Raw
In response to CLUSTER all tables  (Alvaro Herrera <alvherre@atentus.com>)
Responses Re: CLUSTER all tables
Re: CLUSTER all tables
List pgsql-patches
En Fri, 30 Aug 2002 19:02:47 -0400 (CLT)
Alvaro Herrera <alvherre@atentus.com> escribió:

> What it does:
>
> - if CLUSTER is called with no arguments, cluster all indexes that
> have  indisclustered set (in the current database).  There's no "ALL"
>   argument: that's just pollution IMHO.
>
> - Gets a list of such indexes (checking ownership of each) and passes
>   them one by one to the standard cluster routine (modified a little
>   so it accepts OIDs of table and index, not names).

The same as before, but I do include documentation patch (feel free to
rewrite or suggest improvements) and regression test.


Now I'm thinking about concurrency: suppose table A and B have
indisclustered set on indexes ind_a and ind_b respectively.  The
user fires a CLUSTER without arguments; the backend will begin
clustering table A on ind_a.

Now, while this is going on, the user fires a CLUSTER on table B on
index ind_b_2, on a separate session.  This table is shorter than table
A and finishes first.

When the first cluster finishes clustering table A, it will start
clustering table B on ind_b.  This is because the cluster-all creates a
list of the tables to be clustered, and _then_ it clusters them one by
one.  So the info saved about table B is old and overrides the new
cluster that the user has done on another session.

The question is: is this a situation worth to protect against? and what
is the best way to do it?  I can see two ways:

1. allow only one cluster operation at the same time, with some kind of
  lock (can the lightweight lock manager be used for this?)

2. generate the list of tables as it goes.  This requires keeping
  pg_index open (with AccessShareLock) for a potentially long time (I
  don't need to tell you that cluster can be *slow*).  Is this
  acceptable?

Is there another?

--
Alvaro Herrera (<alvherre[a]atentus.com>)
"Granting software the freedom to evolve guarantees only different
results, not better ones." (Zygo Blaxell)

pgsql-patches by date:

Previous
From: "Vishnu"
Date:
Subject: syntax to access/retrieve data from multiple databases
Next
From: Alvaro Herrera
Date:
Subject: Re: CLUSTER all tables