Thread: Clustering with minimal locking

Clustering with minimal locking

From
Scott Ribe
Date:
If I'm not totally off-base, here's one way to enable clustering on systems
that run 24/7:

1 cluster current rows
    1.1 note current last committed transaction
    1.2 copy all visible rows to new table in cluster order
    1.3 build indexes on new table
2 add changes
    2.1 note current last committed transaction
    2.2 apply to new table (& indexes) all changes committed since 1.1
3 put new table into service
    3.1 take exclusive lock on table
    3.2 apply to new table (& indexes) all changes committed since 2.1
    3.3 switch in new table
    3.4 release lock
    3.5 clean up old table storage

I don't know enough about pg internals to know how big a project this would
be, but it seems to me that the WAL provides many of the pieces needed to
support steps 1.1 and 2.2, for instance. (Even so, I know it's still not
trivial, just perhaps not huge.)

- I guess there's still the possibility that 3.1 could stall in the presence
of long-lived transactions--but this is certainly no worse than the current
situation where it would stall before starting the cluster operation.

- By "apply changes" I mean insert, update, delete rows--of course schema
changes would be locked out during the cluster, even if it takes days ;-)

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: Clustering with minimal locking

From
Decibel!
Date:
On May 28, 2008, at 11:21 AM, Scott Ribe wrote:
> If I'm not totally off-base, here's one way to enable clustering on
> systems
> that run 24/7:
>
> 1 cluster current rows
>     1.1 note current last committed transaction
>     1.2 copy all visible rows to new table in cluster order
>     1.3 build indexes on new table
> 2 add changes
>     2.1 note current last committed transaction
>     2.2 apply to new table (& indexes) all changes committed since 1.1
> 3 put new table into service
>     3.1 take exclusive lock on table

BOOM! Deadlock.

>     3.2 apply to new table (& indexes) all changes committed since 2.1
>     3.3 switch in new table
>     3.4 release lock
>     3.5 clean up old table storage
>
> I don't know enough about pg internals to know how big a project
> this would
> be, but it seems to me that the WAL provides many of the pieces
> needed to
> support steps 1.1 and 2.2, for instance. (Even so, I know it's
> still not
> trivial, just perhaps not huge.)
>
> - I guess there's still the possibility that 3.1 could stall in the
> presence
> of long-lived transactions--but this is certainly no worse than the
> current
> situation where it would stall before starting the cluster operation.
>
> - By "apply changes" I mean insert, update, delete rows--of course
> schema
> changes would be locked out during the cluster, even if it takes
> days ;-)

What you're describing is possible; it's done for CREATE INDEX
CONCURRENT. But it's not very easy to add. I think what makes a lot
more sense is to have a form of clustering that puts effort into
placing tuples in the correct location. If you had that, you could
effectively migrate stuff into proper cluster order in userland; or
just let it take care of itself. Presumable the table would
eventually end up clustered if rows are updated often enough.
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



Attachment

Re: Clustering with minimal locking

From
Scott Ribe
Date:
> BOOM! Deadlock.

No more likely than with the current cluster command. Acquiring the lock is
the same risk; but it is held for much less time.

> ...I think what makes a lot
> more sense is to have a form of clustering that puts effort into
> placing tuples in the correct location.

Agreed that would be more desirable; thought it might be more difficult.


--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: Clustering with minimal locking

From
Decibel!
Date:
On Jun 17, 2008, at 11:37 AM, Scott Ribe wrote:
>> BOOM! Deadlock.
>
> No more likely than with the current cluster command. Acquiring the
> lock is
> the same risk; but it is held for much less time.


Actually, no (at least in 8.2). CLUSTER grabs an exclusive lock
before it does any work meaning that it can't deadlock by itself. Of
course you could always do something like

BEGIN;
SELECT * FROM a;
CLUSTER .. ON a;
COMMIT;

Which does introduce the risk of a deadlock, but that's your fault,
not Postgres.
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



Attachment

Re: Clustering with minimal locking

From
"Gurjeet Singh"
Date:
On Wed, Jun 18, 2008 at 9:26 AM, Decibel! <decibel@decibel.org> wrote:
On Jun 17, 2008, at 11:37 AM, Scott Ribe wrote:
BOOM! Deadlock.

No more likely than with the current cluster command. Acquiring the lock is
the same risk; but it is held for much less time.


Actually, no (at least in 8.2). CLUSTER grabs an exclusive lock before it does any work meaning that it can't deadlock by itself. Of course you could always do something like

BEGIN;
SELECT * FROM a;
CLUSTER .. ON a;
COMMIT;

Which does introduce the risk of a deadlock

Really!!? Am I missing something? How can a single transaction, running synchronous commands, deadlock itself!

Best regards,
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device