Thread: Clustering with minimal locking
If I'm not totally off-base, here's one way to enable clustering on systems that run 24/7: 1 cluster current rows 1.1 note current last committed transaction 1.2 copy all visible rows to new table in cluster order 1.3 build indexes on new table 2 add changes 2.1 note current last committed transaction 2.2 apply to new table (& indexes) all changes committed since 1.1 3 put new table into service 3.1 take exclusive lock on table 3.2 apply to new table (& indexes) all changes committed since 2.1 3.3 switch in new table 3.4 release lock 3.5 clean up old table storage I don't know enough about pg internals to know how big a project this would be, but it seems to me that the WAL provides many of the pieces needed to support steps 1.1 and 2.2, for instance. (Even so, I know it's still not trivial, just perhaps not huge.) - I guess there's still the possibility that 3.1 could stall in the presence of long-lived transactions--but this is certainly no worse than the current situation where it would stall before starting the cluster operation. - By "apply changes" I mean insert, update, delete rows--of course schema changes would be locked out during the cluster, even if it takes days ;-) -- Scott Ribe scott_ribe@killerbytes.com http://www.killerbytes.com/ (303) 722-0567 voice
On May 28, 2008, at 11:21 AM, Scott Ribe wrote:

> If I'm not totally off-base, here's one way to enable clustering on systems
> that run 24/7:
>
> 1 cluster current rows
> 1.1 note current last committed transaction
> 1.2 copy all visible rows to new table in cluster order
> 1.3 build indexes on new table
> 2 add changes
> 2.1 note current last committed transaction
> 2.2 apply to new table (& indexes) all changes committed since 1.1
> 3 put new table into service
> 3.1 take exclusive lock on table

BOOM! Deadlock.

> 3.2 apply to new table (& indexes) all changes committed since 2.1
> 3.3 switch in new table
> 3.4 release lock
> 3.5 clean up old table storage
>
> I don't know enough about pg internals to know how big a project this would
> be, but it seems to me that the WAL provides many of the pieces needed to
> support steps 1.1 and 2.2, for instance. (Even so, I know it's still not
> trivial, just perhaps not huge.)
>
> - I guess there's still the possibility that 3.1 could stall in the presence
> of long-lived transactions--but this is certainly no worse than the current
> situation where it would stall before starting the cluster operation.
>
> - By "apply changes" I mean insert, update, delete rows--of course schema
> changes would be locked out during the cluster, even if it takes days ;-)

What you're describing is possible; it's done for CREATE INDEX CONCURRENTLY. But it's not very easy to add. I think what makes a lot more sense is to have a form of clustering that puts effort into placing tuples in the correct location. If you had that, you could effectively migrate stuff into proper cluster order in userland, or just let it take care of itself. Presumably the table would eventually end up clustered if rows are updated often enough.

--
Decibel!, aka Jim C. Nasby, Database Architect
decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
> BOOM! Deadlock.

No more likely than with the current cluster command. Acquiring the lock is
the same risk; but it is held for much less time.

> ...I think what makes a lot more sense is to have a form of clustering
> that puts effort into placing tuples in the correct location.

Agreed that would be more desirable; though it might be more difficult.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice
On Jun 17, 2008, at 11:37 AM, Scott Ribe wrote:

>> BOOM! Deadlock.
>
> No more likely than with the current cluster command. Acquiring the lock is
> the same risk; but it is held for much less time.

Actually, no (at least in 8.2). CLUSTER grabs an exclusive lock before it does any work, meaning that it can't deadlock by itself. Of course you could always do something like

BEGIN;
SELECT * FROM a;
CLUSTER .. ON a;
COMMIT;

which does introduce the risk of a deadlock, but that's your fault, not Postgres'.

--
Decibel!, aka Jim C. Nasby, Database Architect
decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
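[For reference: the deadlock in the BEGIN/SELECT/CLUSTER example above requires a second session. One interleaving that Postgres's deadlock detector would catch, sketched with a hypothetical table `a` and index `a_idx`:]

```sql
-- Session 1                      -- Session 2
BEGIN;
SELECT * FROM a;
-- holds ACCESS SHARE on a
                                  BEGIN;
                                  LOCK TABLE a IN ACCESS EXCLUSIVE MODE;
                                  -- blocks behind session 1's ACCESS SHARE
CLUSTER a_idx ON a;
-- wants ACCESS EXCLUSIVE on a, queues behind session 2's request,
-- which in turn waits on session 1: deadlock, detected and broken
-- by aborting one transaction with ERROR: deadlock detected
```

Because plain CLUSTER takes the exclusive lock as its first action, session 1 would hold no prior lock on `a` and this cycle can't form; it's the extra SELECT in the same transaction that creates it.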
On Wed, Jun 18, 2008 at 9:26 AM, Decibel! <decibel@decibel.org> wrote:
Really!!? Am I missing something? How can a single transaction, running synchronous commands, deadlock itself?
Best regards,
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
EnterpriseDB http://www.enterprisedb.com
Mail sent from my BlackLaptop device