Re: Equivalent praxis to CLUSTERED INDEX? - Mailing list pgsql-performance
From | Bruce Momjian |
---|---|
Subject | Re: Equivalent praxis to CLUSTERED INDEX? |
Date | |
Msg-id | 200408261818.i7QIIrr09640@candle.pha.pa.us Whole thread Raw |
In response to | Re: Equivalent praxis to CLUSTERED INDEX? ("J. Andrew Rogers" <jrogers@neopolitan.com>) |
Responses |
Re: Equivalent praxis to CLUSTERED INDEX?
Re: Equivalent praxis to CLUSTERED INDEX? Re: Equivalent praxis to CLUSTERED INDEX? |
List | pgsql-performance |
How do vendors actually implement auto-clustering? I assume they move rows around during quiet periods or have lots of empty space in each value bucket. --------------------------------------------------------------------------- J. Andrew Rogers wrote: > On Tue, 2004-08-24 at 22:28, Mischa Sandberg wrote: > > I see that PG has a one-shot CLUSTER command, but doesn't support > > continuously-updated clustered indexes. > > > > What I infer from newsgroup browsing is, such an index is impossible, > > given the MVCC versioning of records (happy to learn I'm wrong). > > > It is possible to have MVCC and ordered/indexed heaps, but it isn't > something you can just tack onto the currently supported types -- I > looked into this myself. It would take substantial additional code > infrastructure to support it, basically an alternative heap system and > adding support for tables with odd properties to many parts of the > system. Pretty non-trivial. > > This is probably my #1 "I wish postgres had this feature" feature. It > is a serious scalability enhancer for big systems and a pain to work > around not having. > > > > I'd be curious to know what other people, who've crossed this same > > bridge from MSSQL or Oracle or Sybase to PG, have devised, > > faced with the same kind of desired performance gain for retrieving > > blocks of rows with the same partial key. > > > The CLUSTER command is often virtually useless for precisely the kinds > of tables that need to be clustered. My databases are on-line 24x7, and > the tables that are ideal candidates for clustering are in the range of > 50-100 million rows. I can afford to lock these tables up for no more > than 5-10 minutes during off-peak in the hopes that no one notices, and > CLUSTER does not work remotely in the ballpark of that fast for tables > of that size. People who can run CLUSTER in a cron job must either have > relatively small tables or regular large maintenance windows. > > > My solution, which may or may not work for you, was to write a table > partitioning system using the natural flexibility and programmability of > postgresql (e.g. table inheritance). From this I automatically get a > roughly ordered heap according to the index I would cluster on, with > only slightly funky SQL access. The end result works much better with > CLUSTER too, though CLUSTER is much less necessary at that point > because, at least for my particular purposes, the rows are mostly > ordered due to how the data was partitioned. > > So there are ways to work around CLUSTER, but you'll have to be clever > and it will require tailoring the solution to your particular > requirements. > > > J. Andrew Rogers > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
pgsql-performance by date: