Re: Equivalent praxis to CLUSTERED INDEX? - Mailing list pgsql-performance

From Bruce Momjian
Subject Re: Equivalent praxis to CLUSTERED INDEX?
Date
Msg-id 200408261818.i7QIIrr09640@candle.pha.pa.us
Whole thread Raw
In response to Re: Equivalent praxis to CLUSTERED INDEX?  ("J. Andrew Rogers" <jrogers@neopolitan.com>)
Responses Re: Equivalent praxis to CLUSTERED INDEX?
Re: Equivalent praxis to CLUSTERED INDEX?
Re: Equivalent praxis to CLUSTERED INDEX?
List pgsql-performance
How do vendors actually implement auto-clustering?  I assume they move
rows around during quiet periods or have lots of empty space in each
value bucket.

---------------------------------------------------------------------------

J. Andrew Rogers wrote:
> On Tue, 2004-08-24 at 22:28, Mischa Sandberg wrote:
> > I see that PG has a one-shot CLUSTER command, but doesn't support
> > continuously-updated clustered indexes.
> >
> > What I infer from newsgroup browsing is, such an index is impossible,
> > given the MVCC versioning of records (happy to learn I'm wrong).
>
>
> It is possible to have MVCC and ordered/indexed heaps, but it isn't
> something you can just tack onto the currently supported types -- I
> looked into this myself.  It would take substantial additional code
> infrastructure to support it, basically an alternative heap system and
> adding support for tables with odd properties to many parts of the
> system.  Pretty non-trivial.
>
> This is probably my #1 "I wish postgres had this feature" feature.  It
> is a serious scalability enhancer for big systems and a pain to work
> around not having.
>
>
> > I'd be curious to know what other people, who've crossed this same
> > bridge from MSSQL or Oracle or Sybase to PG, have devised,
> > faced with the same kind of desired performance gain for retrieving
> > blocks of rows with the same partial key.
>
>
> The CLUSTER command is often virtually useless for precisely the kinds
> of tables that need to be clustered.  My databases are on-line 24x7, and
> the tables that are ideal candidates for clustering are in the range of
> 50-100 million rows. I can afford to lock these tables up for no more
> than 5-10 minutes during off-peak in the hopes that no one notices, and
> CLUSTER does not work remotely in the ballpark of that fast for tables
> of that size.  People who can run CLUSTER in a cron job must either have
> relatively small tables or regular large maintenance windows.
>
>
> My solution, which may or may not work for you, was to write a table
> partitioning system using the natural flexibility and programmability of
> postgresql (e.g. table inheritance).  From this I automatically get a
> roughly ordered heap according to the index I would cluster on, with
> only slightly funky SQL access.  The end result works much better with
> CLUSTER too, though CLUSTER is much less necessary at that point
> because, at least for my particular purposes, the rows are mostly
> ordered due to how the data was partitioned.
>
> So there are ways to work around CLUSTER, but you'll have to be clever
> and it will require tailoring the solution to your particular
> requirements.
>
>
> J. Andrew Rogers
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

pgsql-performance by date:

Previous
From: "Matt Clark"
Date:
Subject: Re: Disabling transaction/outdated-tuple behaviour
Next
From: Josh Berkus
Date:
Subject: Re: Disabling transaction/outdated-tuple behaviour