Re: Equivalent praxis to CLUSTERED INDEX? - Mailing list pgsql-performance

From Mischa Sandberg
Subject Re: Equivalent praxis to CLUSTERED INDEX?
Date
Msg-id 3csXc.56326$X12.25148@edtnps84
Whole thread Raw
In response to Re: Equivalent praxis to CLUSTERED INDEX?  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Equivalent praxis to CLUSTERED INDEX?
List pgsql-performance
Ummm ... not quite. In MSSQL/Sybase/Oracle, a clustered index maintains
its space saturation as part of each update operation. High activity
does indeed result in less-full pages (typically 60-80% full for tables
with heavy deletions or rowsize changes). To bring the percentage back
up, you run DBCC INDEXDEFRAG, which also does what you'd expect of a
normal file defragmenter -- put related disk pages together on the platter.

But the performance difference is hardly as severe as I gather it can be
if you neglect to vacuum.

As for SQL Server being a 'single-user database' ... ummm ... no, I
don't think so. I'm REALLY happy to be shut of the Microsoft world, but
MSSQL 7/2000/2005 is a serious big DB engine. It also has some serious
bright heads behind it. They hired Goetz Graefe and Paul (aka Per-Ake)
Larsen away from academia, and it shows, in the join and aggregate
processing. I'll be a happy camper if I manage to contribute something
to PG that honks the way their stuff does. Happy to discuss, too.

Josh Berkus wrote:
> Bruce,
>
>
>>How do vendors actually implement auto-clustering?  I assume they move
>>rows around during quiet periods or have lots of empty space in each
>>value bucket.
>
>
> That's how SQL Server does it.   In old versions (6.5) you had to manually
> send commands to update the cluster, same as PG.   Also, when you create a
> cluster (or an index or table for that matter) you can manually set an amount
> of "space" to be held open on each data page for updates.
>
> Also keep in mind that SQL Server, as a "single-user database" has a much
> easier time with this.  They don't have to hold several versions of an index
> in memory and collapse it into a single version at commit time.
>
> All that being said, we could do a better job of "auto-balancing" clustered
> tables.   I believe that someone was working on this in Hackers through what
> they called "B-Tree Tables".  What happened to that?
>

pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Performance hit on loading from HD
Next
From: Mischa Sandberg
Date:
Subject: Re: Equivalent praxis to CLUSTERED INDEX?