Re: ToDo List Item - System Table Index Clustering - Mailing list pgsql-hackers

From Simone Aiken
Subject Re: ToDo List Item - System Table Index Clustering
Date
Msg-id 007401cbb7f4$b1539820$13fac860$@quietlyCompetent.com
Whole thread Raw
In response to Re: ToDo List Item - System Table Index Clustering  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
> Robert
> 
> I think the first 
> thing to do would be to try to come up with a reproducible test case 
> where clustering the tables improves performance.  
>

On that note, is there any standard way you guys do benchmarks?  


> Bruce
>
>I think CLUSTER is a win when you are looking up multiple rows in the same
table, either using a non-unique index or a range search.  What places do
such lookups?  >Having them all in adjacent pages would be a win ---
single-row lookups are usually not.
>

Mostly the tables that track column level data.  Typically you will want to
grab rows for multiple columns for a given table at once so it would be
helpful to have them be contiguous on disk. 

I could design a benchmark to display this by building a thousand tables one
column at a time using 'alter add column' to scatter the catalog rows for
the tables across many blocks.  So they'll be a range with column 1 for each
table and column 2 for each table and column three for each table.  Then
fill a couple data tables with a lot of data and set some noise makers to
loop through them over and over with full table scans ... filling up cache
with unrelated data and hopefully ageing out the cache of the pg_tables.
Then do some benchmark index lookup queries to see the retrieval time before
and after clustering the pg_ctalog tables to record a difference.

If the criteria is "doesn't hurt anything and helps a little" I think this
passes.  Esp since clusters aren't maintained automatically so adding them
has no negative impact on insert or update.  It'd just be a nice thing to do
if you know it can be done that doesn't harm anyone who doesn't know.


-Simone Aiken






pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on
Next
From: Merlin Moncure
Date:
Subject: Re: limiting hint bit I/O