Re: Equivalent praxis to CLUSTERED INDEX? - Mailing list pgsql-performance
From | Jim C. Nasby |
---|---|
Subject | Re: Equivalent praxis to CLUSTERED INDEX? |
Date | |
Msg-id | 20040831220503.GK78395@decibel.org Whole thread Raw |
In response to | Re: Equivalent praxis to CLUSTERED INDEX? (Greg Stark <gsstark@mit.edu>) |
List | pgsql-performance |
On Thu, Aug 26, 2004 at 11:39:42PM -0400, Greg Stark wrote: > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > Updated TODO item: > > > > o Automatically maintain clustering on a table > > > > This would require some background daemon to maintain clustering > > during periods of low usage. It might also require tables to be only > > paritally filled for easier reorganization. It also might require > > creating a merged heap/index data file so an index lookup would > > automatically access the heap data too. > > Fwiw, I would say the first "would" is also a "might". None of the previous > discussions here presumed a maintenance daemon. The discussions before talked > about a mechanism to try to place new tuples as close as possible to the > proper index position. > > I would also suggest making some distinction between a cluster system similar > to what we have now but improved to maintain the clustering continuously, and > an actual index-organized-table where the tuples are actually only stored in a > btree structure. > > They're two different approaches to similar problems. But they might both be > useful to have, and have markedly different implementation details. There's a third approach that I think is worth considering. Half of the benefit to clustered tables is limiting the number of pages you need to access when scanning the primary key. The order of tuples in the pages themselves isn't nearly as important as ordering of the pages. This means you can get most of the benefit of an index-organized table just by being careful about what page you place a tuple on. What I'm thinking of is some means to ensure all the tuples on a page are within some PK range, but not worrying about the exact order within the page since it's relatively cheap to scan through the page in memory. Some pros: This would probably mean less change to the code that inserts tuples. No need for a background daemon. No need to create a new B-Tree table structure. Ideally, there won't be need to move tuples around, which should mean that current indexing code doesn't need to change. Cons: Need to have some way to deal with pages that fill up. To gain full benefit some means of indicating what range of PK values are on a page might be needed. It's not as beneficial as a true IOT since you don't get the benefit of storing your tuples inline with your B-Tree. I'm sure there's a ton of things I'm missing, especially since I'm not familiar with the postgresql code, but hopefully others can explore this further. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
pgsql-performance by date: