Re: About b-tree usage - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: About b-tree usage |
Date | |
Msg-id | 1110149134.4089.397.camel@jeff Whole thread Raw |
In response to | About b-tree usage (Ioannis Theoharis <theohari@ics.forth.gr>) |
Responses |
Re: About b-tree usage
|
List | pgsql-hackers |
If I understand your question, you want to reduce the index size by only pointing to the first tuple in a table with a given key in att0, since the rest of the tuples will be right afterward (because you keep the table clustered on that key). However, from the docs: <http://www.postgresql.org/docs/8.0/static/sql-cluster.html> "When a table is clustered, it is physically reordered based on the index information. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows according to their index order. If one wishes, one can periodically recluster by issuing the command again." So, there is no guarantee that there won't be tuples with the same key in a different physical location between your CLUSTER commands. That means your index idea won't really work. Perhaps you can alter your table layout/design to better suit your needs. For example: * If you have a very small number of values in att0, you could make a different table for each one and make a view that's the union of those tables. * You could make att1 an array. Yes, it is a horrible design from a relational standpoint, but if you need performance, there may not be many other options. You can then use set-returning functions and other functions to make it behave more like a relation. Ideally, postgresql would have relation-valued attributes, but currently arrays seem like the best option available for simulating a relation-valued attribute. If there are many identical values in att0, are you sure a sequential scan isn't more efficient? Also, are you sure the index isn't working well? It seems to me since you have the table clustered, it might be fairly efficient as-is (it would get a huge benefit from the spatial locality of the tuples in the table). Index size alone shouldn't destroy your performance, since the idea of an index lookup is that it only has to read O(log n) pages from the disk per lookup. Regards,Jeff Davis On Sun, 2005-03-06 at 23:33 +0200, Ioannis Theoharis wrote: > > Please let me know, if there is any option in postgresql to achieve the > following usage of a b-tree index: > > For a relation R(att0, att1) and a btree index on attribute att0 > > In each insertion of a tuple on table: > - look on index if the value of att0 of new entry does already exist in > index, and > - if no, allow the aprorpiate entry on b-tree > - if yes, do not allow an entry. > > In my aplication i have always my relation clustered according to att0. > And the only information needed for a query with a range condition over > att0 in WHERE clause, is the place on disc where the first tuple with a > given value on att0 is placed. > > The hint, is that beacause of too many raws of table, the index size is > too big. But the number of discrete values of att0 is orders of > magnitudes smaller than the number of tuples. > > I try to investigate, if there is a way to use an alternative of b-tree > index, to decrease the blocks of indexed that are fetched into memory. > > Thanks. > > > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend
pgsql-hackers by date: