Re: visualizing B-tree index coverage - Mailing list pgsql-general
From | PFC |
---|---|
Subject | Re: visualizing B-tree index coverage |
Date | |
Msg-id | opsk9slnxrth1vuj@musicbox Whole thread Raw |
In response to | Re: visualizing B-tree index coverage (TJ O'Donnell <tjo@acm.org>) |
Responses |
Does indexing help >= as well as = for integer columns?
|
List | pgsql-general |
> I realize that using OR will not result in an index scan. > I will never be interested in a OR condition for the kinds > of searches I use. In my Select statements, I always name > every column of the multi-column index in same order that > they were named when creating the index. I always use > the >= condition, and very rarely, the = condition. All the leftmost index column must be named, but the order is unimportant. You can use (a BETWEEN x AND y) instead of (a>=x AND a<=y), it is cleaner. > However, I am concerned that I must place > the most selective column first in my index. I cannot tell, > a priori, which column will be most selective. That depends on the > nature of search, which can vary widely each time. > Are you saying that if my first column is not selective, even though the > remaining > columns are, the planner may choose not to use the index after > seeing that the first column is not very selective? I thought this was true but made some tests and the index scanner is smart. Try this : CREATE TABLE test (id serial primary key, a INTEGER, z INTEGER, e INTEGER, r INTEGER, t INTEGER, y INTEGER ) WITHOUT OIDS; INSERT 1M rows into table using a plpgsql function, with a,z,e,r,t,y being floor(random()*10) for instance. Then you can try various selects. a,z,e,r,t,y are a linear distribution between 0 and 9 included, so : a>=A AND z>=Z ... y>=Y gives a result set of about (10-A)*(10-Z)*...*(10-Y) results. You'll see the planner will use an index scan when needed. You can try the easiest case (a>=9) which just explores one part of the tree, and the worst case which explores a part of all leafs (y>=9). Both should yield about the same number of results, but the first should be faster. To know how much, just try ;) > That seems like an oversight, IMHO. Shouldn't the overall effect of > using all the columns be considered before choosing not to use an > index scan? I think it is. There are no cross column correlation stats though. > Since I'm using every column of my multi-column index for every search, > and I always use >=, Explain Analyze always shows that every column > is considered in the index scan. However, that is only when the > index scan is used. Sometimes, Explain Analyze shows it is not used. > That appears to happen when my search condition is very general. > This it to be expected, so I am not worried. Most of my searches will > be intermediate, namely not VERY selective, but also not VERY general. > So the idea of the multi-column index is to "characterize" each row > sufficiently, even when it is a perfectly ordinary row with no ONE > feature being distinctive, but rather several features together giving > it it's distinctive character. That is my interpretation of the > multi-column index. If you have some features which are highly selective, you can create a single column index on them. It won't be used often, but when it will, it will really work.
pgsql-general by date: