Re: visualizing B-tree index coverage - Mailing list pgsql-general

From PFC
Subject Re: visualizing B-tree index coverage
Date
Msg-id opsk9slnxrth1vuj@musicbox
Whole thread Raw
In response to Re: visualizing B-tree index coverage  (TJ O'Donnell <tjo@acm.org>)
Responses Does indexing help >= as well as = for integer columns?
List pgsql-general
> I realize that using OR will not result in an index scan.
> I will never be interested in a OR condition for the kinds
> of searches I use.  In my Select statements, I always name
> every column of the multi-column index in same order that
> they were named when creating the index.  I always use
> the >= condition, and very rarely, the = condition.

    All the leftmost index column must be named, but the order is unimportant.
    You can use (a BETWEEN x AND y) instead of (a>=x AND a<=y), it is cleaner.

> However, I am concerned that I must place
> the most selective column first in my index.  I cannot tell,
> a priori, which column will be most selective.  That depends on the
> nature of search, which can vary widely each time.
> Are you saying that if my first column is not selective, even though the
> remaining
> columns are, the planner may choose not to use the index after
> seeing that the first column is not very selective?

    I thought this was true but made some tests and the index scanner is
smart.

    Try this :
CREATE TABLE test (id serial primary key, a INTEGER, z INTEGER, e INTEGER,
r INTEGER, t INTEGER, y INTEGER ) WITHOUT OIDS;
INSERT 1M rows into table using a plpgsql function, with a,z,e,r,t,y being
floor(random()*10) for instance.

    Then you can try various selects. a,z,e,r,t,y are a linear distribution
between 0 and 9 included, so :
a>=A AND z>=Z ... y>=Y gives a result set of about
(10-A)*(10-Z)*...*(10-Y) results. You'll see the planner will use an index
scan when needed. You can try the easiest case (a>=9) which just explores
one part of the tree, and the worst case which explores a part of all
leafs (y>=9). Both should yield about the same number of results, but the
first should be faster. To know how much, just try ;)

> That seems like an oversight, IMHO.  Shouldn't the overall effect of
> using all the columns be considered before choosing not to use an
> index scan?

    I think it is. There are no cross column correlation stats though.

> Since I'm using every column of my multi-column index for every search,
> and I always use >=, Explain Analyze always shows that every column
> is considered in the index scan.  However, that is only when the
> index scan is used.  Sometimes, Explain Analyze shows it is not used.
> That appears to happen when my search condition is very general.
> This it to be expected, so I am not worried.  Most of my searches will
> be intermediate, namely not VERY selective, but also not VERY general.
> So the idea of the multi-column index is to "characterize" each row
> sufficiently, even when it is a perfectly ordinary row with no ONE
> feature being distinctive, but rather several features together giving
> it it's distinctive character.  That is my interpretation of the
> multi-column index.

    If you have some features which are highly selective, you can create a
single column index on them. It won't be used often, but when it will, it
will really work.




pgsql-general by date:

Previous
From: Greg Stark
Date:
Subject: Re: visualizing B-tree index coverage
Next
From: MargaretGillon@chromalloy.com
Date:
Subject: Re: pg_dump shell script with ~/.pgpass