Re: Index optimization ? - Mailing list pgsql-general

From Florian G. Pflug
Subject Re: Index optimization ?
Date
Msg-id 41EDB4BE.2020700@phlo.org
Whole thread Raw
In response to Re: Index optimization ?  (Bo Lorentsen <bl@netgroup.dk>)
Responses Re: Index optimization ?  ("Jim C. Nasby" <decibel@decibel.org>)
Re: Index optimization ?  (Bo Lorentsen <bl@netgroup.dk>)
List pgsql-general
Bo Lorentsen wrote:
> Greg Stark wrote:
>> If Postgres used an index it would call odd(), which would return 1
>> because
>> it's the first time, and then Postgres would go look up the rows where
>> col is
>> 1 and return all of them. That's a very different behaviour from if
>> the index
>> isn't used. If all the records have col=1 then you're getting all of the
>> records instead of half of them. If col=0 then you're getting none of
>> them
>> instead of half of them.
> This is the differance between "volatile" and "stable" functions, but
> not the answer to why an index lookuo are per query and not per row, is
> it not ?

Because the _whole_ _point_ of an index is to find matching rows
_without_ scanning the whole table. IF you have to look at every row
anyway, then just might as well to an sequential scan.

The performance boost an index gives you comes from that very fact that
you can avoid looking most of the rows. You only have to traverse the
index tree from the root to the node(s) you are looking for. If you have
an unique index, this means you have to traverse (roughly) log2(n)
nodes, if your table has n rows. (If your table has 2^32, or about 4
billion rows, you just need to do 32(!) comparisons when walking the
index tree, but whould need to do at worst 2^32 if you sequentially
scanned the table.

_But_ the "downside" is that you skipped rows. You just didn't look at
them _ever_. You don't even _know_ how many rows you skipped (Thats why
count(*) on a hughe table is slow). So you _can't_ use an index for a
where-condition that has to inspect every row. But, since the SQL-spec
requires you to call volatile functions _for_every_row_, you _can't_ use
an index in your case.

You can, howevery, accelerate something like "where f in (1,2,3,4)". You
just scan the index 4 times, each time for a different value. Of course,
if the number of values becomes larger and larger, there is a point
where it's more efficient to do a sequential scan _once_, instead of a
few tousand index scans (depends on the number of rows in the table).
The postgres optimizer tries to estimate this, and will switch to an
seq-scan, if it would have to do too many index lookups.

Your example (the one with currval()) would translate to a IN-clause
with about as many entries in the IN-list is you have rows in the table
(Because the function has to be called _for_ _every_ _row_). Clearly,
it's not feasable to use an index scan here - it would be slower than a
seq scan.

greetings, florian pflug

Attachment

pgsql-general by date:

Previous
From: Ron Mayer
Date:
Subject: Is initdb needed from 8.0.0rc3 to 8.0.0?
Next
From: "Jim C. Nasby"
Date:
Subject: Re: Index optimization ?