Re: Index size - Mailing list pgsql-general

From Martijn van Oosterhout
Subject Re: Index size
Date
Msg-id 20050302205535.GA12207@svana.org
Whole thread Raw
In response to Re: Index size  (Ioannis Theoharis <theohari@ics.forth.gr>)
List pgsql-general
On Wed, Mar 02, 2005 at 10:08:58PM +0200, Ioannis Theoharis wrote:
> I have a relation like this: (att0 varchar(1000), att1 int4)
>
> i create a b-tree index on att1 ()
> i cluster my raltion according to index
>
> now i have a query
> select     *
> form     tc2000000000
> where     att1<=900000000 and att1>=0 ;
>
> As far as i can see from explain analyze an index scan is used:
> Index Scan using inst_id_idx on tc2000000000
>   Index Cond: ((att1 <= 900000000) AND (att1 >= 0))
>
> If for each entry in table, an entry in index is beeing held, then the
> index size is populated too fast.
>
> I guess, that postgres uses index to find the first entry satisfying the
> index conition, after find the last one and then do a sequential scan on
> the appropriate fraction of the table (to take advantage of physical
> clustering).

What makes you think that? Clustering is nice, but postgresql needs to
get the right answer and that the table in clustered is not something
postgresql can rely on. It uses the index to find *every* row you're
looking for, there's no shortcut here.

> In my case, discrete values on att1 are orders of magnitude less than
> number of table raws.
>
> Thus, the big index size is useless for me. I want to avoid the overhead
> of scanning such a big index, just permitting ONLY the discrete values to
> entry in index. In such a way the whole scenario i presented before for
> how i guess, that postgres evaluates my query, is still in use.

There's no special relationship between two rows with the same att1.
Either you find the rows by using an index for each row, or scanning
the whole table. There's no inbetween. The only thing clustering
acheives is that due to values being together, the chance that
succeeding indexes entries will already have been loaded is higher,
thus reducing the overall cost.
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

pgsql-general by date:

Previous
From: Arcane_Rhino
Date:
Subject: Fwd: Re: [ADMIN] pg_shadow passwd decrypt
Next
From: Peter Eisentraut
Date:
Subject: Re: pgpool