Thread: "Overlaping" indexes

"Overlaping" indexes

From

"Rigmor Ukuhe"

Date:

02 February 2004, 10:47:28

Hi,

I have many indexes somehow overlaping like:
... btree ("STATUS", "VISIBLE", "NP_ID");
... btree ("STATUS", "VISIBLE");

is perfomance gained by "more exact" index worth overhead with managing
indexes.

Rigmor Ukuhe
Finestmedia Ltd
---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.564 / Virus Database: 356 - Release Date: 19.01.2004

Re: "Overlaping" indexes

From

Tomasz Myrta

Date:

02 February 2004, 11:04:09

Dnia 2004-02-02 15:46, Użytkownik Rigmor Ukuhe napisał:
> Hi,
>
> I have many indexes somehow overlaping like:
> ... btree ("STATUS", "VISIBLE", "NP_ID");
> ... btree ("STATUS", "VISIBLE");
>
> is perfomance gained by "more exact" index worth overhead with managing
> indexes.

The second (2 columns) index is useless - it's function is well done by
the first one (3 columns).

Regards,
Tomasz Myrta

Re: "Overlaping" indexes

From

"scott.marlowe"

Date:

02 February 2004, 14:38:32

On Mon, 2 Feb 2004, Tomasz Myrta wrote:

> Dnia 2004-02-02 15:46, U?ytkownik Rigmor Ukuhe napisa3:
> > Hi,
> >
> > I have many indexes somehow overlaping like:
> > ... btree ("STATUS", "VISIBLE", "NP_ID");
> > ... btree ("STATUS", "VISIBLE");
> >
> > is perfomance gained by "more exact" index worth overhead with managing
> > indexes.
>
> The second (2 columns) index is useless - it's function is well done by
> the first one (3 columns).

Not entirely, since it only has to sort two columns, it will be smaller,
and will therefore be somewhat faster.

On the other hand, I've seen a lot of folks create multi column indexes
who didn't really understand how they work in Postgresql.

Re: "Overlaping" indexes

From

Tomasz Myrta

Date:

02 February 2004, 14:48:06

Dnia 2004-02-02 19:30, Użytkownik scott.marlowe napisał:
> Not entirely, since it only has to sort two columns, it will be smaller,
> and will therefore be somewhat faster.

Can you say something more about it? Will it be enough faster to keep
them both? Did anyone make such tests?

Regards,
Tomasz Myrta

Re: "Overlaping" indexes

From

"scott.marlowe"

Date:

02 February 2004, 15:03:43

On Mon, 2 Feb 2004, Tomasz Myrta wrote:

> Dnia 2004-02-02 19:30, U¿ytkownik scott.marlowe napisa³:
> > Not entirely, since it only has to sort two columns, it will be smaller,
> > and will therefore be somewhat faster.
>
> Can you say something more about it? Will it be enough faster to keep
> them both? Did anyone make such tests?

that really depends on the distribution of the third column.  If there's
only a couple of values in the third column, no big deal.  If each entry
is unique, and it's a large table, very big deal.

It is only useful to have a three column index if you actually use it.  If
you have an index on (a,b,c) and select order by b, the index won't get
used unless the a part is in the where clause.

the other issue is updates.  IT WILL cost more to update two indexes
rather than one.  Generally, you can drop / readd the index and use
explain analyze on one of your own queries to see if that helps.

Re: "Overlaping" indexes

From

Rod Taylor

Date:

02 February 2004, 15:17:45

On Mon, 2004-02-02 at 13:43, Tomasz Myrta wrote:
> Dnia 2004-02-02 19:30, Użytkownik scott.marlowe napisał:
> > Not entirely, since it only has to sort two columns, it will be smaller,
> > and will therefore be somewhat faster.
>
> Can you say something more about it? Will it be enough faster to keep
> them both? Did anyone make such tests?

You can actually come up with test cases where both indexes are useful.
The three column index will have more data to sift through. That said,
having both indexes used means there is less ram available for cache.

The biggest mistake I see is people doing everything they can to
optimize a single query, then they optimize the next query, etc.

When you consider the entire set of queries, those two indexes are very
likely to slow select throughput down due to increased memory
requirements and the system hitting disk a little more often.

It's similar to the mistake of benchmarking a set of 1000 row tables and
optimizing memory settings for that, then using that configuration on
the 10M row tables in production.