Thread: single index on more than two coulumns a bad thing?

single index on more than two coulumns a bad thing?

From
Palle Girgensohn
Date:
Hi,

I remember reading a post ages ago, maybe from Vadim, about the fact that
people creating indices on more than two columns will be the first to be
put againts the wall when the revolution comes... sort of...

Is it always bad to create index xx on yy (field1, field2, field3);

I guess the problem is that the index might often grow bigger than the
table, or at least big enough not to speed up the queries?

/Palle


Re: single index on more than two coulumns a bad thing?

From
Josh Berkus
Date:
Palle,

> Is it always bad to create index xx on yy (field1, field2, field3);

No, it seldom bad, in fact.    I have some indexes that run up to seven
columns, becuase they are required for unique keys.

Indexes of 3-4 columns are often *required* for many-to-many join tables.

I'm afraid that you've been given some misleading advice.

> I guess the problem is that the index might often grow bigger than the
> table, or at least big enough not to speed up the queries?

Well, yes ... a 4-column index on a 5-column table could be bigger than the
table if allowed to bloat and not re-indexed.   But that's just a reason for
better maintainence.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: single index on more than two coulumns a bad thing?

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> Is it always bad to create index xx on yy (field1, field2, field3);

> I'm afraid that you've been given some misleading advice.

I'd say it's a matter of getting your optimizations straight.

If you have a query that can make use of that index, and the query is
executed often enough to make it worth maintaining the index during
table updates, then by all means make the index.

The standard advice is meant to warn you against creating a zillion
indexes without any thought to what you'll be paying in update costs.
Indexes with more than a couple of columns are usually of only narrow
applicability, and so you have to be sure that they'll really pay for
themselves...

            regards, tom lane

Re: single index on more than two coulumns a bad thing?

From
Greg Stark
Date:
Palle Girgensohn <girgen@pingpong.net> writes:

> Is it always bad to create index xx on yy (field1, field2, field3);

All generalisations are false...

Seriously, it's true that as the length of your index key gets longer the
harder and harder it is to justify it. That doesn't mean they're always wrong,
but you should consider whether a shorter key would perform just as well.

The other problem with long index keys is that they often show up in the same
place as having dozens of indexes on the same table. Usually in shops where
the indexes were created after the fact looking at specific queries.

--
greg

Re: single index on more than two coulumns a bad thing?

From
"Aaron Werman"
Date:
another thing that I have all over the place is a hierarchy:
index on grandfather_table(grandfather)
index on father_table(grandfather, father)
index on son_table(grandfather, father, son)

almost all of my indices are composite. Are you thinking about composite
indices with low cardinality leading columns?

/Aaron

----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: "Palle Girgensohn" <girgen@pingpong.net>;
<pgsql-performance@postgresql.org>
Sent: Thursday, April 01, 2004 7:35 PM
Subject: Re: [PERFORM] single index on more than two coulumns a bad thing?


> Palle,
>
> > Is it always bad to create index xx on yy (field1, field2, field3);
>
> No, it seldom bad, in fact.    I have some indexes that run up to seven
> columns, becuase they are required for unique keys.
>
> Indexes of 3-4 columns are often *required* for many-to-many join tables.
>
> I'm afraid that you've been given some misleading advice.
>
> > I guess the problem is that the index might often grow bigger than the
> > table, or at least big enough not to speed up the queries?
>
> Well, yes ... a 4-column index on a 5-column table could be bigger than
the
> table if allowed to bloat and not re-indexed.   But that's just a reason
for
> better maintainence.
>
> --
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

Re: single index on more than two coulumns a bad thing?

From
Bruno Wolff III
Date:
On Fri, Apr 02, 2004 at 01:00:45 +0200,
  Palle Girgensohn <girgen@pingpong.net> wrote:
>
> Is it always bad to create index xx on yy (field1, field2, field3);
>
> I guess the problem is that the index might often grow bigger than the
> table, or at least big enough not to speed up the queries?

One place where you need them in postgres is enforcing unique multicolumn
keys. These will get created implicitly from the unique (or primary key)
constraint. It isn't all that unusual to have a table that describes
a many to many (to many ...) relationship where the primary key is all
of the columns.

Re: single index on more than two coulumns a bad thing?

From
Palle Girgensohn
Date:
--On fredag, april 02, 2004 09.56.04 -0600 Bruno Wolff III <bruno@wolff.to>
wrote:

> On Fri, Apr 02, 2004 at 01:00:45 +0200,
>   Palle Girgensohn <girgen@pingpong.net> wrote:
>>
>> Is it always bad to create index xx on yy (field1, field2, field3);
>>
>> I guess the problem is that the index might often grow bigger than the
>> table, or at least big enough not to speed up the queries?
>
> One place where you need them in postgres is enforcing unique multicolumn
> keys. These will get created implicitly from the unique (or primary key)
> constraint. It isn't all that unusual to have a table that describes
> a many to many (to many ...) relationship where the primary key is all
> of the columns.

True, of course!

/Palle


Re: single index on more than two coulumns a bad thing?

From
"Leeuw van der, Tim"
Date:
Hi Aaron,

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of
> Aaron Werman
> Sent: vrijdag 2 april 2004 13:57
>
>
> another thing that I have all over the place is a hierarchy:
> index on grandfather_table(grandfather)
> index on father_table(grandfather, father)
> index on son_table(grandfather, father, son)
>

It depends on your data-distribution, but I find that in almost all cases it's beneficial to have your indexes the
otherway round in such cases: 

index on grandfather_table(grandfather)
index on father_table(father, grandfather)
index on son_table(son, father, grandfather)

That usually gives a less common, more selective value at the start of the index, making the initial selection in the
indexsmaller. 

And AFAIK I don't have to rewrite my queries for that; the planner doesn't care about the order of expressions in the
querythat are on the same level. 

That said, I tend to use 'surrogate keys'; keys generated from sequences or auto-number columns for my tables. It makes
thetables less readable, but the indexes remain smaller. 


Greetings,

--Tim



Re: single index on more than two coulumns a bad thing?

From
"Aaron Werman"
Date:
You're absolutely correct that the general rule is to lead a composite index
with the highest cardinality index columns for fastest selectivity. Indices
and all physical design are based on usage. In this case of unique indices
supporting primary keys in a hierarchy, it depends. For selection of small
sets of arbitrary rows, your arrangement is best. For hierarchy based
queries,  such as "for grandparent of foo, and parent of bar, give average
age of sons" - the hierarchy based index is often more efficient.

Surrogate keys have a role, and can improve performance, but also carry an
enormous penalty of intentionally obfuscating logical keys and data
semantics, and almost always lead to data errors not being caught because
they obscure irrational relationships. I hate them, but use them frequently
in high transaction rate operational systems where there is much functional
validation outside the dbms (and the apps behave therefore like object
databases and surrogate keys are network database pointers) and in data
warehousing (where downstream data cannot be corrected anyway).

/Aaron

----- Original Message -----
From: "Leeuw van der, Tim" <tim.leeuwvander@nl.unisys.com>
To: <pgsql-performance@postgresql.org>
Sent: Sunday, April 04, 2004 5:06 PM
Subject: Re: [PERFORM] single index on more than two coulumns a bad thing?


Hi Aaron,

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of
> Aaron Werman
> Sent: vrijdag 2 april 2004 13:57
>
>
> another thing that I have all over the place is a hierarchy:
> index on grandfather_table(grandfather)
> index on father_table(grandfather, father)
> index on son_table(grandfather, father, son)
>

It depends on your data-distribution, but I find that in almost all cases
it's beneficial to have your indexes the other way round in such cases:

index on grandfather_table(grandfather)
index on father_table(father, grandfather)
index on son_table(son, father, grandfather)

That usually gives a less common, more selective value at the start of the
index, making the initial selection in the index smaller.

And AFAIK I don't have to rewrite my queries for that; the planner doesn't
care about the order of expressions in the query that are on the same level.

That said, I tend to use 'surrogate keys'; keys generated from sequences or
auto-number columns for my tables. It makes the tables less readable, but
the indexes remain smaller.


Greetings,

--Tim



---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster