Home > mailing lists

Re: single index on more than two coulumns a bad thing? - Mailing list pgsql-performance

From	Aaron Werman
Subject	Re: single index on more than two coulumns a bad thing?
Date	April 5, 2004 19:19:59
Msg-id	BAY9-DAV30oQREMe8bO000243eb@hotmail.com Whole thread Raw
In response to	Re: single index on more than two coulumns a bad thing? ("Leeuw van der, Tim" <tim.leeuwvander@nl.unisys.com>)
List	pgsql-performance

Tree view

You're absolutely correct that the general rule is to lead a composite index
with the highest cardinality index columns for fastest selectivity. Indices
and all physical design are based on usage. In this case of unique indices
supporting primary keys in a hierarchy, it depends. For selection of small
sets of arbitrary rows, your arrangement is best. For hierarchy based
queries,  such as "for grandparent of foo, and parent of bar, give average
age of sons" - the hierarchy based index is often more efficient.

Surrogate keys have a role, and can improve performance, but also carry an
enormous penalty of intentionally obfuscating logical keys and data
semantics, and almost always lead to data errors not being caught because
they obscure irrational relationships. I hate them, but use them frequently
in high transaction rate operational systems where there is much functional
validation outside the dbms (and the apps behave therefore like object
databases and surrogate keys are network database pointers) and in data
warehousing (where downstream data cannot be corrected anyway).

/Aaron

----- Original Message -----
From: "Leeuw van der, Tim" <tim.leeuwvander@nl.unisys.com>
To: <pgsql-performance@postgresql.org>
Sent: Sunday, April 04, 2004 5:06 PM
Subject: Re: [PERFORM] single index on more than two coulumns a bad thing?


Hi Aaron,

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of
> Aaron Werman
> Sent: vrijdag 2 april 2004 13:57
>
>
> another thing that I have all over the place is a hierarchy:
> index on grandfather_table(grandfather)
> index on father_table(grandfather, father)
> index on son_table(grandfather, father, son)
>

It depends on your data-distribution, but I find that in almost all cases
it's beneficial to have your indexes the other way round in such cases:

index on grandfather_table(grandfather)
index on father_table(father, grandfather)
index on son_table(son, father, grandfather)

That usually gives a less common, more selective value at the start of the
index, making the initial selection in the index smaller.

And AFAIK I don't have to rewrite my queries for that; the planner doesn't
care about the order of expressions in the query that are on the same level.

That said, I tend to use 'surrogate keys'; keys generated from sequences or
auto-number columns for my tables. It makes the tables less readable, but
the indexes remain smaller.


Greetings,

--Tim



---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

pgsql-performance by date:

From: "Gary Doades"
Date: 05 April 2004, 19:14:52
Subject: Re: PostgreSQL and Linux 2.6 kernel.

From: "Kevin Barnard"
Date: 05 April 2004, 19:35:10
Subject: Re: atrocious update performance

Re: single index on more than two coulumns a bad thing? - Mailing list pgsql-performance

Previous

Next