Thread: single index on more than two coulumns a bad thing?
Hi, I remember reading a post ages ago, maybe from Vadim, about the fact that people creating indices on more than two columns will be the first to be put againts the wall when the revolution comes... sort of... Is it always bad to create index xx on yy (field1, field2, field3); I guess the problem is that the index might often grow bigger than the table, or at least big enough not to speed up the queries? /Palle
Palle, > Is it always bad to create index xx on yy (field1, field2, field3); No, it seldom bad, in fact. I have some indexes that run up to seven columns, becuase they are required for unique keys. Indexes of 3-4 columns are often *required* for many-to-many join tables. I'm afraid that you've been given some misleading advice. > I guess the problem is that the index might often grow bigger than the > table, or at least big enough not to speed up the queries? Well, yes ... a 4-column index on a 5-column table could be bigger than the table if allowed to bloat and not re-indexed. But that's just a reason for better maintainence. -- -Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: >> Is it always bad to create index xx on yy (field1, field2, field3); > I'm afraid that you've been given some misleading advice. I'd say it's a matter of getting your optimizations straight. If you have a query that can make use of that index, and the query is executed often enough to make it worth maintaining the index during table updates, then by all means make the index. The standard advice is meant to warn you against creating a zillion indexes without any thought to what you'll be paying in update costs. Indexes with more than a couple of columns are usually of only narrow applicability, and so you have to be sure that they'll really pay for themselves... regards, tom lane
Palle Girgensohn <girgen@pingpong.net> writes: > Is it always bad to create index xx on yy (field1, field2, field3); All generalisations are false... Seriously, it's true that as the length of your index key gets longer the harder and harder it is to justify it. That doesn't mean they're always wrong, but you should consider whether a shorter key would perform just as well. The other problem with long index keys is that they often show up in the same place as having dozens of indexes on the same table. Usually in shops where the indexes were created after the fact looking at specific queries. -- greg
another thing that I have all over the place is a hierarchy: index on grandfather_table(grandfather) index on father_table(grandfather, father) index on son_table(grandfather, father, son) almost all of my indices are composite. Are you thinking about composite indices with low cardinality leading columns? /Aaron ----- Original Message ----- From: "Josh Berkus" <josh@agliodbs.com> To: "Palle Girgensohn" <girgen@pingpong.net>; <pgsql-performance@postgresql.org> Sent: Thursday, April 01, 2004 7:35 PM Subject: Re: [PERFORM] single index on more than two coulumns a bad thing? > Palle, > > > Is it always bad to create index xx on yy (field1, field2, field3); > > No, it seldom bad, in fact. I have some indexes that run up to seven > columns, becuase they are required for unique keys. > > Indexes of 3-4 columns are often *required* for many-to-many join tables. > > I'm afraid that you've been given some misleading advice. > > > I guess the problem is that the index might often grow bigger than the > > table, or at least big enough not to speed up the queries? > > Well, yes ... a 4-column index on a 5-column table could be bigger than the > table if allowed to bloat and not re-indexed. But that's just a reason for > better maintainence. > > -- > -Josh Berkus > Aglio Database Solutions > San Francisco > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings >
On Fri, Apr 02, 2004 at 01:00:45 +0200, Palle Girgensohn <girgen@pingpong.net> wrote: > > Is it always bad to create index xx on yy (field1, field2, field3); > > I guess the problem is that the index might often grow bigger than the > table, or at least big enough not to speed up the queries? One place where you need them in postgres is enforcing unique multicolumn keys. These will get created implicitly from the unique (or primary key) constraint. It isn't all that unusual to have a table that describes a many to many (to many ...) relationship where the primary key is all of the columns.
--On fredag, april 02, 2004 09.56.04 -0600 Bruno Wolff III <bruno@wolff.to> wrote: > On Fri, Apr 02, 2004 at 01:00:45 +0200, > Palle Girgensohn <girgen@pingpong.net> wrote: >> >> Is it always bad to create index xx on yy (field1, field2, field3); >> >> I guess the problem is that the index might often grow bigger than the >> table, or at least big enough not to speed up the queries? > > One place where you need them in postgres is enforcing unique multicolumn > keys. These will get created implicitly from the unique (or primary key) > constraint. It isn't all that unusual to have a table that describes > a many to many (to many ...) relationship where the primary key is all > of the columns. True, of course! /Palle
Hi Aaron, > -----Original Message----- > From: pgsql-performance-owner@postgresql.org > [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of > Aaron Werman > Sent: vrijdag 2 april 2004 13:57 > > > another thing that I have all over the place is a hierarchy: > index on grandfather_table(grandfather) > index on father_table(grandfather, father) > index on son_table(grandfather, father, son) > It depends on your data-distribution, but I find that in almost all cases it's beneficial to have your indexes the otherway round in such cases: index on grandfather_table(grandfather) index on father_table(father, grandfather) index on son_table(son, father, grandfather) That usually gives a less common, more selective value at the start of the index, making the initial selection in the indexsmaller. And AFAIK I don't have to rewrite my queries for that; the planner doesn't care about the order of expressions in the querythat are on the same level. That said, I tend to use 'surrogate keys'; keys generated from sequences or auto-number columns for my tables. It makes thetables less readable, but the indexes remain smaller. Greetings, --Tim
You're absolutely correct that the general rule is to lead a composite index with the highest cardinality index columns for fastest selectivity. Indices and all physical design are based on usage. In this case of unique indices supporting primary keys in a hierarchy, it depends. For selection of small sets of arbitrary rows, your arrangement is best. For hierarchy based queries, such as "for grandparent of foo, and parent of bar, give average age of sons" - the hierarchy based index is often more efficient. Surrogate keys have a role, and can improve performance, but also carry an enormous penalty of intentionally obfuscating logical keys and data semantics, and almost always lead to data errors not being caught because they obscure irrational relationships. I hate them, but use them frequently in high transaction rate operational systems where there is much functional validation outside the dbms (and the apps behave therefore like object databases and surrogate keys are network database pointers) and in data warehousing (where downstream data cannot be corrected anyway). /Aaron ----- Original Message ----- From: "Leeuw van der, Tim" <tim.leeuwvander@nl.unisys.com> To: <pgsql-performance@postgresql.org> Sent: Sunday, April 04, 2004 5:06 PM Subject: Re: [PERFORM] single index on more than two coulumns a bad thing? Hi Aaron, > -----Original Message----- > From: pgsql-performance-owner@postgresql.org > [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of > Aaron Werman > Sent: vrijdag 2 april 2004 13:57 > > > another thing that I have all over the place is a hierarchy: > index on grandfather_table(grandfather) > index on father_table(grandfather, father) > index on son_table(grandfather, father, son) > It depends on your data-distribution, but I find that in almost all cases it's beneficial to have your indexes the other way round in such cases: index on grandfather_table(grandfather) index on father_table(father, grandfather) index on son_table(son, father, grandfather) That usually gives a less common, more selective value at the start of the index, making the initial selection in the index smaller. And AFAIK I don't have to rewrite my queries for that; the planner doesn't care about the order of expressions in the query that are on the same level. That said, I tend to use 'surrogate keys'; keys generated from sequences or auto-number columns for my tables. It makes the tables less readable, but the indexes remain smaller. Greetings, --Tim ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster