Thread: index on more-than-one columns?
Hi all, Is there any advantage to use index on more columns against to use more separate indexes, e.g. creating index on t1 (a,b) instead of index on t1 (a) plus index on t1 (b)? Let's suppose that the queries include all sorts of combinations. Thx: Circum __ @ / \ _ _ Engard Ferenc l | ( \ / | | (\/) mailto:s-fery@kkt.sote.hu \__/ | | \_ \_/ I I http://pons.sote.hu/~s-fery
Engard Ferenc <fery@pons.sote.hu> writes: > Is there any advantage to use index on more columns against to use > more separate indexes, e.g. creating index on t1 (a,b) instead of > index on t1 (a) plus index on t1 (b)? If those are your two choices then the two separate indexes are definitely more flexible, since the index on (a,b) can't be used as a standalone index on b for queries that only involve b. In general I'd say that a multicolumn index is a pretty specialized beast, and probably only worth its keep if you frequently do sorts with that order and/or multikey joins against another table with a similar index. For example: let's say you are trying to optimize for queries like SELECT ... FROM a, b WHERE a.f1 = b.f1 AND a.f2 = b.f2; If you have indexes on (f1,f2) for both tables then a two-column mergejoin can be used --- that is, we scan both tables in the order of the indexes. (BTW, 6.5 only manages to do this if both columns are the same data type, but that will be fixed in 6.6.) Although that looks cool, it's probably not a huge win compared to the one-column merge you could do with indexes on either f1 or f2. It'd only be a big win if neither f1 or f2 have a lot of distinct values by themselves but f1 * f2 does. Also, although an index on (f1,f2) can be used as an index on f1 alone, it's going to be bigger and therefore slower to access than the single-column index. So, just adding on columns that you "might need some of the time" isn't a win. Finally, remember that every index costs you time whenever the table is modified, because the index has to be updated too. So it's not a win to make a whole lot of indexes without specific reasons for each one, unless the table is queried a *lot* more often than it is modified. I'd probably suggest making one or two indexes on the individual columns that are used most frequently in WHERE conditions, and not going beyond that except when you have a specific frequently-used query that you need to optimize. regards, tom lane
Tom, does this right for all kind of indices ? For example, it's probably faster to do find for 2 indices separately in case of log(N) comlexity for btree-indices. Oleg On Tue, 17 Aug 1999, Tom Lane wrote: > Date: Tue, 17 Aug 1999 17:04:29 -0400 > From: Tom Lane <tgl@sss.pgh.pa.us> > To: s-fery@kkt.sote.hu > Cc: PostgreSQL list <pgsql-sql@postgreSQL.org> > Subject: Re: [SQL] index on more-than-one columns? > > Engard Ferenc <fery@pons.sote.hu> writes: > > Is there any advantage to use index on more columns against to use > > more separate indexes, e.g. creating index on t1 (a,b) instead of > > index on t1 (a) plus index on t1 (b)? > > If those are your two choices then the two separate indexes are > definitely more flexible, since the index on (a,b) can't be used > as a standalone index on b for queries that only involve b. > > In general I'd say that a multicolumn index is a pretty specialized > beast, and probably only worth its keep if you frequently do sorts with > that order and/or multikey joins against another table with a similar > index. > > For example: let's say you are trying to optimize for queries like > > SELECT ... FROM a, b WHERE a.f1 = b.f1 AND a.f2 = b.f2; > > If you have indexes on (f1,f2) for both tables then a two-column > mergejoin can be used --- that is, we scan both tables in the order > of the indexes. (BTW, 6.5 only manages to do this if both columns are > the same data type, but that will be fixed in 6.6.) Although that > looks cool, it's probably not a huge win compared to the one-column > merge you could do with indexes on either f1 or f2. It'd only be a > big win if neither f1 or f2 have a lot of distinct values by themselves > but f1 * f2 does. > > Also, although an index on (f1,f2) can be used as an index on f1 > alone, it's going to be bigger and therefore slower to access than > the single-column index. So, just adding on columns that you "might > need some of the time" isn't a win. > > Finally, remember that every index costs you time whenever the table > is modified, because the index has to be updated too. So it's not > a win to make a whole lot of indexes without specific reasons for > each one, unless the table is queried a *lot* more often than it > is modified. > > I'd probably suggest making one or two indexes on the individual > columns that are used most frequently in WHERE conditions, and not > going beyond that except when you have a specific frequently-used > query that you need to optimize. > > regards, tom lane > _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
How can I return the number of records (ie a count(user_id)) when using a statement like this: select distinct user_id from sales_by_region; I have tried things like: select count(distinct user_id) from sales_by_region (parse error) select distinct count(user_id) from sales_by_region (useless because count is performed before the distinct) HELP! Thanks, Drew
Drew, Not really the answer you were asking, but in general, your program will probably be using some interface other than psql, i.e. libpq or Pg.pm, or whatever. E.g. in C, if your PgQuery struct is pg1, then you would do your query and retrieve the number of records found with "pg1->PQntuples(pg1->reply)" Troy > > How can I return the number of records (ie a count(user_id)) when using a > statement like this: > > select distinct user_id from sales_by_region; > > I have tried things like: > > select count(distinct user_id) from sales_by_region (parse error) > select distinct count(user_id) from sales_by_region (useless because count > is performed before the distinct) > > HELP! > > Thanks, > > Drew > > > > ************ > Check out "PostgreSQL Wearables" @ http://www.pgsql.com > >
Drew Whittle <drew@albatross.co.nz> writes: > How can I return the number of records (ie a count(user_id)) when using a > statement like this: > select distinct user_id from sales_by_region; count() with a GROUP BY will do what you want, I think --- see other thread with Doug Younger for an example. > I have tried things like: > select count(distinct user_id) from sales_by_region (parse error) Unfortunately we don't handle count(distinct ...) --- we should, since it's in SQL92. regards, tom lane