Thread: index on more-than-one columns?

index on more-than-one columns?

From
Engard Ferenc
Date:
Hi all,

Is there any advantage to use index on more columns against to use
more separate indexes, e.g. creating index on t1 (a,b) instead of
index on t1 (a) plus index on t1 (b)? Let's suppose that the queries
include all sorts of combinations.

Thx:
Circum
__  @
/  \    _   _                                           Engard Ferenc
l    | ( \ /  | | (\/)                      mailto:s-fery@kkt.sote.hu
\__/ | |   \_ \_/ I  I                    http://pons.sote.hu/~s-fery




Re: [SQL] index on more-than-one columns?

From
Tom Lane
Date:
Engard Ferenc <fery@pons.sote.hu> writes:
> Is there any advantage to use index on more columns against to use
> more separate indexes, e.g. creating index on t1 (a,b) instead of
> index on t1 (a) plus index on t1 (b)?

If those are your two choices then the two separate indexes are
definitely more flexible, since the index on (a,b) can't be used
as a standalone index on b for queries that only involve b.

In general I'd say that a multicolumn index is a pretty specialized
beast, and probably only worth its keep if you frequently do sorts with
that order and/or multikey joins against another table with a similar
index.

For example: let's say you are trying to optimize for queries like
SELECT ... FROM a, b WHERE a.f1 = b.f1 AND a.f2 = b.f2;

If you have indexes on (f1,f2) for both tables then a two-column
mergejoin can be used --- that is, we scan both tables in the order
of the indexes.  (BTW, 6.5 only manages to do this if both columns are
the same data type, but that will be fixed in 6.6.)  Although that
looks cool, it's probably not a huge win compared to the one-column
merge you could do with indexes on either f1 or f2.  It'd only be a
big win if neither f1 or f2 have a lot of distinct values by themselves
but f1 * f2 does.

Also, although an index on (f1,f2) can be used as an index on f1
alone, it's going to be bigger and therefore slower to access than
the single-column index.  So, just adding on columns that you "might
need some of the time" isn't a win.

Finally, remember that every index costs you time whenever the table
is modified, because the index has to be updated too.  So it's not
a win to make a whole lot of indexes without specific reasons for
each one, unless the table is queried a *lot* more often than it
is modified.

I'd probably suggest making one or two indexes on the individual
columns that are used most frequently in WHERE conditions, and not
going beyond that except when you have a specific frequently-used
query that you need to optimize.
        regards, tom lane


Re: [SQL] index on more-than-one columns?

From
Oleg Bartunov
Date:
Tom,

does this right for all kind of indices ?
For example, it's probably faster to do find for 2 indices separately
in case of log(N) comlexity for btree-indices.
Oleg

On Tue, 17 Aug 1999, Tom Lane wrote:

> Date: Tue, 17 Aug 1999 17:04:29 -0400
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: s-fery@kkt.sote.hu
> Cc: PostgreSQL list <pgsql-sql@postgreSQL.org>
> Subject: Re: [SQL] index on more-than-one columns? 
> 
> Engard Ferenc <fery@pons.sote.hu> writes:
> > Is there any advantage to use index on more columns against to use
> > more separate indexes, e.g. creating index on t1 (a,b) instead of
> > index on t1 (a) plus index on t1 (b)?
> 
> If those are your two choices then the two separate indexes are
> definitely more flexible, since the index on (a,b) can't be used
> as a standalone index on b for queries that only involve b.
> 
> In general I'd say that a multicolumn index is a pretty specialized
> beast, and probably only worth its keep if you frequently do sorts with
> that order and/or multikey joins against another table with a similar
> index.
> 
> For example: let's say you are trying to optimize for queries like
> 
>     SELECT ... FROM a, b WHERE a.f1 = b.f1 AND a.f2 = b.f2;
> 
> If you have indexes on (f1,f2) for both tables then a two-column
> mergejoin can be used --- that is, we scan both tables in the order
> of the indexes.  (BTW, 6.5 only manages to do this if both columns are
> the same data type, but that will be fixed in 6.6.)  Although that
> looks cool, it's probably not a huge win compared to the one-column
> merge you could do with indexes on either f1 or f2.  It'd only be a
> big win if neither f1 or f2 have a lot of distinct values by themselves
> but f1 * f2 does.
> 
> Also, although an index on (f1,f2) can be used as an index on f1
> alone, it's going to be bigger and therefore slower to access than
> the single-column index.  So, just adding on columns that you "might
> need some of the time" isn't a win.
> 
> Finally, remember that every index costs you time whenever the table
> is modified, because the index has to be updated too.  So it's not
> a win to make a whole lot of indexes without specific reasons for
> each one, unless the table is queried a *lot* more often than it
> is modified.
> 
> I'd probably suggest making one or two indexes on the individual
> columns that are used most frequently in WHERE conditions, and not
> going beyond that except when you have a specific frequently-used
> query that you need to optimize.
> 
>             regards, tom lane
> 

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Counting the number of distinct rows returned

From
Drew Whittle
Date:
How can I return the number of records (ie a count(user_id)) when using a
statement like this:

select distinct user_id from sales_by_region;

I have tried things like:

select count(distinct user_id) from sales_by_region (parse error)
select distinct count(user_id) from sales_by_region (useless because count
is performed before the distinct)

HELP!

Thanks,

Drew




Re: [SQL] Counting the number of distinct rows returned

From
"tjk@tksoft.com"
Date:
Drew,

Not really the answer you were asking, but in general, your
program will probably be using some interface other than
psql, i.e. libpq or Pg.pm, or whatever.

E.g. in C, if your PgQuery struct is pg1, then you
would do your query and retrieve the number of records
found with "pg1->PQntuples(pg1->reply)"




Troy


>
> How can I return the number of records (ie a count(user_id)) when using a
> statement like this:
>
> select distinct user_id from sales_by_region;
>
> I have tried things like:
>
> select count(distinct user_id) from sales_by_region (parse error)
> select distinct count(user_id) from sales_by_region (useless because count
> is performed before the distinct)
>
> HELP!
>
> Thanks,
>
> Drew
>
>
>
> ************
> Check out "PostgreSQL Wearables" @ http://www.pgsql.com
>
>

Re: [SQL] Counting the number of distinct rows returned

From
Tom Lane
Date:
Drew Whittle <drew@albatross.co.nz> writes:
> How can I return the number of records (ie a count(user_id)) when using a
> statement like this:

> select distinct user_id from sales_by_region;

count() with a GROUP BY will do what you want, I think --- see other
thread with Doug Younger for an example.

> I have tried things like:
> select count(distinct user_id) from sales_by_region (parse error)

Unfortunately we don't handle count(distinct ...) --- we should, since
it's in SQL92.
        regards, tom lane