Thread: Two Index Questions

Two Index Questions

From

Josh Berkus

Date:

19 July 2002, 15:34:09

Folks (esp. Tom, Stephan, and Bruce):

I have two questions for my "Adventures in PostgreSQL" article reasearch:

Multi-Column Indexes and GROUP BY:
Q:  If you group a table by multiple colums, e.g.SELECT t1.A, t1.B, t1.C, MAX(t1.G)FROM t1GROUP BY t1.A, t1.B, t1.C
Thenwould a multi-column index on A, B, C be faster than seperate indexes  
on A, B and C?   I've run a few tests, but I don't have enough data in the
seperate tables to really get a feel for the difference.

DESC Alpha Sort Indexes:
Q: In PostgreSQL 7.0, there was an issue that indexes where never consulted
for DESC alpha sorts.   Has this been resolved?  If so, does one need to
create any special indexes to take advantage of indexes for DESC sorts?

Thanks for your help!

--
-Josh BerkusAglio Database SolutionsSan Francisco

Re: Two Index Questions

From

Tom Lane

Date:

19 July 2002, 15:44:40

Josh Berkus <josh@agliodbs.com> writes:
> Q:  If you group a table by multiple colums, e.g.
>     SELECT t1.A, t1.B, t1.C, MAX(t1.G)
>     FROM t1
>     GROUP BY t1.A, t1.B, t1.C
>   Then would a multi-column index on A, B, C be faster than seperate indexes 
> on A, B and C?

An indexscan can only use one index, so separate indexes would be
completely worthless for this query.  An index on A,B,C is potentially
useful, but in most cases I think the planner will prefer an explicit
sort anyway if there's no WHERE clause.

> Q: In PostgreSQL 7.0, there was an issue that indexes where never consulted 
> for DESC alpha sorts.   Has this been resolved?  If so, does one need to 
> create any special indexes to take advantage of indexes for DESC sorts?

Yes; no.

regression=# create table foo (f1 text);
CREATE TABLE
regression=# create index fooi on foo(f1);
CREATE INDEX
regression=# explain select * from foo order by f1;                            QUERY PLAN
---------------------------------------------------------------------Index Scan using fooi on foo  (cost=0.00..52.00
rows=1000width=32)
 
(1 row)

regression=# explain select * from foo order by f1 desc;                                 QUERY PLAN
------------------------------------------------------------------------------Index Scan Backward using fooi on foo
(cost=0.00..52.00rows=1000 width=32)
 
(1 row)

        regards, tom lane

Re: Two Index Questions

From

Bruno Wolff III

Date:

19 July 2002, 15:54:11

On Fri, Jul 19, 2002 at 09:33:59 -0700, Josh Berkus <josh@agliodbs.com> wrote:
> Folks (esp. Tom, Stephan, and Bruce):
> 
> I have two questions for my "Adventures in PostgreSQL" article reasearch:
> 
> Multi-Column Indexes and GROUP BY:
> Q:  If you group a table by multiple colums, e.g.
>     SELECT t1.A, t1.B, t1.C, MAX(t1.G)
>     FROM t1
>     GROUP BY t1.A, t1.B, t1.C
>   Then would a multi-column index on A, B, C be faster than seperate indexes 
> on A, B and C?   I've run a few tests, but I don't have enough data in the 
> seperate tables to really get a feel for the difference.

If there are lots of G entries for fixed As, Bs and Cs, then another option
would be to have an index on all 4 tables and use a subquery with a limit 1
clause to get the row with the max G value for any A, B and C.

Re: Two Index Questions

From

Josh Berkus

Date:

19 July 2002, 16:28:40

Bruno,

> > I have two questions for my "Adventures in PostgreSQL" article reasearch:
> >
> > Multi-Column Indexes and GROUP BY:
> > Q:  If you group a table by multiple colums, e.g.
> >     SELECT t1.A, t1.B, t1.C, MAX(t1.G)
> >     FROM t1
> >     GROUP BY t1.A, t1.B, t1.C
> >   Then would a multi-column index on A, B, C be faster than seperate
indexes
> > on A, B and C?   I've run a few tests, but I don't have enough data in the
> > seperate tables to really get a feel for the difference.
>
> If there are lots of G entries for fixed As, Bs and Cs, then another option
> would be to have an index on all 4 tables and use a subquery with a limit 1
> clause to get the row with the max G value for any A, B and C.

We're talking about only one table, here.  Not four.

I generally try to avoid using LIMIT, as it is a non-SQL92 extension.  Also,
LIMIT in subqueries might someday be disallowed as it interferes with the
fundmentally unordered nature of subqueries.

--
-Josh BerkusAglio Database SolutionsSan Francisco

Re: Two Index Questions

From

Bruno Wolff III

Date:

19 July 2002, 16:51:09

On Fri, Jul 19, 2002 at 10:28:18 -0700, Josh Berkus <josh@agliodbs.com> wrote:
> 
> Bruno,
> 
> > > I have two questions for my "Adventures in PostgreSQL" article reasearch:
> > > 
> > > Multi-Column Indexes and GROUP BY:
> > > Q:  If you group a table by multiple colums, e.g.
> > >     SELECT t1.A, t1.B, t1.C, MAX(t1.G)
> > >     FROM t1
> > >     GROUP BY t1.A, t1.B, t1.C
> > >   Then would a multi-column index on A, B, C be faster than seperate 
> indexes 
> > > on A, B and C?   I've run a few tests, but I don't have enough data in the 
> > > seperate tables to really get a feel for the difference.
> > 
> > If there are lots of G entries for fixed As, Bs and Cs, then another option
> > would be to have an index on all 4 tables and use a subquery with a limit 1
> > clause to get the row with the max G value for any A, B and C.
> 
> We're talking about only one table, here.  Not four.

Typo. I meant to say columns. The issue is that max doesn't use an index,
but if there are a lot of different values of G for a given A, B and C,
it may be better to use an index then to search through the applicable
rows to find the maximum.

> I generally try to avoid using LIMIT, as it is a non-SQL92 extension.  Also, 
> LIMIT in subqueries might someday be disallowed as it interferes with the 
> fundmentally unordered nature of subqueries.

I can see not using it because it is nonstandard. I don't think it would be
disabled for the reason you have given. The documentation makes it pretty
clear that you need to use order by if you want a particular tuple.

Re: Two Index Questions

From

Tom Lane

Date:

19 July 2002, 16:54:01

Josh Berkus <josh@agliodbs.com> writes:
> I generally try to avoid using LIMIT, as it is a non-SQL92 extension.

Fair enough ...

> Also, 
> LIMIT in subqueries might someday be disallowed as it interferes with the 
> fundmentally unordered nature of subqueries.

I don't think so; we went out of our way to make it work, so we're
unlikely to discard the feature.  It's true that it does prevent some
optimizations that might otherwise occur, but not using it is a
sufficient answer to that.
        regards, tom lane

Re: Two Index Questions

From

Josh Berkus

Date:

19 July 2002, 17:12:47

Bruno, Tom,

> Typo. I meant to say columns. The issue is that max doesn't use an index,
> but if there are a lot of different values of G for a given A, B and C,
> it may be better to use an index then to search through the applicable
> rows to find the maximum.

That's odd ... you're correct.   Tom, why doesn't MAX() use an index?   I
understand why indexes are generally useless for SUM(), AVG, and COUNT, but
it seems that MAX() and MIN() should *always* use an index.

--
-Josh BerkusAglio Database SolutionsSan Francisco

Re: Two Index Questions

From

Bruce Momjian

Date:

19 July 2002, 17:28:13

Josh Berkus wrote:
> Bruno, Tom,
> 
> > Typo. I meant to say columns. The issue is that max doesn't use an index,
> > but if there are a lot of different values of G for a given A, B and C,
> > it may be better to use an index then to search through the applicable
> > rows to find the maximum.
> 
> That's odd ... you're correct.   Tom, why doesn't MAX() use an index?   I 
> understand why indexes are generally useless for SUM(), AVG, and COUNT, but 
> it seems that MAX() and MIN() should *always* use an index.

Index access methods don't know about aggregates, and our type system
makes such linkage difficult.  The FAQ does have:    However, <SMALL>LIMIT</SMALL> combined with <SMALL>ORDER
BY</SMALL>   often will use an index because only a small portion of the table    is returned.  In fact, though MAX()
andMIN() don't use indexes,    it is possible to retrieve such values using an index with ORDER BY    and LIMIT:<PRE>
SELECT col    FROM tab    ORDER BY col [ DESC ]    LIMIT 1</PRE>

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: Two Index Questions

From

Tom Lane

Date:

19 July 2002, 17:35:48

Josh Berkus <josh@agliodbs.com> writes:
> Tom, why doesn't MAX() use an index?

Don't tell me you haven't seen that discussed many times before :-(

Because Postgres uses an extensible set of aggregate functions, we treat
all aggregates as "black boxes": the implementation strategy is always
to pass all the specified values through the aggregate.

Special-casing MIN and MAX would be nice from the point of view of
performance, but there's this little problem that our sets of datatypes
and index types are also extensible.  We'd need to devise some
non-hard-wired way of identifying which aggregate functions are related
to the sort orderings of what indexes.

Finally, the transformation into an optimized form is just not that easy
to do automatically in the general case --- it's easy enough if you
write "SELECT max(col) FROM tab", but how about "SELECT max(col),
min(col) FROM tab"?  What if there are WHERE clauses (with or without
constraints on col)?  And the GROUP BY case that we started this
discussion with is *very* nontrivial.

This issue is on the TODO list, but given that the ORDER BY/LIMIT
workaround is available (and offers more functionality than MAX/MIN
anyway), I don't think it's a very high-priority problem.  We've got
plenty of TODO items for which there is no good workaround...
        regards, tom lane

Re: Two Index Questions

From

Josh Berkus

Date:

19 July 2002, 17:58:20

Tom, Bruce, Bruno

> Don't tell me you haven't seen that discussed many times before :-(

Actually, I'll have to admint to not having read the relevant discussions.
Honestly, on this list I usually only answer the questions I asked or the
questions I'm going to respond to.   Short-sighted, but I can't keep up
otherwise.

The explanation certainly makes sense.  It just goes against the grain for me
to use a non-SQL standard structure in major parts of my database schema.
But I'll learn to live with it.

--
-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________                                       Josh Berkus  Complete
informationtechnology     josh@agliodbs.com   and data management solutions     (415) 565-7293  for law firms, small
businesses     fax 621-2533   and non-profit organizations.     San Francisco