Thread: btree index and max()

btree index and max()

From
leonbloy@sinectis.com.ar
Date:
This issue applies to postgresql 6.5.3 & 7.0

Say I have a table 'FACTURAS' (~400k rows), with
a 'RID' field, which is indexed with an BTREE index.

If I want to get the max(rid), the index is not
used:

=> explain select max(rid) from facturas;
NOTICE:  QUERY PLAN:

Aggregate  (cost=21139.66 rows=342414 width=4)
  ->  Seq Scan on facturas  (cost=21139.66 rows=342414 width=4)

(yes, I run 'vacuum analyze').

I understand that the query planner cannot be so clever
to grasp that this particular function (max or min)
might be evaluated by just travelling the BTREE index.
Am I correct?

If I modify the query with a dummy restriction:

=> explain select max(rid) from facturas where rid>0;
NOTICE:  QUERY PLAN:

Aggregate  (cost=9582.90 rows=114139 width=4)
  ->  Index Scan using facturas_rid_key on facturas  (cost=9582.90 rows=114139
width=4)

... the index is used, but only to get the restricted set of rows,
not to evaluate the maximum. Hence, the performance the same.

Cheers

Hernan Gonzalez
Buenos Aires, Argentina

Re: btree index and max()

From
Ed Loehr
Date:
leonbloy@sinectis.com.ar wrote:
>
> => explain select max(rid) from facturas;
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=21139.66 rows=342414 width=4)
>   ->  Seq Scan on facturas  (cost=21139.66 rows=342414 width=4)
>
> I understand that the query planner cannot be so clever
> to grasp that this particular function (max or min)
> might be evaluated by just travelling the BTREE index.
> Am I correct?

I believe you are unfortunately correct.  :(

Regards,
Ed Loehr

Re: btree index and max()

From
Bruce Momjian
Date:
> leonbloy@sinectis.com.ar wrote:
> >
> > => explain select max(rid) from facturas;
> > NOTICE:  QUERY PLAN:
> >
> > Aggregate  (cost=21139.66 rows=342414 width=4)
> >   ->  Seq Scan on facturas  (cost=21139.66 rows=342414 width=4)
> >
> > I understand that the query planner cannot be so clever
> > to grasp that this particular function (max or min)
> > might be evaluated by just travelling the BTREE index.
> > Am I correct?
>
> I believe you are unfortunately correct.  :(

That would be a good optimization.  Let me add it to the TODO list.
Much better than trying to keep the max stored somewhere.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: btree index and max()

From
Ed Loehr
Date:
Bruce Momjian wrote:
>
> > leonbloy@sinectis.com.ar wrote:
> > >
> > > => explain select max(rid) from facturas;
> > > NOTICE:  QUERY PLAN:
> > >
> > > Aggregate  (cost=21139.66 rows=342414 width=4)
> > >   ->  Seq Scan on facturas  (cost=21139.66 rows=342414 width=4)
> > >
> > > I understand that the query planner cannot be so clever
> > > to grasp that this particular function (max or min)
> > > might be evaluated by just travelling the BTREE index.
> > > Am I correct?
> >
> > I believe you are unfortunately correct.  :(
>
> That would be a good optimization.  Let me add it to the TODO list.
> Much better than trying to keep the max stored somewhere.

There was a lot of discussion about this on the hackers list recently,
but I don't recall the outcome.

Regards,
Ed Loehr

Re: btree index and max()

From
Tom Lane
Date:
leonbloy@sinectis.com.ar writes:
> I understand that the query planner cannot be so clever
> to grasp that this particular function (max or min)
> might be evaluated by just travelling the BTREE index.
> Am I correct?

You are correct --- the system has no idea that there is any
connection between the MIN and MAX aggregates and the sort order
of any particular index.  (In fact, the system knows nothing
about the specific semantics of any aggregate function; they're
all black boxes, which is a very good thing for most purposes.)

However, if you think of your problem as "how can I use the sort order
of this index to get the min/max?", a semi-obvious answer pops out:

SELECT foo FROM table ORDER BY foo LIMIT 1;        -- get the min
SELECT foo FROM table ORDER BY foo DESC LIMIT 1;    -- get the max

and the 7.0 optimizer does indeed know how to use an index to handle
these queries.

Perhaps someday we will try to convert simple uses of MIN/MAX into
queries like these, but for now, you gotta do it by hand.

            regards, tom lane

Re: btree index and max()

From
leonbloy@sinectis.com.ar
Date:
>  leonbloy@sinectis.com.ar writes:
>  > I understand that the query planner cannot be so clever
>  > to grasp that this particular function (max or min)
>  > might be evaluated by just travelling the BTREE index.
>  > Am I correct?
>
>  You are correct --- the system has no idea that there is any
>  connection between the MIN and MAX aggregates and the sort order
>  of any particular index.  (In fact, the system knows nothing
>  about the specific semantics of any aggregate function; they're
>  all black boxes, which is a very good thing for most purposes.)
>

That's what I thought...

>  However, if you think of your problem as "how can I use the sort order
>  of this index to get the min/max?", a semi-obvious answer pops out:
>
>  SELECT foo FROM table ORDER BY foo LIMIT 1;        -- get the min
>  SELECT foo FROM table ORDER BY foo DESC LIMIT 1;    -- get the max
>
>  and the 7.0 optimizer does indeed know how to use an index to handle
>  these queries.
>

Good! That had not occurred to me.

Though one should :
1) be careful with NULL values (excluding them from the select)
2) understand that (of course!) these queries
are VERY inefficient to compute the max/min if
the btree index is not defined.

By the way, I didn't find many comments about the pros and
cons of btree/hash indexes in the docs, nor in Bruce's book...

Regards

Hernan Gonzalez
Buenos Aires, Argentina

Re: btree index and max()

From
efinley@efinley.com (Elliot Finley)
Date:
On Thu, 1 Jun 2000 19:10:48 -0300, leonbloy@sinectis.com.ar wrote:

>>  leonbloy@sinectis.com.ar writes:
>>  > I understand that the query planner cannot be so clever
>>  > to grasp that this particular function (max or min)
>>  > might be evaluated by just travelling the BTREE index.
>>  > Am I correct?
>>
>>  You are correct --- the system has no idea that there is any
>>  connection between the MIN and MAX aggregates and the sort order
>>  of any particular index.  (In fact, the system knows nothing
>>  about the specific semantics of any aggregate function; they're
>>  all black boxes, which is a very good thing for most purposes.)
>>
>
>That's what I thought...
>
>>  However, if you think of your problem as "how can I use the sort order
>>  of this index to get the min/max?", a semi-obvious answer pops out:
>>
>>  SELECT foo FROM table ORDER BY foo LIMIT 1;        -- get the min
>>  SELECT foo FROM table ORDER BY foo DESC LIMIT 1;    -- get the max
>>
>>  and the 7.0 optimizer does indeed know how to use an index to handle
>>  these queries.
>>
>
>Good! That had not occurred to me.
>
>Though one should :
>1) be careful with NULL values (excluding them from the select)
>2) understand that (of course!) these queries
>are VERY inefficient to compute the max/min if
>the btree index is not defined.
>
>By the way, I didn't find many comments about the pros and
>cons of btree/hash indexes in the docs, nor in Bruce's book...

If I remember my data-structures (from way back when) correctly then:

hash indexes are only good for very fast single row lookups.

isam indexes are good for range lookups, but the implementations that
I've seen of isam indexes doesn't allow for dynamic index expanding.

btree is good for both.  btree won't be quite as fast as a hash for a
single row lookup, but still very fast.  btree won't (if I remember
correctly) be quite as fast as an isam for a range lookup, but still
very fast.  Also, btree allows for dynamic index expansion.
--
Elliot (efinley@efinley.com) Weird Science!

Re: btree index and max()

From
Bruce Momjian
Date:
> >By the way, I didn't find many comments about the pros and
> >cons of btree/hash indexes in the docs, nor in Bruce's book...
>
> If I remember my data-structures (from way back when) correctly then:
>
> hash indexes are only good for very fast single row lookups.
>
> isam indexes are good for range lookups, but the implementations that
> I've seen of isam indexes doesn't allow for dynamic index expanding.
>
> btree is good for both.  btree won't be quite as fast as a hash for a
> single row lookup, but still very fast.  btree won't (if I remember
> correctly) be quite as fast as an isam for a range lookup, but still
> very fast.  Also, btree allows for dynamic index expansion.

Sorry to be replying to late.

First, I did not mention btree vs. hash in my book because we have not
seen any empirical evidence that hash is faster than btree in
PostgreSQL.  Also, I wanted simplicity, so I did not get into the issue.

As far as ISAM, yes, I do miss its absense.  The best we have now is
btree combined with the CLUSTER command.  Since ISAM is not
self-optimizing, having to run CLUSTER on a btree is similar to having
to recreate the ISAM every so often.  Not sure what gain we would get by
having a native ISAM vs our current btree/CLUSTER capability.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026