Re: Select max(foo) and select count(*) optimization - Mailing list pgsql-performance

From Christopher Browne
Subject Re: Select max(foo) and select count(*) optimization
Msg-id m31xqef8go.fsf@wolfe.cbbrowne.com
In response to Select max(foo) and select count(*) optimization  (John Siracusa <siracusa@mindspring.com>)
Responses Re: Select max(foo) and select count(*) optimization  (Paul Tuckfield <paul@tuckfield.com>)
List pgsql-performance
Oops! siracusa@mindspring.com (John Siracusa) was seen spray-painting on a wall:
> Speaking of special cases (well, I was on the admin list) there are two
> kinds that would really benefit from some attention.
>
> 1. The query "select max(foo) from bar" where the column foo has an
> index.  Aren't indexes ordered?  If not, an "ordered index" would be
> useful in this situation so that this query, rather than doing a
> sequential scan of the whole table, would just "ask the index" for
> the max value and return nearly instantly.
>
> 2. The query "select count(*) from bar".  Surely the total number of
> rows in a table is kept somewhere convenient.  If not, it would be
> nice if it could be :) Again, rather than doing a sequential scan of
> the entire table, this type of query could return instantly.
>
> I believe MySQL does both of these optimizations (which are probably
> a lot easier in that product, given its data storage system).  These
> were the first areas where I noticed a big performance difference
> between MySQL and Postgres.
>
> Especially with very large tables, hearing the disks grind as
> Postgres scans every single row in order to determine the number of
> rows in a table or the max value of a column (even a primary key
> created from a sequence) is pretty painful.  If the implementation
> is not too horrendous, this is an area where an orders-of-magnitude
> performance increase can be had.

These are both VERY frequently asked questions.

In the case of question #1, B-tree indexes are indeed ordered, so the
optimization you suggest could be accomplished via some Small Matter
Of Programming.  However, none of the people who have wanted the
optimization have offered to actually DO the programming.
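
Until someone does, the usual workaround is to rewrite the query as an
ORDER BY ... DESC LIMIT 1, which lets the planner walk the index from
its high end instead of scanning the table.  Here is a minimal sketch
showing that the two forms return the same value.  It uses Python's
sqlite3 module purely as a stand-in so that it runs anywhere; the index
name bar_foo_idx and the sample values are made up, and SQLite's planner
is not PostgreSQL's, so this says nothing about timings.

```python
import sqlite3

# In-memory SQLite database as a stand-in; table and column names follow
# the question, the index name and the data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo INTEGER)")
conn.execute("CREATE INDEX bar_foo_idx ON bar (foo)")
conn.executemany(
    "INSERT INTO bar (foo) VALUES (?)", [(n,) for n in (17, 3, 42, 8)]
)

# The form from the question.
max_via_aggregate = conn.execute("SELECT max(foo) FROM bar").fetchone()[0]

# The rewritten form.  The IS NOT NULL filter matters on PostgreSQL,
# where NULLs sort first in descending order.
max_via_order_by = conn.execute(
    "SELECT foo FROM bar WHERE foo IS NOT NULL ORDER BY foo DESC LIMIT 1"
).fetchone()[0]

conn.close()
print(max_via_aggregate, max_via_order_by)
```

On PostgreSQL the SQL itself is what matters; check with EXPLAIN that
the rewritten form really does use the index on your table.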

In the case of #2, the answer is "surely NOT."  In MVCC databases,
that information CANNOT be stored anywhere convenient, because each
transaction sees its own snapshot of the table: queries issued by
transactions that started at different points in time may legitimately
need different answers.
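
To make that concrete, here is a toy model of snapshot visibility.  It
is only a sketch: the RowVersion class, the visible() rule, and the
transaction ids are invented for illustration, every transaction is
assumed to have committed, and the real PostgreSQL visibility rules are
considerably more involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowVersion:
    xmin: int                   # transaction id that created this row version
    xmax: Optional[int] = None  # transaction id that deleted it, if any


def visible(row: RowVersion, snapshot_xid: int) -> bool:
    """Toy rule: the row was created at or before the snapshot and has
    not been deleted at or before it."""
    created = row.xmin <= snapshot_xid
    deleted = row.xmax is not None and row.xmax <= snapshot_xid
    return created and not deleted


def count_rows(table: List[RowVersion], snapshot_xid: int) -> int:
    """count(*) as seen by one snapshot: every row version is checked."""
    return sum(1 for row in table if visible(row, snapshot_xid))


# One physical table, holding row versions from several transactions.
table = [
    RowVersion(xmin=1),
    RowVersion(xmin=2),
    RowVersion(xmin=3, xmax=5),  # inserted by txn 3, deleted by txn 5
    RowVersion(xmin=6),
    RowVersion(xmin=7),
]

count_at_4 = count_rows(table, 4)  # sees the rows from txns 1, 2, 3
count_at_5 = count_rows(table, 5)  # the delete by txn 5 is now visible
count_at_7 = count_rows(table, 7)  # also sees the inserts by txns 6 and 7
print(count_at_4, count_at_5, count_at_7)
```

The same table yields three different counts at the same moment,
depending on who is asking, so no single stored number can serve as the
answer for everyone.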

I think we need to add these questions and their answers to the FAQ so
that the answer can be "See FAQ Item #17" rather than people having to
gratuitously explain it over and over and over again.
--
(reverse (concatenate 'string "moc.enworbbc" "@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/finances.html
Rules of  the Evil Overlord #127.  "Prison guards will  have their own
cantina featuring  a wide  variety of tasty  treats that  will deliver
snacks to the  guards while on duty. The guards  will also be informed
that  accepting food or  drink from  any other  source will  result in
execution." <http://www.eviloverlord.com/>
