Home > mailing lists

Re: Performance of count(*) - Mailing list pgsql-performance

From	Craig A. James
Subject	Re: Performance of count(*)
Date	March 22, 2007 14:22:01
Msg-id	4602BB19.9050306@modgraph-usa.com Whole thread Raw
In response to	Re: Performance of count(*) (Tino Wildenhain <tino@wildenhain.de>)
Responses	Re: Performance of count() Re: Performance of count()
List	pgsql-performance

Tree view

Tino Wildenhain wrote:
> Craig A. James schrieb:
> ...
>> In our case (for a variety of reasons, but this one is critical), we
>> actually can't use Postgres indexing at all -- we wrote an entirely
>> separate indexing system for our data...
>
> ...There is no need to store or
> maintain this information along with postgres when you can store
> and maintain it directly in postgres as well.

Whether we store our data inside or outside Postgres misses the point (in fact, most of our data is stored IN
Postgres). It's the code that actually performs the index operation that has to be external to Postgres. 

> On top of that, postgres has a very flexible and extensible index
> system.

You guys can correct me if I'm wrong, but the key feature that's missing from Postgres's flexible indexing is the
abilityto maintain state across queries.  Something like this: 

  select a, b, my_index_state() from foo where ...
    offset 100 limit 10 using my_index(prev_my_index_state);

The my_index_state() function would issue something like a "cookie", an opaque text or binary object that would record
informationabout how it got from row 1 through row 99.  When you issue the query above, it could start looking for row
100WITHOUT reexamining rows 1-99. 

This could be tricky in a OLTP environment, where the "cookie" could be invalidated by changes to the database.  But in
warehouseread-mostly or read-only environments, it could yield vastly improved performance for database web
applications.

If I'm not mistaken, Postgres (nor Oracle, MySQL or other RDBMS) can't do this.  I would love to be corrected.

The problem is that relational databases were invented before the web and its stateless applications.  In the "good old
days",you could connect to a database and work for hours, and in that environment cursors and such work well -- the
RDBMSmaintains the internal state of the indexing system.  But in a web environment, state information is very
difficultto maintain.  There are all sorts of systems that try (Enterprise Java Beans, for example), but they're very
complex.

Craig

pgsql-performance by date:

From: Michael Stone
Date: 22 March 2007, 14:11:01
Subject: Re: Parallel Vacuum

From: Tino Wildenhain
Date: 22 March 2007, 14:27:41
Subject: Re: Performance of count(*)

Re: Performance of count(*) - Mailing list pgsql-performance

Previous

Next