Thread: count(*) slow on large tables

count(*) slow on large tables

From
Dror Matalon
Date:
Hi,

I have a somewhat large table, 3 million rows, 1 Gig on disk, and growing. Doing a
count(*) takes around 40 seconds.

Looks like the count(*) fetches the table from disk and goes through it.
Made me wonder why the optimizer doesn't just choose the smallest index,
which in my case is around 60 Megs, and go through that instead, which it
could do in a fraction of the time.
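
For illustration, the plan looks like this (table name hypothetical, costs
elided):

    explain select count(*) from mytable;
                          QUERY PLAN
    ------------------------------------------------------
     Aggregate  (cost=...)
       ->  Seq Scan on mytable  (cost=...)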

Dror


--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: count(*) slow on large tables

From
Bruno Wolff III
Date:
On Thu, Oct 02, 2003 at 12:15:47 -0700,
  Dror Matalon <dror@zapatec.com> wrote:
> Hi,
>
> I have a somewhat large table, 3 million rows, 1 Gig on disk,  and growing. Doing a
> count(*) takes around 40 seconds.
>
> Looks like the count(*) fetches the table from disk and goes through it.
> Made me wonder, why the optimizer doesn't just choose the smallest index
> which in my case is around 60 Megs and goes through it, which it could
> do in a fraction of the time.

Because it can't tell from the index if a tuple is visible to the current
transaction, and would still have to hit the table to check this. So
performance would be a lot worse instead of better.

Re: count(*) slow on large tables

From
Tomasz Myrta
Date:
> Hi,
>
> I have a somewhat large table, 3 million rows, 1 Gig on disk,  and growing. Doing a
> count(*) takes around 40 seconds.
>
> Looks like the count(*) fetches the table from disk and goes through it.
> Made me wonder, why the optimizer doesn't just choose the smallest index
> which in my case is around 60 Megs and goes through it, which it could
> do in a fraction of the time.
>
> Dror

Just like other aggregate functions, count(*) won't use indexes when
counting the whole table.

Regards,
Tomasz Myrta


Re: count(*) slow on large tables

From
Bruno Wolff III
Date:
On Thu, Oct 02, 2003 at 12:46:45 -0700,
  Dror Matalon <dror@zapatec.com> wrote:

Please keep replies copied to the list.

> When would it happen that a tuple be invisible to the current
> transaction? Are we talking about permissions?

They could be tuples that were changed by a transaction that hasn't
committed, or, in the case of serializable isolation, tuples changed by a
transaction that committed after the current transaction started.

>
> On Thu, Oct 02, 2003 at 02:39:05PM -0500, Bruno Wolff III wrote:
> > On Thu, Oct 02, 2003 at 12:15:47 -0700,
> >   Dror Matalon <dror@zapatec.com> wrote:
> > > Hi,
> > >
> > > I have a somewhat large table, 3 million rows, 1 Gig on disk,  and growing. Doing a
> > > count(*) takes around 40 seconds.
> > >
> > > Looks like the count(*) fetches the table from disk and goes through it.
> > > Made me wonder, why the optimizer doesn't just choose the smallest index
> > > which in my case is around 60 Megs and goes through it, which it could
> > > do in a fraction of the time.
> >
> > Because it can't tell from the index if a tuple is visible to the current
> > transaction and would still have to hit the table to check this. So that
> > performance would be a lot worse instead of better.
> >
>
> --
> Dror Matalon, President
> Zapatec Inc
> 1700 MLK Way
> Berkeley, CA 94709
> http://www.zapatec.com

Re: count(*) slow on large tables

From
Jean-Luc Lachance
Date:
That's one of the drawbacks of MVCC.
I once suggested that the transaction number and other housekeeping
info be included in the index, but was told to forget it...
It would solve once and for all the issue of seq_scan vs index_scan.
It would simplify the aggregate problem.


Bruno Wolff III wrote:
>
> On Thu, Oct 02, 2003 at 12:15:47 -0700,
>   Dror Matalon <dror@zapatec.com> wrote:
> > Hi,
> >
> > I have a somewhat large table, 3 million rows, 1 Gig on disk,  and growing. Doing a
> > count(*) takes around 40 seconds.
> >
> > Looks like the count(*) fetches the table from disk and goes through it.
> > Made me wonder, why the optimizer doesn't just choose the smallest index
> > which in my case is around 60 Megs and goes through it, which it could
> > do in a fraction of the time.
>
> Because it can't tell from the index if a tuple is visible to the current
> transaction and would still have to hit the table to check this. So that
> performance would be a lot worse instead of better.
>

Re: count(*) slow on large tables

From
Christopher Browne
Date:
jllachan@nsd.ca (Jean-Luc Lachance) writes:
> That's one of the draw back of MVCC.
> I once suggested that the transaction number and other house keeping
> info be included in the index, but was told to forget it...
> It would solve once and for all the issue of seq_scan vs index_scan.
> It would simplify the aggregate problem.

It would only simplify _one_ case, namely the case where someone cares
about the cardinality of a relation, and it would do that at
_considerable_ cost.

A while back I outlined how this would have to be done, and for it to
be done efficiently, it would be anything BUT simple.

It would be very hairy to implement it correctly, and all this would
cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"

If you had a single WHERE clause attached, you would have to revert to
walking through the tuples looking for the ones that are live and
committed, which is true for any DBMS.

And it still begs the same question, of why the result of this query
would be particularly meaningful to anyone.  I don't see the
usefulness; I don't see the value of going to the considerable effort
of "fixing" this purported problem.
--
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)

Re: count(*) slow on large tables

From
Dror Matalon
Date:
I don't have an opinion on how hard it would be to implement the
tracking in the indexes, but "select count(*) from some table" is, in my
experience, a query that people tend to run quite often.
One of the databases that I've used, I believe it was Informix, had that
info cached so that it always knew how many rows there were in any
table. It was quite useful.


On Thu, Oct 02, 2003 at 05:57:30PM -0400, Christopher Browne wrote:
> jllachan@nsd.ca (Jean-Luc Lachance) writes:
> > That's one of the draw back of MVCC.
> > I once suggested that the transaction number and other house keeping
> > info be included in the index, but was told to forget it...
> > It would solve once and for all the issue of seq_scan vs index_scan.
> > It would simplify the aggregate problem.
>
> It would only simplify _one_ case, namely the case where someone cares
> about the cardinality of a relation, and it would do that at
> _considerable_ cost.
>
> A while back I outlined how this would have to be done, and for it to
> be done efficiently, it would be anything BUT simple.
>
> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"
>
> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.
>
> And it still begs the same question, of why the result of this query
> would be particularly meaningful to anyone.  I don't see the
> usefulness; I don't see the value of going to the considerable effort
> of "fixing" this purported problem.
> --
> let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
> <http://dev6.int.libertyrms.com/>
> Christopher Browne
> (416) 646 3304 x124 (land)
>

--
Dror Matalon, President
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: count(*) slow on large tables

From
Christopher Browne
Date:
The world rejoiced as dror@zapatec.com (Dror Matalon) wrote:
> I don't have an opinion on how hard it would be to implement the
> tracking in the indexes, but "select count(*) from some table" is, in my
> experience, a query that people tend to run quite often.
> One of the databases that I've used, I believe it was Informix, had that
> info cached so that it always new how many rows there were in any
> table. It was quite useful.

I can't imagine why the raw number of tuples in a relation would be
expected to necessarily be terribly useful.

I'm involved with managing Internet domains, and it's only when people
are being pretty clueless that anyone imagines that "select count(*)
from domains;" would be of any use to anyone.  There are enough "test
domains" and "inactive domains" and other such things that the raw
number of "things in the table" isn't really of much use.

- I _do_ care how many pages a table occupies, to some extent, as that
determines whether it will fit in my disk space or not, but that's not
COUNT(*).

- I might care about auditing the exact numbers of records in order to
be assured that a data conversion process was done correctly.  But in
that case, I want to do something a whole *lot* more detailed than
mere COUNT(*).

I'm playing "devil's advocate" here, to some extent.  But
realistically, there is good reason to be skeptical of the merits of
using SELECT COUNT(*) FROM TABLE for much of anything.

Furthermore, the relation that you query mightn't be a physical
"table."  It might be a more virtual VIEW, and if that's the case,
bets are even MORE off.  If you go with the common dictum of "good
design" that users don't directly access tables, but go through VIEWs,
users may have no way to get at SELECT COUNT(*) FROM TABLE.
--
output = reverse("ac.notelrac.teneerf" "@" "454aa")
http://www.ntlug.org/~cbbrowne/finances.html
Rules  of  the  Evil  Overlord  #74.   "When  I  create  a  multimedia
presentation of my plan designed  so that my five-year-old advisor can
easily  understand the  details, I  will not  label the  disk "Project
Overlord" and leave it lying on top of my desk."
<http://www.eviloverlord.com/>

Re: count(*) slow on large tables

From
Dror Matalon
Date:
I smell a religious war in the air :-).
Can you go several days in a row without doing select count(*) on any
of your tables?

I suspect that this is somewhat a domain specific issue. In some areas
you don't need to know the total number of rows in your tables, in
others you do.

I also suspect that you're right, that end user applications don't use
this information as often as DBAs would. On the other hand, it seems
whenever you want to optimize your app (something relevant to this list),
one of the things you do need to know is the number of rows in your
table.

Dror

On Thu, Oct 02, 2003 at 10:08:18PM -0400, Christopher Browne wrote:
> The world rejoiced as dror@zapatec.com (Dror Matalon) wrote:
> > I don't have an opinion on how hard it would be to implement the
> > tracking in the indexes, but "select count(*) from some table" is, in my
> > experience, a query that people tend to run quite often.
> > One of the databases that I've used, I believe it was Informix, had that
> > info cached so that it always new how many rows there were in any
> > table. It was quite useful.
>
> I can't imagine why the raw number of tuples in a relation would be
> expected to necessarily be terribly useful.
>
> I'm involved with managing Internet domains, and it's only when people
> are being pretty clueless that anyone imagines that "select count(*)
> from domains;" would be of any use to anyone.  There are enough "test
> domains" and "inactive domains" and other such things that the raw
> number of "things in the table" aren't really of much use.
>
> - I _do_ care how many pages a table occupies, to some extent, as that
> determines whether it will fit in my disk space or not, but that's not
> COUNT(*).
>
> - I might care about auditing the exact numbers of records in order to
> be assured that a data conversion process was done correctly.  But in
> that case, I want to do something a whole *lot* more detailed than
> mere COUNT(*).
>
> I'm playing "devil's advocate" here, to some extent.  But
> realistically, there is good reason to be skeptical of the merits of
> using SELECT COUNT(*) FROM TABLE for much of anything.
>
> Furthermore, the relation that you query mightn't be a physical
> "table."  It might be a more virtual VIEW, and if that's the case,
> bets are even MORE off.  If you go with the common dictum of "good
> design" that users don't directly access tables, but go through VIEWs,
> users may have no way to get at SELECT COUNT(*) FROM TABLE.
> --
> output = reverse("ac.notelrac.teneerf" "@" "454aa")
> http://www.ntlug.org/~cbbrowne/finances.html
> Rules  of  the  Evil  Overlord  #74.   "When  I  create  a  multimedia
> presentation of my plan designed  so that my five-year-old advisor can
> easily  understand the  details, I  will not  label the  disk "Project
> Overlord" and leave it lying on top of my desk."
> <http://www.eviloverlord.com/>
>

--
Dror Matalon, President
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: count(*) slow on large tables

From
Greg Stark
Date:
Christopher Browne <cbbrowne@libertyrms.info> writes:

> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"
>
> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.

Well it would be handy for a few other cases as well.

1 It would be useful for the case where you have a partial index with a
  matching where clause. The optimizer already considers using such indexes
  but it has to pay the cost of the tuple lookup, which is substantial.

2 It would be useful for the very common queries of the form
  WHERE x IN (select id from foo where some_indexed_expression)

  (Or the various equivalent forms including outer joins that test to see if
  the matching record was found and don't retrieve any other columns in the
  select list.)

3 It would be useful for many-many relationships where the intermediate table
  has only the two primary key columns being joined. If you create a
  multi-column index on the two columns it shouldn't need to look up the
  tuple. This would effectively be nearly equivalent to an "index organized
  table" (see the sketch after this list).


4 It would be useful for just about all the referential integrity queries...
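
As a sketch of case 3 (hypothetical schema):

    CREATE TABLE movie_actor (
        movie_id int NOT NULL,
        actor_id int NOT NULL,
        PRIMARY KEY (movie_id, actor_id)
    );
    CREATE INDEX movie_actor_rev ON movie_actor (actor_id, movie_id);

    -- If the indexes carried visibility info, a join through
    -- movie_actor in either direction could be answered from an index
    -- alone, never touching the heap:
    SELECT movie_id FROM movie_actor WHERE actor_id = 42;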


I don't mean to say this is definitely a good thing. The tradeoff in
complexity and time to maintain the index pages would be large. But don't
dismiss it as purely a count(*) optimization hack.

I know Oracle is capable of it, and it can speed up your query a lot when
you remove that last unnecessary column from a join table, allowing Oracle
to skip the step of reading the table.

--
greg

Re: count(*) slow on large tables

From
"Gregory S. Williamson"
Date:
I can tell you that this is one of the first things applications programmers and IT managers notice. It can slightly
tarnish postgres' image when it takes many long seconds to do what other databases can do in a snap. The "whys and
wherefores" can be hard to get across once they see the comparative numbers.

When I use Informix "dbaccess" it has a "status" which will tell me the row count of a table virtually instantly -- it
can be locked out by a user with an exclusive lock, so it's not entirely independent of the table (like a stored value
in one of the system catalog tables).

This is not to say Informix is "right" and Postgres is "wrong" ... but it is something that virtually any newcomer
will run into head long, with resulting bruises and contusions, not to mention confusion.

At the very least this needs to be VERY clearly explained right up front, along with some of the possible work-arounds,
depending on what one is really after with this info.

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From:    Dror Matalon [mailto:dror@zapatec.com]
Sent:    Thu 10/2/2003 9:27 PM
To:    pgsql-performance@postgresql.org
Cc:
Subject:    Re: [PERFORM] count(*) slow on large tables


I smell a religious war in the aii:-).
Can you go several days in a row without doing select count(*) on any
of your tables?

I suspect that this is somewhat a domain specific issue. In some areas
you don't need to know the total number of rows in your tables, in
others you do.

I also suspect that you're right, that end user applications don't use
this information as often as DBAs would. On the other hand, it seems
whenever you want to optimize your app (something relevant to this list),
one of the things you do need to know is the number of rows in your
table.

Dror

On Thu, Oct 02, 2003 at 10:08:18PM -0400, Christopher Browne wrote:
> The world rejoiced as dror@zapatec.com (Dror Matalon) wrote:
> > I don't have an opinion on how hard it would be to implement the
> > tracking in the indexes, but "select count(*) from some table" is, in my
> > experience, a query that people tend to run quite often.
> > One of the databases that I've used, I believe it was Informix, had that
> > info cached so that it always new how many rows there were in any
> > table. It was quite useful.
>
> I can't imagine why the raw number of tuples in a relation would be
> expected to necessarily be terribly useful.
>
> I'm involved with managing Internet domains, and it's only when people
> are being pretty clueless that anyone imagines that "select count(*)
> from domains;" would be of any use to anyone.  There are enough "test
> domains" and "inactive domains" and other such things that the raw
> number of "things in the table" aren't really of much use.
>
> - I _do_ care how many pages a table occupies, to some extent, as that
> determines whether it will fit in my disk space or not, but that's not
> COUNT(*).
>
> - I might care about auditing the exact numbers of records in order to
> be assured that a data conversion process was done correctly.  But in
> that case, I want to do something a whole *lot* more detailed than
> mere COUNT(*).
>
> I'm playing "devil's advocate" here, to some extent.  But
> realistically, there is good reason to be skeptical of the merits of
> using SELECT COUNT(*) FROM TABLE for much of anything.
>
> Furthermore, the relation that you query mightn't be a physical
> "table."  It might be a more virtual VIEW, and if that's the case,
> bets are even MORE off.  If you go with the common dictum of "good
> design" that users don't directly access tables, but go through VIEWs,
> users may have no way to get at SELECT COUNT(*) FROM TABLE.
> --
> output = reverse("ac.notelrac.teneerf" "@" "454aa")
> http://www.ntlug.org/~cbbrowne/finances.html
> Rules  of  the  Evil  Overlord  #74.   "When  I  create  a  multimedia
> presentation of my plan designed  so that my five-year-old advisor can
> easily  understand the  details, I  will not  label the  disk "Project
> Overlord" and leave it lying on top of my desk."
> <http://www.eviloverlord.com/>
>

--
Dror Matalon, President
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: count(*) slow on large tables

From
Shridhar Daithankar
Date:
Dror Matalon wrote:

> I smell a religious war in the aii:-).
> Can you go several days in a row without doing select count(*) on any
> of your tables?
>
> I suspect that this is somewhat a domain specific issue. In some areas
> you don't need to know the total number of rows in your tables, in
> others you do.

If I were you, I would have an autovacuum daemon running and, rather than
doing select count(*), I would look at the stats generated by the vacuums.
They give an approximate number of tuples, and that should be good enough
if it is accurate to within a percent.
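
For example (table name hypothetical; relpages/reltuples are maintained by
VACUUM and ANALYZE):

    VACUUM ANALYZE mytable;
    SELECT relname, relpages, reltuples
      FROM pg_class
     WHERE relname = 'mytable';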

Just another approach to achieving the same thing... Don't be religious
about running a query from the SQL prompt. That's it...

  Shridhar


Re: count(*) slow on large tables

From
Christopher Browne
Date:
Oops! dror@zapatec.com (Dror Matalon) was seen spray-painting on a wall:
> I smell a religious war in the aii:-).
> Can you go several days in a row without doing select count(*) on any
> of your tables?

I would be more likely, personally, to run "VACUUM VERBOSE ANALYZE",
which has useful side-effects :-).

> I suspect that this is somewhat a domain specific issue. In some
> areas you don't need to know the total number of rows in your
> tables, in others you do.

"Relationship tables," which don't contain data in their own right,
but which, instead, link together records in other tables, are likely
to have particularly useless COUNT(*)'s.

> I also suspect that you're right, that end user applications don't
> use this information as often as DBAs would. On the other hand, it
> seems whenever you want to optimize your app (something relevant to
> this list), one of the things you do need to know is the number of
> rows in your table.

Ah, but in the case of optimization, there's little need for
"transactionally safe, MVCC-managed, known-to-be-exact" values.
Approximations are plenty good enough to get the right plan.

Furthermore, it's not the number of rows that is most important when
optimizing queries; the number of pages is more relevant to the
matter, as that's what the database is slinging around.
--
(reverse (concatenate 'string "ac.notelrac.teneerf" "@" "454aa"))
http://www3.sympatico.ca/cbbrowne/multiplexor.html
Rules of  the Evil Overlord #134. "If  I am escaping in  a large truck
and the hero is pursuing me in  a small Italian sports car, I will not
wait for the hero to pull up along side of me and try to force him off
the road  as he attempts to climb  aboard. Instead I will  slam on the
brakes  when he's  directly behind  me.  (A  rudimentary  knowledge of
physics can prove quite useful.)" <http://www.eviloverlord.com/>

Re: count(*) slow on large tables

From
Jeff
Date:
On Thu, 2 Oct 2003, Christopher Browne wrote:

> I can't imagine why the raw number of tuples in a relation would be
> expected to necessarily be terribly useful.
>

We use stuff like that for reporting queries.

example:
On our message boards each post is a row.  The powers that be like to know
how many posts there are total (in addition to 'today') --
select count(*) from posts is how it has been
done on our informix db.  With our port to PG I instead select reltuples
from pg_class.

I know when I log in to a new db (or a db unknown to me) the first thing I
do is look at the tables and see what sort of data there is ... but in code
I'd rarely do that.

I know some monitoring things around here also do a select count(*) on
sometable to ensure it is growing, but like you said, this is easily done
with the number of pages as well.

Yes, Informix caches this data. I believe Oracle does too.

MySQL with InnoDB does the same thing PG does. (MyISAM caches it.)

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/



Re: count(*) slow on large tables

From
Jean-Luc Lachance
Date:
Well I can think of many more cases where it would be useful:

SELECT COUNT(DISTINCT x) FROM ...
SELECT COUNT(*) FROM ... WHERE x = ?


Also, having the transaction number (visibility) in the index would tip
the balance more toward index_scan over seq_scan, because you would not
have to look up visibility in the data file. We all know this has been an
issue many times.
Having a different index file structure when the index is not UNIQUE
would help too.
The last page of a non-unique index could hold more stats.



Christopher Browne wrote:
>
> jllachan@nsd.ca (Jean-Luc Lachance) writes:
> > That's one of the draw back of MVCC.
> > I once suggested that the transaction number and other house keeping
> > info be included in the index, but was told to forget it...
> > It would solve once and for all the issue of seq_scan vs index_scan.
> > It would simplify the aggregate problem.
>
> It would only simplify _one_ case, namely the case where someone cares
> about the cardinality of a relation, and it would do that at
> _considerable_ cost.
>
> A while back I outlined how this would have to be done, and for it to
> be done efficiently, it would be anything BUT simple.
>
> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"
>
> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.
>
> And it still begs the same question, of why the result of this query
> would be particularly meaningful to anyone.  I don't see the
> usefulness; I don't see the value of going to the considerable effort
> of "fixing" this purported problem.
> --
> let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
> <http://dev6.int.libertyrms.com/>
> Christopher Browne
> (416) 646 3304 x124 (land)
>

Re: count(*) slow on large tables

From
Hilary Forbes
Date:
We frequently need to know the number of tuples in a table, although sometimes we do have WHERE status='X', for example,
but this often doesn't guarantee an indexed scan. And yes, my reasons are the same - reporting figures, eg the number of
bookings made since the system was introduced.   Have you tried doing

SELECT count(pkey)

rather than count(*)

where pkey is the primary key (assuming you have a single field that is a primary key or a unique indexed key)?  This
is MUCH faster in my experience.  If you don't have such an animal, I'd seriously suggest putting in a serial number
and recreating the table with that as the primary key.

The vacuuming bit is not accurate enough for us in many instances.  Also a count can be easily fed into other
programs/web pages etc without having to parse the vacuum output.

Hilary

At 23:22 02/10/2003 -0700, you wrote:

>I can tell you that this is one of the first things applications programmers and IT managers notice. It can slightly
>tarnish postgres' image when it takes many long seconds to do what other databases can do in a snap. The "whys and
>wherefores" can be hard to get across once they see the comparative numbers.
>
>When I use Informix "dbaccess" it has a "status" which will tell me the row count of a table virtually instantly -- it
>can be locked out by a user with an exclusive lock, so it's not entirely independent of the table (like a stored value
>in one of the system catalog tables).
>
>This is not to say Informix is "right" and Postgres is "wrong" ... but it is something that virtually any newcomer
>will run into head long, with resulting bruises and contusions, not to mention confusion.
>
>At the very least this needs to be VERY clearly explained right up front, along with some of the possible
>work-arounds, depending on what one is really after with this info.
>
>Greg Williamson
>DBA
>GlobeXplorer LLC
>
>-----Original Message-----
>From:   Dror Matalon [mailto:dror@zapatec.com]
>Sent:   Thu 10/2/2003 9:27 PM
>To:     pgsql-performance@postgresql.org
>Cc:
>Subject:        Re: [PERFORM] count(*) slow on large tables
>
>
>I smell a religious war in the aii:-).
>Can you go several days in a row without doing select count(*) on any
>of your tables?
>
>I suspect that this is somewhat a domain specific issue. In some areas
>you don't need to know the total number of rows in your tables, in
>others you do.
>
>I also suspect that you're right, that end user applications don't use
>this information as often as DBAs would. On the other hand, it seems
>whenever you want to optimize your app (something relevant to this list),
>one of the things you do need to know is the number of rows in your
>table.
>
>Dror
>
>On Thu, Oct 02, 2003 at 10:08:18PM -0400, Christopher Browne wrote:
>> The world rejoiced as dror@zapatec.com (Dror Matalon) wrote:
>> > I don't have an opinion on how hard it would be to implement the
>> > tracking in the indexes, but "select count(*) from some table" is, in my
>> > experience, a query that people tend to run quite often.
>> > One of the databases that I've used, I believe it was Informix, had that
>> > info cached so that it always new how many rows there were in any
>> > table. It was quite useful.
>>
>> I can't imagine why the raw number of tuples in a relation would be
>> expected to necessarily be terribly useful.
>>
>> I'm involved with managing Internet domains, and it's only when people
>> are being pretty clueless that anyone imagines that "select count(*)
>> from domains;" would be of any use to anyone.  There are enough "test
>> domains" and "inactive domains" and other such things that the raw
>> number of "things in the table" aren't really of much use.
>>
>> - I _do_ care how many pages a table occupies, to some extent, as that
>> determines whether it will fit in my disk space or not, but that's not
>> COUNT(*).
>>
>> - I might care about auditing the exact numbers of records in order to
>> be assured that a data conversion process was done correctly.  But in
>> that case, I want to do something a whole *lot* more detailed than
>> mere COUNT(*).
>>
>> I'm playing "devil's advocate" here, to some extent.  But
>> realistically, there is good reason to be skeptical of the merits of
>> using SELECT COUNT(*) FROM TABLE for much of anything.
>>
>> Furthermore, the relation that you query mightn't be a physical
>> "table."  It might be a more virtual VIEW, and if that's the case,
>> bets are even MORE off.  If you go with the common dictum of "good
>> design" that users don't directly access tables, but go through VIEWs,
>> users may have no way to get at SELECT COUNT(*) FROM TABLE.
>> --
>> output = reverse("ac.notelrac.teneerf" "@" "454aa")
>> http://www.ntlug.org/~cbbrowne/finances.html
>> Rules  of  the  Evil  Overlord  #74.   "When  I  create  a  multimedia
>> presentation of my plan designed  so that my five-year-old advisor can
>> easily  understand the  details, I  will not  label the  disk "Project
>> Overlord" and leave it lying on top of my desk."
>> <http://www.eviloverlord.com/>
>>
>
>--
>Dror Matalon, President
>Zapatec Inc
>1700 MLK Way
>Berkeley, CA 94709
>http://www.zapatec.com
>


Hilary Forbes
-------------
DMR Computer Limited:   http://www.dmr.co.uk/
Direct line:  01689 889950
Switchboard:  (44) 1689 860000  Fax: (44) 1689 860330
E-mail:  hforbes@dmr.co.uk

**********************************************************


Re: count(*) slow on large tables

From
Christopher Browne
Date:
In the last exciting episode, jllachan@nsd.ca (Jean-Luc Lachance) wrote:
> Well I can think of many more case where it would be usefull:
>
> SELECT COUNT(DISTINCT x) FROM ...
> SELECT COUNT(*) FROM ... WHERE x = ?

Those are precisely the cases that the "other databases" ALSO fall
down on.

Maintaining those sorts of statistics would lead [in _ANY_ database;
PostgreSQL has no disadvantage in this] to needing, for each and every
update, to update a whole host of statistic values.

It would be fairly reasonable to have a trigger, in PostgreSQL, to
manage this sort of information.  It would not be outrageously
difficult to substantially improve performance of queries, at the
considerable cost that each and every update would have to update a
statistics table.

If you're doing a whole lot of these sorts of queries, then it is a
reasonable idea to create appropriate triggers for the (probably very
few) tables where you are doing these counts.
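
A minimal sketch of such a trigger, with hypothetical names (assumes
plpgsql is installed):

    CREATE TABLE row_counts (tablename name PRIMARY KEY,
                             cnt bigint NOT NULL);
    INSERT INTO row_counts SELECT 'mytable', count(*) FROM mytable;

    CREATE FUNCTION count_trig() RETURNS trigger AS '
    BEGIN
        IF TG_OP = ''INSERT'' THEN
            UPDATE row_counts SET cnt = cnt + 1
             WHERE tablename = TG_RELNAME;
        ELSE
            UPDATE row_counts SET cnt = cnt - 1
             WHERE tablename = TG_RELNAME;
        END IF;
        RETURN NULL;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER mytable_count AFTER INSERT OR DELETE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE count_trig();

    -- the count is then a one-row lookup:
    SELECT cnt FROM row_counts WHERE tablename = 'mytable';

Note that every insert or delete now has to update the single count row,
which is precisely the update cost described above.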

But the notion that this should automatically be applied to all tables
always is a dangerous one.  It would make update performance Suck
Badly, because the extra statistical updates would be quite expensive.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','cbbrowne.com').
http://www3.sympatico.ca/cbbrowne/multiplexor.html
I'm sorry Dave, I can't let you do that.
Why don't you lie down and take a stress pill?

Re: [HACKERS] Index/Function organized table layout (from Re:

From
Hannu Krosing
Date:
Christopher Browne wrote on Fri, 03.10.2003 at 00:57:
> jllachan@nsd.ca (Jean-Luc Lachance) writes:
> > That's one of the draw back of MVCC.
> > I once suggested that the transaction number and other house keeping
> > info be included in the index, but was told to forget it...
> > It would solve once and for all the issue of seq_scan vs index_scan.
> > It would simplify the aggregate problem.
>
> It would only simplify _one_ case, namely the case where someone cares
> about the cardinality of a relation, and it would do that at
> _considerable_ cost.
>
> A while back I outlined how this would have to be done, and for it to
> be done efficiently, it would be anything BUT simple.

Could this be made a TODO item, perhaps with your attack plan?
Of course it would be a strictly optional feature, useful only for special
situations (see below).

I cross-post this to [HACKERS] as it seem relevant to a problem recently
discussed there.

> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"

Not really. Just yesterday there was a discussion on [HACKERS] about
implementing btree-organized tables, which would be much less needed if
the visibility info were kept in indexes.

> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.

If the WHERE clause could use the same index (or any index with
visibility info) then there would be no need for "walking through the
tuples" in the data relation.

The typical use case cited on [HACKERS] was time series data, where
inserts are roughly in (timestamp,id) order but queries in (id,timestamp)
order. Now if the index included all relevant fields
(id,timestamp,data1,data2,...,dataN) then the query could run on the index
only, touching just a few pages and thus vastly improving performance. I
agree that this is not something everybody needs, but when it is needed
it is needed badly.
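
A sketch of that use case (names hypothetical):

    CREATE TABLE datapoints (
        id    int       NOT NULL,
        ts    timestamp NOT NULL,
        data1 float8,
        data2 float8
    );
    -- Rows arrive roughly in (ts, id) order, so one id's rows are
    -- scattered all over the heap.  If this index carried visibility
    -- info, the query below could be answered from the index alone:
    CREATE INDEX datapoints_by_id ON datapoints (id, ts, data1, data2);

    SELECT ts, data1, data2
      FROM datapoints
     WHERE id = 42
     ORDER BY id, ts;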

> And it still begs the same question, of why the result of this query
> would be particularly meaningful to anyone.  I don't see the
> usefulness; I don't see the value of going to the considerable effort
> of "fixing" this purported problem.

Being able to do fast count(*) is just a side benefit.

----------------
Hannu


Re: count(*) slow on large tables

From
Christopher Kings-Lynne
Date:
> On our message boards each post is a row.  The powers that be like to know
> how many posts there are total (In addition to 'today')-
> select count(*) from posts is how it has been
> done on our informix db.  With our port to PG I instead select reltuples
> pg_class.

We have exactly the same situation, except we just added a 'num_replies'
field to each thread and a 'num_posts' field to each forum, so that
getting that information out is a very fast operation.  Because, of
course, there are hundreds of times more reads of that information than
writes...
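
i.e. each new post does something like this in the same transaction
(hypothetical schema):

    INSERT INTO posts (thread_id, forum_id, body) VALUES (123, 7, '...');
    UPDATE threads SET num_replies = num_replies + 1 WHERE id = 123;
    UPDATE forums  SET num_posts   = num_posts   + 1 WHERE id = 7;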

Chris


Re: count(*) slow on large tables

From
Bruce Momjian
Date:
Christopher Browne wrote:
> jllachan@nsd.ca (Jean-Luc Lachance) writes:
> > That's one of the draw back of MVCC.
> > I once suggested that the transaction number and other house keeping
> > info be included in the index, but was told to forget it...
> > It would solve once and for all the issue of seq_scan vs index_scan.
> > It would simplify the aggregate problem.
>
> It would only simplify _one_ case, namely the case where someone cares
> about the cardinality of a relation, and it would do that at
> _considerable_ cost.
>
> A while back I outlined how this would have to be done, and for it to
> be done efficiently, it would be anything BUT simple.
>
> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"

We do have a TODO item:

    * Consider using MVCC to cache count(*) queries with no WHERE clause

The idea is to cache a recent count of the table, then have
insert/delete add +/- records to the count.  A COUNT(*) would get the
main cached record plus any visible +/- records.  This would allow the
count to return the proper value depending on the visibility of the
requesting transaction, and it would require _no_ heap or index scan.
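
In user-level terms, the read side would amount to something like this
(a sketch; names hypothetical):

    -- one large "base" row per table plus small +/- rows added by
    -- inserts/deletes; ordinary MVCC visibility rules decide which
    -- deltas each backend sees
    SELECT sum(cnt) FROM count_cache WHERE tablename = 'mytable';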

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

COUNT(*) again (was Re: [HACKERS] Index/Function organized table layout)

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
> Christopher Browne wrote on Fri, 03.10.2003 at 00:57:
>> A while back I outlined how this would have to be done, and for it to
>> be done efficiently, it would be anything BUT simple.

> Could this be made a TODO item, perhaps with your attack plan.

If I recall that discussion correctly, no one including Christopher
thought the attack plan was actually reasonable.

What this keeps coming down to is that an optimization that helps only
COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in
development and maintenance effort to justify its existence.

At least if you insist on an exact, MVCC-correct answer.  So far as I've
seen, the actual use cases for unqualified COUNT(*) could be handled
equally well by an approximate answer.  What we should be doing rather
than wasting large amounts of time trying to devise exact solutions is
telling people to look at pg_class.reltuples for approximate answers.
We could also be looking at beefing up support for that approach ---
maybe provide some syntactic sugar for the lookup, maybe see if we can
update reltuples in more places than we do now, make sure that the
autovacuum daemon includes "keep reltuples accurate" as one of its
design goals, etc etc.
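
The syntactic sugar could be as thin as a one-line SQL function (a
sketch):

    CREATE FUNCTION approx_count(name) RETURNS real AS
        'SELECT reltuples FROM pg_class WHERE relname = $1'
    LANGUAGE sql;

    SELECT approx_count('mytable');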

            regards, tom lane

Re: count(*) slow on large tables

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> We do have a TODO item:
>     * Consider using MVCC to cache count(*) queries with no WHERE clause

> The idea is to cache a recent count of the table, then have
> insert/delete add +/- records to the count.  A COUNT(*) would get the
> main cached record plus any visible +/- records.  This would allow the
> count to return the proper value depending on the visibility of the
> requesting transaction, and it would require _no_ heap or index scan.

... and it would give the wrong answers.  Unless the cache is somehow
snapshot-aware, so that it can know which other transactions should be
included in your count.

            regards, tom lane

Re: count(*) slow on large tables

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > We do have a TODO item:
> >     * Consider using MVCC to cache count(*) queries with no WHERE clause
>
> > The idea is to cache a recent count of the table, then have
> > insert/delete add +/- records to the count.  A COUNT(*) would get the
> > main cached record plus any visible +/- records.  This would allow the
> > count to return the proper value depending on the visibility of the
> > requesting transaction, and it would require _no_ heap or index scan.
>
> ... and it would give the wrong answers.  Unless the cache is somehow
> snapshot-aware, so that it can know which other transactions should be
> included in your count.

The cache is an ordinary table, with xid's on every row.  I meant it
would require no index/heap scans of the large table --- it would still
require a scan of the "count" table.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: count(*) slow on large tables

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> ... and it would give the wrong answers.  Unless the cache is somehow
>> snapshot-aware, so that it can know which other transactions should be
>> included in your count.

> The cache is an ordinary table, with xid's on every row.  I meant it
> would require no index/heap scans of the large table --- it would still
> require a scan of the "count" table.

Oh, that idea.  Yeah, I think we had concluded it might work.  You'd
better make the TODO item link to that discussion, because there's sure
been plenty of discussion of ideas that wouldn't work.

            regards, tom lane

Re: count(*) slow on large tables

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> ... and it would give the wrong answers.  Unless the cache is somehow
> >> snapshot-aware, so that it can know which other transactions should be
> >> included in your count.
>
> > The cache is an ordinary table, with xid's on every row.  I meant it
> > would require no index/heap scans of the large table --- it would still
> > require a scan of the "count" table.
>
> Oh, that idea.  Yeah, I think we had concluded it might work.  You'd
> better make the TODO item link to that discussion, because there's sure
> been plenty of discussion of ideas that wouldn't work.

OK, I beefed up the TODO:

    * Use a fixed row count and a +/- count with MVCC visibility rules
      to allow fast COUNT(*) queries with no WHERE clause(?)

I can always give the details if someone asks.  It doesn't seem complex
enough for a separate TODO.detail item.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: count(*) slow on large tables

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> It doesn't seem complex enough for a separate TODO.detail item.

I thought it was, if only because it is so easy to think of wrong
implementations.

            regards, tom lane

Re: COUNT(*) again (was Re: [HACKERS] Index/Function

From
Hannu Krosing
Date:
Tom Lane wrote on Sat, 04.10.2003 at 19:07:
> Hannu Krosing <hannu@tm.ee> writes:
> > Christopher Browne wrote on Fri, 03.10.2003 at 00:57:
> >> A while back I outlined how this would have to be done, and for it to
> >> be done efficiently, it would be anything BUT simple.
>
> > Could this be made a TODO item, perhaps with your attack plan.
>
> If I recall that discussion correctly, no one including Christopher
> thought the attack plan was actually reasonable.
>
> What this keeps coming down to is that an optimization that helps only
> COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in
> development and maintenance effort to justify its existence.

Please read further in my email ;)

The point I was trying to make was that faster count(*)'s are just a side
effect. If we could (conditionally) keep visibility info in indexes,
then this would also solve the much trickier problem of
index-structured tables.

Count(*) is *not* the only query that could benefit from not needing to
go to the actual data table for visibility info. The much more needed case
would be the "inverse time series" type of queries, which would
otherwise thrash cache pages badly.

----------------------------
Hannu


Re: COUNT(*) again (was Re: [HACKERS] Index/Function organized table layout)

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
> The point I was trying to make was that faster count(*)'s is just a side
> effect. If we could (conditionally) keep visibility info in indexes,

I think that's not happening, conditionally or otherwise.  The atomicity
problems alone are sufficient reason why not, even before you look at
the performance issues.

            regards, tom lane

Uses for Index/Function organizing

From
James Rogers
Date:
On 10/4/03 2:00 AM, "Hannu Krosing" <hannu@tm.ee> wrote:
>
> If the WHERE clause could use the same index (or any index with
> visibility info) then there would be no need for "walking through the
> tuples" in data relation.
>
> the typical usecase cited on [HACKERS] was time series data, where
> inserts are roughly in (timestamp,id)order but queries in (id,timestamp)
> order. Now if the index would include all relevant fields
> (id,timestamp,data1,data2,...,dataN) then the query could run on index
> only touching just a few pages and thus vastly improving performance. I
> agree that this is not something everybody needs, but when it is needed
> it is needed bad.



I would add that automatically index-organizing tuples isn't just useful for
time-series data (though it is a good example), but can be used to
substantially improve the query performance of any really large table in a
number of different and not always direct ways.  Once working sets routinely
exceed the size of physical RAM, buffer access/utilization efficiency often
becomes the center of performance tuning, but not one that many people know
much about.

One of the less direct ways of using btree-organized tables for improving
scalability is to "materialize" table indexes of tables that *shouldn't* be
btree-organized.  Not only can you turn tables into indexes, but you can
also turn indexes into tables, which can have advantages in some cases.


For example, I did some scalability consulting at a well-known movie rental
company with some very large Oracle databases running on big Sun boxen.  One
of the biggest problems was that their rental history table, which had a
detailed record of every movie ever rented by every customer, had grown so
large that the performance was getting painfully slow.  To make matters
worse, it and a few related tables had high concurrent usage, a mixture of
many performance-sensitive queries grabbing windows of a customer's history
plus a few broader OLAP queries which were not time sensitive.  Everything
was technically optimized in a relational and basic configuration sense, and
the database guys at the company were at a loss on how to fix this problem.
Performance of all queries was essentially bound by how fast pages could be
moved between the disk and buffers.

Issue #1:  The history rows had quite a lot of columns and the OLAP
processes used non-primary indexes, so the table was not particularly
suitable for btree-organizing.

Issue #2:  Partitioning was not an option because it would have exceeded
certain limits in Oracle (at that time, I don't know if that has changed).

Issue #3:  Although customer histories were being constantly queried, data
needed most was really an index view of the customers history, not the
details of the history itself.


The solution I came up with was to use a synced btree-organized partial
clone of the main history table that only contained a small number of key
columns that mattered for generating customer history indexes in the
applications that used them.  While this substantially increased the disk
space footprint for the same data (since we were cloning it), it greatly
reduced the total number of cache misses for the typical query, only
fetching the full history row pages when actually needed.  In other words,
basically concentrating more buffer traffic into a smaller number of page
buffers.  What we had was an exceedingly active but relatively compact
materialized index of the history table that could essentially stay resident
in RAM, and a much less active history table+indexes that while less likely
to be buffered than before, had pages accessed at such a reduced frequency
that there was a huge net performance gain because disk access plummeted.
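
Translated into (PostgreSQL-flavored) SQL, the shape of the solution was
roughly this -- a sketch with hypothetical names, kept in sync with the
wide table by triggers:

    -- compact, heavily-accessed "materialized index" of the history
    CREATE TABLE rental_history_slim (
        customer_id int       NOT NULL,
        rented_at   timestamp NOT NULL,
        movie_id    int       NOT NULL,
        PRIMARY KEY (customer_id, rented_at, movie_id)
    );

    -- the hot query touches only the slim table's few pages; full
    -- detail rows are fetched from rental_history only when needed
    SELECT rented_at, movie_id
      FROM rental_history_slim
     WHERE customer_id = 42
     ORDER BY rented_at DESC;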

Average performance improvement for the time sensitive queries: 50-70x

So btree-organized tables can do more than make tables behave like indexes.
They can also make indexes behave like tables.  Both are very useful in some
cases when your working set exceeds the physical buffer space.  For smaller
databases this has much less utility and users need to understand the
limitations, nonetheless when tables and databases get really big it becomes
an important tool in the tool belt.

Cheers,

-James Rogers
 jamesr@best.com


Re: count(*) slow on large tables

From
Christopher Browne
Date:
Quoth tgl@sss.pgh.pa.us (Tom Lane):
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> We do have a TODO item:
>>     * Consider using MVCC to cache count(*) queries with no WHERE clause
>
>> The idea is to cache a recent count of the table, then have
>> insert/delete add +/- records to the count.  A COUNT(*) would get the
>> main cached record plus any visible +/- records.  This would allow the
>> count to return the proper value depending on the visibility of the
>> requesting transaction, and it would require _no_ heap or index scan.
>
> ... and it would give the wrong answers.  Unless the cache is somehow
> snapshot-aware, so that it can know which other transactions should be
> included in your count.

[That's an excellent summary that Bruce did of what came out of the
previous discussion...]

If this "cache" was a table, itself, the visibility of its records
should be identical to that of the visibility of the "real" records.
+/- records would become visible when the transaction COMMITed, at the
very same time the source records became visible.

I thought, at one point, that it would be a slick idea for "record
compression" to take place automatically; when you do a COUNT(*), the
process would include compressing multiple records down to one.
Unfortunately, that turns out to be Tremendously Evil if the same
COUNT(*) were being concurrently processed in multiple transactions.
Both would repeat much the same work, and this would ultimately lead
to one of the transactions aborting.  [I recently saw this effect
occur, um, a few times...]

For this not to have Evil Effects on unsuspecting transactions, we
would instead require some process analogous to VACUUM, where a single
transaction would be used to compress the "counts table" down to one
record per table.  Being independent of "user transactions," it could
safely compress the data without injuring unsuspecting transactions.
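
That compression pass might look roughly like this (a sketch, using the
hypothetical counts table named below):

    BEGIN;
    -- keep writers out briefly; plain readers can still proceed
    LOCK TABLE pg_exact_row_counts IN EXCLUSIVE MODE;
    CREATE TEMP TABLE rollup AS
        SELECT tablename, sum(delta) AS delta
          FROM pg_exact_row_counts
         GROUP BY tablename;
    DELETE FROM pg_exact_row_counts;
    INSERT INTO pg_exact_row_counts SELECT tablename, delta FROM rollup;
    DROP TABLE rollup;
    COMMIT;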

But in most cases, the cost of this would be pretty prohibitive.
Every transaction that adds a record to a table leads to a record
being added to table "pg_exact_row_counts".  If transactions typically
involve adding ONE row to any given table, this effectively doubles
the update traffic.  Ouch.  That means that in a _real_
implementation, it would make sense to pick and choose the tables that
would be so managed.

In my earlier arguing of "You don't really want that!", while I may
have been guilty of engaging in a _little_ hyperbole, I was certainly
_not_ being facetious overall.  At work, we tell the developers "avoid
doing COUNT(*) inside ordinary transactions!", and that is certainly
NOT facetious comment.  I recall a case a while back where system
performance was getting brutalized by a lurking COUNT(*).  (Combined
with some other pathological behaviour, of course!)  And note that
this wasn't a query that the TODO item could address; it was of the
form "SELECT COUNT(*) FROM SOME_TABLE WHERE OWNER = VALUE;"

As you have commented elsewhere in the thread, much of the time, the
point of asking for COUNT(*) is often to get some idea of table size,
where the precise number isn't terribly important in comparison with
getting general magnitude.  Improving the ability to get approximate
values would be of some value.

I would further argue that "SELECT COUNT(*) FROM TABLE" isn't
particularly useful even when precision _is_ important.  If I'm
working on reports that would be used to reconcile things, the queries
I use are a whole lot more involved than that simple form.  It is far
more likely that I'm using a GROUP BY.

It is legitimate to get wishful and imagine that it would be nice if
we could get the value of that query "instantaneously."  It is also
legitimate to think that the effort required to implement that might
be better used on improving other things.
--
(reverse (concatenate 'string "ac.notelrac.teneerf" "@" "454aa"))
http://www3.sympatico.ca/cbbrowne/
"very few people approach me in real life and insist on proving they
are drooling idiots."  -- Erik Naggum, comp.lang.lisp

Re: COUNT(*) again (was Re: [HACKERS] Index/Function organized

From
Bruce Momjian
Date:
Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > The point I was trying to make was that faster count(*)'s is just a side
> > effect. If we could (conditionally) keep visibility info in indexes,
>
> I think that's not happening, conditionally or otherwise.  The atomicity
> problems alone are sufficient reason why not, even before you look at
> the performance issues.

What are the atomicity problems of adding a create/expire xid to the
index tuples?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: COUNT(*) again (was Re: [HACKERS] Index/Function organized

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> I think that's not happening, conditionally or otherwise.  The atomicity
>> problems alone are sufficient reason why not, even before you look at
>> the performance issues.

> What are the atomicity problems of adding a create/expire xid to the
> index tuples?

You can't update a tuple's status in just one place ... you have to
update the copies in the indexes too.

            regards, tom lane

Re: COUNT(*) again (was Re: [HACKERS] Index/Function organized

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> I think that's not happening, conditionally or otherwise.  The atomicity
> >> problems alone are sufficient reason why not, even before you look at
> >> the performance issues.
>
> > What are the atomicity problems of adding a create/expire xid to the
> > index tuples?
>
> You can't update a tuple's status in just one place ... you have to
> update the copies in the indexes too.

But we don't update the tuple status for a commit, we just mark the xid
as committed.  We do have lazy status bits that prevent later lookups in
pg_clog, but we have those in the index already also.

What am I missing?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: count(*) slow on large tables

From
Josh Berkus
Date:
Bruce,

> OK, I beefed up the TODO:
>
>     * Use a fixed row count and a +/- count with MVCC visibility rules
>       to allow fast COUNT(*) queries with no WHERE clause(?)
>
> I can always give the details if someone asks.  It doesn't seem complex
> enough for a separate TODO.detail item.

Hmmm ... this doesn't seem effort-worthy to me.   How often does anyone do
COUNT with no where clause, except GUIs that give you a record count?  (of
course, as always, if someone wants to code it, feel free ...)

And for those GUIs, wouldn't it be 97% as good to run an ANALYZE and give the
approximate record counts for large tables?

As for counts with a WHERE clause, this is obviously up to the user.  Joe
Conway and I tested using a C trigger to track some COUNT ... GROUP BY values
for large tables based on additive numbers.   It worked fairly well for
accuracy, but the performance penalty on data writes was significant ... 8%
to 25% penalty for UPDATEs, depending on the frequency and batch size
(greater frequency and greater batch size --> greater penalty).

It's possible that this could be improved through some mechanism more tightly
integrated with the source code.  However, the coding effort would be
significant (12-20 hours) and it's possible that there would be no
improvement, which is why we didn't do it.

We also discussed an asynchronous aggregates collector that would work
something like the statistics collector, and keep pre-programmed aggregate
data, updating during "low-activity" periods.  This would significantly
reduce the performance penalty, but at the cost of accuracy ... that is, a
1%-5% variance on high-activity tables would be unavoidable, and all cached
aggregates would have to be recalculated on database restart, significantly
slowing down startup.   Again, we felt that the effort-result payoff was not
worthwhile.

Overall, I think the stuff we already have planned ... the hash aggregates in
7.4 and Tom's suggestion of adding an indexable flag to pg_aggs ... are far
more likely to yield useful fruit than any caching plan.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: count(*) slow on large tables

From
Rod Taylor
Date:
> And for those GUIs, wouldn't it be 97% as good to run an ANALYZE and give the
> approximate record counts for large tables?

Interfaces which run a COUNT(*) like that are broken by design.  They
fail to consider that the table may really be a view, whose count of
course could not be cached like that, and which may take days to return
a full result set (we had some pretty large views in an old billing system).

Re: count(*) slow on large tables

From
Shridhar Daithankar
Date:
Bruce Momjian wrote:
> OK, I beefed up the TODO:
>
>     * Use a fixed row count and a +/- count with MVCC visibility rules
>       to allow fast COUNT(*) queries with no WHERE clause(?)
>
> I can always give the details if someone asks.  It doesn't seem complex
> enough for a separate TODO.detail item.

May I propose an alternate approach to this optimisation?

- Postgresql allows to maintain user defined variables in shared memory.
- These variables obey transactions but do not get written to disk at all.
- There should be a facility to detect whether such a variable is initialized or
not.

How will it help? This is in addition to the trigger proposal that came up
earlier. With triggers it's not possible to make values visible across
backends unless the trigger updates a table, which eventually leads to the
vacuum/dead-tuples problem.

1. User creates a trigger to check updates/inserts for certain conditions.
2. It updates the count as and when required.
3. If the trigger detects that the count is not initialized, it issues the
real query the first time. There is no avoiding this.

Besides, a facility for resident variables could be used imaginatively
elsewhere as well.

Does this make sense? IMO this is a more generalised approach overall.

Just a thought.

  Shridhar

Re: count(*) slow on large tables

From
Sean Chittenden
Date:
> How will it help? This is in addition to the trigger proposal that came
> up earlier. With triggers it's not possible to make values visible
> across backends unless the trigger updates a table, which eventually
> leads to the vacuum/dead-tuples problem.
>
> 1. User creates a trigger to check updates/inserts for certain conditions.
> 2. It updates the count as and when required.
> 3. If the trigger detects that the count is not initialized, it issues the
> real query the first time. There is no avoiding this.
>
> Besides, a facility for resident variables could be used imaginatively
> elsewhere as well.
>
> Does this make sense? IMO this is a more generalised approach overall.

I do this _VERY_ frequently in my databases, only I have my stored
procs do the aggregate in a predefined MVCC table that's always there.
Here's a denormalized version for public consumption/thought:

CREATE TABLE global.dba_aggregate_cache (
  dbl TEXT NOT NULL,        -- The database location, doesn't need to be
                            -- qualified (ex: schema.table.col)
  op TEXT NOT NULL,         -- The operation, SUM, COUNT, etc.
  qual TEXT,                -- Any kind of conditional, such as a where clause
  val_int INT,              -- Whatever the value is, of type INT
  val_bigint BIGINT,        -- Whatever the value is, of type BIGINT
  val_text TEXT,            -- Whatever the value is, of type TEXT
  val_bytea BYTEA           -- Whatever the value is, of type BYTEA
);
CREATE UNIQUE INDEX dba_aggregate_cache_dbl_op_udx ON global.dba_aggregate_cache(dbl,op);

Then, I use a function to retrieve this value instead of a SELECT
COUNT(*).

SELECT public.cache_count('dbl','qual');  -- In this case, the op is COUNT
SELECT public.cache_count('dbl');         -- Returns the COUNT for the table listed in the dbl

Then, I create 4 or 5 functions (depending on the op I'm performing):

1) A private function that _doesn't_ run as security definer, that
   populates the global.dba_aggregate_cache row if it's empty.
2) A STABLE function for SELECTs, if the row doesn't exist, then it
   calls function #1 to populate its existence.
3) A STABLE function for INSERTs, if the row doesn't exist, then it
   calls function #1 to populate its existence, then adds the
   necessary bits to make it accurate.
4) A STABLE function for DELETEs, if the row doesn't exist, then it
   calls function #1 to populate its existence, then deletes the
   necessary bits to make it accurate.
5) A STABLE function for UPDATEs, if the row doesn't exist, then it
   calls function #1 to populate its existence, then updates the
   necessary bits to make it accurate.  It's not uncommon for me to
   not have an UPDATE function/trigger.

Create triggers for functions 2-5, and test away.  It's all MVCC, and
searching an indexed table for a single row is obviously vastly faster
than a seqscan/aggregate.  If I need any kind of an aggregate to be
fast, I use this system with a derivation of the above table.  The
problem with it is that I have to retrain others to use cache_count(),
or some other function, instead of using COUNT(*).
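
A rough sketch of what #1 and #2 might look like for a plain COUNT over a
hypothetical table public.items -- the actual function bodies weren't
posted, so the names and details here are guesses:

-- #1: populate the cache row if it's missing (runs the expensive scan once)
CREATE FUNCTION populate_count_cache() RETURNS void AS '
BEGIN
    PERFORM 1 FROM global.dba_aggregate_cache
     WHERE dbl = ''public.items'' AND op = ''COUNT'';
    IF NOT FOUND THEN
        INSERT INTO global.dba_aggregate_cache (dbl, op, val_bigint)
        SELECT ''public.items'', ''COUNT'', count(*) FROM public.items;
    END IF;
END;
' LANGUAGE plpgsql;

-- #2: the reader used instead of SELECT COUNT(*); left volatile here
-- because it may populate the cache on first call
CREATE FUNCTION cache_count(text) RETURNS bigint AS '
DECLARE
    result bigint;
BEGIN
    SELECT val_bigint INTO result
      FROM global.dba_aggregate_cache
     WHERE dbl = $1 AND op = ''COUNT'';
    IF result IS NULL THEN
        PERFORM populate_count_cache();
        SELECT val_bigint INTO result
          FROM global.dba_aggregate_cache
         WHERE dbl = $1 AND op = ''COUNT'';
    END IF;
    RETURN result;
END;
' LANGUAGE plpgsql;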

That said, it'd be nice if there were a way to tell PostgreSQL to do
the above for you and teach COUNT(*), SUM(*), or other aggregates to
use an MVCC-backed cache similar to the above.  If people want their
COUNTs to be fast, then they have to live with the INSERT, UPDATE,
DELETE cost.  The above doesn't work with anything complex such as
joins, but it's certainly a start, and I think it satisfies everyone's
gripes other than the tuple churn that _does_ happen (*nudge nudge*,
pg_autovacuum could be integrated into the backend to handle this).
For those worried about performance, the pages that are constantly
being recycled would likely stay in disk cache (PG or the OS).  There's
still some commit overhead, but still... no need to over-optimize by
requiring the table to be stored in the outdated, slow, and overused
shm (also, *nudge nudge*).

Anyway, let me throw that out there as a solution that I use and it
works quite well.  I didn't explain the use of the qual column, but I
think those who grasp the above way of handling things probably grok
how to use the qual column in a dynamically executed query.

CREATE AGGREGATE CACHE anyone?

-sc

--
Sean Chittenden

Re: count(*) slow on large tables

From
Harald Fuchs
Date:
In article <3F7D172E.3060107@persistent.co.in>,
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:

> Dror Matalon wrote:
>> I smell a religious war in the aii:-). Can you go several days in a
>> row without doing select count(*) on any
>> of your tables? I suspect that this is somewhat a domain specific
>> issue. In some areas
>> you don't need to know the total number of rows in your tables, in
>> others you do.

> If I were you, I would have an autovacuum daemon running, and rather
> than doing select count(*), I would look at the stats generated by
> vacuums.  They give an approximate number of tuples, and that should be
> good enough if accuracy to within a percent or so suffices.

The stats might indeed be a good estimate, provided there were not many
changes since the last VACUUM.  But how about a variant of COUNT(*)
using an index?  It would not be quite exact, since it might count
tuples not visible in the current transaction, but it could be a much
better estimate than the stats.
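
There's no such index-only variant today -- visibility is exactly the
problem discussed upthread -- but the planner's stats-based row estimate
can at least be scraped from EXPLAIN, which also handles a WHERE clause.
A sketch; the function name is invented, and it returns the planner's
guess, not a fresh count:

CREATE FUNCTION count_estimate(text) RETURNS bigint AS '
DECLARE
    rec  record;
    rows bigint;
BEGIN
    FOR rec IN EXECUTE ''EXPLAIN '' || $1 LOOP
        -- pull the estimated row count out of the top plan line
        rows := substring(rec."QUERY PLAN" FROM '' rows=([0-9]+)'');
        EXIT WHEN rows IS NOT NULL;
    END LOOP;
    RETURN rows;
END;
' LANGUAGE plpgsql;

-- usage: SELECT count_estimate('SELECT 1 FROM bigtable WHERE x > 0');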