Thread: "select count(*) from contacts" is too slow!

"select count(*) from contacts" is too slow!

From

Paul Serby

Date:

07 October 2003, 14:55:00

Why does 'select count(id) from "tblContacts"' do a sequential scan when the field 'id' is indexed using a btree?

MySql simply looks at the index which is keeping a handy record of the number of rows.

Can anybody explain how and why postgres does this query like it does?

Many thanks

Paul

Re: "select count(*) from contacts" is too slow!

From

"Nigel J. Andrews"

Date:

07 October 2003, 15:39:30

On Tue, 7 Oct 2003, Paul Serby wrote:

> Why does '*select count(id) from "tblContacts"'* do a sequential scan
> when the field '*id*' is indexed using a btree?
>
> MySql simply looks at the index which is keeping a handy record of the
> number of rows.
>
> Can anybody explain how and why postgres does this query like it does?

It's a FAQ I believe.

MySQL can tell you from it's index because it doesn't care if it gives you the
right number or not.

--
Nigel Andrews

Re: "select count(*) from contacts" is too slow!

From

Stephan Szabo

Date:

07 October 2003, 15:55:14

On Tue, 7 Oct 2003, Paul Serby wrote:

> Why does '*select count(id) from "tblContacts"'* do a sequential scan
> when the field '*id*' is indexed using a btree?
>
> MySql simply looks at the index which is keeping a handy record of the
> number of rows.
>
> Can anybody explain how and why postgres does this query like it does?

Because the index doesn't contain enough information to determine if a
particular row is visible to your transaction or not.  It would have to go
read the table to find that out, at which point using the index doesn't
help.  There's been a recent discussion of this on one of the lists
(either -general or -performance I'd guess) that you might want to look up
in the archives.

Re: "select count(*) from contacts" is too slow!

From

nolan@celery.tssi.com

Date:

07 October 2003, 21:43:32

> MySQL can tell you from it's index because it doesn't care if it gives you the
> right number or not.

Under what circumstances would MySQL give the wrong number?
--
Mike Nolan

Re: "select count(*) from contacts" is too slow!

From

Christopher Browne

Date:

08 October 2003, 13:16:49

A long time ago, in a galaxy far, far away, paul.serby@clockltd.com (Paul Serby) wrote:
> Why does 'select count(id) from "tblContacts"' do a sequential scan
> when the field 'id' is indexed using a btree?  MySql simply looks at
> the index which is keeping a handy record of the number of rows.
> Can anybody explain how and why postgres does this query like it
> does?

Look into the semantics of MVCC (MultiVersion Concurrency Control);
that (otherwise useful) feature prevents having any such "handy
record."
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/spreadsheets.html
Developmental Psychology
"Schoolyard behavior resembles adult primate behavior because "Ontogeny
Recapitulates Phylogeny" doesn't stop at birth."
-- Mark Miller

Re: "select count(*) from contacts" is too slow!

From

Christopher Browne

Date:

08 October 2003, 13:16:57

A long time ago, in a galaxy far, far away, nolan@celery.tssi.com wrote:
>> MySQL can tell you from it's index because it doesn't care if it gives you the
>> right number or not.
>
> Under what circumstances would MySQL give the wrong number?

It would give the wrong number under _every_ circumstance where there
are uncommitted INSERTs or DELETEs.
--
select 'cbbrowne' || '@' || 'ntlug.org';
http://www3.sympatico.ca/cbbrowne/sap.html
Appendium to  the Rules  of the  Evil Overlord #1:  "I will  not build
excessively integrated  security-and-HVAC systems. They  may be Really
Cool, but are far too vulnerable to breakdowns."

Re: "select count(*) from contacts" is too slow!

From

Ang Chin Han

Date:

09 October 2003, 01:17:37

Christopher Browne wrote:
> A long time ago, in a galaxy far, far away, nolan@celery.tssi.com wrote:
>
>>>MySQL can tell you from it's index because it doesn't care if it gives you the
>>>right number or not.
>>
>>Under what circumstances would MySQL give the wrong number?
>
>
> It would give the wrong number under _every_ circumstance where there
> are uncommitted INSERTs or DELETEs.

Give them some credit. I just double checked:

Using mysql 4.0.14 + innodb and transactions,

select count(*) from foo;

does not count uncommited INSERTs.

Heck, even using myisam, mysql's count(*)'s still accurate, since all
INSERTs, etc are autocommitted.

--
Linux homer 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386
GNU/Linux
  12:00pm  up 287 days,  3:33,  7 users,  load average: 6.93, 6.31, 6.16

Attachment

msg-32091-68473.dat

Re: "select count(*) from contacts" is too slow!

From

Greg Stark

Date:

09 October 2003, 02:09:28

Ang Chin Han <angch@bytecraft.com.my> writes:

> Heck, even using myisam, mysql's count(*)'s still accurate, since all INSERTs,
> etc are autocommitted.

That's sort of true, but not the whole story. Even autocommitted transactions
can be pending for a significant amount of time. The reason it's accurate is
because with mysql isam tables all updates take a table level lock. So there's
never a chance to select the count while an uncommitted transaction is
pending, even if the update takes a long time.

This is simple and efficient when you have low levels of concurrency. But when
you have 4+ processors or transactions involving lots of disk i/o it kills
scalability.

I'm curious how it's implemented with innodb tables. Do they still take a
table-level lock when committing to update the counters? What happens to
transactions that have already started, do they see the new value?

Actually it occurs to me that that might be ok for read-committed. Is there
ever a situation where a count(*) needs to represent an old snapshot in
read-committed? It has to for long-running selects, but if the count(*) itself
is always fast that need should never arise, just shared-lock and read the
value and unlock.

In order words, imagine if you had every transaction keep a net delta of rows
for every table and at commit time locked the entire table and updated the
count. The lock would be a point of contention but it would be very fast since
it would only have to update an integer with a precalculated adjustment. In
read-committed mode that would always be a valid value. (The transaction would
have to apply its own deltas I guess.)

--
greg