Thread: count(*) performance

count(*) performance

From
Gábriel Ákos
Date:
Hi,

I guess this is an age-old 100times answered question, but I didn't find
the answer to it yet (neither in the FAQ nor in the mailing list archives).

Question: I have a table with 2.5M rows. count(*) on this table is
running 4 minutes long. (dual opteron, 4gig ram, db on 4 disk raid10
array (sata, not scsi)) Is this normal? How could I make it run faster?
Maybe make it run faster for the 2nd time? Which parameters should I
change in postgresql.conf and how?





--
Üdvözlettel,
Gábriel Ákos
-=E-Mail :akos.gabriel@i-logic.hu|Web:  http://www.i-logic.hu=-
-=Tel/fax:+3612367353            |Mobil:+36209278894            =-

Re: count(*) performance

From
"Jim C. Nasby"
Date:
On Mon, Mar 27, 2006 at 03:34:32PM +0200, G?briel ?kos wrote:
> Hi,
>
> I guess this is an age-old 100times answered question, but I didn't find
> the answer to it yet (neither in the FAQ nor in the mailing list archives).
>
> Question: I have a table with 2.5M rows. count(*) on this table is
> running 4 minutes long. (dual opteron, 4gig ram, db on 4 disk raid10
> array (sata, not scsi)) Is this normal? How could I make it run faster?
> Maybe make it run faster for the 2nd time? Which parameters should I
> change in postgresql.conf and how?

First, count(*) on PostgreSQL tends to be slow because you can't do
index covering[1].

But in this case, I'd bet money that if it's taking 4 minutes something
else is wrong. Have you been vacuuming that table frequently enough?
What's SELECT relpages FROM pg_class WHERE relname='tablename' show?

[1] http://www.pervasive-postgres.com/lp/newsletters/2006/Insights_postgres_Feb.asp#5
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: count(*) performance

From
"Luke Lonergan"
Date:
Gabriel,

On 3/27/06 5:34 AM, "Gábriel Ákos" <akos.gabriel@i-logic.hu> wrote:

> Question: I have a table with 2.5M rows. count(*) on this table is
> running 4 minutes long. (dual opteron, 4gig ram, db on 4 disk raid10
> array (sata, not scsi)) Is this normal? How could I make it run faster?
> Maybe make it run faster for the 2nd time? Which parameters should I
> change in postgresql.conf and how?

Before changing anything with your Postgres configuration, you should check
your hard drive array performance.  All select count(*) does is a sequential
scan of your data, and if the table is larger than memory, or if it's the
first time you've scanned it, it is limited by your disk speed.

To test your disk speed, use the following commands and report the times
here:

  time bash -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync"
  time dd if=bigfile of=/dev/null bs=8k

If these are taking a long time, from another session watch the I/O rate
with "vmstat 1" for a while and report that here.

- Luke



Re: count(*) performance

From
Gábriel Ákos
Date:
Luke Lonergan wrote:

> To test your disk speed, use the following commands and report the times
> here:
>
>   time bash -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync"

root@panther:/fast #   time bash -c "dd if=/dev/zero of=bigfile bs=8k
count=500000 && sync"
500000+0 records in
500000+0 records out
4096000000 bytes transferred in 45.469404 seconds (90082553 bytes/sec)
real    0m56.880s
user    0m0.112s
sys     0m18.937s

>   time dd if=bigfile of=/dev/null bs=8k

root@panther:/fast #   time dd if=bigfile of=/dev/null bs=8k
500000+0 records in
500000+0 records out
4096000000 bytes transferred in 53.542147 seconds (76500481 bytes/sec)

real    0m53.544s
user    0m0.048s
sys     0m10.637s

I guess these values aren't that bad :)

--
Üdvözlettel,
Gábriel Ákos
-=E-Mail :akos.gabriel@i-logic.hu|Web:  http://www.i-logic.hu=-
-=Tel/fax:+3612367353            |Mobil:+36209278894            =-


Re: count(*) performance

From
Gábriel Ákos
Date:
Jim C. Nasby wrote:

> But in this case, I'd bet money that if it's taking 4 minutes something
> else is wrong. Have you been vacuuming that table frequently enough?

That gave me an idea. I thought that autovacuum is doing it right, but I
issued a vacuum full analyze verbose , and it worked all the day.
After that I've tweaked memory settings a bit too (more fsm_pages)

Now:

staging=# SELECT count(*) from infx.infx_product;
   count
---------
  3284997
(1 row)

Time: 1301.049 ms

As I saw the output, the database was compressed to 10% of its size :)
This table has quite big changes every 4 hour, let's see how it works.
Maybe I'll have to issue full vacuums from cron regularly.

> What's SELECT relpages FROM pg_class WHERE relname='tablename' show?

This went to 10% as well, now it's around 156000 pages.

Regards,
Akos


--
Üdvözlettel,
Gábriel Ákos
-=E-Mail :akos.gabriel@i-logic.hu|Web:  http://www.i-logic.hu=-
-=Tel/fax:+3612367353            |Mobil:+36209278894            =-


Re: count(*) performance

From
"Luke Lonergan"
Date:
Gabriel,

On 3/27/06 10:05 AM, "Gábriel Ákos" <akos.gabriel@i-logic.hu> wrote:

> That gave me an idea. I thought that autovacuum is doing it right, but I
> issued a vacuum full analyze verbose , and it worked all the day.
> After that I've tweaked memory settings a bit too (more fsm_pages)

Oops! I replied to your disk speed before I saw this.

The only thing is - you probably don't want to do a "vacuum full", but
rather a simple "vacuum" should be enough.

- Luke



Re: count(*) performance

From
Gábriel Ákos
Date:
Luke Lonergan wrote:
> Gabriel,
>
> On 3/27/06 10:05 AM, "Gábriel Ákos" <akos.gabriel@i-logic.hu> wrote:
>
>> That gave me an idea. I thought that autovacuum is doing it right, but I
>> issued a vacuum full analyze verbose , and it worked all the day.
>> After that I've tweaked memory settings a bit too (more fsm_pages)
>
> Oops! I replied to your disk speed before I saw this.
>
> The only thing is - you probably don't want to do a "vacuum full", but
> rather a simple "vacuum" should be enough.

I thought that too. Autovacuum is running on our system but it didn't do
the trick. Anyway the issue is solved, thank you all for helping. :)


--
Üdvözlettel,
Gábriel Ákos
-=E-Mail :akos.gabriel@i-logic.hu|Web:  http://www.i-logic.hu=-
-=Tel/fax:+3612367353            |Mobil:+36209278894            =-

Re: count(*) performance

From
Brendan Duddridge
Date:
Does that mean that even though autovacuum is turned on, you still
should do a regular vacuum analyze periodically?

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 |  brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Mar 27, 2006, at 11:14 AM, Luke Lonergan wrote:

> Gabriel,
>
> On 3/27/06 10:05 AM, "Gábriel Ákos" <akos.gabriel@i-logic.hu> wrote:
>
>> That gave me an idea. I thought that autovacuum is doing it right,
>> but I
>> issued a vacuum full analyze verbose , and it worked all the day.
>> After that I've tweaked memory settings a bit too (more fsm_pages)
>
> Oops! I replied to your disk speed before I saw this.
>
> The only thing is - you probably don't want to do a "vacuum full", but
> rather a simple "vacuum" should be enough.
>
> - Luke
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


Attachment

Re: count(*) performance

From
"Matthew T. O'Connor"
Date:
Gábriel Ákos wrote:
> Luke Lonergan wrote:
>> Gabriel,
>>
>> On 3/27/06 10:05 AM, "Gábriel Ákos" <akos.gabriel@i-logic.hu> wrote:
>>
>>> That gave me an idea. I thought that autovacuum is doing it right, but I
>>> issued a vacuum full analyze verbose , and it worked all the day.
>>> After that I've tweaked memory settings a bit too (more fsm_pages)
>>
>> Oops! I replied to your disk speed before I saw this.
>>
>> The only thing is - you probably don't want to do a "vacuum full", but
>> rather a simple "vacuum" should be enough.
>
> I thought that too. Autovacuum is running on our system but it didn't do
> the trick. Anyway the issue is solved, thank you all for helping. :)

Yeah, it would be nice of autovacuum had some way of raising a flag to
the admin that given current settings (thresholds, FSM etc...), it's not
keeping up with the activity.  I don't know how to do this, but I hope
someone else has some good ideas.

Matt


Re: count(*) performance

From
Alvaro Herrera
Date:
Brendan Duddridge wrote:
> Does that mean that even though autovacuum is turned on, you still
> should do a regular vacuum analyze periodically?

No, it probably means you have set FSM settings too low, or not tuned
the autovacuum parameters to your specific situation.

A bug in the autovacuum daemon is not unexpected however, so if it
doesn't work after tuning, let us know.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: count(*) performance

From
Guido Neitzer
Date:
On 27.03.2006, at 21:20 Uhr, Brendan Duddridge wrote:

> Does that mean that even though autovacuum is turned on, you still
> should do a regular vacuum analyze periodically?

It seems that there are situations where autovacuum does not a really
good job.

However, in our application I have made stupid design decision which
I want to change as soon as possible. I have a "visit count" column
in one of the very large tables, so updates are VERY regular. I've
just checked and saw that autovacuum does a great job with that.

Nevertheless I have set up a cron job to do a standard vacuum every
month. I've used vacuum full only once after I did a bulk update of
about 200.000 rows ...

cug

--
PharmaLine, Essen, GERMANY
Software and Database Development



Attachment

Re: count(*) performance

From
"Mikael Carneholm"
Date:
This is where a "last_vacuumed" (and "last_analyzed") column in
pg_statistic(?) would come in handy. Each time vacuum or analyze has
finished, update the row for the specific table that was
vacuumed/analyzed with a timestamp in the last_vacuumed/last_analyzed
column. No more guessing "maybe I haven't vacuumed/analyzed in a while",
and each time a user complains about bad performance, one could request
the user to do a "select s.last_vacuumed, s.last_analyzed from
pg_statistic s, pg_attribute a, pg_class c where ..."

It SOUNDS easy to implement, but that has fooled me before... :-)

- Mikael

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org]On Behalf Of Guido
Neitzer
Sent: den 27 mars 2006 21:44
To: Brendan Duddridge
Cc: Postgresql Performance
Subject: Re: [PERFORM] count(*) performance


On 27.03.2006, at 21:20 Uhr, Brendan Duddridge wrote:

> Does that mean that even though autovacuum is turned on, you still
> should do a regular vacuum analyze periodically?

It seems that there are situations where autovacuum does not a really
good job.

However, in our application I have made stupid design decision which
I want to change as soon as possible. I have a "visit count" column
in one of the very large tables, so updates are VERY regular. I've
just checked and saw that autovacuum does a great job with that.

Nevertheless I have set up a cron job to do a standard vacuum every
month. I've used vacuum full only once after I did a bulk update of
about 200.000 rows ...

cug

--
PharmaLine, Essen, GERMANY
Software and Database Development



Re: count(*) performance

From
"Matthew T. O'Connor"
Date:
Mikael Carneholm wrote:
> This is where a "last_vacuumed" (and "last_analyzed") column in
> pg_statistic(?) would come in handy. Each time vacuum or analyze has
> finished, update the row for the specific table that was
> vacuumed/analyzed with a timestamp in the last_vacuumed/last_analyzed
> column. No more guessing "maybe I haven't vacuumed/analyzed in a while",
> and each time a user complains about bad performance, one could request
> the user to do a "select s.last_vacuumed, s.last_analyzed from
> pg_statistic s, pg_attribute a, pg_class c where ..."
>
> It SOUNDS easy to implement, but that has fooled me before... :-)


It is fairly easy to implement, however it has been discussed before and
decided that it wasn't necessary.  What the system cares about is how
long it's been since the last vacuum in terms of XIDs not time.  Storing
a timestamp would make it more human readable, but I'm not sure the
powers that be want to add two new columns to some system table to
accommodate this.

Matt

Re: count(*) performance

From
Tom Lane
Date:
"Matthew T. O'Connor" <matthew@zeut.net> writes:
> It is fairly easy to implement, however it has been discussed before and
> decided that it wasn't necessary.  What the system cares about is how
> long it's been since the last vacuum in terms of XIDs not time.

I think Alvaro is intending to do the latter (store per-table vacuum xid
info) for 8.2.

            regards, tom lane

Re: count(*) performance

From
"Mikael Carneholm"
Date:
I think it is definitely necessary from an administration point of view - as an administrator, I want to know:

1) Are there any stats (at all) in a schema
2) Are there any stats on the table that slow_query_foo is targeting
3) If I have stats, how recent are they
4) Could it be that there are a lot of dead tuples lying around (given the amount of traffic I know I have)

These would be (are always!) the first questions I ask myself when I'm about to identify performance problems in an
app,don't know how other people do though :) 

Maybe something I'll try to look into this weekend, if I can spare some time.

- Mikael


-----Original Message-----
From: Matthew T. O'Connor [mailto:matthew@zeut.net]
Sent: den 28 mars 2006 00:43
To: Mikael Carneholm
Cc: Postgresql Performance
Subject: Re: [PERFORM] count(*) performance


Mikael Carneholm wrote:
> This is where a "last_vacuumed" (and "last_analyzed") column in
> pg_statistic(?) would come in handy. Each time vacuum or analyze has
> finished, update the row for the specific table that was
> vacuumed/analyzed with a timestamp in the last_vacuumed/last_analyzed
> column. No more guessing "maybe I haven't vacuumed/analyzed in a while",
> and each time a user complains about bad performance, one could request
> the user to do a "select s.last_vacuumed, s.last_analyzed from
> pg_statistic s, pg_attribute a, pg_class c where ..."
>
> It SOUNDS easy to implement, but that has fooled me before... :-)


It is fairly easy to implement, however it has been discussed before and
decided that it wasn't necessary.  What the system cares about is how
long it's been since the last vacuum in terms of XIDs not time.  Storing
a timestamp would make it more human readable, but I'm not sure the
powers that be want to add two new columns to some system table to
accommodate this.

Matt

Re: count(*) performance

From
Markus Schaber
Date:
Gábriel Ákos wrote:

> I thought that too. Autovacuum is running on our system but it didn't do
> the trick. Anyway the issue is solved, thank you all for helping. :)

Hi, Gabriel, it may be that your Free Space Map (FSM) setting is way to
low.

Try increasing it.

Btw, VACUUM outputs a Warning if FSM is not high enough, maybe you can
find useful hints in the log file.

HTH
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org


Re: count(*) performance

From
"Jim C. Nasby"
Date:
On Mon, Mar 27, 2006 at 12:20:54PM -0700, Brendan Duddridge wrote:
> Does that mean that even though autovacuum is turned on, you still
> should do a regular vacuum analyze periodically?

Doing a periodic vacuumdb -avz and keeping an eye on the last few lines
isn't a bad idea. It would also be helpful if there was a log parser
that could take a look at the output of a vacuumdb -av and look for any
problem areas, such as relations that have a lot of free space in them.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461