Re: Speedier count(*) - Mailing list pgsql-performance

From John A Meinel
Subject Re: Speedier count(*)
Date
Msg-id 42FAA4E6.5070703@arbash-meinel.com
Whole thread Raw
In response to Speedier count(*)  (Dan Harris <fbsd@drivefaster.net>)
List pgsql-performance
Dan Harris wrote:
> I have a web page for my customers that shows them count of records  and
> some min/max date ranges in each table of a database, as this is  how we
> bill them for service.  They can log in and check the counts  at any
> time.  I'd like for the counts to be as fresh as possible by  keeping
> this dynamic, but I will use a periodic 'snapshot'/cron job  if that is
> the only option to speed this up.   I have thought about  using the
> table statistics, but the estimate error is probably  unacceptable
> because of the billing purposes.
>
> For some reason, the SQL Server we migrated the app from can return
> count(*) in a split second on multi-million row tables, even though  it
> is a MUCH slower box hardware-wise, but it's now taking many  seconds to
> run. I have read in the archives the problems MVCC brings  into the
> count(*) dilemma forcing Pg to run a seq scan to get  counts.  Does
> SQLServer not use MVCC or have they found another  approach for arriving
> at this number?  Compounding all the min/max  and counts from other
> tables and all those queries take about a  minute to run. The tables
> will contain anywhere from 1 million to 40  million rows.

I believe SQL Server doesn't use MVCC in the same way. At the very
least, it stores some row information in the index, so it can get some
info from just an index, without having to go to the actual page (MVCC
requires a main page visit to determine visibility.)

Depending on how much it impacts performance, you can create an
INSERT/UPDATE trigger so that whenever a new entry is added, it
automatically updates a statistics table. It would be maintained as you
go, rather than periodically like a cron job.

I would go Cron if things can be slightly out of date (like 1 hour at
least), and you need updates & inserts to not be slowed down.
Otherwise I think the trigger is nicer, since it doesn't do redundant
work, and means everything stays up-to-date.


>
> Also, I am using "select ... group by ... order by .. limit 1" to get
> the min/max since I have already been bit by the issue of min() max()
> being slower.
>
>
> -Dan

John
=:->

Attachment

pgsql-performance by date:

Previous
From: Michael Fuhr
Date:
Subject: Re: Speedier count(*)
Next
From: Tom Lane
Date:
Subject: Re: partial index regarded more expensive