Thread: Good News re count(*) in 8.1

Good News re count(*) in 8.1

From
"Kevin Grittner"
Date:
I hesitate to raise this issue again, but I've noticed something which I
thought might be worth mentioning.  I've never thought the performance
of count(*) on a table was a significant issue, but I'm prepared to say
that -- for me, at least -- it is officially and totally a NON-issue.

We are replicating data from 72 source databases, each with the
official copy of a subset of the data, to four identical consolidated
databases, spread to separate locations, to serve our web site and other
organization-wide needs.  Currently, two of these central databases are
running a commercial product and two are running PostgreSQL.  There have
been several times that I have run a SELECT COUNT(*) on an entire table
on all central machines.  On identical hardware, with identical data,
and equivalent query loads, the PostgreSQL databases have responded with
a count in 50% to 70% of the time of the commercial product, in spite of
the fact that the commercial product does a scan of a non-clustered
index while PostgreSQL scans the data pages.

The tables have had from a few million to 132 million rows.  The
databases are about 415 GB each.  The servers have 6 GB RAM each.  We've
been running PostgreSQL 8.1, tuned and maintained based on advice from
the documentation and these lists.

I suspect that where people report significantly worse performance for
count(*) under PostgreSQL than some other product, it may sometimes be
the case that they have not properly tuned PostgreSQL, or paid attention
to maintenance issues regarding dead space in the tables.

My recent experience, for what it's worth.

-Kevin


Re: Good News re count(*) in 8.1

From
"Luke Lonergan"
Date:
Kevin,

On 2/22/06 8:57 AM, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

> I hesitate to raise this issue again, but I've noticed something which I
> thought might be worth mentioning.  I've never thought the performance
> of count(*) on a table was a significant issue, but I'm prepared to say
> that -- for me, at least -- it is officially and totally a NON-issue.

Cool!  Kudos to Tom for implementing the improvements in the executor to
move tuples faster through the pipeline.

We see a CPU limit (yes, another limit) of about 300MB/s now on Opteron 250
processors running on Linux.  The filesystem can do 420MB/s sequential scan
in 8k pages, but Postgres count(*) on 8.1.3 can only do about 300MB/s.  This
is still a very large improvement over past versions, but we'd always like
to see more...

- Luke



Re: Good News re count(*) in 8.1

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> We are replicating data from 72 source databases, each with the
> official copy of a subset of the data, to four identical consolidated
> databases, spread to separate locations, to serve our web site and other
> organization-wide needs.  Currently, two of these central databases are
> running a commercial product and two are running PostgreSQL.  There have
> been several times that I have run a SELECT COUNT(*) on an entire table
> on all central machines.  On identical hardware, with identical data,
> and equivalent query loads, the PostgreSQL databases have responded with
> a count in 50% to 70% of the time of the commercial product, in spite of
> the fact that the commercial product does a scan of a non-clustered
> index while PostgreSQL scans the data pages.

Interesting.  I think though that the people who are complaining come
from databases where COUNT(*) takes constant time because the DB keeps
a running count in the table's metadata.

            regards, tom lane

Re: Good News re count(*) in 8.1

From
Greg Stark
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

> There have been several times that I have run a SELECT COUNT(*) on an entire
> table on all central machines. On identical hardware, with identical data,
> and equivalent query loads, the PostgreSQL databases have responded with a
> count in 50% to 70% of the time of the commercial product, in spite of the
> fact that the commercial product does a scan of a non-clustered index while
> PostgreSQL scans the data pages.

I take it these are fairly narrow rows? The big benefit of index-only scans
come in when you're scanning extremely wide tables, often counting rows
matching some indexed criteria.

--
greg

Re: Good News re count(*) in 8.1

From
"Kevin Grittner"
Date:
>>> On Wed, Feb 22, 2006 at  9:52 pm, in message
<87irr6zq7j.fsf@stark.xeocode.com>, Greg Stark <gsstark@mit.edu> wrote:


> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>
>> There have been several times that I have run a SELECT COUNT(*) on
an entire
>> table on all central machines. On identical hardware, with identical
data,
>> and equivalent query loads, the PostgreSQL databases have responded
with a
>> count in 50% to 70% of the time of the commercial product, in spite
of the
>> fact that the commercial product does a scan of a non- clustered
index while
>> PostgreSQL scans the data pages.
>
> I take it these are fairly narrow rows? The big benefit of index-
only scans
> come in when you're scanning extremely wide tables, often counting
rows
> matching some indexed criteria.

I'm not sure what you would consider "fairly narrow rows" -- so see the
attached.  This is the VACUUM ANALYZE VERBOSE output for the largest
table, from last night's regular maintenance run.

-Kevin



Attachment

Re: Good News re count(*) in 8.1

From
"Jim C. Nasby"
Date:
On Thu, Feb 23, 2006 at 12:54:52PM -0600, Kevin Grittner wrote:
> >>> On Wed, Feb 22, 2006 at  9:52 pm, in message
> <87irr6zq7j.fsf@stark.xeocode.com>, Greg Stark <gsstark@mit.edu> wrote:
>
>
> > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> >
> >> There have been several times that I have run a SELECT COUNT(*) on
> an entire
> >> table on all central machines. On identical hardware, with identical
> data,
> >> and equivalent query loads, the PostgreSQL databases have responded
> with a
> >> count in 50% to 70% of the time of the commercial product, in spite
> of the
> >> fact that the commercial product does a scan of a non- clustered
> index while
> >> PostgreSQL scans the data pages.
> >
> > I take it these are fairly narrow rows? The big benefit of index-
> only scans
> > come in when you're scanning extremely wide tables, often counting
> rows
> > matching some indexed criteria.
>
> I'm not sure what you would consider "fairly narrow rows" -- so see the
> attached.  This is the VACUUM ANALYZE VERBOSE output for the largest
> table, from last night's regular maintenance run.

Looks to be about 60 rows per page, somewhere around 140 bytes per row
(including overhead). Accounting for overhead and allowing for some
empty room, about 100 bytes of data per row, which isn't all that thin.
Not all that fat, either... The PK index is about 5 times smaller. IF
that ratio holds on the commercial product and they can't beat us with
an index scan.... :)
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461