Thread: Good News re count(*) in 8.1
I hesitate to raise this issue again, but I've noticed something which I thought might be worth mentioning. I've never thought the performance of count(*) on a table was a significant issue, but I'm prepared to say that -- for me, at least -- it is officially and totally a NON-issue.

We are replicating data from 72 source databases, each with the official copy of a subset of the data, to four identical consolidated databases, spread across separate locations, to serve our web site and other organization-wide needs. Currently, two of these central databases are running a commercial product and two are running PostgreSQL.

There have been several times that I have run a SELECT COUNT(*) on an entire table on all central machines. On identical hardware, with identical data, and equivalent query loads, the PostgreSQL databases have responded with a count in 50% to 70% of the time of the commercial product, in spite of the fact that the commercial product does a scan of a non-clustered index while PostgreSQL scans the data pages. The tables have had from a few million to 132 million rows. The databases are about 415 GB each. The servers have 6 GB RAM each. We've been running PostgreSQL 8.1, tuned and maintained based on advice from the documentation and these lists.

I suspect that where people report significantly worse performance for count(*) under PostgreSQL than under some other product, it may sometimes be the case that they have not properly tuned PostgreSQL, or paid attention to maintenance issues regarding dead space in the tables.

My recent experience, for what it's worth.

-Kevin
Kevin,

On 2/22/06 8:57 AM, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

> I hesitate to raise this issue again, but I've noticed something which I
> thought might be worth mentioning. I've never thought the performance
> of count(*) on a table was a significant issue, but I'm prepared to say
> that -- for me, at least -- it is officially and totally a NON-issue.

Cool! Kudos to Tom for implementing the improvements in the executor to move tuples faster through the pipeline.

We see a CPU limit (yes, another limit) of about 300MB/s now on Opteron 250 processors running on Linux. The filesystem can do 420MB/s sequential scan in 8k pages, but Postgres count(*) on 8.1.3 can only do about 300MB/s. This is still a very large improvement over past versions, but we'd always like to see more...

- Luke
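For a rough sense of what that CPU ceiling means, here is a back-of-envelope sketch in Python using only the figures quoted above (300MB/s executor throughput, 420MB/s filesystem sequential-scan rate, 8k pages); the derived numbers are arithmetic, not measurements:

```python
# Back-of-envelope: how much of the disk's sequential bandwidth the
# 8.1.3 executor can keep up with, per the figures quoted above.
PAGE_SIZE = 8 * 1024   # 8k pages
FS_RATE_MB = 420       # filesystem sequential-scan rate, MB/s
PG_RATE_MB = 300       # observed count(*) scan rate, MB/s

# Fraction of raw disk bandwidth the CPU-bound executor can use.
cpu_bound_fraction = PG_RATE_MB / FS_RATE_MB

# Heap pages processed per second at the executor's rate.
pages_per_sec = PG_RATE_MB * 1024 * 1024 // PAGE_SIZE

print(f"count(*) runs at {cpu_bound_fraction:.0%} of disk bandwidth")
print(f"~{pages_per_sec:,} pages/s scanned")
```

That is, the scan leaves roughly 30% of the filesystem's sequential bandwidth on the table, which is the gap Luke is pointing at.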
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > We are replicating data from 72 source databases, each with the > official copy of a subset of the data, to four identical consolidated > databases, spread to separate locations, to serve our web site and other > organization-wide needs. Currently, two of these central databases are > running a commercial product and two are running PostgreSQL. There have > been several times that I have run a SELECT COUNT(*) on an entire table > on all central machines. On identical hardware, with identical data, > and equivalent query loads, the PostgreSQL databases have responded with > a count in 50% to 70% of the time of the commercial product, in spite of > the fact that the commercial product does a scan of a non-clustered > index while PostgreSQL scans the data pages. Interesting. I think though that the people who are complaining come from databases where COUNT(*) takes constant time because the DB keeps a running count in the table's metadata. regards, tom lane
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > There have been several times that I have run a SELECT COUNT(*) on an entire > table on all central machines. On identical hardware, with identical data, > and equivalent query loads, the PostgreSQL databases have responded with a > count in 50% to 70% of the time of the commercial product, in spite of the > fact that the commercial product does a scan of a non-clustered index while > PostgreSQL scans the data pages. I take it these are fairly narrow rows? The big benefit of index-only scans come in when you're scanning extremely wide tables, often counting rows matching some indexed criteria. -- greg
>>> On Wed, Feb 22, 2006 at 9:52 pm, in message
<87irr6zq7j.fsf@stark.xeocode.com>, Greg Stark <gsstark@mit.edu> wrote:

> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>
>> There have been several times that I have run a SELECT COUNT(*) on an entire
>> table on all central machines. On identical hardware, with identical data,
>> and equivalent query loads, the PostgreSQL databases have responded with a
>> count in 50% to 70% of the time of the commercial product, in spite of the
>> fact that the commercial product does a scan of a non-clustered index while
>> PostgreSQL scans the data pages.
>
> I take it these are fairly narrow rows? The big benefit of index-only scans
> come in when you're scanning extremely wide tables, often counting rows
> matching some indexed criteria.

I'm not sure what you would consider "fairly narrow rows" -- so see the attached. This is the VACUUM ANALYZE VERBOSE output for the largest table, from last night's regular maintenance run.

-Kevin
Attachment
On Thu, Feb 23, 2006 at 12:54:52PM -0600, Kevin Grittner wrote:
> >>> On Wed, Feb 22, 2006 at 9:52 pm, in message
> <87irr6zq7j.fsf@stark.xeocode.com>, Greg Stark <gsstark@mit.edu> wrote:
> > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> >
> >> There have been several times that I have run a SELECT COUNT(*) on an entire
> >> table on all central machines. On identical hardware, with identical data,
> >> and equivalent query loads, the PostgreSQL databases have responded with a
> >> count in 50% to 70% of the time of the commercial product, in spite of the
> >> fact that the commercial product does a scan of a non-clustered index while
> >> PostgreSQL scans the data pages.
> >
> > I take it these are fairly narrow rows? The big benefit of index-only scans
> > comes in when you're scanning extremely wide tables, often counting rows
> > matching some indexed criteria.
>
> I'm not sure what you would consider "fairly narrow rows" -- so see the
> attached. This is the VACUUM ANALYZE VERBOSE output for the largest
> table, from last night's regular maintenance run.

Looks to be about 60 rows per page, somewhere around 140 bytes per row (including overhead). Accounting for overhead and allowing for some empty room, that's about 100 bytes of data per row, which isn't all that thin. Not all that fat, either... The PK index is about 5 times smaller. IF that ratio holds on the commercial product and they can't beat us with an index scan.... :)

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
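The rows-per-page arithmetic above can be reconstructed as a quick Python sketch. The 24-byte page header, 4-byte item pointer, and 27-byte heap tuple header are assumed values for the 8.1-era on-disk layout, not figures from the attachment; only the 60 rows/page comes from the VACUUM output:

```python
# Rough reconstruction of Jim's rows-per-page estimate.
# Layout constants below are assumed for PostgreSQL 8.1; the only
# measured input is the ~60 rows/page seen in the VACUUM VERBOSE output.
PAGE_SIZE = 8192
PAGE_HEADER = 24     # fixed page header (assumed)
ITEM_POINTER = 4     # per-tuple line pointer (assumed)
TUPLE_HEADER = 27    # per-tuple heap header in 8.1 (assumed)

rows_per_page = 60   # from the VACUUM ANALYZE VERBOSE output

bytes_per_row = (PAGE_SIZE - PAGE_HEADER) / rows_per_page
data_per_row = bytes_per_row - ITEM_POINTER - TUPLE_HEADER

print(f"~{bytes_per_row:.0f} bytes/row including overhead")
print(f"~{data_per_row:.0f} bytes of user data per row")
```

This lands at roughly 136 bytes per row with overhead and roughly 105 bytes of user data, consistent with the "around 140 / about 100" figures above once some free space per page is allowed for.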