Home > mailing lists

Re: Improving count(*) - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: Improving count(*)
Date	November 17, 2005 20:08:05
Msg-id	1132272483.4959.299.camel@localhost.localdomain Whole thread
In response to	Re: Improving count(*) ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses	Re: Improving count() Re: Improving count() Re: Improving count(*)
List	pgsql-hackers

Tree view

On Thu, 2005-11-17 at 16:30 -0600, Kevin Grittner wrote:
> In Sybase ASE (and I'm pretty sure the same is true in Microsoft SQL
> Server) the leaf level of the narrowest index on the table is scanned,
> following a linked list of leaf pages.  Leaf pages can be pretty dense
> under Sybase, because they do use prefix compression.  A count(*)
> on a table with 100 million rows is going to take a few minutes, but it
> is going to be at least an order of magnitude faster than a data page
> scan -- maybe two orders of magnitude faster.
> 
> What I don't understand is why people need to do such things so
> frequently that it's a major issue, rather than an occassional
> annoyance.  

Agreed, completely. (And it galls me to agree with multiple, potentially
opposed opinions on my own thread).

The trouble is, people moan and constantly. Perhaps we should stick to
our guns and say, why do you care? From here, I think we should say,
"show me an application package that needs this so badly we'll change
PostgreSQL just for them". Prove it and we'll do it. Kinda polite in the
TODO, but I think we should put something in there that says "things we
haven't yet had any good reason to improve".

> A solution which not only helped the count(*) issue
> but also allowed index scans to skip the trip to the data page to
> see if it's an active version seems like it would boost performance
> overall.  As pointed out elsewhere, it could also allow new
> techniques for vacuum which could be beneficial.
> 
> My view is that when tables get so big that a count(*) takes that
> much time, you don't typiclally need an EXACT count anyway --
> you could normally check the statistics from your nightly analyze.

Amen.

>From here, another proposal. We have a GUC called count_uses_estimate
that is set to off by default. If set to true, then a count(*) will use
the planner logic to estimate number of rows in the table and return
that as the answer, rather than actually count the row. Unless analyze
statistics are not available, in which case it does the real count.

Best Regards, Simon Riggs

pgsql-hackers by date:

From: Joe Conway
Date: 17 November 2005, 20:03:43
Subject: Re: Some array semantics issues

From: Tom Lane
Date: 17 November 2005, 20:18:12
Subject: Re: Some array semantics issues

Re: Improving count(*) - Mailing list pgsql-hackers

Previous

Next