Thread: [RFC] speed up count(*)

[RFC] speed up count(*)

From
John Naylor
Date:
Hi,

Perennially our users have complaints about slow count(*) when coming from some other systems. Index-only scans help, but I think we can do better. I recently wondered if a BRIN index could be used to answer min/max aggregate queries over the whole table, and it turns out it doesn't. However, then it occurred to me that if we had an opclass that keeps track of the count in each page range, that would be a way to do a fast count(*) by creating the right index. That would require planner support and other work, but it seems doable. Any opinions on whether this is worth the effort?

--

Re: [RFC] speed up count(*)

From
Tom Lane
Date:
John Naylor <john.naylor@enterprisedb.com> writes:
> Perennially our users have complaints about slow count(*) when coming from
> some other systems. Index-only scans help, but I think we can do better. I
> recently wondered if a BRIN index could be used to answer min/max aggregate
> queries over the whole table, and it turns out it doesn't. However, then it
> occurred to me that if we had an opclass that keeps track of the count in
> each page range, that would be a way to do a fast count(*) by creating the
> right index. That would require planner support and other work, but it
> seems doable. Any opinions on whether this is worth the effort?

The core reason why this is hard is that we insist on giving the right
answer.  In particular, count(*) is supposed to count the rows that
satisfy the asker's snapshot.  So I don't see a good way to answer it
from an index only, given that we don't track visibility accurately
in indexes.

            regards, tom lane



Re: [RFC] speed up count(*)

From
Andres Freund
Date:
Hi,

On October 20, 2021 10:57:50 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>John Naylor <john.naylor@enterprisedb.com> writes:
>> Perennially our users have complaints about slow count(*) when coming from
>> some other systems. Index-only scans help, but I think we can do better. I
>> recently wondered if a BRIN index could be used to answer min/max aggregate
>> queries over the whole table, and it turns out it doesn't. However, then it
>> occurred to me that if we had an opclass that keeps track of the count in
>> each page range, that would be a way to do a fast count(*) by creating the
>> right index. That would require planner support and other work, but it
>> seems doable. Any opinions on whether this is worth the effort?
>
>The core reason why this is hard is that we insist on giving the right
>answer.  In particular, count(*) is supposed to count the rows that
>satisfy the asker's snapshot.  So I don't see a good way to answer it
>from an index only, given that we don't track visibility accurately
>in indexes.

Yeah.

If we really wanted to, we could accelerate unqualified count(*) substantially by computing the count inside heapam.
There'sa *lot* of overhead associated with returning tuples, grouping them, etc. Especially with all_visible set that's
boundto be way faster (I'd guess are least 3-5x) if done in heapam (like we do the visibility determinations in
heapgetpagefor all tuples on a page at once). 


But it's doubtful the necessary infrastructure is worth it. Perhaps that changes with the infrastructure some columnar
AMsare asking for. They have a need to push more stuff down to the AM that's more generic than just count(*). 

Regards,

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [RFC] speed up count(*)

From
Tomas Vondra
Date:
On 10/20/21 19:57, Tom Lane wrote:
> John Naylor <john.naylor@enterprisedb.com> writes:
>> Perennially our users have complaints about slow count(*) when coming from
>> some other systems. Index-only scans help, but I think we can do better. I
>> recently wondered if a BRIN index could be used to answer min/max aggregate
>> queries over the whole table, and it turns out it doesn't. However, then it
>> occurred to me that if we had an opclass that keeps track of the count in
>> each page range, that would be a way to do a fast count(*) by creating the
>> right index. That would require planner support and other work, but it
>> seems doable. Any opinions on whether this is worth the effort?
> 
> The core reason why this is hard is that we insist on giving the right
> answer.  In particular, count(*) is supposed to count the rows that
> satisfy the asker's snapshot.  So I don't see a good way to answer it
> from an index only, given that we don't track visibility accurately
> in indexes.
> 

Couldn't we simply inspect the visibility map, use the index data only 
for fully visible/summarized ranges, and inspect the heap for the 
remaining pages? That'd still be a huge improvement for tables with most 
only a few pages modified recently, which is a pretty common case.

I think the bigger issue is that people rarely do COUNT(*) on the whole 
table. There are usually other conditions and/or GROUP BY, and I'm not 
sure how would that work.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [RFC] speed up count(*)

From
John Naylor
Date:

On Wed, Oct 20, 2021 at 2:23 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> Couldn't we simply inspect the visibility map, use the index data only
> for fully visible/summarized ranges, and inspect the heap for the
> remaining pages? That'd still be a huge improvement for tables with most
> only a few pages modified recently, which is a pretty common case.
>
> I think the bigger issue is that people rarely do COUNT(*) on the whole
> table. There are usually other conditions and/or GROUP BY, and I'm not
> sure how would that work.

Right. My (possibly hazy) recollection is that people don't have quite as high an expectation for queries with more complex predicates and/or grouping. It would be interesting to see what the balance is.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: [RFC] speed up count(*)

From
Tomas Vondra
Date:

On 10/20/21 20:33, John Naylor wrote:
> 
> On Wed, Oct 20, 2021 at 2:23 PM Tomas Vondra 
> <tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>> 
> wrote:
>  >
>  > Couldn't we simply inspect the visibility map, use the index data only
>  > for fully visible/summarized ranges, and inspect the heap for the
>  > remaining pages? That'd still be a huge improvement for tables with most
>  > only a few pages modified recently, which is a pretty common case.
>  >
>  > I think the bigger issue is that people rarely do COUNT(*) on the whole
>  > table. There are usually other conditions and/or GROUP BY, and I'm not
>  > sure how would that work.
> 
> Right. My (possibly hazy) recollection is that people don't have quite 
> as high an expectation for queries with more complex predicates and/or 
> grouping. It would be interesting to see what the balance is.
> 

I don't know where the balance is, and I doubt it's possible to answer 
that in general - I'm sure some workloads might benefit significantly.

I wonder if multi-column BRIN indexes would help in cases with 
additional predicates. Seems possible.

BTW you mentioned using BRIN indexes for min/max - I've been thinking 
about using BRIN indexes for ordering/sorting, which seems related. And 
I think it's actually doable, so I wonder why you concluded using BRIN 
indexes for min/max is not possible?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [RFC] speed up count(*)

From
John Naylor
Date:

On Wed, Oct 20, 2021 at 2:41 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> BTW you mentioned using BRIN indexes for min/max - I've been thinking
> about using BRIN indexes for ordering/sorting, which seems related. And
> I think it's actually doable, so I wonder why you concluded using BRIN
> indexes for min/max is not possible?

I just gathered it was not implemented in the planner, going by the explain plan in the toy query I tried, and then I got the lightbulb in my head about count(*).

--
John Naylor
EDB: http://www.enterprisedb.com

Re: [RFC] speed up count(*)

From
Joe Conway
Date:
On 10/20/21 2:33 PM, John Naylor wrote:
> 
> On Wed, Oct 20, 2021 at 2:23 PM Tomas Vondra 
> <tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>> 
> wrote:
>  >
>  > Couldn't we simply inspect the visibility map, use the index data only
>  > for fully visible/summarized ranges, and inspect the heap for the
>  > remaining pages? That'd still be a huge improvement for tables with most
>  > only a few pages modified recently, which is a pretty common case.
>  >
>  > I think the bigger issue is that people rarely do COUNT(*) on the whole
>  > table. There are usually other conditions and/or GROUP BY, and I'm not
>  > sure how would that work.
> 
> Right. My (possibly hazy) recollection is that people don't have quite 
> as high an expectation for queries with more complex predicates and/or 
> grouping. It would be interesting to see what the balance is.

I think you are exactly correct. People seem to understand that with a 
predicate it is harder, but they expect

  select count(*) from foo;

to be nearly instantaneous, and they don't really need it to be exact. 
The stock answer for that has been to do

  select reltuples from pg_class
  where relname = 'foo';

But that is unsatisfying because the problem is often with some 
benchmark or another that cannot be changed.

I'm sure this idea will be shot down in flames <donning flameproof 
suit>, but what if we had a default "off" GUC which could be turned on 
causing the former to be transparently rewritten into the latter 
</donning flameproof suit>?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: [RFC] speed up count(*)

From
Robert Haas
Date:
On Thu, Oct 21, 2021 at 9:09 AM Joe Conway <mail@joeconway.com> wrote:
> I think you are exactly correct. People seem to understand that with a
> predicate it is harder, but they expect
>
>   select count(*) from foo;
>
> to be nearly instantaneous, and they don't really need it to be exact.
> The stock answer for that has been to do
>
>   select reltuples from pg_class
>   where relname = 'foo';
>
> But that is unsatisfying because the problem is often with some
> benchmark or another that cannot be changed.

I would also expect it to almost always give the wrong answer.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [RFC] speed up count(*)

From
Joe Conway
Date:
On 10/21/21 4:06 PM, Robert Haas wrote:
> On Thu, Oct 21, 2021 at 9:09 AM Joe Conway <mail@joeconway.com> wrote:
>> I think you are exactly correct. People seem to understand that with a
>> predicate it is harder, but they expect
>>
>>   select count(*) from foo;
>>
>> to be nearly instantaneous, and they don't really need it to be exact.
>> The stock answer for that has been to do
>>
>>   select reltuples from pg_class
>>   where relname = 'foo';
>>
>> But that is unsatisfying because the problem is often with some
>> benchmark or another that cannot be changed.
> 
> I would also expect it to almost always give the wrong answer.


That is a grossly overstated position. When I have looked, it is often 
not that terribly far off. And for many use cases that I have heard of 
at least, quite adequate.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: [RFC] speed up count(*)

From
Robert Haas
Date:
On Thu, Oct 21, 2021 at 4:19 PM Joe Conway <mail@joeconway.com> wrote:
> That is a grossly overstated position. When I have looked, it is often
> not that terribly far off. And for many use cases that I have heard of
> at least, quite adequate.

I don't think it's grossly overstated. If you need an approximation it
may be good enough, but I don't think it will often be exactly correct
- probably only if the table is small and rarely modified.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [RFC] speed up count(*)

From
Joe Conway
Date:
On 10/21/21 4:23 PM, Robert Haas wrote:
> On Thu, Oct 21, 2021 at 4:19 PM Joe Conway <mail@joeconway.com> wrote:
>> That is a grossly overstated position. When I have looked, it is often
>> not that terribly far off. And for many use cases that I have heard of
>> at least, quite adequate.
> 
> I don't think it's grossly overstated. If you need an approximation it
> may be good enough, but I don't think it will often be exactly correct
> - probably only if the table is small and rarely modified.

meh -- the people who expect this to be impossibly fast don't typically 
need or expect it to be exactly correct, and there is no way to make it 
"exactly correct" in someone's snapshot without doing all the work.

That is why I didn't suggest making it the default. If you flip the 
switch, you would get a very fast approximation. If you care about 
accuracy, you accept it has to be slow.

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: [RFC] speed up count(*)

From
Andrew Dunstan
Date:
On 10/21/21 4:29 PM, Joe Conway wrote:
> On 10/21/21 4:23 PM, Robert Haas wrote:
>> On Thu, Oct 21, 2021 at 4:19 PM Joe Conway <mail@joeconway.com> wrote:
>>> That is a grossly overstated position. When I have looked, it is often
>>> not that terribly far off. And for many use cases that I have heard of
>>> at least, quite adequate.
>>
>> I don't think it's grossly overstated. If you need an approximation it
>> may be good enough, but I don't think it will often be exactly correct
>> - probably only if the table is small and rarely modified.
>
> meh -- the people who expect this to be impossibly fast don't
> typically need or expect it to be exactly correct, and there is no way
> to make it "exactly correct" in someone's snapshot without doing all
> the work.
>
> That is why I didn't suggest making it the default. If you flip the
> switch, you would get a very fast approximation. If you care about
> accuracy, you accept it has to be slow.
>

I don't think we really want a switch for "inaccurate results
acceptable", and I doubt the standard would accept an approximation for
count(*).

But something else that gave a fast approximate answer
("count_estimate(*)"?) would be useful to many. Not portable but still
useful, if someone could come up with a reasonable implementation.


cheers


andrew

 

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: [RFC] speed up count(*)

From
Robert Haas
Date:
On Thu, Oct 21, 2021 at 4:29 PM Joe Conway <mail@joeconway.com> wrote:
> meh -- the people who expect this to be impossibly fast don't typically
> need or expect it to be exactly correct, and there is no way to make it
> "exactly correct" in someone's snapshot without doing all the work.

I think it could actually be WAY faster than it is if, as Andres says,
we had the ability to push the count operation inside the heap AM. I
believe we have a tendency to attribute complaints like this to people
have unreasonable expectations, but here I'm not sure the expectation
is unreasonable. I vaguely recall writing a special-purpose code to
count the number of tuples in relation years ago, and IIRC it was
blazingly fast compared to letting our executor do it.  I agree,
however, that an approximation can be faster still.

> That is why I didn't suggest making it the default. If you flip the
> switch, you would get a very fast approximation. If you care about
> accuracy, you accept it has to be slow.

I'm not really here to take a position on the proposal. It doesn't
excite me, because I have not run across any users in the combination
of circumstances you mention: query can't be changed, exact answer not
actually required, whole table being counted. But I am not here to
call you a liar either. If you run across users in that situation all
the time, then you do.

-- 
Robert Haas
EDB: http://www.enterprisedb.com