Thread: Re: [PERFORM] Slow count(*) again...

Re: [PERFORM] Slow count(*) again...

From
Bruce Momjian
Date:
Tom Lane wrote:
> At this point what we've got is 25% of the runtime in nodeAgg.c overhead,
> and it's difficult to see how to get any real improvement without tackling
> that.  Rather than apply the patch shown above, I'm tempted to think about
> hard-wiring COUNT(*) as a special case in nodeAgg.c such that we don't go
> through advance_aggregates/advance_transition_function at all, but just
> increment a counter directly.  However, that would very clearly be
> optimizing COUNT(*) and nothing else.  Given the opinions expressed
> elsewhere in this thread that heavy reliance on COUNT(*) represents
> bad application design, I'm not sure that such a patch would meet with
> general approval.
> 
> Actually the patch shown above is optimizing COUNT(*) and nothing else,
> too, since it's hard to conceive of any other zero-argument aggregate.
> 
> Anyway, if anyone is hot to make COUNT(*) faster, that's where to look.
> I don't think any of the previous discussion in this thread is on-point
> at all, except for the parts where people suggested avoiding it.

Do we want a TODO about optimizing COUNT(*) to avoid aggregate
processing overhead?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: [PERFORM] Slow count(*) again...

From
Andrew Dunstan
Date:

On 02/01/2011 05:47 PM, Bruce Momjian wrote:
> Tom Lane wrote:
>> At this point what we've got is 25% of the runtime in nodeAgg.c overhead,
>> and it's difficult to see how to get any real improvement without tackling
>> that.  Rather than apply the patch shown above, I'm tempted to think about
>> hard-wiring COUNT(*) as a special case in nodeAgg.c such that we don't go
>> through advance_aggregates/advance_transition_function at all, but just
>> increment a counter directly.  However, that would very clearly be
>> optimizing COUNT(*) and nothing else.  Given the opinions expressed
>> elsewhere in this thread that heavy reliance on COUNT(*) represents
>> bad application design, I'm not sure that such a patch would meet with
>> general approval.
>>
>> Actually the patch shown above is optimizing COUNT(*) and nothing else,
>> too, since it's hard to conceive of any other zero-argument aggregate.
>>
>> Anyway, if anyone is hot to make COUNT(*) faster, that's where to look.
>> I don't think any of the previous discussion in this thread is on-point
>> at all, except for the parts where people suggested avoiding it.
> Do we want a TODO about optimizing COUNT(*) to avoid aggregate
> processing overhead?

Whether or not it's bad application design, it's ubiquitous, and we 
should make it work as best we can, IMNSHO. This often generates 
complaints about Postgres, and if we really plan for world domination 
this needs to be part of it.

cheers

andrew


Re: [PERFORM] Slow count(*) again...

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 02/01/2011 05:47 PM, Bruce Momjian wrote:
>> Tom Lane wrote:
>>> At this point what we've got is 25% of the runtime in nodeAgg.c overhead,
>>> and it's difficult to see how to get any real improvement without tackling
>>> that.

>> Do we want a TODO about optimizing COUNT(*) to avoid aggregate
>> processing overhead?

> Whether or not it's bad application design, it's ubiquitous, and we 
> should make it work as best we can, IMNSHO. This often generates 
> complaints about Postgres, and if we really plan for world domination 
> this needs to be part of it.

I don't think that saving ~25% on COUNT(*) runtime will help that at all.
The people who complain about it expect it to be instantaneous.

If this sort of hack were free, I'd be all for doing it anyway; but I'm
concerned that adding tests to enable a fast path will slow down every
other aggregate, or else duplicate a lot of code that we'll then have to
maintain.
        regards, tom lane


Re: [PERFORM] Slow count(*) again...

From
Mladen Gogala
Date:
On 2/1/2011 5:47 PM, Bruce Momjian wrote:
> Do we want a TODO about optimizing COUNT(*) to avoid aggregate
> processing overhead?
>

Definitely not.  In my opinion, and I've seen more than a few database 
designs, having count(*) is almost always an error.
If I am counting a large table like the one below,  waiting for 30 
seconds more is not going to make much of a difference.
To paraphrase Kenny Rogers, it will be time enough for counting when the 
application is done.

Timing is on.
news=> select count(*) from moreover_documents_y2011m01;  count
---------- 20350907
(1 row)

Time: 124142.437 ms
news=>

-- 

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions





Re: [PERFORM] Slow count(*) again...

From
Bruce Momjian
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > On 02/01/2011 05:47 PM, Bruce Momjian wrote:
> >> Tom Lane wrote:
> >>> At this point what we've got is 25% of the runtime in nodeAgg.c overhead,
> >>> and it's difficult to see how to get any real improvement without tackling
> >>> that.
> 
> >> Do we want a TODO about optimizing COUNT(*) to avoid aggregate
> >> processing overhead?
> 
> > Whether or not it's bad application design, it's ubiquitous, and we 
> > should make it work as best we can, IMNSHO. This often generates 
> > complaints about Postgres, and if we really plan for world domination 
> > this needs to be part of it.
> 
> I don't think that saving ~25% on COUNT(*) runtime will help that at all.
> The people who complain about it expect it to be instantaneous.
> 
> If this sort of hack were free, I'd be all for doing it anyway; but I'm
> concerned that adding tests to enable a fast path will slow down every
> other aggregate, or else duplicate a lot of code that we'll then have to
> maintain.

OK, thank you.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +