Re: Much Ado About COUNT(*) - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Much Ado About COUNT(*)
Date
Msg-id 1105562498.2886.344.camel@jeff
Whole thread Raw
In response to Re: Much Ado About COUNT(*)  (Greg Stark <gsstark@mit.edu>)
Responses Re: Much Ado About COUNT(*)
List pgsql-hackers

> No, sequential scans require slightly more i/o than index scans. More
> importantly they require random access i/o instead of sequential i/o which is
> much slower.
> 

Just to clear it up, I think what you meant was the index requires
random i/o, not the table. And the word "slightly" depends on the size
of the table I suppose. And of course it also depends on how many tuples
you actually need to retrieve (in this case we're talking about
retrieving all the tuples ragardless).

> Though this depends. If the tuple is very wide then the index might be faster
> to scan since it would only contain the data from the fields being indexed.
> 

That, and it seems strange on the surface to visit every entry in an
index, since normally indexes are used to find only a small fraction of
the tuples.

> This brings to mind another approach. It might be handy to split the heap for
> a table into multiple heaps. The visibility information would only be in one
> of the heaps. This would be a big win if many of the fields were rarely used,
> especially if they're rarely used by sequential scans.

Except then the two heaps would have to be joined somehow for every
operation. It makes sense some times to (if you have a very wide table)
split off the rarely-accessed attributes into a seperate table to be
joined one-to-one when those attributes are needed. To have the system
do that automatically would create problems if the attributes that are
split off are frequently accessed, right?

Perhaps you could optionally create a seperate copy of the same tuple
visibility information linked in a way similar to an index. It still
seems like you gain very little, and only in some very rare situation
that I've never encountered (I've never had the need to do frequent
unqualified count()s at the expense of other operations). 

Now, it seems like it might make a little more sense to use an index for
min()/max(), but that's a different story.

Regards,Jeff Davis




pgsql-hackers by date:

Previous
From: Rod Taylor
Date:
Subject: Re: Much Ado About COUNT(*)
Next
From: "Jonah H. Harris"
Date:
Subject: Re: Much Ado About COUNT(*)