Re: Much Ado About COUNT(*) - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: Much Ado About COUNT(*)
Msg-id D425483C2C5C9F49B5B7A41F894415470557D3@postal.corporate.connx.com
In response to Much Ado About COUNT(*)  ("Jonah H. Harris" <jharris@tvi.edu>)
List pgsql-hackers
A notion for indices that are not unique... (won't help much on select
count(*) but might be helpful for other types of query optimization)

Put a count in the index for each distinct key value.
In the worst case, the index is actually unique and you have 8 wasted
bytes per index entry and all the entries are in the leaves (perhaps it
could be an OPTION for some tables).  I don't know enough about the
structure of PostgreSQL's indexes to know if my suggestion is pure
hogwash, so don't laugh too hard if it is pure stupidity.
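
To make the idea concrete, here is a minimal sketch of a non-unique index that keeps one entry per distinct key together with a count. It is a toy in-memory model, not PostgreSQL's B-tree: the class name CountingIndex, its methods, and the 8-byte counter size used in the overhead estimate are illustrative assumptions, and the sketch ignores concurrency and transaction visibility entirely.

```python
from collections import Counter


class CountingIndex:
    """Toy non-unique index: one entry per distinct key, plus a row count.

    Hypothetical illustration only; this is not how PostgreSQL stores indexes.
    """

    def __init__(self):
        self._counts = Counter()

    def insert(self, key):
        # A new row with this key bumps the per-key counter.
        self._counts[key] += 1

    def delete(self, key):
        # Removing a row decrements the counter; drop the entry at zero.
        if self._counts[key] <= 0:
            raise KeyError(key)
        self._counts[key] -= 1
        if self._counts[key] == 0:
            del self._counts[key]

    def count_eq(self, key):
        # COUNT(*) ... WHERE col = key, answered from the index alone.
        return self._counts.get(key, 0)

    def count_all(self):
        # Unqualified COUNT(*) still has to visit every distinct entry.
        return sum(self._counts.values())

    def counter_overhead_bytes(self, counter_bytes=8):
        # Assumed 8-byte counter per entry. If every key is unique, all of
        # this space is wasted, because every count is 1.
        return counter_bytes * len(self._counts)


idx = CountingIndex()
for key in ["red", "blue", "red", "green", "red"]:
    idx.insert(key)

print(idx.count_eq("red"))           # rows with key "red"
print(idx.count_all())               # total rows
print(idx.counter_overhead_bytes())  # bytes spent on counters
```

Note that count_all() still walks every distinct entry, which is why this would not help much for an unqualified select count(*) on a column with many distinct values.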

The most practical value of SELECT COUNT(*) is for updating statistics
(and looking good in phony-baloney benchmarks).  But the statistics only
need to be updated when you vacuum, so it hardly seems a crucial issue
to me.

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Wednesday, January 12, 2005 11:42 AM
To: Jonah H. Harris
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Much Ado About COUNT(*)

"Jonah H. Harris" <jharris@tvi.edu> writes:
> My thinking is that we may be able to implement index usage for not
> only unqualified counts, but also on any query that can be satisfied
> by the index itself.

The fundamental problem is that you can't do it without adding at least
16 bytes, probably 20, to the size of an index tuple header.  That would
double the physical size of an index on a simple column (e.g., an integer
or timestamp).  The extra I/O costs and extra maintenance costs are
unattractive to say the least.  And it takes away some of the
justification for the whole thing, which is that reading an index is
much cheaper than reading the main table.  That's only true if the index
is much smaller than the main table ...
        regards, tom lane
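
The size argument above can be checked with some rough arithmetic. The sketch below is only an estimate under stated assumptions that do not come from the message itself: an 8-byte index tuple header, a 4-byte integer key, a 4-byte line pointer per entry, and 4-byte alignment. The function name index_entry_bytes and its parameters are made up for illustration.

```python
def index_entry_bytes(key_bytes, extra_bytes=0, header_bytes=8,
                      line_pointer_bytes=4, align=4):
    """Estimate the on-page size of one index entry.

    All defaults are assumptions for illustration, not measured values.
    """
    raw = header_bytes + extra_bytes + key_bytes
    padded = -(-raw // align) * align  # round up to the alignment boundary
    return padded + line_pointer_bytes


base = index_entry_bytes(key_bytes=4)
for extra in (16, 20):
    grown = index_entry_bytes(key_bytes=4, extra_bytes=extra)
    print(f"+{extra} bytes: {base} -> {grown} bytes per entry "
          f"({grown / base:.2f}x)")
```

Under these assumptions an entry on an integer column grows from 16 bytes to 32 or 36, which is consistent with the "double the physical size" estimate.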

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match

