Re: Improving count(*) - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Improving count(*)
Date
Msg-id 1132263248.4959.267.camel@localhost.localdomain
Whole thread Raw
In response to Re: Improving count(*)  ("Jonah H. Harris" <jonah.harris@gmail.com>)
Responses Re: Improving count(*)  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
On Thu, 2005-11-17 at 14:46 -0500, Jonah H. Harris wrote:
> Nice suggestion, I think it's workable but (like all other methods)
> has some technical/pseudo-political challenges.
>  
> I'm still voting for my old, "Much Ado About COUNT(*)" topic; adding
> visibiility to the indexes and counting them like the other RDBMS
> vendors.  True, it would add storage overhead that several people
> don't want, but such as the life of the COUNT(*) discussion for
> PostgreSQL...

[When no idea is good, we take the least-worst path. When we only have
one bad idea, nobody does anything. So we need a couple of ugly but
workable alternatives to flush out the one we will pick to resolve
things.]

As Martijn points out *any* solution must take account of visibility
rules. So abstracting from both these ideas gives shape to the solution,
which must be:
- a data structure smaller than the table itself
- including visibility data, explicitly/implicitly, exactly/lossily
- must serve multiple purposes to ensure the overhead of maintaining the
structure is amortised across many potential benefits, since various
needs share similar solution requirements

Would having visibility on an index be of use to a VACUUM?
Yes, I guess it could be. If we knew when the table was last vacuumed,
we could build a bitmap of changed blocks by scanning the index. We
would only need visibility on *one* of the indexes on a table, so
perhaps index visibility could be an option rather than a
one-size-fits-all. 

Adding visibility to an index would add substantial bulk to any index.
If we could do this at the same time as adding leading key, full field
compression (*not* prefix compression), then it might be worth doing. 

I would also note that DELETE would need to touch all visible index
rows, which currently is not required for btree indexes. (But as we
already noted, any solution must include visibility data and so any
solution must update some data structure on delete).

Index-only plans could help with various GROUP BY and join queries also,
so it certainly is attractive, though costly.

Best Regards, Simon Riggs



pgsql-hackers by date:

Previous
From: Martijn van Oosterhout
Date:
Subject: Re: Improving count(*)
Next
From: Tom Lane
Date:
Subject: Re: Improving count(*)