Re: Improving count(*) - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Improving count(*)
Date
Msg-id 878xvmr3t8.fsf@stark.xeocode.com
Whole thread Raw
In response to Improving count(*)  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
I think the important thing to keep track of is a single bit: 
 Which the following apply?
 a) all of the tuples in this block are visible
 b) at least one tuple in this block is in-doubt and may need to be vacuumed

That isn't enough to calculate count(*) on its own but it means you could scan
an index like any other database and avoid checking every single tuple. If the
tuple lies in a block that is known not to contain any in-doubt records then
the tuple can be counted immediately.

This has the advantage that it helps with a lot more cases than a simple
"select count(*) from tab". As Tom pointed out that case can be tackled more
directly with a O(1) solution anyways. More complex cases are where fast index
scans are really important.

So you could do "SELECT count(*) FROM tab WHERE a > ?" and have it scan an
index on <a> without having to check the visibility of every single tuple. It
only has to check the visibility of tuples that lie on blocks that contain at
least one in-doubt tuple.

You could even imagine using the new bitmap index scan machinery to combine
these bits of information too.

And this is exactly the same information that vacuum needs. Once vacuum has
run and cleaned out a block it knows whether there are any records that are
still in-doubt or whether every record it left is universally visible. It can
note that and allow future vacuums to skip that block if no deletes or inserts
have changed that bit since.

-- 
greg



pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Improving count(*)
Next
From: "Joshua D. Drake"
Date:
Subject: Bug in predicate indexes?