Re: COUNT(*) and index-only scans - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: COUNT(*) and index-only scans |
Date | |
Msg-id | CA+Tgmoa_b=PnmF-qcNx03ufFTjVRGMuts9TTJKJh-P-P2kK9GQ@mail.gmail.com Whole thread Raw |
In response to | Re: COUNT(*) and index-only scans (Thom Brown <thom@linux.com>) |
Responses |
Re: COUNT(*) and index-only scans
|
List | pgsql-hackers |
On Sat, Nov 19, 2011 at 11:22 AM, Thom Brown <thom@linux.com> wrote: > While I accept that maybe adapting the existing bitmap index scan > functionality isn't necessarily desirable, would it be feasible to > create a corresponding bitmap index-only scan method. I've been thinking about this a bit more; I wonder whether we could create yet another type of index scan - let's call it a Streaming Index Only Scan. A streaming index only scan uses a buffer, which holds index tuples and the corresponding CTIDs, and it has some maximum size, probably based on work_mem. Tuples in the buffer are kept sorted by CTID. The scan happens in two alternating phases: buffer fill, and buffer drain. In buffer fill mode, we scan the index and add matching tuples and their CTIDs to the buffer. When the buffer is full or the index AM reports that there are no more tuples in the scan, we switch to buffer drain mode. In buffer drain mode, we repeatedly select a heap block number and attempt to return all buffered tuples on that page. We maintain a counter, LastBlockNumber, initially zero, which tracks the last heap block number so selected. To select the next block number, we choose the first block number greater than or equal to LastBlockNumber referenced by any CTID in the buffer (it should be possible to design the buffer so that this value can be computed quickly); if there are none, we instead choose the first block number referenced by any CTID in the buffer, period. Having selected the block number, we check whether the page is all-visible. If so, we can return all the index tuples from that page without further ado. Otherwise, we fetch the heap block, check visibility for each tuple, and return only those index tuples for which the corresponding heap tuples are visible to the scan. If there's now enough buffer space available to be certain that the next index tuple will fit, we switch back to buffer fill mode; otherwise, we remain in buffer drain mode. As compared with a bitmap index scan, this doesn't have the advantage of being able to combine multiple indexes effectively; I don't really see any way to make that work with the index-only scan concept in general, except perhaps for the special case of a zero-argument aggregate. It also doesn't have the advantage that each heap page will be guaranteed to be visited only once. But in practice duplicate visits to the same page should be uncommon; they'll be avoided when either work_mem is sufficient to buffer the whole scan, or when there's some degree of table clustering with respect to the index. While I'm building castles in the sky, a further idea would be to try to optimize the case where there's a LIMIT node above the scan. If we could pass down a hint indicating how many rows are thought likely to be needed, we could enter buffer drain mode after about that many tuples, instead of waiting until the buffer was full. If the hint is right, we win (and if it's wrong, we can still go back and fetch some more tuples, at a cost of possible performance loss). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: