Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans - Mailing list pgsql-hackers

From Alexander Kuzmenkov
Subject Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans
Date
Msg-id ca192322-29fd-9e60-3766-fe5e69d1f9af@postgrespro.ru
Whole thread Raw
In response to Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans  (Alexander Kumenkov <a.kuzmenkov@postgrespro.ru>)
List pgsql-hackers
On 12.04.2017 15:04, Tom Lane wrote:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>> "Alexander" == Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
>>   Alexander> Structurally, the patch consists of two major parts: a
>>   Alexander> specialized executor node
>> Why?
>> It strikes me that the significant fact here is not that we're doing
>> count(*), but that we don't need any columns from the bitmap heap scan
>> result.  Rather than creating a whole new node, can't the existing
>> bitmap heapscan be taught to skip fetching the actual table page in
>> cases where it's all-visible, not lossy, and no columns are needed?
> +1 ... while I hadn't actually looked at the code, it seemed to me that
> anything like the optimization-as-described would be impossibly klugy
> from the planner's standpoint.  Your formulation sounds lots nicer.
>
> Detecting that no columns are needed in the executor might be a bit tricky
> because of the planner's habit of inserting a "physical tlist" to avoid a
> projection step.  (See also nearby discussion about custom scan planning.)
> But we could fix that.  I think one rule that would make sense is to
> just disable the physical-tlist substitution if the relation's targetlist
> is empty --- it wouldn't be buying much in such a case anyhow.  Then the
> runtime tlist for the scan node would also be empty, and away you go.
When making an early prototype of this optimization, I did what you are 
describing with the bitmap heap scan executor node. In an internal 
review, it was said that the bitmap heap scan node is already 
complicated enough and shouldn't have more logic added to it, so I 
rewrote it a separate node. To me, your approach looks good too, so if 
the community is generally in favor of this, I could rewrite the 
executor as such.

With planner, the changes are more complex. Two things must be done 
there. First, when the tlist is empty, we must use a different cost 
function for bitmap heap scan, because the heap access pattern is 
different. Second, choose_bitmap_and() must use a different algorithm 
for choosing the right combination of paths. A standard algorithm 
chooses the combination based on cost. For count(*) purposes, the 
decisive factor is that the path has to check all the restrictions, or 
else we'll need heap access to recheck some of them, which defeats the 
purpose of having this optimization. The planner code that builds and 
costs the index path is fairly complex, and integrating this additional 
behavior into it didn't look good to me. Instead, I created a 
specialized path node and isolated the logic that handles it. The 
resulting implementation adds several functions, but it is mostly 
self-contained and has a single entry point in grouping_planner(). It 
handles the specific case of bitmap count plans and doesn't complicate 
the existing code any further.

The planner part is to some extent independent of whether we use a 
separate executor node or not. If we choose not to, the same count(*) 
optimization code I proposed could create plain bitmap heap scan paths.

-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: [HACKERS] logical rep worker for table sync can start and stopvery frequently unexpectedly
Next
From: Tom Lane
Date:
Subject: [HACKERS] Cutting initdb's runtime (Perl question embedded)