Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans
Date
Msg-id CAPpHfduy_P3XcRVLchkd7skpQSna0PmBMKqb+yq_Mj84=cd3bA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans  (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
List pgsql-hackers
On Wed, Apr 12, 2017 at 12:40 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Tue, Apr 11, 2017 at 8:24 PM, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:
I would like to propose a patch that speeds up the queries of the form 'select
count(*) ... where ...',  where the restriction clauses can be satisfied by some
indexes. At the moment, such queries use index-only scans followed by
aggregation. Index-only scans are only possible for indexes that are capable of
returning indexed tuples, that is, support the 'amgettuple' access method. They
are not applicable to indexes such as GIN and RUM. However, it is possible to
improve count(*) performance for indexes that support bitmap scans. Having
performed a bitmap index scan or a combination of such, the bits in bitmap can
be counted to obtain the final result. Of course, the bitmap pages that are
lossy or not visible to all existing transactions will still require heap
access.

That's a cool feature for FTS users! Please, register this patch to the next commitfest.

This patch has some important limitations:
* It only supports targetlist consisting of a single expression that can be
projected from count(*).
* count(expr) is not supported. We could support it for cases where the
"expr is not null" restriction can be satisfied with an index.
* The current implementation does not support parallel execution. It could be
implemented during the PostgreSQL 11 release cycle.
* For some indexes, the bitmap index scan will always require rechecking all
the tuples. Bitmap count plans should not be used in such cases. For now, this
check is not implemented.

Does this limitation cause a performance drawback?  When bitmap index scan returns all rechecks, alternative to Bitmap Count is still Aggregate + Bitmap Heap Scan.  Thus, I think Bitmap Count would have the same performance or even slightly faster.  That's worth testing.

Also, what is planning overhead of this patch?  That's worth testing too.

Another thing catch my eye.  The estimated number of rows in Bitmap Count node is the same as in Bitmap Index Scan node.  Should it be 1 because it always returns single row?

test1=# explain analyze select count(*) from pglist where fts @@ to_tsquery( 'tom & lane' );
                                                                 QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Count on pglist  (cost=550.65..1095.68 rows=54503 width=8) (actual time=1120.281..1120.281 rows=1 loops=1)
   Recheck Cond: (fts @@ to_tsquery('tom & lane'::text))
   Heap Fetches: 0
   Heap Blocks: exact=105992
   ->  Bitmap Index Scan on idx_pglist_rum_fts  (cost=0.00..537.02 rows=54503 width=0) (actual time=1056.060..1056.060 rows=222813 loops=1)
         Index Cond: (fts @@ to_tsquery('tom & lane'::text))
 Planning time: 119.568 ms
 Execution time: 1121.409 ms
(8 rows)

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: [HACKERS] Some thoughts about SCRAM implementation
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Foreign Join pushdowns not working properly for outer joins