[HACKERS] index-only count(*) for indexes supporting bitmap scans - Mailing list pgsql-hackers

From Alexander Kuzmenkov
Subject [HACKERS] index-only count(*) for indexes supporting bitmap scans
Date
Msg-id 239a8955-c0fc-f506-026d-c837e86c827b@postgrespro.ru
Whole thread Raw
Responses Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List pgsql-hackers
Hi,

I would like to propose a patch that speeds up the queries of the form 
'select
count(*) ... where ...',  where the restriction clauses can be satisfied 
by some
indexes. At the moment, such queries use index-only scans followed by
aggregation. Index-only scans are only possible for indexes that are 
capable of
returning indexed tuples, that is, support the 'amgettuple' access 
method. They
are not applicable to indexes such as GIN and RUM. However, it is 
possible to
improve count(*) performance for indexes that support bitmap scans. Having
performed a bitmap index scan or a combination of such, the bits in 
bitmap can
be counted to obtain the final result. Of course, the bitmap pages that are
lossy or not visible to all existing transactions will still require heap
access.

One kind of applications that can benefit from this change is the full-text
search with pagination. To show a search results page, the application 
has to
know the results that go to current page, and the total number of the 
results.
Getting one page is fast, when the desired sorting order can be provided 
by an
index. For example, results can be sorted by date with a separate btree 
index,
or by relevance with RUM index. However, getting the total number of 
results is
more difficult. With text search indexes, it requires a bitmap heap 
scan, which
can be rather slow due to obligatory heap access. A well-known hack for 
this is
using the approximate data from 'explain' results. The proposed change 
allows
the user to obtain the precise number of the results in an efficient and
idiomatic manner.

The performance of this approach was tested on an archive of pgsql-hackers
mailing list. The detailed results for two sample queries can be found 
in the
attached file 'benchmark.txt'. The first test demonstrates full-text search
with RUM index, ordering the results by rank. For a query with low 
selectivity,
getting the top results is much faster than counting them all with a 
bitmap heap
scan. With bitmap count execution plan, the results can be counted much 
faster.
A similar test is done with a GIN index, where the results are 
restricted and
ordered by date using another btree index. Again, it shows a significant 
speedup
for count(*) query for bitmap count scan compared to bitmap heap scan. These
results demonstrate that the bitmap count plan can indeed be useful for 
full-
text search scenarios.

Structurally, the patch consists of two major parts: a specialized executor
node and the generation of corresponding paths and plans. The optimizer 
behavior
here is similar to that of the min/max aggregate optimization. The main 
entry
point is preprocess_count_aggs(); it is called by grouping_planner(). It
searches for count(*) expression in the target list, and, if possible, 
generates
a special path for it (struct BitmapCountPath). This path is then added to
UPPERREL_GROUP_AGG upperrel, and competes with other paths at the 
further stages
of planning.

The executor node (nodeBitmapCount.c) is similar to the bitmap heap scan 
node,
with the main difference being that it does not access heap for tuples 
that are
known to satisfy the restriction and to be visible to all transactions.

This patch has some important limitations:
* It only supports targetlist consisting of a single expression that can be
projected from count(*).
* count(expr) is not supported. We could support it for cases where the
"expr is not null" restriction can be satisfied with an index.
* The current implementation does not support parallel execution. It 
could be
implemented during the PostgreSQL 11 release cycle.
* For some indexes, the bitmap index scan will always require rechecking 
all
the tuples. Bitmap count plans should not be used in such cases. For 
now, this
check is not implemented.

I would be glad to hear your comments on this patch.

Regards,
Alexander Kuzmenkov


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Pavan Deolasee
Date:
Subject: Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Next
From: Masahiko Sawada
Date:
Subject: Re: [HACKERS] Quorum commit for multiple synchronous replication.