Re: Parallel bitmap index scan - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Parallel bitmap index scan
Date
Msg-id CAFiTN-uB7md0C=cV+Fd8aN=uBcNedtigEgOFWno7EnGefcYXBw@mail.gmail.com
Whole thread Raw
In response to Re: Parallel bitmap index scan  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers


On Mon, 17 Aug 2020 at 7:42 PM, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Sun, Jul 26, 2020 at 6:43 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

>

> I would like to propose a patch for enabling the parallelism for the

> bitmap index scan path.

>

> Background:

> Currently, we support only a parallel bitmap heap scan path.  Therein,

> the underlying bitmap index scan is done by a single worker called the

> leader.  The leader creates a bitmap in shared memory and once the

> bitmap is ready it creates a shared iterator and after that, all the

> workers process the shared iterator and scan the heap in parallel.

> While analyzing the TPCH plan we have observed that some of the

> queries are spending significant time in preparing the bitmap.  So the

> idea of this patch is to use the parallel index scan for preparing the

> underlying bitmap in parallel.

>

> Design:

> If underlying index AM supports the parallel path (currently only

> BTREE support it), then we will create a parallel bitmap heap scan

> path on top of the parallel bitmap index scan path.  So the idea of

> this patch is that each worker will do the parallel index scan and

> generate their part of the bitmap.  And, we will create a barrier so

> that we can not start preparing the shared iterator until all the

> worker is ready with their bitmap.  The first worker, which is ready

> with the bitmap will keep a copy of its TBM and the page table in the

> shared memory.  And, all the subsequent workers will merge their TBM

> with the shared TBM.  Once all the TBM are merged we will get one

> common shared TBM and after that stage, the worker can continue.  The

> remaining part is the same,  basically, again one worker will scan the

> shared TBM and prepare the shared iterator and once it is ready all

> the workers will jointly scan the heap in parallel using shared

> iterator.

>



Though I have not looked at the patch or code for the existing

parallel bitmap heap scan, one point keeps bugging in my mind. I may

be utterly wrong or my question may be so silly, anyways I would like

to ask here:



From the above design: each parallel worker creates partial bitmaps

for the index data that they looked at. Why should they merge these

bitmaps to a single bitmap in shared memory? Why can't each parallel

worker do a bitmap heap scan using the partial bitmaps they built

during it's bitmap index scan and emit qualified tuples/rows so that

the gather node can collect them? There may not be even lock

contention as bitmap heap scan takes read locks for the heap

pages/tuples.

The main reason is that there could be lossy pages in bitmap and if that is the case then there will be duplicate data.  Maybe if there is no lossy data in any of the bitmap we might do as u describe but still I think that it is very much possible that different bitmap might have many common heap pages because bitmap is prepared using index scan.  And in such cases we will be doing duplicate i/o.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Andy Fan
Date:
Subject: Re: Improve planner cost estimations for alternative subplans
Next
From: Heikki Linnakangas
Date:
Subject: Re: Newline after --progress report