Parallel bitmap index scan - Mailing list pgsql-hackers
From: Dilip Kumar
Subject: Parallel bitmap index scan
Msg-id: CAFiTN-t4NtRzafw94x+Ub_gUqQsv=u9nK=OaJmS612M_bZv6+Q@mail.gmail.com
Responses: Re: Parallel bitmap index scan
List: pgsql-hackers
I would like to propose a patch for enabling parallelism for the bitmap index scan path.

Background:
Currently, we support only a parallel bitmap heap scan path. The underlying bitmap index scan is done by a single worker, the leader. The leader creates the bitmap in shared memory, and once the bitmap is ready it creates a shared iterator; after that, all the workers process the shared iterator and scan the heap in parallel. While analyzing TPCH plans we observed that some of the queries spend significant time preparing the bitmap. So the idea of this patch is to use the parallel index scan to prepare the underlying bitmap in parallel.

Design:
If the underlying index AM supports the parallel path (currently only btree supports it), then we create a parallel bitmap heap scan path on top of a parallel bitmap index scan path. Each worker performs its share of the parallel index scan and generates its part of the bitmap. A barrier ensures that we do not start preparing the shared iterator until every worker is ready with its bitmap. The first worker that is ready with its bitmap copies its TBM and page table into shared memory, and all subsequent workers merge their TBMs into that shared TBM. Once all the TBMs are merged we have one common shared TBM, and after that stage the workers can continue. The remaining part is the same as before: one worker scans the shared TBM and prepares the shared iterator, and once it is ready all the workers jointly scan the heap in parallel using the shared iterator.

BitmapHeapNext
{
    ...
    BarrierAttach();
    tbm = MultiExecProcNode();
    tbm_merge(tbm);                 /* merge with the common TBM using tbm_union */
    BarrierArriveAndWait();

    if (BitmapShouldInitializeSharedState(pstate))   /* only one worker comes out of this */
    {
        tbm_prepare_shared_iterate();
        BitmapDoneInitializingSharedState();         /* wake up the others */
    }
    tbm_attach_shared_iterate();                     /* all workers attach to the shared iterator */
    ...
}

Performance:
With scale factor 10, I could see that Q6 spends significant time in a bitmap index scan, so I measured performance with that query. The bitmap index scan node is 3x faster with 3 workers, and the overall plan gets ~40% faster.

TPCH: S.F. 10, work_mem=512MB, shared_buffers=1GB

HEAD:
Limit  (cost=1559777.02..1559777.03 rows=1 width=32) (actual time=5260.121..5260.122 rows=1 loops=1)
  ->  Finalize Aggregate  (cost=1559777.02..1559777.03 rows=1 width=32) (actual time=5260.119..5260.119 rows=1 loops=1)
        ->  Gather  (cost=1559776.69..1559777.00 rows=3 width=32) (actual time=5257.251..5289.595 rows=4 loops=1)
              Workers Planned: 3
              Workers Launched: 3
              ->  Partial Aggregate  (cost=1558776.69..1558776.70 rows=1 width=32) (actual time=5247.714..5247.714 rows=1 loops=4)
                    ->  Parallel Bitmap Heap Scan on lineitem  (cost=300603.01..1556898.89 rows=375560 width=12) (actual time=3475.944..5037.484 rows=285808 loops=4)
                          Recheck Cond: ((l_shipdate >= '1997-01-01'::date) AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.02) AND (l_discount <= 0.04) AND (l_quantity < '24'::numeric))
                          Heap Blocks: exact=205250
                          ->  Bitmap Index Scan on idx_lineitem_shipdate  (cost=0.00..300311.95 rows=1164235 width=0) (actual time=3169.855..3169.855 rows=1143234 loops=1)
                                Index Cond: ((l_shipdate >= '1997-01-01'::date) AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.02) AND (l_discount <= 0.04) AND (l_quantity < '24'::numeric))
Planning Time: 0.659 ms
Execution Time: 5289.787 ms
(13 rows)

PATCH:
Limit  (cost=1559579.85..1559579.86 rows=1 width=32) (actual time=3333.572..3333.572 rows=1 loops=1)
  ->  Finalize Aggregate  (cost=1559579.85..1559579.86 rows=1 width=32) (actual time=3333.569..3333.569 rows=1 loops=1)
        ->  Gather  (cost=1559579.52..1559579.83 rows=3 width=32) (actual time=3328.619..3365.227 rows=4 loops=1)
              Workers Planned: 3
              Workers Launched: 3
              ->  Partial Aggregate  (cost=1558579.52..1558579.53 rows=1 width=32) (actual time=3307.805..3307.805 rows=1 loops=4)
                    ->  Parallel Bitmap Heap Scan on lineitem  (cost=300405.84..1556701.72 rows=375560 width=12) (actual time=1585.726..3097.628 rows=285808 loops=4)
                          Recheck Cond: ((l_shipdate >= '1997-01-01'::date) AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.02) AND (l_discount <= 0.04) AND (l_quantity < '24'::numeric))
                          Heap Blocks: exact=184293
                          ->  Parallel Bitmap Index Scan on idx_lineitem_shipdate  (cost=0.00..300311.95 rows=1164235 width=0) (actual time=1008.361..1008.361 rows=285808 loops=4)
                                Index Cond: ((l_shipdate >= '1997-01-01'::date) AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.02) AND (l_discount <= 0.04) AND (l_quantity < '24'::numeric))
Planning Time: 0.690 ms
Execution Time: 3365.420 ms

Note:
- Currently, I have only parallelized the bitmap index scan path when a bitmap index scan sits directly under the bitmap heap scan. If we have a BitmapAnd or BitmapOr path, I did not parallelize the underlying bitmap index scans. I think BitmapAnd and BitmapOr should use a completely different design, something similar to what we do in parallel append, so I don't think they need to be covered by this patch.
- A POC patch is attached to discuss the idea. The patch still needs cleanup and testing.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com