Parallel bitmap heap scan - Mailing list pgsql-hackers
From | Dilip Kumar
Subject | Parallel bitmap heap scan
Date |
Msg-id | CAFiTN-ukcfSsZ9GrBcEJG0O0Kbv2WXgOofJUHRQEm1zuH12xEg@mail.gmail.com
Responses | Re: Parallel bitmap heap scan (Dilip Kumar <dilipbalaut@gmail.com>)
          | Re: Parallel bitmap heap scan (Andres Freund <andres@anarazel.de>)
List | pgsql-hackers
Hi Hackers,

I would like to propose a parallel bitmap heap scan feature. After running the TPCH benchmark, it was observed that many of the TPCH queries use a bitmap scan (see TPCH_plan.tar.gz attached below). Keeping this in mind, we expect many queries to benefit from a parallel bitmap scan. Robert has also pointed this out in his blog on parallel query:
http://rhaas.blogspot.in/2016/04/postgresql-96-with-parallel-query-vs.html

Currently the bitmap heap plan looks like this:
------------------------------------------------------
Bitmap Heap Scan
  -> Bitmap Index Scan

After this patch:
---------------------
Parallel Bitmap Heap Scan
  -> Bitmap Index Scan

As part of this work I have implemented parallel processing in the BitmapHeapScan node. BitmapIndexScan is still non-parallel.

Brief design idea:
-----------------------
#1. Shared TIDBitmap creation and initialization
The first worker to see the parallel bitmap info state as PBM_INITIAL becomes the leader and sets the state to PBM_INPROGRESS. All other workers, seeing the state as PBM_INPROGRESS, wait for the leader to complete the TIDBitmap.

#2. At this point the TIDBitmap is ready and all workers are awake.

#3. Bitmap processing (iterate and process the pages).
In this phase each worker iterates over the page and chunk array and selects heap pages one by one. If prefetching is enabled there are two iterators. Since multiple workers iterate over the same page and chunk array, we need a shared iterator: each worker grabs a spinlock and iterates within that lock, so that every worker gets a different page to process. (A simplified sketch of this logic follows.)

Note: For more detail on the design, please refer to the comment of the BitmapHeapNext API in the "parallel-bitmap-heap-scan-v1.patch" file.
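To make the synchronization above concrete, here is a minimal C sketch, not taken from the patch: the struct layout, the PBM_FINISHED state, the pbm_mutex/tbm_done_cv fields, the build_shared_tidbitmap() helper and the wait event name are illustrative assumptions; only PBM_INITIAL, PBM_INPROGRESS, the condition-variable style wait and the spinlock-protected shared iterator come from the description above (the condition-variable API is from the patch listed under dependencies below).

/* Illustrative sketch only -- not the actual patch code. */
#include "postgres.h"
#include "storage/condition_variable.h"
#include "storage/spin.h"

typedef enum
{
    PBM_INITIAL,                     /* nobody has started building the bitmap */
    PBM_INPROGRESS,                  /* the leader is building the shared TIDBitmap */
    PBM_FINISHED                     /* assumed state: bitmap is ready for everyone */
} PBMState;

typedef struct ParallelBitmapInfo    /* assumed shared-memory layout */
{
    slock_t             pbm_mutex;   /* protects state and the shared iterator */
    PBMState            state;
    ConditionVariable   tbm_done_cv; /* workers sleep here while the leader works */
    int                 next_page;   /* shared iterator position */
    int                 npages;      /* entries in the page/chunk array */
} ParallelBitmapInfo;

extern void build_shared_tidbitmap(ParallelBitmapInfo *pbm);   /* hypothetical helper */

/*
 * #1: the first worker to observe PBM_INITIAL becomes the leader and builds
 * the TIDBitmap; all other workers wait until the leader broadcasts that the
 * bitmap is complete (#2).
 */
static void
pbm_begin_scan(ParallelBitmapInfo *pbm)
{
    bool        am_leader = false;

    SpinLockAcquire(&pbm->pbm_mutex);
    if (pbm->state == PBM_INITIAL)
    {
        pbm->state = PBM_INPROGRESS;
        am_leader = true;
    }
    SpinLockRelease(&pbm->pbm_mutex);

    if (am_leader)
    {
        build_shared_tidbitmap(pbm);

        SpinLockAcquire(&pbm->pbm_mutex);
        pbm->state = PBM_FINISHED;
        SpinLockRelease(&pbm->pbm_mutex);
        ConditionVariableBroadcast(&pbm->tbm_done_cv);
    }
    else
    {
        ConditionVariablePrepareToSleep(&pbm->tbm_done_cv);
        for (;;)
        {
            bool        done;

            SpinLockAcquire(&pbm->pbm_mutex);
            done = (pbm->state == PBM_FINISHED);
            SpinLockRelease(&pbm->pbm_mutex);
            if (done)
                break;
            /* wait event name is illustrative */
            ConditionVariableSleep(&pbm->tbm_done_cv,
                                   WAIT_EVENT_PARALLEL_BITMAP_SCAN);
        }
        ConditionVariableCancelSleep();
    }
}

/*
 * #3: shared iterator -- each worker claims the next unprocessed entry under
 * the spinlock, so no two workers ever process the same heap page.
 */
static int
pbm_next_page(ParallelBitmapInfo *pbm)
{
    int         page = -1;

    SpinLockAcquire(&pbm->pbm_mutex);
    if (pbm->next_page < pbm->npages)
        page = pbm->next_page++;
    SpinLockRelease(&pbm->pbm_mutex);

    return page;                     /* -1 means no pages left */
}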
Attached patch details:
------------------------------
1. parallel-bitmap-heap-scan-v1.patch:
This is the main patch to make the bitmap heap scan node parallel aware.

2. dht-return-dsa-v1.patch:
This patch provides a new API with which we can scan the full DHT [1] and get the dsa_pointers (relative pointers). The dsa_pointer values can be shared with other processes. We need this because, after the TIDBitmap is created, only one worker processes the whole TIDBitmap and converts it to a page and chunk array, so we need to store the generic (relative) pointers, so that later on each worker can convert them to its local pointers before it starts processing. (A short sketch of that conversion follows.)
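For illustration only, a minimal sketch of that conversion, assuming the dsa_area API from the dependency list below; the function name and the surrounding details are hypothetical, not from the patch:

/* Illustrative sketch only -- not the actual patch code. */
#include "postgres.h"
#include "utils/dsa.h"

/*
 * The leader stores each entry's dsa_pointer (a relative offset into the
 * DSA area) in the shared page/chunk array.  A raw C pointer would be
 * useless, because the DSA segments may be mapped at different addresses
 * in each backend.
 */
static void
worker_process_entry(dsa_area *area, dsa_pointer entry_dp)
{
    /*
     * Each worker converts the relative pointer into its own local address
     * before processing; the element it points at (a TIDBitmap page-table
     * entry) is only sketched here as an opaque pointer.
     */
    void       *entry = dsa_get_address(area, entry_dp);

    /* ... decide whether 'entry' is an exact page or a lossy chunk,
     * then fetch and recheck the corresponding heap page ... */
    (void) entry;
}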
My patch depends on the following patches:
------------------------------------------------------
1. conditional_variable
https://www.postgresql.org/message-id/CAEepm%3D0zshYwB6wDeJCkrRJeoBM%3DjPYBe%2B-k_VtKRU_8zMLEfA%40mail.gmail.com
2. dsa_area
https://www.postgresql.org/message-id/CAEepm%3D024p-MeAsDmG%3DR3%2BtR4EGhuGJs_%2BrjFKF0eRoSTmMJnA%40mail.gmail.com
3. Creating a DSA area to provide work space for parallel execution
https://www.postgresql.org/message-id/CAEepm%3D0HmRefi1%2BxDJ99Gj5APHr8Qr05KZtAxrMj8b%2Bay3o6sA%40mail.gmail.com
4. Hash table in dynamic shared memory (DHT) [1]
https://www.postgresql.org/message-id/CAEepm%3D0VrMt3s_REDhQv6z1pHL7FETOD7Rt9V2MQ3r-2ss2ccA%40mail.gmail.com

Order in which the patches should be applied:
--------------------------------------------------------
1. conditional_variable
2. dsa_area
3. Creating a DSA area to provide work space for parallel execution
4. Hash table in dynamic shared memory
5. dht-return-dsa-v1.patch
6. parallel-bitmap-heap-scan-v1.patch

Performance Results:
-----------------------------
Summary:
1. After this patch, I observed that 4 queries currently get a significant improvement (Q4, Q6, Q14, Q15).
   - Q4 changes from a parallel seq scan to a parallel bitmap heap scan.
   - The other queries change from a regular bitmap heap scan to a parallel bitmap heap scan.
2. The benefit is more visible at lower worker counts (up to 4); beyond that some of the queries select ParallelSeqScan over ParallelBitmapScan. I think this is expected: so far only the BitmapHeap node has been made parallel, whereas ParallelSeqScan is completely parallel, so at higher worker counts ParallelSeqScan is the better choice.
3. Detailed results are attached in TPCH_PBMS.pdf.
4. EXPLAIN ANALYZE output is attached in TPCH_plan.tar.gz (for all changed queries at 2 workers).

TPCH query plan change example (TPCH Q6):
----------------------------------------------------------------
On Head:
-------------
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1558475.95..1558475.96 rows=1 width=32) (actual time=40921.437..40921.438 rows=1 loops=1)
   ->  Aggregate  (cost=1558475.95..1558475.96 rows=1 width=32) (actual time=40921.435..40921.435 rows=1 loops=1)
         ->  Bitmap Heap Scan on lineitem  (cost=291783.32..1552956.39 rows=1103911 width=12) (actual time=7032.075..38997.369 rows=1140434 loops=1)
               Recheck Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.01) AND (l_discount <= 0.03) AND (l_quantity < '24'::numeric))
               Rows Removed by Index Recheck: 25284232
               Heap Blocks: exact=134904 lossy=530579
               ->  Bitmap Index Scan on idx_lineitem_shipdate  (cost=0.00..291507.35 rows=1103911 width=0) (actual time=6951.408..6951.408 rows=1140434 loops=1)
                     Index Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.01) AND (l_discount <= 0.03) AND (l_quantity < '24'::numeric))
 Planning time: 1.126 ms
 Execution time: 40922.569 ms
(10 rows)

After Patch:
----------------
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1541767.60..1541767.61 rows=1 width=32) (actual time=21895.008..21895.009 rows=1 loops=1)
   ->  Finalize Aggregate  (cost=1541767.60..1541767.61 rows=1 width=32) (actual time=21895.006..21895.006 rows=1 loops=1)
         ->  Gather  (cost=1541767.38..1541767.59 rows=2 width=32) (actual time=21894.341..21894.970 rows=3 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Partial Aggregate  (cost=1540767.38..1540767.39 rows=1 width=32) (actual time=21890.990..21890.990 rows=1 loops=3)
                     ->  Parallel Bitmap Heap Scan on lineitem  (cost=291783.32..1538467.56 rows=459963 width=12) (actual time=8517.126..21215.469 rows=380145 loops=3)
                           Recheck Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.01) AND (l_discount <= 0.03) AND (l_quantity < '24'::numeric))
                           Rows Removed by Index Recheck: 8427921
                           Heap Blocks: exact=47761 lossy=187096
                           ->  Bitmap Index Scan on idx_lineitem_shipdate  (cost=0.00..291507.35 rows=1103911 width=0) (actual time=8307.291..8307.291 rows=1140434 loops=1)
                                 Index Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.01) AND (l_discount <= 0.03) AND (l_quantity < '24'::numeric))
 Planning time: 1.173 ms
 Execution time: 21915.931 ms
(14 rows)

* Thanks to Robert Haas and Amit Kapila for helping with the design review (off list) and many valuable inputs.
* Thanks to Thomas Munro for the DSA and DHT work on which my patch is based.
* Thanks to Rafia Sabih for helping with the performance tests.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com