Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Thom Brown
Subject Re: Parallel Seq Scan
Msg-id CAA-aLv6JMAsDOg7R6DzvcWgLCSukGK_Ap4gRfiC+1NgWaqHAVw@mail.gmail.com
In response to Re: Parallel Seq Scan  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Parallel Seq Scan  (Thom Brown <thom@linux.com>)
Re: Parallel Seq Scan  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 25 March 2015 at 10:27, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Mar 20, 2015 at 5:36 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> So the patches have to be applied in below sequence:
> HEAD Commit-id : 8d1f2390
> parallel-mode-v8.1.patch [2]
> assess-parallel-safety-v4.patch [1]
> parallel-heap-scan.patch [3]
> parallel_seqscan_v11.patch (Attached with this mail)
>
> The reason for not using the latest commit in HEAD is that latest
> version of assess-parallel-safety patch was not getting applied,
> so I generated the patch at commit-id where I could apply that
> patch successfully.
>
>  [1] - http://www.postgresql.org/message-id/CA+TgmobJSuefiPOk6+i9WERUgeAB3ggJv7JxLX+r6S5SYydBRQ@mail.gmail.com
>  [2] - http://www.postgresql.org/message-id/CA+TgmoZJjzYnpXChL3gr7NwRUzkAzPMPVKAtDt5sHvC5Cd7RKw@mail.gmail.com
>  [3] - http://www.postgresql.org/message-id/CA+TgmoYJETgeAXUsZROnA7BdtWzPtqExPJNTV1GKcaVMgSdhug@mail.gmail.com
>

Fixed the issue reported on the assess-parallel-safety thread and another
bug caught while testing joins, and integrated with the latest version of
the parallel-mode patch (parallel-mode-v9 patch).

Apart from that, I have moved the initialization of the dsm segment from
the InitNode phase to ExecFunnel() (on first execution), as per Robert's
suggestion.  The main idea is that, since it creates a large shared memory
segment, the work should be done only when it is really required.


HEAD Commit-Id: 11226e38
parallel-mode-v9.patch [2]
assess-parallel-safety-v4.patch [1]
parallel-heap-scan.patch [3]
parallel_seqscan_v12.patch (Attached with this mail)


Okay, with my pgbench_accounts partitioned into 300, I ran:

SELECT DISTINCT bid FROM pgbench_accounts;

The query never returns, and I also get this:

grep -r 'starting background worker process "parallel worker for PID 12165"' postgresql-2015-03-25_112522.log  | wc -l
2496

2,496 workers?  This is with parallel_seqscan_degree set to 8.  If I set it to 2, this number goes down to 626, and with 16, goes up to 4320.
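
A quick sanity check on those counts, as a rough sketch only. It assumes (this is not confirmed anywhere in the patch discussion) that every Funnel node tries to start parallel_seqscan_degree workers of its own, so that dividing the worker starts by the degree gives roughly the number of Funnel nodes that ran. The helper name launches_per_degree is made up for illustration.

```python
# Worker-start counts grepped from the log, keyed by parallel_seqscan_degree.
# These three numbers are the ones reported above; nothing else is measured.
observed = {2: 626, 8: 2496, 16: 4320}


def launches_per_degree(counts):
    """Return worker starts divided by the configured degree for each setting."""
    return {degree: starts / degree for degree, starts in counts.items()}


ratios = launches_per_degree(observed)
for degree, ratio in sorted(ratios.items()):
    print(f"degree={degree:2d}  starts/degree={ratio:.1f}")
```

At degrees 2 and 8 the ratio comes out at 313 and 312, which is close to the number of partitions and so fits a per-Funnel launch. At degree 16 it is 270; that drop is not explained by this arithmetic.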

Here's the query plan:

                                               QUERY PLAN                                               
---------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=38856527.50..38856529.50 rows=200 width=4)
   Group Key: pgbench_accounts.bid
   ->  Append  (cost=0.00..38806370.00 rows=20063001 width=4)
         ->  Seq Scan on pgbench_accounts  (cost=0.00..0.00 rows=1 width=4)
         ->  Funnel on pgbench_accounts_1  (cost=0.00..192333.33 rows=100000 width=4)
               Number of Workers: 8
               ->  Partial Seq Scan on pgbench_accounts_1  (cost=0.00..1641000.00 rows=100000 width=4)
         ->  Funnel on pgbench_accounts_2  (cost=0.00..192333.33 rows=100000 width=4)
               Number of Workers: 8
               ->  Partial Seq Scan on pgbench_accounts_2  (cost=0.00..1641000.00 rows=100000 width=4)
         ->  Funnel on pgbench_accounts_3  (cost=0.00..192333.33 rows=100000 width=4)
               Number of Workers: 8
...
               ->  Partial Seq Scan on pgbench_accounts_498  (cost=0.00..10002.10 rows=210 width=4)
         ->  Funnel on pgbench_accounts_499  (cost=0.00..1132.34 rows=210 width=4)
               Number of Workers: 8
               ->  Partial Seq Scan on pgbench_accounts_499  (cost=0.00..10002.10 rows=210 width=4)
         ->  Funnel on pgbench_accounts_500  (cost=0.00..1132.34 rows=210 width=4)
               Number of Workers: 8
               ->  Partial Seq Scan on pgbench_accounts_500  (cost=0.00..10002.10 rows=210 width=4)

Still not sure why 8 workers are needed for each partial scan.  I would expect 8 workers to be used for 8 separate scans.  Perhaps this is just my misunderstanding of how this feature works.
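
To make the two readings concrete, here is a small sketch of both models. Both are hypothetical: "model A" is what the plan output appears to show (each Funnel gets its own set of workers), and "model B" is the expectation described above (a fixed set of workers, each taking whole partitions). Neither function reflects the actual executor code, and the partition count of 300 is simply the figure used in this test.

```python
def per_funnel_launches(partitions, degree):
    """Model A (hypothetical): each Funnel starts its own `degree` workers."""
    return partitions * degree


def shared_pool_assignment(partitions, degree):
    """Model B (hypothetical): `degree` workers in total, each handed whole
    partitions in round-robin order."""
    pool = {worker: [] for worker in range(degree)}
    for part in range(1, partitions + 1):
        pool[(part - 1) % degree].append(part)
    return pool


pool = shared_pool_assignment(300, 8)
print("model A worker starts:", per_funnel_launches(300, 8))
print("model B workers:", len(pool))
print("model B partitions per worker:", [len(parts) for parts in pool.values()])
```

Under model A the number of worker starts grows with the partition count, which is the shape of the numbers seen in the log. Under model B it stays at the configured degree regardless of how many partitions there are.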

--
Thom
