Re: Introducing coarse grain parallelism by postgres_fdw. - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Introducing coarse grain parallelism by postgres_fdw.
Date
Msg-id 20140801.181837.27561179.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: Introducing coarse grain parallelism by postgres_fdw.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Introducing coarse grain parallelism by postgres_fdw.
List pgsql-hackers
Hello,

> Hello, this is the new version which is complete to some extent
> of parallelism based on postgres_fdw.
> 
> This compares the costs for parallel and non-parallel execution
> and choose parallel one if it is faster by some extent specified
> by GUCs. The attached files are,
> 
>  0001_parallel_exec_planning_v0.patch:
>    - PostgreSQL body stuff for parallel execution planning.
> 
>  0002_enable_postgres_fdw_to_run_in_parallel_v0.patch:
>    - postgres_fdw parallelization.
> 
>  0003_file_fdw_changes_to_avoid_error.patch:
>    - error avoidig stuff for file_fdw (not necessary for this patch)
> 
>  env.sql:
>    - simple test script to try this patch.
> 
> =====
> 
>  - planner stuff to handle cost of parallel execution. Including
>    indication of parallel execution.
> 
>  - GUCs to control how easy to go parallel.
> 
>    parallel_cost_threshold is the threshold of path total cost
>    where to enable parallel execution.
> 
>    prallel_ratio_threshond is the threshold of the ratio of
>    parallel cost to non-parallel cost where to choose the
>    parallel path.
> 
>  - postgres_fdw which can run in multiple sessions using snapshot
>    export and fetches in parallel for foreign scans on dedicated
>    connections.

But now the effect of async execution of FETCH'es is omitted
during planning.

>    foreign server has a new option 'max_aux_connections', which
>    limits the number of connections for parallel execution per
>    (server, user) pairs.
> 
>  - change file_fdw to follow the changes of planner stuff.
> 
> 
> Whth the patch attached, the attached sql script shows the
> following result (after some line breaks are added).
> 
> postgres=# EXPLAIN ANALYZE SELECT a.a, a.b, b.c
>            FROM fvs1 a join fvs1_2 b on (a.a = b.a);
>                                 QUERY PLAN
> ----------------------------------------------------------------------------
> Hash Join  (cost=9573392.96..9573393.34 rows=1 width=40 parallel)
>            (actual time=2213.400..2213.407 rows=12 loops=1)
>  Hash Cond: (a.a = b.a)
>  ->  Foreign Scan on fvs1 a
>            (cost=9573392.96..9573393.29 rows=10 width=8 parallel)
>            (actual time=2199.992..2199.993 rows=10 loops=1)
>  ->  Hash  (cost=9573393.29..9573393.29 rows=10 width=36)
>            (actual time=13.388..13.388 rows=10 loops=1)
>        Buckets: 1024  Batches: 1  Memory Usage: 6kB
>        ->  Foreign Scan on fvs1_2 b 
>                    (cost=9573392.96..9573393.29 rows=10 width=36 parallel)
>                    (actual time=13.376..13.379 rows=10 loops=1)
>  Planning time: 4.761 ms
>  Execution time: 2227.462 ms
> (8 rows)
> postgres=# SET parallel_ratio_threshold to 0.0;
> postgres=# EXPLAIN ANALYZE SELECT a.a, a.b, b.c
>            FROM fvs1 a join fvs1 b on (a.a = b.a);
>                                 QUERY PLAN
> ------------------------------------------------------------------------------
>  Hash Join  (cost=318084.32..318084.69 rows=1 width=40)
>             (actual time=4302.913..4302.928 rows=12 loops=1)
>    Hash Cond: (a.a = b.a)
>    ->  Foreign Scan on fvs1 a  (cost=159041.93..159042.26 rows=10 width=8)
>                                (actual time=2122.989..2122.992 rows=10 loops=1)
>    ->  Hash  (cost=159042.26..159042.26 rows=10 width=500)
>              (actual time=2179.900..2179.900 rows=10 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 6kB
>          ->  Foreign Scan on fvs1 b
>                    (cost=159041.93..159042.26 rows=10 width=500)
>                    (actual time=2179.856..2179.864 rows=10 loops=1)
>  Planning time: 5.085 ms
>  Execution time: 4303.728 ms
> (8 rows)
> 
> Where, "parallel" indicates that the node includes nodes run in
> parallel. The latter EXPLAIN shows the result when parallel
> execution is inhibited.
> 
> Since the lack of time, sorry that the details for this patch is
> comming later.
> 
> Is there any suggestions or opinions?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Introducing coarse grain parallelism by postgres_fdw.
Next
From: David Rowley
Date:
Subject: Re: SKIP LOCKED DATA (work in progress)