Re: Parallel Query - Can it be used within functions? - Mailing list pgsql-general

From Michael Krüger
Subject Re: Parallel Query - Can it be used within functions?
Date
Msg-id CACSnzzXtX1t9r2pvnaYgGrkcWEXLdRFjKOnmhRVsBadcRASgnw@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Query - Can it be used within functions?  (Michael Krüger <michael@kruegers.email>)
List pgsql-general
ah, and for completeness the simple python function I use for the test:

create or replace function reports.generic_query_python(_sql text)
  RETURNS SETOF record
  LANGUAGE 'plpythonu'
  PARALLEL SAFE
  COST 100
  VOLATILE
  ROWS 5000
AS $BODY$
    return plpy.execute( _sql )
$BODY$;


Michael Krüger <michael@kruegers.email> schrieb am Mi., 28. Feb. 2018 um 09:05 Uhr:
Ok, to close this thread. The problem is, that plpgsql function do seem to return data using a cursor. That fact is disabling parallel execution. So if we instead hand over the SQL to a function with e.g. a python body, then parallel execution is happening, because the data is first assembled in memory before it is returned, without using a cursor:

mkrueger=# explain analyze select * from reports.generic_query_python($$ select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where mediatrunkid in (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028) group by 1 $$) as foo (mediatrunkid bigint, count numeric);
LOG:  00000: duration: 35.158 ms  plan:
Query Text:  select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where mediatrunkid in (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028) group by 1
Partial HashAggregate  (cost=56693.72..56696.72 rows=300 width=16) (actual time=35.144..35.149 rows=17 loops=1)
  Group Key: mediatrunkid
  Buffers: shared hit=1641
  ->  Parallel Bitmap Heap Scan on mediatrunkkpi  (cost=4525.01..56279.28 rows=82889 width=8) (actual time=2.350..24.584 rows=63794 loops=1)
        Recheck Cond: (mediatrunkid = ANY ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
        Heap Blocks: exact=1641
        Buffers: shared hit=1641
        ->  Bitmap Index Scan on idx_mediatrunkkpi_trunk  (cost=0.00..4475.27 rows=198933 width=0) (never executed)
              Index Cond: (mediatrunkid = ANY ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
LOCATION:  explain_ExecutorEnd, auto_explain.c:359
LOG:  00000: duration: 35.165 ms  plan:
Query Text:  select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where mediatrunkid in (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028) group by 1
Partial HashAggregate  (cost=56693.72..56696.72 rows=300 width=16) (actual time=35.152..35.157 rows=17 loops=1)
  Group Key: mediatrunkid
  Buffers: shared hit=1630
  ->  Parallel Bitmap Heap Scan on mediatrunkkpi  (cost=4525.01..56279.28 rows=82889 width=8) (actual time=2.364..24.702 rows=63272 loops=1)
        Recheck Cond: (mediatrunkid = ANY ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
        Heap Blocks: exact=1630
        Buffers: shared hit=1630
        ->  Bitmap Index Scan on idx_mediatrunkkpi_trunk  (cost=0.00..4475.27 rows=198933 width=0) (never executed)
              Index Cond: (mediatrunkid = ANY ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
LOCATION:  explain_ExecutorEnd, auto_explain.c:359
LOG:  00000: duration: 47.855 ms  plan:
Query Text:  select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where mediatrunkid in (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028) group by 1
Finalize GroupAggregate  (cost=57784.41..57792.66 rows=300 width=40) (actual time=45.331..45.344 rows=17 loops=1)
  Group Key: mediatrunkid
  Buffers: shared hit=2735
  ->  Sort  (cost=57784.41..57785.91 rows=600 width=16) (actual time=45.322..45.325 rows=51 loops=1)
        Sort Key: mediatrunkid
        Sort Method: quicksort  Memory: 27kB
        Buffers: shared hit=2735
        ->  Gather  (cost=57693.72..57756.72 rows=600 width=16) (actual time=45.270..45.295 rows=51 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              Buffers: shared hit=2735
              ->  Partial HashAggregate  (cost=56693.72..56696.72 rows=300 width=16) (actual time=38.387..38.391 rows=17 loops=3)
                    Group Key: mediatrunkid
                    Buffers: shared hit=6006
                    ->  Parallel Bitmap Heap Scan on mediatrunkkpi  (cost=4525.01..56279.28 rows=82889 width=8) (actual time=5.564..27.399 rows=67080 loops=3)
                          Recheck Cond: (mediatrunkid = ANY ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
                          Heap Blocks: exact=1912
                          Buffers: shared hit=6006
                          ->  Bitmap Index Scan on idx_mediatrunkkpi_trunk  (cost=0.00..4475.27 rows=198933 width=0) (actual time=11.229..11.229 rows=201241 loops=1)
                                Index Cond: (mediatrunkid = ANY ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
                                Buffers: shared hit=823
LOCATION:  explain_ExecutorEnd, auto_explain.c:359
LOG:  00000: duration: 49.924 ms  plan:
Query Text: explain analyze select * from reports.generic_query_python($$ select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where mediatrunkid in (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028) group by 1 $$) as foo (mediatrunkid bigint, count numeric);
Function Scan on generic_query_python foo  (cost=0.25..50.25 rows=5000 width=40) (actual time=49.920..49.922 rows=17 loops=1)
  Buffers: shared hit=6388
LOCATION:  explain_ExecutorEnd, auto_explain.c:359
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Function Scan on generic_query_python foo  (cost=0.25..50.25 rows=5000 width=40) (actual time=49.920..49.922 rows=17 loops=1)
 Planning time: 0.029 ms
 Execution time: 49.977 ms
(3 rows)





Michael Krüger <michael@kruegers.email> schrieb am Fr., 16. Feb. 2018 um 11:42 Uhr:
Dear all,

still same behavior with Postgres 10.2 ...

Just as a reminder that the issue still exists.

Regards,
Michael

Andreas Kretschmer <andreas@a-kretschmer.de> schrieb am Di., 6. Feb. 2018 um 08:35 Uhr:
Hi,


Am 06.02.2018 um 08:24 schrieb Michael Krüger:
> create or replace function reports.generic_query(_sql text)
>   RETURNS SETOF record
>   LANGUAGE 'plpgsql'
>   PARALLEL SAFE
>   COST 100

there is an other parameter, parallel_setup_cost, with default = 1000. I
think, you should set this parameter too.

Please keep me informed, it is interessting me.


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com


pgsql-general by date:

Previous
From: Michael Krüger
Date:
Subject: Re: Parallel Query - Can it be used within functions?
Next
From: mariusz
Date:
Subject: Re: Given a set of daterange, finding the continuous range thatincludes a particular date (aggregates)