
Re: Volatile Functions in Parallel Plans - Mailing list pgsql-hackers

From	Zhenghua Lyu
Subject	Re: Volatile Functions in Parallel Plans
Date	July 16, 2020 04:22:36
Msg-id	SN6PR05MB455953FEB8591EBE298A7099B57F0@SN6PR05MB4559.namprd05.prod.outlook.com
In response to	Re: Volatile Functions in Parallel Plans (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Volatile Functions in Parallel Plans
List	pgsql-hackers


Hi, thanks for your reply.

> But this won't be consistent even for non-parallel plans.

If we do not apply the distributive law of parallel join, it seems OK.

If we generate a parallel plan using the distributive law of the join, then the precondition of this transformation might be broken.
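
To make the precondition concrete, here is a minimal Python sketch, not PostgreSQL code. The names `join`, `parallel_join`, `stable_scan` and `make_volatile_scan` are hypothetical, and the "volatile" filter is a deterministic stand-in whose result depends on how many times the scan has run, so that the example is reproducible.

```python
from itertools import count

A = [1, 2, 3, 4]
A_PARTS = [[1, 2], [3, 4]]   # A split across two hypothetical workers
B = [1, 2, 3, 4]

def join(a_rows, b_rows):
    """Equality join of two lists of integers."""
    return sorted((a, b) for a in a_rows for b in b_rows if a == b)

def parallel_join(a_parts, scan_b):
    """gather((A1 join B) (A2 join B) ...): every worker scans B on its own."""
    gathered = []
    for part in a_parts:
        gathered.extend(join(part, scan_b()))
    return sorted(gathered)

def stable_scan():
    """Filter on B that returns the same rows on every scan."""
    return [b for b in B if b % 2 == 0]

def make_volatile_scan():
    """Stand-in for a volatile filter: the rows kept depend on the scan number."""
    calls = count()
    def scan():
        n = next(calls)
        return [b for b in B if b % 2 == n % 2]
    return scan

# With a stable filter the law holds: the parallel result equals the serial one.
print(parallel_join(A_PARTS, stable_scan) == join(A, stable_scan()))

# With the volatile stand-in each worker sees a different B, so the
# gathered result no longer matches a join against a single scan of B.
print(join(A, make_volatile_scan()()))
print(parallel_join(A_PARTS, make_volatile_scan()))
```

In this sketch the second worker joins against a different set of B rows than the first, which is exactly the case where the data set of B is not the same in each partial join.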

> Currently, we don't consider volatile functions as
> parallel-safe by default.

I ran the following SQL on PG12:

zlv=# select count(proname) from pg_proc where provolatile = 'v' and proparallel ='s';
count
-------
100
(1 row)

zlv=# select proname from pg_proc where provolatile = 'v' and proparallel ='s';
proname
----------------------------------------
timeofday
bthandler
hashhandler
gisthandler
ginhandler
spghandler
brinhandler

It seems there are many functions that are both volatile and parallel-safe.

From: Amit Kapila <amit.kapila16@gmail.com>
Sent: Thursday, July 16, 2020 12:07 PM
To: Zhenghua Lyu <zlyu@vmware.com>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Volatile Functions in Parallel Plans

On Wed, Jul 15, 2020 at 6:14 PM Zhenghua Lyu <zlyu@vmware.com> wrote:
>
>
> The first plan:
>
> Finalize Aggregate
>    -> Gather
>          Workers Planned: 2
>          -> Partial Aggregate
>                -> Nested Loop
>                      Join Filter: (t3.c1 = t4.c1)
>                      -> Parallel Seq Scan on t3
>                            Filter: (c1 ~~ '%sss'::text)
>                      -> Seq Scan on t4
>                            Filter: (timeofday() = c1)
>
> The join's left tree is a parallel scan and the right tree is a seq scan.
> This algorithm is correct by the distributive law of
> distributed join:
>        A = [A1 A2 A3...An], B then A join B = gather( (A1 join B) (A2 join B) ... (An join B) )
>
> The correctness of the above law relies on a precondition:
>       The data set of B is the same in each join: (A1 join B) (A2 join B) ... (An join B)
>
> But things get complicated when volatile functions come in. timeofday() is just
> an example to show the idea. The core issue is that volatile functions can return
> different results on successive calls with the same arguments. Thus the following
> piece, the right tree of the join
>                      -> Seq Scan on t4
>                            Filter: (timeofday() = c1)
> cannot be considered consistent across the scan workers.
>

But this won't be consistent even for non-parallel plans. I mean that
for each loop of the join, the "Seq Scan on t4" would give different
results. Currently, we don't consider volatile functions as
parallel-safe by default.
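
The non-parallel case can be sketched in the same spirit. This is a small Python model, not PostgreSQL code: `nested_loop` and `make_volatile_inner` are hypothetical names, and the volatile filter is again a deterministic stand-in that keeps different rows on each rescan.

```python
from itertools import count

def nested_loop(outer_rows, rescan_inner):
    """Serial nested loop: the inner side is re-executed for every outer row."""
    result = []
    for a in outer_rows:
        for b in rescan_inner():
            if a == b:
                result.append((a, b))
    return result

def make_volatile_inner(rows):
    """Stand-in for a scan with a volatile filter: output depends on the rescan number."""
    calls = count()
    def rescan():
        n = next(calls)
        return [b for b in rows if b % 2 == n % 2]
    return rescan

outer = [1, 2, 3, 4]

# A stable inner scan yields the same rows on every loop.
print(nested_loop(outer, lambda: [2, 4]))

# The volatile stand-in yields different inner rows on each loop, so the
# serial result already depends on when each rescan happens.
print(nested_loop(outer, make_volatile_inner([1, 2, 3, 4])))
```

Under these assumptions the serial join has no single well-defined inner data set either, which is the sense in which the result is not consistent even without parallelism.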

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
