Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query
Date
Msg-id CAA4eK1+6U0fOLMdMMk-iMC-6RSM+70p-9YqCVnWTEBH=V73Agg@mail.gmail.com
In response to Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #15324: Non-deterministic behaviour from parallelisedsub-query
List pgsql-bugs
On Tue, Aug 14, 2018 at 9:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Tiikkaja <marko@joh.to> writes:
>> Marking the function parallel safe doesn't seem wrong to me.  The
>> non-parallel-safe part is that the input gets fed to it in different order
>> in different workers.  And I don't really think that to be the function's
>> fault.
>
> So that basically opens the question of whether *any* window function
> calculation can safely be pushed down to parallel workers.
>

I think we can treat it as a parallel-restricted operation.  For
testing, I marked row_number as parallel-restricted in pg_proc, and
I get the plan below:

postgres=# Explain select count(*) from qwr where (a, b) in (select a, row_number() over() from qwr);
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Aggregate  (cost=46522.12..46522.13 rows=1 width=8)
   ->  Hash Semi Join  (cost=24352.08..46362.12 rows=64001 width=0)
         Hash Cond: ((qwr.a = qwr_1.a) AND (qwr.b = (row_number() OVER (?))))
         ->  Gather  (cost=0.00..18926.01 rows=128002 width=8)
               Workers Planned: 2
               ->  Parallel Seq Scan on qwr  (cost=0.00..18926.01 rows=64001 width=8)
         ->  Hash  (cost=21806.06..21806.06 rows=128002 width=12)
               ->  WindowAgg  (cost=0.00..20526.04 rows=128002 width=12)
                     ->  Gather  (cost=0.00..18926.01 rows=128002 width=4)
                           Workers Planned: 2
                           ->  Parallel Seq Scan on qwr qwr_1  (cost=0.00..18926.01 rows=64001 width=4)
(11 rows)

This plan seems okay, though the results of the above parallel
execution are not the same as those of serial execution.  I think the
reason is that we don't get rows from the workers in a predictable
order.
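
To compare the two executions, something like the following sketch
could be used.  The definition and contents of qwr are assumptions
(the thread does not show them), the zeroed parallel cost settings are
only a guess based on the Gather costs in the plan above, and the
pg_proc update is a test-only hack that needs superuser rights.

```sql
-- Assumed table; the thread does not show how qwr was created.
CREATE TABLE qwr (a int, b int);
INSERT INTO qwr SELECT i, i FROM generate_series(1, 128002) AS i;
ANALYZE qwr;

-- Test only: mark row_number as parallel-restricted ('r') in pg_proc.
UPDATE pg_proc SET proparallel = 'r' WHERE proname = 'row_number';

-- Serial baseline: no parallel workers.
SET max_parallel_workers_per_gather = 0;
SELECT count(*) FROM qwr
 WHERE (a, b) IN (SELECT a, row_number() OVER () FROM qwr);

-- Parallel run; the zero costs (an assumption) encourage a Gather.
SET max_parallel_workers_per_gather = 2;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SELECT count(*) FROM qwr
 WHERE (a, b) IN (SELECT a, row_number() OVER () FROM qwr);
```

If the second count differs from the first, or changes between runs,
that would be consistent with the explanation above: the WindowAgg
numbers rows in whatever order the Gather hands them over.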

> Somewhat like the LIMIT/OFFSET case, it seems to me that we could only
> expect to do this safely if the row ordering induced by the WINDOW clause
> can be proven to be fully deterministic.  The planner has no such smarts
> at the moment AFAIR.  In principle you could do it if there were
> partitioning/ordering by a primary key, but I'm not excited about the
> prospects of that being true often enough in practice to justify making
> the check.
>

Yeah, I am also not sure whether it is worth adding the additional
checks.  So, for now, we can treat any window function calculation as
parallel-restricted, and if anybody later has a strong enough reason to
relax the restriction for some particular case, we can consider it.
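
For reference, the provably deterministic case Tom mentions would look
something like the query below.  It assumes that a is a unique key of
qwr, which is not stated in the thread.  With a unique ordering in the
window clause, the numbering no longer depends on the order in which
rows arrive from the workers.

```sql
-- Assumption: qwr.a is unique, so ORDER BY a fixes the row order
-- and therefore the value row_number() assigns to each row.
SELECT count(*) FROM qwr
 WHERE (a, b) IN (SELECT a, row_number() OVER (ORDER BY a) FROM qwr);
```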

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

