On Thu, Nov 26, 2020 at 12:15 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> I took a deep look at the projection logic.
> In most cases, you are right that Gather node does not need projection.
>
> In some rare cases, such as a SubPlan (or an InitPlan, I guess),
> the projection will happen in the Gather node.
>
> The example:
>
> Create table test(i int);
> Create table test2(a int, b int);
> insert into test values(generate_series(1,10000000,1));
> insert into test2 values(generate_series(1,1000,1), generate_series(1,1000,1));
>
> postgres=# explain(verbose, costs off) select test.i,(select i from (select * from test2) as tt limit 1) from test where test.i < 2000;
>                QUERY PLAN
> ----------------------------------------
>  Gather
>    Output: test.i, (SubPlan 1)
>    Workers Planned: 2
>    ->  Parallel Seq Scan on public.test
>          Output: test.i
>          Filter: (test.i < 2000)
>    SubPlan 1
>      ->  Limit
>            Output: (test.i)
>            ->  Seq Scan on public.test2
>                  Output: test.i
>
> In this case, projection is necessary,
> because the subplan will be executed in projection.
>
> If the projection is skipped, the created table will lose some data.
>
Thanks a lot for the use case. Yes, with the current patch the table will
lose the data related to the subplan. On analyzing further, I think we
cannot allow parallel inserts when the Gather node has projections to do,
because the workers cannot perform that projection. So, a non-NULL
ps_ProjInfo in the Gather node is an indication for us to disable parallel
inserts; only the leader can then do the insertions, after the Gather node
has done the required projections.
Thoughts?
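
To make the proposal concrete, here is a rough, standalone sketch of the
check I have in mind. It is not code from the patch: the structs below are
simplified stand-ins that model only the ps_ProjInfo field, and
gather_allows_parallel_insert() is a hypothetical helper name used purely
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified stand-ins for the executor structs (assumption: only the
 * field discussed above is modeled; the real structs have many more).
 */
typedef struct ProjectionInfo
{
    int         placeholder;    /* contents are irrelevant to the check */
} ProjectionInfo;

typedef struct PlanState
{
    ProjectionInfo *ps_ProjInfo;    /* NULL when no projection is needed */
} PlanState;

typedef struct GatherState
{
    PlanState   ps;
} GatherState;

/*
 * Hypothetical helper: workers may insert on their own only if the Gather
 * node has no projection to perform. Otherwise the leader must project
 * the tuples first and do the insertions itself.
 */
static bool
gather_allows_parallel_insert(const GatherState *gstate)
{
    assert(gstate != NULL);
    return gstate->ps.ps_ProjInfo == NULL;
}
```

With the SubPlan example above, the Gather node would carry a projection,
so the helper would return false and the inserts would stay in the leader.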

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com