Thread: parallel foreign scan

parallel foreign scan

From
Manuel Kniep
Date:
Dear hackers,

I’m working on a foreign data wrapper for Kafka [1].
Now I am trying to make it parallel aware, following
the documentation [2].
However, it seems that I can’t make it use more than a
single worker with force_parallel_mode = on.

I wonder if I need to do more than just implement the
needed callback functions to benefit from multiple workers.

Looking at create_foreignscan_path in pathnode.c,
I found that the ForeignPath seems to always set

pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = 0;

Do I need to set these in my GetForeignPaths callback manually?

Is there anything else I need to do?

Thanks

Manuel

Re: parallel foreign scan

From
Kyotaro HORIGUCHI
Date:
At Tue, 15 May 2018 23:09:31 +0200, Manuel Kniep <m.kniep@web.de> wrote in
<D84E3D72-2E83-482B-8EF8-D25F93F1CEA8@web.de>
> Dear hackers,
> 
> I’m working on a foreign data wrapper for Kafka [1].
> Now I am trying to make it parallel aware, following
> the documentation [2].
> However, it seems that I can’t make it use more than a
> single worker with force_parallel_mode = on.
> 
> I wonder if I need to do more than just implementing the
> needed callback function to benefit from multiple workers.
> 
> Looking at create_foreignscan_path in pathnode.c,
> I found that the ForeignPath seems to always set
> 
> pathnode->path.parallel_aware = false;
> pathnode->path.parallel_safe = rel->consider_parallel;
> pathnode->path.parallel_workers = 0;
> 
> Do I need to set these in my GetForeignPaths callback manually?

Right. create_foreignscan_path is used by FDW drivers to create
the path struct. GetForeignPaths() needs to finish the path by
setting those parallel fields and adding partial paths.

# I haven't done that myself, so I'm not sure of the details.

> Is there anything else I need to do?

I think you are trying to collect data from multiple Kafka
servers. This means each server has a dedicated foreign table on a
dedicated foreign server. Parallel execution doesn't fit that
case, since it works on a single base relation (that is, a
table). Parallel Append/Merge Append look a bit different, but
are actually the same in the sense that one base relation is
scanned by multiple workers. Even if you are trying to fetch from
one Kafka stream on multiple workers, I think the FDW driver
doesn't support parallel scanning anyway.

In any case, modifying the FDW driver is unavoidable.

If you are trying to collect data from multiple servers, the
following proposed PoC patch is an implementation of asynchronous
execution for postgres_fdw, and it might be helpful.

https://www.postgresql.org/message-id/20180515.202945.69332784.horiguchi.kyotaro@lab.ntt.co.jp

The postgres_fdw.c part of it is complicated because it supports
shared connections, but it is not that complex if you ignore that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
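
The reply above says that GetForeignPaths() has to finish the path itself by
setting the parallel fields and adding a partial path. The following is a
minimal, self-contained sketch of that decision only. SketchPath and every
sketch_* function are hypothetical stand-ins invented for illustration; they
are not PostgreSQL's Path struct or API. In a real FDW the same fields would
be set on the ForeignPath returned by create_foreignscan_path, and the path
would then presumably be registered with add_partial_path.

```c
#include <stdbool.h>

/* Stand-in for the three fields quoted in the thread.
 * This is NOT the real PostgreSQL Path struct. */
typedef struct SketchPath
{
    bool parallel_aware;
    bool parallel_safe;
    int  parallel_workers;
} SketchPath;

/* Hypothetical policy: one worker per Kafka partition, capped at
 * max_workers; 0 means "do not build a partial path". */
static int
sketch_choose_workers(int kafka_partitions, int max_workers)
{
    if (kafka_partitions <= 1 || max_workers <= 0)
        return 0;
    return kafka_partitions < max_workers ? kafka_partitions : max_workers;
}

/* Mirrors the defaults quoted from create_foreignscan_path. */
static SketchPath
sketch_default_path(bool consider_parallel)
{
    SketchPath path;

    path.parallel_aware = false;
    path.parallel_safe = consider_parallel;
    path.parallel_workers = 0;
    return path;
}

/* Fills in the parallel fields. Returns true when the caller should
 * register the path as a partial path, false when it should stay serial. */
static bool
sketch_make_partial(SketchPath *path, int kafka_partitions, int max_workers)
{
    int workers = sketch_choose_workers(kafka_partitions, max_workers);

    if (!path->parallel_safe || workers == 0)
        return false;

    path->parallel_aware = true;
    path->parallel_workers = workers;
    return true;
}
```

With four partitions and a cap of two workers, the sketch marks the path
parallel aware with two workers; with a single partition, or when the relation
is not parallel safe, the path keeps the serial defaults.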

Re: parallel foreign scan

From
Manuel Kniep
Date:

> I think you are trying to collect data from multiple Kafka
> servers. This means each server has a dedicated foreign table on a
> dedicated foreign server. Parallel execution doesn't fit that
> case, since it works on a single base relation (that is, a
> table). Parallel Append/Merge Append look a bit different, but
> are actually the same in the sense that one base relation is
> scanned by multiple workers. Even if you are trying to fetch from
> one Kafka stream on multiple workers, I think the FDW driver
> doesn't support parallel scanning anyway.

Well, my idea was to scan multiple partitions from a single Kafka
server / topic in parallel.
I’ll look into your suggestion to set up partial paths.

regards

Manuel
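
The idea of scanning several partitions of one topic in parallel needs some
way for workers to divide the partitions among themselves. One possible
approach, sketched below under stated assumptions, is a shared counter from
which each worker claims the next unread partition. SketchParallelState and
the sketch_* functions are hypothetical names invented here, not part of
PostgreSQL or of any Kafka client. In a real FDW the shared state would
presumably live in the dynamic shared memory set up through the
EstimateDSMForeignScan, InitializeDSMForeignScan and
InitializeWorkerForeignScan callbacks, and would more likely use PostgreSQL's
own atomics or a spinlock than C11 atomics.

```c
#include <stdatomic.h>

/* Hypothetical state shared by the leader and all workers. */
typedef struct SketchParallelState
{
    atomic_int next_partition;  /* next partition id not yet claimed */
    int        num_partitions;  /* total partitions in the topic */
} SketchParallelState;

/* Called once by the leader before any worker starts. */
static void
sketch_state_init(SketchParallelState *st, int num_partitions)
{
    atomic_init(&st->next_partition, 0);
    st->num_partitions = num_partitions;
}

/* Called by each worker whenever it needs more work. Returns a partition
 * id that no other caller has received, or -1 once all are claimed. */
static int
sketch_claim_partition(SketchParallelState *st)
{
    int p = atomic_fetch_add(&st->next_partition, 1);

    return p < st->num_partitions ? p : -1;
}
```

Each worker would loop on sketch_claim_partition(), read the claimed partition
to its end, and stop when it gets -1. Because the claim is a single atomic
fetch-and-add, two workers never receive the same partition.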