Thread: parallel foreign scan
Dear hackers,
I’m working on a foreign data wrapper for Kafka [1].
Now I am trying to make it parallel aware, following
the documentation [2].
However, it seems that I can’t make it use more than a
single worker with force_parallel_mode = on.
I wonder if I need to do more than just implement the
needed callback functions to benefit from multiple workers.
Looking at create_foreignscan_path in path_nodes.c
I found that the ForeignPath seems to always set
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = 0;
Do I need to set these in my GetForeignPaths callback manually?
Is there anything else I need to do?
Thanks
Manuel
At Tue, 15 May 2018 23:09:31 +0200, Manuel Kniep <m.kniep@web.de> wrote in <D84E3D72-2E83-482B-8EF8-D25F93F1CEA8@web.de>
> Dear hackers,
>
> I’m working on a foreign database wrapper for Kafka [1]
> Now I am trying to make it parallel aware. Following
> the documentation [2]
> However it seems that I can’t make it use more than a
> single worker with force_parallel_mode = on.
>
> I wonder if I need to do more than just implementing the
> needed callback function to benefit from multiple workers.
>
> Looking at create_foreignscan_path in path_nodes.c
> I found that the ForeignPath seems to always set
>
> pathnode->path.parallel_aware = false;
> pathnode->path.parallel_safe = rel->consider_parallel;
> pathnode->path.parallel_workers = 0;
>
> Do I need so set these in my GetForeignPaths callback manually?

Right. create_foreignscan_path is used by FDW drivers to create
the path struct. GetForeignPaths() needs to finish the path by
setting the parameters and adding partial paths.

# I haven't done that myself, so I'm not sure of the details.

> Is there anything else I need to do?

I think you are trying to collect data from multiple Kafka
servers. That means each server has a dedicated foreign table on a
dedicated foreign server. Parallel execution doesn't fit that
case, since it works on a single base relation (or a
table). Parallel append/merge append look a bit different, but
are actually the same in the sense that one base relation is
scanned by multiple workers. Even if you are trying to fetch from
one Kafka stream with multiple workers, I think the FDW driver
doesn't support parallel scanning anyway. In any case, modifying
the FDW driver is unavoidable.

If you are trying to collect data from multiple servers, the
following proposed PoC patch implements asynchronous execution
for postgres_fdw, and it might be helpful.

https://www.postgresql.org/message-id/20180515.202945.69332784.horiguchi.kyotaro@lab.ntt.co.jp

The postgres_fdw.c part of it is complicated because it supports
shared connections, but it is not that complex if you ignore that.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
> I think you are trying to collect data from multiple Kafka
> servers. That means each server has a dedicated foreign table on a
> dedicated foreign server. Parallel execution doesn't fit that
> case, since it works on a single base relation (or a
> table). Parallel append/merge append look a bit different, but
> are actually the same in the sense that one base relation is
> scanned by multiple workers. Even if you are trying to fetch from
> one Kafka stream with multiple workers, I think the FDW driver
> doesn't support parallel scanning anyway.
Well, my idea was to scan multiple partitions from a single Kafka
server / topic in parallel.
I’ll look into your suggestion to set up partial paths.
regards
Manuel
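
For readers following the thread, here is a minimal sketch of how workers scanning one topic could divide its partitions among themselves. It is not taken from the Kafka FDW or from PostgreSQL. The struct KafkaParallelState and the functions kafka_parallel_state_init and kafka_claim_partition are hypothetical names invented for illustration. It assumes the state would sit in the shared memory area that the FDW parallel callbacks (EstimateDSMForeignScan, InitializeDSMForeignScan, InitializeWorkerForeignScan) set up, and it uses plain C11 atomics as a stand-in for whatever synchronization the real driver would choose.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical shared state for a parallel scan of one Kafka topic.
 * Assumption: in a real FDW this struct would be placed in the shared
 * memory segment prepared by the parallel-scan callbacks.
 */
typedef struct KafkaParallelState
{
    atomic_int  next_partition;     /* next partition index nobody has claimed */
    int         num_partitions;     /* number of partitions in the topic */
} KafkaParallelState;

/* Called once (by the leader) before any worker starts claiming. */
static void
kafka_parallel_state_init(KafkaParallelState *state, int num_partitions)
{
    assert(num_partitions >= 0);
    atomic_init(&state->next_partition, 0);
    state->num_partitions = num_partitions;
}

/*
 * Called by the leader or any worker to pick the next partition to read.
 * The atomic increment hands each partition index to exactly one caller.
 * Returns the partition index, or -1 once every partition is claimed.
 */
static int
kafka_claim_partition(KafkaParallelState *state)
{
    int         p = atomic_fetch_add(&state->next_partition, 1);

    return (p < state->num_partitions) ? p : -1;
}
```

Each participant would call kafka_claim_partition in a loop, read the returned partition to its end, and stop when it gets -1. Handing out whole partitions keeps workers from reading the same messages twice, at the cost of uneven load when partition sizes differ a lot.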