Re: parallel foreign scan - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: parallel foreign scan
Msg-id 20180516.110902.216853091.horiguchi.kyotaro@lab.ntt.co.jp
In response to parallel foreign scan  (Manuel Kniep <m.kniep@web.de>)
Responses Re: parallel foreign scan  (Manuel Kniep <m.kniep@web.de>)
List pgsql-hackers
At Tue, 15 May 2018 23:09:31 +0200, Manuel Kniep <m.kniep@web.de> wrote in
<D84E3D72-2E83-482B-8EF8-D25F93F1CEA8@web.de>
> Dear hackers,
> 
> I’m working on a foreign data wrapper for Kafka [1].
> Now I am trying to make it parallel aware, following
> the documentation [2].
> However, it seems that I can’t make it use more than a
> single worker with force_parallel_mode = on.
> 
> I wonder if I need to do more than just implementing the
> needed callback function to benefit from multiple workers.
> 
> Looking at create_foreignscan_path in pathnode.c,
> I found that the ForeignPath seems to always set
> 
> pathnode->path.parallel_aware = false;
> pathnode->path.parallel_safe = rel->consider_parallel;
> pathnode->path.parallel_workers = 0;
> 
> Do I need to set these manually in my GetForeignPaths callback?

Right. create_foreignscan_path is a helper that FDW drivers use to
build the path struct. GetForeignPaths() has to finish the job by
setting those parameters itself and by adding partial paths (with
add_partial_path) alongside the ordinary ones.

# I haven't done that myself, so I'm not sure about the details.

> Is there anything else I need to do?

I think you are trying to collect data from multiple Kafka
servers. That means each server has a dedicated foreign table on a
dedicated foreign server. Parallel execution doesn't fit that
case, since it works on a single base relation (a table).
Parallel Append/Merge Append look a bit different, but they are
actually the same in the sense that one base relation is scanned
by multiple workers. Even if you are trying to fetch from one
Kafka stream with multiple workers, I think the FDW driver
doesn't support parallel scanning anyway.

In any case, the FDW driver will have to be modified.

If you are trying to collect data from multiple servers, the
following proposed PoC patch, which implements asynchronous
execution for postgres_fdw, might be helpful.

https://www.postgresql.org/message-id/20180515.202945.69332784.horiguchi.kyotaro@lab.ntt.co.jp

The postgres_fdw.c part of it is complicated because it supports
shared connections, but it is not that complex if you ignore that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
