Re: Asynchronous execution on FDW - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Asynchronous execution on FDW
Date
Msg-id 20150707.101935.28049720.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: Asynchronous execution on FDW  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Asynchronous execution on FDW
List pgsql-hackers
Hello, thank you for looking at this.

If it is acceptable to reconstruct the executor nodes to have an
additional return state, PREP_RUN or such (which means one more
call is needed for the first tuple), I'll modify the whole
executor to handle that state in the next patch.

I haven't taken the advice I have received so far in this sense,
but I have come to think that it is the most reasonable way to
solve this.


======
> > - It was a problem when to give the first kick for async exec. It
> >    is not in ExecInit phase, and ExecProc phase does not fit,
> >    too. An extra phase ExecPreProc or something is too
> >    invasive. So I tried "pre-exec callback".
> >
> >    Any init-node can register callbacks on their turn, then the
> >    registered callbacks are called just before ExecProc phase in
> >    executor. The first patch adds functions and structs to enable
> >    this.
> 
> At a quick glance, I think this has all the same problems as starting
> the execution at ExecInit phase. The correct way to do this is to kick
> off the queries in the first IterateForeignScan() call. You said that
> "ExecProc phase does not fit" - why not?

Execution nodes are expected to return the first tuple if one is
available, but asynchronous execution cannot return the first
tuple immediately. Starting execution for the first tuple
simultaneously on every foreign node is more crucial than
asynchronous fetching in many cases, especially for cases like
sort/agg pushdown on FDW.

The reason why ExecProc does not fit is that a first loop that
returns no tuple looks like it would impact too large a portion
of the executor.

It is my mistake that the patch doesn't address the problem of
parameterized paths. Parameterized paths should be executed
within ExecProc loops, so this patch would become like the
following.

- To gain the advantage of kicking execution before the first
  ExecProc loop, non-parameterized paths are started using the
  callback feature this patch provides.

- Parameterized paths need the upper nodes to be executed before
  they start execution, so they should be started in the
  ExecProc loop, but run asynchronously if possible.
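
To illustrate the two start-up paths above, here is a minimal
sketch in C. It is not code from the patch: every name in it
(ExecState, ForeignNode, register_pre_exec, run_pre_exec,
init_node, exec_node) is hypothetical, and "starting" a node is
reduced to setting a flag.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types; these are NOT the PostgreSQL executor structs. */
typedef void (*PreExecCallback)(void *arg);

typedef struct PreExecEntry {
    PreExecCallback fn;
    void *arg;
    struct PreExecEntry *next;
} PreExecEntry;

typedef struct ExecState {
    PreExecEntry *head;     /* callbacks in registration order */
    PreExecEntry *tail;
} ExecState;

typedef struct ForeignNode {
    int parameterized;      /* needs values from upper nodes first */
    int started;            /* has the remote query been kicked off? */
} ForeignNode;

/* Called by a node during its init phase to ask for an early kick. */
static void register_pre_exec(ExecState *es, PreExecCallback fn, void *arg)
{
    PreExecEntry *e = malloc(sizeof(*e));
    assert(e != NULL);
    e->fn = fn;
    e->arg = arg;
    e->next = NULL;
    if (es->tail)
        es->tail->next = e;
    else
        es->head = e;
    es->tail = e;
}

/* Called once by the executor just before the first ExecProc loop.
 * Returns the number of callbacks that were run. */
static int run_pre_exec(ExecState *es)
{
    int n = 0;
    PreExecEntry *e = es->head;
    while (e) {
        PreExecEntry *next = e->next;
        e->fn(e->arg);
        free(e);
        e = next;
        n++;
    }
    es->head = es->tail = NULL;
    return n;
}

/* Stand-in for sending the query to the remote server. */
static void start_async(void *arg)
{
    ((ForeignNode *) arg)->started = 1;
}

/* Init phase: only non-parameterized nodes register the callback. */
static void init_node(ExecState *es, ForeignNode *node)
{
    if (!node->parameterized)
        register_pre_exec(es, start_async, node);
}

/* ExecProc phase: a node that was not started earlier (the
 * parameterized case) starts here. Returns 1 if it started now. */
static int exec_node(ForeignNode *node)
{
    if (node->started)
        return 0;
    start_async(node);
    return 1;
}
```

With one plain node and one parameterized node, run_pre_exec()
starts only the plain one; the parameterized one is started by
its first exec_node() call.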
 

This is rather a makeshift solution for the problem, but
considering the current trend toward parallelism, it might be
time to make the executor fit parallel execution.

If it is acceptable to reconstruct the executor nodes to have an
additional return state, PREP_RUN or such (which means one more
call is needed for the first tuple), I'll modify the whole
executor to handle that state in the next patch.
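
As a rough picture of what such a return state could look like,
here is a small self-contained sketch in C. It is only an
assumption about the shape of the idea, not the planned patch:
ExecStatus, EXEC_PREP_RUN, AsyncScan, async_scan_next and
drain_all are hypothetical names, and rows are plain ints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; EXEC_PREP_RUN plays the role of PREP_RUN. */
typedef enum { EXEC_TUPLE, EXEC_PREP_RUN, EXEC_DONE } ExecStatus;

typedef struct AsyncScan {
    const int *rows;    /* stand-in for the remote result set */
    size_t nrows;
    size_t next;        /* index of the next row to return */
    int started;        /* has the first call already happened? */
} AsyncScan;

/* The first call only starts the scan and reports EXEC_PREP_RUN,
 * so the caller needs one more call to get the first tuple. */
static ExecStatus async_scan_next(AsyncScan *s, int *out)
{
    if (!s->started) {
        s->started = 1;
        return EXEC_PREP_RUN;
    }
    if (s->next >= s->nrows)
        return EXEC_DONE;
    *out = s->rows[s->next++];
    return EXEC_TUPLE;
}

/* A parent that first gives every unstarted child its kick, and
 * only then fetches tuples child by child. Returns the number of
 * values written to out (at most cap). */
static size_t drain_all(AsyncScan *c, size_t nc, int *out, size_t cap)
{
    size_t n = 0;
    int value = 0;

    for (size_t i = 0; i < nc; i++)
        if (!c[i].started)
            (void) async_scan_next(&c[i], &value);

    for (size_t i = 0; i < nc; i++)
        while (n < cap && async_scan_next(&c[i], &value) == EXEC_TUPLE)
            out[n++] = value;

    return n;
}
```

The point of the extra state is visible in drain_all(): all
children are started before any of them is asked for a tuple, at
the cost that every caller has to handle a call that returns no
tuple.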

I hate my stupidity if this kind of solution is what you
suggested by "do it in ExecProc" :(

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


