I forgot to mention.
At Tue, 12 Jul 2016 11:04:17 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20160712.110417.145469826.horiguchi.kyotaro@lab.ntt.co.jp>
> Cooled down then measured performance again.
>
> Here are the corrected results, briefly, for now.
>
> At Mon, 11 Jul 2016 19:07:22 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
> <20160711.190722.145849861.horiguchi.kyotaro@lab.ntt.co.jp>
> > Anyway I need some time to cool down..
>
> I recalled that I had a Makefile.custom containing
> CFLAGS="-O0". Removing that gave me a saner result.
Unlike the previous measurements, the remote side in these
measurements is unpatched -O2 postgres, so the differences
come only from the local-side changes.
> patched- -O2
>
> table  10-average(ms)  stddev  runtime-diff from unpatched(%)
> t0             441.78    0.32     3.4
> pl             201.77    0.32    13.6
> pf0           6619.22   18.99   -19.7
> pf1           1800.72   32.72   -78.0
> ---
> unpatched- -O2
>
> table  10-average(ms)  stddev
> t0             427.21    0.42
> pl             177.54    0.25
> pf0           8250.42   23.29
> pf1           8206.02   12.91
>
> ==========
>
> 3% slower for local 1*seqscan (2-parallel)
> 14% slower for append-4*seqscan (no-parallel)
> 19% faster for append-4*foreignscan (all scans on one connection)
> 78% faster for append-4*foreignscan (each scan has a dedicated connection)
>
> ExecProcNode might be able to be optimized a bit.
> ExecAppend seems to need some fix.
>
> In addition to the above, I will try a reentrant ExecAsyncWaitForNode
> or something.
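
For reference, the runtime-diff column in the quoted tables can be
re-derived from the two sets of averages. The following is only a
minimal sketch: the formula (patched - unpatched) / unpatched * 100 is
an assumption about how the column was computed, and the names
runtime_diff_percent, patched, unpatched and diffs are made up for
illustration.

```python
# Averages (ms) copied from the patched and unpatched tables quoted above.
patched = {"t0": 441.78, "pl": 201.77, "pf0": 6619.22, "pf1": 1800.72}
unpatched = {"t0": 427.21, "pl": 177.54, "pf0": 8250.42, "pf1": 8206.02}


def runtime_diff_percent(patched_ms, unpatched_ms):
    """Assumed formula: relative runtime change versus unpatched, in percent."""
    return (patched_ms - unpatched_ms) / unpatched_ms * 100.0


# Negative values mean the patched build ran faster than the unpatched one.
diffs = {name: runtime_diff_percent(patched[name], unpatched[name])
         for name in patched}

for name, diff in diffs.items():
    print(f"{name:4s} {diff:+7.2f}%")
```

Under that assumed formula, each value lands within 0.1 percentage
point of the reported column.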
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center