Re: Asynchronous execution on FDW - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Asynchronous execution on FDW
Msg-id 20150724.151059.102807210.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: Asynchronous execution on FDW  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Responses Re: Asynchronous execution on FDW
List pgsql-hackers
Hello,

At Thu, 23 Jul 2015 09:38:39 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in
<9A28C8860F777E439AA12E8AEA7694F80111BCEC@BPXM15GP.gisp.nec.co.jp>
> I expected workloads like single shot scan on a partitioned large
> fact table on DWH system. Yep, if workload is expected to rescan
> so frequently, its expected cost shall be higher (by the cost to
> launch bgworker) than existing Append, then planner will kick out
> this path.
> 
> Regarding of interaction between Limit and ParallelMergeAppend,
> it is probably the best scenario, isn't it? If Limit picks up
> the least 1000rows from a partitioned table consists of 20 child
> tables, ParallelMergeAppend can launch 20 parallel jobs that
> picks up the least 1000rows from the child relations for each.
> Probably, it is same job done in pass_down_bound() of nodeLimit.c.

Yes, I was a bit confused. That scenario is one of the least
problematic cases.
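
To make the quoted scenario concrete, here is a minimal sketch in
Python rather than executor code. It assumes each child relation is
just a list of comparable values; the function names are hypothetical
and are not part of any patch. Each child returns only its own N
smallest rows (the bounded sort that pass_down_bound() asks for), and
the parent merges those short sorted runs and stops after N rows.

```python
import heapq
from itertools import islice


def bounded_child_sorts(children, n):
    """Return, for each child, only its n smallest rows in sorted order.

    This stands in for a bounded (top-N) sort in each child; in the
    parallel case every child could do this step independently.
    """
    return [heapq.nsmallest(n, child) for child in children]


def merge_append_with_limit(children, n):
    """Merge the per-child sorted runs and keep the first n rows overall."""
    runs = bounded_child_sorts(children, n)
    # heapq.merge lazily merges already-sorted inputs; islice plays the
    # role of the Limit node and stops pulling after n rows.
    return list(islice(heapq.merge(*runs), n))


if __name__ == "__main__":
    children = [[5, 1, 9], [4, 8, 2], [7, 3, 6]]
    print(merge_append_with_limit(children, 4))
```

No child ever hands more than N rows to the parent, so the work
abandoned when Limit stops early is bounded per child.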

> > As for ForeignScan, it is merely an API for FDW and does nothing
> > substantial so it would have nothing special to do. As for
> > postgres_fdw, current patch restricts one execution per one
> > foreign server at once by itself. We would have to provide
> > another execution management if we want to have two or more
> > simultaneous scans per one foreign server at once.
> >
> Yep, your 4th patch defines a new callback to FdwRoutines and
> 5th patch implements postgres_fdw specific portion.
> It shall work for distributed / sharded database environment well,
> however, its benefit is around ForeignScan only.
> Once management node kicks underlying SeqScan, ForeignScan or
> others in parallel, it also enables to run local heap scan
> asynchronously.

I suppose SeqScan doesn't need an asynchronous kick, since its
startup cost is next to nothing. (Would prefetching the first
several pages speed up seqscans?) On the other hand, sort/hash would
be an area where asynchronous execution is effective.
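
As a rough illustration of why start-up cost is what matters here, the
following Python sketch models each child as a coroutine with an
artificial start-up delay. Everything in it (the names, the delays,
the use of asyncio) is an assumption made for illustration and is not
how the executor or postgres_fdw is implemented. All children are
kicked first and then drained in Append order, so their start-up
delays overlap instead of adding up.

```python
import asyncio


async def child_scan(name, startup_delay, rows):
    """Hypothetical child node: wait out its start-up cost, then return rows."""
    await asyncio.sleep(startup_delay)  # stands in for a remote sort/hash/join
    return [(name, row) for row in rows]


async def append_with_async_kick(children):
    """Start every child before consuming any of them, then drain in order."""
    # Kick: create_task schedules each child so it can make progress
    # while the parent is still waiting on an earlier child.
    tasks = [asyncio.create_task(child_scan(*child)) for child in children]
    result = []
    for task in tasks:  # Append semantics: the first child's rows come first
        result.extend(await task)
    return result


def run_append(children):
    return asyncio.run(append_with_async_kick(children))


if __name__ == "__main__":
    children = [("a", 0.02, [1, 2]), ("b", 0.01, [3])]
    print(run_append(children))
```

With a near-zero start-up delay, as for a plain SeqScan, the kick buys
almost nothing; with a large delay per child, the total wait approaches
the largest single delay rather than the sum.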

> > Sorry for the focusless discussion but does this answer some of
> > your question?
> >
> Hmm... Its advantage is still unclear for me. However, it is not
> fair to hijack this thread by my idea.

It would be more advantageous once join/sort pushdown on FDW arrives,
where the start-up cost could be extremely high...

> I'll submit my design proposal about ParallelAppend towards the
> next commit-fest. Please comment on.

OK, I'll be there.

> > > Expected waste of CPU or I/O is common problem to be solved, however, it does
> > > not need to add a special case handling to ForeignScan, I think.
> > > How about your opinion?
> > 
> > I agree with you that ForeignScan as the wrapper for FDWs don't
> > need anything special for the case. I suppose for now that
> > avoiding the penalty from abandoning too many speculatively
> > executed scans (or other works on bg worker like sorts) would be
> > a business of the upper node of FDWs, or somewhere else.
> > 
> > However, I haven't dismissed the possibility that some common
> > works related to resource management could be integrated into
> > executor (or even into planner), but I see none for now.
> >
> I also agree with it is "eventually" needed, but may not be supported
> in the first version.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


