Thread: Asynchronous execution on FDW
Hello. This is the new version of the FDW async execution feature.

The status of this feature as of the last commitfest is as follows.

- Async execution is valuable to have.
- But giving the first kick in the ExecInit phase is wrong.

So the design outline of this version is as follows.

- The patch set consists of three parts. The first is the infrastructure on the core side, the second is the code to enable asynchronous execution in postgres_fdw, and the third is a set of three alternative methods to adapt the fetch size, which makes asynchronous execution more effective.

- The problem was when to give the first kick for async execution. It cannot be done in the ExecInit phase, and the ExecProc phase does not fit either. An extra phase such as ExecPreProc would be too invasive, so I tried a "pre-exec callback". Any node can register callbacks during its initialization, and the registered callbacks are then called just before the ExecProc phase in the executor. The first patch adds the functions and structs to enable this.

- The second part is unchanged from the previous version. It adds PgFdwConn as an extended PGconn with some members to support asynchronous execution.

  Asynchronous execution is kicked only for the first ForeignScan node on a given foreign server, and that state lasts until the next scan comes. This behavior is mainly controlled in fetch_more_data(). It limits the number of simultaneous executions on one foreign server to 1. I chose this because no reasonable method to limit the multiplicity of execution on a *single peer* has been found so far.

- The third part consists of three trial implementations of an adaptive fetch size feature.

  The first is duration-based adaptation. The patch increases the fetch size with every FETCH execution but tries to keep the duration of every FETCH below 500 ms. It is not promising, though, because it looks very unstable; the behavior is nearly unpredictable.

  The second is based on a byte-based FETCH feature. This patch adds to the FETCH command an argument that limits the number of bytes (octets) to send. But this might be an over-exposure of the internals: the size is counted based on the internal representation of a tuple, and the client needs to send the overhead of its internal tuple representation in bytes. This is effective but quite ugly.

  The third is the simplest and most straightforward way: it adds a foreign table option to specify fetch_size. Its effect is also in doubt, since the size of tuples for one foreign table varies according to the list of returned columns. But it is predictable for users and is a necessary knob for those who want to tune it. A foreign server could also have the same option as the default for its foreign tables, but this patch has not added it.

The attached patches are the following.

- 0001-Add-infrastructure-of-pre-execution-callbacks.patch
  Infrastructure of pre-execution callback

- 0002-Allow-asynchronous-remote-query-of-postgres_fdw.patch
  FDW asynchronous execution feature

- 0003a-Add-experimental-POC-adaptive-fetch-size-feature.patch
  Adaptive fetch size alternative 1: duration based control

- 0003b-POC-Experimental-fetch_by_size-feature.patch
  Adaptive fetch size alternative 2: FETCH by size

- 0003c-Add-foreign-table-option-to-set-fetch-size.patch
  Adaptive fetch size alternative 3: Foreign table option.

regards,
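As a rough illustration of the duration-based adaptation described in the message above, a controller could look like the following Python sketch. It is not the code in the 0003a patch: the function name, the doubling factor and the size bounds are assumptions made for this example, and only the 500 ms target comes from the message.

```python
def next_fetch_size(current_size, last_duration_ms,
                    target_ms=500, growth=2,
                    min_size=100, max_size=100000):
    """Pick the row count for the next FETCH (hypothetical policy).

    Grow the fetch size while the previous FETCH stayed within the
    target duration; otherwise shrink it in proportion to the overshoot.
    """
    if last_duration_ms > target_ms:
        # The last FETCH was too slow: scale down toward the target.
        scaled = int(current_size * target_ms / last_duration_ms)
        return max(min_size, scaled)
    # The last FETCH was fast enough: try a larger batch next time.
    return min(max_size, current_size * growth)
```

Because the next size depends on a single measured duration, one slow FETCH (for example, caused by a busy remote server) changes the batch size sharply, which is one way to picture the instability the message mentions.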
Ouch! I mistakenly made two CF entries for this patch. Could someone remove this entry for me?

https://commitfest.postgresql.org/5/290/

The correct entry is "/5/291/"

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
On Thu, Jul 2, 2015 at 3:07 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Ouch! I mistakenly made two CF entries for this patch. Could
> someone remove this entry for me?
>
> https://commitfest.postgresql.org/5/290/
>
> The correct entry is "/5/291/"

Done.
-- 
Michael
Thank you.

At Thu, 2 Jul 2015 16:02:27 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqTs0YCwXedt1P=JjxFJeoj9UzLzkLuiX8=JdtPYUtNwwg@mail.gmail.com>
> On Thu, Jul 2, 2015 at 3:07 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > Ouch! I mistakenly made two CF entries for this patch. Could
> > someone remove this entry for me?
> >
> > https://commitfest.postgresql.org/5/290/
> >
> > The correct entry is "/5/291/"
>
> Done.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
On 07/02/2015 08:48 AM, Kyotaro HORIGUCHI wrote:
> - It was a problem when to give the first kick for async exec. It
>   is not in ExecInit phase, and ExecProc phase does not fit,
>   too. An extra phase ExecPreProc or something is too
>   invasive. So I tried "pre-exec callback".
>
>   Any init-node can register callbacks on their turn, then the
>   registerd callbacks are called just before ExecProc phase in
>   executor. The first patch adds functions and structs to enable
>   this.

At a quick glance, I think this has all the same problems as starting the execution at ExecInit phase. The correct way to do this is to kick off the queries in the first IterateForeignScan() call. You said that "ExecProc phase does not fit" - why not?

- Heikki
Hello, thank you for looking at this.

If it is acceptable to restructure the executor nodes to have an additional return state such as PREP_RUN (which means one more call is needed for the first tuple), I'll modify the whole executor to handle that state in the next patch. I haven't taken the advice I have received so far in this sense, but I have come to think that it is the most reasonable way to solve this.

======
> > - It was a problem when to give the first kick for async exec. It
> >   is not in ExecInit phase, and ExecProc phase does not fit,
> >   too. An extra phase ExecPreProc or something is too
> >   invasive. So I tried "pre-exec callback".
> >
> >   Any init-node can register callbacks on their turn, then the
> >   registerd callbacks are called just before ExecProc phase in
> >   executor. The first patch adds functions and structs to enable
> >   this.
>
> At a quick glance, I think this has all the same problems as starting
> the execution at ExecInit phase. The correct way to do this is to kick
> off the queries in the first IterateForeignScan() call. You said that
> "ExecProc phase does not fit" - why not?

Execution nodes are expected to return the first tuple if available, but asynchronous execution cannot return the first tuple immediately. Simultaneous execution for the first tuple on every foreign node is more crucial than asynchronous fetching in many cases, especially for cases like sort/agg pushdown on FDW.

The reason why ExecProc does not fit is that a first loop that returns no tuple looks like it would affect too large a portion of the executor.

It is my mistake that the patch doesn't address the problem of parameterized paths. Parameterized paths should be executed within ExecProc loops, so this patch would work as follows.

- To gain the advantage of kicking execution before the first ExecProc loop, non-parameterized paths are started using the callback feature this patch provides.

- Parameterized paths need the upper nodes to be executed before they start execution, so they should be started in the ExecProc loop, but run asynchronously if possible.

This is rather a makeshift solution for the problem, but considering the current trend toward parallelism, it might be time to make the executor fit for parallel execution.

If it is acceptable to restructure the executor nodes to have an additional return state such as PREP_RUN (which means one more call is needed for the first tuple), I'll modify the whole executor to handle that state in the next patch.

I hate my stupidity if this is the kind of solution you suggested by "do it in ExecProc" :(

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
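The split proposed in this message (non-parameterized scans kicked through a pre-exec callback, parameterized scans started lazily inside the ExecProc loop) can be pictured with a small Python model. This is only a sketch of the idea: the class and method names below are hypothetical and do not correspond to the functions or structs added by the patch.

```python
class ExecutorModel:
    """Holds callbacks registered while nodes are being initialized."""

    def __init__(self):
        self.pre_exec_callbacks = []

    def register_pre_exec_callback(self, callback):
        self.pre_exec_callbacks.append(callback)

    def run_pre_exec_callbacks(self):
        # Called once, after node initialization and before the first
        # tuple is requested from the plan tree.
        for callback in self.pre_exec_callbacks:
            callback()


class ForeignScanModel:
    def __init__(self, name, parameterized, executor, log):
        self.name = name
        self.log = log
        self.started = False
        if not parameterized:
            # Non-parameterized scans can be kicked early.
            executor.register_pre_exec_callback(self.start)

    def start(self):
        if not self.started:
            self.started = True
            self.log.append("start " + self.name)

    def next_tuple(self):
        # Parameterized scans reach this point unstarted and begin here,
        # once the upper node has supplied the parameter values.
        self.start()
        return None  # row retrieval is omitted in this model


def demo():
    log = []
    executor = ExecutorModel()
    ft1 = ForeignScanModel("ft1", False, executor, log)
    ft4 = ForeignScanModel("ft4", True, executor, log)
    executor.run_pre_exec_callbacks()  # only ft1 starts here
    ft4.next_tuple()                   # ft4 starts on first use
    ft1.next_tuple()                   # already started, no new entry
    return log
```

In this model `demo()` records the start of `ft1` before any tuple is requested and the start of `ft4` only when it is first asked for a tuple.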
Hello, this is the new version of this patch.

At Tue, 07 Jul 2015 10:19:35 +0900, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote
> This is rather a makeshift solution for the problem, but
> considering current trend of parallelism, it might the time to
> make the executor to fit parallel execution.
>
> If it is acceptable to reconstruct the executor nodes to have
> additional return state PREP_RUN or such (which means it needs
> one more call for the first tuple) , I'll modify the whole
> executor to handle the state in the next patch to do so.

I made a patchset to do this. The details and some examples are shown after the summary below.

- I provided an infrastructure for asynchronous (simultaneous) execution of multiple execnodes belonging to one node, such as joins.

- It should address the "parameterized plan" problem.

- The infrastructure is a bit intrusive but simple, and it will be usable by any node that supports asynchronous execution (none so far except FDW; others need some modification in core). So async execution for postgres_fdw has now become an example for the infrastructure. It might be nice to start a backend worker for a promising async request for a sort node.

- The postgres_fdw part is almost the same as the previous one.

The detailed explanation of the patchset follows.

============
The patchset consists of five patches (plus one for debug messages).

1. Add infrastructure for run-state of executor node.

   Currently executor nodes have binary run-states: one is !TupIsNull(slot), which indicates that the next tuple may come from the node, and the other is TupIsNull(slot), which indicates that no more tuples will come. This patch expands this to four states and keeps the value in the PlanState struct.

   Inited : just after initialization.
   Started: execution has started but no tuple has been retrieved yet. This state can be skipped.
   Running: returning tuples.
   Done   : no more tuples to return. This is equivalent to TupIsNull(slot).

   The nodes Group, ModifyTable, SetOp and WindowAgg had their own state flags in their own *State structs that are replaceable by the new states, so they are moved to the new state set in this patch. This patch does not change the current behavior.

2. Change all tuple-returning execnodes to maintain the new run-state appropriately.

   The remaining nodes are modified by this patch to keep the state consistent with the TupIsNull() state at the ExecProcNode level. This patch does not change the current behavior either. (I feel that the Done state would be nothing but an encumbrance in maintenance. The state is not referenced anywhere.)

3. Add a feature to start nodes asynchronously.

   With this patch, all nodes that have more than one child node can execute their children asynchronously. A node tries to start its children asynchronously if its state is "Inited" when entering the Exec* functions. An async request to a node that has just one child is simply propagated to the child, and leaf nodes such as scans decide whether to be async or not. Currently no leaf node can be async except postgres_fdw.

   NestLoop may run a parameterized plan, so it is treated specially in StartNestLoop so that parameterized plans will not be started asynchronously.

   In StartHashJoin, whether the inner (hash) node is executed or not is judged by logic similar to that of ExecHashJoin.

   Even after this patch is applied, no leaf node can start asynchronously, so the behavior of the executor is still unchanged.

4. Add StartForeignScan to FdwRoutine

   Add a new entry function to accept asynchronous execution requests from the core.

5. Allow asynchronous remote query of postgres_fdw.

   This is almost the same as the previous version, except that it runs on the new infrastructure and adds a new server/foreign table option, allow_async.

   The first foreign scan on a given server will start execution asynchronously if requested. Apart from the async start, every successive fetch for the same foreign scan will be done asynchronously.

Currently there's no means to observe what it is doing from outside, so the additional sixth patch outputs debug messages about asynchronous execution. However, there is currently no test code for it, and I'm at a loss as to what to do for the test.

FWIW, I provide two examples of running asynchronous execution.

regards,

===== Example

CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw
       OPTIONS (host 'localhost', dbname 'postgres');
CREATE SERVER sv2 FOREIGN DATA WRAPPER postgres_fdw
       OPTIONS (host 'localhost', dbname 'postgres');
CREATE USER MAPPING FOR CURRENT_USER SERVER sv1;
CREATE USER MAPPING FOR CURRENT_USER SERVER sv2;

CREATE TABLE lp (a int, b int);
CREATE TABLE lt1 () INHERITS (lp);
CREATE TABLE lt2 () INHERITS (lp);
CREATE TABLE lt3 () INHERITS (lp);
CREATE TABLE lt4 () INHERITS (lp);

CREATE TABLE fp (LIKE lp);
CREATE FOREIGN TABLE ft1 () INHERITS (fp) SERVER sv1 OPTIONS (table_name 'lt1');
CREATE FOREIGN TABLE ft2 () INHERITS (fp) SERVER sv1 OPTIONS (table_name 'lt1');
CREATE FOREIGN TABLE ft3 () INHERITS (fp) SERVER sv2 OPTIONS (table_name 'lt1');
CREATE FOREIGN TABLE ft4 () INHERITS (fp) SERVER sv2 OPTIONS (table_name 'lt1');

INSERT INTO lt1 (SELECT a, a FROM generate_series(0, 999) a);
INSERT INTO lt2 (SELECT a+1000, a FROM generate_series(0, 999) a);
INSERT INTO lt3 (SELECT a+2000, a FROM generate_series(0, 999) a);
INSERT INTO lt4 (SELECT a+3000, a FROM generate_series(0, 999) a);

;; TEST FOR SIMPLE APPEND

=# SELECT * FROM fp;
 1 LOG: pg_fdw: [ft1/sv1/0x293a580] Async exec started.
 2 LOG: pg_fdw: [ft2/sv1/0x293a580] Async exec denied.
 3 LOG: pg_fdw: [ft3/sv2/0x2898c70] Async exec started.
 4 LOG: pg_fdw: [ft4/sv2/0x2898c70] Async exec denied.
 5 LOG: pg_fdw: [ft1/sv1/0x293a580] Async fetch
   ....
 6 LOG: pg_fdw: [ft1/sv1/0x293a580] Async fetch
 7 LOG: pg_fdw: [ft2/sv1/0x293a580] Sync fetch.
 8 LOG: pg_fdw: [ft2/sv1/0x293a580] Async fetch
   ...
 9 LOG: pg_fdw: [ft2/sv1/0x293a580] Async fetch
10 LOG: pg_fdw: [ft3/sv2/0x2898c70] Async fetch
   ....
11 LOG: pg_fdw: [ft3/sv2/0x2898c70] Async fetch
12 LOG: pg_fdw: [ft4/sv2/0x2898c70] Sync fetch.
14 LOG: pg_fdw: [ft4/sv2/0x2898c70] Async fetch
   ...
15 LOG: pg_fdw: [ft4/sv2/0x2898c70] Async fetch

;; The notation inside the square brackets is
;; <table name>/<server name>/<pointer of connection>.
;;
;; At 1-4, each foreign server denied async for its second scan (ft2/ft4).
;;
;; At 7, reading a different table from 6 made it a sync fetch, but
;; the successive fetches afterward are async.
;;
;; ft2 and ft3 are on different servers, so 10 is an async fetch for
;; the query started asynchronously at 3.
;;
;; At 12 the same thing as at 7 occurred.

;; TEST FOR PARAMETERIZED NESTLOOP

=# SET enable_hashjoin TO false;
=# SET enable_mergejoin TO false;
=# SET enable_material TO false;
=# ALTER FOREIGN TABLE ft4 OPTIONS (ADD use_remote_estimate 'true');
=# SELECT ft4.a FROM ft1 JOIN ft4 ON ft1.b = ft4.b WHERE ft1.a BETWEEN 800 AND 1000;
 1 LOG: pg_fdw: [ft1/sv1/0x293a580] Async exec started.
 2 LOG: pg_fdw: [ft1/sv1/0x293a580] Async fetch
 3 LOG: pg_fdw: [ft4/sv2/0x2898c70] Sync fetch.
 4 LOG: pg_fdw: [ft4/sv2/0x2898c70] Sync fetch.
   ...
 5 LOG: pg_fdw: [ft4/sv2/0x2898c70] Sync fetch.
 6 LOG: pg_fdw: [ft1/sv1/0x293a580] Async fetch
 7 LOG: pg_fdw: [ft4/sv2/0x2898c70] Sync fetch.
   ...
 8 LOG: pg_fdw: [ft4/sv2/0x2898c70] Sync fetch.
 9 LOG: pg_fdw: [ft1/sv1/0x293a580] Async fetch

;; ft4 did not even try to go async since the inner side (ft4) is parameterized.
;; All fetches for the inner side (ft4) were executed synchronously.
;;
;; Meanwhile, ft1 was continuously reading asynchronously.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
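The four run-states and the "one asynchronous scan per foreign server" rule seen in log lines 1-4 above can be modeled in a few lines of Python. This is a simplified sketch, not the patch's implementation: the class, its methods and the `busy_servers` set are hypothetical, and the model covers only the initial kick, not the later hand-over of the connection between scans.

```python
from enum import Enum


class RunState(Enum):
    INITED = 1   # just initialized
    STARTED = 2  # execution started, no tuple retrieved yet
    RUNNING = 3  # returning tuples
    DONE = 4     # no more tuples


class ForeignScanModel:
    def __init__(self, table, server, rows):
        self.table = table
        self.server = server
        self.rows = list(rows)
        self.state = RunState.INITED

    def start_async(self, busy_servers):
        """Grant an async start only to the first scan on each server."""
        if self.state is not RunState.INITED or self.server in busy_servers:
            return False
        busy_servers.add(self.server)
        self.state = RunState.STARTED
        return True

    def next_tuple(self):
        # A denied scan goes straight from INITED to RUNNING or DONE,
        # which matches the note that the Started state can be skipped.
        if not self.rows:
            self.state = RunState.DONE
            return None
        self.state = RunState.RUNNING
        return self.rows.pop(0)


def start_all(scans):
    busy_servers = set()
    return [(s.table, "started" if s.start_async(busy_servers) else "denied")
            for s in scans]
```

Calling `start_all` on scans for ft1 and ft2 (both on sv1) and ft3 and ft4 (both on sv2), in that order, yields started, denied, started, denied, the same pattern as the first four log lines.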
Hi,

> Currently there's no means to observe what it is doing from
> outside, so the additional sixth patch is to output debug
> messages about asynchronous execution.

The sixth patch did not contain one of the messages shown in the example. Attached is the revised version. The other patches are not changed.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
On Fri, Jul 3, 2015 at 4:41 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> At a quick glance, I think this has all the same problems as starting the
> execution at ExecInit phase. The correct way to do this is to kick off the
> queries in the first IterateForeignScan() call. You said that "ExecProc
> phase does not fit" - why not?

What exactly are those problems?

I can think of these:

1. If the scan is parametrized, we probably can't do it for lack of knowledge of what they will be. This seems easy; just don't do it in that case.

2. It's possible that we're down inside some subtree of the plan that won't actually get executed. This is trickier.

Consider this:

Append
-> Foreign Scan
-> Foreign Scan
-> Foreign Scan
<repeat 17 more times>

If we don't start each foreign scan until the first tuple is fetched, we will not get any benefit here, because we won't fetch the first tuple from query #2 until we finish reading the results of query #1. If the result of the Append node will be needed in its entirety, we really, really want to launch all of those queries as early as possible. OTOH, if there's a Limit node with a small limit on top of the Append node, that could be quite wasteful. We could decide not to care: after all, if our limit is satisfied, we can just bang the remote connections shut, and if they wasted some CPU, well, tough luck for them. But it would be nice to be smarter. I'm not sure how, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
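The Append example can be made concrete with a toy timing model in Python. It is only a back-of-the-envelope sketch under assumed numbers: each remote query is given a hypothetical startup time (before its first row is ready) and a transfer time, the function names are made up for this example, and the model ignores the Limit case and any wasted remote work.

```python
def lazy_total(startup, transfer):
    """Each scan is started only when Append reaches it."""
    return sum(s + t for s, t in zip(startup, transfer))


def eager_total(startup, transfer):
    """All scans are kicked at time 0; Append still reads them in order."""
    now = 0
    for s, t in zip(startup, transfer):
        # Wait only if this scan's first row is not ready yet.
        now = max(now, s) + t
    return now
```

With 20 scans that each need 2 time units to start and 1 unit to transfer, the lazy variant pays the startup cost 20 times (60 units in total), while the eager variant pays it once (22 units), which is the benefit described for the case where the whole Append result is needed.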
Hello, thank you for the comment.

At Fri, 17 Jul 2015 14:34:53 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoaiJK1svzw_GkFU+zsSxciJKFELqu2AOMVUPhpSFw4BsQ@mail.gmail.com>
> On Fri, Jul 3, 2015 at 4:41 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > At a quick glance, I think this has all the same problems as starting the
> > execution at ExecInit phase. The correct way to do this is to kick off the
> > queries in the first IterateForeignScan() call. You said that "ExecProc
> > phase does not fit" - why not?
>
> What exactly are those problems?
>
> I can think of these:
>
> 1. If the scan is parametrized, we probably can't do it for lack of
> knowledge of what they will be. This seems easy; just don't do it in
> that case.

We can give an early kick to foreign scans only for the first shot if we do it outside (before) the ExecProc phase.

Nestloop
-> SeqScan
-> Append
   -> Foreign (Index) Scan
   -> Foreign (Index) Scan
   ..

This plan presupposes a (to some extent) precise estimate for the remote query, but async execution within the ExecProc phase would be effective in this case.

> 2. It's possible that we're down inside some subtree of the plan that
> won't actually get executed. This is trickier.

In the current postgres_fdw, this is handled by simply abandoning the queued result and then closing the cursor.

> Consider this:
>
> Append
> -> Foreign Scan
> -> Foreign Scan
> -> Foreign Scan
> <repeat 17 more times>
>
> If we don't start each foreign scan until the first tuple is fetched,
> we will not get any benefit here, because we won't fetch the first
> tuple from query #2 until we finish reading the results of query #1.
> If the result of the Append node will be needed in its entirety, we
> really, really want to launch of those queries as early as possible.
> OTOH, if there's a Limit node with a small limit on top of the Append
> node, that could be quite wasteful.

That is the nature of speculative execution, but the Limit will be pushed down onto every Foreign Scan in the near future.

> We could decide not to care: after all, if our limit is
> satisfied, we can just bang the remote connections shut, and if
> they wasted some CPU, well, tough luck for them. But it would
> be nice to be smarter. I'm not sure how, though.

An appropriate fetch size will cap the harm, and for postgres_fdw the case will be handled as I mentioned above.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kyotaro HORIGUCHI
> Sent: Wednesday, July 22, 2015 4:10 PM
> To: robertmhaas@gmail.com
> Cc: hlinnaka@iki.fi; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Asynchronous execution on FDW
>
> Hello, thank you for the comment.
>
> At Fri, 17 Jul 2015 14:34:53 -0400, Robert Haas <robertmhaas@gmail.com> wrote
> in <CA+TgmoaiJK1svzw_GkFU+zsSxciJKFELqu2AOMVUPhpSFw4BsQ@mail.gmail.com>
> > On Fri, Jul 3, 2015 at 4:41 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > > At a quick glance, I think this has all the same problems as starting the
> > > execution at ExecInit phase. The correct way to do this is to kick off the
> > > queries in the first IterateForeignScan() call. You said that "ExecProc
> > > phase does not fit" - why not?
> >
> > What exactly are those problems?
> >
> > I can think of these:
> >
> > 1. If the scan is parametrized, we probably can't do it for lack of
> > knowledge of what they will be. This seems easy; just don't do it in
> > that case.
>
> We can put an early kick to foreign scans only for the first shot
> if we do it outside (before) ExecProc phase.
>
> Nestloop
> -> SeqScan
> -> Append
>    -> Foreign (Index) Scan
>    -> Foreign (Index) Scan
>    ..
>
> This plan premises precise (even to some extent) estimate for
> remote query but async execution within ExecProc phase would be
> in effect for this case.
>
>
> > 2. It's possible that we're down inside some subtree of the plan that
> > won't actually get executed. This is trickier.
>
> As for current postgres_fdw, it is done simply abandoning queued
> result then close the cursor.
>
> > Consider this:
> >
> > Append
> > -> Foreign Scan
> > -> Foreign Scan
> > -> Foreign Scan
> > <repeat 17 more times>
> >
> > If we don't start each foreign scan until the first tuple is fetched,
> > we will not get any benefit here, because we won't fetch the first
> > tuple from query #2 until we finish reading the results of query #1.
> > If the result of the Append node will be needed in its entirety, we
> > really, really want to launch of those queries as early as possible.
> > OTOH, if there's a Limit node with a small limit on top of the Append
> > node, that could be quite wasteful.
>
> It's the nature of speculative execution, but the Limit will be
> pushed down onto every Foreign Scans near future.
>
> > We could decide not to care: after all, if our limit is
> > satisfied, we can just bang the remote connections shut, and if
> > they wasted some CPU, well, tough luck for them. But it would
> > be nice to be smarter. I'm not sure how, though.
>
> Appropriate fetch size will cap the harm and the case will be
> handled as I mentioned above as for postgres_fdw.
>
Horiguchi-san,

Let me ask an elementary question.

If we have a ParallelAppend node that kicks a background worker process for each underlying child node in parallel, does ForeignScan need to do something special?

Expected waste of CPU or I/O is a common problem to be solved; however, I think it does not require adding special-case handling to ForeignScan. What is your opinion?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Hello,

> Let me ask an elemental question.
>
> If we have ParallelAppend node that kicks a background worker process for
> each underlying child node in parallel, does ForeignScan need to do something
> special?

I don't see the point of the background worker in your story, but at least for ParallelMergeAppend, scans would frequently be discontinued by an upper Limit, so one more state, say "setup" (which means a worker is allocated but not started), would be useful, and the driver node might need to manage the number of async executions. Or, inversely, the driven nodes might do so.

As for ForeignScan, it is merely an API for FDWs and does nothing substantial, so it would have nothing special to do. As for postgres_fdw, the current patch by itself restricts execution to one at a time per foreign server. We would have to provide another execution management mechanism if we want two or more simultaneous scans per foreign server.

Sorry for the unfocused discussion, but does this answer some of your question?

> Expected waste of CPU or I/O is common problem to be solved, however, it does
> not need to add a special case handling to ForeignScan, I think.
> How about your opinion?

I agree with you that ForeignScan, as the wrapper for FDWs, doesn't need anything special for this case. For now I suppose that avoiding the penalty of abandoning too many speculatively executed scans (or other work on bg workers, like sorts) would be the business of the node above the FDWs, or of somewhere else.

However, I haven't dismissed the possibility that some common work related to resource management could be integrated into the executor (or even into the planner), but I see none for now.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
> > If we have ParallelAppend node that kicks a background worker process for
> > each underlying child node in parallel, does ForeignScan need to do something
> > special?
>
> Although I don't see the point of the background worker in your
> story but at least for ParalleMergeAppend, it would frequently
> discontinues to scan by upper Limit so one more state, say setup
> - which mans a worker is allocated but not started- would be
> useful and the driver node might need to manage the number of
> async execution. Or the driven nodes might do so inversely.
>
I had in mind workloads like a single-shot scan of a large partitioned fact table on a DWH system. Yes, if the workload is expected to rescan frequently, the expected cost will be higher than that of the existing Append (by the cost of launching bgworkers), and the planner will then reject this path.

Regarding the interaction between Limit and ParallelMergeAppend, it is probably the best scenario, isn't it? If Limit picks up the least 1000 rows from a partitioned table consisting of 20 child tables, ParallelMergeAppend can launch 20 parallel jobs, each of which picks up the least 1000 rows from its child relation. It is probably the same job that pass_down_bound() in nodeLimit.c does.

> As for ForeignScan, it is merely an API for FDW and does nothing
> substantial so it would have nothing special to do. As for
> postgres_fdw, current patch restricts one execution per one
> foreign server at once by itself. We would have to provide
> another execution management if we want to have two or more
> simultaneous scans per one foreign server at once.
>
Yes, your 4th patch defines a new callback in FdwRoutine and the 5th patch implements the postgres_fdw-specific portion. It should work well for a distributed / sharded database environment; however, its benefit is limited to ForeignScan. Once a management node kicks the underlying SeqScan, ForeignScan or other nodes in parallel, local heap scans can also run asynchronously.

> Sorry for the focusless discussion but does this answer some of
> your question?
>
Hmm... Its advantage is still unclear to me. However, it is not fair to hijack this thread with my idea. I'll submit my design proposal about ParallelAppend to the next commitfest. Please comment on it.

> > Expected waste of CPU or I/O is common problem to be solved, however, it does
> > not need to add a special case handling to ForeignScan, I think.
> > How about your opinion?
>
> I agree with you that ForeignScan as the wrapper for FDWs don't
> need anything special for the case. I suppose for now that
> avoiding the penalty from abandoning too many speculatively
> executed scans (or other works on bg worker like sorts) would be
> a business of the upper node of FDWs, or somewhere else.
>
> However, I haven't dismissed the possibility that some common
> works related to resource management could be integrated into
> executor (or even into planner), but I see none for now.
>
I also agree that it is "eventually" needed, but it may not be supported in the first version.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
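The Limit scenario discussed in this message (each child only needs to deliver its own least N rows, after which a merge picks the overall least N) can be illustrated in Python. This is a sequential sketch of the bound being pushed down to the children, not a model of ParallelMergeAppend or of pass_down_bound() itself; the function name and the use of plain integer rows are assumptions for the example.

```python
import heapq
from itertools import islice


def merge_append_with_limit(children, limit):
    """Return the `limit` smallest rows across all child relations."""
    # Each child produces at most `limit` rows, already sorted. In the
    # scenario from the thread, every child could do this in parallel.
    bounded = [heapq.nsmallest(limit, child) for child in children]
    # Merge the sorted child outputs and stop once the limit is reached.
    return list(islice(heapq.merge(*bounded), limit))
```

No child ever has to return more than `limit` rows, so with 20 children and a limit of 1000 the merge step sees at most 20000 rows regardless of how large the child tables are.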
Hello,

At Thu, 23 Jul 2015 09:38:39 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in <9A28C8860F777E439AA12E8AEA7694F80111BCEC@BPXM15GP.gisp.nec.co.jp>
> I expected workloads like single shot scan on a partitioned large
> fact table on DWH system. Yep, if workload is expected to rescan
> so frequently, its expected cost shall be higher (by the cost to
> launch bgworker) than existing Append, then planner will kick out
> this path.
>
> Regarding of interaction between Limit and ParallelMergeAppend,
> it is probably the best scenario, isn't it? If Limit picks up
> the least 1000rows from a partitioned table consists of 20 child
> tables, ParallelMergeAppend can launch 20 parallel jobs that
> picks up the least 1000rows from the child relations for each.
> Probably, it is same job done in pass_down_bound() of nodeLimit.c.

Yes, I was a bit confused. That scenario is one of the least problematic cases.

> > As for ForeignScan, it is merely an API for FDW and does nothing
> > substantial so it would have nothing special to do. As for
> > postgres_fdw, current patch restricts one execution per one
> > foreign server at once by itself. We would have to provide
> > another execution management if we want to have two or more
> > simultaneous scans per one foreign server at once.
> >
> Yep, your 4th patch defines a new callback to FdwRoutines and
> 5th patch implements postgres_fdw specific portion.
> It shall work for distributed / sharded database environment well,
> however, its benefit is around ForeignScan only.
> Once management node kicks underlying SeqScan, ForeignScan or
> others in parallel, it also enables to run local heap scan
> asynchronously.

I suppose SeqScan doesn't need an async kick, since its startup cost is next to nothing. (Would fetching the first several pages boost seqscans?) On the other hand, sort/hash would be a field where asynchronous execution is effective.

> > Sorry for the focusless discussion but does this answer some of
> > your question?
> >
> Hmm... Its advantage is still unclear to me. However, it is not
> fair to hijack this thread with my idea.

It would be more advantageous once join/sort pushdown on FDW arrives, where start-up cost could be extremely high...

> I'll submit my design proposal for ParallelAppend to the
> next commitfest. Please comment on it there.

OK, I'll join the discussion there.

> > > Expected waste of CPU or I/O is common problem to be solved, however, it does
> > > not need to add a special case handling to ForeignScan, I think.
> > > How about your opinion?
> >
> > I agree with you that ForeignScan as the wrapper for FDWs don't
> > need anything special for the case. I suppose for now that
> > avoiding the penalty from abandoning too many speculatively
> > executed scans (or other works on bg worker like sorts) would be
> > a business of the upper node of FDWs, or somewhere else.
> >
> > However, I haven't dismissed the possibility that some common
> > works related to resource management could be integrated into
> > executor (or even into planner), but I see none for now.
> >
> I also agree that it is "eventually" needed, but it may not be supported
> in the first version.

regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
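An aside to illustrate the Limit / ParallelMergeAppend scenario quoted in this mail: a minimal Python sketch, not PostgreSQL code. It assumes each child relation can be modelled as an in-memory list of sortable values, and the names child_top_n and merge_append_limit are hypothetical. Each child job keeps only its N smallest rows (roughly the bounded sort that the quoted text attributes to pass_down_bound()), a thread pool stands in for the parallel jobs, and the parent merges the sorted runs and stops after N rows.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def child_top_n(rows, n):
    """One child job: return the n smallest rows of this child, sorted."""
    return heapq.nsmallest(n, rows)


def merge_append_limit(children, n):
    """Run one bounded top-n job per child, then merge and keep n rows.

    No child ever hands more than n rows to the parent, so the parent
    merges at most len(children) * n rows regardless of table size.
    """
    with ThreadPoolExecutor() as pool:
        runs = list(pool.map(lambda rows: child_top_n(rows, n), children))
    # heapq.merge lazily merges already-sorted inputs; islice applies the limit.
    return list(islice(heapq.merge(*runs), n))


if __name__ == "__main__":
    # 20 hypothetical child tables with 50 rows each.
    children = [[(i * 7 + c * 13) % 101 for i in range(50)] for c in range(20)]
    print(merge_append_limit(children, 10))
```

The result is the same as sorting all rows of all children and taking the first N, but each child's work and output are bounded by N.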
Hello Horiguchi-san,

> > > As for ForeignScan, it is merely an API for FDW and does nothing
> > > substantial so it would have nothing special to do. As for
> > > postgres_fdw, current patch restricts one execution per one
> > > foreign server at once by itself. We would have to provide
> > > another execution management if we want to have two or more
> > > simultaneous scans per one foreign server at once.
> > >
> > Yep, your 4th patch defines a new callback to FdwRoutines and
> > 5th patch implements postgres_fdw specific portion.
> > It shall work for distributed / sharded database environment well,
> > however, its benefit is around ForeignScan only.
> > Once management node kicks underlying SeqScan, ForeignScan or
> > others in parallel, it also enables to run local heap scan
> > asynchronously.
>
> I suppose SeqScan doesn't need an async kick, since its startup cost is
> next to nothing. (Would fetching the first several pages
> boost seqscans?) On the other hand, sort/hash would be a field
> where asynchronous execution is effective.
>
Startup cost is not the only advantage of asynchronous execution. If a background worker prefetches the records that will be read soon while other tasks are in progress, the latency to fetch the next record is much lower than in the usual execution path.

Suppose the next record is in neither the shared buffers nor the operating system's page cache. First, the upper node calls heap_getnext() to fetch the next record, then it looks up the target block in the shared buffers, then it issues a read(2) system call, and then the operating system puts the calling process to sleep until the block has been read from storage. If an asynchronous worker has already gone through this painful code path and the records to be read are ready at the head of the queue, the I/O wait time is reduced dramatically.

> > > Sorry for the focusless discussion but does this answer some of
> > > your question?
> > >
> > Hmm... Its advantage is still unclear to me. However, it is not
> > fair to hijack this thread with my idea.
>
> It would be more advantageous once join/sort pushdown on FDW arrives,
> where start-up cost could be extremely high...
>
Not only FDW. I intend to combine ParallelAppend with another idea I posted previously, to run table joins in parallel. In the case of partitioned foreign tables, the planner probably needs to consider (1) FDW scan + local serial join, (2) FDW scan + local parallel join, or (3) FDW remote join, according to cost.

* [idea] table partition + hash join:
  http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010F672B@BPXM15GP.gisp.nec.co.jp

Anyway, let's have a further discussion in another thread.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
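An aside on the prefetching argument in this mail: the idea of a worker that goes through the slow read path ahead of time and leaves rows ready in a queue can be sketched as follows. This is a minimal Python illustration under stated assumptions, not executor code: a thread stands in for the background worker, fetch_rows is a hypothetical callable that returns an iterable of rows (standing in for the blocking scan), and prefetching_scan and depth are made-up names.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the scan


def prefetching_scan(fetch_rows, depth=64):
    """Yield rows from fetch_rows(), read ahead by a background thread.

    The worker pays the blocking-read latency while the consumer is busy
    with other work; the consumer normally finds the next row already
    queued. At most `depth` rows are buffered at any time.
    """
    buf = queue.Queue(maxsize=depth)

    def worker():
        try:
            for row in fetch_rows():
                buf.put(row)  # blocks when the consumer falls behind
        finally:
            buf.put(_DONE)  # always unblock the consumer, even on error

    # Daemon thread: if the consumer abandons the scan early, the worker
    # stays blocked on put() and is only reclaimed at interpreter exit.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        row = buf.get()
        if row is _DONE:
            return
        yield row


if __name__ == "__main__":
    def slow_scan():
        # Hypothetical stand-in for a scan whose reads block on I/O.
        for i in range(5):
            yield i

    print(list(prefetching_scan(slow_scan, depth=2)))
```

The bounded queue is a deliberate choice here: it caps how much speculative work is wasted if the upper node stops reading, which is the penalty discussed earlier in the thread. Error propagation from the worker to the consumer is left out of this sketch.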
I've marked this as rejected in the commitfest, because others are working on a more general solution with parallel workers. That's still work-in-progress, and it's not certain if it's going to make it into 9.6, but if it does it will largely render this obsolete. We can revisit this patch later in the release cycle, if the parallel scan patch hasn't solved the same use case by then. - Heikki
On Mon, Aug 10, 2015 at 3:23 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I've marked this as rejected in the commitfest, because others are
> working on a more general solution with parallel workers. That's still
> work-in-progress, and it's not certain if it's going to make it into
> 9.6, but if it does it will largely render this obsolete. We can revisit
> this patch later in the release cycle, if the parallel scan patch hasn't
> solved the same use case by then.

I think the really important issue for this patch is the one discussed here:

http://www.postgresql.org/message-id/CA+TgmoaiJK1svzw_GkFU+zsSxciJKFELqu2AOMVUPhpSFw4BsQ@mail.gmail.com

You raised an important issue there but never really expressed an opinion on the points I raised, here or on the other thread. And neither did anyone else except the patch author who, perhaps unsurprisingly, thinks it's OK. I wish we could get more discussion about that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Aug 10, 2015 at 3:23 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > I've marked this as rejected in the commitfest, because others are
> > working on a more general solution with parallel workers. That's still
> > work-in-progress, and it's not certain if it's going to make it into
> > 9.6, but if it does it will largely render this obsolete. We can revisit
> > this patch later in the release cycle, if the parallel scan patch hasn't
> > solved the same use case by then.
>
> I think the really important issue for this patch is the one discussed here:
>
> http://www.postgresql.org/message-id/CA+TgmoaiJK1svzw_GkFU+zsSxciJKFELqu2AOMVUPhpSFw4BsQ@mail.gmail.com

I agree that it'd be great to figure out the answer to #2, but I'm also of the opinion that we can either let the user tell us through the use of the GUCs proposed in the patch or simply not worry about the potential for time wastage associated with starting them all at once, as you suggested there.

> You raised an important issue there but never really expressed an
> opinion on the points I raised, here or on the other thread. And
> neither did anyone else except the patch author who, perhaps
> unsurprisingly, thinks it's OK. I wish we could get more discussion
> about that.

When I read the proposal, I had the same reaction that it didn't seem like quite the right place, and it further bothered me that it was specific to FDWs.

Perhaps not surprisingly, as I authored it, but I'm still a fan of my proposal #1 here:

http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net

More generally, I completely agree with the position (I believe yours, but I might be misremembering) that we want to have this async capability independently of and in addition to parallel scan. I don't believe one obviates the advantages of the other.

Thanks!

Stephen