Re: [idea] more aggressive join pushdown on postgres_fdw - Mailing list pgsql-hackers
From | Kouhei Kaigai |
---|---|
Subject | Re: [idea] more aggressive join pushdown on postgres_fdw |
Date | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F8010F47DA@BPXM15GP.gisp.nec.co.jp Whole thread Raw |
In response to | [idea] more aggressive join pushdown on postgres_fdw (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
Responses |
Re: [idea] more aggressive join pushdown on postgres_fdw
(Robert Haas <robertmhaas@gmail.com>)
|
List | pgsql-hackers |
> On Sat, May 30, 2015 at 9:03 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > Yesterday, JPUG held an unconference event at Tokyo, and > > Hanada-san had a talk about join-pushdown feature of > > postgres_fdw. > > At this talk, someone proposed an interesting idea to > > make join pushdown more aggressive/effective. > > Let me share it with pgsql-hackers. > > > > He said, we may have a workload to join a large foreign- > > scan and a small local-scan regardless of the plan type. > > > > For example: > > joinrel (expected nrows = 5) > > + outerrel ForeignScan (expected nrows = 1000000) > > + innerrel LocalScan (expected nrows = 5) > > > > In this case, we may be able to run the entire joinrel > > on the remote side then fetch just 5 rows, if fdw-driver > > construct VALUES() clause according to the contents of > > LocalScan then makes an entire join query with another > > one kept in ForeignScan. > > > > If above ForeignScan have the following remote query, > > SELECT a, b, c FROM t0 WHERE d < 1000000 > > we may be able to construct the query below to run remote > > join with local (small) relation. > > > > SELECT a, b, c, x, y FROM > > (SELECT a, b, c FROM t0 WHERE d < 1000000) AS ft > > JOIN > > (VALUES (1,'aaa'), (2,'bbb'), (3,'ccc'), > > (4,'ddd'), (5,'eee')) AS lt (x, y) > > ON ft.a = lt.x > > > > The VALUES clauses can be mechanically constructed according > > to the result set of LocalScan, and it is not difficult to > > make such a remote query on top of the existing ForeignScan. > > In the result, it will reduce amount of network traffic and > > CPU cycles to form/deform tuples dramatically. > > > > I don't intend to implement this idea urgently (of course, > > join pushdown for both ForeignScan case has higher priority), > > however, it makes sense to keep the future direction in mind. > > > > Also, as an aside, even though Hanada-san mentioned ForeignScan > > does not need an infrastructure to initialize child path nodes, > > this idea may require ForeignScan to have local child path. > > Neat idea. This ties into something I've thought about and mentioned > before: what if the innerrel is local, but there's a replicated copy > on the remote server? Perhaps both cases are worth thinking about at > some point. > I think, here is both merit and de-merit for each. It implies either of them never always-better-strategy. * Push out local table as VALUES(...) clause Good: No restriction to functions/operators in the local scan or underlying plan node. Bad: High cost for data format modification (HeapTupleSlot => VALUES(...) clause in text), and 2-way data transfer. * Remote join between foreign table and replicated table Good: Data already exists on remote side, no need to kick out contents of local relation (and no need to consume CPU cycle to make VALUES() clause). Bad: Functions/operators are restricted as existing postgres_fdw is doing. Only immutable and built-in ones are availableto run on the remote side. BTW, do we need either of tables being foreign table, if entire database is (synchronously) replicated? Also, loopback server may be a candidate even if not replicated (although it may be an entrance of deadlock heaven). Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
pgsql-hackers by date: