Re: pgsql_fdw, FDW for PostgreSQL server - Mailing list pgsql-hackers
From | Kohei KaiGai |
---|---|
Subject | Re: pgsql_fdw, FDW for PostgreSQL server |
Date | |
Msg-id | CADyhKSVpqp4uiYReA+FrReCTtLt1BN8QuEo0LhN7cPkcHnLFkQ@mail.gmail.com Whole thread Raw |
In response to | Re: pgsql_fdw, FDW for PostgreSQL server (Shigeru Hanada <shigeru.hanada@gmail.com>) |
Responses |
Re: pgsql_fdw, FDW for PostgreSQL server
Re: pgsql_fdw, FDW for PostgreSQL server |
List | pgsql-hackers |
Hi Harada-san, I checked the "fdw_helper_funcs_v3.patch", "pgsql_fdw_v5.patch" and "pgsql_fdw_pushdown_v1.patch". My comments are below. [BUG] Even though pgsql_fdw tries to push-down qualifiers being executable on the remove side at the deparseSql(), it does not remove qualifiers being pushed down from the baserel->baserestrictinfo, thus, these qualifiers are eventually executed twice. See the result of this EXPLAIN. postgres=# EXPLAIN SELECT * FROM ft1 WHERE a > 2 AND f_leak(b); QUERY PLAN ------------------------------------------------------------------------------------------------------ Foreign Scan on ft1 (cost=107.43..122.55 rows=410 width=36) Filter: (f_leak(b) AND (a > 2)) Remote SQL: DECLARE pgsql_fdw_cursor_0SCROLL CURSOR FOR SELECT a, b FROM public.t1 WHERE (a > 2) (3 rows) My expectation is (a > 2) being executed on the remote-side and f_leak(b) being executed on the local-side. But, filter of foreign-scan on ft1 has both of qualifiers. It has to be removed, if a RestrictInfo get pushed-down. [Design comment] I'm not sure the reason why store_result() uses MessageContext to save the Tuplestorestate within PgsqlFdwExecutionState. The source code comment says it is used to avoid memory leaks in error cases. I also have a similar experience on implementation of my fdw module, so, I could understand per-scan context is already cleared at the timing of resource-release-callback, thus, handlers to external resource have to be saved on separated memory context. In my personal opinion, the right design is to declare a memory context for pgsql_fdw itself, instead of the abuse of existing memory context. (More wise design is to define sub-memory-context for each foreign-scan, then, remove the sub-memory-context after release handlers.) [Design comment] When "BEGIN" should be issued on the remote-side? The connect_pg_server() is an only chance to issue "BEGIN" command at the remote-side on connection being opened. However, the connection shall be kept unless an error is not raised. Thus, the remote-side will continue to work within a single transaction block, even if local-side iterates a pair of "BEGIN" and "COMMIT". I'd like to suggest to close the transaction block at the timing of either end of the scan, transaction or sub-transaction. [Comment to Improve] Also, which transaction isolation level should be specified in this case? An idea is its isolation level is specified according to the current isolation level on the local-side. (Of course, it is your choice if it is not supported right now.) [Comment to improve] It seems to me the design of exprFunction is not future-proof, if we add a new node type that contains two or more function calls, because it can return an oid of functions. I think, the right design is to handle individual node-types within the large switch statement at foreign_expr_walker(). Of course, it is just my sense. [Comment to improve] The pgsql_fdw_handler() allocates FdwRoutine using makeNode(), then it set function-pointers on each fields. Why don't you return a pointer to statically declared FdwRoutine variable being initialized at compile time, like: static FdwRoutine pgsql_fdw_handler = { .type = T_FdwRoutine, .PlanForeignScan = pgsqlPlanForeignScan, .ExplainForeignScan = pgsqlExplainForeignScan, .BeginForeignScan = pgsqlBeginForeignScan, .IterateForeignScan = pgsqlIterateForeignScan, .ReScanForeignScan = pgsqlReScanForeignScan, .EndForeignScan = pgsqlEndForeignScan, }; Datum pgsql_fdw_handler(PG_FUNCTION_ARGS) { PG_RETURN_POINTER(&pgsql_fdw_handler); } [Question to implementation] At pgsqlIterateForeignScan(), it applies null-check on festate->tuples and bool-checks on festete->cursor_opened. Do we have a possible scenario that festate->tuples is not null, but festate->cursor_opened, or an opposite combination? If null-check on festate->tuples is enough to detect the first call of the iterate callback, it is not my preference to have redundant flag. Thanks, 2011年12月14日15:02 Shigeru Hanada <shigeru.hanada@gmail.com>: > (2011/12/13 14:46), Tom Lane wrote: >> Shigeru Hanada<shigeru.hanada@gmail.com> writes: >>> Agreed. How about to add a per-column boolean FDW option, say >>> "pushdown", to pgsql_fdw? Users can tell pgsql_fdw that the column can >>> be pushed down safely by setting this option to true. >> >> [ itch... ] That doesn't seem like the right level of granularity. >> ISTM the problem is with whether specific operators have the same >> meaning at the far end as they do locally. If you try to attach the >> flag to columns, you have to promise that *every* operator on that >> column means what it does locally, which is likely to not be the >> case ever if you look hard enough. Plus, having to set the flag on >> each individual column of the same datatype seems pretty tedious. > > Indeed, I too think that labeling on each columns is not the best way, > but at that time I thought that it's a practical way, in a way. IOW, I > chose per-column FDW options as a compromise between never-push-down and > indiscriminate-push-down. > > Anyway, ISTM that we should consider various mapping for > functions, operators and collations to support push-down in general > way, but it would be hard to accomplish in this CF. > > Here I'd like to propose three incremental patches: > > 1) fdw_helper_funcs_v3.patch: This is not specific to pgsql_fdw, but > probably useful for every FDWs which use FDW options. This patch > provides some functions which help retrieving FDW options from catalogs. > This patch also enhances document about existing FDW helper functions. > > 2) pgsql_fdw_v5.patch: This patch provides simple pgsql_fdw > which does *NOT* support any push-down. All data in remote table are > retrieved for each foreign scan, and conditions are always evaluated on > local side. This is safe about semantics difference between local and > remote, but very inefficient especially for large remote tables. > > 3) pgsql_fdw_pushdown_v1.patch: This patch adds limited push-down > capability to pgsql_fdw which is implemented by previous patch. The > criteria for pushing down is little complex. I modified pgsql_fdw to > *NOT* push down conditions which contain any of: > > a) expression whose result collation is valid > b) expression whose input collation is valid > c) expression whose result type is user-defined > d) expression which uses user-defined function > e) array expression whose elements has user-defined type > f) expression which uses user-defined operator > g) expression which uses mutable function > > As the result, pgsql_fdw can push down very limited conditions such as > numeric comparisons, but it would be still useful. I hope that these > restriction are enough to avoid problems about semantics difference > between remote and local. > > To implement d), I added exprFunction to nodefuncs.c which returns Oid > of function which is used in the expression node, but I'm not sure that > it should be there. Should we have it inside pgsql_fdw? > > I'd like to thank everyone who commented on this topic! > > Regards, > -- > Shigeru Hanada -- KaiGai Kohei <kaigai@kaigai.gr.jp>
pgsql-hackers by date: