Re: pgsql_fdw, FDW for PostgreSQL server - Mailing list pgsql-hackers
From | Shigeru HANADA |
---|---|
Subject | Re: pgsql_fdw, FDW for PostgreSQL server |
Date | |
Msg-id | 4F61F75A.6090000@gmail.com Whole thread Raw |
In response to | Re: pgsql_fdw, FDW for PostgreSQL server (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: pgsql_fdw, FDW for PostgreSQL server
|
List | pgsql-hackers |
I fixed pgsql_fdw to use new FDW API. Also I fixed to use own deparser to generate remote queries, so these patch don't contain changes for backend codes any more. Now planning foreign scan is done in the steps below: 1) in GerForeignRelSize, pgsql_fdw generates simple SELECT statement which has no WHERE clause from given information such as PlannerInfo and RelOptInfo. This query string is used to execute remote EXPLAIN in order to estimate number of rows which will be returned from the scan. Remote EXPLAIN doesn't return such number directly, so pgsql_fdw calls set_baserel_size_estimates function which calculates number of rows with selectivity of quals and reltuples. It also estimates width, but currently no statistic is available for foreign tables, so pgsql_fdw overrides it with width value provided by remote EXPLAIN. If we support ANALYZE for foreign tables, we would be able to defer EXPLAIN until GetForeignPaths, because local statistics are enough information to estimate number of rows returned by the scan. 2) in GetForeignPaths, first, pgsql_fdw considers a very simple Path which doesn't push any expression down to remote end. It is like SeqScan for regular tables. Backend will filter the result with all quals in baserestrictinfo. In push-down supported version, pgsql_fdw also considers another Path which pushes conditions as much as possible. This would reduce the amount of data transfer, and clocks used to convert strings to tuples at local side. pgsql_fdw emits another EXPLAIN for this path, though we might be able to omit. Planner can choose either path with basis of their costs. If pgsql_fdw can know that the remote table is indexed, pgsql_fdw would be able to consider sorted Path for each remote index. 3) in GetForeignPlan, pgsql_fdw creates only one ForeignScan node from given best_path. Currently pgsql_fdw uses SQL-level cursor in order to avoid out-of-memory by huge result set, so we need to construct several SQL statements for the plan. Old implementation has created SQL statements in Path phase, but now it's deferred until Plan phase. This change would avoid possible unnecessary string operation. I worry that I've misunderstood the purpose of fdw_exprs... Although the patches are still WIP, especially in WHERE push-down part, but I'd like to post them so that I can get feedback of the design as soon as possible. (2012/03/11 1:34), Tom Lane wrote: >> 1) IIUC, GetForeignRelSize should set baserel->rows to the number of >> rows the ForeignScan node returns to upper node, but not the number >> of rows FDW returns to core executor, right? > > It should be the number of rows estimated to pass the baserestrictinfo > restriction clauses, so yeah, not the same as what the FDW would return, > except in cases where all the restriction clauses are handled internally > by the FDW. > >> BTW, once Fujita-san's ANALYZE support patch is merged, we will be >> able to get rows estimatation easily by calling clauselist_selectivity >> with baserel->tuples and baserestrictinfo. Otherwise, pgsql_fdw >> would still need to execute EXPLAIN on remote side to get meaningful >> rows estimation. > > Yeah, one of the issues for that patch is how we see it coexisting with > the option of doing a remote-side EXPLAIN. It seems not so easy to determine whether remote EXPLAIN is better from local statistics. An easy way is having a per-relation FDW option like "use_local_stats" or something for pgsql_fdw, but it doesn't sound right because other FDWs have same problem... >> 2) ISTM that pgsql_fdw needs to execute EXPLAIN on remote side for each >> possible remote query to get meaningful costs estimation, and it >> requires pgsql_fdw to generate SQL statements in GetForeignPaths. >> I worry that I've misunderstood intention of your design because >> you've mentioned postponing SQL deparsing to createplan time. > > If you want to get the cost estimates that way, then yes, you'd be > needing to do some SQL-statement-construction earlier than final plan > generation. But it's not apparent to me that those statements would > necessarily be the same as, or even very similar to, what the final > queries would be. For instance, you'd probably try to reduce parameters > to constants for estimation purposes. Hm, I though that using queries same as final ones has no overhead, if we don't need to deparse clauses redundantly. In current implementation, WHERE clause is deparsed only once for a scan, and basis of the query (SELLECT ... FROM ...) is also deparseed only once. Indeed string copy is not avoidable, but I feel that's not big problem. Thoughts? Regards, -- Shigeru Hanada
Attachment
pgsql-hackers by date: