Thread: Function scan push-down using SQL/MED syntax
I'm working on the SQL/MED foreign table support for the "Function scan
push-down" item:
http://wiki.postgresql.org/wiki/ClusterFeatures#Function_scan_push-down

The current research is described in:
http://wiki.postgresql.org/wiki/SQL/MED
and the code is in:
http://repo.or.cz/w/pgsql-fdw.git

The present code consists of two parts:
1. Foreign table syntax, catalogs, and a connection manager in the core.
2. A contrib/postgresql_fdw extension module.

SQL/MED could take over some of the roles of dblink and PL/Proxy. It also
raises the same question that pgpool faces: where to execute functions,
locally or remotely.

The major part of the proposal is that table functions (SRFs) should be
able to access the ScanState during execution, because the ScanState holds
the filtering conditions. A Foreign Data Wrapper can then reconstruct a
SQL query from those conditions and send it to the external server.

The current design is very similar to an "executor node hook".
ExecXxxForeignScan() almost passes the processing through to each FDW
routine. SQL-based FDWs can push the conditions down to their foreign
servers, while plain-data FDWs (including a CSV FDW) can hand the
filtering over to the default executor.

Comments and suggestions are very welcome on the design and
implementation. There are also some known issues listed in the Open
questions section.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
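To make the split between pushed-down and locally evaluated conditions concrete, here is a minimal Python sketch of the dispatch idea. This is illustrative only; the real code is the C executor, and every class and function name below is hypothetical:

```python
# Illustrative sketch of the ForeignScan dispatch idea, not the actual
# PostgreSQL C API.  Quals are represented as plain SQL-ish strings.

class SqlFdw:
    """A SQL-based FDW: it can translate quals into a remote WHERE clause."""
    def can_push_down(self, qual):
        return True

class CsvFdw:
    """A plain-data FDW: it returns every row and leaves filtering local."""
    def can_push_down(self, qual):
        return False

def plan_foreign_scan(fdw, quals):
    """Split quals into those the FDW will handle remotely and those the
    default executor must still evaluate locally."""
    remote = [q for q in quals if fdw.can_push_down(q)]
    local = [q for q in quals if not fdw.can_push_down(q)]
    return remote, local
```

For a SQL-based FDW every qual lands in the remote list and can be deparsed into the query sent to the foreign server; for the CSV case everything stays local, which is exactly the "hand the filtering over to the default executor" path.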
On 3/3/10 8:52 PM, Takahiro Itagaki wrote:
> The current design is very similar to an "executor node hook".
> ExecXxxForeignScan() almost passes the processing through to each FDW
> routine. SQL-based FDWs can push the conditions down to their foreign
> servers, while plain-data FDWs (including a CSV FDW) can hand the
> filtering over to the default executor.

It would be really good to also have access to the scan criteria as text,
a hash, or some other generally manipulable data structure, for foreign
data which is not SQL-based.

Finally, if you are making the scan more visible, a nice side effect
would be the ability to log all the scan criteria on a particular table
(whether it's an FDW or not); that would help a lot with indexing.

--Josh Berkus
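One hedged sketch of what a "generally manipulable" form of the scan criteria might look like (the record structure and names here are invented for illustration): each qual as a small dict, plus a trivial deparser back to text for logging or for SQL-based wrappers:

```python
# Hypothetical representation of scan criteria as plain data, so that a
# non-SQL FDW (or a logging hook) can inspect them without parsing SQL.

def deparse(quals):
    """Render structured quals back into SQL-ish text."""
    return " AND ".join(
        f"{q['column']} {q['op']} {q['value']!r}" for q in quals
    )

quals = [
    {"column": "price", "op": ">", "value": 100},
    {"column": "region", "op": "=", "value": "JP"},
]
```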
2010/3/4 Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>:
> I'm working on the SQL/MED foreign table support for the "Function scan
> push-down" item:
>
> Comments and suggestions are very welcome on the design and
> implementation. There are also some known issues listed in the Open
> questions section.

I haven't read any of the code yet, but I have some comments:

* I'm not very happy with the "Getting tuples from the foreign server"
section. The present tuplestore isn't very efficient, and putting all
the tuples into a tuplestore adds devastating overhead. In principle,
materializing all the tuples doesn't match the SQL executor model, so
something like a cursor is needed here.

* The FDW routines section seems to offer only methods to read data, but
I'd suggest covering all CRUD operations. Sometimes dropping foreign
data is so specific that only the FdwRoutines know how to do it.

Regards,
--
Hitoshi Harada
Hitoshi Harada <umi.tanuki@gmail.com> wrote:
> * I'm not very happy with the "Getting tuples from the foreign server"
> section. The present tuplestore isn't very efficient, and putting all
> the tuples into a tuplestore adds devastating overhead. In principle,
> materializing all the tuples doesn't match the SQL executor model, so
> something like a cursor is needed here.

Sure, but your optimization requires some extensions to the libpq
protocol. We could send HeapTuples in binary form if the remote and
local servers use the same format, but the present libpq can return
tuples only as text or in libpq-specific binary form (which is not a
HeapTuple).

> * The FDW routines section seems to offer only methods to read data, but
> I'd suggest covering all CRUD operations. Sometimes dropping foreign
> data is so specific that only the FdwRoutines know how to do it.

There is only SELECT in the SQL standard, where CREATE FOREIGN TABLE
creates what amounts to a symbolic link to the external data. But we
could also support CREATE and DROP in the routines, which would mean
CREATE/DROP FOREIGN TABLE actually creates or drops the external data.
We could add methods for INSERT, UPDATE, and DELETE as well, but that
requires additional consideration of which key to use to identify the
modified tuples: PostgreSQL uses CTID as the physical key, but some FDWs
might have completely different keys.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
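A routine table with optional write-side entry points might look roughly like this (a hypothetical Python stand-in, not the actual FdwRoutine struct): callbacks left as None simply mean the wrapper does not support that operation:

```python
# Hypothetical sketch of an FDW routine table covering read plus the
# optional CRUD callbacks discussed above.  Not the real C struct.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FdwRoutine:
    # read side: the only part the SQL standard's SELECT needs
    begin_scan: Callable
    iterate: Callable
    end_scan: Callable
    # optional write side; None means "not supported by this FDW"
    insert: Optional[Callable] = None
    update: Optional[Callable] = None
    delete: Optional[Callable] = None
    create_data: Optional[Callable] = None  # for CREATE FOREIGN TABLE
    drop_data: Optional[Callable] = None    # for DROP FOREIGN TABLE
```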
Josh Berkus <josh@agliodbs.com> wrote:
> It would be really good to also have access to the scan criteria as text,
> a hash, or some other generally manipulable data structure, for foreign
> data which is not SQL-based.

We can provide such formatter functions as options. In fact, the present
postgresql_fdw uses deparse_context_for_plan() and deparse_expression()
exported from the core. However, I think the conversion should be
performed in each FDW, because some FDWs don't necessarily need the
criteria. For example, a CSV FDW will just return all of the contents as
tuples, and the common executor routine will filter and modify them.

> Finally, if you are making the scan more visible, a nice side effect
> would be the ability to log all the scan criteria on a particular table
> (whether it's an FDW or not); that would help a lot with indexing.

We might need to provide more generally useful deparse_xxx() functions.
For example, an oracle_fdw would also need to deparse the criteria into
SQL. That would be almost the same as for PostgreSQL, but it would also
need to remap incompatible functions and expressions in its own
preferred way.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
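The remapping mentioned above could look roughly like this (a hedged sketch; the mapping table and function names are invented, and note e.g. that NVL only matches the two-argument form of coalesce):

```python
# Hypothetical deparse helper for a non-PostgreSQL target server: remap
# known function names, and refuse to push down anything unknown so it
# gets evaluated locally instead.
REMOTE_FUNCS = {
    "coalesce": "NVL",    # matches only the two-argument form
    "substr": "SUBSTR",
}

def deparse_call(func, args):
    """Return the remote spelling of a function call, or None when the
    call cannot be pushed down and must stay local."""
    remote = REMOTE_FUNCS.get(func)
    if remote is None:
        return None
    return f"{remote}({', '.join(args)})"
```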
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> wrote:
> > * I'm not very happy with the "Getting tuples from the foreign server"
> > section. The present tuplestore isn't very efficient, and putting all
> > the tuples into a tuplestore adds devastating overhead. In principle,
> > materializing all the tuples doesn't match the SQL executor model, so
> > something like a cursor is needed here.
>
> Sure, but your optimization requires some extensions to the libpq
> protocol. We could send HeapTuples in binary form if the remote and
> local servers use the same format, but the present libpq can return
> tuples only as text or in libpq-specific binary form (which is not a
> HeapTuple).

In addition, I believe the tuplestore is required *for performance*,
because per-tuple cursor fetching is very slow when we retrieve tuples
from remote servers. We should fetch tuples in reasonably sized batches.

If we optimize that part, we could remove the PGresult-to-tuplestore
conversion. But we would also need some code to avoid leaking the
PGresult on error, because a PGresult is allocated with malloc, not
palloc. (That is the same bug that was recently fixed in contrib/dblink.)

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
2010/3/8 Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>:
> In addition, I believe the tuplestore is required *for performance*,
> because per-tuple cursor fetching is very slow when we retrieve tuples
> from remote servers. We should fetch tuples in reasonably sized batches.
>
> If we optimize that part, we could remove the PGresult-to-tuplestore
> conversion. But we would also need some code to avoid leaking the
> PGresult on error, because a PGresult is allocated with malloc, not
> palloc. (That is the same bug that was recently fixed in contrib/dblink.)

So, as a first step we implement it with a tuplestore on top of the
present libpq, but for further improvement we will need to refactor or
extend libpq to buffer tuples in batches. Or should we invent another,
more data-fetching-oriented protocol, like the existing COPY?

Regards,
--
Hitoshi Harada
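The "reasonably sized batches" idea can be sketched as follows (illustrative Python; `run_remote` stands in for the network round trip the real code would make over libpq with FETCH on a remote cursor):

```python
def fetch_in_batches(run_remote, cursor_name, batch_size=1000):
    """Yield every row from a remote cursor, doing one FETCH round trip
    per batch_size rows instead of one round trip per tuple."""
    while True:
        rows = run_remote(f"FETCH {batch_size} FROM {cursor_name}")
        if not rows:  # empty result means the cursor is exhausted
            break
        yield from rows

def fake_remote(data):
    """Stand-in for the remote server: serves successive slices of data."""
    state = {"pos": 0}
    def run(command):
        n = int(command.split()[1])
        rows = data[state["pos"]:state["pos"] + n]
        state["pos"] += len(rows)
        return rows
    return run
```

The batch size trades memory for round trips; with batch_size=1 this degenerates into the slow per-tuple fetching described above.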
Hitoshi Harada <umi.tanuki@gmail.com> wrote:
> So, as a first step we implement it with a tuplestore on top of the
> present libpq, but for further improvement we will need to refactor or
> extend libpq to buffer tuples in batches. Or should we invent another,
> more data-fetching-oriented protocol, like the existing COPY?

Before starting such optimization, we need some research into which part
is the performance bottleneck when sending large amounts of data. IMHO,
since the network is typically slower than an in-memory data copy, data
compression and burst transfer would be more effective, even if they
consume local CPU resources to convert the data format.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
> IMHO, since the network is typically slower than an in-memory data copy,
> data compression and burst transfer would be more effective, even if they
> consume local CPU resources to convert the data format.

I'm not sure. I once tested the performance of rsync on two computers
connected over GbE. With the --compress option, rsync was slower than
without it. Of course, this may vary with different CPU speed/network
speed combinations.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
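A back-of-the-envelope model (with illustrative numbers, not measurements) shows why both observations can be right: compression only wins when the compressor's throughput is comfortably higher than what the link saves in transfer time:

```python
def transfer_seconds(nbytes, link_bps, compress_bps=None, ratio=1.0):
    """Time to move nbytes over a link of link_bps bytes/sec.
    If compress_bps is given, compress first (pessimistically assumed
    not overlapped with the send) and ship nbytes * ratio bytes."""
    if compress_bps is None:
        return nbytes / link_bps
    return nbytes / compress_bps + (nbytes * ratio) / link_bps
```

On GbE (~125 MB/s), a hypothetical fast compressor (400 MB/s, 2:1) comes out ahead, while a slow one (50 MB/s, 2:1) loses to sending the data raw, consistent with the rsync --compress result above.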