Thread: SQL/MED - core functionality
Hi hackers, Attached is a patch that adds core functionality of SQL/MED. This patch provides: * new option HANDLER for FOREIGN DATA WRAPPER * CREATE/ALTER DDLs are supported * psql \dew command shows handler option too * pg_dump can dump HANDLER option * new object type FOREIGN TABLE * CREATE/ALTER/DROP DDLs are supported * system columns except TABLEOID are not supported * inheriting normal table is supported * psql \d shows detail of foreign tables * psql \det lists foreign tables * psql \dE lists foreign tables in \d format * pg_dump can dump the definition * information_schema views added * foreign table is read-only, so INSERT/UPDATE/DELETE are denied * ANALYZE and VACUUM skips foreign tables * new executor node ForeignScan * it's a counterpart of SeqScan * this node scans one foreign table at a time * FDW HANDLER is necessary to execute SELECT statement Patches for FDWs which can be used to execute SELECT statement will be posted in their own thread soon. "SQL/MED - file_fdw" : FDW for external PostgreSQL "SEL/MED - postgresql_fdw" : FDW for server-side file (CSV, TEXT) I would reuse existing CommitFest item "SQL/MED" for this patch, and add new item for each FDW patch. Regards, -- Shigeru Hanada
Attachment
On 25.11.2010 09:34, Shigeru HANADA wrote: > Attached is a patch that adds core functionality of SQL/MED. This > patch provides: > > * new option HANDLER for FOREIGN DATA WRAPPER > * CREATE/ALTER DDLs are supported > * psql \dew command shows handler option too > * pg_dump can dump HANDLER option > > * new object type FOREIGN TABLE > * CREATE/ALTER/DROP DDLs are supported > * system columns except TABLEOID are not supported > * inheriting normal table is supported > * psql \d shows detail of foreign tables > * psql \det lists foreign tables > * psql \dE lists foreign tables in \d format > * pg_dump can dump the definition > * information_schema views added > * foreign table is read-only, so INSERT/UPDATE/DELETE are denied > * ANALYZE and VACUUM skips foreign tables > > * new executor node ForeignScan > * it's a counterpart of SeqScan > * this node scans one foreign table at a time > * FDW HANDLER is necessary to execute SELECT statement > > Patches for FDWs which can be used to execute SELECT statement will be > posted in their own thread soon. > > "SQL/MED - file_fdw" : FDW for external PostgreSQL > "SEL/MED - postgresql_fdw" : FDW for server-side file (CSV, TEXT) > > I would reuse existing CommitFest item "SQL/MED" for this patch, and > add new item for each FDW patch. Looking at the API again, there's a few things I don't like about it: * It's tied to the ForeignScanState, so all the executor state structures are exposed to the FDW implementation. It feels like a modularity violation that the FDW Iterate function returns the tuple by storing it directly in scanstate->ss.ss_ScanTupleSlot for example. And it's not going to work for remote scans that don't go through the executor, for example if you wanted to rewrite contrib/dblink to use foreign data wrappers. Or the SQL/MED passthrough mode. * There's no clear Plan stage in the API. Except for EstimateCosts, which just fills in the estimated costs in RelOptInfo, so it needs to understand quite a lot of the planner data structures to come up with a reasonable estimate. But if it e.g wants to apply a qual remotely, like the PostgreSQL FDW does, it has to check for such quals at execution time. And as I complained before, you don't get any meaningful EXPLAIN output. I propose the attached API instead. This has a clear separation between plan and execution. I'm sure we'll have to extend the API in the future FDWs want tighter integration, but I think this is a good start. It makes it quite straightforward to write simple FDW like the file FDW, without having to know anything about the executor or planner internals, but provides enough flexibility to cover the functionality in your PostgreSQL FDW. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Attachment
On Thu, Nov 25, 2010 at 22:03, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > I propose the attached API instead. This has a clear separation between plan > and execution. The APIs seem to be cleaner. The previous ones might be too straight implementation of the SQL standard. But I have some questions about the new APIs: 1. Doesn't FdwPlan need to inherit Plan struct? 2. Doesn't FdwPlan need tosupport copyObject()? 3. If "Datum *values, bool *isnulls" is the better interface, why do we use TupleTableSlot? Wemight have the similar issue in the index-only scan; it also handles virtual tuples. -- Itagaki Takahiro
On 25.11.2010 16:16, Itagaki Takahiro wrote: > On Thu, Nov 25, 2010 at 22:03, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> I propose the attached API instead. This has a clear separation between plan >> and execution. > > The APIs seem to be cleaner. The previous ones might be too straight > implementation of the SQL standard. > > But I have some questions about the new APIs: > 1. Doesn't FdwPlan need to inherit Plan struct? > 2. Doesn't FdwPlan need to support copyObject()? No. You'll need a ForeignScan object in the planner that supports copyObject(), just like in your patch. ForeignScan points to the FdwPlan, but the FDW doesn't need to know anything about that stuff. I left out some details on what exactly FdwPlan should contain and what it's lifecycle should be. I'm thinking that it should be allocated in the CurrentMemoryContext that's active when the FDW Plan routine is called, which would be the same context where we store all the Plan objects. It should not be modified after creation, so that it doesn't need to be copied when the ForeignScan is copied with copyObject(). It should not contain transient state information like connection objects, or references to a remotely prepared cursor etc. It must be possible to call BeginScan multiple times with the same FdwPlan object, so that it can be stored in a prepared plan that is executed multiple times. For a typical case like the PostgreSQL FDW, it would contain the foreign server's OID, and the constructed SQL query that will be sent to the remote server on execution. For the file FDW, it will probably contain the filename, and the format options in some pre-parsed format. > 3. If "Datum *values, bool *isnulls" is the better interface, > why do we use TupleTableSlot? I'm not wedded to that part, but in general, the less the FDW needs to know about PostgreSQL internals the better. There's performance gain from passing a TupleTableSlot to the FDW, but the ForeignScan node will certainly store the datums/isnulls array to a TupleTableSlot to pass on the tuple. > We might have the similar issue > in the index-only scan; it also handles virtual tuples. Index-only scans are a very different story, that's going to be tightly internal to the planner and executor, there's no externally-visible API there. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > I left out some details on what exactly FdwPlan should contain and what > it's lifecycle should be. I'm thinking that it should be allocated in > the CurrentMemoryContext that's active when the FDW Plan routine is > called, which would be the same context where we store all the Plan > objects. It should not be modified after creation, so that it doesn't > need to be copied when the ForeignScan is copied with copyObject(). It > should not contain transient state information like connection objects, > or references to a remotely prepared cursor etc. It must be possible to > call BeginScan multiple times with the same FdwPlan object, so that it > can be stored in a prepared plan that is executed multiple times. The above statements seem mutually contradictory. In particular, I think you're proposing that copyObject copy only a pointer and not the whole plan tree when copying ForeignScan. That is entirely unworkable/unacceptable: quite aside from the semantic ugliness, it will fail altogether for cached plans. regards, tom lane
On 25.11.2010 18:18, Tom Lane wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> I left out some details on what exactly FdwPlan should contain and what >> it's lifecycle should be. I'm thinking that it should be allocated in >> the CurrentMemoryContext that's active when the FDW Plan routine is >> called, which would be the same context where we store all the Plan >> objects. It should not be modified after creation, so that it doesn't >> need to be copied when the ForeignScan is copied with copyObject(). It >> should not contain transient state information like connection objects, >> or references to a remotely prepared cursor etc. It must be possible to >> call BeginScan multiple times with the same FdwPlan object, so that it >> can be stored in a prepared plan that is executed multiple times. > > The above statements seem mutually contradictory. In particular, > I think you're proposing that copyObject copy only a pointer and not the > whole plan tree when copying ForeignScan. Right. > That is entirely > unworkable/unacceptable: quite aside from the semantic ugliness, it will > fail altogether for cached plans. Hmm, I see, cached plans are planned in a shorter-lived context first, and copied to permanent storage afterwards. Needs more thought then. Maybe the FDW needs to provide a copyFdwPlan() function to copy FdwPlans returned by that FDW. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Hmm, I see, cached plans are planned in a shorter-lived context first, > and copied to permanent storage afterwards. Needs more thought then. > Maybe the FDW needs to provide a copyFdwPlan() function to copy FdwPlans > returned by that FDW. Or just specify a format for the extra information. Perhaps it could be thought of as being a value of type bytea? Obviously we can't just have a fixed amount of info, but maybe a blob with a length word is enough. regards, tom lane
On 25.11.2010 18:28, Tom Lane wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> Hmm, I see, cached plans are planned in a shorter-lived context first, >> and copied to permanent storage afterwards. Needs more thought then. >> Maybe the FDW needs to provide a copyFdwPlan() function to copy FdwPlans >> returned by that FDW. > > Or just specify a format for the extra information. Perhaps it could be > thought of as being a value of type bytea? Obviously we can't just have > a fixed amount of info, but maybe a blob with a length word is enough. That seems quite awkward to work with. Let's at least make it a Node *, so that you can store a Value or List there, or anything else that already has copyObject support. I think the PostgreSQL FDW would want to store the remote query there. But it's not a stretch that you want to use parameter markers in the remote query, with the parameter values determined at runtime. In that case you'd also store a list of Exprs for the parameter values (Hmm, BeginScan needs an ExprContext for that..). This is very hand-wavy, but I think we'll hit the wall with a single blob pretty quickly. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > On 25.11.2010 18:28, Tom Lane wrote: >> Or just specify a format for the extra information. Perhaps it could be >> thought of as being a value of type bytea? Obviously we can't just have >> a fixed amount of info, but maybe a blob with a length word is enough. > That seems quite awkward to work with. Let's at least make it a Node *, > so that you can store a Value or List there, or anything else that > already has copyObject support. Yeah, that works. A struct could be emulated by using a List with a known order of elements. If someone did need a binary blob, they could represent it as a Const of type bytea. regards, tom lane
Thanks for the comments. I'll revise the patch along the discussion. Before starting code work, please let me summarize the discussion. * Generally, we should keep FDWs away from PostgreSQL internals, such as TupleTableSlot. * FDW should have planner hook which allows FDW to create FDW-specific plan (FdwPlan in Heikki's proposal) for a scan on a foreign table. * FdwPlan, a part of ForeignScan plan node, should be able to be copied in generic way because plans would be copied into another memory context during caching. It might be better to represent FdwPlan with Node or List. * FdwExecutionState, a part of ForeignScanState, should be used instead of ForeignScanState to remove executor details from FDW implementation. # ISTM that FdwExecutionState would be replace FdwReply. Regards, -- Shigeru Hanada
Shigeru HANADA <hanada@metrosystems.co.jp> writes: > I'll revise the patch along the discussion. Before starting code work, > please let me summarize the discussion. > * Generally, we should keep FDWs away from PostgreSQL internals, > such as TupleTableSlot. > * FDW should have planner hook which allows FDW to create FDW-specific > plan (FdwPlan in Heikki's proposal) for a scan on a foreign table. > * FdwPlan, a part of ForeignScan plan node, should be able to be > copied in generic way because plans would be copied into another > memory context during caching. It might be better to represent > FdwPlan with Node or List. > * FdwExecutionState, a part of ForeignScanState, should be used > instead of ForeignScanState to remove executor details from FDW > implementation. > # ISTM that FdwExecutionState would be replace FdwReply. FWIW, I think that the notion that FDW can be "kept away from Postgres internals" is ridiculous on its face. Re-read the above list and ask yourself exactly which of the bullet points above are not talking about being hip-deep in Postgres internals. In particular, arbitrarily avoiding use of TupleTableSlot is silly. It's a fundamental part of many executor APIs. It would be a good idea to avoid use of internals in the wire protocol to another Postgres database; but the interfaces to the local database can hardly avoid that, and we shouldn't bend them out of shape to meet entirely-arbitrary requirements about what to use or not use. regards, tom lane
On Fri, Nov 26, 2010 at 11:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Shigeru HANADA <hanada@metrosystems.co.jp> writes: >> I'll revise the patch along the discussion. Before starting code work, >> please let me summarize the discussion. > >> * Generally, we should keep FDWs away from PostgreSQL internals, >> such as TupleTableSlot. > >> * FDW should have planner hook which allows FDW to create FDW-specific >> plan (FdwPlan in Heikki's proposal) for a scan on a foreign table. > >> * FdwPlan, a part of ForeignScan plan node, should be able to be >> copied in generic way because plans would be copied into another >> memory context during caching. It might be better to represent >> FdwPlan with Node or List. > >> * FdwExecutionState, a part of ForeignScanState, should be used >> instead of ForeignScanState to remove executor details from FDW >> implementation. >> # ISTM that FdwExecutionState would be replace FdwReply. > > FWIW, I think that the notion that FDW can be "kept away from Postgres > internals" is ridiculous on its face. Re-read the above list and ask > yourself exactly which of the bullet points above are not talking about > being hip-deep in Postgres internals. In particular, arbitrarily > avoiding use of TupleTableSlot is silly. It's a fundamental part of > many executor APIs. > > It would be a good idea to avoid use of internals in the wire protocol > to another Postgres database; but the interfaces to the local database > can hardly avoid that, and we shouldn't bend them out of shape to meet > entirely-arbitrary requirements about what to use or not use. But there's probably some value in minimizing the number of unnecessary interfaces that get exposed to the plugins. There's no help for the fact that the FDWs are going to need about Datums, but do we gain anything by making them also know about TupleTableSlot? If so, fine; if not, don't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2010/11/25 Shigeru HANADA <hanada@metrosystems.co.jp>: > Hi hackers, > > Attached is a patch that adds core functionality of SQL/MED. This > patch provides: > > "SQL/MED - file_fdw" : FDW for external PostgreSQL > "SEL/MED - postgresql_fdw" : FDW for server-side file (CSV, TEXT) > I've tried SQL/MED with postgresql_fdw last night, and found myself confusing the long setup procedure. A simplest way to use it AFAIK is: 1.CREATE FOREIGN DATA WRAPPER ... (or run install sql script) 2.CREATE SERVER ... FOREIGN DATA WRAPPER ... 3.CREATE USER MAPPING FOR ... 4.CREATE FOREIGN TALBE( ... ) From a user's view, this is very long way to see a simplest foreign table. I know it is based on the standard, but I really want a shortcut. Especially, I don't understand why CREATE USER MAPPING FOR current_user SERVER <server> is needed for default use case. If you forget CREATE USER MAPPING and do CREATE FOREIGN TABLE, it raises an error. User mapping is useful if the local user and remote user should be mapped but I imagine in most cases they are the same. postgresql_fdw can tell the remote user by conninfo string, in addition. This is another topic, but it would be useful if CREATE FOREIGN TABLE can omit column definitions since fdw usually knows what should be there in the definitions. I some times mistyped the column names between remote and local and resulted in fail on execution. If these are discussed before, I apologize for noise. Basically, I would love to see this in the next release. Good work! Regards, -- Hitoshi Harada
On Wed, Dec 1, 2010 at 12:30, Hitoshi Harada <umi.tanuki@gmail.com> wrote: > From a user's view, this is very long way to see a simplest foreign > table. I know it is based on the standard, but I really want a > shortcut. Especially, I don't understand why CREATE USER MAPPING FOR > current_user SERVER <server> is needed for default use case. If you > forget CREATE USER MAPPING and do CREATE FOREIGN TABLE, it raises an > error. User mapping is useful if the local user and remote user should > be mapped but I imagine in most cases they are the same. > postgresql_fdw can tell the remote user by conninfo string, in > addition. How do you connect to the remote server when password is required? I think we cannot pass through passwords to the remote server even if the same user is used on the local and remote servers. However, SERVER and USER MAPPING might be useless for file_fdw because SERVER is always the local file system and USER is always the OS's user who started the postmaster. I'm not sure how we should treat cases where those settings don't have any configurations. -- Itagaki Takahiro
2010/12/1 Itagaki Takahiro <itagaki.takahiro@gmail.com>: > On Wed, Dec 1, 2010 at 12:30, Hitoshi Harada <umi.tanuki@gmail.com> wrote: >> From a user's view, this is very long way to see a simplest foreign >> table. I know it is based on the standard, but I really want a >> shortcut. Especially, I don't understand why CREATE USER MAPPING FOR >> current_user SERVER <server> is needed for default use case. If you >> forget CREATE USER MAPPING and do CREATE FOREIGN TABLE, it raises an >> error. User mapping is useful if the local user and remote user should >> be mapped but I imagine in most cases they are the same. >> postgresql_fdw can tell the remote user by conninfo string, in >> addition. > > How do you connect to the remote server when password is required? > I think we cannot pass through passwords to the remote server > even if the same user is used on the local and remote servers. Sure, it is limited on trust authentication only. If you need more complicated connection, do USER MAPPING. But it's not clear to me that it should be required in any case. Regards, -- Hitoshi Harada
On Wed, 1 Dec 2010 12:30:46 +0900 Hitoshi Harada <umi.tanuki@gmail.com> wrote: > This is another topic, but it would be useful if CREATE FOREIGN TABLE > can omit column definitions since fdw usually knows what should be > there in the definitions. I some times mistyped the column names > between remote and local and resulted in fail on execution. The SQL/MED standard includes "IMPORT FOREIGN SCHEMA schema FROM SERVER server" syntax which imports definitions of remote tables in the specified schema into local PostgreSQL, and you can optionally specify the list of target tables with "LIMIT TO table_list" option. This syntax would make defining foreign tables easier, but it needs to enhance both FDW API and core parser. Regards, -- Shigeru Hanada
On ons, 2010-12-01 at 12:30 +0900, Hitoshi Harada wrote: > I've tried SQL/MED with postgresql_fdw last night, and found myself > confusing the long setup procedure. A simplest way to use it AFAIK is: > > 1.CREATE FOREIGN DATA WRAPPER ... (or run install sql script) > 2.CREATE SERVER ... FOREIGN DATA WRAPPER ... > 3.CREATE USER MAPPING FOR ... > 4.CREATE FOREIGN TALBE( ... ) > > From a user's view, this is very long way to see a simplest foreign > table. I know it is based on the standard, but I really want a > shortcut. Especially, I don't understand why CREATE USER MAPPING FOR > current_user SERVER <server> is needed for default use case. If you > forget CREATE USER MAPPING and do CREATE FOREIGN TABLE, it raises an > error. User mapping is useful if the local user and remote user should > be mapped but I imagine in most cases they are the same. > postgresql_fdw can tell the remote user by conninfo string, in > addition. I reviewed the standard about this, and a lot of things are implementation-defined. I think user mappings could be made optional. > This is another topic, but it would be useful if CREATE FOREIGN TABLE > can omit column definitions since fdw usually knows what should be > there in the definitions. I some times mistyped the column names > between remote and local and resulted in fail on execution. Also, according to the standard, the column list in CREATE FOREIGN TABLE is optional (if you can get it in some automatic way, of course).
On Sun, 12 Dec 2010 23:47:53 +0200 Peter Eisentraut <peter_e@gmx.net> wrote: > On ons, 2010-12-01 at 12:30 +0900, Hitoshi Harada wrote: > > From a user's view, this is very long way to see a simplest foreign > > table. I know it is based on the standard, but I really want a > > shortcut. Especially, I don't understand why CREATE USER MAPPING FOR > > current_user SERVER <server> is needed for default use case. If you > > forget CREATE USER MAPPING and do CREATE FOREIGN TABLE, it raises an > > error. User mapping is useful if the local user and remote user should > > be mapped but I imagine in most cases they are the same. > > postgresql_fdw can tell the remote user by conninfo string, in > > addition. > > I reviewed the standard about this, and a lot of things are > implementation-defined. I think user mappings could be made optional. Simple FDWs such as File FDW might not have concept of "user" on remote side. In such case, it would be enough to control access privilege per local user with GRANT/REVOKE SELECT statement. > > This is another topic, but it would be useful if CREATE FOREIGN TABLE > > can omit column definitions since fdw usually knows what should be > > there in the definitions. I some times mistyped the column names > > between remote and local and resulted in fail on execution. > > Also, according to the standard, the column list in CREATE FOREIGN TABLE > is optional (if you can get it in some automatic way, of course). To allow omitting column definitions for that purpose, a way to create ero-column tables would have to be provided. New syntax which allows FDWs to determine column definition would be necessary. ex) -- Create foo from the remote table foo on the server bar CREATE FOREIGN TABLE foo SERVER bar; -- Create zero-column table foo CREATE FOREIGN TABLE foo () SERVER bar; To support this feature, another hook function need to be added to FDW API. ISTM that this feature should be considered with IMPORT SCHEMA statement. Regards, -- Shigeru Hanada
On Thu, 25 Nov 2010 15:03:29 +0200 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > I propose the attached API instead. This has a clear separation between > plan and execution. I'm sure we'll have to extend the API in the future > FDWs want tighter integration, but I think this is a good start. It > makes it quite straightforward to write simple FDW like the file FDW, > without having to know anything about the executor or planner internals, > but provides enough flexibility to cover the functionality in your > PostgreSQL FDW. Thanks for the comments and the proposal. I've revised fdw_core patch along your proposal and subsequent discussion, and tried to fix FDW patches pgsql_fdw and file_fdw according to fdw_core. Attached is a WIP which includes changes below. (1) Introduce FdwPlan and FdwExecutionState IIUC, FdwPlan should have copyObject support because it is a part of Plan node and would be copied when the plan was cached. So FdwPlan need to be a Node, and hold private information in List. In contrast, FdwExecutionState.private doesn't have such limitation because it would not be copied. One problem to solve is that current PostgreSQL FDW needs PlanState to generate foreign query string with deparse_expression(). It would be able to generate SQL string from PlannerInfo and RelOptInfo in PlanRelScan() without deparse_expression(), but it might need so many codes. (2) Pass ParamListInfo to BeginScan, not ForeignScanState To handle parameters of EXECUTE statement, BeginScan need to get ParamListInfo from EState. So I've added ParamListInfo parameter to BeginScan. (3) Iterate() returns result with TupleTableSlot parameter How about receiving result with TupleTableSlot which was specified by caller? This would allow FDWs to forget about ScanState. In this design, the prototype of Iterate is: void *(*Iterate)(FdwExecutionState *fstate, TupleTableSlot *slot); (4) Support EXPLAIN information Now EXPLAIN VERBOSE shows FDW-specific information. Here is a sample of PostgreSQL FDW. File FDW might show filename and size. postgres=# explain verbose select * From accounts where aid = 1; QUERY PLAN ----------------------------------------------------------------------------------------------------- Foreign Scan on local.accounts (cost=0.00..0.00 rows=1 width=100) Output: aid, bid, abalance, filler Filter: (accounts.aid = 1) FDW-Info: SELECT aid, bid, abalance, filler FROM public.pgbench_accounts accounts WHERE (aid = 1) (4 rows) Regards, -- Shigeru Hanada
Attachment
2010/12/14 Shigeru HANADA <hanada@metrosystems.co.jp>: > On Thu, 25 Nov 2010 15:03:29 +0200 > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> I propose the attached API instead. This has a clear separation between >> plan and execution. I'm sure we'll have to extend the API in the future >> FDWs want tighter integration, but I think this is a good start. It >> makes it quite straightforward to write simple FDW like the file FDW, >> without having to know anything about the executor or planner internals, >> but provides enough flexibility to cover the functionality in your >> PostgreSQL FDW. > > Thanks for the comments and the proposal. > > I've revised fdw_core patch along your proposal and subsequent > discussion, and tried to fix FDW patches pgsql_fdw and file_fdw > according to fdw_core. Reviewing the patch a little, it occurred to me that it might be a good idea to split the patch into two pieces again. One is for adding CREATE FOREIGN TABLE syntax and actual creation and the other is for FDW APIs. ISTM syntax and and utility processing of FOREIGN TABLE has been stable in the latest patches, and the discussion should be concentrated on APIs. APIs change don't harm SQL interfaces. Of course CREATE FOREIGN TABLE is not useful as alone, but it would be helpful to be reviewed easily. Regards, -- Hitoshi Harada
2010/12/13 Shigeru HANADA <hanada@metrosystems.co.jp>: > On Sun, 12 Dec 2010 23:47:53 +0200 > Peter Eisentraut <peter_e@gmx.net> wrote: >> On ons, 2010-12-01 at 12:30 +0900, Hitoshi Harada wrote: >> > From a user's view, this is very long way to see a simplest foreign >> > table. I know it is based on the standard, but I really want a >> > shortcut. Especially, I don't understand why CREATE USER MAPPING FOR >> > current_user SERVER <server> is needed for default use case. If you >> > forget CREATE USER MAPPING and do CREATE FOREIGN TABLE, it raises an >> > error. User mapping is useful if the local user and remote user should >> > be mapped but I imagine in most cases they are the same. >> > postgresql_fdw can tell the remote user by conninfo string, in >> > addition. >> >> I reviewed the standard about this, and a lot of things are >> implementation-defined. I think user mappings could be made optional. > > Simple FDWs such as File FDW might not have concept of "user" on > remote side. In such case, it would be enough to control access > privilege per local user with GRANT/REVOKE SELECT statement. Yeah, with file_fdw users won't understand why CREATE USER MAPPING is needed. >> > This is another topic, but it would be useful if CREATE FOREIGN TABLE >> > can omit column definitions since fdw usually knows what should be >> > there in the definitions. I some times mistyped the column names >> > between remote and local and resulted in fail on execution. >> >> Also, according to the standard, the column list in CREATE FOREIGN TABLE >> is optional (if you can get it in some automatic way, of course). > > To allow omitting column definitions for that purpose, a way to create > ero-column tables would have to be provided. New syntax which allows > FDWs to determine column definition would be necessary. > > ex) > -- Create foo from the remote table foo on the server bar > CREATE FOREIGN TABLE foo SERVER bar; > -- Create zero-column table foo > CREATE FOREIGN TABLE foo () SERVER bar; > > To support this feature, another hook function need to be added to FDW > API. ISTM that this feature should be considered with IMPORT SCHEMA > statement. Hmm, the benefit from it does not seem so much as the paid cost. Definition of minimum APIs sound like the first step to get it in the next release. We could add it later. Regards, -- Hitoshi Harada
On Mon, Dec 13, 2010 at 10:10 AM, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > I've revised fdw_core patch along your proposal and subsequent > discussion, and tried to fix FDW patches pgsql_fdw and file_fdw > according to fdw_core. Attached is a WIP which includes changes > below. This actually doesn't apply cleanly. There's a hunk in pg_class.h that is failing. I think attgenoptions is a poor choice of name for the concept it represents. Surely it should be attfdwoptions. But I am wondering whether we could actually go a step further and break this functionality off into a separate patch. Do file_fdw and/or postgresql_fdw require column-level FDW options? If not, splitting this part out looks like it would be a fairly significant simplification for v1 Along similar lines, I think we could simplify the first version of this considerably by removing all the support for constraints on foreign tables. It might be useful to have that some day, but in the interest of whittling this down to a manageable size, it seems like we could easily do without that for starters. On the other hand, I don't really see any advantage to allowing rules on foreign tables - ever. Unless there's some reason we really need that, my gut feeling would be to rip it out and forget about it. The docs should avoid cut-and-pasting large quantities of the existing docs. Instead, they should refer to the existing material. Copyright notice for new files should go through 2010, not 2009. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2010/12/14 Robert Haas <robertmhaas@gmail.com>: > On Mon, Dec 13, 2010 at 10:10 AM, Shigeru HANADA > <hanada@metrosystems.co.jp> wrote: >> I've revised fdw_core patch along your proposal and subsequent >> discussion, and tried to fix FDW patches pgsql_fdw and file_fdw >> according to fdw_core. Attached is a WIP which includes changes >> below. > > This actually doesn't apply cleanly. There's a hunk in pg_class.h > that is failing. > > I think attgenoptions is a poor choice of name for the concept it > represents. Surely it should be attfdwoptions. But I am wondering > whether we could actually go a step further and break this > functionality off into a separate patch. Do file_fdw and/or > postgresql_fdw require column-level FDW options? If not, splitting > this part out looks like it would be a fairly significant > simplification for v1 > > Along similar lines, I think we could simplify the first version of > this considerably by removing all the support for constraints on > foreign tables. It might be useful to have that some day, but in the > interest of whittling this down to a manageable size, it seems like we > could easily do without that for starters. > > On the other hand, I don't really see any advantage to allowing rules > on foreign tables - ever. Unless there's some reason we really need > that, my gut feeling would be to rip it out and forget about it. views, updateable views? Pavel > > The docs should avoid cut-and-pasting large quantities of the existing > docs. Instead, they should refer to the existing material. > > Copyright notice for new files should go through 2010, not 2009. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
On mån, 2010-12-13 at 14:45 +0900, Shigeru HANADA wrote: > Simple FDWs such as File FDW might not have concept of "user" on > remote side. In such case, it would be enough to control access > privilege per local user with GRANT/REVOKE SELECT statement. Right. But it depends on the implementation. You could, for example, imagine a "userdir" FDW that reads from users' home directories. > To allow omitting column definitions for that purpose, a way to create > ero-column tables would have to be provided. New syntax which allows > FDWs to determine column definition would be necessary. > > ex) > -- Create foo from the remote table foo on the server bar > CREATE FOREIGN TABLE foo SERVER bar; > -- Create zero-column table foo > CREATE FOREIGN TABLE foo () SERVER bar; That syntax seems pretty obvious.
On Mon, 13 Dec 2010 21:51:40 -0500 Robert Haas <robertmhaas@gmail.com> wrote: > This actually doesn't apply cleanly. There's a hunk in pg_class.h > that is failing. I might have missed recent changes about pg_class.relistemp. I've fixed in local repo. > I think attgenoptions is a poor choice of name for the concept it > represents. Surely it should be attfdwoptions. But I am wondering > whether we could actually go a step further and break this > functionality off into a separate patch. Do file_fdw and/or > postgresql_fdw require column-level FDW options? If not, splitting > this part out looks like it would be a fairly significant > simplification for v1 In current FDW implementation, column-level FDW options are used as: 1) force_not_null option of file_fdw. COPY FROM accepts the option as column name list, but it would be complicated to accept it in table-level FDW option. 2) column name alias option in postgresql_fdw. But they don't seem necessary to discuss basic design. > Along similar lines, I think we could simplify the first version of > this considerably by removing all the support for constraints on > foreign tables. It might be useful to have that some day, but in the > interest of whittling this down to a manageable size, it seems like we > could easily do without that for starters. > > On the other hand, I don't really see any advantage to allowing rules > on foreign tables - ever. Unless there's some reason we really need > that, my gut feeling would be to rip it out and forget about it. > > The docs should avoid cut-and-pasting large quantities of the existing > docs. Instead, they should refer to the existing material. CHECK constraint allowed to support constraint exclusion, but NOT NULL is designed for just query-time constraint. I'll simplify the patch and post patches 1-4 of below first. <essential part> 1) Basic syntax for FOREIGN TABLE and FDW HANDLER 2) FDW API and ForeignScan execution # These patches are split just to make review easy. <FDW implementation> 3) pgsql_fdw 4) file_fdw <Additional features> 5) NOT NULL constraint and query-time evaluation 6) column-level FDW option - syntax and catalog - column alias option for pgsql_fdw - force_not_null option for file_fdw 7) RULE > Copyright notice for new files should go through 2010, not 2009. Will be fixed in next patch. I also replaced all $PostgreSQL$ with actual file names. Regards, -- Shigeru Hanada
On Tue, Dec 14, 2010 at 1:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> On the other hand, I don't really see any advantage to allowing rules >> on foreign tables - ever. Unless there's some reason we really need >> that, my gut feeling would be to rip it out and forget about it. > > views, updateable views? We already have those. They have their own relkind. Why would we need to duplicate that here? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Dec 14, 2010 at 23:38, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Dec 14, 2010 at 1:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> On the other hand, I don't really see any advantage to allowing rules >>> on foreign tables - ever. Unless there's some reason we really need >>> that, my gut feeling would be to rip it out and forget about it. >> >> views, updateable views? > > We already have those. They have their own relkind. Why would we > need to duplicate that here? We need RULEs or INSTEAD OF TRIGGERs to support updatable foreign tables. Do you suggest to define a wrapper view if we want to create an updatable foreign table? I think users don't like such kind of wrappers. -- Itagaki Takahiro
On Tue, Dec 14, 2010 at 9:06 AM, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > I'll simplify the patch and post patches 1-4 of below first. > > <essential part> > 1) Basic syntax for FOREIGN TABLE and FDW HANDLER > 2) FDW API and ForeignScan execution > # These patches are split just to make review easy. > > <FDW implementation> > 3) pgsql_fdw > 4) file_fdw > > <Additional features> > 5) NOT NULL constraint and query-time evaluation > 6) column-level FDW option > - syntax and catalog > - column alias option for pgsql_fdw > - force_not_null option for file_fdw > 7) RULE This seems like a good plan. As a procedural issue, please post patches one and two on the same thread (perhaps this one), because they can't be considered in isolation. Each of the remaining patches should be posted to its own thread. I've moved the SQL/MED patches to CommitFest 2011-01, and created a topic for them, SQL/MED, since it seems like we'll end up with a bunch of them to review. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Dec 14, 2010 at 9:42 AM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > On Tue, Dec 14, 2010 at 23:38, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Dec 14, 2010 at 1:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>> On the other hand, I don't really see any advantage to allowing rules >>>> on foreign tables - ever. Unless there's some reason we really need >>>> that, my gut feeling would be to rip it out and forget about it. >>> >>> views, updateable views? >> >> We already have those. They have their own relkind. Why would we >> need to duplicate that here? > > We need RULEs or INSTEAD OF TRIGGERs to support updatable foreign tables. We do? Why can't the support for updating foreign tables be built-in rather than trigger-based? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Dec 14, 2010 at 9:42 AM, Itagaki Takahiro > <itagaki.takahiro@gmail.com> wrote: >> We need RULEs or INSTEAD OF TRIGGERs to support updatable foreign tables. > We do? Why can't the support for updating foreign tables be built-in > rather than trigger-based? It *has* to be built in. What exactly would you imagine a rule or trigger is going to do? It won't have any below-SQL-level access to the foreign table with which it could issue some magic command that's not spelled UPDATE; and even if it did, why wouldn't you just spell that command UPDATE? There would be value in being able to fire triggers on foreign-table updates just like you can on local tables. It might well be that that would just fall out of the implementation, since triggers are handled at the top level of the executor, which shouldn't need to know the difference. But if it doesn't fall out easily, I don't mind postponing that feature till later. regards, tom lane
On Tue, Dec 14, 2010 at 23:45, Robert Haas <robertmhaas@gmail.com> wrote: >> We need RULEs or INSTEAD OF TRIGGERs to support updatable foreign tables. > > We do? Why can't the support for updating foreign tables be built-in > rather than trigger-based? Do we have any concrete idea for the built-in update feature? There are no definitions in the SQL standard about interface for updates. So, I think RULE and TRIGGER are the best solution for now. In addition, even if we support some kinds of built-in update feature, I still think RULE and TRIGGER are useful, for example, logging purpose. -- Itagaki Takahiro
On Wed, Dec 15, 2010 at 12:48:59AM +0900, Itagaki Takahiro wrote: > On Tue, Dec 14, 2010 at 23:45, Robert Haas <robertmhaas@gmail.com> wrote: > >> We need RULEs or INSTEAD OF TRIGGERs to support updatable foreign > >> tables. > > > > We do? Why can't the support for updating foreign tables be > > built-in rather than trigger-based? > > Do we have any concrete idea for the built-in update feature? There > are no definitions in the SQL standard about interface for updates. > > So, I think RULE and TRIGGER are the best solution for now. In > addition, even if we support some kinds of built-in update feature, > I still think RULE and TRIGGER are useful, for example, logging > purpose. Please start with TRIGGER, and we can then discuss the whether and possibly the how of RULEs later. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Dec 14, 2010 at 10:48 AM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > On Tue, Dec 14, 2010 at 23:45, Robert Haas <robertmhaas@gmail.com> wrote: >>> We need RULEs or INSTEAD OF TRIGGERs to support updatable foreign tables. >> >> We do? Why can't the support for updating foreign tables be built-in >> rather than trigger-based? > > Do we have any concrete idea for the built-in update feature? > There are no definitions in the SQL standard about interface for updates. > > So, I think RULE and TRIGGER are the best solution for now. > In addition, even if we support some kinds of built-in update feature, > I still think RULE and TRIGGER are useful, for example, logging purpose. I think triggers are useful. I see no reason to support rules. If the first version of our SQL/MED functionality is read-only, that's fine. But triggers are slow, clumsy, and expose implementation details to users, so those should be something that we provide as a way of making the database extensible, not something we use to build core functionality. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi hackers, Attached are revised version of SQL/MED core functionality patches. In order to make review easier, fdw_core patch has been split into two parts, fdw_syntax and fdw_scan. Please apply fdw_scan onto fdw_syntax to test whole features. The fdw_syntax.patch includes: 1) syntax for FOREIGN TABLE and FDW HANDLER # RULE, DEFAULT, and column level generic option are excluded to # make the patch simple. But NOT NULL constraint is included # because it is required to inherit a table with NOT NULL # constraint. 2) modified catalogs, pg_foreign_table and pg_foreign_data_wrapper 3) documentation about the syntax (maybe need enhancement, though) 4) regression tests for the syntax 5) psql and pg_dump support for foreign tables and FDWs (handler) And the fdw_scan.patch includes: 1) FDW API definition 2) ForeignScan executor node 3) Planner hook and Executor hooks 4) EXPLAIN support (included because it would effect to FdwPlan) Note that executing SELECT and EXPLAIN statement requires FDW with HANDLER, such as file_fdw and pgsql_fdw. New version of file_fdw will be posted in another thread "SQL/MED - file_fdw" soon. Regards, -- Shigeru Hanada
Attachment
On Wed, 2010-12-15 at 22:25 +0900, Shigeru HANADA wrote: > Attached are revised version of SQL/MED core functionality patches. Looks very interesting new feature, well done. Can I ask some questions about how this will work? No particular order, just numbered for reference. 1. The docs don't actually say what a foreign table is. Is it a local representation of foreign data? Or a local copy of foreign data? Or is it a table created on a remote node? 2. Will CREATE FOREIGN TABLE require a transactionid? It seems a good replacement for temp tables on Hot Standby to be able to run a CREATE FOREIGN TABLE using the file_fdw, then reuse the file again later. 3. Do we support CREATE TEMP FOREIGN TABLE? It seems desirable to be able to move data around temporarily, as we do with normal tables. 4. In Hot Standby, we are creating many copies of the data tables on different servers. That seems to break the concept that data is in only one place, when we assume that a foreign table is on only one foreign server. How will we represent the concept that data is potentially available identically from more than one place? Any other comments about how this will work with Hot Standby? 5. In PL/Proxy, we have the concept that a table is sharded across multiple nodes. Is that possible here? Again, we seem to have the concept that a table is only ever in a single place. 6. Can we do CREATE FOREIGN TABLE .... AS SELECT ... I guess the answer depends on (1) 7. Why does ANALYZE skip foreign tables? Surely its really important we know things about a foreign table, otherwise we are going to optimize things very badly. 8. Is the WHERE clause passed down into a ForeignScan? 9. The docs for CHECK constraints imply that the CHECK is executed against any rows returned from FDW. Are we really going to execute that as an additional filter on each row retrieved? 10. Can a foreign table be referenced by a FK? 11. Can you create a DO INSTEAD trigger on a FOREIGN TABLE? 12. I think it would be useful for both review and afterwards to write the documentation section now, so we can begin to understand this. Will there be a documentation section on writing a FDW also? There are enough open questions here that I think we need docs and a review guide, otherwise we'll end up with some weird missing feature, which would be a great shame. 13. How does this relate to dblink? Is that going to be replaced by this feature? 14. How do we do scrollable cursors with foreign tables? Do we materialize them always? Or... 15. In terms of planning queries, do we have a concept of additional cost per row on a foreign server? How does the planner decide how costly retrieving data is from the FDW? 16. If we cancel a query, is there an API call to send query cancel to the FDW and so on to the foreign server? Does that still work if we hot other kinds of ERROR, or FATAL? 17. Can we request different types of transaction isolation on the foreign server, or do certain states get passed through from our session? e.g. if we are running a serializable transaction, does that state get passed through to the FDW, so it knows to request that on the foreign server? That seems essential if we are going to make pg_dump work correctly. 18. Does pg_dump dump the data in the FDW or just of the definition of the data? Can we have an option for either? 19. If we PREPARE a statement, are there API calls to pass thru the PREPARE to the FDW? Or are calls always dynamic? 20. If default privileges include INSERT, UPDATE or DELETE, does this cause error, or does it silently get ignored for foreign tables? I think I would want the latter. 21. Can we LOCK a foreign table? I guess so. Presumably no LOCK is passed through to the FDW? 22. Can we build an local index on a foreign table? No too sure what the right answers are to these questions, but I think we need to know the answers to understand what we are getting. Thanks -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On 12/21/2010 02:33 PM, Simon Riggs wrote: > On Wed, 2010-12-15 at 22:25 +0900, Shigeru HANADA wrote: > >> Attached are revised version of SQL/MED core functionality patches. > Looks very interesting new feature, well done. > > Can I ask some questions about how this will work? > No particular order, just numbered for reference. Answering a few of your questions as I understand the position, faute de mieux. > 1. The docs don't actually say what a foreign table is. Is it a local > representation of foreign data? Or a local copy of foreign data? Or is > it a table created on a remote node? It's an interface to data not managed by Postgres (or at least by this node). It might be a table on another Postgres node, it might be a file, it might be a table in another RDBMS, it might be a stream of some sort. I could imagine creating one over a SOAP call, or for an RSS feed. Someone has created one for a twitter feed, I believe. An LDAP FDW might also be useful (think: single sign on). > 2. Will CREATE FOREIGN TABLE require a transactionid? It seems a good > replacement for temp tables on Hot Standby to be able to run a CREATE > FOREIGN TABLE using the file_fdw, then reuse the file again later. How could it not require a txnid? It's going to write the definition to the catalog, isn't it? > 3. Do we support CREATE TEMP FOREIGN TABLE? It seems desirable to be > able to move data around temporarily, as we do with normal tables. That would definitely be a good thing to have. > > 7. Why does ANALYZE skip foreign tables? Surely its really important we > know things about a foreign table, otherwise we are going to optimize > things very badly. Quite apart from other reasons, such as possible ephemerality of the data, the difficulty of taking a reasonable random sample from an arbitrary foreign data source seems substantial, and we surely don't want ANALYSE to have to run a full sequential scan of a foreign data source. > > 10. Can a foreign table be referenced by a FK? I don't see how it can be. There would be no unique index to use. cheers andrew
On Wed, 2010-12-22 at 09:03 -0500, Andrew Dunstan wrote: > Answering a few of your questions Many thanks. > > 7. Why does ANALYZE skip foreign tables? Surely its really important we > > know things about a foreign table, otherwise we are going to optimize > > things very badly. > > Quite apart from other reasons, such as possible ephemerality of the > data, the difficulty of taking a reasonable random sample from an > arbitrary foreign data source seems substantial, and we surely don't > want ANALYSE to have to run a full sequential scan of a foreign data source. I think we need something that estimates the size of a table, at least, otherwise queries will be completely un-optimised. > > 10. Can a foreign table be referenced by a FK? > > I don't see how it can be. There would be no unique index to use. That answers another question also. Still many unanswered. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On 12/22/2010 10:00 AM, Simon Riggs wrote: > 7. Why does ANALYZE skip foreign tables? Surely its really important we >>> know things about a foreign table, otherwise we are going to optimize >>> things very badly. >> Quite apart from other reasons, such as possible ephemerality of the >> data, the difficulty of taking a reasonable random sample from an >> arbitrary foreign data source seems substantial, and we surely don't >> want ANALYSE to have to run a full sequential scan of a foreign data source. > I think we need something that estimates the size of a table, at least, > otherwise queries will be completely un-optimised. > > Perhaps we could allow FDWs to register a size estimation function. Does this need to be done on the first go round? Time is running a bit short. cheers andrew
On Wed, Dec 22, 2010 at 03:00:16PM +0000, Simon Riggs wrote: > On Wed, 2010-12-22 at 09:03 -0500, Andrew Dunstan wrote: > > > Answering a few of your questions > > Many thanks. > > > > 7. Why does ANALYZE skip foreign tables? Surely its really > > > important we know things about a foreign table, otherwise we are > > > going to optimize things very badly. > > > > Quite apart from other reasons, such as possible ephemerality of > > the data, the difficulty of taking a reasonable random sample from > > an arbitrary foreign data source seems substantial, and we surely > > don't want ANALYSE to have to run a full sequential scan of a > > foreign data source. > > I think we need something that estimates the size of a table, at > least, otherwise queries will be completely un-optimised. The estimated size for a lot of things--streams of data, for example--is infinity. I suppose that's a good default for some cases. > > > 10. Can a foreign table be referenced by a FK? > > > > I don't see how it can be. There would be no unique index to use. > > That answers another question also. There might be some cases where we can say an expression on a set of columns is unique, but not in the general case. In the general case, we can't even guarantee 1NF. > Still many unanswered. Will try these later today :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Dec 21, 2010 at 07:33:04PM +0000, Simon Riggs wrote: > On Wed, 2010-12-15 at 22:25 +0900, Shigeru HANADA wrote: > > > Attached are revised version of SQL/MED core functionality patches. > > Looks very interesting new feature, well done. While, "read SQL:2008" is generally not super useful advice, in the particular case of 6WD2_09_MED_2007-12, it's actually pretty clear. Doubtless, as more competition arises, SQL/MED's specification will get cloudier, but for now, it's a decent place to start. > 4. In Hot Standby, we are creating many copies of the data tables on > different servers. That seems to break the concept that data is in > only one place, when we assume that a foreign table is on only one > foreign server. How will we represent the concept that data is > potentially available identically from more than one place? Any > other comments about how this will work with Hot Standby? Your premise, that we're creating many copies, refers to a potential optimization of SQL/MED which IMHO we should not even vaguely consider until we've given SQL/MED a chance to shake out. It's a basic question about cache coherency, and we just don't need to go there on the first round. Possibly not even on the second. > 5. In PL/Proxy, we have the concept that a table is sharded across > multiple nodes. Is that possible here? Again, we seem to have the > concept that a table is only ever in a single place. It's not super relevant. > 6. Can we do CREATE FOREIGN TABLE .... AS SELECT ... > I guess the answer depends on (1) The answer to (1) is "reference to data not managed by the instance of PostgreSQL doing the queries," so it should be possible, depending on what capabilities the particular foreign data wrapper exposes. > 7. Why does ANALYZE skip foreign tables? Surely its really important > we know things about a foreign table, otherwise we are going to > optimize things very badly. The only thing known about a table may be the structure of its rows. Streams would be one example of this. > 8. Is the WHERE clause passed down into a ForeignScan? Dunno. I'd presume that predicate pushing would be a very important part of how to get this working at a reasonable speed, but as I've demonstrated with DBI-Link, it's not strictly necessary for base functionality. > 9. The docs for CHECK constraints imply that the CHECK is executed > against any rows returned from FDW. Are we really going to execute that > as an additional filter on each row retrieved? Depends. If there's some reliable way to push the CHECK to the foreign data source, that is of course preferable. > 10. Can a foreign table be referenced by a FK? It's conceivable, at least in some cases. The penalties could potentially be quite high. > 11. Can you create a DO INSTEAD trigger on a FOREIGN TABLE? Don't see why not. Again, bearing in mind that not all foreign data sources are remotely similar in capability. > 12. I think it would be useful for both review and afterwards to > write the documentation section now, so we can begin to understand > this. Will there be a documentation section on writing a FDW also? > There are enough open questions here that I think we need docs and a > review guide, otherwise we'll end up with some weird missing > feature, which would be a great shame. Excellent idea. Next on the agenda: carving out the needed resources for this project. > 13. How does this relate to dblink? Is that going to be replaced by > this feature? As I understand it, dblink is already using some of the infrastructure. For legacy reasons, I suspect dblink will be maintained, even when we have a full-blown SQL/MED implementation. > 14. How do we do scrollable cursors with foreign tables? Do we > materialize them always? Or... That will depend on the nature of the foreign data source. > 15. In terms of planning queries, do we have a concept of additional > cost per row on a foreign server? How does the planner decide how costly > retrieving data is from the FDW? Dunno. I return to my refrain of, "depends on the foreign data source." You could imagine that a future version of PostgreSQL would have handy interfaces available to MED and others, where a CSV file would be much less likely to. > 16. If we cancel a query, is there an API call to send query cancel to > the FDW and so on to the foreign server? Does that still work if we hot > other kinds of ERROR, or FATAL? Depends... > 17. Can we request different types of transaction isolation on the > foreign server, or do certain states get passed through from our > session? e.g. if we are running a serializable transaction, does > that state get passed through to the FDW, so it knows to request > that on the foreign server? That seems essential if we are going to > make pg_dump work correctly. Assuming that the foreign server even has a concept of transaction isolation, which it may well not, we should be able to pass same. > 18. Does pg_dump dump the data in the FDW or just of the definition of > the data? Can we have an option for either? I'm thinking it should not dump foreign data in general, and if and when we build in that capability, it should come with loud, clear warnings about the caching issues involved. > 19. If we PREPARE a statement, are there API calls to pass thru the > PREPARE to the FDW? Or are calls always dynamic? Depends... > 20. If default privileges include INSERT, UPDATE or DELETE, does this > cause error, or does it silently get ignored for foreign tables? I think > I would want the latter. Access control should probably be controlled by PostgreSQL, and as I understand the API, they will be. > 21. Can we LOCK a foreign table? I guess so. Presumably no LOCK is > passed through to the FDW? Depends... > 22. Can we build an local index on a foreign table? Dunno. In principle, it might be possible, but there are pretty thorny caching issues that make this an edge case. I can imagine some gigantic read-only store where this would make sense, but I'm having trouble thinking of another case. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Excerpts from David Fetter's message of mié dic 22 12:36:10 -0300 2010: > On Wed, Dec 22, 2010 at 03:00:16PM +0000, Simon Riggs wrote: > > On Wed, 2010-12-22 at 09:03 -0500, Andrew Dunstan wrote: > > > Quite apart from other reasons, such as possible ephemerality of > > > the data, the difficulty of taking a reasonable random sample from > > > an arbitrary foreign data source seems substantial, and we surely > > > don't want ANALYSE to have to run a full sequential scan of a > > > foreign data source. > > > > I think we need something that estimates the size of a table, at > > least, otherwise queries will be completely un-optimised. > > The estimated size for a lot of things--streams of data, for > example--is infinity. I suppose that's a good default for some cases. Since we don't have streaming queries, this would seem rather pointless ... Surely the FDW must be able to limit the resultset somehow. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wed, Dec 22, 2010 at 12:54:06PM -0300, Alvaro Herrera wrote: > Excerpts from David Fetter's message of mié dic 22 12:36:10 -0300 2010: > > On Wed, Dec 22, 2010 at 03:00:16PM +0000, Simon Riggs wrote: > > > On Wed, 2010-12-22 at 09:03 -0500, Andrew Dunstan wrote: > > > > > Quite apart from other reasons, such as possible ephemerality of > > > > the data, the difficulty of taking a reasonable random sample from > > > > an arbitrary foreign data source seems substantial, and we surely > > > > don't want ANALYSE to have to run a full sequential scan of a > > > > foreign data source. > > > > > > I think we need something that estimates the size of a table, at > > > least, otherwise queries will be completely un-optimised. > > > > The estimated size for a lot of things--streams of data, for > > example--is infinity. I suppose that's a good default for some cases. > > Since we don't have streaming queries, We don't, but other systems do. > this would seem rather pointless ... Surely the FDW must be able to > limit the resultset somehow. Using LIMIT. :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, 2010-12-22 at 07:36 -0800, David Fetter wrote: > > > > > > 7. Why does ANALYZE skip foreign tables? Surely its really > > > > important we know things about a foreign table, otherwise we are > > > > going to optimize things very badly. > > > > > > Quite apart from other reasons, such as possible ephemerality of > > > the data, the difficulty of taking a reasonable random sample from > > > an arbitrary foreign data source seems substantial, and we surely > > > don't want ANALYSE to have to run a full sequential scan of a > > > foreign data source. > > > > I think we need something that estimates the size of a table, at > > least, otherwise queries will be completely un-optimised. > > The estimated size for a lot of things--streams of data, for > example--is infinity. I suppose that's a good default for some cases. A fairly gross estimate will be all that is required. Without it, we'd better be looking at some advanced query-cancel support, cos the "17 hours and still going" events of previous SQL gateway products seems to be looming in my rear view mirror. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Tue, 21 Dec 2010 19:33:04 +0000 Simon Riggs <simon@2ndQuadrant.com> wrote: > 1. The docs don't actually say what a foreign table is. Is it a local > representation of foreign data? Or a local copy of foreign data? Or is > it a table created on a remote node? "Foreign table" is an database object which represents the format of existing external data as PG-table, and it can be used as data source of SELECT statement. It is like a VIEW rather than a TABLE because FOREIGN TABLE doesn't have any data locally. > 2. Will CREATE FOREIGN TABLE require a transactionid? It seems a good > replacement for temp tables on Hot Standby to be able to run a CREATE > FOREIGN TABLE using the file_fdw, then reuse the file again later. AFAIK Yes. CREATE FOREIGN TABLE make records in catalogs (pg_class, pg_attribute, etc.). > 3. Do we support CREATE TEMP FOREIGN TABLE? It seems desirable to be > able to move data around temporarily, as we do with normal tables. When we support write access to foreign tables, it would be useful. > 4. In Hot Standby, we are creating many copies of the data tables on > different servers. That seems to break the concept that data is in only > one place, when we assume that a foreign table is on only one foreign > server. How will we represent the concept that data is potentially > available identically from more than one place? Any other comments about > how this will work with Hot Standby? IIUC, If you create FOREIGN TABLE on primary node, it will be propagated to standby nodes with same generic options. Then, users connected to standby nodes can use the foreign tables to retrieve foreign data. If you have multiple standby nodes, all foreign tables on all nodes including primary node would refer one data source. For example, file_fdw would read data from the file pointed by "filename" option, but you can't change the path for each standby server. You may copy the file to each standby servers, or share one disk which contains the file by all servers. OTOH, RDBMS wrappers would refer same server if the SERVER, USER MAPPING and FOREIGN TABLE have same generic options (host, port, dbname, etc.), so you would need just one data instance for all of FOREIGN TABLEs on standby nodes, and data consistency might have to be checked on the remote side when the data is being changed. # Um, I might have missed your point... > 5. In PL/Proxy, we have the concept that a table is sharded across > multiple nodes. Is that possible here? Again, we seem to have the > concept that a table is only ever in a single place. You would able to point one data source from multiple foreign tables on different PG-nodes. > 6. Can we do CREATE FOREIGN TABLE .... AS SELECT ... > I guess the answer depends on (1) We might be able to support that syntax, but IMHO it doesn't seem too useful. > 7. Why does ANALYZE skip foreign tables? Surely its really important we > know things about a foreign table, otherwise we are going to optimize > things very badly. I think ANALYZE is good timing to get statistics of remote data. In current design, planner calls PlanRelScan() to get costs (startup/total) of the scan, but it seems difficult to estimate rows/width by each FDW. I think acquire_sample_rows() would be the hook point for that purpose. Then, "how to get random sample rows" would be FDW's matter, but I have not found smart way to acquire samples without sequential scan on the remote side... > 8. Is the WHERE clause passed down into a ForeignScan? Parsed WHERE clause is passed to PlanRelScan() via baserestrictinfo of RelOptInfo. Wrappers would be able to push it (or part of it) down to the remote side. Maybe RDBMS wrappers need to implement deparsing routine similar to deparse_expression() or ri_GenerateQual() for themselves. > 9. The docs for CHECK constraints imply that the CHECK is executed > against any rows returned from FDW. Are we really going to execute that > as an additional filter on each row retrieved? In current implementation, CHECK/NOT NULL constraints are not executed, and I'm not sure that they should be. NOT NULL and CHECK are supported for table inheritance mainly. > 10. Can a foreign table be referenced by a FK? Currently no. FK requires PK on the referenced table, but foreign table can't have PK constraint. > 11. Can you create a DO INSTEAD trigger on a FOREIGN TABLE? Currently no, but it would be useful. > 12. I think it would be useful for both review and afterwards to write > the documentation section now, so we can begin to understand this. Will > there be a documentation section on writing a FDW also? There are enough > open questions here that I think we need docs and a review guide, > otherwise we'll end up with some weird missing feature, which would be a > great shame. Agreed. ISTM that "V. Server Programming" section is suitable. > 13. How does this relate to dblink? Is that going to be replaced by this > feature? They would be independent each other in first version, and dblink would have to be maintained because there would be many users. > 14. How do we do scrollable cursors with foreign tables? Do we > materialize them always? Or... We materialize the result as same as normal tables. > 15. In terms of planning queries, do we have a concept of additional > cost per row on a foreign server? How does the planner decide how costly > retrieving data is from the FDW? Costs for a scan on a foreign table is estimated in FDW routine PlanRelScan(). So FDW can use arbitrary algorithm to estimate costs. pgsql_fdw might execute "EXPLAIN SELECT ... FROM xxx" on remote side to get remote costs. > 16. If we cancel a query, is there an API call to send query cancel to > the FDW and so on to the foreign server? Does that still work if we hot > other kinds of ERROR, or FATAL? There is no handler for query cancel. If FDW wants cleanup on the interrupts, resourceowner mechanism would help. > 17. Can we request different types of transaction isolation on the > foreign server, or do certain states get passed through from our > session? e.g. if we are running a serializable transaction, does that > state get passed through to the FDW, so it knows to request that on the > foreign server? That seems essential if we are going to make pg_dump > work correctly. Currently there is no transaction management hook. > 18. Does pg_dump dump the data in the FDW or just of the definition of > the data? Can we have an option for either? I think that pg_dump need to dump only definition of foreign tables because 1) multiple foreign tables on different nodes can refer one data source, and 2) we can't load data to foreign table. > 19. If we PREPARE a statement, are there API calls to pass thru the > PREPARE to the FDW? Or are calls always dynamic? PREPARE calls PlanRelScan() for each foreign tables used in the query, and EXECUTE calls BeginScan(), Iterate(), EndScan(). Parameters of EXECUTE statement are passed to Iterate(). > 20. If default privileges include INSERT, UPDATE or DELETE, does this > cause error, or does it silently get ignored for foreign tables? I think > I would want the latter. In current implementation, all default privileges are granted to foreign table if the owner has default privileges. It breaks consistency, will fix to ignore privileges which can't be granted via GRANT statement. > 21. Can we LOCK a foreign table? I guess so. Presumably no LOCK is > passed through to the FDW? LOCK statement can be used on foreign tables, but FDW can't handle the statement. You mean that locking remote data through LOCK statement? It would be better to discuss locking with write-access support. > 22. Can we build an local index on a foreign table? No. It would be difficult to determine what we should store abstract "tuple id", ctid for heap tuple, in index tuples. Regards, -- Shigeru Hanada
Thank you for those replies, I understand things much better now. I have two remaining concerns... On Fri, 2010-12-24 at 19:51 +0900, Shigeru HANADA wrote: > > 15. In terms of planning queries, do we have a concept of additional > > cost per row on a foreign server? How does the planner decide how > costly > > retrieving data is from the FDW? > > Costs for a scan on a foreign table is estimated in FDW routine > PlanRelScan(). So FDW can use arbitrary algorithm to estimate costs. > pgsql_fdw might execute "EXPLAIN SELECT ... FROM xxx" on remote side > to get remote costs. OK, so there is an API call to allow the FDW to determine the size of the table, before we attempt to materialize it. That is good, and will allow us to make some reasonable optimisations. Am I right in thinking that if the materialized result is larger than some_limit_parameter, that a ForeignScan will end with an ERROR? I think we're much more at risk from this with SQL/MED than we are with direct access. Keeping data remote usually means it is very large. work_space? > > 16. If we cancel a query, is there an API call to send query cancel > to > the FDW and so on to the foreign server? Does that still work if > we hot > other kinds of ERROR, or FATAL? > > There is no handler for query cancel. If FDW wants cleanup on the > interrupts, resourceowner mechanism would help. Please give this some thought. We need to be able to make a clean cancel for a remote query. If my comments seem in any way negative, it is because I have had previous experience with poorly designed SQL gateway products and have no wish to repeat those experiences in PostgreSQL. I understand it will take many years for whole feature set to arrive, though the ones mentioned above I regard as essential for the first release. Specifically, I've seen people do "SELECT * FROM BigForeignTable" and then be unable to cancel it until it/everyone explodes. That is especially annoying, since some SQL tools issue SELECTs as a means of doing DESCRIBE. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Fri, 24 Dec 2010 11:34:59 +0000 Simon Riggs <simon@2ndQuadrant.com> wrote: > On Fri, 2010-12-24 at 19:51 +0900, Shigeru HANADA wrote: > > > 15. In terms of planning queries, do we have a concept of additional > > > cost per row on a foreign server? How does the planner decide how > > costly > > > retrieving data is from the FDW? > > > > Costs for a scan on a foreign table is estimated in FDW routine > > PlanRelScan(). So FDW can use arbitrary algorithm to estimate costs. > > pgsql_fdw might execute "EXPLAIN SELECT ... FROM xxx" on remote side > > to get remote costs. > > OK, so there is an API call to allow the FDW to determine the size of > the table, before we attempt to materialize it. That is good, and will > allow us to make some reasonable optimisations. > > Am I right in thinking that if the materialized result is larger than > some_limit_parameter, that a ForeignScan will end with an ERROR? I think > we're much more at risk from this with SQL/MED than we are with direct > access. Keeping data remote usually means it is very large. > work_space? Materialize node uses Tuplestorestate to keep the result, so huge result would use temporary files. If FDW need to store result locally, it can use Tuplestorestate. > > > 16. If we cancel a query, is there an API call to send query cancel > > to > the FDW and so on to the foreign server? Does that still work if > > we hot > other kinds of ERROR, or FATAL? > > > > There is no handler for query cancel. If FDW wants cleanup on the > > interrupts, resourceowner mechanism would help. > > Please give this some thought. We need to be able to make a clean cancel > for a remote query. Sure. > If my comments seem in any way negative, it is because I have had > previous experience with poorly designed SQL gateway products and have > no wish to repeat those experiences in PostgreSQL. I understand it will > take many years for whole feature set to arrive, though the ones > mentioned above I regard as essential for the first release. > > Specifically, I've seen people do "SELECT * FROM BigForeignTable" and > then be unable to cancel it until it/everyone explodes. That is > especially annoying, since some SQL tools issue SELECTs as a means of > doing DESCRIBE. First of all, I think that it depends on the implementation of FDW and capability of remote server whether user can cancel remote query. For example, current pgsql_fdw uses PQexec("SELECT * FROM table") to execute remote query, and set cleanup callback with RegisterResourceReleaseCallback() after establishment of connection. In cleanup function, pgsql_fdw issues PQfinish() to cancel the whole query. With this implementation, pgsql_fdw can stop both of local and remote query with user interrupt and other errors. I'll research whether the registration of cleanup handler can be moved into core. If we don't provide FdwRoutine handler for query cancel and other errors, it would be better to document usage of resourceower mechanism in "How to write FDW" section or somewhere. Regards, -- Shigeru Hanada
On Wed, Dec 15, 2010 at 8:25 AM, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > In order to make review easier, fdw_core patch has been split into > two parts, fdw_syntax and fdw_scan. Please apply fdw_scan onto > fdw_syntax to test whole features. I'm working on getting a first chunk of this committed. I've removed a number of features that I don't think are absolutely required for an initial commit, simplified some things that seemed overly complicated to me, and then done a bunch of cleanup of what's left. It is still a very big patch, but it's approaching a size and level of complexity that seems manageable to me (currently, about 2300 lines). I'm not going to post it just yet because my version is still kind of a mess, but I thought a quick heads-up that I'm grinding through it would be a good idea, in order to avoid duplication of effort. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Dec 25, 2010 at 11:52 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I'm working on getting a first chunk of this committed. OK, here's the patch. Changes from the submitted fdw_syntax patch: - I removed a LOT of frammishes from CREATE FOREIGN TABLE. I think that the idea of letting foreign tables inherit from regular tables or visca versa is interesting, but I don't think the patch got it entirely right, and there's no reason it can't be added as a subsequent commit. Check constraints are disallowed now, too. In fact, basically all you can do with CREATE FOREIGN TABLE is set column names, types, and whether they're NOT NULL. But I think that's enough to get started. - I hacked things around so that we use more of the existing parser logic for CREATE TABLE and ALTER TABLE and then disallow the constructs we don't support (like constraints and default values) later on. I think this is preferable to replicating large chunks of the create/alter table grammar, especially because, if this patch is any indication, the fraction of it we're replicating is going to grow over time and perhaps approach 100%. I'm not entirely sure I've covered all the bases here, so some review of this logic would be appreciated. - I removed all of the changes related to adding a HANDLER option to foreign data wrappers. I think that stuff properly belongs in the "fdw scan" patch. Instead, what I've done here is just prohibit foreign data wrappers from being used in queries. I'm generally pretty negative on syntax-only patches, but then foreign data wrappers have been basically syntax-only for two releases, and I think there's a good chance that if we get the syntax patch in soon we'll actually be able to make it work before we run out of time. So I'm feeling like it might be OK in this case, especially because even with all the trimming down I've done here, this is still a very big patch. - Lots of cleanup, of both code and documentation. I'd appreciate some review of what's attached, even though it's not totally final yet. Unless there are serious objections to this whole line of attack (which I hope there aren't, because I've put a ton of time into this already), I'm going to continue working on beating this into shape and, barring violent objections, eventually commit something based on the version attached here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Mon, 27 Dec 2010 22:16:42 -0500 Robert Haas <robertmhaas@gmail.com> wrote: > OK, here's the patch. Changes from the submitted fdw_syntax patch: In psql document, I found an inconsistency between \command-letter and object-type has been in the original patch. Attached patch would fix it. Regards, -- Shigeru Hanada
Attachment
On 28.12.2010 05:16, Robert Haas wrote: > On Sat, Dec 25, 2010 at 11:52 PM, Robert Haas<robertmhaas@gmail.com> wrote: > In fact, basically all you can do with CREATE FOREIGN TABLE is set column > names, types, and whether they're NOT NULL. But I think that's enough > to get started. Even NOT NULL seems questionable. It might be interesting for the planner, but our cost estimates of remote queries are pretty bogus anyway. We can't enforce the NULLness of remote data, so I don't think we should allow NOT NULL, and should always choose plans that are safe if there are NULLs after all. > - I removed all of the changes related to adding a HANDLER option to > foreign data wrappers. I think that stuff properly belongs in the > "fdw scan" patch. Instead, what I've done here is just prohibit > foreign data wrappers from being used in queries. I'm generally > pretty negative on syntax-only patches, but then foreign data wrappers > have been basically syntax-only for two releases, and I think there's > a good chance that if we get the syntax patch in soon we'll actually > be able to make it work before we run out of time. So I'm feeling > like it might be OK in this case, especially because even with all the > trimming down I've done here, this is still a very big patch. +1, now that we have a patch for the rest of the feature as well. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 28.12.2010 05:16, Robert Haas wrote: > I'd appreciate some review of what's attached, even though it's not > totally final yet. This construct doesn't translate well: > + appendStringInfo(&allowed, "table%s%s%s", > + allowView ? " or view" : "", > + allowType ? " or composite type" : "", > + allowForeignTable ? " or foreign table" : ""); Typo here: > @@ -6883,7 +6962,7 @@ ATExecChangeOwner(Oid relationOid, Oid newOwnerId, bool recursing, LOCKMODE lock > default: > ereport(ERROR, > (errcode(ERRCODE_WRONG_OBJECT_TYPE), > - errmsg("\"%s\" is not a table, view, or sequence", > + errmsg("\"%s\" is not a table, view, sequence, or foreign tabl, or foreign tablee", > NameStr(tuple_class->relname)))); > } > -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, Dec 28, 2010 at 18:45, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> In fact, basically all you can do with CREATE FOREIGN TABLE is set column >> names, types, and whether they're NOT NULL. But I think that's enough >> to get started. > > Even NOT NULL seems questionable. It might be interesting for the planner, > but our cost estimates of remote queries are pretty bogus anyway. We can't > enforce the NULLness of remote data, so I don't think we should allow NOT > NULL, and should always choose plans that are safe if there are NULLs after > all. The same can be said for CHECK constraints, but CHECKs on foreign tables are very useful to support multi-nodes partitioning. So, I'd like to support CHECKs even though we cannot enforce the constraints at remove servers. Will we have CHECKs but drop NOT NULLs? Another idea is to have an option to apply those constraints at SELECT. We can find corrupted foreign data at that time. -- Itagaki Takahiro
On Tue, Dec 28, 2010 at 1:52 AM, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > On Mon, 27 Dec 2010 22:16:42 -0500 > Robert Haas <robertmhaas@gmail.com> wrote: >> OK, here's the patch. Changes from the submitted fdw_syntax patch: > > In psql document, I found an inconsistency between \command-letter and > object-type has been in the original patch. Attached patch would fix > it. Thanks, I've applied this to my local branch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Dec 28, 2010 at 4:45 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 28.12.2010 05:16, Robert Haas wrote: >> On Sat, Dec 25, 2010 at 11:52 PM, Robert Haas<robertmhaas@gmail.com> >> wrote: >> In fact, basically all you can do with CREATE FOREIGN TABLE is set column >> names, types, and whether they're NOT NULL. But I think that's enough >> to get started. > > Even NOT NULL seems questionable. It might be interesting for the planner, > but our cost estimates of remote queries are pretty bogus anyway. We can't > enforce the NULLness of remote data, so I don't think we should allow NOT > NULL, and should always choose plans that are safe if there are NULLs after > all. It's true that we can't enforce the non-NULL-ness of data, but we also can't enforce that the remote columns have any particular type, or that the number of columns and their names match the remote side, or indeed that the remote table exists at all. I think that the right approach here is to declare that the entire foreign table definition is essentially a contract. The user is promising us that the returned data will match the supplied parameters. If it doesn't, the user should expect errors and/or wrong answers. On a practical level, getting rid of NOT NULL constraints will pessimize many queries, and even more complex table constraints have their uses. Adopting that as a project policy seems short-sighted. >> - I removed all of the changes related to adding a HANDLER option to >> foreign data wrappers. I think that stuff properly belongs in the >> "fdw scan" patch. Instead, what I've done here is just prohibit >> foreign data wrappers from being used in queries. I'm generally >> pretty negative on syntax-only patches, but then foreign data wrappers >> have been basically syntax-only for two releases, and I think there's >> a good chance that if we get the syntax patch in soon we'll actually >> be able to make it work before we run out of time. So I'm feeling >> like it might be OK in this case, especially because even with all the >> trimming down I've done here, this is still a very big patch. > > +1, now that we have a patch for the rest of the feature as well. I think it's going to need some pretty heavy rebasing, over my changes - but yes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Dec 28, 2010 at 4:59 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 28.12.2010 05:16, Robert Haas wrote: >> I'd appreciate some review of what's attached, even though it's not >> totally final yet. > > This construct doesn't translate well: Yeah, there are a bunch of remaining error message issues. See the "and it's not a bunny rabbit, either" thread. I want to get that patch committed first and then rebase this over it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Dec 27, 2010 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Dec 25, 2010 at 11:52 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I'm working on getting a first chunk of this committed. > > OK, here's the patch. I've now committed a version of this with a bunch of further revisions, corrections, and cleanup. It looks to me as though this patch was written based on the 9.0 code and not thoroughly updated for some of the 9.1 changes, but I think I cleaned most of that up. With a patch of this size, I am sure there are a few things I overlooked, so please point 'em out and I'll try to fix them promptly. Hanada-san, can you rebase the fdw_scan patch over what I committed and post an updated version ASAP? It'd be better for Heikki or Tom to work on that part of this than me, since they have a better understanding of the executor than I do, but I'm sure that they will not want to work from the previously posted patches as the changes I made are fairly extensive. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Jan 1, 2011 at 11:54 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Hanada-san, can you rebase the fdw_scan patch over what I committed > and post an updated version ASAP? It'd be better for Heikki or Tom to > work on that part of this than me, since they have a better > understanding of the executor than I do, but I'm sure that they will > not want to work from the previously posted patches as the changes I > made are fairly extensive. Is anyone working on this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, 4 Jan 2011 22:16:26 -0500 Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Jan 1, 2011 at 11:54 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > Hanada-san, can you rebase the fdw_scan patch over what I committed > > and post an updated version ASAP? It'd be better for Heikki or Tom to > > work on that part of this than me, since they have a better > > understanding of the executor than I do, but I'm sure that they will > > not want to work from the previously posted patches as the changes I > > made are fairly extensive. > > Is anyone working on this? Sorry for late replyl, I'm working on this item. I would post rebased fdw_scan patch soon. Regards, -- Shigeru Hanada
On Sat, 1 Jan 2011 23:54:05 -0500 Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Dec 27, 2010 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Sat, Dec 25, 2010 at 11:52 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >> I'm working on getting a first chunk of this committed. > > > > OK, here's the patch. > > I've now committed a version of this with a bunch of further > revisions, corrections, and cleanup. It looks to me as though this > patch was written based on the 9.0 code and not thoroughly updated for > some of the 9.1 changes, but I think I cleaned most of that up. With > a patch of this size, I am sure there are a few things I overlooked, > so please point 'em out and I'll try to fix them promptly. While testing the commit, I found that CREATE FOREIGN TABLE requires unnecessary USAGE privilege on the FOREIGN DATA WRAPPER. SQL/MED standard requires only USAGE on the SERVER as follows. <quote> 1) If <foreign table definition> is contained in an SQL-client module, then the enabled authorization identifiers shall include A. 2) The applicable privileges shall include the USAGE privilege on the foreign-server identified by <foreign server name>. 3) Additional privileges, if any, necessary to execute <foreign table definition> are implementation-defined. </quote> Sorry, this problem comes from original patch. OTOH, the document about this specification which is written in "GRANT" page is correct. Regards, -- Shigeru Hanada
Attachment
On Sat, 1 Jan 2011 23:54:05 -0500 Robert Haas <robertmhaas@gmail.com> wrote: > Hanada-san, can you rebase the fdw_scan patch over what I committed > and post an updated version ASAP? It'd be better for Heikki or Tom to > work on that part of this than me, since they have a better > understanding of the executor than I do, but I'm sure that they will > not want to work from the previously posted patches as the changes I > made are fairly extensive. I've rebased fdw_scan patch onto HEAD, and also split into two parts: 1) fdw_handler.patch includes HANDLER option syntax for CREATE/ALTER FOREIGN DATA WRAPPER 2) foreign_scan.patch includes ForeignScan executor node and FdwRoutine interface Regards, -- Shigeru Hanada
Attachment
On Wed, Jan 5, 2011 at 19:24, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > 2) foreign_scan.patch includes ForeignScan executor node and > FdwRoutine interface I can see now Iterate() callback is called in per-tuple memory context. I'll adjust copy from API for the change. We don't need to export the executor state in CopyState. ForeignNext() still needs to materialize the slot. It seems reasonable for me to add tts_tableoid to TupleTableSlot and modify slot_getattr() to return the field for virtual and minimal tuples. Am I missing any problems here? Even if we still materialize tuples in 9.1, we would be better to use ExecStoreVirtualTuple() in file_fdw and pgsql_fdw for future optimization. -- Itagaki Takahiro