Thread: patch: SQL/MED(FDW) DDL
Hello. This is a proposal patch for SQL/MED for 9.1.

At the previous CF, this patch had so many features that it was hard to review all of them, so I divided it into smaller parts:

(1) foreign table DDL support (this proposal):
    - support for foreign table DDL syntax (CREATE/ALTER FOREIGN TABLE)
    - definition of the FDW routine interface and system catalogs for it.
(2) SELECT support for external PostgreSQL tables.

First of all, I'd like to discuss and reach agreement on the basic design and its implementation for the FDW interface. Comments and ideas would be very much appreciated, especially on how the system catalogs should store the information about foreign tables, servers, and user mappings.

Detailed information is described in the wiki:
http://wiki.postgresql.org/wiki/SQL/MED

I'll add this item to the CF site, and a WIP patch for (2) will be available on that page to show how (1) is utilized.

Open issues:
- FDW connection parameter issue: in this implementation, foreign table connection parameters (including passwords) are visible to all users. To avoid that, store them in the user mapping object, not in the foreign server object. (This should be improved, though.)
- FDW security issue: this depends on how individual FDW extensions are designed. I'll focus on it after basic agreement on the FDW interface.

- Previous CF page: https://commitfest.postgresql.org/action/patch_view?id=326
- Code repository: http://repo.or.cz/w/pgsql-fdw.git (branch: foreign_table).
  The repository branch structure is as follows:
  master
  + foreign_table: (active) branch for this proposal
    - foreign_scan: (active) WIP branch for SELECT support
  + fdw_ddl: (will be obsolete) branch for the last CF
    - dblink
    - copy_from

Thanks in advance,

SAKAMOTO Masahiko
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
e-mail: sakamoto.masahiko@oss.ntt.co.jp
Attachment
On 15/09/10 08:46, SAKAMOTO Masahiko wrote:
> This is a proposal patch for SQL/MED for 9.1.
>
> At the previous CF, this patch had so many features that it was hard
> to review all of them, so I divided it into smaller parts:
> (1) foreign table DDL support (this proposal):
>     - support for foreign table DDL syntax (CREATE/ALTER FOREIGN TABLE)
>     - definition of the FDW routine interface and system catalogs for it.
> (2) SELECT support for external PostgreSQL tables.
>
> First of all, I'd like to discuss and reach agreement on the basic design
> and its implementation for the FDW interface.
> Comments and ideas would be very much appreciated, especially
> on how the system catalogs should store the information about
> foreign tables, servers, and user mappings.

In my mind the key question is: what does the API for writing foreign data wrappers look like? I couldn't figure that out by looking at the patch.

The API needs to be simple and version-independent, so that you can write simple wrappers like the flat file wrapper easily. At the same time, it needs to be very flexible, so that it allows safely pushing down all kinds of constructs like functions, aggregates and joins. The planner needs to know which constructs the wrapper can handle and get cost estimates for the foreign scans. Those two goals are at odds.

I've been thinking about that for some time. Here's one sketch I made a while ago:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg00395.php

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
2010/9/15 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
> In my mind the key question is: what does the API for writing foreign
> data wrappers look like? I couldn't figure that out by looking at the patch.
>
> The API needs to be simple and version-independent, so that you can
> write simple wrappers like the flat file wrapper easily. At the same
> time, it needs to be very flexible, so that it allows safely pushing
> down all kinds of constructs like functions, aggregates and joins. The
> planner needs to know which constructs the wrapper can handle and get
> cost estimates for the foreign scans. Those two goals are at odds.
>
> I've been thinking about that for some time. Here's one sketch I made a
> while ago:
> http://archives.postgresql.org/pgsql-hackers/2008-07/msg00395.php

I wonder if we might not think of the foreign data wrapper problem as an instance of an even more general problem. Right now, we have semi-pluggable index access methods - it's probably not quite possible to implement one as a standalone module because of XLOG, and maybe other reasons, but at least there's some abstraction layer there. Perhaps we should be thinking about a similar facility for table access methods. What if someone wants to implement column-oriented storage, or index-organized tables, or tables that are really slow under heavy write loads but can instantaneously compute SELECT COUNT(*) FROM table, or mauve-colored tables with magic pixie dust? I don't want to raise the bar for this project to the point where we can never get it off the ground, but if there's a way to avoid baking in the assumption that only foreign tables can ever have special capabilities, that might be valuable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On 15/09/10 21:21, Robert Haas wrote:
> I wonder if we might not think of the foreign data wrapper problem as
> an instance of an even more general problem. Right now, we have
> semi-pluggable index access methods - it's probably not quite possible
> to implement one as a standalone module because of XLOG, and maybe
> other reasons, but at least there's some abstraction layer there.
> Perhaps we should be thinking about a similar facility for table access
> methods. What if someone wants to implement column-oriented storage,
> or index-organized tables, or tables that are really slow under
> heavy write loads but can instantaneously compute SELECT COUNT(*) FROM
> table, or mauve-colored tables with magic pixie dust? I don't want to
> raise the bar for this project to the point where we can never get it
> off the ground, but if there's a way to avoid baking in the assumption
> that only foreign tables can ever have special capabilities, that
> might be valuable.

Well, you could implement all that as foreign data wrappers. Tables made out of pixie dust feel pretty foreign to me ;-).

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Wed, Sep 15, 2010 at 2:28 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> On 15/09/10 21:21, Robert Haas wrote:
>> I wonder if we might not think of the foreign data wrapper problem as
>> an instance of an even more general problem. Right now, we have
>> semi-pluggable index access methods - it's probably not quite possible
>> to implement one as a standalone module because of XLOG, and maybe
>> other reasons, but at least there's some abstraction layer there.
>> Perhaps we should be thinking about a similar facility for table access
>> methods. What if someone wants to implement column-oriented storage,
>> or index-organized tables, or tables that are really slow under
>> heavy write loads but can instantaneously compute SELECT COUNT(*) FROM
>> table, or mauve-colored tables with magic pixie dust? I don't want to
>> raise the bar for this project to the point where we can never get it
>> off the ground, but if there's a way to avoid baking in the assumption
>> that only foreign tables can ever have special capabilities, that
>> might be valuable.
>
> Well, you could implement all that as foreign data wrappers. Tables made
> out of pixie dust feel pretty foreign to me ;-).

Eh, maybe. It doesn't seem like the best name, if we're actually managing the underlying data blocks with our smgr layer, etc.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes:
> 2010/9/15 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
>> I've been thinking about that for some time. Here's one sketch I made a
>> while ago:
>> http://archives.postgresql.org/pgsql-hackers/2008-07/msg00395.php

> I wonder if we might not think of the foreign data wrapper problem as
> an instance of an even more general problem. Right now, we have
> semi-pluggable index access methods - it's probably not quite possible
> to implement one as a standalone module because of XLOG, and maybe
> other reasons, but at least there's some abstraction layer there.
> Perhaps we should be thinking about a similar facility for table access
> methods. What if someone wants to implement column-oriented storage,
> or index-organized tables, or tables that are really slow under
> heavy write loads but can instantaneously compute SELECT COUNT(*) FROM
> table, or mauve-colored tables with magic pixie dust? I don't want to
> raise the bar for this project to the point where we can never get it
> off the ground, but if there's a way to avoid baking in the assumption
> that only foreign tables can ever have special capabilities, that
> might be valuable.

Well, the problem is to not draw the abstraction boundary so high that your plugins have to reimplement the world to get anything done. MySQL got this wrong IMO, and they are still paying the price in the form of bizarre functional incompatibilities between their different storage engines.

As an example, I don't think there is any sane way to provide column-oriented storage as a plugin. The entire executor is based around the assumption that table scans return a row at a time; in consequence, the entire planner is too. You can't have a plugin that replaces all of that. You could probably build a plugin that allows columnar storage but reconstructs rows to return to the executor ... but having to do that would largely destroy any advantages of a columnar DB, I fear.
Yet there are other cases that probably *could* work well based on a storage-level abstraction boundary; index-organized tables, for instance. So I think we need to have some realistic idea of what we want to support and design an API accordingly, not hope that if we don't know what we want we will somehow manage to pick an API that makes all things possible.

I'm personally more worried about whether Heikki's sketch draws the boundary too high-level rather than too low-level. It might work all right for handing off to a full-fledged remote database, particularly if the other DB is also Postgres; but otherwise it's leaving a lot of work to be done by the plugin author. And at the same time I don't think it's exposing enough information to let the local planner do anything intelligent in terms of trading off remote vs. local work.

regards, tom lane
On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, the problem is to not draw the abstraction boundary so high that
> your plugins have to reimplement the world to get anything done.
> MySQL got this wrong IMO, and they are still paying the price in the form
> of bizarre functional incompatibilities between their different storage
> engines.

Yeah, as far as I can tell there is pretty much universal consensus that they got that wrong. Actually, I have no personal opinion on the topic, having no familiarity with the innards of MySQL: but that is what people keep telling me.

> As an example, I don't think there is any sane way to provide
> column-oriented storage as a plugin. The entire executor is based
> around the assumption that table scans return a row at a time; in
> consequence, the entire planner is too. You can't have a plugin that
> replaces all of that. You could probably build a plugin that allows
> columnar storage but reconstructs rows to return to the executor ... but
> having to do that would largely destroy any advantages of a columnar DB,
> I fear.

Yeah, I don't know. A columnar DB is a bit like making "SELECT * FROM table" really mean some kind of join between table_part1, table_part2, and table_part3 (which could then perhaps be reordered, a candidate for join removal, etc.). But I have no position on whether whatever infrastructure we'd need to support that is in any way related to the problem du jour. It's worth noting, however, that even if we give up on column-oriented storage within PG, we might easily be talking to a column-oriented DB on the other end of an SQL/MED connection; and we'd like to be able to handle that sanely.

> Yet there are other cases that probably *could* work well based on a
> storage-level abstraction boundary; index-organized tables for instance.
> So I think we need to have some realistic idea of what we want to > support and design an API accordingly, not hope that if we don't > know what we want we will somehow manage to pick an API that makes > all things possible. Agreed. Random ideas: index-organized tables, tables that use a rollback log rather than VACUUM, tables that use strict two-phase locking rather than MVCC, tables that have no concurrency control at all and you get dirty reads (could be useful for logging tables), write-once read-many tables, compressed tables, encrypted tables, tables in formats used by previous versions of PostgreSQL, tables that store data in a round-robin fashion (like MRTG rrdtool). Within the general orbit of index-organized tables, you can wonder about different kinds of indices: btree, hash, and gist all seem promising. You can even imagine a GIST-like structure that does something like maintain running totals for certain columns on each non-leaf page, to speed up SUM operations. Feel free to ignore whatever of that seems irrelevant. > I'm personally more worried about whether Heikki's sketch has the > boundary too high-level than too low-level. It might work all right > for handing off to a full-fledged remote database, particularly if > the other DB is also Postgres; but otherwise it's leaving a lot of > work to be done by the plugin author. And at the same time I don't > think it's exposing enough information to let the local planner do > anything intelligent in terms of trading off remote vs. local work. Yeah, I think the API for exposing cost information needs a lot of thought. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Wed, Sep 15, 2010 at 8:53 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: > 2010/9/16 Robert Haas <robertmhaas@gmail.com>: >> On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Yet there are other cases that probably *could* work well based on a >>> storage-level abstraction boundary; index-organized tables for instance. >>> So I think we need to have some realistic idea of what we want to >>> support and design an API accordingly, not hope that if we don't >>> know what we want we will somehow manage to pick an API that makes >>> all things possible. >> >> Agreed. Random ideas: index-organized tables... > > I'd love to see a table that is based on one of the existing KVSs. I'm not familiar with the term KVS? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
2010/9/16 Robert Haas <robertmhaas@gmail.com>: > On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yet there are other cases that probably *could* work well based on a >> storage-level abstraction boundary; index-organized tables for instance. >> So I think we need to have some realistic idea of what we want to >> support and design an API accordingly, not hope that if we don't >> know what we want we will somehow manage to pick an API that makes >> all things possible. > > Agreed. Random ideas: index-organized tables... I'd love to see a table that is based on one of the existing KVSs. -- Hitoshi Harada
On Wed, Sep 15, 2010 at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Sep 15, 2010 at 8:53 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: >> 2010/9/16 Robert Haas <robertmhaas@gmail.com>: >>> On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> Yet there are other cases that probably *could* work well based on a >>>> storage-level abstraction boundary; index-organized tables for instance. >>>> So I think we need to have some realistic idea of what we want to >>>> support and design an API accordingly, not hope that if we don't >>>> know what we want we will somehow manage to pick an API that makes >>>> all things possible. >>> >>> Agreed. Random ideas: index-organized tables... >> >> I'd love to see a table that is based on one of the existing KVSs. > > I'm not familiar with the term KVS? Oh, key-value store, I bet. Yeah, that would be cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
2010/9/16 Robert Haas <robertmhaas@gmail.com>: > On Wed, Sep 15, 2010 at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Wed, Sep 15, 2010 at 8:53 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: >>> 2010/9/16 Robert Haas <robertmhaas@gmail.com>: >>>> On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>> Yet there are other cases that probably *could* work well based on a >>>>> storage-level abstraction boundary; index-organized tables for instance. >>>>> So I think we need to have some realistic idea of what we want to >>>>> support and design an API accordingly, not hope that if we don't >>>>> know what we want we will somehow manage to pick an API that makes >>>>> all things possible. >>>> >>>> Agreed. Random ideas: index-organized tables... >>> >>> I'd love to see a table that is based on one of the existing KVSs. >> >> I'm not familiar with the term KVS? > > Oh, key-value store, I bet. Yeah, that would be cool. That's it. Like Redis, Tokyo Cabinet, or something. -- Hitoshi Harada
Hitoshi Harada <umi.tanuki@gmail.com> writes:
> 2010/9/16 Robert Haas <robertmhaas@gmail.com>:
>> Oh, key-value store, I bet. Yeah, that would be cool.

> That's it. Like Redis, Tokyo Cabinet, or something.

What exactly do those get you that an ordinary index, or at worst an index-organized table, doesn't get you?

regards, tom lane
On 16/09/10 13:22, Tom Lane wrote:
> Hitoshi Harada <umi.tanuki@gmail.com> writes:
>> 2010/9/16 Robert Haas <robertmhaas@gmail.com>:
>>> Oh, key-value store, I bet. Yeah, that would be cool.
>
>> That's it. Like Redis, Tokyo Cabinet, or something.
>
> What exactly do those get you that an ordinary index, or at worst an
> index-organized table, doesn't get you?

It is pretty rare to see key-value stores vs. relational engines discussed without a descent into total foolishness, but this Wikipedia page looks like a reasonable summary: http://en.wikipedia.org/wiki/NoSQL
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes: > On 16/09/10 13:22, Tom Lane wrote: >> What exactly do those get you that an ordinary index, or at worst an >> index-organized table, doesn't get you? > It is pretty rare to see key value stores vs relational engines > discussed without a descent into total foolishiness, but this Wikipedia > page looks like a reasonable summary: > http://en.wikipedia.org/wiki/NoSQL That doesn't do anything at all to answer my question. I don't want to debate NoSQL versus traditional RDBMS here. What I asked was: given that PG is a traditional RDBMS, what exactly are you hoping to accomplish by putting a key-value storage mechanism in it? And if you did, how would that be different from an index-organized table? regards, tom lane
On Wed, Sep 15, 2010 at 9:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hitoshi Harada <umi.tanuki@gmail.com> writes: >> 2010/9/16 Robert Haas <robertmhaas@gmail.com>: >>> Oh, key-value store, I bet. Yeah, that would be cool. > >> That's it. Like Redis, Tokyo Cabinet, or something. > > What exactly do those get you that an ordinary index, or at worst an > index-organized table, doesn't get you? Speed, hopefully. Because otherwise there's little reason to think that abandoning transactional semantics, durability, and strong typing are good ideas. We spend a lot of time focusing on people who need things to be MORE ACID (sync rep and true serializability are just two examples) and that's really great stuff but on occasion it can be useful to turn the dial in the other direction. There are an awful lot of solutions out there that are based on eventually consistent replication, in-memory databases with periodic snapshots to disk, etc. and, while I sure as hell wouldn't advocate using those solutions to target nuclear weapons, they are proving to be pretty useful for things like social networking, storing the results of giant web crawls, etc., because they reduce the amount of hardware that you need to buy to process a given volume of data. The work I'm doing on unlogged tables is a good example of how we can provide users with the flexibility they need. I just did my first test (sorry, it's not in good enough shape to post for the CF, more's the pity) and a bulk insert into an unlogged table was ~3x faster than the same thing on a regular table. There are a lot of people who can benefit from that sort of thing. For example, you can imagine that if you have a "sessions" table where you store a record for each currently-logged-in user, an unlogged table would be fine. If the database crashes and comes back up again, everyone has to log in again, but that's a rare event and not a disaster if it happens. 
Another avenue that I think we should explore in this regard is tables that have some sort of weakened MVCC properties. The most obvious example is a table where all tuples are visible to everyone, period. You can imagine using this as a logging table, for example, in lieu of autonomous transactions.

There might be other cases that are useful, too: what about a table where only committed tuples are visible, but we ignore the effect of snapshots (i.e. if the inserting or deleting transaction is committed, then we assume that the XMIN/XMAX is visible to the current snapshot without checking)? That might seem like playing with firecrackers, but suppose the table has very low concurrency: e.g. overnight you update data in bulk, then after that's done you run daily reporting queries, then during the day users run small read-only queries interactively. You pretty much know that you're never going to have a tuple that's visible to some transactions but not others, so why pay the overhead of checking every time?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
2010/9/15 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
> The API needs to be simple and version-independent, so that you can
> write simple wrappers like the flat file wrapper easily. At the same
> time, it needs to be very flexible, so that it allows safely pushing
> down all kinds of constructs like functions, aggregates and joins. The
> planner needs to know which constructs the wrapper can handle and get
> cost estimates for the foreign scans. Those two goals are at odds.

The patch doesn't say anything about the API design, does it? It just implements the FOREIGN TABLE commands in the SQL standard, and we need that part anyway if we want to support the standard.

-- 
Itagaki Takahiro
On 16/09/10 14:05, Robert Haas wrote: > On Wed, Sep 15, 2010 at 9:22 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > >> Hitoshi Harada<umi.tanuki@gmail.com> writes: >> >>> 2010/9/16 Robert Haas<robertmhaas@gmail.com>: >>> >>>> Oh, key-value store, I bet. Yeah, that would be cool. >>>> >> >>> That's it. Like Redis, Tokyo Cabinet, or something. >>> >> What exactly do those get you that an ordinary index, or at worst an >> index-organized table, doesn't get you? >> > > For example, you can imagine that if > you have a "sessions" table where you store a record for each > currently-logged-in user, an unlogged table would be fine. If the > database crashes and comes back up again, everyone has to log in > again, but that's a rare event and not a disaster if it happens. > > Or perhaps even a "sessions" type table where the rows are overwritten in place in some manner, to avoid bloat. regards Mark
2010/9/16 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:
> On 16/09/10 14:05, Robert Haas wrote:
>> For example, you can imagine that if
>> you have a "sessions" table where you store a record for each
>> currently-logged-in user, an unlogged table would be fine. If the
>> database crashes and comes back up again, everyone has to log in
>> again, but that's a rare event and not a disaster if it happens.
>
> Or perhaps even a "sessions" type table where the rows are overwritten in
> place in some manner, to avoid bloat.

My answer is "variety". If an index-organized table were the one best solution, there would not have been so many KVSes these days.

Regards,

-- 
Hitoshi Harada
(2010/09/16 11:20), Itagaki Takahiro wrote:
> 2010/9/15 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
>> The API needs to be simple and version-independent, so that you can
>> write simple wrappers like the flat file wrapper easily. At the same
>> time, it needs to be very flexible, so that it allows safely pushing
>> down all kinds of constructs like functions, aggregates and joins. The
>> planner needs to know which constructs the wrapper can handle and get
>> cost estimates for the foreign scans. Those two goals are at odds.
>
> The patch doesn't say anything about the API design, does it?
> It just implements the FOREIGN TABLE commands in the SQL standard,
> and we need that part anyway if we want to support the standard.

You are right, sorry. The attached includes:
- the FDW routine interface, and
- an FDW implementation for external PostgreSQL.

This should be applied on top of the previous patch.

The interface is designed to be used by the executor module. The API has some simple entry points, such as FreeConnection, Open, Iterate, Close, ReOpen, and OnError. The current FDW implementation for external PostgreSQL can push down the WHERE clause by reconstructing the query that should be sent to the foreign server. I think this FDW routine interface is simple, but extension developers can still add intelligent features to some extent.

The FDW routine interface design and what this FDW for PostgreSQL can do are summarized in:
http://wiki.postgresql.org/wiki/SQL/MED#PostgreSQL

Regards,

SAKAMOTO Masahiko
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
e-mail: sakamoto.masahiko@oss.ntt.co.jp
Attachment
And this is a sample script to play with the features of this patch. It includes:
- creating a foreign data wrapper and foreign server (PG at localhost)
- selecting from a foreign PostgreSQL table.

*Important* This script initializes your database cluster (specified by $PGDATA). Run with care.

Regards,

SAKAMOTO Masahiko
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
e-mail: sakamoto.masahiko@oss.ntt.co.jp
Attachment
On Thu, Sep 16, 2010 at 3:15 PM, SAKAMOTO Masahiko <sakamoto.masahiko@oss.ntt.co.jp> wrote:
> This is designed to be used in the executor module.
> The API has some simple interfaces, such as FreeConnection,
> Open, Iterate, Close, ReOpen, and OnError.

Of course, they are similar to the APIs mentioned in the SQL standard, but I'm not sure they are enough for all purposes. For example, they don't have planner hooks, but we will need them eventually to push ORDER BY and LIMIT down into the FDW. We could learn from MySQL's storage engine interface, because it was designed for a flat-file wrapper at first, but was enhanced, with some pain, for condition push-down.

-- 
Itagaki Takahiro
On Wed, Sep 15, 2010 at 10:05:00PM -0400, Tom Lane wrote:
> Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
>> On 16/09/10 13:22, Tom Lane wrote:
>>> What exactly do those get you that an ordinary index, or at worst an
>>> index-organized table, doesn't get you?
>
>> It is pretty rare to see key-value stores vs. relational engines
>> discussed without a descent into total foolishness, but this Wikipedia
>> page looks like a reasonable summary:
>> http://en.wikipedia.org/wiki/NoSQL
>
> That doesn't do anything at all to answer my question. I don't want
> to debate NoSQL versus traditional RDBMS here. What I asked was:
> given that PG is a traditional RDBMS, what exactly are you hoping
> to accomplish by putting a key-value storage mechanism in it? And
> if you did, how would that be different from an index-organized table?

One thing it would get is integration with existing infrastructure that makes up many critical apps. Many horizontal apps use things like memcached or Redis to provide a distributed data layer for developing their applications. Sometimes that becomes the middle layer for an enterprise, and being able to hook into that middle layer is very handy. Shared login or session information is a good example of data that one might want to put in a KVS.

Also, many enterprises are full of different departments, orgs, teams, systems, etc. KVSes are simple and limited enough that they might make a good choice for standardizing on how to share data in some places. Isn't this what SQL/MED is about?

Garick
2010/9/15 SAKAMOTO Masahiko <sakamoto.masahiko@oss.ntt.co.jp>:
> This is a proposal patch for SQL/MED for 9.1.
> (1) foreign table DDL support (this proposal):
>     - support for foreign table DDL syntax (CREATE/ALTER FOREIGN TABLE)
>     - definition of the FDW routine interface and system catalogs for it.

I checked the patch. It includes changes for DDL, system catalogs, the information schema, and basic psql support. The patch itself does no useful work yet, but we need these parts anyway to support the SQL standard.

I have a couple of comments:

* There are unused types in the patch. They might be used by additional patches based on this one, but should be removed for now.
  - enum GenericOptionFlags.ForeignTableOpt
  - typedef struct FSConnection FSConnection;
  - typedef struct FdwRoutine FdwRoutine;
  - typedef struct FdwReply FdwReply;

* SELECT FROM a foreign table needs an error check. It might be replaced by actual FDW routines soon, but the current error message is not ideal:
  postgres=# SELECT * FROM ft1;
  ERROR:  could not open file "base/11908/16391": No such file or directory

* Type checks between TABLE and FOREIGN TABLE are a bit inconsistent. For example, "ALTER TABLE ADD COLUMN" can add a column to a foreign table, but "DROP TABLE" cannot remove foreign tables. IMHO, however, we can allow such looseness because operations that are actually forbidden will end with ERRORs without problems.

-- 
Itagaki Takahiro
Thanks for your comments. I've updated the patches.

(2010/09/17 11:29), Itagaki Takahiro wrote:
> I checked the patch. It includes changes for DDL, system catalogs,
> the information schema, and basic psql support. The patch itself does no
> useful work yet, but we need these parts anyway to support the SQL standard.

As you mentioned, this patch provides CREATE/ALTER/DROP FOREIGN TABLE functionality only. Please review this patch from these viewpoints:
- conformance to the SQL standard
- features which should be supported on foreign tables:
  - WITH OIDS
  - DEFAULT
  - constraints (currently NOT NULL and CHECK are supported)
  - etc.

SELECT on foreign tables will be provided by the fdw_select patch, as attached.

> * There are unused types in the patch. They might be used by additional
> patches based on this one, but should be removed for now.
>   - enum GenericOptionFlags.ForeignTableOpt
>   - typedef struct FSConnection FSConnection;
>   - typedef struct FdwRoutine FdwRoutine;
>   - typedef struct FdwReply FdwReply;

Fixed (the definitions are moved to the other patch).

> * SELECT FROM a foreign table needs an error check. It might be replaced
> by actual FDW routines soon, but the current error message is not ideal:
>   postgres=# SELECT * FROM ft1;
>   ERROR:  could not open file "base/11908/16391": No such file or directory

Fixed. I've added a relkind check in set_plain_rel_pathlist(), which aborts the query with elog(ERROR) if the relation is a foreign table. This fix also covers DELETE, UPDATE, and EXPLAIN SELECT.

> * Type checks between TABLE and FOREIGN TABLE are a bit inconsistent.
> For example, "ALTER TABLE ADD COLUMN" can add a column to a foreign table,
> but "DROP TABLE" cannot remove foreign tables.
> IMHO, however, we can allow such looseness because operations that are
> actually forbidden will end with ERRORs without problems.

In the current implementation, we allow on foreign tables those ALTER TABLE operations that can also be done on views (i.e. ALTER TABLE RENAME TO). But as Itagaki-san says, the inconsistency will not cause serious problems.

Regards,

SAKAMOTO Masahiko
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
e-mail: sakamoto.masahiko@oss.ntt.co.jp
On Fri, Sep 17, 2010 at 7:47 PM, SAKAMOTO Masahiko <sakamoto.masahiko@oss.ntt.co.jp> wrote:
> Thanks for your comment. I've updated the patches.

I reviewed fdw_table20100917.patch.gz. It adds FOREIGN TABLE as in the SQL standard, including CREATE/ALTER/DROP FOREIGN TABLE, information_schema (foreign_table and foreign_table_options), and permission controls. But SELECT from foreign tables is not supported yet -- so SELECT permission won't do anything for now.

In addition to the standard, it also supports some additional operations:
* COMMENT ON FOREIGN TABLE
* WITH OIDS -- reasonable because it is just the same as adding a column.
* RULE and DEFAULT -- useful to define an updatable foreign table.
* INHERIT, NOT NULL, and CHECK -- they will be used by inter-server partitioning in the future.

DEFAULT is not used by INSERT INTO a foreign table (not supported), but is used by INSERT rules. So it's reasonable for us to have the feature. Foreign tables are read-only in the standard, just like a VIEW.

>> * Needs an error check to SELECT FROM foreign table.
> Fixed. I've added relkind check in set_plain_rel_pathlist(),

Note that if we create a foreign table that inherits a normal table, we cannot SELECT FROM the normal table, because the planner expands the query to retrieve data from the foreign table, and the query fails with:
    ERROR: foreign table scan not implemented
But I think it's an acceptable limitation for now.

I found some trivial mistakes in the patch:
* src/bin/psql/describe.c : listForeignTables()
    + if (pset.sversion < 80500)
  We are in 90100 now ;-)
* The SGML doc for CREATE FOREIGN TABLE contains "xxx (integer)". Is it a typo? All of the Storage Parameters section will be removed.
* The SGML doc for DROP FOREIGN TABLE has "a<productname>PostgreSQL</>". A separator is required between "a" and <productname>.

For the documentation, we could rewrite some duplicated descriptions in CREATE/ALTER FOREIGN TABLE into links to CREATE/ALTER TABLE. They have much copy-and-pasted text. Do we need to adjust them?

Overall, I think the FOREIGN TABLE DDL part is almost ready to commit if we accept the proposal itself. Of course we need more discussion about FDW routines, but they must be developed on top of this infrastructure.

-- Itagaki Takahiro
On Thu, Sep 23, 2010 at 10:55 PM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > Overall, I think the FOREIGN TABLE DDL part is almost ready to commit if we > accept the proposal itself. Of course we need more discussions about FDW > Routines, but it must be developed based on the infrastructure. I think we need to further discuss how this is eventually going to get integrated with the query planner and the executor before we commit anything. The syntax support by itself is quite trivial. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Fri, Sep 24, 2010 at 12:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think we need to further discuss how this is eventually going to get
> integrated with the query planner and the executor before we commit
> anything. The syntax support by itself is quite trivial.

There are no active discussions :-( I think the author tried his best, so if other developers think it's a bad design, an alternate plan must be proposed.

Also, if the syntax change is trivial, that's all the more reason to merge it at an earlier commitfest. I've seen many patches fall into the "no patch reviewers because of too large a size" syndrome before. The DDL changes are a 5K-line diff -c patch, and the "select" part is an additional 6K lines.

-- Itagaki Takahiro
On 24/09/10 06:26, Itagaki Takahiro wrote:
> On Fri, Sep 24, 2010 at 12:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think we need to further discuss how this is eventually going to get
>> integrated with the query planner and the executor before we commit
>> anything. The syntax support by itself is quite trivial.
>
> There are no active discussions :-( I think the author tried his best, so if
> other developers think it's a bad design, alternate plan must be proposed.

It's not that the design is bad, it's that it's non-existent. I haven't seen any design on how this integrates with the planner.

-- Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Fri, Sep 24, 2010 at 6:12 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> It's not that the design is bad, it's that it's non-existent. I haven't seen
> any design on how this integrates with the planner.

In my understanding, the DDL part is independent of planner integration, and that's why the author extracted the DDL changes from the composite patch. (The proposal name is "SQL/MED(FDW) DDL".)

I think there are two types of FDWs. One is a simple flat file wrapper, as used by COPY FROM now, that doesn't require any planner hooks. Another is a connector to an external database, like dblink, that should be integrated with the planner.

The former is almost the same as an SRF, but it will skip unneeded tuplestore caching. Even that level is useful, because we can support "SELECT * FROM csvfile" without any intermediate tables. Could we set that level as the first goal?

Of course we need more discussion for the latter case. The currently proposed patch (the "select" part) supports executor integration -- WHERE clause push-down -- so it can use indexes on external servers. On the other hand, the unsupported ORDER BY, LIMIT, and JOIN push-down require planner integration. More work will be required for fdw_select20100917.patch.gz.

-- Itagaki Takahiro
On Thu, Sep 23, 2010 at 11:26 PM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > There are no active discussions :-( I think the author tried his best, so if > other developers think it's a bad design, alternate plan must be proposed. > > Also, if the syntax change is trivial, that's why we merge it at > earlier commitfests. I saw many patch went into "No patch reviewers > because of too large size" syndrome before. DDL changes are 5K > lines of diff -c patch, and "select" part is additional 6K lines. IMHO, merging a syntax that doesn't actually work leaves the tree in a broken state. It optimistically presumes that we will later commit a patch to make the syntax in question do something useful. I don't wish to presume that, even if we were well along on the design of the core functionality and especially because we are not. It's fine to break the patches into separate chunks for review, but our main tree is not a dumping ground for half-finished features. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Fri, Sep 24, 2010 at 5:56 AM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > I think there are two type of FDWs. One is a simple flat file wrapper > used by COPY FROM now, that doesn't require any planner hooks. > Another is a connector to an external database, like as dblink, that > should be integrated with the planner. This is a good point. On the one hand, I do agree that the API for simple things like processing CSV files shouldn't be overly complex. So perhaps we could start with a simple API and extend it later. On the other hand, understanding how some of the more complex cases ought to work provides insight into handling the simpler cases. So I think we should aim to have at least a sketch of a design for the whole feature, and then if in phase 1 we want to implement only the easier parts, that's OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
sorry for my slow response.

(2010/09/24 21:13), Robert Haas wrote:
> This is a good point. On the one hand, I do agree that the API for
> simple things like processing CSV files shouldn't be overly complex.
> So perhaps we could start with a simple API and extend it later. On
> the other hand, understanding how some of the more complex cases ought
> to work provides insight into handling the simpler cases. So I think
> we should aim to have at least a sketch of a design for the whole
> feature, and then if in phase 1 we want to implement only the easier
> parts, that's OK.

Right. In any case, I should clarify what this API could cover in this patch and what it could not.
# And also how far I and my collaborator can implement..

As Itagaki points out, we have two types of FDW implementations in progress: a flat file wrapper and a PostgreSQL wrapper. These items are described in the wiki:
http://wiki.postgresql.org/wiki/SQL/MED

But it may be hard to see what these FDW routines could provide, so my collaborator and I are now summarizing these items to discuss and ask for help. Anyway, I think these items need further discussion:
- API design that is used in the executor for a single external table.
- a simple FDW implementation to clarify what this API could provide.

These items will be shown by my collaborator soon.

thanks in advance,

SAKAMOTO Masahiko
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
e-mail: sakamoto.masahiko@oss.ntt.co.jp
Hi hackers,

On Mon, 27 Sep 2010 15:50:34 +0900
SAKAMOTO Masahiko <sakamoto.masahiko@oss.ntt.co.jp> wrote:
> Right. In any case, I should clearify what this API could cover
> by this patch and what could not.
> # And also how far I and my collaborator can implement..
>
> As Itagaki points out, we have two types of FDW implementations in
> progress: flat file wrapper and postgresql wrapper. these items are
> described in wiki:
> http://wiki.postgresql.org/wiki/SQL/MED
>
> But it may be hard to get what this fdw routines 'could' provide,
> so I and my collaborator is now summarizing these items to discuss and
> ask for help. Anyway I think these items are needed to discuss further.
> - API design that is used in executor for a single external table.
> - simple FDW implements to clearfy what this API could provide.

To discuss how the SQL/MED patches should look, we have made a summary of the current SQL/MED proposal.

* What the current SQL/MED patches provide

The recent patch fdw_select20100917 consists of several parts: the FDW interface, a new executor node ForeignScan, connection caching, and an FDW for PostgreSQL. An FDW for flat files is not included. All of them are based on the DDLs and catalogs provided by the fdw_table20100917 patch. Itagaki has summarized the details of the fdw_table20100917 patch well. Thanks for the review.
http://archives.postgresql.org/pgsql-hackers/2010-09/msg01653.php

The FDW interface is defined as struct FdwRoutine, a set of API functions which are called from backend modules. The APIs are designed to support only scanning one foreign table at a time. Path/plan optimizations like JOIN/UNION push-down are out of scope; such optimizations require planner hooks, as mentioned by Itagaki before. In short, our current goal is to implement SeqScan for foreign tables.

The ForeignScan node is an executor node which is like the SeqScan node for local tables. The optimizer generates a T_ForeignScan path instead of a T_SeqScan path in set_plain_rel_pathlist() if the RangeTblEntry is a foreign table.
Index paths and tidscan paths are never generated for foreign tables.

ForeignScanState and FdwReply are introduced to represent the status of a foreign scan. ForeignScanState is a subclass of ScanState, and FdwReply is an abstract type which is used to pass FDW-specific data between API calls.

* Details of FDW API

FDWs should implement a HANDLER function which returns a pointer to a FdwRoutine instance that has pointers to the actual functions.

    struct FdwRoutine
    {
        /*
         * ConnectServer() will be called from ExecInitForeignScan()
         * if the backend has no connection which can be used to
         * execute the foreign query for the foreign table.
         * The FDW should establish a connection to the foreign server
         * and return it, cast to a pointer to FSConnection (an
         * abstract connection type).  If the FDW doesn't need any
         * connection, returning NULL is OK.
         * The arguments, server and user, can be used to extract
         * connection information.
         */
        FSConnection *(*ConnectServer)(ForeignServer *server, UserMapping *user);

        /*
         * FreeFSConnection() will be called when the backend dies or
         * a DISCARD ALL command is executed.
         * The FDW should close the connection gracefully and free
         * resources, if any.
         */
        void (*FreeFSConnection)(FSConnection *conn);

        /*
         * Open() will be called from ExecInitForeignScan().
         * The FDW should initialize ForeignScanState, the internal
         * state of a foreign scan, and be ready to return a tuple at
         * the next Iterate() call.
         * For instance, the FDW for PostgreSQL only generates SQL from
         * the ScanState.  We implemented WHERE clause push-down here.
         */
        void (*Open)(ForeignScanState *scanstate);

        /*
         * Iterate() will be called from ExecForeignScan() when the
         * executor requests the next tuple.
         * For instance, the FDW for PostgreSQL executes the foreign
         * query at the first call and stores all results into a
         * TupleStore, and returns one tuple for each Iterate() call.
         */
        void (*Iterate)(ForeignScanState *scanstate);

        /*
         * Close() will be called from ExecEndForeignScan().
         * The FDW should free resources for FdwReply, if any.
         */
        void (*Close)(ForeignScanState *scanstate);

        /*
         * ReOpen() will be called from ExecForeignReScan() when the
         * foreign scan should be reset to scan the whole data from
         * the head again.
         * For instance, the FDW for PostgreSQL frees the current
         * result set so that the next Iterate() call executes the
         * foreign query again.
         */
        void (*ReOpen)(ForeignScanState *scanstate);
    };

Maybe FdwRoutine should have more APIs to support essential features. The startup/total costs of scanning a foreign table are fixed at 10000.0 in the current patch. They are groundless values; they just say that scanning a foreign table costs more than scanning a local table. The cost should be estimated from statistics in pg_statistic and pg_class. This function has not been implemented yet:

    /*
     * UpdateStats() will be called from the ANALYZE routine when
     * statistics of a foreign table should be updated.
     *
     * Some more parameters may be needed.
     *
     * For instance, the FDW for PostgreSQL will retrieve pg_statistic
     * and pg_class rows from the foreign server and store them in the
     * local catalog.  How to represent the overhead of a foreign
     * query is still an issue.
     */
    void (*UpdateStats)(Oid relid);

To support path/plan optimization some APIs should be added, but we think that simple FDWs can be implemented with the APIs above.

* Next action

To focus on API discussion, we are going to simplify and re-submit the patches:
* Fix some mistakes pointed out by Itagaki.
* Make the FDW for PostgreSQL an independent contrib module. Currently it shares connection management code with contrib/dblink.
* Add a simple FDW for flat files (maybe CSV?) into core, to implement regression tests for FOREIGN TABLE DDLs and foreign table queries. In the future, the FDW for flat files can be integrated with COPY FROM.

Any comments/questions are welcome.

Regards,
-- Shigeru Hanada
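To make the callback flow above concrete, here is a stand-alone sketch of a trivial wrapper implementing the proposed FdwRoutine callbacks. Everything here is illustrative: the stub types (FSConnection, ForeignServer, UserMapping, ForeignScanState) are drastically simplified stand-ins for the backend definitions in the patch, and the demo_* functions are invented for this sketch -- a real wrapper would build tuples through the executor, not return strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stub stand-ins for the backend types used by the proposed API. */
typedef struct FSConnection FSConnection;   /* opaque per-server connection */
typedef struct ForeignServer { const char *servername; } ForeignServer;
typedef struct UserMapping   { const char *username;   } UserMapping;

/* Minimal scan state; the real ForeignScanState subclasses ScanState. */
typedef struct ForeignScanState {
    const char **rows;      /* data source (here: a static string table) */
    int          nrows;
    int          pos;       /* next row to return */
    const char  *current;   /* NULL once the scan is exhausted */
} ForeignScanState;

typedef struct FdwRoutine {
    FSConnection *(*ConnectServer)(ForeignServer *server, UserMapping *user);
    void (*FreeFSConnection)(FSConnection *conn);
    void (*Open)(ForeignScanState *ss);
    void (*Iterate)(ForeignScanState *ss);
    void (*Close)(ForeignScanState *ss);
    void (*ReOpen)(ForeignScanState *ss);
} FdwRoutine;

/* --- a trivial "static table" wrapper: no connection needed --- */

static FSConnection *demo_connect(ForeignServer *server, UserMapping *user)
{
    (void) server; (void) user;
    return NULL;                /* returning NULL is allowed per the API */
}

static void demo_free(FSConnection *conn) { (void) conn; }

static void demo_open(ForeignScanState *ss)
{
    static const char *data[] = { "alpha", "beta", "gamma" };
    ss->rows = data;
    ss->nrows = 3;
    ss->pos = 0;
    ss->current = NULL;
}

static void demo_iterate(ForeignScanState *ss)
{
    /* Return the next "tuple", or NULL at end of scan. */
    ss->current = (ss->pos < ss->nrows) ? ss->rows[ss->pos++] : NULL;
}

static void demo_close(ForeignScanState *ss) { (void) ss; }

static void demo_reopen(ForeignScanState *ss)
{
    /* Reset to scan the whole data from the head again. */
    ss->pos = 0;
    ss->current = NULL;
}

/* The HANDLER function: returns the callback table. */
FdwRoutine *demo_fdw_handler(void)
{
    static FdwRoutine routine = {
        demo_connect, demo_free, demo_open,
        demo_iterate, demo_close, demo_reopen
    };
    return &routine;
}
```

Note how ReOpen() only resets the cursor, matching the description of ExecForeignReScan() above.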
On Mon, Sep 27, 2010 at 2:50 AM, SAKAMOTO Masahiko <sakamoto.masahiko@oss.ntt.co.jp> wrote:
> http://wiki.postgresql.org/wiki/SQL/MED

With regard to what is written here, it strikes me that it would be an extremely bad idea to try to mix reloptions or attoptions with fdwoptions. fdwoptions are options to be passed transparently to the fdw to handle as it likes; rel/attoptions affect the behavior of PG.

I think the section about WHERE clause push-down is way off base. First, it seems totally wrong to assume that the same functions and operators will be defined on the remote side as you have locally; indeed, for CSV files, you won't have anything defined on the remote side at all. You need some kind of a discovery mechanism here to figure out which quals are push-downable. And it should probably be something generic, not a bunch of hard-wired rules that may or may not be correct in any particular case. What if the remote side is a competing database product that doesn't understand X = ANY(Y)? Second, even if a function or operator does exist on both sides of the link, how do you know whether they have compatible semantics? Short of solving the Entscheidungsproblem, you're not going to be able to determine that algorithmically, so you need some kind of mechanism for controlling what assumptions get made. Otherwise, you'll end up with queries that don't work and no way for the user to fix it.

It seems to me that the API should allow PG to ask the FDW questions like this:

- How many tuples are there on the remote side?
- Here is a qual. Are you able to evaluate this qual remotely?
- What are the startup and total costs of a sequential scan of the remote side with the following set of remotely executable quals?
- Are there any indices available on the remote side, and if so, what are their names and which columns do they index in which order (asc/desc, nulls first/last)?
- What are the startup and total costs of an index scan of the remote side using the index called $NAME given the following set of remotely executable quals?

and, as you mention:

- Please update pg_statistic for this foreign table, if you have that capability.

Then:

- Begin a sequential scan with the following set of quals.
- Begin an index scan using the index called X with the following set of quals.
- Fetch next tuple.
- End scan.

Maybe that's too much for a first version, but if we're not going to deal with the problems in a general way, then we ought to not deal with them at all, rather than having hacky rules that will work if your environment is set up in exactly the way the code expects and otherwise break horribly.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
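One way to read this question list is as a second, planner-facing callback table alongside the execution callbacks. The sketch below is purely illustrative -- none of these names (FdwPlannerRoutine, EstimateRows, QualIsRemotable, EstimateScanCost) exist in any patch, and the toy implementation prices a scan with an arbitrary fixed startup cost plus a per-row charge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opaque stand-ins for planner structures. */
typedef struct Expr Expr;                          /* a qual expression */
typedef struct ForeignTable { double est_rows; } ForeignTable;

typedef struct FdwCost {
    double startup_cost;
    double total_cost;
} FdwCost;

/* Hypothetical planner-facing callbacks, one per question in the list. */
typedef struct FdwPlannerRoutine {
    /* "How many tuples are there on the remote side?" */
    double (*EstimateRows)(ForeignTable *ft);
    /* "Here is a qual. Are you able to evaluate this qual remotely?" */
    bool   (*QualIsRemotable)(ForeignTable *ft, Expr *qual);
    /* "What are the startup and total costs of a scan with these quals?" */
    FdwCost (*EstimateScanCost)(ForeignTable *ft, Expr **quals, int nquals);
} FdwPlannerRoutine;

/* A toy implementation: fixed per-tuple cost, no quals pushed down. */
static double toy_rows(ForeignTable *ft)
{
    return ft->est_rows;
}

static bool toy_remotable(ForeignTable *ft, Expr *qual)
{
    (void) ft; (void) qual;
    return false;               /* e.g. a CSV wrapper can push nothing down */
}

static FdwCost toy_cost(ForeignTable *ft, Expr **quals, int nquals)
{
    FdwCost c;
    (void) quals; (void) nquals;
    c.startup_cost = 100.0;                       /* connection setup */
    c.total_cost   = c.startup_cost + 0.01 * ft->est_rows;
    return c;
}

FdwPlannerRoutine *toy_planner_routine(void)
{
    static FdwPlannerRoutine r = { toy_rows, toy_remotable, toy_cost };
    return &r;
}
```

The point of the shape, not the numbers: the planner asks, the wrapper answers, and no executor-specific operation (seqscan vs. index scan) leaks into the interface unless we decide it should.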
On 09/28/10 17:26, Robert Haas wrote: > First, it seems totally wrong to assume that the same functions and > operators will be defined on the remote side as you have locally; > indeed, for CSV files, you won't have anything defined on the remote > side at all. You need some kind of a discovery mechanism here to > figure out which quals are push-downable. And it should probably be > something generic, not a bunch of hard-wired rules that may or may not > be correct in any particular case. What if the remote side is a > competing database product that doesn't understand X = ANY(Y)? > Second, even if a functions or operators does exist on both sides of > the link, how do you know whether they have compatible semantics? Or side-effects. The SQL/MED specification has "routine mappings" for this purpose. We will need that or something similar. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 28 Sep 2010 10:26:42 -0400
Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Sep 27, 2010 at 2:50 AM, SAKAMOTO Masahiko
> <sakamoto.masahiko@oss.ntt.co.jp> wrote:
> > http://wiki.postgresql.org/wiki/SQL/MED
> With regard to what is written here, it strikes me that it would be an
> extremely bad idea to try to mix reloptions or attoptions with
> fdwoptions. fdwoptions are options to be passed transparently to the
> fdw to handle as it likes; rel/attoptions affect the behavior of PG.

In the current patch, fdwoptions for relations have been separated from reloptions by introducing the pg_foreign_table catalog. As mentioned in the wiki, integration into rel/attoptions is nothing but an idea; we're willing to add a pg_foreign_attribute catalog which has these columns:

    farelid    oid       not null
    faattnum   smallint  not null
    faoptions  text[]    not null

though this catalog would have only fdwoptions as a non-key value. Or is adding an attrfdwoptions column to the pg_attribute catalog better?

> I think the section about WHERE clause push-down is way off base.
> First, it seems totally wrong to assume that the same functions and
> operators will be defined on the remote side as you have locally;
> indeed, for CSV files, you won't have anything defined on the remote
> side at all. You need some kind of a discovery mechanism here to
> figure out which quals are push-downable. And it should probably be
> something generic, not a bunch of hard-wired rules that may or may not
> be correct in any particular case. What if the remote side is a
> competing database product that doesn't understand X = ANY(Y)?
> Second, even if a functions or operators does exist on both sides of
> the link, how do you know whether they have compatible semantics?
> Short of solving the entscheidungsproblem, you're not going to be able
> to determine that algorithmically, so you need some kind of mechanism
> for controlling what assumptions get made.
> Otherwise, you'll end up
> with queries that don't work and no way for the user to fix it.

First of all, the WHERE clause push-down ideas written in the wiki are just for the FDW for PostgreSQL (implicitly of the same version), so we have assumed that the remote side has the same syntax/semantics. WHERE clause push-down is implemented in postgresql_fdw; the optimizer/planner is not involved. The postgresql_fdw pushes down WHERE clauses with the following steps:

* scan all quals in the PlanState and pick up remote-able quals
* for each remote-able qual:
  - add a WHERE clause to the remote SQL statement
  - remove the qual from the PlanState node to avoid duplicate evaluation in the upper-layer ExecScan()

If a foreign data wrapper doesn't support WHERE clause push-down, such as a CSV wrapper, it can retrieve all tuples from the remote side and pass them to the upper layer, and ExecScan() will filter the tuples based on the quals in the PlanState.

By the way, in the current implementation all operators/functions in the SELECT clause are always evaluated on the local side, so FDWs only need to provide plain column values.

> It seems to me that the API should allow PG to ask the FDW questions like this:
>
> - How many tuples are there on the remote side?
> - Here is a qual. Are you able to evaluate this qual remotely?
> - What are the startup and total costs of a sequential scan of the
> remote side with the following set of remotely executable quals?
> - Are there any indices available on the remote side, and if so what
> are there names and which columns do they index in which order
> (asc/desc, nulls first/last)?
> - What are the startup and total costs of an index scan of the remote
> side using the index called $NAME given the following set of remotely
> executable quals?
>
> and, as you mentIon:
>
> - Please update pg_statistic for this foreign table, if you have that
> capability.
>
> Then:
>
> - Begin a sequential scan with the following set of quals.
> - Begin an index scan using the index called X with the following set of quals.
> - Fetch next tuple.
> - End scan.
>
> Maybe that's too much for a first version but if we're not going to
> deal with the problems in a general way, then we ought to not deal
> with them at all, rather than having hacky rules that will work if
> your environment is set up in exactly the way the code expects and
> otherwise break horribly.

Using remote indexes might be very effective, but I think there are many issues. For instance, how can the planner/optimizer know whether a foreign table has been indexed or not? Checking remote catalogs for each scan must be a bad idea. HiRDB, Hitachi's DBMS product, seems to have introduced FOREIGN INDEX for that purpose.

I'll consider cost estimation and path/plan generation again, and post later.

Regards,
Shigeru Hanada
-- Shigeru Hanada
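The qual-classification pass Hanada describes (walk the PlanState quals, pick up the remote-able ones, push them down, drop them from local evaluation) can be sketched as follows. The Qual struct and the operator whitelist are drastic simplifications invented for illustration; deciding remotability for real requires reasoning about operator semantics and volatility, which is exactly the open problem in this sub-thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A drastically simplified qual: "column <op> constant". */
typedef struct Qual {
    const char *opname;               /* operator name, e.g. "=", "<" */
    bool        uses_local_function;  /* references a non-built-in function */
} Qual;

/* Whitelist of operators assumed safe to evaluate remotely.  A real
 * wrapper cannot hard-wire this; it must know whether the remote side
 * has an operator with compatible semantics. */
bool qual_is_remotable(const Qual *q)
{
    static const char *safe_ops[] = { "=", "<", ">", "<=", ">=", "<>" };
    size_t i;

    if (q->uses_local_function)
        return false;       /* cannot be reproduced on the remote side */
    for (i = 0; i < sizeof(safe_ops) / sizeof(safe_ops[0]); i++)
        if (strcmp(q->opname, safe_ops[i]) == 0)
            return true;
    return false;
}

/* Split quals into "push down" and "evaluate locally" sets, mirroring
 * the two-step walk over the PlanState quals described above.  Returns
 * the number of pushable quals; their indexes land in remote_idx, and
 * everything else stays in the local set for ExecScan() to filter. */
int split_quals(const Qual *quals, int n, int *remote_idx)
{
    int nremote = 0, i;

    for (i = 0; i < n; i++)
        if (qual_is_remotable(&quals[i]))
            remote_idx[nremote++] = i;   /* push down; drop from local set */
    return nremote;
}
```

The non-pushable remainder is exactly what makes the scheme safe even for a wrapper that pushes nothing: with an empty remote set, all filtering simply happens locally.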
On Tue, 28 Sep 2010 22:14:22 +0300
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> On 09/28/10 17:26, Robert Haas wrote:
> > First, it seems totally wrong to assume that the same functions and
> > operators will be defined on the remote side as you have locally;
> > indeed, for CSV files, you won't have anything defined on the remote
> > side at all. You need some kind of a discovery mechanism here to
> > figure out which quals are push-downable. And it should probably be
> > something generic, not a bunch of hard-wired rules that may or may not
> > be correct in any particular case. What if the remote side is a
> > competing database product that doesn't understand X = ANY(Y)?
> > Second, even if a functions or operators does exist on both sides of
> > the link, how do you know whether they have compatible semantics?
>
> Or side-effects.
>
> The SQL/MED specification has "routine mappings" for this purpose. We
> will need that or something similar.

Yes, I see the problem. To resolve function-mismatch issues, we need a routine mapping mechanism. With that, an FDW can translate function calls into remote expressions following rules defined by the user.

The SQL/MED standard requires a routine mapping to map a pair of a local routine and a server to a set of generic options. A local routine is identified by type, name and parameters. The type is one of ROUTINE/FUNCTION/PROCEDURE/METHOD. I think supporting only FUNCTION is enough, because PostgreSQL doesn't have ROUTINE/PROCEDURE/METHOD.

I think that the pg_routine_mapping catalog should have these columns/indexes:

    rmname      name    not null
    rmprocid    oid     not null
    rmserverid  oid     not null
    rmoptions   text[]

    pg_routine_mapping_oid_index          UNIQUE, btree (oid)
    pg_routine_mapping_name_index         UNIQUE, btree (rmname)
    pg_routine_mapping_proc_server_index  UNIQUE, btree (rmprocid, rmserverid)

To use a remote function which PostgreSQL doesn't have in a foreign query, we need a local dummy function which is mapped to the remote function. The local function may be empty, but it should have appropriate parameters.

Supported features:
* Create routine mapping (USAGE on the server required)
* Alter a routine mapping's generic options (must be the owner of the routine mapping)
* Drop routine mapping (must be the owner of the routine mapping)
* Query routine mapping information via the information_schema views routine_mappings and routine_mapping_options (no restriction)

FDWs can define any specification of generic options. Maybe the FDW for PostgreSQL needs no entries if no special functions are defined. Routine mappings can also be used to absorb differences between versions.

Any comments are welcome.

Regards,
-- Shigeru Hanada
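A lookup against the proposed catalog might behave like the following sketch, which reduces the text[] options to a single remote spelling. The RoutineMapping struct and lookup_remote_routine() are hypothetical illustrations of probing pg_routine_mapping_proc_server_index on (rmprocid, rmserverid); none of this is code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;

/* One row of the proposed pg_routine_mapping catalog, simplified so
 * that the text[] generic options collapse to one "remote name". */
typedef struct RoutineMapping {
    Oid         rmprocid;    /* local function */
    Oid         rmserverid;  /* foreign server */
    const char *remote_name; /* how to spell the call remotely */
} RoutineMapping;

/* Look up the remote spelling for a local function on a given server,
 * mimicking a unique-index probe on (rmprocid, rmserverid). */
const char *lookup_remote_routine(const RoutineMapping *cat, int n,
                                  Oid procid, Oid serverid)
{
    int i;

    for (i = 0; i < n; i++)
        if (cat[i].rmprocid == procid && cat[i].rmserverid == serverid)
            return cat[i].remote_name;
    return NULL;    /* no mapping: the call cannot be pushed down */
}
```

The NULL result is the important case: with no mapping for a given server, the wrapper must evaluate the function locally rather than guess at a remote equivalent.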
Excerpts from Robert Haas's message of mar sep 28 10:26:54 -0400 2010:
> Then:
>
> - Begin a sequential scan with the following set of quals.
> - Begin an index scan using the index called X with the following set of quals.
> - Fetch next tuple.
> - End scan.

I'm not sure that it's a good idea to embed into the FDW API the set of operations known to the executor. For example, your proposal fails to consider bitmap scans. It seems simpler and more general to hand the quals over saying "I need to scan this relation with these quals", and have it return an opaque iterable object; the remote executor would be in charge of determining their usage for index scans, or even of filtering tuples with quals that cannot be part of the index condition.

There doesn't seem to be much point in knowing the names of remote indexes either (if it came to referencing them, better to use OIDs).

-- Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wed, Sep 29, 2010 at 11:09 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> I'm not sure that it's a good idea to embed into the FDW API the set of
> operations known to the executor. For example your proposal fails to
> consider bitmap scans. Seems simpler and more general to hand the quals
> over saying "I need to scan this relation with these quals", and have it
> return an opaque iterable object;

Agreed. If possible, we will avoid dedicated interfaces for seqscans and index scans. However, bitmap scans are difficult anyway, because foreign tables might not have ctid columns. It's a challenging task to identify individual tuples in foreign tables; that will also be needed for UPDATE and DELETE.

> There doesn't to be much point in knowing the names of remote indexes
> either (if it came to referencing them, better use OIDs)

FYI, HiRDB, which implements FDW routines, has CREATE FOREIGN INDEX. I think it is a little ugly and won't work in some cases -- for example, index-organized tables -- but evidently it's a realistic solution.

-- Itagaki Takahiro
On 30.09.2010 04:30, Itagaki Takahiro wrote:
> On Wed, Sep 29, 2010 at 11:09 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>> I'm not sure that it's a good idea to embed into the FDW API the set of
>> operations known to the executor. For example your proposal fails to
>> consider bitmap scans. Seems simpler and more general to hand the quals
>> over saying "I need to scan this relation with these quals", and have it
>> return an opaque iterable object;
>
> Agreed. If possible, we will avoid dedicated interfaces for seqscans and
> index scans. However, bitmap scan is difficult anyway because foreign tables
> might not have ctid columns. It's a challenging task to identify individual
> tuples in foreign tables. It will be also used for UPDATE and DELETE.
>
>> There doesn't to be much point in knowing the names of remote indexes
>> either (if it came to referencing them, better use OIDs)
>
> FYI, HiRDB, that implements FDW routines, has CREATE FOREIGN INDEX.
> I think it is a little ugly and won't work in some cases -- for example,
> index organized tables -- but evidently it's a realistic solution.

A long time ago I used DB2's federated database feature, which is at least close to SQL/MED if not fully compatible. When you create a "federated index" there, it's just a planner hint to the local database, so that it knows how expensive it is to evaluate a qual remotely vs. locally. It shouldn't matter what technology the remote index uses in that case, as long as the cost model is roughly the same as a b-tree.

I don't think we want to go down that path, though; it's better to leave the cost estimation altogether to the wrapper. It has much better knowledge of how expensive various quals are.

However, the wrapper will likely need some local storage for indexes and the like to do the cost estimation. Or maybe it can just keep the information in a cache, loading it on first use from the remote database.

-- Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Thu, 30 Sep 2010 09:26:54 +0300 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > > FYI, HiRDB, which implements FDW routines, has CREATE FOREIGN INDEX. > > I think it is a little ugly and won't work in some cases -- for example, > > index-organized tables -- but evidently it's a realistic solution. > > A long time ago I used DB2's federated database feature, which is at > least close to SQL/MED if not fully compatible. When you create a > "federated index" there, it's just a planner hint to the local database, > so that it knows how expensive it is to evaluate a qual remotely vs. > locally. It shouldn't matter what technology the remote index uses in > that case, as long as the cost model is roughly the same as a b-tree. > > I don't think we want to go down that path, though; it's better to leave > the cost estimation altogether to the wrapper. It has much better > knowledge of how expensive various quals are. > > However, the wrapper will likely need some local storage for indexes and > the like to do the cost estimation. Or maybe it can just keep the > information in a cache, loading it on first use from the remote database. How about having cost hints in the generic options of the foreign table or its columns? Generic options are storage for wrappers, not for PostgreSQL core modules. Wrappers can use their own format to represent various information, and use the hints to estimate the costs of a path. In addition, I think that the generic options of a server could be used to store cost hints which depend on each server, such as network transfer overhead for DBMS wrappers, or disk I/O for file wrappers. Regards, -- Shigeru Hanada
On Sep 29, 2010, at 10:09 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Excerpts from Robert Haas's message of mar sep 28 10:26:54 -0400 2010: > >> Then: >> >> - Begin a sequential scan with the following set of quals. >> - Begin an index scan using the index called X with the following set of quals. >> - Fetch next tuple. >> - End scan. > > I'm not sure that it's a good idea to embed into the FDW API the set of > operations known to the executor. For example your proposal fails to > consider bitmap scans. Seems simpler and more general to hand the quals > over saying "I need to scan this relation with these quals", and have it > return an opaque iterable object; the remote executor would be in charge > of determining their usage for indexscans; or even for filtering tuples > with quals that cannot be part of the index condition. Yeah, that might be better. Is it reasonable to assume we always want to push down as much as possible, or do we need to think about local work vs. remote work trade-offs? > There doesn't seem to be much point in knowing the names of remote indexes > either (if it came to referencing them, better use OIDs) Well, you can't assume the remote side is PG. ...Robert
On Thu, Sep 30, 2010 at 10:29:56PM -0400, Robert Haas wrote: > On Sep 29, 2010, at 10:09 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > > Excerpts from Robert Haas's message of mar sep 28 10:26:54 -0400 2010: > > > >> Then: > >> > >> - Begin a sequential scan with the following set of quals. > >> - Begin an index scan using the index called X with the following set of quals. > >> - Fetch next tuple. > >> - End scan. > > > > I'm not sure that it's a good idea to embed into the FDW API the > > set of operations known to the executor. For example your > > proposal fails to consider bitmap scans. Seems simpler and more > > general to hand the quals over saying "I need to scan this > > relation with these quals", and have it return an opaque iterable > > object; the remote executor would be in charge of determining > > their usage for indexscans; or even for filtering tuples with > > quals that cannot be part of the index condition. > > Yeah, that might be better. Is it reasonable to assume we always > want to push down as much as possible, or do we need to think about > local work vs. remote work trade-offs? In cases where the foreign side is not super reliable for correctness, I'd say there's a good case for not trusting it. Some of the Oracle properties are like this. I suppose we might want to make "trust the other side to handle pushed quals" optional somehow, but I'm not exactly sure how. > > There doesn't seem to be much point in knowing the names of remote > > indexes either (if it came to referencing them, better use OIDs) > > Well, you can't assume the remote side is PG. Very definitely not. The main point of the feature is that it allows for *heterogeneous* data stores. Of course this doesn't prevent PostgreSQL from optimizing aggressively using secret knowledge should the "foreign" side happen to be PostgreSQL. Cheers, David.
-- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > On Thu, Sep 30, 2010 at 10:29:56PM -0400, Robert Haas wrote: >> Yeah, that might be better. Is it reasonable to assume we always >> want to push down as much as possible, or do we need to think about >> local work vs. remote work trade-offs? > In cases where the foreign side is not super reliable for correctness, > I'd say there's a good case for not trusting it. Some of the Oracle > properties are like this. I suppose we might want to make "trust the > other side to handle pushed quals" optional somehow, but I'm not > exactly sure how. ISTM that such decisions would be embedded in the FDW --- it's unlikely that the FDW for Oracle would look even a little bit like the one for Postgres anyway, so contemplating ways to parameterize them seems a rather useless exercise. What we need is an API that lets the FDW decide which quals it will take responsibility for enforcing. What happens beyond that point is not ours to decide. regards, tom lane
On Fri, Oct 01, 2010 at 10:54:47AM -0400, Tom Lane wrote: > David Fetter <david@fetter.org> writes: > > On Thu, Sep 30, 2010 at 10:29:56PM -0400, Robert Haas wrote: > >> Yeah, that might be better. Is it reasonable to assume we always > >> want to push down as much as possible, or do we need to think about > >> local work vs. remote work trade-offs? > > > In cases where the foreign side is not super reliable for correctness, > > I'd say there's a good case for not trusting it. Some of the Oracle > > properties are like this. I suppose we might want to make "trust the > > other side to handle pushed quals" optional somehow, but I'm not > > exactly sure how. > > ISTM that such decisions would be embedded in the FDW s/embedded in/& and overrides communicable to/ In an upcoming version of DBI-Link, I've got some code that allows transformations between types on PostgreSQL and foreign sources, for now when they have the same name. The transformations are implemented on the PostgreSQL side, although I have a plan for (optionally) pushing them to the foreign side. There's an input transformation and an output transformation, both of which default to the identity transformation, i.e. "do nothing." These transformations are overridable at the following levels, with the finer-grained one taking precedence over the coarser should there be more than one user-supplied transformation: - database type (Oracle, CSV, etc.) - catalog (Database cluster in PostgreSQL parlance, i.e. everything served on a single host/port combination. DB2 actually implements this under that name.) - database - schema - table - column > --- it's unlikely that the FDW for Oracle would look even a little > bit like the one for Postgres anyway, so contemplating ways to > parameterize them seems a rather useless exercise. See above for a parameterization :) Cheers, David.
On Wed, Sep 29, 2010 at 3:56 AM, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > In the current patch, fdwoptions for relations have been separated from > reloptions by introducing the pg_foreign_table catalog. As mentioned in > the wiki, integration into rel/attoptions is nothing but an idea; we're > willing to add a pg_foreign_attribute catalog which has columns: > > farelid oid not null > faattnum smallint not null > faoptions text[] not null > > Though this catalog has only fdwoptions as a non-key value. > Or, is adding an attrfdwoptions column to the pg_attribute catalog better? One possibility is to isolate all this information in a separate catalog, similar to what we recently did with security labels. That approach scales to dealing with many object types (relations, attributes, functions, etc.). That might be overkill here, though: perhaps we should just add a column to each of pg_class and pg_attribute. I could go either way on this point. >> I think the section about WHERE clause push-down is way off base. >> First, it seems totally wrong to assume that the same functions and >> operators will be defined on the remote side as you have locally; >> indeed, for CSV files, you won't have anything defined on the remote >> side at all. You need some kind of a discovery mechanism here to >> figure out which quals are push-downable. And it should probably be >> something generic, not a bunch of hard-wired rules that may or may not >> be correct in any particular case. What if the remote side is a >> competing database product that doesn't understand X = ANY(Y)? >> Second, even if a function or operator does exist on both sides of >> the link, how do you know whether they have compatible semantics? >> Short of solving the Entscheidungsproblem, you're not going to be able >> to determine that algorithmically, so you need some kind of mechanism >> for controlling what assumptions get made.
>> Otherwise, you'll end up with queries that don't work and no way for the user to fix it. > First of all, the WHERE clause push-down ideas written in the wiki are just > for the FDW for PostgreSQL, implicitly the same version, so we have assumed > that the remote side has the same syntax/semantics. WHERE clause > push-down is implemented in postgresql_fdw; the optimizer/planner are > not involved. Well, that's not the way it looks from reading the section entitled "FDW routines". What I think we're looking for here is a general API... > Using remote indexes might be very effective, but I think there are > many issues. > > For instance, how can the planner/optimizer know whether the foreign table > has been indexed or not? Checking remote catalogs for each scan must > be a bad idea. HiRDB, Hitachi's DBMS product, seems to have > introduced FOREIGN INDEX for that purpose. Yeah, it's definitely not an easy problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Thu, Sep 30, 2010 at 3:48 AM, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > On Thu, 30 Sep 2010 09:26:54 +0300 > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> > FYI, HiRDB, which implements FDW routines, has CREATE FOREIGN INDEX. >> > I think it is a little ugly and won't work in some cases -- for example, >> > index-organized tables -- but evidently it's a realistic solution. >> >> A long time ago I used DB2's federated database feature, which is at >> least close to SQL/MED if not fully compatible. When you create a >> "federated index" there, it's just a planner hint to the local database, >> so that it knows how expensive it is to evaluate a qual remotely vs. >> locally. It shouldn't matter what technology the remote index uses in >> that case, as long as the cost model is roughly the same as a b-tree. >> >> I don't think we want to go down that path, though; it's better to leave >> the cost estimation altogether to the wrapper. It has much better >> knowledge of how expensive various quals are. >> >> However, the wrapper will likely need some local storage for indexes and >> the like to do the cost estimation. Or maybe it can just keep the >> information in a cache, loading it on first use from the remote database. > How about having cost hints in the generic options of the foreign table or > its columns? Generic options are storage for wrappers, not for > PostgreSQL core modules. Wrappers can use their own format to > represent various information, and use the hints to estimate the costs of > a path. I do think we're going to need some kind of local caching of relevant information from the foreign side. Really, I doubt that fdwoptions are the place for that, though: that's data for the user to input, not a place for the wrapper to scribble on internally.
The trick is that there's a lot of stuff you might want to cache, and we don't really know anything about what the format of it is - for example, you might have foreign-side statistics that need to get cached locally, but they needn't be in the same format we use for pg_statistic. Perhaps part of setting up an FDW should be creating tables with prespecified definitions and passing the table names to CREATE FOREIGN DATA WRAPPER as options. Maybe we could even arrange to set up the dependencies appropriately... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Mon, 4 Oct 2010 19:31:52 -0400 Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Sep 30, 2010 at 3:48 AM, Shigeru HANADA > <hanada@metrosystems.co.jp> wrote: > > How about having cost hints in the generic options of the foreign table or > > its columns? Generic options are storage for wrappers, not for > > PostgreSQL core modules. Wrappers can use their own format to > > represent various information, and use the hints to estimate the costs of > > a path. > > I do think we're going to need some kind of local caching of relevant > information from the foreign side. Really, I doubt that fdwoptions > are the place for that, though: that's data for the user to input, not > a place for the wrapper to scribble on internally. The trick is that > there's a lot of stuff you might want to cache, and we don't really > know anything about what the format of it is - for example, you might > have foreign-side statistics that need to get cached locally, but they > needn't be in the same format we use for pg_statistic. Perhaps part > of setting up an FDW should be creating tables with prespecified > definitions and passing the table names to CREATE FOREIGN DATA WRAPPER > as options. Maybe we could even arrange to set up the dependencies > appropriately... Agreed. I withdraw the idea of storing foreign-side statistics in generic options. Can we treat the statistics of a foreign table separately? 1. Same as local tables (maybe required) (pg_statistic.*, pg_class.reltuples/relpages) They will be used by the planner/optimizer to estimate basic costs based on tuple selectivity, result row count and so on. Such statistics could be generated by the ANALYZE module if the FDW can supply all tuples from the foreign side. The basic costs should be correctable by the FDW via another API, because foreign queries might have some overheads, such as connection and transfer. ISTM that it is very difficult for a non-PG FDW to generate PG-style statistics correctly. 2.
depend on FDW (optional) (in various, arbitrary formats) They will be used by the FDW to optimize the query to be executed on the foreign side in its own way. As you say, new table(s) to store such statistics can be created during creation of a new FOREIGN DATA WRAPPER or installation of a new fdw_handler module. Maybe ANALYZE should call another API which collects these kinds of statistics. I think that (1) is necessary in the first step, but (2) is not. Regards, -- Shigeru Hanada
Shigeru HANADA <hanada@metrosystems.co.jp> writes: > Can we treat statistics of a foreign table separately? > 1. Same as local tables (maybe required) > (pg_statistic.*, pg_class.reltuples/relpages) This whole discussion seems to me to be about trying to do things outside the FDW that should properly be left inside the FDW. Who's to say that the remote side even *has* statistics of the sort that PG creates? We should provide an API that lets the FDW return a cost estimate for a proposed access path. Where it gets the cost estimate from is not something that should be presupposed. regards, tom lane
On Tue, Oct 5, 2010 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Shigeru HANADA <hanada@metrosystems.co.jp> writes: >> Can we treat statistics of a foreign table separately? > >> 1. Same as local tables (maybe required) >> (pg_statistic.*, pg_class.reltuples/relpages) > > This whole discussion seems to me to be about trying to do things outside > the FDW that should properly be left inside the FDW. Who's to say that > the remote side even *has* statistics of the sort that PG creates? > > We should provide an API that lets the FDW return a cost estimate for a > proposed access path. Where it gets the cost estimate from is not > something that should be presupposed. Unless there's some way for the FDW to have local tables for caching its statistics, the chances of this having decent performance seem to be near-zero. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Tue, Oct 5, 2010 at 8:10 AM, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > On Mon, 4 Oct 2010 19:31:52 -0400 > Robert Haas <robertmhaas@gmail.com> wrote: >> On Thu, Sep 30, 2010 at 3:48 AM, Shigeru HANADA >> <hanada@metrosystems.co.jp> wrote: >> > How about having cost hints in the generic options of the foreign table or >> > its columns? Generic options are storage for wrappers, not for >> > PostgreSQL core modules. Wrappers can use their own format to >> > represent various information, and use the hints to estimate the costs of >> > a path. >> >> I do think we're going to need some kind of local caching of relevant >> information from the foreign side. Really, I doubt that fdwoptions >> are the place for that, though: that's data for the user to input, not >> a place for the wrapper to scribble on internally. The trick is that >> there's a lot of stuff you might want to cache, and we don't really >> know anything about what the format of it is - for example, you might >> have foreign-side statistics that need to get cached locally, but they >> needn't be in the same format we use for pg_statistic. Perhaps part >> of setting up an FDW should be creating tables with prespecified >> definitions and passing the table names to CREATE FOREIGN DATA WRAPPER >> as options. Maybe we could even arrange to set up the dependencies >> appropriately... > Agreed. I withdraw the idea of storing foreign-side statistics in > generic options. > > Can we treat the statistics of a foreign table separately? > > 1. Same as local tables (maybe required) > (pg_statistic.*, pg_class.reltuples/relpages) > They will be used by the planner/optimizer to estimate basic costs based > on tuple selectivity, result row count and so on. Such statistics > could be generated by the ANALYZE module if the FDW can supply all tuples > from the foreign side. The basic costs should be correctable by the FDW > via another API, because foreign queries might have some overheads, > such as connection and transfer.
> ISTM that it is very difficult for a non-PG FDW to generate PG-style > statistics correctly. > > 2. depend on FDW (optional) > (in various, arbitrary formats) > They will be used by the FDW to optimize the query to be executed on > the foreign side in its own way. As you say, new table(s) to store such > statistics can be created during creation of a new FOREIGN DATA WRAPPER > or installation of a new fdw_handler module. Maybe ANALYZE should call > another API which collects these kinds of statistics. > > I think that (1) is necessary in the first step, but (2) is not. I disagree. I wouldn't bother doing either one of them unless you can do both. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Oct 5, 2010 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> This whole discussion seems to me to be about trying to do things outside >> the FDW that should properly be left inside the FDW. Who's to say that >> the remote side even *has* statistics of the sort that PG creates? >> >> We should provide an API that lets the FDW return a cost estimate for a >> proposed access path. Where it gets the cost estimate from is not >> something that should be presupposed. > Unless there's some way for the FDW to have local tables for caching > its statistics, the chances of this having decent performance seem to > be near-zero. Perhaps, but that would be the FDW's problem to implement. Trying to design such tables in advance of actually writing an FDW seems like a completely backwards design process. (I'd also say that your performance estimate is miles in advance of any facts; but even if it's true, the caching ought to be inside the FDW, because we have no clear idea of what it will need to cache.) regards, tom lane
On Tue, Oct 5, 2010 at 10:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Oct 5, 2010 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> This whole discussion seems to me to be about trying to do things outside >>> the FDW that should properly be left inside the FDW. Who's to say that >>> the remote side even *has* statistics of the sort that PG creates? >>> >>> We should provide an API that lets the FDW return a cost estimate for a >>> proposed access path. Where it gets the cost estimate from is not >>> something that should be presupposed. > >> Unless there's some way for the FDW to have local tables for caching >> its statistics, the chances of this having decent performance seem to >> be near-zero. > > Perhaps, but that would be the FDW's problem to implement. Trying to > design such tables in advance of actually writing an FDW seems like a > completely backwards design process. Oh, I agree. I don't want to dictate the structure of those tables; I just think it's inevitable that an FDW is going to need the ability to be bound to some local tables which the admin should set up before installing it. That is, we need a general capability, not a specific set of tables. > (I'd also say that your performance estimate is miles in advance of any > facts; but even if it's true, the caching ought to be inside the FDW, > because we have no clear idea of what it will need to cache.) I can't imagine how an FDW could possibly be expected to perform well without some persistent local data storage. Even assume the remote end is PG. To return a cost, it's going to need the contents of pg_statistic cached locally, for each remote table. Do you really think it's going to work to incur that overhead once per table per backend startup? Or else every time we try to plan against a foreign table we can fire off an SQL query to the remote side instead of trying to compute the cost locally. 
That's got to be two orders of magnitude slower than planning based off local stats. We could punt the issue of stats altogether for the first version and simply say, hey, this is only intended for things like reading from CSV files. But if we're going to have it at all then I can't see how we're going to get by without persistent local storage. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Oct 5, 2010 at 10:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> (I'd also say that your performance estimate is miles in advance of any >> facts; but even if it's true, the caching ought to be inside the FDW, >> because we have no clear idea of what it will need to cache.) > I can't imagine how an FDW could possibly be expected to perform well > without some persistent local data storage. Even assume the remote > end is PG. To return a cost, it's going to need the contents of > pg_statistic cached locally, for each remote table. Or it could ask the remote side. > Do you really > think it's going to work to incur that overhead once per table per > backend startup? If you have a cache, how are you going to manage updates of it? IMO this is a *clear* case of premature optimization being the root of all evil. We should get it working first and then see what needs to be optimized by measuring, rather than guessing in a vacuum. (BTW, if the remote end is PG I would hardly think that we'd send SQL queries at all. If we're interested in micro-optimization, we'd devise some lower-level protocol.) regards, tom lane
On 05.10.2010 17:56, Robert Haas wrote: > On Tue, Oct 5, 2010 at 10:41 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> (I'd also say that your performance estimate is miles in advance of any >> facts; but even if it's true, the caching ought to be inside the FDW, >> because we have no clear idea of what it will need to cache.) > > I can't imagine how an FDW could possibly be expected to perform well > without some persistent local data storage. Even assume the remote > end is PG. To return a cost, it's going to need the contents of > pg_statistic cached locally, for each remote table. Do you really > think it's going to work to incur that overhead once per table per > backend startup? It doesn't seem completely out of the question to me. Sure, it's expensive, but it's only incurred the first time a remote table is accessed in a session. Local persistent storage would be nice, but a lot of applications might prefer to not use it anyway, to ensure that fresh statistics are used. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > On 05.10.2010 17:56, Robert Haas wrote: >> I can't imagine how an FDW could possibly be expected to perform well >> without some persistent local data storage. > It doesn't seem completely out of the question to me. One other point that could be made here is that it's not going to surprise anybody that access to a remote table is really slow compared to access to a local table. Why would it surprise anybody if planning for a remote table is really slow compared to planning for a local table? Where is the evidence that anyone would even *notice* the extra planning time, compared to the execution time of the finished query? This is still all about premature optimization. regards, tom lane
On Tue, Oct 5, 2010 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Oct 5, 2010 at 10:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> (I'd also say that your performance estimate is miles in advance of any >>> facts; but even if it's true, the caching ought to be inside the FDW, >>> because we have no clear idea of what it will need to cache.) > >> I can't imagine how an FDW could possibly be expected to perform well >> without some persistent local data storage. Even assume the remote >> end is PG. To return a cost, it's going to need the contents of >> pg_statistic cached locally, for each remote table. > > Or it could ask the remote side. FWIW, I mentioned that option in that part you didn't quote. >> Do you really >> think it's going to work to incur that overhead once per table per >> backend startup? > > If you have a cache, how are you going to manage updates of it? I'm not. I'm going to let the FDW determine how often it would like to refresh its cache, as well as what it would like to cache and in what format it would like to cache it. > IMO this is a *clear* case of premature optimization being the root of > all evil. We should get it working first and then see what needs to be > optimized by measuring, rather than guessing in a vacuum. I have no problem with punting the issue of remote statistics to some time in the future. But I don't think we should have a half-baked implementation of remote statistics. We should either do it right (doing such testing as is necessary to establish what that means) or not do it at all. Frankly, if we could get from where we are today to a workable implementation of this technology for CSV files in time for 9.1, I think that would be an impressive accomplishment. Making it work for more complicated cases is almost certainly material for 9.2, 9.3, 9.4, and maybe further out than that. 
> (BTW, if the remote end is PG I would hardly think that we'd send SQL > queries at all. If we're interested in micro-optimization, we'd devise > some lower-level protocol.) Interesting. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Oct 5, 2010 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> IMO this is a *clear* case of premature optimization being the root of >> all evil. We should get it working first and then see what needs to be >> optimized by measuring, rather than guessing in a vacuum. > I have no problem with punting the issue of remote statistics to some > time in the future. But I don't think we should have a half-baked > implementation of remote statistics. We should either do it right > (doing such testing as is necessary to establish what that means) or > not do it at all. Frankly, if we could get from where we are today to > a workable implementation of this technology for CSV files in time for > 9.1, I think that would be an impressive accomplishment. Making it > work for more complicated cases is almost certainly material for 9.2, > 9.3, 9.4, and maybe further out than that. I quite agree that it's going to take multiple release cycles to have a really impressive version of SQL/MED. What I'm saying is that caching remote statistics is several versions out in my view of the world, and designing support for it shouldn't be happening now. Maybe we ought to take a step back and discuss what the development plan ought to be, before we spend more time on details like this. My idea of a good development process would involve working in parallel on at least two FDW adapters, so that we don't find we've skewed the API design to meet the needs of just one adapter. Probably a remote-PG adapter and a local-CSV-file adapter would be a good combination. I don't have any clear idea of how soon we might expect to see how much functionality, though. Thoughts? regards, tom lane
On Tue, Oct 5, 2010 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Oct 5, 2010 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> IMO this is a *clear* case of premature optimization being the root of >>> all evil. We should get it working first and then see what needs to be >>> optimized by measuring, rather than guessing in a vacuum. > >> I have no problem with punting the issue of remote statistics to some >> time in the future. But I don't think we should have a half-baked >> implementation of remote statistics. We should either do it right >> (doing such testing as is necessary to establish what that means) or >> not do it at all. Frankly, if we could get from where we are today to >> a workable implementation of this technology for CSV files in time for >> 9.1, I think that would be an impressive accomplishment. Making it >> work for more complicated cases is almost certainly material for 9.2, >> 9.3, 9.4, and maybe further out than that. > > I quite agree that it's going to take multiple release cycles to have > a really impressive version of SQL/MED. What I'm saying is that caching > remote statistics is several versions out in my view of the world, and > designing support for it shouldn't be happening now. Fair enough. > Maybe we ought to take a step back and discuss what the development plan > ought to be, before we spend more time on details like this. Good idea. > My idea of > a good development process would involve working in parallel on at least > two FDW adapters, so that we don't find we've skewed the API design to > meet the needs of just one adapter. Probably a remote-PG adapter and a > local-CSV-file adapter would be a good combination. I don't have any > clear idea of how soon we might expect to see how much functionality, > though. Thoughts? I'm somewhat afraid that a remote-PG adapter will turn into a can of worms. 
If we give up on remote statistics, does that mean we're also giving up
on index use on the remote side?  I fear that we'll end up crafting
partial solutions that will only end up getting thrown away, but after a
lot of work has been invested in them.

I wonder if we should focus our first efforts on really simple cases
like CSV files (as you mentioned) and perhaps something like memcached,
which has different properties than a CSV file, but extremely simple
ones.  I think it's inevitable that the API is going to get more
complicated from release to release, and probably not in
backward-compatible ways; I think it's too early to be worried about
that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes:
> I'm somewhat afraid that a remote-PG adapter will turn into a can of
> worms.  If we give up on remote statistics, does that mean we're also
> giving up on index use on the remote side?  I fear that we'll end up
> crafting partial solutions that will only end up getting thrown away,
> but after a lot of work has been invested in them.

Per Fred Brooks: "Plan to throw one away.  You will anyhow."

> I wonder if we should focus on first efforts on really simple cases
> like CSV files (as you mentioned) and perhaps something like memcached,
> which has different properties than a CSV file, but extremely simple
> ones.  I think it's inevitable that the API is going to get more
> complicated from release to release and probably not in
> backward-compatible ways; I think it's too early to be worried about
> that.

+1 ... we can *not* assume that we know enough to get the API right the
first time.  It'll have to be stated up front that it's subject to
change, and you probably shouldn't be trying to develop FDWs separately
from the main project for awhile yet.

I think that looking only at CSV might be too simple, though.  It'd be
good to be at least trying to build a remote-PG adapter, even with the
knowledge that it's all throwaway code.

			regards, tom lane
On Tue, 05 Oct 2010 14:59:27 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I wonder if we should focus on first efforts on really simple cases
>> like CSV files (as you mentioned) and perhaps something like memcached,
>> which has different properties than a CSV file, but extremely simple
>> ones.  I think it's inevitable that the API is going to get more
>> complicated from release to release and probably not in
>> backward-compatible ways; I think it's too early to be worried about
>> that.
>
> +1 ... we can *not* assume that we know enough to get the API right
> the first time.  It'll have to be stated up front that it's subject to
> change and you probably shouldn't be trying to develop FDWs separately
> from the main project for awhile yet.
>
> I think that looking only at CSV might be too simple, though.  It'd be
> good to be at least trying to build a remote-PG adapter, even with the
> knowledge that it's all throwaway code.

Thanks for the comments, and sorry for the delayed reply.

I propose a revised version of the FDW API which supports both CSV and
PG.  It is intended as a minimal API to support SQL/MED, so features
such as query optimization or use of remote indexes are not considered
much.

======================
This FDW API is designed to support scanning a foreign table.  As the
SQL/MED standard says, updating external data is not supported.  The
instance of a foreign table may be a CSV file, a table on an external
PostgreSQL server, or anything that can generate PG-style tuples.

Scanning a foreign table is done in the steps below.
Note: FdwRoutine->XXX() means an FDW API function.

1. The parser parses a query string and generates a query tree.  In
   this step, a foreign table is not distinguished from a local table.

2. The optimizer generates a Path pathnode (pathtype = T_ForeignPath)
   for each foreign table in the query.  Because foreign tables have
   neither TIDs nor indexes (at least for now), we don't add a new
   Path type.
   In this step, the optimizer calls FdwRoutine->GetStatistics() to
   get estimated costs of the foreign scan.  The FDW can calculate
   costs by itself from some statistics, get costs from the remote
   side, or leave the costs as they are.

3. The planner generates a ForeignScan plannode for each foreign Path.
   ForeignScan could be a typedef of Scan because it has no other
   members, but it has been defined as an independent structure so
   that other members can be added in the future.

4. The executor executes each ForeignScan node in the steps below.
   Throughout Step 4, the FDW can keep its own information about the
   foreign scan in ForeignScan->FdwReply.

4-a. To initialize ForeignScan plannodes, ExecInitForeignScan() is
     called for each ForeignScan node.  In ExecInitForeignScan(), a
     ForeignScanState is created from:

     * a ScanState, which holds the same information as SeqScan,
     * an FdwRoutine pointer (cached),
     * some catalog information about the foreign table (cached),
     * a connection established with FdwRoutine->ConnectServer().

     The FDW can use GetFSConnectionByName(name) to get a pooled
     connection by name in FdwRoutine->ConnectServer().

     Then, ExecInitForeignScan() calls FdwRoutine->Open() to tell the
     FDW that it's time to start the query.  The FDW can do some
     initialization if necessary.

4-b. To retrieve a tuple from the foreign table, ExecForeignScan()
     calls FdwRoutine->Iterate().  The FDW should set the next tuple
     into the ss_ScanTupleSlot of ScanState, or empty the slot to
     indicate EOF.  Projection and qual evaluation are done in
     ExecScan() later, so ExecForeignScan() should return all
     columns/tuples in the external data store.

4-c. To reset the scan and rewind the cursor to the head of the
     foreign table, ExecForeignReScan() calls FdwRoutine->ReOpen().
     This occurs when a ForeignScan node is an inner node of a nested
     loop join.  The FDW is required to return the first tuple again
     at the next FdwRoutine->Iterate() call.

4-d. At the end of the execution of a ForeignScan,
     ExecEndForeignScan() is called.
     ExecEndForeignScan() calls FdwRoutine->Close() to tell the FDW
     that Iterate will not be called any more.  The FDW can do some
     finalization if necessary.

5. Connections established via FdwRoutine->ConnectServer() are pooled
   in the backend for future queries that access the same foreign
   server.  The pooling mechanism is implemented in the core module,
   not in each FDW.  When a user executes the DISCARD ALL command, or
   the backend dies, FdwRoutine->FreeFSConnection() is called for each
   foreign connection to discard it.

To achieve the features above, I propose the following FDW API:

/*
 * Almost same as SeqScan, but defined as a structure to allow adding
 * members in the future.
 */
typedef struct ForeignScan
{
	Scan		scan;
} ForeignScan;

/*
 * Handle to access FDW-dependent data.
 * Each FDW can use ForeignScanState->reply, casting between FdwReply *
 * and the actual type.
 */
typedef struct FdwReply FdwReply;

/*
 * ForeignScanState node is used to store scan status.
 */
typedef struct ForeignScanState
{
	ScanState	ss;				/* its first field is NodeTag */
	FdwRoutine *routine;		/* set of FDW routines */
	ForeignDataWrapper *wrapper;	/* foreign data wrapper */
	ForeignServer *server;		/* foreign server */
	FSConnection *conn;			/* connection to the foreign server */
	UserMapping *user;			/* user mapping */
	ForeignTable *table;		/* foreign table */
	FdwReply   *reply;			/* private data for each data wrapper */
} ForeignScanState;

typedef struct FdwRoutine
{
	/*
	 * Connect to the foreign server identified by server and user.
	 */
	FSConnection *(*ConnectServer)(ForeignServer *server,
								   UserMapping *user);

	/*
	 * Disconnect from the foreign server and free the FSConnection
	 * object.
	 */
	void		(*FreeFSConnection)(FSConnection *conn);

	/*
	 * Estimate the costs of a foreign path.  The FDW should update
	 * startup_cost and total_cost of the Path.  To estimate costs, a
	 * PG FDW might generate "EXPLAIN" SQL and execute it on the remote
	 * side, or collect statistics somewhere and calculate from them.
	 * The parameters are the same as cost_foreignscan() in
	 * optimizer/path/costsize.c (the function is just a proxy).
	 */
	void		(*GetStatistics)(Path *path, PlannerInfo *root,
								 RelOptInfo *baserel);

	/*
	 * Prepare to return tuples.
	 */
	void		(*Open)(ForeignScanState *scanstate);

	/*
	 * Fetch the next tuple and fill the tuple slot with it, or clear
	 * the slot to indicate EOF.
	 */
	void		(*Iterate)(ForeignScanState *scanstate);

	/*
	 * End the foreign scan and do some cleanup if necessary.
	 */
	void		(*Close)(ForeignScanState *scanstate);

	/*
	 * Re-initialize the foreign scan, used when the inner executor
	 * node is executed again.
	 */
	void		(*ReOpen)(ForeignScanState *scanstate);
} FdwRoutine;

======================
Issues arising from the ideas above:

== Connection pooling ==
Connecting on each foreign scan is obviously wasteful, but who should
manage the pooled connections is open to argument, because a CSV
wrapper needs no "connection" at all, while a PG wrapper might connect
once per database per user.

Maybe DBAs want to see how many and what kind of connections are
established at the moment, so we should show the list of active
connections via a view or function.  To treat connections
transparently, I propose putting the connection pooling mechanism in
the core module, not in each FDW module.  Or should we add an API to
return the list of active connections in a common format?  If we
decide to leave connection management to each FDW, ConnectServer() and
FreeFSConnection() can be removed from the API.

== ANALYZE support ==
Even if an FDW wants to keep statistics in a local area such as
pg_class.reltuples, the current API doesn't provide an appropriate
timing.  Should we add an API which is called from ANALYZE to allow
FDWs to handle statistics when the user wants?

Any comments are welcome.

Regards,
--
Shigeru Hanada
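[Editor's note: the scan lifecycle proposed above (a routine table of function
pointers, with Iterate filling the scan slot or clearing it to signal EOF) can
be illustrated with a small standalone mock.  The types below are simplified
stand-ins, not the real PostgreSQL headers, and demo_rows and run_demo_scan()
are invented for the example — a real FDW would work against ForeignScanState
and a TupleTableSlot instead.]

```c
#include <stddef.h>

/* Mock stand-in for ForeignScanState: a "slot" plus private scan state. */
typedef struct ScanState
{
	const char *slot;			/* mock ss_ScanTupleSlot: NULL means EOF */
	int			cursor;			/* private position, like FdwReply data */
} ScanState;

/* Mock of the proposed FdwRoutine callback table (scan callbacks only). */
typedef struct FdwRoutine
{
	void		(*Open)(ScanState *ss);
	void		(*Iterate)(ScanState *ss);	/* fill slot, or clear for EOF */
	void		(*ReOpen)(ScanState *ss);	/* rewind for nested-loop inner */
	void		(*Close)(ScanState *ss);
} FdwRoutine;

/* A trivial "foreign table": three in-memory rows standing in for CSV lines. */
static const char *demo_rows[] = {"1,foo", "2,bar", "3,baz"};

static void demoOpen(ScanState *ss)	  { ss->cursor = 0; ss->slot = NULL; }
static void demoReOpen(ScanState *ss) { ss->cursor = 0; ss->slot = NULL; }
static void demoClose(ScanState *ss)  { ss->slot = NULL; }

static void
demoIterate(ScanState *ss)
{
	if (ss->cursor < (int) (sizeof(demo_rows) / sizeof(demo_rows[0])))
		ss->slot = demo_rows[ss->cursor++];
	else
		ss->slot = NULL;		/* empty slot marks the end of the scan */
}

static const FdwRoutine demo_routine = {
	demoOpen, demoIterate, demoReOpen, demoClose
};

/*
 * Drive one full scan the way ExecForeignScan would: Open, then Iterate
 * until the slot comes back empty, then Close.  Returns the row count.
 */
int
run_demo_scan(void)
{
	ScanState	ss;
	int			nrows = 0;

	demo_routine.Open(&ss);
	for (;;)
	{
		demo_routine.Iterate(&ss);
		if (ss.slot == NULL)
			break;
		nrows++;
	}
	demo_routine.Close(&ss);
	return nrows;
}
```

Calling run_demo_scan() twice also shows why ReOpen/Open must reset the
cursor: each scan must see the first row again, as required for the inner
side of a nested loop join.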
On Thu, Oct 7, 2010 at 7:20 AM, Shigeru HANADA
<hanada@metrosystems.co.jp> wrote:
> [ design for FDW ]

This seems mostly sensible.

> In this step, optimizer calls FdwRoutine->GetStatistics() to get
> estimated costs of the foreign scan.  FDW can calculate costs by
> itself with some statistics, or get costs from remote-side, or leave
> the costs as is.

I think this should be called EstimateCosts rather than GetStatistics.

> Maybe DBAs want to see how much and what kind of connections are
> established now, so we should show list of active connections via a
> view or function.  To treat connections transparently, I propose
> that connection pooling mechanism in the core module, not in each FDW
> module.  Or should we add API to return list of active connections in
> common format?

I think for starters we should make this the responsibility of each
FDW.  It seems a bit hard to speculate about what we might want down
the road at this point.

> Should we add API which is called from ANALYZE to allow FDWs to handle
> statistics when user wants.

I would probably put this off also, to a later phase of the project.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
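[Editor's note: if connection management becomes each FDW's responsibility as
suggested here, a wrapper like postgresql_fdw could keep a backend-local cache
keyed by (server, user).  The sketch below uses illustrative mock names
(FSConnection, GetFSConnection, DiscardFSConnections) rather than any real
server or libpq API; the comments mark where a real wrapper would call libpq.]

```c
#include <stdlib.h>
#include <string.h>

/* Mock connection object; a real wrapper would hold a PGconn here. */
typedef struct FSConnection
{
	char		server[64];
	char		user[64];
	int			refcount;		/* shared by scans on the same server */
	struct FSConnection *next;
} FSConnection;

static FSConnection *conn_cache = NULL;		/* backend-local cache head */

/* Return a cached connection for (server, user), creating one on a miss. */
FSConnection *
GetFSConnection(const char *server, const char *user)
{
	FSConnection *c;

	for (c = conn_cache; c != NULL; c = c->next)
	{
		if (strcmp(c->server, server) == 0 && strcmp(c->user, user) == 0)
		{
			c->refcount++;		/* cache hit: reuse the connection */
			return c;
		}
	}

	c = calloc(1, sizeof(FSConnection));
	if (c == NULL)
		return NULL;
	strncpy(c->server, server, sizeof(c->server) - 1);
	strncpy(c->user, user, sizeof(c->user) - 1);
	c->refcount = 1;
	/* a real postgresql_fdw would call PQconnectdb() here */
	c->next = conn_cache;
	conn_cache = c;
	return c;
}

/* Count live cached connections (what a monitoring view might expose). */
int
CountFSConnections(void)
{
	FSConnection *c;
	int			n = 0;

	for (c = conn_cache; c != NULL; c = c->next)
		n++;
	return n;
}

/* Drop every cached connection, as on DISCARD ALL or backend exit. */
void
DiscardFSConnections(void)
{
	while (conn_cache != NULL)
	{
		FSConnection *next = conn_cache->next;

		/* a real wrapper would call PQfinish() here */
		free(conn_cache);
		conn_cache = next;
	}
}
```

Keyed this way, two scans on the same server by the same user share one
connection, while a CSV-style wrapper would simply never populate the cache.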
On Mon, 11 Oct 2010 13:22:13 -0400
Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 7, 2010 at 7:20 AM, Shigeru HANADA
> <hanada@metrosystems.co.jp> wrote:
>> In this step, optimizer calls FdwRoutine->GetStatistics() to get
>> estimated costs of the foreign scan.  FDW can calculate costs by
>> itself with some statistics, or get costs from remote-side, or leave
>> the costs as is.
>
> I think this should be called EstimateCosts rather than GetStatistics.

Agreed, the name you suggested would be better.  I had chosen the name
GetStatistics because a function with that name is defined in the
SQL/MED standard as below, but GetStatistics might rather match the
handling of an ANALYZE request in PostgreSQL.

Section 22.3.28 GetStatistics (ISO/IEC 9075-9:2003)
> Retrieve implementation-defined statistics associated with a
> foreign server request.

>> Maybe DBAs want to see how much and what kind of connections are
>> established now, so we should show list of active connections via a
>> view or function.  To treat connections transparently, I propose
>> that connection pooling mechanism in the core module, not in each FDW
>> module.  Or should we add API to return list of active connections in
>> common format?
>
> I think for starters we should make this the responsibility of each
> FDW.  It seems a bit hard to speculate about what we might want down
> the road at this point.

Agreed.  I will move the connection cache mechanism into
postgresql_fdw.  I will also remove the pg_foreign_connections view,
because it would need a new interface to get the list of active
connections from FDWs.

>> Should we add API which is called from ANALYZE to allow FDWs to handle
>> statistics when user wants.
>
> I would probably put this off also, to a later phase of the project.

Agreed.

Thanks for the review!
I'll finish the SQL/MED patch by applying your comments.

Regards,
--
Shigeru Hanada
On Tue, Oct 12, 2010 at 2:27 AM, Shigeru HANADA
<hanada@metrosystems.co.jp> wrote:
> Thanks for the review!
> I'll finish the SQL/MED patch by applying your comments.

I think this patch has gotten as much review as we can reasonably give
it this CommitFest (which is about to end), so I'm going to mark it
Returned with Feedback.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, 13 Oct 2010 19:10:42 -0400
Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 12, 2010 at 2:27 AM, Shigeru HANADA
> <hanada@metrosystems.co.jp> wrote:
>> Thanks for the review!
>> I'll finish the SQL/MED patch by applying your comments.
>
> I think this patch has gotten as much review as we can reasonably give
> it this CommitFest (which is about to end), so I'm going to mark it
> Returned with Feedback.

Some new issues other than the FDW interface have been identified
through the reviews in this CommitFest, and it is difficult to solve
them within this CommitFest.  I would like to propose a revised
version of the SQL/MED patch in the next CommitFest.

Many thanks for your reviews.

Regards,
--
Shigeru Hanada