Thread: patch: SQL/MED(FDW) DDL

patch: SQL/MED(FDW) DDL

From
SAKAMOTO Masahiko
Date:
Hello.
This is a proposal patch for SQL/MED for 9.1.

At the prev. CF, this patch had so many features to make it hard
to review all of them.  So I devided it into smaller parts:
 (1) foreign table DDL support (this proposal):
     - support for foreign table DDL syntax (CREATE/ALTER FOREIGN TABLE)
     - Definition of FDW routine interface and system catalogs for it.
 (2) SELECT support for external PostgreSQL tables.


First of all, I'd like to discuss and make agreement on the basic design
and its implementation for FDW interface.
Comments and ideas would be very appriciated, especially
on how system catalog stores the information about
foreign tables, server, and user mappings.

Detail information are described in the wiki:
  http://wiki.postgresql.org/wiki/SQL/MED

I'll add this item to the CF site.
And WIP patch for (2) will be available on the page for how (1) is utilized.

- Open Issue:
 - fdw connection params issue:
    In this implimentation, foreign table connection params(including
    passwords) are visible to all users. To avoid it, store them in the
    user_mapping object, not in foreign server object.
    # It should be improved, though.
 - fdw security issue:
    It depends on how indivisual fdw extensions are designed.
    I'll focus on it after basic agreement on fdw interface.

- prev. CF page:
  https://commitfest.postgresql.org/action/patch_view?id=326

- code repository:
    http://repo.or.cz/w/pgsql-fdw.git (branch: foreign_table).

repository branches structures is as follows:
  master
     +foreign_table: (active) branch for this proposal
         - foreign_scan: (active) WIP branch for select support
     +fdw_ddl: (will be obsolute) branch for last CF
         - dblink
         - copy_from

Thanks in advance,

SAKAMOTO Masahiko
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
e-mail: sakamoto.masahiko@oss.ntt.co.jp

Attachment

Re: patch: SQL/MED(FDW) DDL

From
Heikki Linnakangas
Date:
On 15/09/10 08:46, SAKAMOTO Masahiko wrote:
> This is a proposal patch for SQL/MED for 9.1.
> 
> At the prev. CF, this patch had so many features to make it hard
> to review all of them.  So I devided it into smaller parts:
>   (1) foreign table DDL support (this proposal):
>       - support for foreign table DDL syntax (CREATE/ALTER FOREIGN TABLE)
>       - Definition of FDW routine interface and system catalogs for it.
>   (2) SELECT support for external PostgreSQL tables.
> 
> 
> First of all, I'd like to discuss and make agreement on the basic design
> and its implementation for FDW interface.
> Comments and ideas would be very appriciated, especially
> on how system catalog stores the information about
> foreign tables, server, and user mappings.

In my mind the key question is: what does the API for writing foreign
data wrappers look like? I couldn't figure that out by looking at the patch.

The API needs to be simple and version-independent, so that you can
write simple wrappers like the flat file wrapper easily. At the same
time, it needs to be very flexible, so that it allows safely pushing
down all kinds constructs like functions, aggregates and joins. The
planner needs to know which constructs the wrapper can handle and get
cost estimates for the foreign scans. Those two goals are at odds.

I've been thinking about that for some time. Here's one sketch I made a
while ago:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg00395.php.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
2010/9/15 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
> In my mind the key question is: what does the API for writing foreign
> data wrappers look like? I couldn't figure that out by looking at the patch.
>
> The API needs to be simple and version-independent, so that you can
> write simple wrappers like the flat file wrapper easily. At the same
> time, it needs to be very flexible, so that it allows safely pushing
> down all kinds constructs like functions, aggregates and joins. The
> planner needs to know which constructs the wrapper can handle and get
> cost estimates for the foreign scans. Those two goals are at odds.
>
> I've been thinking about that for some time. Here's one sketch I made a
> while ago:
> http://archives.postgresql.org/pgsql-hackers/2008-07/msg00395.php.

I wonder if we might not think of the foreign data wrapper problem as
an instance of an even more general problem.  Right now, we have
semi-pluggable index access methods - it's probably not quite possible
to implement one as a standalone module because of XLOG, and maybe
other reasons, but at least there's some abstraction layer there.
Perhaps we should be thinking about a similar facility of table-access
methods.  What if someone wants to implement column-oriented storage,
or index-organized tables, or or tables that are really slow under
heavy write loads but can instantaneously compute SELECT COUNT(*) FROM
table, or mauve-colored tables with magic pixie dust?  I don't want to
raise the bar for this project to the point where we can never get it
off the ground, but if there's a way to avoid baking in the assumption
that only foreign tables can ever have special capabilities, that
might be valuable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Heikki Linnakangas
Date:
On 15/09/10 21:21, Robert Haas wrote:
> I wonder if we might not think of the foreign data wrapper problem as
> an instance of an even more general problem.  Right now, we have
> semi-pluggable index access methods - it's probably not quite possible
> to implement one as a standalone module because of XLOG, and maybe
> other reasons, but at least there's some abstraction layer there.
> Perhaps we should be thinking about a similar facility of table-access
> methods.  What if someone wants to implement column-oriented storage,
> or index-organized tables, or or tables that are really slow under
> heavy write loads but can instantaneously compute SELECT COUNT(*) FROM
> table, or mauve-colored tables with magic pixie dust?  I don't want to
> raise the bar for this project to the point where we can never get it
> off the ground, but if there's a way to avoid baking in the assumption
> that only foreign tables can ever have special capabilities, that
> might be valuable.

Well, you could implement all that as a foreign data wrappers. Tables 
made out of pixie dust feels pretty foreign to me ;-).

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Wed, Sep 15, 2010 at 2:28 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 15/09/10 21:21, Robert Haas wrote:
>>
>> I wonder if we might not think of the foreign data wrapper problem as
>> an instance of an even more general problem.  Right now, we have
>> semi-pluggable index access methods - it's probably not quite possible
>> to implement one as a standalone module because of XLOG, and maybe
>> other reasons, but at least there's some abstraction layer there.
>> Perhaps we should be thinking about a similar facility of table-access
>> methods.  What if someone wants to implement column-oriented storage,
>> or index-organized tables, or or tables that are really slow under
>> heavy write loads but can instantaneously compute SELECT COUNT(*) FROM
>> table, or mauve-colored tables with magic pixie dust?  I don't want to
>> raise the bar for this project to the point where we can never get it
>> off the ground, but if there's a way to avoid baking in the assumption
>> that only foreign tables can ever have special capabilities, that
>> might be valuable.
>
> Well, you could implement all that as a foreign data wrappers. Tables made
> out of pixie dust feels pretty foreign to me ;-).

Eh, maybe.  It doesn't seem like the best name, if we're actually
managing the underlying data blocks with our smgr layer, etc.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> 2010/9/15 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
>> I've been thinking about that for some time. Here's one sketch I made a
>> while ago:
>> http://archives.postgresql.org/pgsql-hackers/2008-07/msg00395.php.

> I wonder if we might not think of the foreign data wrapper problem as
> an instance of an even more general problem.  Right now, we have
> semi-pluggable index access methods - it's probably not quite possible
> to implement one as a standalone module because of XLOG, and maybe
> other reasons, but at least there's some abstraction layer there.
> Perhaps we should be thinking about a similar facility of table-access
> methods.  What if someone wants to implement column-oriented storage,
> or index-organized tables, or or tables that are really slow under
> heavy write loads but can instantaneously compute SELECT COUNT(*) FROM
> table, or mauve-colored tables with magic pixie dust?  I don't want to
> raise the bar for this project to the point where we can never get it
> off the ground, but if there's a way to avoid baking in the assumption
> that only foreign tables can ever have special capabilities, that
> might be valuable.

Well, the problem is to not draw the abstraction boundary so high that
your plugins have to reimplement the world to get anything done.
mysql got this wrong IMO, and are still paying the price in the form of
bizarre functional incompatibilities between their different storage
engines.

As an example, I don't think there is any sane way to provide
column-oriented storage as a plugin.  The entire executor is based
around the assumption that table scans return a row at a time; in
consequence, the entire planner is too.  You can't have a plugin that
replaces all of that.  You could probably build a plugin that allows
columnar storage but reconstructs rows to return to the executor ... but
having to do that would largely destroy any advantages of a columnar DB,
I fear.

Yet there are other cases that probably *could* work well based on a
storage-level abstraction boundary; index-organized tables for instance.
So I think we need to have some realistic idea of what we want to
support and design an API accordingly, not hope that if we don't
know what we want we will somehow manage to pick an API that makes
all things possible.

I'm personally more worried about whether Heikki's sketch has the
boundary too high-level than too low-level.  It might work all right
for handing off to a full-fledged remote database, particularly if
the other DB is also Postgres; but otherwise it's leaving a lot of
work to be done by the plugin author.  And at the same time I don't
think it's exposing enough information to let the local planner do
anything intelligent in terms of trading off remote vs. local work.
        regards, tom lane


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, the problem is to not draw the abstraction boundary so high that
> your plugins have to reimplement the world to get anything done.
> mysql got this wrong IMO, and are still paying the price in the form of
> bizarre functional incompatibilities between their different storage
> engines.

Yeah, as far as I can tell there is pretty much universal consensus
that they got that wrong.  Actually, I have no personal opinion on the
topic, having no familiarity with the innards of MySQL: but that is
what people keep telling me.

> As an example, I don't think there is any sane way to provide
> column-oriented storage as a plugin.  The entire executor is based
> around the assumption that table scans return a row at a time; in
> consequence, the entire planner is too.  You can't have a plugin that
> replaces all of that.  You could probably build a plugin that allows
> columnar storage but reconstructs rows to return to the executor ... but
> having to do that would largely destroy any advantages of a columnar DB,
> I fear.

Yeah, I don't know.  A columnar DB is a bit like making "SELECT * FROM
table" really mean some kind of join between table_part1, table_part2,
and table_part3 (which could then perhaps be reordered, a candidate
for join removal, etc.).  But I have no position on whether whatever
infrastructure we'd need to support that is any way related to the
problem du jour.  It's worth noting, however, that even if we give up
on a column-oriented storage within PG, we might easily be talking to
a column-oriented DB on the other end of an SQL/MED connection; and
we'd like to be able to handle that sanely.

> Yet there are other cases that probably *could* work well based on a
> storage-level abstraction boundary; index-organized tables for instance.
> So I think we need to have some realistic idea of what we want to
> support and design an API accordingly, not hope that if we don't
> know what we want we will somehow manage to pick an API that makes
> all things possible.

Agreed.  Random ideas: index-organized tables, tables that use a
rollback log rather than VACUUM, tables that use strict two-phase
locking rather than MVCC, tables that have no concurrency control at
all and you get dirty reads (could be useful for logging tables),
write-once read-many tables, compressed tables, encrypted tables,
tables in formats used by previous versions of PostgreSQL, tables that
store data in a round-robin fashion (like MRTG rrdtool).  Within the
general orbit of index-organized tables, you can wonder about
different kinds of indices: btree, hash, and gist all seem promising.
You can even imagine a GIST-like structure that does something like
maintain running totals for certain columns on each non-leaf page, to
speed up SUM operations.  Feel free to ignore whatever of that seems
irrelevant.

> I'm personally more worried about whether Heikki's sketch has the
> boundary too high-level than too low-level.  It might work all right
> for handing off to a full-fledged remote database, particularly if
> the other DB is also Postgres; but otherwise it's leaving a lot of
> work to be done by the plugin author.  And at the same time I don't
> think it's exposing enough information to let the local planner do
> anything intelligent in terms of trading off remote vs. local work.

Yeah, I think the API for exposing cost information needs a lot of thought.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Wed, Sep 15, 2010 at 8:53 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
> 2010/9/16 Robert Haas <robertmhaas@gmail.com>:
>> On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Yet there are other cases that probably *could* work well based on a
>>> storage-level abstraction boundary; index-organized tables for instance.
>>> So I think we need to have some realistic idea of what we want to
>>> support and design an API accordingly, not hope that if we don't
>>> know what we want we will somehow manage to pick an API that makes
>>> all things possible.
>>
>> Agreed.  Random ideas: index-organized tables...
>
> I'd love to see a table that is based on one of the existing KVSs.

I'm not familiar with the term KVS?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Hitoshi Harada
Date:
2010/9/16 Robert Haas <robertmhaas@gmail.com>:
> On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yet there are other cases that probably *could* work well based on a
>> storage-level abstraction boundary; index-organized tables for instance.
>> So I think we need to have some realistic idea of what we want to
>> support and design an API accordingly, not hope that if we don't
>> know what we want we will somehow manage to pick an API that makes
>> all things possible.
>
> Agreed.  Random ideas: index-organized tables...

I'd love to see a table that is based on one of the existing KVSs.


--
Hitoshi Harada


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Wed, Sep 15, 2010 at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Sep 15, 2010 at 8:53 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
>> 2010/9/16 Robert Haas <robertmhaas@gmail.com>:
>>> On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Yet there are other cases that probably *could* work well based on a
>>>> storage-level abstraction boundary; index-organized tables for instance.
>>>> So I think we need to have some realistic idea of what we want to
>>>> support and design an API accordingly, not hope that if we don't
>>>> know what we want we will somehow manage to pick an API that makes
>>>> all things possible.
>>>
>>> Agreed.  Random ideas: index-organized tables...
>>
>> I'd love to see a table that is based on one of the existing KVSs.
>
> I'm not familiar with the term KVS?

Oh, key-value store, I bet.  Yeah, that would be cool.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Hitoshi Harada
Date:
2010/9/16 Robert Haas <robertmhaas@gmail.com>:
> On Wed, Sep 15, 2010 at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Sep 15, 2010 at 8:53 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
>>> 2010/9/16 Robert Haas <robertmhaas@gmail.com>:
>>>> On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>> Yet there are other cases that probably *could* work well based on a
>>>>> storage-level abstraction boundary; index-organized tables for instance.
>>>>> So I think we need to have some realistic idea of what we want to
>>>>> support and design an API accordingly, not hope that if we don't
>>>>> know what we want we will somehow manage to pick an API that makes
>>>>> all things possible.
>>>>
>>>> Agreed.  Random ideas: index-organized tables...
>>>
>>> I'd love to see a table that is based on one of the existing KVSs.
>>
>> I'm not familiar with the term KVS?
>
> Oh, key-value store, I bet.  Yeah, that would be cool.

That's it. Like Redis, Tokyo Cabinet, or something.



--
Hitoshi Harada


Re: patch: SQL/MED(FDW) DDL

From
Tom Lane
Date:
Hitoshi Harada <umi.tanuki@gmail.com> writes:
> 2010/9/16 Robert Haas <robertmhaas@gmail.com>:
>> Oh, key-value store, I bet. �Yeah, that would be cool.

> That's it. Like Redis, Tokyo Cabinet, or something.

What exactly do those get you that an ordinary index, or at worst an
index-organized table, doesn't get you?
        regards, tom lane


Re: patch: SQL/MED(FDW) DDL

From
Mark Kirkwood
Date:
On 16/09/10 13:22, Tom Lane wrote:
> Hitoshi Harada<umi.tanuki@gmail.com>  writes:
>    
>> 2010/9/16 Robert Haas<robertmhaas@gmail.com>:
>>      
>>> Oh, key-value store, I bet.  Yeah, that would be cool.
>>>        
>    
>> That's it. Like Redis, Tokyo Cabinet, or something.
>>      
> What exactly do those get you that an ordinary index, or at worst an
> index-organized table, doesn't get you?
>
>             regards, tom lane
>
>    

It is pretty rare to see key value stores vs relational engines 
discussed without a descent into total foolishiness, but this Wikipedia 
page looks like a reasonable summary:

http://en.wikipedia.org/wiki/NoSQL


Re: patch: SQL/MED(FDW) DDL

From
Tom Lane
Date:
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> On 16/09/10 13:22, Tom Lane wrote:
>> What exactly do those get you that an ordinary index, or at worst an
>> index-organized table, doesn't get you?

> It is pretty rare to see key value stores vs relational engines 
> discussed without a descent into total foolishiness, but this Wikipedia 
> page looks like a reasonable summary:
> http://en.wikipedia.org/wiki/NoSQL

That doesn't do anything at all to answer my question.  I don't want
to debate NoSQL versus traditional RDBMS here.  What I asked was:
given that PG is a traditional RDBMS, what exactly are you hoping
to accomplish by putting a key-value storage mechanism in it?  And
if you did, how would that be different from an index-organized table?
        regards, tom lane


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Wed, Sep 15, 2010 at 9:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hitoshi Harada <umi.tanuki@gmail.com> writes:
>> 2010/9/16 Robert Haas <robertmhaas@gmail.com>:
>>> Oh, key-value store, I bet.  Yeah, that would be cool.
>
>> That's it. Like Redis, Tokyo Cabinet, or something.
>
> What exactly do those get you that an ordinary index, or at worst an
> index-organized table, doesn't get you?

Speed, hopefully.  Because otherwise there's little reason to think
that abandoning transactional semantics, durability, and strong typing
are good ideas.

We spend a lot of time focusing on people who need things to be MORE
ACID (sync rep and true serializability are just two examples) and
that's really great stuff but on occasion it can be useful to turn the
dial in the other direction.  There are an awful lot of solutions out
there that are based on eventually consistent replication, in-memory
databases with periodic snapshots to disk, etc. and, while I sure as
hell wouldn't advocate using those solutions to target nuclear
weapons, they are proving to be pretty useful for things like social
networking, storing the results of giant web crawls, etc., because
they reduce the amount of hardware that you need to buy to process a
given volume of data.

The work I'm doing on unlogged tables is a good example of how we can
provide users with the flexibility they need.  I just did my first
test (sorry, it's not in good enough shape to post for the CF, more's
the pity) and a bulk insert into an unlogged table was ~3x faster than
the same thing on a regular table.  There are a lot of people who can
benefit from that sort of thing.  For example, you can imagine that if
you have a "sessions" table where you store a record for each
currently-logged-in user, an unlogged table would be fine.  If the
database crashes and comes back up again, everyone has to log in
again, but that's a rare event and not a disaster if it happens.

Another avenue that I think we should explore in this regard is tables
that have some sort of weakened MVCC properties.  The most obvious
example is a table where all tuples are visible to everyone, period.
You can imagine using this as a logging table, for example., in lieu
of autonomous transactions.  There might be other cases that are
useful, too: what about a table where only committed tuples are
visible, but we ignore the effect of snapshots (i.e. if the inserting
or deleting transaction is committed, then we assume that the
XMIN/XMAX is visible to the current snapshot without checking)?  That
might seem like playing with firecrackers, but suppose the table has
very low concurrency: e.g. overnight you update data in bulk, then
after that's done you run daily reporting queries, then during the day
users run small read-only queries interactively.  You pretty much know
that you're never going to have a tuple that's visible to some
transactions but not others, so why pay the overhead of checking every
time?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Itagaki Takahiro
Date:
2010/9/15 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
> The API needs to be simple and version-independent, so that you can
> write simple wrappers like the flat file wrapper easily. At the same
> time, it needs to be very flexible, so that it allows safely pushing
> down all kinds constructs like functions, aggregates and joins. The
> planner needs to know which constructs the wrapper can handle and get
> cost estimates for the foreign scans. Those two goals are at odds.

The patch didn't say nothing about the API design, no?
It just implement FOREIGN TABLE commands in the SQL standard,
and we need the part anyway if we want to support the standard.

-- 
Itagaki Takahiro


Re: patch: SQL/MED(FDW) DDL

From
Mark Kirkwood
Date:
On 16/09/10 14:05, Robert Haas wrote:
> On Wed, Sep 15, 2010 at 9:22 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>    
>> Hitoshi Harada<umi.tanuki@gmail.com>  writes:
>>      
>>> 2010/9/16 Robert Haas<robertmhaas@gmail.com>:
>>>        
>>>> Oh, key-value store, I bet.  Yeah, that would be cool.
>>>>          
>>      
>>> That's it. Like Redis, Tokyo Cabinet, or something.
>>>        
>> What exactly do those get you that an ordinary index, or at worst an
>> index-organized table, doesn't get you?
>>      
>
> For example, you can imagine that if
> you have a "sessions" table where you store a record for each
> currently-logged-in user, an unlogged table would be fine.  If the
> database crashes and comes back up again, everyone has to log in
> again, but that's a rare event and not a disaster if it happens.
>
>    

Or perhaps even a "sessions" type table where the rows are overwritten 
in place in some manner, to avoid bloat.

regards

Mark


Re: patch: SQL/MED(FDW) DDL

From
Hitoshi Harada
Date:
2010/9/16 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:
> On 16/09/10 14:05, Robert Haas wrote:
>>
>> On Wed, Sep 15, 2010 at 9:22 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>>
>>>
>>> Hitoshi Harada<umi.tanuki@gmail.com>  writes:
>>>
>>>>
>>>> 2010/9/16 Robert Haas<robertmhaas@gmail.com>:
>>>>
>>>>>
>>>>> Oh, key-value store, I bet.  Yeah, that would be cool.
>>>>>
>>>
>>>
>>>>
>>>> That's it. Like Redis, Tokyo Cabinet, or something.
>>>>
>>>
>>> What exactly do those get you that an ordinary index, or at worst an
>>> index-organized table, doesn't get you?
>>>
>>
>> For example, you can imagine that if
>> you have a "sessions" table where you store a record for each
>> currently-logged-in user, an unlogged table would be fine.  If the
>> database crashes and comes back up again, everyone has to log in
>> again, but that's a rare event and not a disaster if it happens.
>>
>>
>
> Or perhaps even a "sessions" type table where the rows are overwritten in
> place in some manner, to avoid bloat.
>
My answer is "variety". If an index-organized table was the one best
solution, there would not been so many KVSes these days.

Regards,



--
Hitoshi Harada


Re: patch: SQL/MED(FDW) DDL

From
SAKAMOTO Masahiko
Date:
(2010/09/16 11:20), Itagaki Takahiro wrote:
> 2010/9/15 Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>:
>> The API needs to be simple and version-independent, so that you can
>> write simple wrappers like the flat file wrapper easily. At the same
>> time, it needs to be very flexible, so that it allows safely pushing
>> down all kinds constructs like functions, aggregates and joins. The
>> planner needs to know which constructs the wrapper can handle and get
>> cost estimates for the foreign scans. Those two goals are at odds.
>
> The patch didn't say nothing about the API design, no?
> It just implement FOREIGN TABLE commands in the SQL standard,
> and we need the part anyway if we want to support the standard.

you are right, sorry. Attached includes
  - FDW routine interface and
  - FDW implementation of FDW for external PostgreSQL.
This should be patched after the previous patch.

This is designed to be used in the executor module.
The API has some simple interfaces, such as FreeConnection,
Open, Iterate, Close, ReOpen, and OnError.

In this current FDW implementation for external PG,
It can push-down WHERE-clause by reconstructing query
that shouled be sent to the foreign server.

I think this FDW routine interface is simple, but
extension develpers can add intelligent features to some extent.

FDW routine interface design and what this fdw-for-pg can do
is summarized in:
  http://wiki.postgresql.org/wiki/SQL/MED#PostgreSQL

Regards,

SAKAMOTO Masahiko
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
e-mail: sakamoto.masahiko@oss.ntt.co.jp


Attachment

Re: patch: SQL/MED(FDW) DDL

From
SAKAMOTO Masahiko
Date:
And this is sample script to play the feature of this patch.

It includes:
  - create foreign data wrapper and foreign server(PG at localhost)
  - select foreign PostgreSQL table.

*Important*
This script initializes your database cluster(specified by $PGDATA).
Run with care....

Regards,

SAKAMOTO Masahiko
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
e-mail: sakamoto.masahiko@oss.ntt.co.jp

Attachment

Re: patch: SQL/MED(FDW) DDL

From
Itagaki Takahiro
Date:
On Thu, Sep 16, 2010 at 3:15 PM, SAKAMOTO Masahiko
<sakamoto.masahiko@oss.ntt.co.jp> wrote:
> This is designed to be used in the executor module.
> The API has some simple interfaces, such as FreeConnection,
> Open, Iterate, Close, ReOpen, and OnError.

Of course they are similar APIs mentioned in the SQL standard, but
I'm not sure they are enough for all purposes. For example, they
don't have planner hooks, but we will need them eventually to push
down ORDER BY and LIMIT into FDW.

We could learn from MySQL's storage engine interface, because
they are designed for flat file wrapper at first, but enhanced
with pains for condition push-down.

-- 
Itagaki Takahiro


Re: patch: SQL/MED(FDW) DDL

From
Garick Hamlin
Date:
On Wed, Sep 15, 2010 at 10:05:00PM -0400, Tom Lane wrote:
> Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> > On 16/09/10 13:22, Tom Lane wrote:
> >> What exactly do those get you that an ordinary index, or at worst an
> >> index-organized table, doesn't get you?
> 
> > It is pretty rare to see key value stores vs relational engines 
> > discussed without a descent into total foolishiness, but this Wikipedia 
> > page looks like a reasonable summary:
> > http://en.wikipedia.org/wiki/NoSQL
> 
> That doesn't do anything at all to answer my question.  I don't want
> to debate NoSQL versus traditional RDBMS here.  What I asked was:
> given that PG is a traditional RDBMS, what exactly are you hoping
> to accomplish by putting a key-value storage mechanism in it?  And
> if you did, how would that be different from an index-organized table?

One thing it would get is integration with existing infrastructure that
makes up many critical apps.  Many horizontal apps use things like memcached 
or redis to provide a distributed data layer for developing their applications.  
Sometimes that becomes the middle layer for an enterprise.  Being able to hook 
into that middle layer is very handy.  Shared login or session information is 
a good example of data that one might want put in a KVS.  Also, many enterprises 
are full of different departments, orgs, teams, systems, etc ...  KVS are simple 
and limited enough they might make a good choice for standardizing on how to share 
data in some places.

Isn't this what SQL/Med is about?

Garick

> 
>             regards, tom lane
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: patch: SQL/MED(FDW) DDL

From
Itagaki Takahiro
Date:
2010/9/15 SAKAMOTO Masahiko <sakamoto.masahiko@oss.ntt.co.jp>:
> This is a proposal patch for SQL/MED for 9.1.
>  (1) foreign table DDL support (this proposal):
>     - support for foreign table DDL syntax (CREATE/ALTER FOREIGN TABLE)
>     - Definition of FDW routine interface and system catalogs for it.

I checked the patch. It includes changes for DDL, system catalogs,
information schema, and basic psql support. The patch itself have no useful
works, but we need the parts anyway to support the SQL standard.

I have a couples of comments:

* There are unused types in the patch. They might be used by additional patches based on the patch, but should be
removedfor now.   - enum GenericOptionFlags.ForeignTableOpt   - typedef struct FSConnection FSConnection;   - typedef
structFdwRoutine FdwRoutine;   - typedef struct FdwReply FdwReply; 
* Needs an error check to SELECT FROM foreign table. It might be replaced  to actual FDW routines soon, but the current
errormessage is not ideal.     postgres=# SELECT * FROM ft1;     ERROR:  could not open file "base/11908/16391": No
suchfile or directory 

* Type checks between TABLE and FOREIGN TABLE are a bit inconsistent. For example, "ALTER TABLE ADD COLUMN" can add a
columnto a foreign tables but "DROP TABLE" cannot remove foreign tables. IMHO, however, we can allow such looseness
becauseoperations actually forbidden will end with ERRORs without problems. 

--
Itagaki Takahiro


Re: patch: SQL/MED(FDW) DDL

From
SAKAMOTO Masahiko
Date:
Thanks for your comment. I've updated the patches.

 > I checked the patch. It includes changes for DDL, system catalogs,
 > information schema, and basic psql support. The patch itself have no
useful
 > works, but we need the parts anyway to support the SQL standard.

As you mentioned, this patch provides
CREATE/ALTER/DROP FOREIGNTABLE functionality only.
Please review this patch from the viewpoints:
   - confirming level to the SQL standard
   - features which should be supported on foreign table
     - WITH OIDS
     - DEFAULT
     - constraints (currently NOT NULL and CHECK are supported)
     - etc.

SELECTing on foreign tables functionality will be provided
by the fdw_select patch, as attached.

 > * There are unused types in the patch.
Fixed (definitions are moved to the other patch).

 >   * Needs an error check to SELECT FROM foreign table.
Fixed. I've added relkind check in set_plain_rel_pathlist(),
which aborts query with elog(ERROR) if the relation was a foreign table.
This fix also checks for DELETE, UPDATE and EXPLAIN SELECT.

 > * Type checks between TABLE and FOREIGN TABLE are a bit inconsistent.
 >    For example, "ALTER TABLE ADD COLUMN" can add a column to a
foreign tables
 >    but "DROP TABLE" cannot remove foreign tables.
 >    IMHO, however, we can allow such looseness because operations actually
 >    forbidden will end with ERRORs without problems.

In current implementation, we allow some kind of operations to
ALTER TABLE which can be done on views on foreign tables (i.e.
ALTER TABLE RENAME TO).
But as Itagaki-san says, the inconsistency will not cause serious
problem.

Regards,

(2010/09/17 11:29), Itagaki Takahiro wrote:it.
>
> I checked the patch. It includes changes for DDL, system catalogs,
> information schema, and basic psql support. The patch itself have no useful
> works, but we need the parts anyway to support the SQL standard.
>
> I have a couples of comments:
>
> * There are unused types in the patch. They might be used by additional
>    patches based on the patch, but should be removed for now.
>      - enum GenericOptionFlags.ForeignTableOpt
>      - typedef struct FSConnection FSConnection;
>      - typedef struct FdwRoutine FdwRoutine;
>      - typedef struct FdwReply FdwReply;
>
>   * Needs an error check to SELECT FROM foreign table. It might be replaced
>     to actual FDW routines soon, but the current error message is not ideal.
>        postgres=# SELECT * FROM ft1;
>        ERROR:  could not open file "base/11908/16391": No such file or directory
>
> * Type checks between TABLE and FOREIGN TABLE are a bit inconsistent.
>    For example, "ALTER TABLE ADD COLUMN" can add a column to a foreign tables
>    but "DROP TABLE" cannot remove foreign tables.
>    IMHO, however, we can allow such looseness because operations actually
>    forbidden will end with ERRORs without problems.
>

Attachment

Re: patch: SQL/MED(FDW) DDL

From
Itagaki Takahiro
Date:
On Fri, Sep 17, 2010 at 7:47 PM, SAKAMOTO Masahiko
<sakamoto.masahiko@oss.ntt.co.jp> wrote:
> Thanks for your comment. I've updated the patches.

I reviewed fdw_table20100917.patch.gz. It adds FOREIGN TABLE in the
SQL standard,
including CREATE/ALTER/DROP FOREIGN TABLE, information_schema (foreign_table
and foreign_table_options), and permission controls. But SELECT from
foreign tables
is not supported yet -- so, SELECT permission won't work in anyway.

In addition to the standard, it also supports some additional operations. * COMMENT ON FOREIGN TABLE * WITH OIDS --
reasonablebecause it is just same as adding a column. * RULE and DEFAULT -- useful to define updatable foreign table. *
INHERIT,NOT NULL, and CHECK -- they will be used by inter-server   partitioning in the future.
 

DEFAULT is not used by INSERT INTO foreign table (not supported),
but is used by INSERT RULE. So, it's reasonable for us to have the feature.
Foreign tables are read-only in the standard just like a VIEW.

>>   * Needs an error check to SELECT FROM foreign table.
>Fixed. I've added relkind check in set_plain_rel_pathlist(),

Not that if we create a foreign table inherits a normal table, we cannot
SELECT FROM the normal table because planner expands a query to retrieve
data from the foreign table, and the query fails with an error. ERROR:  foreign table scan not implemented.
But I think it's an acceptable limitation for now.


I found some trivial mistakes in the patch:

* src/bin/psql/describe.c : listForeignTables() + if (pset.sversion < 80500) We are in 90100 now ;-)

* SGML doc for CREATE FOREIGN TABLE contains "xxx (integer)". Is it a typo? All of the Storage Parameters section will
beremoved.
 

* SGML doc for DROP FOREIGN TABLE has "a<productname>PostgreSQL</>" A separator is required between a and
<productname>.

For the documentation, we could rewrite some duplicated descriptions
in CREATE/ALTER FOREIGN TABLE into links to CREATE/ALTER TABLE.
They have many copy-and-pasted texts. Do we need to adjust them?


Overall, I think the FOREIGN TABLE DDL part is almost ready to commit if we
accept the proposal itself. Of course we need more discussions about FDW
Routines, but it must be developed based on the infrastructure.

-- 
Itagaki Takahiro


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Thu, Sep 23, 2010 at 10:55 PM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> Overall, I think the FOREIGN TABLE DDL part is almost ready to commit if we
> accept the proposal itself. Of course we need more discussions about FDW
> Routines, but it must be developed based on the infrastructure.

I think we need to further discuss how this is eventually going to get
integrated with the query planner and the executor before we commit
anything.  The syntax support by itself is quite trivial.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Itagaki Takahiro
Date:
On Fri, Sep 24, 2010 at 12:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think we need to further discuss how this is eventually going to get
> integrated with the query planner and the executor before we commit
> anything.  The syntax support by itself is quite trivial.

There are no active discussions :-(  I think the author tried his best, so if
other developers think it's a bad design, alternate plan must be proposed.

Also, if the syntax change is trivial, that's why we merge it at
earlier commitfests. I saw many patch went into "No patch reviewers
because of too large size" syndrome before. DDL changes are 5K
lines of diff -c patch, and "select" part is additional 6K lines.

--
Itagaki Takahiro


Re: patch: SQL/MED(FDW) DDL

From
Heikki Linnakangas
Date:
On 24/09/10 06:26, Itagaki Takahiro wrote:
> On Fri, Sep 24, 2010 at 12:08 PM, Robert Haas<robertmhaas@gmail.com>  wrote:
>> I think we need to further discuss how this is eventually going to get
>> integrated with the query planner and the executor before we commit
>> anything.  The syntax support by itself is quite trivial.
>
> There are no active discussions :-(  I think the author tried his best, so if
> other developers think it's a bad design, alternate plan must be proposed.

It's not that the design is bad, it's that it's non-existent. I haven't 
seen any design on how this integrates with the planner.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: patch: SQL/MED(FDW) DDL

From
Itagaki Takahiro
Date:
On Fri, Sep 24, 2010 at 6:12 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> It's not that the design is bad, it's that it's non-existent. I haven't seen
> any design on how this integrates with the planner.

In my understanding, the DDL part is independent from planner integration,
and that's why the author extract DDL changes from the composite patch.
(The proposal name is "SQL/MED(FDW) DDL".)


I think there are two type of FDWs. One is a simple flat file wrapper
used by COPY FROM now, that doesn't require any planner hooks.
Another is a connector to an external database, like as dblink, that
should be integrated with the planner.

The former is almost same as SRF functions, but it will skip unneeded
tuplestore caching. Even the level is useful enough because we can
support "SELECT * FROM csvfile" without any intermediate tables.
Could we set the first goal to the level?

Of course we need more discussion for the latter case. The current proposed
patch ("select" part") supports  executor integration -- WHERE clause
push-down, so it can use indexes in external servers. On the other hand,
unsupported ORDER BY, LIMIT, and JOIN push-down require planner integration.
More works will be required for fdw_select20100917.patch.gz.

-- 
Itagaki Takahiro


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Thu, Sep 23, 2010 at 11:26 PM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> There are no active discussions :-(  I think the author tried his best, so if
> other developers think it's a bad design, alternate plan must be proposed.
>
> Also, if the syntax change is trivial, that's why we merge it at
> earlier commitfests. I saw many patch went into "No patch reviewers
> because of too large size" syndrome before. DDL changes are 5K
> lines of diff -c patch, and "select" part is additional 6K lines.

IMHO, merging a syntax that doesn't actually work leaves the tree in a
broken state.  It optimistically presumes that we will later commit a
patch to make the syntax in question do something useful.  I don't
wish to presume that, even if we were well along on the design of the
core functionality and especially because we are not.  It's fine to
break the patches into separate chunks for review, but our main tree
is not a dumping ground for half-finished features.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Fri, Sep 24, 2010 at 5:56 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> I think there are two type of FDWs. One is a simple flat file wrapper
> used by COPY FROM now, that doesn't require any planner hooks.
> Another is a connector to an external database, like as dblink, that
> should be integrated with the planner.

This is a good point.  On the one hand, I do agree that the API for
simple things like processing CSV files shouldn't be overly complex.
So perhaps we could start with a simple API and extend it later.  On
the other hand, understanding how some of the more complex cases ought
to work provides insight into handling the simpler cases.  So I think
we should aim to have at least a sketch of a design for the whole
feature, and then if in phase 1 we want to implement only the easier
parts, that's OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
SAKAMOTO Masahiko
Date:
sorry for my slow response.

(2010/09/24 21:13), Robert Haas wrote:
> This is a good point.  On the one hand, I do agree that the API for
> simple things like processing CSV files shouldn't be overly complex.
> So perhaps we could start with a simple API and extend it later.  On
> the other hand, understanding how some of the more complex cases ought
> to work provides insight into handling the simpler cases.  So I think
> we should aim to have at least a sketch of a design for the whole
> feature, and then if in phase 1 we want to implement only the easier
> parts, that's OK.

Right. In any case, I should clearify what this API could cover
by this patch and what could not.
# And also how far I and my collaborator can implement..

As Itagaki points out, we have two types of FDW implementations in
progress: flat file wrapper and postgresql wrapper. these items are
described in wiki: http://wiki.postgresql.org/wiki/SQL/MED

But it may be hard to get what this fdw routines 'could' provide,
so I and my collaborator is now summarizing these items to discuss and
ask for help. Anyway I think these items are needed to discuss further. - API design that is used in executor for a
singleexternal table. - simple FDW implements to clearfy what this API could provide.
 

These items will be shown by my collaborator soon.
thanks in advance,

SAKAMOTO Masahiko
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
e-mail: sakamoto.masahiko@oss.ntt.co.jp


Re: patch: SQL/MED(FDW) DDL

From
Shigeru HANADA
Date:
Hi hackers,

On Mon, 27 Sep 2010 15:50:34 +0900
SAKAMOTO Masahiko <sakamoto.masahiko@oss.ntt.co.jp> wrote:
> Right. In any case, I should clearify what this API could cover
> by this patch and what could not.
> # And also how far I and my collaborator can implement..
> 
> As Itagaki points out, we have two types of FDW implementations in
> progress: flat file wrapper and postgresql wrapper. these items are
> described in wiki:
>   http://wiki.postgresql.org/wiki/SQL/MED
> 
> But it may be hard to get what this fdw routines 'could' provide,
> so I and my collaborator is now summarizing these items to discuss and
> ask for help. Anyway I think these items are needed to discuss further.
>   - API design that is used in executor for a single external table.
>   - simple FDW implements to clearfy what this API could provide.

To discuss how the SQL/MED patches should be, we made summary of
current SQL/MED proposal.

* What the current SQL/MED patches provide

Recent patch fdw_select20100917 consists of some parts: I/F of
FDW, new executor node ForeignScan, connection caching, and FDW
for PostgreSQL.  FDW for flat file is not included.  All of them
are based on DDLs and catalogs which are provided by
fdw_table20100917 patch.

Itagaki has summarized the details of fdw_table20100917 patch
well.  Thanks for review.
http://archives.postgresql.org/pgsql-hackers/2010-09/msg01653.php

The I/F of FDW is defined as struct FdwRoutine, set of API
functions which are called from backend modules.  The APIs are
designed to support only scanning a foreign at once.  Path/Plan
optimizations like JOIN/UNION push-down are out of scope.  Such
optimizations require planner hook as mentioned by Itagaki before. 
In short, our current goal is to implement SeqScan for foreign
tables.

ForeignScan node is an executor node which is like SeqScan node
for local table.  Optimizer generates T_ForeignScan path instead
of T_SeqScan path for a foreign table in set_plain_rel_pathlist()
if the RangeTblEntry was a foreign table.  Index paths and
tidscan paths are never generated for foreign tables.

ForeignScanState and FdwReply are introduced to represent the
status of a foreign scan.  ForeignScanState is a subclass of
ScanState, and FdwReply is a abstract type which is used to pass
FDW-specific data between API calls.

* Details of FDW API

FDWs should implement HANDLER function which returns a pointer to
a FdwRoutine instance which has pointers to actual functions.

struct FdRoutine {
   /*    * ConnectServer() will be called from ExecInitForeignScan()    * if the backend has no connection which can be
usedto    * execute the foreign query for the foreign table.    * FDW should establish a connection between foreign
server   * and return it with casting to pointer to FSConnection    * (abstract connection type).  If the FDW doesn't
needany    * connection, returning NULL is OK.    * The arguments, server and user, can be used to extract    *
connectioninformation.    */   FSConnection* (*ConnectServer)(ForeignServer *server,
UserMapping*user);
 
   /*    * FreeFSConnection() will be called when backend dies or    * DISCARD ALL command was executed.    * FDW
shouldclose connection gracefully and free resources    * if any.    */   void (*FreeFSConnection)(FSConnection
*conn);
   /*    * Open() will be called from ExecInitForeignScan().    * FDW should initialize ForeignScanState, internal
stateof    * a foreign scan, and ready to return tuple in next Iterate()    * call.    * For instance, FDW for
PostgreSQLonly generate SQL from    * ScanState.  And we implemented WHERE clause push-down here.    */   void
(*Open)(ForeignScanState*scanstate);
 
   /*    * Iterate() will be called from ExecForeignScan() when the    * executor requests next tuple.      * For
instance,FDW for PostgreSQL executes foreign query at    * first call and stores all results into TupleStore, and    *
returnseach tuple for each Iterate() call.    */   void (*Iterate)(ForeignScanState *scanstate);
 
   /*    * Close() will be called from ExecEndForeignScan().    * FDW should free resources for FdwReply if any.    */
void (*Close)(ForeignScanState *scanstate);
 
   /*    * ReOpen() will be called from ExecForeignReScan() when the    * foreign scan should be reseted to scan whole
datafrom the    * head again.    * For instance, FDW for PostgreSQL frees current result set    * to make next
Iterate()call to execute foreign query again.    */   void (*ReOpen)(ForeignScanState *scanstate);
 

};

Maybe FdwRoutine should have more APIs to support essential
features.

The startup/total cost of scanning a foreign table are fixed to
10000.0 in current patch.  They are groundless values, they just
say that scanning foreign table costs more than scanning local
table.  The cost should be estimated from statistics in
pg_statistic and pg_class.  This function has not been
implemented yet.
   /*    * UpdateStats() will be called from ANALYZE routine when    * statistics of a foreign table should be updated.
  *    * Some more parameters may be needed.    *    * For instance, FDW for PostgreSQL will retrieve    * pg_statistic
andpg_class from foreign server and store    * them in local catalog.  How to represent overhead of    * foreign query
isstill issue.    */   void (*UpdateStats)(Oid relid);
 

To support path/plan optimization, some APIs should be added, but
we think that simple FDWs can be implemented with APIs above. 

* Next action

To focus on API discussion, we are going to simplify and
re-submit patches.
 * Fix some mistakes pointed out by Itagaki. * Make FDW for PostgreSQL independent contrib module.   Currently it
sharesconnection management codes with   contrib/dblink. * Add simple FDW for flat file (maybe CSV ?) into core to
implementregression tests for FOREIGN TABLE DDLs and foreign   table query.  In the future, the FDW for flat file can
be  integrated with COPY FROM.
 

Any comments/questions are welcome.

Regards,
--
Shigeru Hanada



Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Mon, Sep 27, 2010 at 2:50 AM, SAKAMOTO Masahiko
<sakamoto.masahiko@oss.ntt.co.jp> wrote:
>  http://wiki.postgresql.org/wiki/SQL/MED

With regard to what is written here, it strikes me that it would be an
extremely bad idea to try to mix reloptions or attoptions with
fdwoptions.  fdwoptions are options to be passed transparently to the
fdw to handle as it likes; rel/attoptions affect the behavior of PG.

I think the section about WHERE clause push-down is way off base.
First, it seems totally wrong to assume that the same functions and
operators will be defined on the remote side as you have locally;
indeed, for CSV files, you won't have anything defined on the remote
side at all.  You need some kind of a discovery mechanism here to
figure out which quals are push-downable.  And it should probably be
something generic, not a bunch of hard-wired rules that may or may not
be correct in any particular case.  What if the remote side is a
competing database product that doesn't understand X = ANY(Y)?
Second, even if a functions or operators does exist on both sides of
the link, how do you know whether they have compatible semantics?
Short of solving the entscheidungsproblem, you're not going to be able
to determine that algorithmically, so you need some kind of mechanism
for controlling what assumptions get made.  Otherwise, you'll end up
with queries that don't work and no way for the user to fix it.

It seems to me that the API should allow PG to ask the FDW questions like this:

- How many tuples are there on the remote side?
- Here is a qual.  Are you able to evaluate this qual remotely?
- What are the startup and total costs of a sequential scan of the
remote side with the following set of remotely executable quals?
- Are there any indices available on the remote side, and if so what
are there names and which columns do they index in which order
(asc/desc, nulls first/last)?
- What are the startup and total costs of an index scan of the remote
side using the index called $NAME given the following set of remotely
executable quals?

and, as you mentIon:

- Please update pg_statistic for this foreign table, if you have that
capability.

Then:

- Begin a sequential scan with the following set of quals.
- Begin an index scan using the index called X with the following set of quals.
- Fetch next tuple.
- End scan.

Maybe that's too much for a first version but if we're not going to
deal with the problems in a general way, then we ought to not deal
with them at all, rather than having hacky rules that will work if
your environment is set up in exactly the way the code expects and
otherwise break horribly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Mon, Sep 27, 2010 at 2:50 AM, SAKAMOTO Masahiko
<sakamoto.masahiko@oss.ntt.co.jp> wrote:
>  http://wiki.postgresql.org/wiki/SQL/MED

With regard to what is written here, it strikes me that it would be an
extremely bad idea to try to mix reloptions or attoptions with
fdwoptions.  fdwoptions are options to be passed transparently to the
fdw to handle as it likes; rel/attoptions affect the behavior of PG.

I think the section about WHERE clause push-down is way off base.
First, it seems totally wrong to assume that the same functions and
operators will be defined on the remote side as you have locally;
indeed, for CSV files, you won't have anything defined on the remote
side at all.  You need some kind of a discovery mechanism here to
figure out which quals are push-downable.  And it should probably be
something generic, not a bunch of hard-wired rules that may or may not
be correct in any particular case.  What if the remote side is a
competing database product that doesn't understand X = ANY(Y)?
Second, even if a functions or operators does exist on both sides of
the link, how do you know whether they have compatible semantics?
Short of solving the entscheidungsproblem, you're not going to be able
to determine that algorithmically, so you need some kind of mechanism
for controlling what assumptions get made.  Otherwise, you'll end up
with queries that don't work and no way for the user to fix it.

It seems to me that the API should allow PG to ask the FDW questions like this:

- How many tuples are there on the remote side?
- Here is a qual.  Are you able to evaluate this qual remotely?
- What are the startup and total costs of a sequential scan of the
remote side with the following set of remotely executable quals?
- Are there any indices available on the remote side, and if so what
are there names and which columns do they index in which order
(asc/desc, nulls first/last)?
- What are the startup and total costs of an index scan of the remote
side using the index called $NAME given the following set of remotely
executable quals?

and, as you mentIon:

- Please update pg_statistic for this foreign table, if you have that
capability.

Then:

- Begin a sequential scan with the following set of quals.
- Begin an index scan using the index called X with the following set of quals.
- Fetch next tuple.
- End scan.

Maybe that's too much for a first version but if we're not going to
deal with the problems in a general way, then we ought to not deal
with them at all, rather than having hacky rules that will work if
your environment is set up in exactly the way the code expects and
otherwise break horribly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Heikki Linnakangas
Date:
On 09/28/10 17:26, Robert Haas wrote:
> First, it seems totally wrong to assume that the same functions and
> operators will be defined on the remote side as you have locally;
> indeed, for CSV files, you won't have anything defined on the remote
> side at all.  You need some kind of a discovery mechanism here to
> figure out which quals are push-downable.  And it should probably be
> something generic, not a bunch of hard-wired rules that may or may not
> be correct in any particular case.  What if the remote side is a
> competing database product that doesn't understand X = ANY(Y)?
> Second, even if a functions or operators does exist on both sides of
> the link, how do you know whether they have compatible semantics?

Or side-effects.

The SQL/MED specification has "routine mappings" for this purpose. We 
will need that or something similar.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: patch: SQL/MED(FDW) DDL

From
Shigeru HANADA
Date:
On Tue, 28 Sep 2010 10:26:42 -0400
Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Sep 27, 2010 at 2:50 AM, SAKAMOTO Masahiko
> <sakamoto.masahiko@oss.ntt.co.jp> wrote:
> > ?http://wiki.postgresql.org/wiki/SQL/MED
> With regard to what is written here, it strikes me that it would be an
> extremely bad idea to try to mix reloptions or attoptions with
> fdwoptions.  fdwoptions are options to be passed transparently to the
> fdw to handle as it likes; rel/attoptions affect the behavior of PG.

In current patch, fdwoptions for relations have been separated from
reloptins by introducing pg_foreign_table catalog.  As mentioned in
wiki, integration into rel/attoptions is nothing but an idea, we're
willing to add pg_foreign_attribute catalog which has columns:
   farelid     oid         not null   faattnum    smallint    not null   faoptions   text[]      not null

Though this catalog has only fdwoptions as non-key value.
Or, adding attrfdwoptions column to pg_attribute catalog is better?

> I think the section about WHERE clause push-down is way off base.
> First, it seems totally wrong to assume that the same functions and
> operators will be defined on the remote side as you have locally;
> indeed, for CSV files, you won't have anything defined on the remote
> side at all.  You need some kind of a discovery mechanism here to
> figure out which quals are push-downable.  And it should probably be
> something generic, not a bunch of hard-wired rules that may or may not
> be correct in any particular case.  What if the remote side is a
> competing database product that doesn't understand X = ANY(Y)?
> Second, even if a functions or operators does exist on both sides of
> the link, how do you know whether they have compatible semantics?
> Short of solving the entscheidungsproblem, you're not going to be able
> to determine that algorithmically, so you need some kind of mechanism
> for controlling what assumptions get made.  Otherwise, you'll end up
> with queries that don't work and no way for the user to fix it.

First of all, WHERE clause push-down ideas written in wiki are just
for FDW for PostgreSQL, implicitly same version, so we have assumed
that the remote side has same syntax/semantics.  WHERE clause
push-down is implemented in postgresql_fdw, and optimizer/planner are
not .

The postgresql_fdw pushes down WHERE clause with following steps. * scans all quals in the PlanState and pickup
remote-ablequals * for each remote-able quals,   * adds WHERE clause to remote SQL statement   * remove the qual from
PlanStatenode to avoid duplicate     evaluation at upper layer ExecScan()
 

If the foreign data wrapper didn't support WHERE clause push-down,
such as CSV-wrapper, the wrapper can retrieve all tuples from remote
side and pass them to upper layer, and ExecScan() will filter the
tuples based on the quals in the PlanState.

By the way, in current implementation, all operators/functions in
SELECT clause will be evaluated on the local side always, so FDWs
should provide only plain column values.

> It seems to me that the API should allow PG to ask the FDW questions like this:
> 
> - How many tuples are there on the remote side?
> - Here is a qual.  Are you able to evaluate this qual remotely?
> - What are the startup and total costs of a sequential scan of the
> remote side with the following set of remotely executable quals?
> - Are there any indices available on the remote side, and if so what
> are there names and which columns do they index in which order
> (asc/desc, nulls first/last)?
> - What are the startup and total costs of an index scan of the remote
> side using the index called $NAME given the following set of remotely
> executable quals?
> 
> and, as you mentIon:
> 
> - Please update pg_statistic for this foreign table, if you have that
> capability.
> 
> Then:
> 
> - Begin a sequential scan with the following set of quals.
> - Begin an index scan using the index called X with the following set of quals.
> - Fetch next tuple.
> - End scan.
> 
> Maybe that's too much for a first version but if we're not going to
> deal with the problems in a general way, then we ought to not deal
> with them at all, rather than having hacky rules that will work if
> your environment is set up in exactly the way the code expects and
> otherwise break horribly.
Using remote indexes might be very effective, but I think there are
many issues.

For instance, how can planner/optimizer know whether the foreign table
has been indexed or not?  Checking remote catalogs for each scan must
be a bad idea.  HiRDB, Hitachi's dbms product, seems to have
introduced FOREIGN INDEX for that purpose.

I'll consider about cost estimation and path/plan generation again,
and post later.

Regards, Shigeru Hanada
--
Shigeru Hanada



Re: patch: SQL/MED(FDW) DDL

From
Shigeru HANADA
Date:
On Tue, 28 Sep 2010 22:14:22 +0300
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> On 09/28/10 17:26, Robert Haas wrote:
> > First, it seems totally wrong to assume that the same functions and
> > operators will be defined on the remote side as you have locally;
> > indeed, for CSV files, you won't have anything defined on the remote
> > side at all.  You need some kind of a discovery mechanism here to
> > figure out which quals are push-downable.  And it should probably be
> > something generic, not a bunch of hard-wired rules that may or may not
> > be correct in any particular case.  What if the remote side is a
> > competing database product that doesn't understand X = ANY(Y)?
> > Second, even if a functions or operators does exist on both sides of
> > the link, how do you know whether they have compatible semantics?
> 
> Or side-effects.
> 
> The SQL/MED specification has "routine mappings" for this purpose. We 
> will need that or something similar.

Yes, I see the problem.

To resolve function-mismatch issues, we need routine mapping mechanism. 
With that, FDW can translate function calls into remote expressions
along rules which are defined by user.

SQL/MED standard requires a routine mapping to map a pair of local
routine and server to a set of generic options.  An local routine
is identified by type, name and parameters.

The type is one of ROUTINE/FUNCTION/PROCEDURE/METHOD.  I think
supporting only FUNCTION is enough because PostgreSQL doesn't have
ROUTINE/PROCEDURE/METHOD.

I think that the pg_routine_mapping catalog should have
columns/indexes:
   rmname      name    not null   rmprocid    oid     not null   rmserverid  oid     not null   rmoptions   text[]
   pg_routine_mapping_oid_index UNIQUE, btree(oid)   pg_routine_mapping_name_index UNIQUE, btree(rmname)
pg_routine_mapping_proc_server_indexUNIQUE, btree(rmprocid, rmserverid)
 

To use a remote function which PostgreSQL doesn't have in foreign
query, we need a local dummy function which is mapped to remote
function.  The local function may be empty but should have appropriate
parameters.

Supported features: * Create routine mapping (USAGE on the server required) * Alter routine mappging's generic options
(mustbe owner of the   routine mappping) * Drop routine mapping (must be owner of the routine mapping) * Query routine
mappinginformations via information_schema views   routine_mappings and routine_mapping_options (no restriction)
 

FDWs can define any specification of generic options.
Maybe FDW for PostgreSQL needs no entry if no special functions was
defined.  Routine mappings can be used to absorb the difference of the
versions. 

Any comments are welcome.

Regards,

--
Shigeru Hanada



Re: patch: SQL/MED(FDW) DDL

From
Alvaro Herrera
Date:
Excerpts from Robert Haas's message of mar sep 28 10:26:54 -0400 2010:

> Then:
> 
> - Begin a sequential scan with the following set of quals.
> - Begin an index scan using the index called X with the following set of quals.
> - Fetch next tuple.
> - End scan.

I'm not sure that it's a good idea to embed into the FDW API the set of
operations known to the executor.  For example your proposal fails to
consider bitmap scans.  Seems simpler and more general to hand the quals
over saying "I need to scan this relation with these quals", and have it
return an opaque iterable object; the remote executor would be in charge
of determining their usage for indexscans; or even for filtering tuples
with quals that cannot be part of the index condition.

There doesn't to be much point in knowing the names of remote indexes
either (if it came to referencing them, better use OIDs)

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: patch: SQL/MED(FDW) DDL

From
Itagaki Takahiro
Date:
On Wed, Sep 29, 2010 at 11:09 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> I'm not sure that it's a good idea to embed into the FDW API the set of
> operations known to the executor.  For example your proposal fails to
> consider bitmap scans.  Seems simpler and more general to hand the quals
> over saying "I need to scan this relation with these quals", and have it
> return an opaque iterable object;

Agreed. If possible, we will avoid dedicated interfaces for seqscans and
index scans. However, bitmap scan is difficult anyway because foreign tables
might not have ctid columns. It's a challenging task to identify individual
tuples in foreign tables. It will be also used for UPDATE and DELETE.

> There doesn't to be much point in knowing the names of remote indexes
> either (if it came to referencing them, better use OIDs)

FYI, HiRDB, that implements FDW routines, has CREATE FOREIGN INDEX.
I think it is a little ugly and won't work in some cases -- for example,
index organized tables -- but evidently it's a realistic solution.

--
Itagaki Takahiro


Re: patch: SQL/MED(FDW) DDL

From
Heikki Linnakangas
Date:
On 30.09.2010 04:30, Itagaki Takahiro wrote:
> On Wed, Sep 29, 2010 at 11:09 PM, Alvaro Herrera
> <alvherre@commandprompt.com>  wrote:
>> I'm not sure that it's a good idea to embed into the FDW API the set of
>> operations known to the executor.  For example your proposal fails to
>> consider bitmap scans.  Seems simpler and more general to hand the quals
>> over saying "I need to scan this relation with these quals", and have it
>> return an opaque iterable object;
>
> Agreed. If possible, we will avoid dedicated interfaces for seqscans and
> index scans. However, bitmap scan is difficult anyway because foreign tables
> might not have ctid columns. It's a challenging task to identify individual
> tuples in foreign tables. It will be also used for UPDATE and DELETE.
>
>> There doesn't to be much point in knowing the names of remote indexes
>> either (if it came to referencing them, better use OIDs)
>
> FYI, HiRDB, that implements FDW routines, has CREATE FOREIGN INDEX.
> I think it is a little ugly and won't work in some cases -- for example,
> index organized tables -- but evidently it's a realistic solution.

A long time ago I used DB2's federated database feature, which is at 
least close to SQL/MED if not fully compatible. When you create a 
"federated index" there. it's just a planner hint to the local database, 
so that it knows how expensive it is to evaluate a qual remotely vs. 
locally. It shouldn't matter what technology the remote index uses in 
that case, as long as the cost model is roughly the same as a b-tree.

I don't think we want to go down that path though, it's better to leave 
the cost estimation altogether to the wrapper. It has much better 
knowledge of expensive various quals are.

However, the wrapper will likely need some local storage for indexes and 
like to do the cost estimation. Or maybe it can just keep the 
information in cache, loading it on first use from the remote database.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: patch: SQL/MED(FDW) DDL

From
Shigeru HANADA
Date:
On Thu, 30 Sep 2010 09:26:54 +0300
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> > FYI, HiRDB, that implements FDW routines, has CREATE FOREIGN INDEX.
> > I think it is a little ugly and won't work in some cases -- for example,
> > index organized tables -- but evidently it's a realistic solution.
> 
> A long time ago I used DB2's federated database feature, which is at 
> least close to SQL/MED if not fully compatible. When you create a 
> "federated index" there. it's just a planner hint to the local database, 
> so that it knows how expensive it is to evaluate a qual remotely vs. 
> locally. It shouldn't matter what technology the remote index uses in 
> that case, as long as the cost model is roughly the same as a b-tree.
> 
> I don't think we want to go down that path though, it's better to leave 
> the cost estimation altogether to the wrapper. It has much better 
> knowledge of expensive various quals are.
> 
> However, the wrapper will likely need some local storage for indexes and 
> like to do the cost estimation. Or maybe it can just keep the 
> information in cache, loading it on first use from the remote database.
How about having cost hints in generic option of the foreign table or
its columns?  Generic options are storage for wrappers, not for
PostgreSQL core modules.  Wrappers can use their own format to
represent various information, and use the hints to estimate costs of
a path.

In addition, I think that the generic option of a server could be used
to store cost hints which depend on each server, such as network
transfer overhead for dbms wrappers, or disk I/O for file-wrappers.

Regards,
--
Shigeru Hanada



Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Sep 29, 2010, at 10:09 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of mar sep 28 10:26:54 -0400 2010:
>
>> Then:
>>
>> - Begin a sequential scan with the following set of quals.
>> - Begin an index scan using the index called X with the following set of quals.
>> - Fetch next tuple.
>> - End scan.
>
> I'm not sure that it's a good idea to embed into the FDW API the set of
> operations known to the executor.  For example your proposal fails to
> consider bitmap scans.  Seems simpler and more general to hand the quals
> over saying "I need to scan this relation with these quals", and have it
> return an opaque iterable object; the remote executor would be in charge
> of determining their usage for indexscans; or even for filtering tuples
> with quals that cannot be part of the index condition.

Yeah, that might be better.  Is it reasonable to assume we always want to push down as much as possible, or do we need
tothink about local work vs. remote work trade-offs? 

> There doesn't to be much point in knowing the names of remote indexes
> either (if it came to referencing them, better use OIDs)

Well, you can't assume the remote side is PG.

...Robert

Re: patch: SQL/MED(FDW) DDL

From
David Fetter
Date:
On Thu, Sep 30, 2010 at 10:29:56PM -0400, Robert Haas wrote:
> On Sep 29, 2010, at 10:09 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> > Excerpts from Robert Haas's message of mar sep 28 10:26:54 -0400 2010:
> > 
> >> Then:
> >> 
> >> - Begin a sequential scan with the following set of quals.
> >> - Begin an index scan using the index called X with the following set of quals.
> >> - Fetch next tuple.
> >> - End scan.
> > 
> > I'm not sure that it's a good idea to embed into the FDW API the
> > set of operations known to the executor.  For example your
> > proposal fails to consider bitmap scans.  Seems simpler and more
> > general to hand the quals over saying "I need to scan this
> > relation with these quals", and have it return an opaque iterable
> > object; the remote executor would be in charge of determining
> > their usage for indexscans; or even for filtering tuples with
> > quals that cannot be part of the index condition.
> 
> Yeah, that might be better.  Is it reasonable to assume we always
> want to push down as much as possible, or do we need to think about
> local work vs. remote work trade-offs?

In cases where the foreign side is not super reliable for correctness,
I'd say there's a good case for not trusting it.  Some of the Oracle
properties are like this.  I suppose we might want to make "trust the
other side to handle pushed quals" optional somehow, but I'm not
exactly sure how.

> > There doesn't to be much point in knowing the names of remote
> > indexes either (if it came to referencing them, better use OIDs)
> 
> Well, you can't assume the remote side is PG.

Very definitely not.

The main point of the feature is that it allows for *heterogeneous*
data stores.  Of course this doesn't prevent PostgreSQL from
optimizing aggressively using secret knowledge should the "foreign"
side happen to be PostgreSQL.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: patch: SQL/MED(FDW) DDL

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> On Thu, Sep 30, 2010 at 10:29:56PM -0400, Robert Haas wrote:
>> Yeah, that might be better.  Is it reasonable to assume we always
>> want to push down as much as possible, or do we need to think about
>> local work vs. remote work trade-offs?

> In cases where the foreign side is not super reliable for correctness,
> I'd say there's a good case for not trusting it.  Some of the Oracle
> properties are like this.  I suppose we might want to make "trust the
> other side to handle pushed quals" optional somehow, but I'm not
> exactly sure how.

ISTM that such decisions would be embedded in the FDW --- it's unlikely
that the FDW for Oracle would look even a little bit like the one for
Postgres anyway, so contemplating ways to parameterize them seems a
rather useless exercise.

What we need is an API that lets the FDW decide which quals it will take
responsibility for enforcing.  What happens beyond that point is not
ours to decide.
        regards, tom lane


Re: patch: SQL/MED(FDW) DDL

From
David Fetter
Date:
On Fri, Oct 01, 2010 at 10:54:47AM -0400, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > On Thu, Sep 30, 2010 at 10:29:56PM -0400, Robert Haas wrote:
> >> Yeah, that might be better.  Is it reasonable to assume we always
> >> want to push down as much as possible, or do we need to think about
> >> local work vs. remote work trade-offs?
> 
> > In cases where the foreign side is not super reliable for correctness,
> > I'd say there's a good case for not trusting it.  Some of the Oracle
> > properties are like this.  I suppose we might want to make "trust the
> > other side to handle pushed quals" optional somehow, but I'm not
> > exactly sure how.
> 
> ISTM that such decisions would be embedded in the FDW

s/embedded in/& and overrides communicable to/

In an upcoming version of DBI-Link, I've got some code that allows
transformations between types on PostgreSQL and foreign sources, for
now when they have the same name.  The transformations are implemented
on the PostgreSQL side, although I a plan for (optionally) pushing
them to the foreign side.

There's an input transformation and an output transformation, both of
which default to the identity transformation, i.e. "do nothing."
These transformations are over-ridable at the following levels, with
the finer-grained one taking precedence over the coarser should there
be more than one user-supplied transformation:

- database type (Oracle, CSV, etc.)
- catalog (Database cluster in PostgreSQL parlance, i.e. everything served on a single host/port combination.  DB2
actuallyimplements this under that name.)
 
- database
- schema
- table
- column

> --- it's unlikely that the FDW for Oracle would look even a little
> bit like the one for Postgres anyway, so contemplating ways to
> parameterize them seems a rather useless exercise.

See above for a parameterization :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Wed, Sep 29, 2010 at 3:56 AM, Shigeru HANADA
<hanada@metrosystems.co.jp> wrote:
> In current patch, fdwoptions for relations have been separated from
> reloptins by introducing pg_foreign_table catalog.  As mentioned in
> wiki, integration into rel/attoptions is nothing but an idea, we're
> willing to add pg_foreign_attribute catalog which has columns:
>
>    farelid     oid         not null
>    faattnum    smallint    not null
>    faoptions   text[]      not null
>
> Though this catalog has only fdwoptions as non-key value.
> Or, adding attrfdwoptions column to pg_attribute catalog is better?

I could go either way.  One possibility to isolate all this
information in a separate catalog, similar to what we recently did
with security labels.  That approach scales to dealing with many
object types (relations, attributes, functions, etc.).  That might be
overkill here, though: perhaps we should just add a column to each of
pg_class and pg_attribute.  I could go either way on this point.

>> I think the section about WHERE clause push-down is way off base.
>> First, it seems totally wrong to assume that the same functions and
>> operators will be defined on the remote side as you have locally;
>> indeed, for CSV files, you won't have anything defined on the remote
>> side at all.  You need some kind of a discovery mechanism here to
>> figure out which quals are push-downable.  And it should probably be
>> something generic, not a bunch of hard-wired rules that may or may not
>> be correct in any particular case.  What if the remote side is a
>> competing database product that doesn't understand X = ANY(Y)?
>> Second, even if a functions or operators does exist on both sides of
>> the link, how do you know whether they have compatible semantics?
>> Short of solving the entscheidungsproblem, you're not going to be able
>> to determine that algorithmically, so you need some kind of mechanism
>> for controlling what assumptions get made.  Otherwise, you'll end up
>> with queries that don't work and no way for the user to fix it.
>
> First of all, WHERE clause push-down ideas written in wiki are just
> for FDW for PostgreSQL, implicitly same version, so we have assumed
> that the remote side has same syntax/semantics.  WHERE clause
> push-down is implemented in postgresql_fdw, and optimizer/planner are
> not .

Well, that's not the way it looks like from reading the section
entitled "FDW routines".  What I think we're looking for here is a
general API...

> Using remote indexes might be very effective, but I think there are
> many issues.
>
> For instance, how can planner/optimizer know whether the foreign table
> has been indexed or not?  Checking remote catalogs for each scan must
> be a bad idea.  HiRDB, Hitachi's dbms product, seems to have
> introduced FOREIGN INDEX for that purpose.

Yeah, it's definitely not an easy problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Thu, Sep 30, 2010 at 3:48 AM, Shigeru HANADA
<hanada@metrosystems.co.jp> wrote:
> On Thu, 30 Sep 2010 09:26:54 +0300
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
>> > FYI, HiRDB, that implements FDW routines, has CREATE FOREIGN INDEX.
>> > I think it is a little ugly and won't work in some cases -- for example,
>> > index organized tables -- but evidently it's a realistic solution.
>>
>> A long time ago I used DB2's federated database feature, which is at
>> least close to SQL/MED if not fully compatible. When you create a
>> "federated index" there. it's just a planner hint to the local database,
>> so that it knows how expensive it is to evaluate a qual remotely vs.
>> locally. It shouldn't matter what technology the remote index uses in
>> that case, as long as the cost model is roughly the same as a b-tree.
>>
>> I don't think we want to go down that path though, it's better to leave
>> the cost estimation altogether to the wrapper. It has much better
>> knowledge of expensive various quals are.
>>
>> However, the wrapper will likely need some local storage for indexes and
>> like to do the cost estimation. Or maybe it can just keep the
>> information in cache, loading it on first use from the remote database.
> How about having cost hints in generic option of the foreign table or
> its columns?  Generic options are storage for wrappers, not for
> PostgreSQL core modules.  Wrappers can use their own format to
> represent various information, and use the hints to estimate costs of
> a path.

I do think we're going to need some kind of local caching of relevant
information from the foreign side.  Really, I doubt that fdwoptions
are the place for that, though: that's data for the user to input, not
a place for the wrapper to scribble on internally.  The trick is that
there's a lot of stuff you might want to cache, and we don't really
know anything about what the format of it is - for example, you might
have foreign-side statistics that need to get cached locally, but they
needn't be in the same format we use for pg_statistic.  Perhaps part
of setting up an FDW should be creating tables with prespecified
definitions and passing the table names to CREATE FOREIGN DATA WRAPPER
as options.  Maybe we could even arrange to set up the dependencies
appropriately...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Shigeru HANADA
Date:
On Mon, 4 Oct 2010 19:31:52 -0400
Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Sep 30, 2010 at 3:48 AM, Shigeru HANADA
> <hanada@metrosystems.co.jp> wrote:
> > How about having cost hints in generic option of the foreign table or
> > its columns? ?Generic options are storage for wrappers, not for
> > PostgreSQL core modules. ?Wrappers can use their own format to
> > represent various information, and use the hints to estimate costs of
> > a path.
> 
> I do think we're going to need some kind of local caching of relevant
> information from the foreign side.  Really, I doubt that fdwoptions
> are the place for that, though: that's data for the user to input, not
> a place for the wrapper to scribble on internally.  The trick is that
> there's a lot of stuff you might want to cache, and we don't really
> know anything about what the format of it is - for example, you might
> have foreign-side statistics that need to get cached locally, but they
> needn't be in the same format we use for pg_statistic.  Perhaps part
> of setting up an FDW should be creating tables with prespecified
> definitions and passing the table names to CREATE FOREIGN DATA WRAPPER
> as options.  Maybe we could even arrange to set up the dependencies
> appropriately...
Agreed.  I withdraw the idea to store foreign-side statistics into
generic options.

Can we treat statistics of a foreign table separately?

1. Same as local tables (maybe required)  (pg_statistic.*, pg_class.reltuples/relpages)
They will be used by planner/optimizer to estimate basic costs based
on tuple selectivity, result row count and so on.  Such statistics
could be generated by ANALYZE module if the FDW can supply all tuples
from foreign side.  The basic costs should be able to correct by FDW
via another API, because foreign queries might have some overheads,
such as connection and transfer.
ISTM that it is very difficult for non-PG FDW to generate PG-style
statistics correctly.

2. depend on FDW (optional)  (in various, arbitrary format)
They will be used by FDW to optimize query to be executed on
foreign-side in their own way.  As you say, new table(s) to store such
statistics can be created during creation of new FOREIGN DATA WRAPPER
or installation of new fdw_handler module.  Maybe ANALYZE should call
another API which collect these kind of statistics.

I think that(1) is necessary in the first step, but (2) is not.

Regards,
--
Shigeru Hanada



Re: patch: SQL/MED(FDW) DDL

From
Tom Lane
Date:
Shigeru HANADA <hanada@metrosystems.co.jp> writes:
> Can we treat statistics of a foreign table separately?

> 1. Same as local tables (maybe required)
>    (pg_statistic.*, pg_class.reltuples/relpages)

This whole discussion seems to me to be about trying to do things outside
the FDW that should properly be left inside the FDW.  Who's to say that
the remote side even *has* statistics of the sort that PG creates?

We should provide an API that lets the FDW return a cost estimate for a
proposed access path.  Where it gets the cost estimate from is not
something that should be presupposed.
        regards, tom lane


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Tue, Oct 5, 2010 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Shigeru HANADA <hanada@metrosystems.co.jp> writes:
>> Can we treat statistics of a foreign table separately?
>
>> 1. Same as local tables (maybe required)
>>    (pg_statistic.*, pg_class.reltuples/relpages)
>
> This whole discussion seems to me to be about trying to do things outside
> the FDW that should properly be left inside the FDW.  Who's to say that
> the remote side even *has* statistics of the sort that PG creates?
>
> We should provide an API that lets the FDW return a cost estimate for a
> proposed access path.  Where it gets the cost estimate from is not
> something that should be presupposed.

Unless there's some way for the FDW to have local tables for caching
its statistics, the chances of this having decent performance seem to
be near-zero.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Tue, Oct 5, 2010 at 8:10 AM, Shigeru HANADA
<hanada@metrosystems.co.jp> wrote:
> On Mon, 4 Oct 2010 19:31:52 -0400
> Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Sep 30, 2010 at 3:48 AM, Shigeru HANADA
>> <hanada@metrosystems.co.jp> wrote:
>> > How about having cost hints in generic option of the foreign table or
>> > its columns? ?Generic options are storage for wrappers, not for
>> > PostgreSQL core modules. ?Wrappers can use their own format to
>> > represent various information, and use the hints to estimate costs of
>> > a path.
>>
>> I do think we're going to need some kind of local caching of relevant
>> information from the foreign side.  Really, I doubt that fdwoptions
>> are the place for that, though: that's data for the user to input, not
>> a place for the wrapper to scribble on internally.  The trick is that
>> there's a lot of stuff you might want to cache, and we don't really
>> know anything about what the format of it is - for example, you might
>> have foreign-side statistics that need to get cached locally, but they
>> needn't be in the same format we use for pg_statistic.  Perhaps part
>> of setting up an FDW should be creating tables with prespecified
>> definitions and passing the table names to CREATE FOREIGN DATA WRAPPER
>> as options.  Maybe we could even arrange to set up the dependencies
>> appropriately...
> Agreed.  I withdraw the idea to store foreign-side statistics into
> generic options.
>
> Can we treat statistics of a foreign table separately?
>
> 1. Same as local tables (maybe required)
>   (pg_statistic.*, pg_class.reltuples/relpages)
> They will be used by planner/optimizer to estimate basic costs based
> on tuple selectivity, result row count and so on.  Such statistics
> could be generated by ANALYZE module if the FDW can supply all tuples
> from foreign side.  The basic costs should be able to correct by FDW
> via another API, because foreign queries might have some overheads,
> such as connection and transfer.
> ISTM that it is very difficult for non-PG FDW to generate PG-style
> statistics correctly.
>
> 2. depend on FDW (optional)
>   (in various, arbitrary format)
> They will be used by FDW to optimize query to be executed on
> foreign-side in their own way.  As you say, new table(s) to store such
> statistics can be created during creation of new FOREIGN DATA WRAPPER
> or installation of new fdw_handler module.  Maybe ANALYZE should call
> another API which collect these kind of statistics.
>
> I think that(1) is necessary in the first step, but (2) is not.

I disagree.  I wouldn't bother doing either one of them unless you can do both.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Oct 5, 2010 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This whole discussion seems to me to be about trying to do things outside
>> the FDW that should properly be left inside the FDW. �Who's to say that
>> the remote side even *has* statistics of the sort that PG creates?
>> 
>> We should provide an API that lets the FDW return a cost estimate for a
>> proposed access path. �Where it gets the cost estimate from is not
>> something that should be presupposed.

> Unless there's some way for the FDW to have local tables for caching
> its statistics, the chances of this having decent performance seem to
> be near-zero.

Perhaps, but that would be the FDW's problem to implement.  Trying to
design such tables in advance of actually writing an FDW seems like a
completely backwards design process.

(I'd also say that your performance estimate is miles in advance of any
facts; but even if it's true, the caching ought to be inside the FDW,
because we have no clear idea of what it will need to cache.)
        regards, tom lane


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Tue, Oct 5, 2010 at 10:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Oct 5, 2010 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> This whole discussion seems to me to be about trying to do things outside
>>> the FDW that should properly be left inside the FDW.  Who's to say that
>>> the remote side even *has* statistics of the sort that PG creates?
>>>
>>> We should provide an API that lets the FDW return a cost estimate for a
>>> proposed access path.  Where it gets the cost estimate from is not
>>> something that should be presupposed.
>
>> Unless there's some way for the FDW to have local tables for caching
>> its statistics, the chances of this having decent performance seem to
>> be near-zero.
>
> Perhaps, but that would be the FDW's problem to implement.  Trying to
> design such tables in advance of actually writing an FDW seems like a
> completely backwards design process.

Oh, I agree.  I don't want to dictate the structure of those tables; I
just think it's inevitable that an FDW is going to need the ability to
be bound to some local tables which the admin should set up before
installing it.  That is, we need a general capability, not a specific
set of tables.

> (I'd also say that your performance estimate is miles in advance of any
> facts; but even if it's true, the caching ought to be inside the FDW,
> because we have no clear idea of what it will need to cache.)

I can't imagine how an FDW could possibly be expected to perform well
without some persistent local data storage.  Even assume the remote
end is PG.  To return a cost, it's going to need the contents of
pg_statistic cached locally, for each remote table.  Do you really
think it's going to work to incur that overhead once per table per
backend startup?  Or else every time we try to plan against a foreign
table we can fire off an SQL query to the remote side instead of
trying to compute the cost locally.  That's got to be two orders of
magnitude slower than planning based off local stats.

We could punt the issue of stats altogether for the first version and
simply say, hey, this is only intended for things like reading from
CSV files.  But if we're going to have it at all then I can't see how
we're going to get by without persistent local storage.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Oct 5, 2010 at 10:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (I'd also say that your performance estimate is miles in advance of any
>> facts; but even if it's true, the caching ought to be inside the FDW,
>> because we have no clear idea of what it will need to cache.)

> I can't imagine how an FDW could possibly be expected to perform well
> without some persistent local data storage.  Even assume the remote
> end is PG.  To return a cost, it's going to need the contents of
> pg_statistic cached locally, for each remote table.

Or it could ask the remote side.

> Do you really
> think it's going to work to incur that overhead once per table per
> backend startup?

If you have a cache, how are you going to manage updates of it?

IMO this is a *clear* case of premature optimization being the root of
all evil.  We should get it working first and then see what needs to be
optimized by measuring, rather than guessing in a vacuum.

(BTW, if the remote end is PG I would hardly think that we'd send SQL
queries at all.  If we're interested in micro-optimization, we'd devise
some lower-level protocol.)
        regards, tom lane


Re: patch: SQL/MED(FDW) DDL

From
Heikki Linnakangas
Date:
On 05.10.2010 17:56, Robert Haas wrote:
> On Tue, Oct 5, 2010 at 10:41 AM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> (I'd also say that your performance estimate is miles in advance of any
>> facts; but even if it's true, the caching ought to be inside the FDW,
>> because we have no clear idea of what it will need to cache.)
>
> I can't imagine how an FDW could possibly be expected to perform well
> without some persistent local data storage.  Even assume the remote
> end is PG.  To return a cost, it's going to need the contents of
> pg_statistic cached locally, for each remote table.  Do you really
> think it's going to work to incur that overhead once per table per
> backend startup?

It doesn't seem completely out of the question to me. Sure, it's 
expensive, but it's only incurred the first time a remote table is 
accessed in a session. Local persistent storage would be nice, but a lot 
of applications might prefer to not use it anyway, to ensure that fresh 
statistics are used.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: patch: SQL/MED(FDW) DDL

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 05.10.2010 17:56, Robert Haas wrote:
>> I can't imagine how an FDW could possibly be expected to perform well
>> without some persistent local data storage.

> It doesn't seem completely out of the question to me.

One other point that could be made here is that it's not going to
surprise anybody that access to a remote table is really slow compared
to access to a local table.  Why would it surprise anybody if planning
for a remote table is really slow compared to planning for a local
table?  Where is the evidence that anyone would even *notice* the extra
planning time, compared to the execution time of the finished query?

This is still all about premature optimization.
        regards, tom lane


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Tue, Oct 5, 2010 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Oct 5, 2010 at 10:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> (I'd also say that your performance estimate is miles in advance of any
>>> facts; but even if it's true, the caching ought to be inside the FDW,
>>> because we have no clear idea of what it will need to cache.)
>
>> I can't imagine how an FDW could possibly be expected to perform well
>> without some persistent local data storage.  Even assume the remote
>> end is PG.  To return a cost, it's going to need the contents of
>> pg_statistic cached locally, for each remote table.
>
> Or it could ask the remote side.

FWIW, I mentioned that option in that part you didn't quote.

>> Do you really
>> think it's going to work to incur that overhead once per table per
>> backend startup?
>
> If you have a cache, how are you going to manage updates of it?

I'm not.  I'm going to let the FDW determine how often it would like
to refresh its cache, as well as what it would like to cache and in
what format it would like to cache it.

> IMO this is a *clear* case of premature optimization being the root of
> all evil.  We should get it working first and then see what needs to be
> optimized by measuring, rather than guessing in a vacuum.

I have no problem with punting the issue of remote statistics to some
time in the future.  But I don't think we should have a half-baked
implementation of remote statistics.  We should either do it right
(doing such testing as is necessary to establish what that means) or
not do it at all.  Frankly, if we could get from where we are today to
a workable implementation of this technology for CSV files in time for
9.1, I think that would be an impressive accomplishment.  Making it
work for more complicated cases is almost certainly material for 9.2,
9.3, 9.4, and maybe further out than that.

> (BTW, if the remote end is PG I would hardly think that we'd send SQL
> queries at all.  If we're interested in micro-optimization, we'd devise
> some lower-level protocol.)

Interesting.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Oct 5, 2010 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> IMO this is a *clear* case of premature optimization being the root of
>> all evil. �We should get it working first and then see what needs to be
>> optimized by measuring, rather than guessing in a vacuum.

> I have no problem with punting the issue of remote statistics to some
> time in the future.  But I don't think we should have a half-baked
> implementation of remote statistics.  We should either do it right
> (doing such testing as is necessary to establish what that means) or
> not do it at all.  Frankly, if we could get from where we are today to
> a workable implementation of this technology for CSV files in time for
> 9.1, I think that would be an impressive accomplishment.  Making it
> work for more complicated cases is almost certainly material for 9.2,
> 9.3, 9.4, and maybe further out than that.

I quite agree that it's going to take multiple release cycles to have
a really impressive version of SQL/MED.  What I'm saying is that caching
remote statistics is several versions out in my view of the world, and
designing support for it shouldn't be happening now.

Maybe we ought to take a step back and discuss what the development plan
ought to be, before we spend more time on details like this.  My idea of
a good development process would involve working in parallel on at least
two FDW adapters, so that we don't find we've skewed the API design to
meet the needs of just one adapter.  Probably a remote-PG adapter and a
local-CSV-file adapter would be a good combination.  I don't have any
clear idea of how soon we might expect to see how much functionality,
though.  Thoughts?
        regards, tom lane


Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Tue, Oct 5, 2010 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Oct 5, 2010 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> IMO this is a *clear* case of premature optimization being the root of
>>> all evil.  We should get it working first and then see what needs to be
>>> optimized by measuring, rather than guessing in a vacuum.
>
>> I have no problem with punting the issue of remote statistics to some
>> time in the future.  But I don't think we should have a half-baked
>> implementation of remote statistics.  We should either do it right
>> (doing such testing as is necessary to establish what that means) or
>> not do it at all.  Frankly, if we could get from where we are today to
>> a workable implementation of this technology for CSV files in time for
>> 9.1, I think that would be an impressive accomplishment.  Making it
>> work for more complicated cases is almost certainly material for 9.2,
>> 9.3, 9.4, and maybe further out than that.
>
> I quite agree that it's going to take multiple release cycles to have
> a really impressive version of SQL/MED.  What I'm saying is that caching
> remote statistics is several versions out in my view of the world, and
> designing support for it shouldn't be happening now.

Fair enough.

> Maybe we ought to take a step back and discuss what the development plan
> ought to be, before we spend more time on details like this.

Good idea.

> My idea of
> a good development process would involve working in parallel on at least
> two FDW adapters, so that we don't find we've skewed the API design to
> meet the needs of just one adapter.  Probably a remote-PG adapter and a
> local-CSV-file adapter would be a good combination.  I don't have any
> clear idea of how soon we might expect to see how much functionality,
> though.  Thoughts?

I'm somewhat afraid that a remote-PG adapter will turn into a can of
worms.  If we give up on remote statistics, does that mean we're also
giving up on index use on the remote side?  I fear that we'll end up
crafting partial solutions that will only end up getting thrown away,
but after a lot of work has been invested in them.  I wonder if we
should focus on first efforts on really simple cases like CSV files
(as you mentioned) and perhaps something like memcached, which has
different properties than a CSV file, but extremely simple ones.  I
think it's inevitable that the API is going to get more complicated
from release to release and probably not in backward-compatible ways;
I think it's too early to be worried about that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: patch: SQL/MED(FDW) DDL

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I'm somewhat afraid that a remote-PG adapter will turn into a can of
> worms.  If we give up on remote statistics, does that mean we're also
> giving up on index use on the remote side?  I fear that we'll end up
> crafting partial solutions that will only end up getting thrown away,
> but after a lot of work has been invested in them.

Per Fred Brooks: "Plan to throw one away.  You will anyhow."

> I wonder if we
> should focus on first efforts on really simple cases like CSV files
> (as you mentioned) and perhaps something like memcached, which has
> different properties than a CSV file, but extremely simple ones.  I
> think it's inevitable that the API is going to get more complicated
> from release to release and probably not in backward-compatible ways;
> I think it's too early to be worried about that.

+1 ... we can *not* assume that we know enough to get the API right
the first time.  It'll have to be stated up front that it's subject to
change and you probably shouldn't be trying to develop FDWs separately
from the main project for awhile yet.

I think that looking only at CSV might be too simple, though.  It'd be
good to be at least trying to build a remote-PG adapter, even with the
knowledge that it's all throwaway code.
        regards, tom lane


Re: patch: SQL/MED(FDW) DDL

From
Shigeru HANADA
Date:
On Tue, 05 Oct 2010 14:59:27 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I wonder if we
> > should focus on first efforts on really simple cases like CSV files
> > (as you mentioned) and perhaps something like memcached, which has
> > different properties than a CSV file, but extremely simple ones.  I
> > think it's inevitable that the API is going to get more complicated
> > from release to release and probably not in backward-compatible ways;
> > I think it's too early to be worried about that.
> 
> +1 ... we can *not* assume that we know enough to get the API right
> the first time.  It'll have to be stated up front that it's subject to
> change and you probably shouldn't be trying to develop FDWs separately
> from the main project for awhile yet.
> 
> I think that looking only at CSV might be too simple, though.  It'd be
> good to be at least trying to build a remote-PG adapter, even with the
> knowledge that it's all throwaway code.
Thanks for comments, and sorry for delayed reply.

I propose revised version of FDW API which supports both of CSV and PG. 
It would be a minimal API to support SQL/MED, so some features such as
optimizing query or using remote index are not considered much.

======================
This FDW API is designed to support scanning a foreign table.  As
SQL/MED standard says, updating external data is not supported.  The
instance of a foreign table may be a CSV file or a table on an
external PostgreSQL server, or anything which could generate
PG-style tuples.

Scanning of a foreign table will be done in some steps below.
Note: FdwRoutine->XXX() means FDW API function.

1. Parser parses a query string and generate a query tree.  In this
step, a foreign table is not distinguished from a local table.

2. Optimizer generates a Path pathnode (pathtype = T_ForeignPath) for
each foreign table in the query.  Because foreign tables don't have
TID nor index (at least now), we don't add new Path type.

In this step, optimizer calls FdwRoutine->GetStatistics() to get
estimated costs of the foreign scan.  FDW can calculate costs by
itself with some statistics, or get costs from remote-side, or leave
the costs as is.

3. Planner generates a ForeignScan plannode for each foreign Path. 
ForeignScan could be typedef of Scan because it has no other member,
but it have been defined as a independent structure to make it be able
to add other members in the future.

4. Executor executes each ForeignScan node in some steps.  Through the
Step-4, FDW can keep their own information about the foreign scan in the
ForeignScan->FdwReply.

4-a. To initialize a ForeignScan plannodes, ExecInitForeignScan() is
called for each ForeignScan node.  In ExecInitForeignScan(),
ForeignScanState will be created from:
 * ScanState which hold same information as SeqScan. * FdwRoutine pointer as cache (for cache). * some catalog
informationabout the foreign table (for cache). * connection established with FdwRoutine->ConnectServer().   FDW can
useGetFSConnectionByName(name) to get pooled connection by   name in FdwRoutine->ConnectServer().
 

Then, ExecInitForeignScan() calls FdwRouteine->Open() to tell FDW that
it's time to start the query.  FDW can do some initialization if
necessary.

4-b. To retrieve a tuple from the foreign table, ExecForeignScan()
calls FdwRoutine->Iterate().  FDW should set next tuple into
ss_ScanTupleSlot of ScanState or empty the slot to indicate EOF.
Projection and qual evaluation will be done in ExecScan() later, so
ExecForeignScan() should return all columns/tuples in the external
data store.

4-c. To reset the scan and rewind cursor to the head of the foreign
table, ExecForeignReScan() calls FdwRoutine->ReOpen().  This occurs
when a ForeignScan node is a inner node of a nested loop join.
FDW is required to return the first tuple again at next
FdwRoutine->Iterate() call.

4-d. At the end of execution of ForeignScan, ExecEndForeignScan() is
called.  ExecEndForeignScan() calls FdwRoutine->Close() to tell FDW
that no more Iterate will called.  FDW can do some finalization if
necessary.

5. Connections which have established via FdwRoutine->ConnectServer()
are pooled in the backend for future query which accesses same foreign
server.  Pooling mechanism is implemented in core module, not in the
each FDW.

When a user executes DISCARD ALL command, or backend dies,
FdwRoutine->FreeFSConnection() is called for each foreign connection
to discard it.

To achieve features above, I propose following FDW API:

/** Almost same as SeqScan, but defined as a structure to allow add* members in the future.*/
typedef struct ForeignScan {   Scan    scan;
} ForeignScan;

/** Handle to access FDW-depend data.* Each FDW can use ForeignScanState->reply with casting between FdwReply* and
actualtype.*/
 
typedef FdwReply FdwReply;

/** ForeignScanState node is used to store scan status.*/
typedef struct ForeignScanState {   ScanState       ss;         /* its first field is NodeTag */
   FdwRoutine     *routine;    /* set of FDW routines */   ForeignDataWrapper *wrapper;/* foreign data wrapper */
ForeignServer *server;     /* foreign server */   FSConnection   *conn;       /* connection to the foreign server */
UserMapping   *user;       /* user mapping */   ForeignTable   *table;      /* foreign table */   FdwReply
*reply;     /* private data for each data wrapper */
 
} ForeignScanState;

typedef struct FdwRoutine
{   /*    * Connect to the foreign server identified by server and user.    */   FSConnection*
(*ConnectServer)(ForeignServer*server, UserMapping *user);
 
   /*    * Disconnect from the foreign server and free FSConnection object.    */   void
(*FreeFSConnection)(FSConnection*conn);
 
   /*    * Estimate costs of a foreign path.  FDW should update startup_cost    * and total_cost of the Path.    * To
estimatecost, PG FDW might generate "EXPLAIN" SQL and    * execute it on the remote side, or collect statistics in    *
somewhereand calculate from them.    * This parameters are same as cost_foreignscan() in    * optimizer/path/costsize.c
(thefunction is just a proxy).    */   void (*GetStatistics)(Path *path, PlannerInfo *root, RelOptInfo *baserel);
 
   /*    * Prepare to return tuples.    */   void (*Open)(ForeignScanState *scanstate);
   /*    * Fetch the next tuple and fill tupleslot with it, or clear the slot    * to indicate EOF.    */   void
(*Iterate)(ForeignScanState*scanstate);
 
   /*    * End the foreign scan and do some cleanup if necessary.    */   void (*Close)(ForeignScanState *scanstate);
   /*    * Re-initialize the foreign scan, used when the INNER executor node is    * executed again.    */   void
(*ReOpen)(ForeignScanState*scanstate);
 
} FdwRoutine;
======================

Issues from ideas above:

== Connection Pooling ==
Connect on each foreign scan is obviously useless, but it's open to
argument that who should manage the pooled connections, because for
CSV wrapper, no "connection" is needed at all, but PG wrapper might
connect once per database per user.

Maybe DBAs want to see how much and what kind of connections are
established now, so we should show list of active connections via a
view or function.  To treat connections transparently, I propose
that connection pooling mechanism in the core module, not in each FDW
module.  Or should we add API to return list of active connections in
common format?

If we have decided to leave connection management to each FDWs,
ConnectServer() and FreeFSConnection() can be removed from API.

== ANALYZE support ==
Even if a FDW wants to keep statistics in local area such as
pg_class.reltuples, current API doesn't provide appropriate timing.

Should we add API which is called from ANALYZE to allow FDWs to handle
statistics when user wants.

Any comments are welcome.

Regards,
--
Shigeru Hanada



Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Thu, Oct 7, 2010 at 7:20 AM, Shigeru HANADA
<hanada@metrosystems.co.jp> wrote:
> [ design for FDW ]

This seems mostly sensible.

> In this step, optimizer calls FdwRoutine->GetStatistics() to get
> estimated costs of the foreign scan.  FDW can calculate costs by
> itself with some statistics, or get costs from remote-side, or leave
> the costs as is.

I think this should be called EstimateCosts rather than GetStatistics.

> Maybe DBAs want to see how much and what kind of connections are
> established now, so we should show list of active connections via a
> view or function.  To treat connections transparently, I propose
> that connection pooling mechanism in the core module, not in each FDW
> module.  Or should we add API to return list of active connections in
> common format?

I think for starters we should make this the responsibility of each
FDW.  It seems a bit hard to speculate about what we might want down
the road at this point.

> Should we add API which is called from ANALYZE to allow FDWs to handle
> statistics when user wants.

I would probably put this off also, to a later phase of the project.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: patch: SQL/MED(FDW) DDL

From
Shigeru HANADA
Date:
On Mon, 11 Oct 2010 13:22:13 -0400
Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 7, 2010 at 7:20 AM, Shigeru HANADA
> <hanada@metrosystems.co.jp> wrote:
> > In this step, optimizer calls FdwRoutine->GetStatistics() to get
> > estimated costs of the foreign scan. ?FDW can calculate costs by
> > itself with some statistics, or get costs from remote-side, or leave
> > the costs as is.
> 
> I think this should be called EstimateCosts rather than GetStatistics.

Agreed, the name you suggested would be better.
I've chosen the name GetStatistics because a function with that name
is defined in SQL/MED standard as below, but GetStatistics might
rather match handling of ANALYZE request in PostgreSQL.
   Section 22.3.28 GetStatistics (ISO/IEC 9075-9:2003)   > Retrieve implementation-defined statistics associated with a
 > foreign server request.
 

> > Maybe DBAs want to see how much and what kind of connections are
> > established now, so we should show list of active connections via a
> > view or function. ?To treat connections transparently, I propose
> > that connection pooling mechanism in the core module, not in each FDW
> > module. ?Or should we add API to return list of active connections in
> > common format?
> 
> I think for starters we should make this the responsibility of each
> FDW.  It seems a bit hard to speculate about what we might want down
> the road at this point.

Agreed.  I would move connection cache mechanism to postgresql_fdw. 
I also would remove pg_foreign_connections view because it might need
new interface to get list of active connections from FDWs.

> > Should we add API which is called from ANALYZE to allow FDWs to handle
> > statistics when user wants.
> 
> I would probably put this off also, to a later phase of the project.
Agreed.

Thanks for the review!
I'll finish the SQL/MED patch by applying your comments.

Regards,
--
Shigeru Hanada



Re: patch: SQL/MED(FDW) DDL

From
Robert Haas
Date:
On Tue, Oct 12, 2010 at 2:27 AM, Shigeru HANADA
<hanada@metrosystems.co.jp> wrote:
> Thanks for the review!
> I'll finish the SQL/MED patch by applying your comments.

I think this patch has gotten as much review as we can reasonably give
it this CommitFest (which is about to end), so I'm going to mark it
Returned with Feedback.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: patch: SQL/MED(FDW) DDL

From
Shigeru HANADA
Date:
On Wed, 13 Oct 2010 19:10:42 -0400
Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 12, 2010 at 2:27 AM, Shigeru HANADA
> <hanada@metrosystems.co.jp> wrote:
> > Thanks for the review!
> > I'll finish the SQL/MED patch by applying your comments.
> 
> I think this patch has gotten as much review as we can reasonably give
> it this CommitFest (which is about to end), so I'm going to mark it
> Returned with Feedback.

Some new issues other than interface of FDW have been clarified
through reviews in this CommitFest, and it's difficult to solve them
in this CommitFest.  I would like to propose revised version of
SQL/MED patch in the next CommitFest.

Many thanks to your reviews.

Regards,
--
Shigeru Hanada