FDW-based dblink - Mailing list pgsql-hackers

From Itagaki Takahiro
Subject FDW-based dblink
Date
Msg-id 20090813184143.7714.52131E4D@oss.ntt.co.jp
Whole thread Raw
Responses Re: FDW-based dblink  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Re: FDW-based dblink  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
Here is a proposal to integrate contrib/dblink and SQL/MED (foreign
data wrapper).

Dblink manages connections and transactions by itself at the moment,
but there are some benefits to split database connectors into FDW.
Dblink will uses those multiple connectors. For example, we will be
able to retrieve data from Oracle into PostgreSQL directly if we had
Oracle-connector.
New syntax in SQL
-------------------
CREATE FOREIGN DATA WRAPPER postgres   CONNECTOR pg_catalog.dblink_postgres;
or
CREATE FOREIGN DATA WRAPPER postgres   OPTIONS (connector 'pg_catalog.dblink_postgres')

We don't have to modify gram.y if we take the latter syntax, but need
to modify VALIDATORs to distinguish 'connector' and other options.
The 'connector' option should not be passed as connection string.
New interface in C
--------------------
pg_catalog.dblink_postgres is a function that havs folloing prototype:
   Connection *connector(List *defElems);

The argument 'defElems' is a concatenated connection options
in FDW, server, and user-mapping.

Also new two interfaces will be introduced:
   interface Connection    /* represents PGconn */   {       void    disconnect(self);       Cursor *open(self, query,
fetchsize);/* for SELECT */       int64   exec(self, query);            /* for UPDATE, INSERT, DELETE */       bool
transaction_command(self,type);   }
 
   interface Cursor    /* represents PGresult and server-side cursor */   {       bool fetch(self, OUT values);
voidclose(self);   }
 

They have some methods implemented with function pointers. The benefit
of using function pointers is that we only have to export one connector
function to pg_proc.

The Cursor interface represents both result-set and server-side cursor.
PostgreSQL has SQL-level cursor, but there are some database that have
protocol-level cursor. This abstraction layer is needed for dblink to
handle connectors to other databases.
Other features
----------------
Present dblink is a thin wrapper of libpq, but some of my customers
want automatic transaction managements. Remote transactions are
committed with 2PC when the local transaction is committed.
To achieve it, I think we need on-commit trigger is needed,
but it is hard to implement with current infrastructure.
(That is one of the reason I proposed to merge dblink into core.)
Other considerations
----------------------
The proposed method is a SQL-based connector. There might be another
approach -- ScanKey-based connector. It is similar to the index access
method interface (pg_am). It takes relation id and scankeys instead of
SQL text. The scanKey-based approach will work better if we try to pass
WHERE-clause to an external database. However, I think we need SQL-based
interface in any case. ScanKey will be converted to SQL and passed
to an external database.


I have a prototype of the feature. I'd like to submit it for 8.5.
Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Pierre Frédéric Caillaud
Date:
Subject: Re: COPY speedup
Next
From: Peter Eisentraut
Date:
Subject: trigger functions can only be called as triggers