pgsql_fdw, FDW for PostgreSQL server - Mailing list pgsql-hackers

From Shigeru Hanada
Subject pgsql_fdw, FDW for PostgreSQL server
Date
Msg-id 4EA67D24.2030707@gmail.com
Whole thread Raw
Responses Re: pgsql_fdw, FDW for PostgreSQL server
Re: pgsql_fdw, FDW for PostgreSQL server
List pgsql-hackers
I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
contrib module.  I think that this module would be the basis of further
SQL/MED development for core, e.g. join-push-down and ANALYZE support.

I attached three patches for this new FDW.  They should be applied in
the order below.  I separated these patches so that first (or first two)
can be committed separately.

* fdw_helper_doc.patch provides documents for FDW developers about
helper functions existing in 9.1, so this can be back-patched.
* fdw_helper_funcs.patch provides additional helper functions which
would make manipulation of FDW options easier.
* pgsql_fdw.patch provides new FDW for external PG server.

Here are details of pgsql_fdw.

Name of the wrapper
===================
I used the name "pgsql_fdw" for the wrapper and its derivatives.  I
think it would be better to leave contrib/dblink and built-in
postgresql_fdw_validator for backward compatibility, and use new name
for new wrapper.

Or, it might be OK to rename postgresql_fdw_validator to
dblink_validator or something, or fix dblink to use validator of new
wrapper.  I'm not sure that dblink should be alone or integrated with
pgsql_fdw...

Connection management
=====================
The pgsql_fdw establishes a new connection when a foreign server is
accessed first for the local session.  Established connection is shared
between all foreign scans in the local query, and shared between even
scans in following queries.  Connections are discarded when the current
transaction aborts so that unexpected failure won't cause connection
leak.  This is implemented with resource owner mechanism.

User can see active connections via pgsql_fdw_connections view, and
discard arbitrary connection via pgsql_fdw_disconnect() function.  These
can be done from only same local session.

If local role has changed via SET ROLE or SET SESSION AUTHENTICATION,
pgsql_fdw ignores old role's connections and looks up appropriate
connection for the new role from the pool.  If there wasn't suitable
one, pgsql_fdw establishes new connection.  When local role has changed
to old role again, pooled connection will be used again.

Unlike contrib/dblink, one foreign server can have only one connection
at a time for one local role.  This is because pgsql_fdw doesn't support
named connections.

Cost estimation
===============
The pgsql_fdw executes an EXPLAIN command on remote side for each
PlanForeignScan call.  Returned costs and rows are used as local
estimation for the Path with adding connection costs and data transfer
costs.

SELECT optimization
===================
To reduce amount of data transferred from remote server, references to
unnecessary columns are replaced with NULL literal in remote query.

WHERE clause push-down
======================
Some kind of qualifiers in WHERE clause are pushed down to remote server
so that the query result can be reduced.  Currently qualifiers which
include any volatile or stable element can't be pushed down.  Even with
these limitations, most qualifiers would be pushed down in usual cases.

Cursor mode
===========
The pgsql_fdw switches the way to retrieve result records according to
estimated result rows; use simple SELECT for small result, and use
cursor with DECLARE/FETCH statements for large result.  The threshold
is default to 1000, and configurable with FDW option "min_cursor_rows".
In cursor mode, number of rows fetched at once can be controlled by FDW
option "fetch_count".

EXPLAIN output
==============
The pgsql_fdw shows a remote query used for each foreign scan node in
the output of EXPLAIN command with title "Remote SQL".  If pgsql_fdw
decided to use cursor for the scan, DECLARE statement is shown.

Regards,
--
Shigeru Hanada

Attachment

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Online base backup from the hot-standby
Next
From: Simon Riggs
Date:
Subject: Re: Hot Backup with rsync fails at pg_clog if under load