Re: FDW for PostgreSQL - Mailing list pgsql-hackers

From Tom Lane
Subject Re: FDW for PostgreSQL
Msg-id 22766.1360859657@sss.pgh.pa.us
In response to Re: FDW for PostgreSQL  (Shigeru Hanada <shigeru.hanada@gmail.com>)
Responses Re: FDW for PostgreSQL  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> On Thu, Feb 14, 2013 at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> * AFAICT, the patch expects to use a single connection for all
>> operations initiated under one foreign server + user mapping pair.
>> I don't think this can possibly be workable.  For instance, we don't
>> really want postgresIterateForeignScan executing the entire remote query
>> to completion and stashing the results locally -- what if that's many
>> megabytes?

> It uses libpq's single-row mode and a Tuplestorestate to keep the result
> locally, so it uses only a limited amount of memory at a time.  If the
> result is larger than work_mem, overflowed tuples are written to a temp
> file.  I think this is similar to materializing query results.

Well, yeah, but that doesn't make it an acceptable solution.  Consider
for instance "SELECT * FROM huge_foreign_table LIMIT 10".  People are
not going to be satisfied if that pulls back the entire foreign table
before handing them the 10 rows.  Comparable performance problems can
arise even without LIMIT, for instance in handling of nestloop inner
scans.
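
To be concrete about why that's a problem: with single-row mode the
retrieval loop presumably looks something like the sketch below (the
function and helper names are my invention, and error handling is
trimmed).  Note that the loop necessarily drains the entire result set
into the tuplestore before the first row can be handed back upstream.

/*
 * Sketch only, not the patch's actual code: fetch a remote query's
 * result in libpq single-row mode, stashing each row in a tuplestore
 * (which spills to a temp file once it exceeds work_mem).
 */
static void
fetch_whole_result(PGconn *conn, Tuplestorestate *tstore)
{
    PGresult   *res;

    if (!PQsendQuery(conn, "SELECT * FROM huge_foreign_table"))
        elog(ERROR, "could not send query: %s", PQerrorMessage(conn));
    if (!PQsetSingleRowMode(conn))
        elog(ERROR, "could not enable single-row mode");

    /* This loop runs to completion no matter how few rows we need. */
    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
            stash_row(tstore, res);     /* hypothetical helper */
        PQclear(res);
    }
}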

>> I think we'd better be prepared to allow multiple similar connections.

> The main reason to use a single connection is to make multiple results
> retrieved from the same server within a local query consistent.

Hmm.  That could be a good argument, although the current patch pretty
much destroys any such advantage by being willing to use READ COMMITTED
mode on the far end --- with that, you lose any right to expect
snapshot-consistent data anyway.  I'm inclined to think that maybe we
should always use at least REPEATABLE READ mode, rather than blindly
copying the local transaction mode.  Or maybe this should be driven by a
foreign-server option instead of looking at the local mode at all?
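
Concretely, that would mean the connection-startup path always issuing
something like the following, rather than consulting the local
XactIsoLevel (just a sketch; the function name is made up):

/* Sketch: always open the remote transaction at REPEATABLE READ. */
static void
begin_remote_xact(PGconn *conn)
{
    PGresult   *res;

    res = PQexec(conn,
                 "START TRANSACTION ISOLATION LEVEL REPEATABLE READ");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        elog(ERROR, "could not begin remote transaction: %s",
             PQerrorMessage(conn));
    PQclear(res);
}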

Anyway, it does seem like maybe we need to use cursors, so that we can
have several active scans from which we pull back just a few rows at a
time.
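
I'm thinking of something like this per scan, with the scan node hanging
onto its cursor and fetching batches on demand (a sketch: cursor name,
batch size, and helper names are invented, and it assumes a remote
transaction is already open, since a cursor without WITH HOLD requires
one):

/* Sketch: drive one foreign scan through its own cursor. */
static void
scan_via_cursor(PGconn *conn)
{
    PGresult   *res;

    res = PQexec(conn,
                 "DECLARE scan_1 CURSOR FOR SELECT * FROM remote_tab");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        elog(ERROR, "DECLARE failed: %s", PQerrorMessage(conn));
    PQclear(res);

    for (;;)
    {
        int         i,
                    ntuples;

        res = PQexec(conn, "FETCH 100 FROM scan_1");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
            elog(ERROR, "FETCH failed: %s", PQerrorMessage(conn));
        ntuples = PQntuples(res);
        if (ntuples == 0)
        {
            PQclear(res);
            break;              /* cursor exhausted */
        }
        for (i = 0; i < ntuples; i++)
            emit_row(res, i);   /* hypothetical: pass row to executor */
        PQclear(res);
    }

    res = PQexec(conn, "CLOSE scan_1");
    PQclear(res);
}

With several such cursors open on one connection, each scan can advance
independently without materializing anything locally.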

I'm not convinced, though, that that gets us out of the woods WRT needing
only one connection.  Consider a query that is scanning some foreign
table, and it calls a plpgsql function, and that function (inside an
EXCEPTION block) does a query that scans another foreign table on the
same server.  This second query gets an error on the remote side.  If
the error is caught via the exception block, and the outer query
continues, what then?  We could imagine adding enough infrastructure
to establish a remote savepoint for each local subtransaction and clean
things up on failure, but no such logic is in the patch now, and I think
it wouldn't be too simple either.  The least painful way to make this
scenario work, given what's in the patch, is to allow such a
subtransaction to use a separate connection.
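
For the record, the infrastructure I have in mind would have to do
roughly this at subtransaction start and abort (a sketch; the savepoint
naming scheme and function names are invented):

/* Sketch: mirror each local subtransaction with a remote savepoint. */
static void
remote_subxact_begin(PGconn *conn, int curlevel)
{
    char        sql[64];
    PGresult   *res;

    snprintf(sql, sizeof(sql), "SAVEPOINT s%d", curlevel);
    res = PQexec(conn, sql);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        elog(ERROR, "could not create remote savepoint: %s",
             PQerrorMessage(conn));
    PQclear(res);
}

static void
remote_subxact_abort(PGconn *conn, int curlevel)
{
    char        sql[80];
    PGresult   *res;

    /*
     * Roll back to the savepoint, then release it, so the remote
     * session is usable again after the local EXCEPTION block.
     */
    snprintf(sql, sizeof(sql),
             "ROLLBACK TO SAVEPOINT s%d; RELEASE SAVEPOINT s%d",
             curlevel, curlevel);
    res = PQexec(conn, sql);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        elog(ERROR, "could not clean up remote savepoint: %s",
             PQerrorMessage(conn));
    PQclear(res);
}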

In any case, I'm pretty well convinced that the connection-bookkeeping
logic needs a major rewrite to have any hope of working in
subtransactions.  I'm going to work on that first and see where it leads.

>> * I find postgres_fdw_get_connections() and postgres_fdw_disconnect()
>> to be a bad idea altogether.

> I agree that we should separate this issue from the FDW core.

OK, so we'll drop these from the current version of the patch and
revisit the problem of closing connections later.
        regards, tom lane


