Re: Option to dump foreign data in pg_dump - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Option to dump foreign data in pg_dump
Msg-id 8001.1573759651@sss.pgh.pa.us
In response to Re: Option to dump foreign data in pg_dump  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Nov-12, Luis Carril wrote:
>> But, not all foreign tables are necessarily in a remote server like
>> the ones referenced by the postgres_fdw.
>> In FDWs like swarm64da, cstore, citus or timescaledb, the foreign
>> tables are part of your database, and one could expect that a dump of
>> the database includes data from these FDWs.

> BTW these are not FDWs in the "foreign" sense at all; they're just
> abusing the FDW system in order to be able to store data in some
> different way.  The right thing to do IMO is to port these systems to be
> users of the new storage abstraction (table AM).  If we do that, what
> value is there to the feature being proposed here?

That is a pretty valid point.  I'm not sure however that there would
be *no* use-cases for the proposed option if all of those FDWs were
converted to table AMs.  Also, even if the authors of those systems
are all hard at work on such a conversion, it'd probably be years
before the FDW implementations disappear from the wild.

Having said that, I'm ending up -0.5 or so on the patch as it stands,
mainly because it seems like it is bringing way more maintenance
burden than it's realistically worth.  I'm particularly unhappy about
the proposed regression test additions --- the cycles added to
check-world, and the maintenance effort that's inevitably going to be
needed for all that code, seem unwarranted for something that's at
best a very niche use-case.  And, despite the bulk of the test
additions, they're in no sense offering an end-to-end test, because
that would require successfully reloading the data as well.

That objection could be addressed, perhaps, by scaling down the tests
to just have a goal of exercising the new pg_dump option-handling
code, and not to attempt to do meaningful data extraction from a
foreign table.  You could do that with an entirely dummy foreign data
wrapper and server (cf. sql/foreign_data.sql).  I'm imagining perhaps
creating two dummy servers, of which only one has a table, and asking to
dump data from the other one.  This would cover parsing and validation
of the --include-foreign-data option, and make sure that we don't dump
from servers we're not supposed to.  It doesn't actually dump any
data, but that part is a completely trivial aspect of the patch,
really, and almost all of the code relevant to that does get tested
already.
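
As a rough sketch of that setup, modeled on the dummy objects in
sql/foreign_data.sql (all object names below are made up for
illustration, and the pg_dump invocation in the comment assumes the
option takes a server name, as in the patch under discussion):

```sql
-- A wrapper with no handler: it cannot fetch data, which is fine here
-- because the test only exercises option handling.
CREATE FOREIGN DATA WRAPPER dummy_fdw;

-- Two servers; only the first one gets a foreign table.
CREATE SERVER dummy_srv_with_table FOREIGN DATA WRAPPER dummy_fdw;
CREATE SERVER dummy_srv_empty FOREIGN DATA WRAPPER dummy_fdw;

CREATE FOREIGN TABLE dummy_ft (a integer) SERVER dummy_srv_with_table;

-- Then run something like
--   pg_dump --include-foreign-data=dummy_srv_empty ...
-- and check that the output contains no data for dummy_ft.
```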

In the department of minor nitpicks ... why bother with joining to
pg_foreign_server in the query that retrieves a foreign table's
server OID?  ft.ftserver is already the answer you seek.  Also,
I think it'd be wise from a performance standpoint to skip doing
that query altogether in the normal case where --include-foreign-data
hasn't been requested.
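
To illustrate, something along these lines (the patch's actual query
text is not quoted here, so the first form is only a guess at its
shape; the table name is a placeholder):

```sql
-- With the join: pg_foreign_server contributes nothing but its oid,
-- which is equal to ft.ftserver by the join condition.
SELECT fs.oid
FROM pg_catalog.pg_foreign_table ft
JOIN pg_catalog.pg_foreign_server fs ON fs.oid = ft.ftserver
WHERE ft.ftrelid = 'some_foreign_table'::pg_catalog.regclass;

-- Without the join: same result.
SELECT ft.ftserver
FROM pg_catalog.pg_foreign_table ft
WHERE ft.ftrelid = 'some_foreign_table'::pg_catalog.regclass;
```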

            regards, tom lane


