Re: Option to dump foreign data in pg_dump - Mailing list pgsql-hackers

From Luis Carril
Subject Re: Option to dump foreign data in pg_dump
Date
Msg-id LEXPR01MB0591470841E51370BE9A3644E7E20@LEXPR01MB0591.DEUPRD01.PROD.OUTLOOK.DE
Whole thread Raw
In response to Re: Option to dump foreign data in pg_dump  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: Option to dump foreign data in pg_dump  (David Steele <david@pgmasters.net>)
List pgsql-hackers
Hi everyone,

I am just responding on the latest mail on this thread. But the question is about functionality. The proposal is to add a single flag --include-foreign-data which controls whether or not data is dumped for all the foreign tables in a database. That may not serve the purpose. A foreign table may point to a view, materialized view or inheritance tree, and so on. A database can have foreign tables pointing to all of those kinds. Restoring data to a view won't be possible and restoring it into an inheritance tree would insert it into the parent only and not the children. Further, a user may not want the data to be dumped for all the foreign tables since their usages are different esp. considering restore. I think a better option is to extract data in a foreign table using --table if that's the only usage. Otherwise, we need a foreign table level flag indicating whether pg_dump should dump the data for that foreign table or not.

The option enables the user to dump data of tables backed by a specific foreign_server. It is up to the user to guarantee that the foreign server is also writable, that is the reason to make the option opt-in. The option can be combined with --table to dump specific tables if needed. If the user has different foreign servers in the database has to make the conscious decision of dumping each one of them. Without this option the user is totally unable to do it.


> On 2020-01-21 10:36, Luis Carril wrote:
>>> Yes we can support --include-foreign-data without parallel option and
>>> later add support for parallel option as a different patch.
>>
>> Hi,
>>
>>      I've attached a new version of the patch in which an error is
>> emitted if the parallel backup is used with the --include-foreign-data
>> option.
>
> This seems like an overreaction.  The whole point of
> lockTableForWorker() is to avoid deadlocks, but foreign tables don't
> have locks, so it's not a problem.  I think you can just skip foreign
> tables in lockTableForWorker() using the same logic that getTables() uses.
>
> I think parallel data dump would be an especially interesting option
> when using foreign tables, so it's worth figuring this out.

What do you think of Peter's comment?
I took a look at it, we could skip foreign tables by checking the catalog in lockTableForWorker but this would imply an extra query per call to the function (as in getTables), which would be irrelevant for most of the cases. Or we could pass in the TocEntry that it is a foreign table (although that seems highly specific).
Also, would it not be possible to offer support of LOCK TABLE on foreign tables?

At this point I would like to leave the patch as is, and discuss further improvement in a future patch.

Luis M.


From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Sent: Wednesday, March 4, 2020 5:39 PM
To: David Steele <david@pgmasters.net>
Cc: Luis Carril <luis.carril@swarm64.com>; vignesh C <vignesh21@gmail.com>; Peter Eisentraut <peter.eisentraut@2ndquadrant.com>; Alvaro Herrera <alvherre@2ndquadrant.com>; Daniel Gustafsson <daniel@yesql.se>; Laurenz Albe <laurenz.albe@cybertec.at>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Option to dump foreign data in pg_dump
 
I am just responding on the latest mail on this thread. But the question is about functionality. The proposal is to add a single flag --include-foreign-data which controls whether or not data is dumped for all the foreign tables in a database. That may not serve the purpose. A foreign table may point to a view, materialized view or inheritance tree, and so on. A database can have foreign tables pointing to all of those kinds. Restoring data to a view won't be possible and restoring it into an inheritance tree would insert it into the parent only and not the children. Further, a user may not want the data to be dumped for all the foreign tables since their usages are different esp. considering restore. I think a better option is to extract data in a foreign table using --table if that's the only usage. Otherwise, we need a foreign table level flag indicating whether pg_dump should dump the data for that foreign table or not.

On Wed, Mar 4, 2020 at 12:41 AM David Steele <david@pgmasters.net> wrote:
Hi Luis,

On 1/29/20 11:05 AM, Peter Eisentraut wrote:
> On 2020-01-21 10:36, Luis Carril wrote:
>>> Yes we can support --include-foreign-data without parallel option and
>>> later add support for parallel option as a different patch.
>>
>> Hi,
>>
>>      I've attached a new version of the patch in which an error is
>> emitted if the parallel backup is used with the --include-foreign-data
>> option.
>
> This seems like an overreaction.  The whole point of
> lockTableForWorker() is to avoid deadlocks, but foreign tables don't
> have locks, so it's not a problem.  I think you can just skip foreign
> tables in lockTableForWorker() using the same logic that getTables() uses.
>
> I think parallel data dump would be an especially interesting option
> when using foreign tables, so it's worth figuring this out.

What do you think of Peter's comment?

Regards,
--
-David
david@pgmasters.net




--
Best Wishes,
Ashutosh Bapat

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Should we remove a fallback promotion? take 2
Next
From: Asif Rehman
Date:
Subject: Re: proposal: schema variables