Feature Proposal: schema renaming in pg_dump/pg_restore - Mailing list pgsql-hackers
| From | Brad Arndt |
|---|---|
| Subject | Feature Proposal: schema renaming in pg_dump/pg_restore |
| Date | |
| Msg-id | CAE4qKNK1HjMTBAWG5TJU=YHySOD05S4bq7zu1-9Fg30hjt26zQ@mail.gmail.com Whole thread Raw |
| Responses |
Re: Feature Proposal: schema renaming in pg_dump/pg_restore
|
| List | pgsql-hackers |
First time going through the contribution process with postgres, but from what I read in the documentation, the best place to start with a new feature proposal is an email to this mailing list for feedback. So here goes...
pg_dump ... --dbname database_1 --schema schema_1 --rename-schema schema_2
pg_dump ... --dbname database_1 --schema schema_1 --rename-schema *:schema_2
Use Case:
We use pg_dump/pg_restore both as a mechanism to "archive"/"restore" data as well as a mechanism to move data between databases. As part of both of these operations, one thing we frequently run into is the need to rename the schema as part of the process (either to fit a naming convention, or due to naming conflicts).
For a concrete example, suppose we have the following:
- database_1
- schema_1
- schema_2
- database_2
- schema_1
And we wanted to copy database_1.schema_1 to database_2.schema_2.
What we have tried so far (and the drawbacks)
1. Using temporary schemas to avoid collisions.
In our example above, this looks like the following:a. Clone database_1.schema_1 to database_1.temp_schemab. dump/restore database_1.temp_schema to database_2.temp_schemac. rename database_2.temp_schema to database_2.schema_2d. delete database_1.temp_schemaThe obvious downside to this approach is the need for the temp schemas. Especially on database_1 where the data must be duplicated temporarily.
2. "Post-processing" the output of pg_dump to do renaming
Specifically:a. pg_dump database_1.schema_1 using SQL/text outputb. piping that output to a process which uses regular expressions to replace schema_1 with schema_2c. piping that output into pgsql for restore
While we were able to make this workflow work, it has a couple of drawbacks:
1. regular expression replacement can be brittle to maintain as pg_dump output changes
2. because we needed access to the text sql statements, we were limited to sql/text output from pg_dump and pgsql import which precluded us from taking advantage of many of the performance optimizations the other formats of pg_dump/pg_restore provide
My Proposal:
Build in the ability to rename schemas as part of pg_dump (and to a lesser extent pg_restore - more on that below) by adding a --rename-schema flag.
The flag could support:
- rename all schemas to a single replacement: --rename-schema <replacement name>
- rename a specific schema: --rename-schema <src schema>:<replacement name>
- or a combination (multiple flag occurrences): --rename-schema <schema1>:<schema1 replacement> --rename-schema <replacement for all other schemas>
For example, the following commands would all rename schema_1 to schema_2 in the dump output:
pg_dump ... --dbname database_1 --schema schema_1 --rename-schema schema_1:schema_2pg_dump ... --dbname database_1 --schema schema_1 --rename-schema schema_2
pg_dump ... --dbname database_1 --schema schema_1 --rename-schema *:schema_2
Secondary Feature: pg_restore replacement
Renaming as part of the dump works great when we know the ultimate restore name at the time of dumping. However, there are times we may want to dump a db to cold storage, and then restore it at a later time to an arbitary schema. Renaming on the restore is a little less precise than on dump as in most cases we are just working with the raw sql statements created by the dump. So for this use case, I'm proposing a more simplistic text find/replace. Because of the unstructured nature, it would be on the user to ensure that their replacement string is unique. For example (again, using the above scenario):
1. dump database_1.schema_1 and use the dump rename feature to rename schema1 to some_very_unique_schema in the dump file
2. restore at a later time doing a simple text replacement:
pg_restore ... --replace some_very_unique_schema:schema_1
What I have done so far / Next Steps:
I have a working proof of concept built on top of postgres 16.8 (what we currently have in production).
If the community feels this is a reasonable feature to add, I would like to go through the process of building on top of the release branch and submitting the patch. However, I wanted to get feedback on whether this is a reasonable feature to pursue first.
Open to any and all feedback. Thanks!!
-Brad
pgsql-hackers by date: