Feature Proposal: schema renaming in pg_dump/pg_restore - Mailing list pgsql-hackers

From Brad Arndt
Subject Feature Proposal: schema renaming in pg_dump/pg_restore
Date
Msg-id CAE4qKNK1HjMTBAWG5TJU=YHySOD05S4bq7zu1-9Fg30hjt26zQ@mail.gmail.com
Whole thread Raw
Responses Re: Feature Proposal: schema renaming in pg_dump/pg_restore
List pgsql-hackers
First time going through the contribution process with postgres, but from what I read in the documentation, the best place to start with a new feature proposal is an email to this mailing list for feedback.  So here goes...

Use Case:
We use pg_dump/pg_restore both as a mechanism to "archive"/"restore" data as well as a mechanism to move data between databases.  As part of both of these operations, one thing we frequently run into is the need to rename the schema as part of the process (either to fit a naming convention, or due to naming conflicts).

For a concrete example, suppose we have the following:
  • database_1
    • schema_1
    • schema_2
  • database_2
    • schema_1
And we wanted to copy database_1.schema_1 to database_2.schema_2.

What we have tried so far (and the drawbacks)

1. Using temporary schemas to avoid collisions.  

In our example above, this looks like the following:
  a. Clone database_1.schema_1 to database_1.temp_schema
  b. dump/restore database_1.temp_schema to database_2.temp_schema
  c. rename database_2.temp_schema to database_2.schema_2
  d. delete database_1.temp_schema

The obvious downside to this approach is the need for the temp schemas.  Especially on database_1 where the data must be duplicated temporarily.

2. "Post-processing" the output of pg_dump to do renaming

Specifically:
a. pg_dump database_1.schema_1 using SQL/text output
b. piping that output to a process which uses regular expressions to replace schema_1 with schema_2
c. piping that output into pgsql for restore 

While we were able to make this workflow work, it has a couple of drawbacks:
1. regular expression replacement can be brittle to maintain as pg_dump output changes
2. because we needed access to the text sql statements, we were limited to sql/text output from pg_dump and pgsql import which precluded us from taking advantage of many of the performance optimizations the other formats of pg_dump/pg_restore provide

My Proposal:
Build in the ability to rename schemas as part of pg_dump (and to a lesser extent pg_restore - more on that below) by adding a --rename-schema flag.

The flag could support:
- rename all schemas to a single replacement: --rename-schema <replacement name>
- rename a specific schema: --rename-schema <src schema>:<replacement name>
- or a combination (multiple flag occurrences): --rename-schema <schema1>:<schema1 replacement> --rename-schema <replacement for all other schemas>

For example, the following commands would all rename schema_1 to schema_2 in the dump output:
pg_dump ... --dbname database_1 --schema schema_1 --rename-schema schema_1:schema_2
pg_dump ... --dbname database_1 --schema schema_1 --rename-schema schema_2
pg_dump ... --dbname database_1 --schema schema_1 --rename-schema *:schema_2

Secondary Feature: pg_restore replacement
Renaming as part of the dump works great when we know the ultimate restore name at the time of dumping.  However, there are times we may want to dump a db to cold storage, and then restore it at a later time to an arbitary schema. Renaming on the restore is a little less precise than on dump as in most cases we are just working with the raw sql statements created by the dump.  So for this use case, I'm proposing a more simplistic text find/replace.  Because of the unstructured nature, it would be on the user to ensure that their replacement string is unique.  For example (again, using the above scenario):

1. dump database_1.schema_1 and use the dump rename feature to rename schema1 to some_very_unique_schema in the dump file
2. restore at a later time doing a simple text replacement:
    pg_restore ... --replace some_very_unique_schema:schema_1


What I have done so far / Next Steps:
I have a working proof of concept built on top of postgres 16.8 (what we currently have in production).  

If the community feels this is a reasonable feature to add, I would like to go through the process of building on top of the release branch and submitting the patch.  However, I wanted to get feedback on whether this is a reasonable feature to pursue first.

Open to any and all feedback. Thanks!!

-Brad





pgsql-hackers by date:

Previous
From: Andrey Borodin
Date:
Subject: Re: IPC/MultixactCreation on the Standby server
Next
From: Mihail Nikalayeu
Date:
Subject: Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY