Thread: Restoring only a subset of schemas

Restoring only a subset of schemas

From
Sylvain Cuaz
Date:
Hi all,

     I have a DB with one schema named "Common" holding data referenced by other schemas. All other 
schemas have the same structure (tables and fields) and are named "cXXX" where XXX is just an int. 
Thus the only cross-schema foreign keys are in "cXXX" pointing to "Common", and each "cXXX" is 
completely independent of other "cXXX" schemas.
     Now if I want to restore from a full dump of this DB, but with only one "cXXX" and the "Common" 
schema :
- if I pass --create --schema=Common, then the CREATE SCHEMA is missing, i.e. it only emits data 
inside "Common" and the restore fails.
- if I could pass --create --exclude-schema='c*' (fictional notation as patterns are only recognized 
by pg_dump), then all schemas would be created, with no data inside except for "Common". Creating 
all schemas is a waste of time, but more importantly would make restoring other schemas more 
difficult (e.g. rows should be inserted before creating foreign keys).
Note : to check the behaviour of pg_restore above, I pass -f- to check the SQL as it is far quicker 
than to actually restore a DB.

Maybe a new --include-create-schema option should be added to emit CREATE SCHEMA in addition to 
objects inside it ? That way I could :
1. --create --include-create-schema --schema=Common and have a DB with all DB-level properties 
(DEFAULT PRIVILEGES, COMMENT, SET parameter, etc.) and one schema with all of its data and 
schema-level properties (DEFAULT PRIVILEGES, COMMENT, GRANT USAGE).
2. then at any point later without the --create, with as many schemas I need :  
--include-create-schema --schema=cXXX. And if I need to reset a "cXXX" schema, just manually DROP 
SCHEMA and restore again.

Similarly, maybe add --exclude-create-schema to additionally exclude CREATE SCHEMA for schemas 
targeted by --exclude-schema.

IOW --schema and --exclude-schema both target objects inside schemas, these 2 new options would 
allow to also have control on the schemas themselves (and their properties like DEFAULT PRIVILEGES, 
COMMENT, etc.)

Cheers,

Sylvain




Re: Restoring only a subset of schemas

From
Adrian Klaver
Date:
On 3/17/25 07:57, Sylvain Cuaz wrote:
> Hi all,
> 
>      I have a DB with one schema named "Common" holding data referenced 
> by other schemas. All other schemas have the same structure (tables and 
> fields) and are named "cXXX" where XXX is just an int. Thus the only 
> cross-schema foreign keys are in "cXXX" pointing to "Common", and each 
> "cXXX" is completely independent of other "cXXX" schemas.
>      Now if I want to restore from a full dump of this DB, but with only 
> one "cXXX" and the "Common" schema :
> - if I pass --create --schema=Common, then the CREATE SCHEMA is missing, 
> i.e. it only emits data inside "Common" and the restore fails.

I am not seeing that.

For:

pg_dump -d test -U postgres -s --create --schema=other_sch --schema=public

In the output I get:


[...]

CREATE SCHEMA other_sch;


ALTER SCHEMA other_sch OWNER TO postgres;

--
-- Name: public; Type: SCHEMA; Schema: -; Owner: pg_database_owner
--

CREATE SCHEMA public;


ALTER SCHEMA public OWNER TO pg_database_owner;

[...]

What is the complete command you are using for the pg_dump?

What Postgres version(s) are you using?


> 
> Cheers,
> 
> Sylvain
> 
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Restoring only a subset of schemas

From
Tom Lane
Date:
Sylvain Cuaz <sylvain@ilm-informatique.fr> writes:
>      Now if I want to restore from a full dump of this DB, but with only one "cXXX" and the "Common"
> schema :
> - if I pass --create --schema=Common, then the CREATE SCHEMA is missing, i.e. it only emits data
> inside "Common" and the restore fails.
> - if I could pass --create --exclude-schema='c*' (fictional notation as patterns are only recognized
> by pg_dump), then all schemas would be created, with no data inside except for "Common". Creating
> all schemas is a waste of time, but more importantly would make restoring other schemas more
> difficult (e.g. rows should be inserted before creating foreign keys).

In general, the solution for edge-case restore selection needs is to
make a list of the dump's contents with "pg_restore -l", edit out what
you don't want using any method you like, then use the edited list with
"pg_restore -L".

While I'd be in favor of improving pg_restore to accept wild-card
patterns, I'm very hesitant to start inventing new kinds of selection
switches for it.  The interactions between such switches would be a
mess.

            regards, tom lane



Re: Restoring only a subset of schemas

From
Sylvain Cuaz
Date:
Le 17/03/2025 à 16:21, Adrian Klaver a écrit :
> On 3/17/25 07:57, Sylvain Cuaz wrote:
>> Hi all,
>>
>>      I have a DB with one schema named "Common" holding data referenced by other schemas. All 
>> other schemas have the same structure (tables and fields) and are named "cXXX" where XXX is just 
>> an int. Thus the only cross-schema foreign keys are in "cXXX" pointing to "Common", and each 
>> "cXXX" is completely independent of other "cXXX" schemas.
>>      Now if I want to restore from a full dump of this DB, but with only one "cXXX" and the 
>> "Common" schema :
>> - if I pass --create --schema=Common, then the CREATE SCHEMA is missing, i.e. it only emits data 
>> inside "Common" and the restore fails.
>
> I am not seeing that.
>
> For:
>
> pg_dump -d test -U postgres -s --create --schema=other_sch --schema=public
>
> What is the complete command you are using for the pg_dump?

Hi,

     As I said I'm restoring, not dumping. I make daily full backups and sometimes need to restore a 
specific day, but the full database is quite big and I would like to only restore one or two schemas.

As you said, if one passes --create --schema to pg_dump then a valid SQL is produced with CREATE 
DATABASE, CREATE SCHEMA, CREATE TABLE and all objects inside the schema.

But if one passes --create --schema to pg_restore then an invalid SQL is produced because it 
contains CREATE DATABASE, CREATE TABLE but it doesn't contain the CREATE SCHEMA needed for the 
tables. Is there any reason for that discrepancy between dump & restore and for outputting invalid 
SQL ?

My proposed --include-create-schema would just add the CREATE SCHEMA so that pg_restore behaves the 
same as pg_dump, and would allow to output valid SQL. But ideally this option shouldn't even be 
needed because pg_restore would just emit CREATE SCHEMA like pg_dump.

>
> What Postgres version(s) are you using? 

A lot :-) But for this problem I'm using 13 & 15.


Cheers,

Sylvain.




Re: Restoring only a subset of schemas

From
Sylvain Cuaz
Date:
Le 17/03/2025 à 16:29, Tom Lane a écrit :
> Sylvain Cuaz <sylvain@ilm-informatique.fr> writes:
>>       Now if I want to restore from a full dump of this DB, but with only one "cXXX" and the "Common"
>> schema :
>> - if I pass --create --schema=Common, then the CREATE SCHEMA is missing, i.e. it only emits data
>> inside "Common" and the restore fails.
>> - if I could pass --create --exclude-schema='c*' (fictional notation as patterns are only recognized
>> by pg_dump), then all schemas would be created, with no data inside except for "Common". Creating
>> all schemas is a waste of time, but more importantly would make restoring other schemas more
>> difficult (e.g. rows should be inserted before creating foreign keys).
> In general, the solution for edge-case restore selection needs is to
> make a list of the dump's contents with "pg_restore -l", edit out what
> you don't want using any method you like, then use the edited list with
> "pg_restore -L".

Hi,

     I am aware of that feature, but that forces me to know every type of entry that pertains to a 
schema or database (e.g. DEFAULT ACL, ACL, COMMENT, DATABASE PROPERTIES, etc.) and what about new 
ones that will be added in the future ?

Further, I don't see how it's an edge-case, at the core I just want to restore some but not all the 
schemas. This is possible for pg_dump, see my response to Adrian Klaver.

> While I'd be in favor of improving pg_restore to accept wild-card
> patterns,
That would definitely be appreciated.
>   I'm very hesitant to start inventing new kinds of selection
> switches for it.  The interactions between such switches would be a
> mess.

Which interactions ? It seems to me that the name of the schema should be used as the namespace to 
check in _tocEntryRequired() in pg_backup_archiver.c, and then the dependent entries (e.g. ACL, 
COMMENT) would be handled around line 3050. I've attached a patch with some pseudo-code. In fact, 
were it not for compatibility, I'd argue that my proposed options should be the default, at least 
with --create, so as to neither output invalid SQL (for -n) nor extra unwanted ones (for -N) and to 
behave like pg_dump.

Cheers,

Sylvain


Attachment

Re: Restoring only a subset of schemas

From
Tom Lane
Date:
Sylvain Cuaz <sylvain@ilm-informatique.fr> writes:
> Further, I don't see how it's an edge-case, at the core I just want to restore some but not all the
> schemas. This is possible for pg_dump, see my response to Adrian Klaver.

You have a very good point that it's annoying that pg_restore's
--schema switch doesn't act like pg_dump's --schema switch.
I could support changing that, but only if we also did something
about the fact that pg_restore's switch doesn't handle patterns.

The pattern aspect is kind of a mess, because pg_dump implements
that through a server-side regex, which is a facility pg_restore
doesn't have access to.  Maybe it'd be good enough to implement
the "*" and "?" wildcards and stop there, but I'm not sure.

            regards, tom lane