Automatic tablespace management in pg_basebackup - Mailing list pgsql-hackers
From | Thom Brown |
---|---|
Subject | Automatic tablespace management in pg_basebackup |
Date | |
Msg-id | CAA-aLv7BRCNaqFeWp7Wx6ro74+WpWURyuVEYP0j=HNue1a3DTQ@mail.gmail.com |
List | pgsql-hackers |
Hi,
Manually specifying tablespace mappings in pg_basebackup, especially in environments where tablespaces can come and go, or with incremental backups, can be tedious and error-prone. I propose a solution using pattern-based mapping to automate this process.
So rather than having to specify:
-T /path/to/original/tablespace/a=/path/to/backup/tablespace/a -T /path/to/original/tablespace/b=/path/to/backup/tablespace/b
And then having to come up with new locations to map to for each subsequent incremental backup, perhaps we could have a parameter (I’m just going to choose M for “mapping”), like so:
-M %p/%d_backup_1.1
Where it can interpolate the following values:
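%p = path (the parent directory of the original tablespace)
%d = directory (the last component of the original tablespace path)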
%l = label (not sure about this one)
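To make the intended semantics concrete, here is a minimal Python sketch of how such a pattern might be expanded for a single tablespace. It assumes, based on the examples that follow, that %p is the parent directory of the original tablespace location and %d is its final path component. The expand_mapping name and the %% escape for a literal percent sign are hypothetical additions, not part of any existing tool.

```python
import posixpath


def expand_mapping(pattern: str, tablespace_path: str, label: str = "") -> str:
    """Expand a hypothetical -M pattern for a single tablespace location."""
    path = posixpath.normpath(tablespace_path)
    values = {
        "p": posixpath.dirname(path),   # parent directory of the tablespace
        "d": posixpath.basename(path),  # final path component
        "l": label,                     # backup label given with -l
        "%": "%",                       # assumed escape for a literal '%'
    }
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] != "%":
            out.append(pattern[i])
            i += 1
            continue
        if i + 1 == len(pattern):
            raise ValueError(f"dangling '%' at end of {pattern!r}")
        key = pattern[i + 1]
        if key not in values:
            raise ValueError(f"unknown placeholder %{key} in {pattern!r}")
        out.append(values[key])
        i += 2
    return "".join(out)


print(expand_mapping("%p/%d_backup_1.1", "/path/to/original/tablespace/a"))
# /path/to/original/tablespace/a_backup_1.1
print(expand_mapping("/path/to/backup/tablespaces/%d_%l",
                     "/path/to/original/tablespace/b", label="1.1"))
# /path/to/backup/tablespaces/b_1.1
```

Rejecting unknown placeholders, rather than passing them through, means a typo in the pattern fails loudly instead of silently creating an oddly named directory.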
Using the -M example above, when pg_basebackup finds:
/path/to/original/tablespace/a
/path/to/original/tablespace/b
It creates:
/path/to/original/tablespace/a_backup_1.1
/path/to/original/tablespace/b_backup_1.1
Or:
-M /path/to/backup/tablespaces/1.1/%d
Creates:
/path/to/backup/tablespaces/1.1/a
/path/to/backup/tablespaces/1.1/b
Or possibly something like %l could be allowed, to insert the backup label.
For example:
-M /path/to/backup/tablespaces/%d_%l -l 1.1
Creates:
/path/to/backup/tablespaces/a_1.1
/path/to/backup/tablespaces/b_1.1
This of course would not work if there were tablespaces as follows:
/path/to/first/tablespace/a
/path/to/second/tablespace/a
Here %d would yield the same result for both tablespaces. However, this seems an unlikely scenario: tablespace names within the database must be unique, yet the user would have had to give their directories names that are not. This could simply be a scenario that isn't supported.
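Even if unsupported, the case can be detected up front rather than failing part-way through a backup. A minimal sketch, assuming a pattern that distinguishes tablespaces only by %d (as in the /path/to/backup/tablespaces/1.1/%d example): group the source locations by their final path component and report any name shared by more than one tablespace. find_collisions is a hypothetical helper.

```python
import posixpath
from collections import defaultdict


def find_collisions(tablespaces: list[str]) -> dict[str, list[str]]:
    """Return directory names (%d values) shared by more than one tablespace."""
    by_name = defaultdict(list)
    for location in tablespaces:
        name = posixpath.basename(posixpath.normpath(location))
        by_name[name].append(location)
    # Only names claimed by two or more locations would collide under %d.
    return {name: locs for name, locs in by_name.items() if len(locs) > 1}


clashes = find_collisions([
    "/path/to/first/tablespace/a",
    "/path/to/second/tablespace/a",
    "/path/to/original/tablespace/b",
])
for name, locs in clashes.items():
    print(f"%d would map {len(locs)} tablespaces to '{name}': {locs}")
```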
Perhaps it could even auto-increment a version number that it defines itself. Maybe %v implies “make up a version number here, and if one existed in the manifest previously, increment it”.
Ultimately, it would turn this:
pg_basebackup \
-D /Users/thombrown/Development/backups/data1.5 \
-h /tmp \
-p 5999 \
-c fast \
-U thombrown \
-l 1.5 \
-T /Users/thombrown/Development/tablespaces/ts_a=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_a \
-T /Users/thombrown/Development/tablespaces/ts_b=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_b \
-T /Users/thombrown/Development/tablespaces/ts_c=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_c \
-T /Users/thombrown/Development/tablespaces/ts_d=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_d \
-i /Users/thombrown/Development/backups/data1.4/backup_manifest
Into this:
pg_basebackup \
-D /Users/thombrown/Development/backups/1.5/data \
-h /tmp \
-p 5999 \
-c fast \
-U thombrown \
-l 1.5 \
-M /Users/thombrown/Development/backups/tablespaces/%v/%d \
-i /Users/thombrown/Development/backups/data1.4/backup_manifest
In fact, if I were permitted to get carried away:
-D /Users/thombrown/Development/backups/%v/%d
Then, the only thing that needs changing for each incremental backup is the manifest location (and optionally the label).
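Much of this can be approximated today by a wrapper that expands the pattern into ordinary -T options before invoking pg_basebackup. The sketch below assumes the list of tablespace locations is obtained separately (for example from pg_tablespace_location() on the source server) and that the caller supplies the value for %v; tablespace_args is a hypothetical helper. It escapes any = inside a path as \=, which is how pg_basebackup's -T expects an equals sign within a path to be written, and it refuses mappings that would send two tablespaces to the same target.

```python
import posixpath


def escape_eq(path: str) -> str:
    # pg_basebackup -T splits olddir=newdir on an unescaped '=',
    # so an '=' inside either path has to be written as '\='.
    return path.replace("=", "\\=")


def tablespace_args(pattern: str, locations: list[str], version: str) -> list[str]:
    """Expand a hypothetical -M pattern into explicit -T olddir=newdir options.

    Only %v (version, supplied by the caller) and %d (last path component of
    each original tablespace location) are handled in this sketch.
    """
    args: list[str] = []
    targets: set[str] = set()
    for old in locations:
        name = posixpath.basename(posixpath.normpath(old))
        new = pattern.replace("%v", version).replace("%d", name)
        if new in targets:
            raise ValueError(f"more than one tablespace maps to {new}")
        targets.add(new)
        args += ["-T", f"{escape_eq(old)}={escape_eq(new)}"]
    return args


args = tablespace_args(
    "/Users/thombrown/Development/backups/tablespaces/%v/%d",
    ["/Users/thombrown/Development/tablespaces/ts_a",
     "/Users/thombrown/Development/tablespaces/ts_b"],
    version="1.5",
)
print(args)
```

The resulting list can be appended to an argument vector such as ["pg_basebackup", "-D", datadir, "-l", "1.5", *args] and run with subprocess.run, which passes each argument directly to the program without involving a shell.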
Given that pg_combinebackup has the same -T option, I imagine something similar would need to be added there too. We should already know where the tablespaces reside, as they are in the final backup specified in the list of backups, so it seems to be just a matter of getting input on how the tablespaces should be named in the reconstructed backup.
For example:
pg_combinebackup \
-T /Users/thombrown/Development/backups/tablespaces/1.4/ts_a=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_a \
-T /Users/thombrown/Development/backups/tablespaces/1.4/ts_b=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_b \
-T /Users/thombrown/Development/backups/tablespaces/1.4/ts_c=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_c \
-T /Users/thombrown/Development/backups/tablespaces/1.4/ts_d=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_d \
-o /Users/thombrown/Development/backups/combined \
/Users/thombrown/Development/backups/data{1.0_full,1.1,1.2,1.3,1.4}
Becomes:
pg_combinebackup \
-M /Users/thombrown/Development/backups/tablespaces/%v_combined/%d \
-o /Users/thombrown/Development/backups/%v_combined/%d \
/Users/thombrown/Development/backups/{1.0_full,1.1,1.2,1.3,1.4}/data
You may have inferred that I decided pg_combinebackup increments the version to the next major version, whereas pg_basebackup in incremental mode increments the minor version number.
This, of course, becomes messy if the user decided to include the version number in the backup tablespace directory name, but then these sorts of things need to be figured out prior to placing into production anyway.
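The version rule described above might look something like the following minimal sketch. It assumes the previous backup's version is available as a "major.minor" string (where that would be recorded is left open here) and that a first backup with no predecessor starts at 1.0; next_version is a hypothetical name.

```python
from typing import Optional


def next_version(previous: Optional[str], combine: bool) -> str:
    """Hypothetical %v rule.

    pg_basebackup (incremental): bump the minor number, e.g. 1.4 -> 1.5.
    pg_combinebackup: bump the major number and reset the minor, e.g. 1.4 -> 2.0.
    """
    if previous is None:
        return "1.0"  # assumed starting point when there is no earlier backup
    major_s, _, minor_s = previous.partition(".")
    major, minor = int(major_s), int(minor_s or "0")
    if combine:
        return f"{major + 1}.0"
    return f"{major}.{minor + 1}"


print(next_version("1.4", combine=False))  # 1.5
print(next_version("1.4", combine=True))   # 2.0
```

Treating the minor number as an integer means 1.9 is followed by 1.10 rather than 2.0, so only pg_combinebackup ever advances the major number.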
I also get the feeling that accepting an unquoted % in a command-line argument could be problematic, for example if it has a special meaning I haven't accounted for here, in which case it may require quoting.
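For POSIX shells, at least, one quick check is Python's shlex.quote(), which leaves an argument unchanged only when every character is in a conservative "safe" set, and that set includes %. It therefore leaves a pattern like %p/%d_backup_1.1 untouched, which suggests % needs no quoting there. Other shells, such as Windows cmd.exe, do treat % specially, so this is not the whole answer. needs_quoting is a hypothetical helper.

```python
import shlex


def needs_quoting(arg: str) -> bool:
    # shlex.quote() returns the argument unchanged if every character is in
    # its conservative safe set (which includes '%'); otherwise it wraps the
    # argument in single quotes.
    return shlex.quote(arg) != arg


print(needs_quoting("%p/%d_backup_1.1"))         # False
print(needs_quoting("/path with spaces/%d_%l"))  # True
```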
Thoughts?
Regards
Thom