Thread: Regression test PANICs with master-standby setup on same machine
Hello hackers, With a master-standby setup configured on the same machine, I'm getting a panic in tablespace test while running make installcheck. 1. CREATE TABLESPACE regress_tblspacewith LOCATION 'blah'; 2. DROP TABLESPACE regress_tblspacewith; 3. CREATE TABLESPACE regress_tblspace LOCATION 'blah'; -- do some operations in this tablespace 4. DROP TABLESPACE regress_tblspace; The master panics at the last statement when standby has completed applying the WAL up to step 2 but hasn't started step 3. PANIC: could not fsync file "pg_tblspc/16387/PG_12_201904072/16384/16446": No such file or directory The reason is both the tablespace points to the same location. When master tries to delete the new tablespace (and requests a checkpoint), the corresponding folder is already deleted by the standby while applying WAL to delete the old tablespace. I'm able to reproduce the issue with the attached script. sh standby-server-setup.sh make installcheck I accept that configuring master-standby on the same machine for this test is not okay. But, can we avoid the PANIC somehow? Or, is this intentional and I should not include testtablespace in this case? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
Attachment
Hello. At Mon, 22 Apr 2019 15:52:59 +0530, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote in <CAGz5QC+j1BDq7onp6H8Cye-ahD2zS1dLttp-dEuEoZStEjxq5Q@mail.gmail.com> > Hello hackers, > > With a master-standby setup configured on the same machine, I'm > getting a panic in tablespace test while running make installcheck. > > 1. CREATE TABLESPACE regress_tblspacewith LOCATION 'blah'; > 2. DROP TABLESPACE regress_tblspacewith; > 3. CREATE TABLESPACE regress_tblspace LOCATION 'blah'; > -- do some operations in this tablespace > 4. DROP TABLESPACE regress_tblspace; > > The master panics at the last statement when standby has completed > applying the WAL up to step 2 but hasn't started step 3. > PANIC: could not fsync file > "pg_tblspc/16387/PG_12_201904072/16384/16446": No such file or > directory > > The reason is both the tablespace points to the same location. When > master tries to delete the new tablespace (and requests a checkpoint), > the corresponding folder is already deleted by the standby while > applying WAL to delete the old tablespace. I'm able to reproduce the > issue with the attached script. > > sh standby-server-setup.sh > make installcheck > > I accept that configuring master-standby on the same machine for this > test is not okay. But, can we avoid the PANIC somehow? Or, is this > intentional and I should not include testtablespace in this case? If you don't have a problem using TAP test suite, tablespace is allowed with a bit restricted steps using the first patch in my just posted patchset[1]. This will work for you if you are okay with creating a standby after creating a tablespace. See the second patch in the patchset. If you stick on shell script, the following steps allow tablespaces. 1. Create tablespace directories for both master and standby. 2. Create a master then start. 3. Create tablespaces on the master. 4. Create a standby using pg_basebackup --tablespace_mapping=<mstdir>=<sbydir> 5. Start the standby. [1] https://www.postgresql.org/message-id/20190422.211933.156769089.horiguchi.kyotaro@lab.ntt.co.jp regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Apr 22, 2019 at 6:07 PM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > If you don't have a problem using TAP test suite, tablespace is > allowed with a bit restricted steps using the first patch in my > just posted patchset[1]. This will work for you if you are okay > with creating a standby after creating a tablespace. See the > second patch in the patchset. > > If you stick on shell script, the following steps allow tablespaces. > > 1. Create tablespace directories for both master and standby. > 2. Create a master then start. > 3. Create tablespaces on the master. > 4. Create a standby using pg_basebackup --tablespace_mapping=<mstdir>=<sbydir> > 5. Start the standby. > > [1] https://www.postgresql.org/message-id/20190422.211933.156769089.horiguchi.kyotaro@lab.ntt.co.jp > Thank you for the info. I'll try the same. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 22, 2019 at 03:52:59PM +0530, Kuntal Ghosh wrote: > I accept that configuring master-standby on the same machine for this > test is not okay. But, can we avoid the PANIC somehow? Or, is this > intentional and I should not include testtablespace in this case? Well, it is a bit more than "not okay", as the primary and the standby step on each other's toe because they are trying to use the same tablespace path. The PANIC is also expected as that's what we want with data_sync_retry = off, which is the default, and the wanted behavior to PANIC immediately and enforce WAL recovery should a fsync fail. Obviously, not being able to have transparent tablespace handling for multiple nodes on the same host is a problem, though this implies grammar changes for CREATE TABLESPACE or having a sort of node name handling which makes the experience trouble-less. Still there is the argument that not all users would want both instances to use the same tablespace path. So the problem is not as simple as it looks, and the cost of solving it is not worth the use cases either. -- Michael
Attachment
At Tue, 23 Apr 2019 11:27:06 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190423022706.GG2712@paquier.xyz> > On Mon, Apr 22, 2019 at 03:52:59PM +0530, Kuntal Ghosh wrote: > > I accept that configuring master-standby on the same machine for this > > test is not okay. But, can we avoid the PANIC somehow? Or, is this > > intentional and I should not include testtablespace in this case? > > Well, it is a bit more than "not okay", as the primary and the > standby step on each other's toe because they are trying to use the > same tablespace path. The PANIC is also expected as that's what we > want with data_sync_retry = off, which is the default, and the wanted > behavior to PANIC immediately and enforce WAL recovery should a fsync > fail. Obviously, not being able to have transparent tablespace > handling for multiple nodes on the same host is a problem, though this > implies grammar changes for CREATE TABLESPACE or having a sort of > node name handling which makes the experience trouble-less. Still > there is the argument that not all users would want both instances to > use the same tablespace path. So the problem is not as simple as it > looks, and the cost of solving it is not worth the use cases either. We could easily adopt a jail or chroot like feature to tablespace paths. Suppose a new GUC(!), say, tablespace_chroot and the value can contain replacements like %a, %p, %h, we would set the variable as: tablespace_chroot = '/somewhere/%p'; then the tablespace location is prefixed by '/somewhere/5432' for the first server, '/somehwere/5433' for the second. I think this is rahter a testing or debugging feature. This can be apply to all paths, so the variable might be "path_prefix" or something more generic than tablespace_chroot. Does it make sense? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 23 Apr 2019 13:33:39 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20190423.133339.113770648.horiguchi.kyotaro@lab.ntt.co.jp> > At Tue, 23 Apr 2019 11:27:06 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190423022706.GG2712@paquier.xyz> > > On Mon, Apr 22, 2019 at 03:52:59PM +0530, Kuntal Ghosh wrote: > > > I accept that configuring master-standby on the same machine for this > > > test is not okay. But, can we avoid the PANIC somehow? Or, is this > > > intentional and I should not include testtablespace in this case? > > > > Well, it is a bit more than "not okay", as the primary and the > > standby step on each other's toe because they are trying to use the > > same tablespace path. The PANIC is also expected as that's what we > > want with data_sync_retry = off, which is the default, and the wanted > > behavior to PANIC immediately and enforce WAL recovery should a fsync > > fail. Obviously, not being able to have transparent tablespace > > handling for multiple nodes on the same host is a problem, though this > > implies grammar changes for CREATE TABLESPACE or having a sort of > > node name handling which makes the experience trouble-less. Still > > there is the argument that not all users would want both instances to > > use the same tablespace path. So the problem is not as simple as it > > looks, and the cost of solving it is not worth the use cases either. > > We could easily adopt a jail or chroot like feature to tablespace > paths. Suppose a new GUC(!), say, tablespace_chroot and the value > can contain replacements like %a, %p, %h, we would set the > variable as: > > tablespace_chroot = '/somewhere/%p'; > > then the tablespace location is prefixed by '/somewhere/5432' for > the first server, '/somehwere/5433' for the second. > > I think this is rahter a testing or debugging feature. This can > be apply to all paths, so the variable might be "path_prefix" or all paths out of $PGDATA directory. tablespace locations, log_directory and stats_temp_directory? > something more generic than tablespace_chroot. > > Does it make sense? -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Apr 23, 2019 at 01:33:39PM +0900, Kyotaro HORIGUCHI wrote: > I think this is rahter a testing or debugging feature. This can > be apply to all paths, so the variable might be "path_prefix" or > something more generic than tablespace_chroot. > > Does it make sense? A GUC which enforces object creation does not sound like a good idea to me, and what you propose would still bite back, for example two local nodes could use the same port, but a different Unix socket path. -- Michael
Attachment
Hi, On 2019-04-22 15:52:59 +0530, Kuntal Ghosh wrote: > Hello hackers, > > With a master-standby setup configured on the same machine, I'm > getting a panic in tablespace test while running make installcheck. > > 1. CREATE TABLESPACE regress_tblspacewith LOCATION 'blah'; > 2. DROP TABLESPACE regress_tblspacewith; > 3. CREATE TABLESPACE regress_tblspace LOCATION 'blah'; > -- do some operations in this tablespace > 4. DROP TABLESPACE regress_tblspace; > > The master panics at the last statement when standby has completed > applying the WAL up to step 2 but hasn't started step 3. > PANIC: could not fsync file > "pg_tblspc/16387/PG_12_201904072/16384/16446": No such file or > directory > > The reason is both the tablespace points to the same location. When > master tries to delete the new tablespace (and requests a checkpoint), > the corresponding folder is already deleted by the standby while > applying WAL to delete the old tablespace. I'm able to reproduce the > issue with the attached script. > > sh standby-server-setup.sh > make installcheck > > I accept that configuring master-standby on the same machine for this > test is not okay. But, can we avoid the PANIC somehow? Or, is this > intentional and I should not include testtablespace in this case? FWIW, I think the right fix for this is to simply drop the requirement that tablespace paths need to be absolute. It's not buying us anything, it's just making things more complicated. We should just do a simple check against the tablespace being inside PGDATA, and leave it at that. Yes, that can be tricked, but so can the current system. That'd make both regression tests easier, as well as operations. Greetings, Andres Freund
On Mon, Apr 22, 2019 at 11:00:03PM -0700, Andres Freund wrote: > FWIW, I think the right fix for this is to simply drop the requirement > that tablespace paths need to be absolute. It's not buying us anything, > it's just making things more complicated. We should just do a simple > check against the tablespace being inside PGDATA, and leave it at > that. Yes, that can be tricked, but so can the current system. convert_and_check_filename() checks after that already, mostly. For TAP tests I am not sure that this would help much though as all the nodes of a given test use the same root path for their data folders, so you cannot just use "../hoge/" as location. We already generate a warning when a tablespace is in a data folder, as this causes issues with recursion lookups of base backups. What do you mean in this case? Forbidding the behavior? -- Michael
Attachment
At Tue, 23 Apr 2019 14:53:28 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190423055328.GK2712@paquier.xyz> > On Tue, Apr 23, 2019 at 01:33:39PM +0900, Kyotaro HORIGUCHI wrote: > > I think this is rahter a testing or debugging feature. This can > > be apply to all paths, so the variable might be "path_prefix" or > > something more generic than tablespace_chroot. > > > > Does it make sense? > > A GUC which enforces object creation does not sound like a good idea > to me, and what you propose would still bite back, for example two > local nodes could use the same port, but a different Unix socket > path. It's not necessarily be port number, but I agree that it's not a good idea. I prefer allowing relative paths for tablespaces. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 23 Apr 2019 16:08:18 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190423070818.GM2712@paquier.xyz> > On Mon, Apr 22, 2019 at 11:00:03PM -0700, Andres Freund wrote: > > FWIW, I think the right fix for this is to simply drop the requirement > > that tablespace paths need to be absolute. It's not buying us anything, > > it's just making things more complicated. We should just do a simple > > check against the tablespace being inside PGDATA, and leave it at > > that. Yes, that can be tricked, but so can the current system. > > convert_and_check_filename() checks after that already, mostly. For > TAP tests I am not sure that this would help much though as all the > nodes of a given test use the same root path for their data folders, > so you cannot just use "../hoge/" as location. We already generate a > warning when a tablespace is in a data folder, as this causes issues > with recursion lookups of base backups. What do you mean in this > case? Forbidding the behavior? Isn't it good enough just warning when we see pg_tblspc twice while scanning? The check is not perfect for an "abosolute path" that continas '/./' above pgdata directory. For TAP tests, we can point generated temporary directory by "../../<tmpdirsname>". regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 23 Apr 2019 17:44:18 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20190423.174418.262292011.horiguchi.kyotaro@lab.ntt.co.jp> > At Tue, 23 Apr 2019 16:08:18 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190423070818.GM2712@paquier.xyz> > > On Mon, Apr 22, 2019 at 11:00:03PM -0700, Andres Freund wrote: > > > FWIW, I think the right fix for this is to simply drop the requirement > > > that tablespace paths need to be absolute. It's not buying us anything, > > > it's just making things more complicated. We should just do a simple > > > check against the tablespace being inside PGDATA, and leave it at > > > that. Yes, that can be tricked, but so can the current system. > > > > convert_and_check_filename() checks after that already, mostly. For > > TAP tests I am not sure that this would help much though as all the > > nodes of a given test use the same root path for their data folders, > > so you cannot just use "../hoge/" as location. We already generate a > > warning when a tablespace is in a data folder, as this causes issues > > with recursion lookups of base backups. What do you mean in this > > case? Forbidding the behavior? > > Isn't it good enough just warning when we see pg_tblspc twice > while scanning? The check is not perfect for an "abosolute path" > that continas '/./' above pgdata directory. I don't get basebackup recurse. How can I do that? basebackup rejects non-empty direcoty as a tablespace directory. I'm not sure about pg_upgrade but it's not a problem as far as we keep waning on that kind of tablespace directory. So I propose this: - Allow relative path as a tablespace direcotry in exchange for issueing WARNING. =# CREATE TABLESPACE ts1 LOCATION '../../hoge'; "WARNING: tablespace location should be in absolute path" - For abosolute paths, we keep warning as before. Of course we don't bother '.' and '..'. =# CREATE TABLESPACE ts1 LOCATION '/home/.../data'; "WARNING: tablespace location should not be in the data directory" > For TAP tests, we can point generated temporary directory by > "../../<tmpdirsname>". regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 23 Apr 2019 19:00:54 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20190423.190054.262966274.horiguchi.kyotaro@lab.ntt.co.jp> > > For TAP tests, we can point generated temporary directory by > > "../../<tmpdirsname>". Wrong. A generating tmpdir (how?) in "../" (that is, in the node direcotry) would work. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Hi, On 2019-04-23 16:08:18 +0900, Michael Paquier wrote: > On Mon, Apr 22, 2019 at 11:00:03PM -0700, Andres Freund wrote: > > FWIW, I think the right fix for this is to simply drop the requirement > > that tablespace paths need to be absolute. It's not buying us anything, > > it's just making things more complicated. We should just do a simple > > check against the tablespace being inside PGDATA, and leave it at > > that. Yes, that can be tricked, but so can the current system. > > convert_and_check_filename() checks after that already, mostly. For > TAP tests I am not sure that this would help much though as all the > nodes of a given test use the same root path for their data folders, > so you cannot just use "../hoge/" as location. I don't see the problem here. Putting the primary and standby PGDATAs into a subdirectory that also can contain a relatively referenced tablespace seems trivial? > I'm not We already generate a warning when a tablespace is in a data > folder, as this causes issues with recursion lookups of base backups. > What do you mean in this case? Forbidding the behavior? -- Michael I mostly am talking about replacing Oid CreateTableSpace(CreateTableSpaceStmt *stmt) { ... /* * Allowing relative paths seems risky * * this also helps us ensure that location is not empty or whitespace */ if (!is_absolute_path(location)) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("tablespace location must be an absolute path"))); with a check that forces relative paths to be outside of PGDATA (baring symlinks). As far as I can tell convert_and_check_filename() would check just about the opposite. Greetings, Andres Freund
At Tue, 23 Apr 2019 10:05:03 -0700, Andres Freund <andres@anarazel.de> wrote in <20190423170503.uw5jxrujqlozg23l@alap3.anarazel.de> > Hi, > > On 2019-04-23 16:08:18 +0900, Michael Paquier wrote: > > On Mon, Apr 22, 2019 at 11:00:03PM -0700, Andres Freund wrote: > > > FWIW, I think the right fix for this is to simply drop the requirement > > > that tablespace paths need to be absolute. It's not buying us anything, > > > it's just making things more complicated. We should just do a simple > > > check against the tablespace being inside PGDATA, and leave it at > > > that. Yes, that can be tricked, but so can the current system. > > > > convert_and_check_filename() checks after that already, mostly. For > > TAP tests I am not sure that this would help much though as all the > > nodes of a given test use the same root path for their data folders, > > so you cannot just use "../hoge/" as location. > > I don't see the problem here. Putting the primary and standby PGDATAs > into a subdirectory that also can contain a relatively referenced > tablespace seems trivial? > > > I'm not We already generate a warning when a tablespace is in a data > > folder, as this causes issues with recursion lookups of base backups. > > What do you mean in this case? Forbidding the behavior? -- Michael > > I mostly am talking about replacing > > Oid > CreateTableSpace(CreateTableSpaceStmt *stmt) > { > ... > /* > * Allowing relative paths seems risky > * > * this also helps us ensure that location is not empty or whitespace > */ > if (!is_absolute_path(location)) > ereport(ERROR, > (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), > errmsg("tablespace location must be an absolute path"))); > > with a check that forces relative paths to be outside of PGDATA (baring > symlinks). As far as I can tell convert_and_check_filename() would check > just about the opposite. We need to adjust relative path between PGDATA-based and pg_tblspc based. The attached first patch does that. - I'm not sure it is OK to use getcwd this way. The second attached is TAP change to support tablespaces using relative tablespaces. One issue here is is_in_data_directory canonicalizes DataDir on-the-fly. It is needed when DataDir contains '/./' or such. I think the canonicalization should be done far earlier. - This is tentative or sample. I'll visit the current discussion thread. The third is test for this issue. - Tablespace handling gets easier. The fourth is the fix for the issue here. - Not all possible simliar issue is not checked. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 714702a6abfe1987eab98020c799b845917af156 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Wed, 24 Apr 2019 11:50:41 +0900 Subject: [PATCH 1/4] Allow relative tablespace location paths Currently relative paths are not allowed as tablespace directory. That makes tests about tablespaces bothersome but doens't prevent them points into the data directory perfectly. This patch allows relative direcotry based on the data directory as tablespace directory. Then makes the check more strict. --- src/backend/access/transam/xlog.c | 8 ++++ src/backend/commands/tablespace.c | 76 ++++++++++++++++++++++++------- src/backend/replication/basebackup.c | 7 +++ src/backend/utils/adt/misc.c | 7 +++ src/bin/pg_basebackup/pg_basebackup.c | 56 +++++++++++++++++++++-- src/test/regress/input/tablespace.source | 3 ++ src/test/regress/output/tablespace.source | 5 +- 7 files changed, 142 insertions(+), 20 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index c00b63c751..abc2fd951f 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -10419,6 +10419,14 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, } linkpath[rllen] = '\0'; + /* + * In the relative path cases, the link target is always prefixed + * by "../" to convert from data directory-based. So we just do + * the reverse here. + */ + if (strncmp(s, "../", 3) == 0) + s += 3; + /* * Add the escape character '\\' before newline in a string to * ensure that we can distinguish between the newline in the diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c index 3784ea4b4f..66a70871e6 100644 --- a/src/backend/commands/tablespace.c +++ b/src/backend/commands/tablespace.c @@ -94,6 +94,42 @@ static void create_tablespace_directories(const char *location, const Oid tablespaceoid); static bool destroy_tablespace_directories(Oid tablespaceoid, bool redo); +/* + * Check if the path is in the data directory strictly. + */ +static bool +is_in_data_directory(const char *path) +{ + char cwd[MAXPGPATH]; + char abspath[MAXPGPATH]; + char absdatadir[MAXPGPATH]; + + getcwd(cwd, MAXPGPATH); + if (chdir(path) < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid directory \"%s\": %m", path))); + + /* getcwd is defined as returning absolute path */ + getcwd(abspath, MAXPGPATH); + + /* DataDir needs to be canonicalized */ + if (chdir(DataDir)) + ereport(FATAL, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("could not chdir to the data directory \"%s\": %m", + DataDir))); + getcwd(absdatadir, MAXPGPATH); + + /* this must succeed */ + if (chdir(cwd)) + ereport(FATAL, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("could not chdir to the current working directory \"%s\": %m", + cwd))); + + return path_is_prefix_of_path(absdatadir, abspath); +} /* * Each database using a table space is isolated into its own name space @@ -267,35 +303,26 @@ CreateTableSpace(CreateTableSpaceStmt *stmt) (errcode(ERRCODE_INVALID_NAME), errmsg("tablespace location cannot contain single quotes"))); - /* - * Allowing relative paths seems risky - * - * this also helps us ensure that location is not empty or whitespace - */ - if (!is_absolute_path(location)) + /* Reject tablespaces in the data directory. */ + if (is_in_data_directory(location)) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), - errmsg("tablespace location must be an absolute path"))); + errmsg("tablespace location must not be inside the data directory"))); /* * Check that location isn't too long. Remember that we're going to append - * 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'. FYI, we never actually + * 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'. In the relative path case, we + * attach "../" at the beginning of the path. FYI, we never actually * reference the whole path here, but MakePGDirectory() uses the first two * parts. */ - if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + + if (3 + strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1 + OIDCHARS > MAXPGPATH) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("tablespace location \"%s\" is too long", location))); - /* Warn if the tablespace is in the data directory. */ - if (path_is_prefix_of_path(DataDir, location)) - ereport(WARNING, - (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), - errmsg("tablespace location should not be inside the data directory"))); - /* * Disallow creation of tablespaces named "pg_xxx"; we reserve this * namespace for system purposes. @@ -570,7 +597,9 @@ static void create_tablespace_directories(const char *location, const Oid tablespaceoid) { char *linkloc; + char *linktarget; char *location_with_version_dir; + bool free_linktarget = false; struct stat st; linkloc = psprintf("pg_tblspc/%u", tablespaceoid); @@ -639,13 +668,28 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid) /* * Create the symlink under PGDATA + + * Relative tablespace location is based on PGDATA directory. The symlink + * is created in PGDATA/pg_tblspc. So adjust relative paths by attaching + * "../" at the head. */ - if (symlink(location, linkloc) < 0) + if (is_absolute_path(location)) + linktarget = unconstify(char *, location); + else + { + linktarget = psprintf("../%s", location); + free_linktarget = true; + } + + if (symlink(linktarget, linkloc) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create symbolic link \"%s\": %m", linkloc))); + if (free_linktarget) + pfree(linktarget); + pfree(linkloc); pfree(location_with_version_dir); } diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 36dcb28754..82ff4703d9 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1238,6 +1238,13 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces, pathbuf))); linkpath[rllen] = '\0'; + /* + * Relative link target is always prefixed by "../". We need to + * remove it to obtain a relative tablespace directory. + */ + if (strncmp(linkpath, "../", 3) == 0) + memmove(linkpath, linkpath + 3, strlen(linkpath) + 1 - 3); + size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath, &statbuf, sizeonly); #else diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index d330a88e3c..59f987e5ae 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -330,6 +330,13 @@ pg_tablespace_location(PG_FUNCTION_ARGS) sourcepath))); targetpath[rllen] = '\0'; + /* + * Relative target paths are always prefixed by "../". We need to remove + * it to obtain tablespace directory. + */ + if (strncmp(targetpath, "../", 3) == 0) + PG_RETURN_TEXT_P(cstring_to_text(targetpath + 3)); + PG_RETURN_TEXT_P(cstring_to_text(targetpath)); #else ereport(ERROR, diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 1a735b8046..9fb73bfd8a 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -1399,9 +1399,20 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum) if (basetablespace) strlcpy(current_path, basedir, sizeof(current_path)); else - strlcpy(current_path, - get_tablespace_mapping(PQgetvalue(res, rownum, 1)), - sizeof(current_path)); + { + const char *path = get_tablespace_mapping(PQgetvalue(res, rownum, 1)); + + if (is_absolute_path(path)) + strlcpy(current_path, path, sizeof(current_path)); + else + { + /* Relative tablespace locations need to be prefixed by basedir. */ + strlcpy(current_path, basedir, sizeof(current_path)); + if (current_path[strlen(current_path) - 1] != '/') + strlcat(current_path, "/", sizeof(current_path)); + strlcat(current_path, path, sizeof(current_path)); + } + } /* * Get the COPY data @@ -1525,15 +1536,34 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum) * are, you can call it an undocumented feature that you * can map them too.) */ + bool free_path = false; + filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */ mapped_tblspc_path = get_tablespace_mapping(©buf[157]); + if (!is_absolute_path(mapped_tblspc_path)) + { + /* + * Convert relative tablespace location based on data + * directory into path link target based on pg_tblspc + * directory. + */ + char *p = malloc(3 + strlen(mapped_tblspc_path) + 1); + + strcpy(p, "../"); + strcat(p, mapped_tblspc_path); + mapped_tblspc_path = p; + free_path = true; + } if (symlink(mapped_tblspc_path, filename) != 0) { pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m", filename, mapped_tblspc_path); exit(1); } + + if (free_path) + free(unconstify(char *, mapped_tblspc_path)); } else { @@ -1957,8 +1987,28 @@ BaseBackup(void) if (format == 'p' && !PQgetisnull(res, i, 1)) { char *path = unconstify(char *, get_tablespace_mapping(PQgetvalue(res, i, 1))); + bool freeit = false; + + if (!is_absolute_path(path)) + { + /* + * Relative tablespace locations need to be converted + * attaching basedir. + */ + char *p = malloc(strlen(basedir) + 1 + strlen(path) + 1); + + strcpy(p, basedir); + if (p[strlen(p) - 1] != '/') + strcat(p, "/"); + strcat(p, path); + path = p; + freeit = true; + } verify_dir_is_empty_or_create(path, &made_tablespace_dirs, &found_tablespace_dirs); + + if (freeit) + free(path); } } diff --git a/src/test/regress/input/tablespace.source b/src/test/regress/input/tablespace.source index 14ce0e7e04..cf596d607f 100644 --- a/src/test/regress/input/tablespace.source +++ b/src/test/regress/input/tablespace.source @@ -125,6 +125,9 @@ SELECT COUNT(*) FROM testschema.atable; -- checks heap -- Will fail with bad path CREATE TABLESPACE regress_badspace LOCATION '/no/such/location'; +-- Will fail with data directory +CREATE TABLESPACE regress_badspace LOCATION '.'; + -- No such tablespace CREATE TABLE bar (i int) TABLESPACE regress_nosuchspace; diff --git a/src/test/regress/output/tablespace.source b/src/test/regress/output/tablespace.source index 8ebe08b9b2..c404d599f0 100644 --- a/src/test/regress/output/tablespace.source +++ b/src/test/regress/output/tablespace.source @@ -257,7 +257,10 @@ SELECT COUNT(*) FROM testschema.atable; -- checks heap -- Will fail with bad path CREATE TABLESPACE regress_badspace LOCATION '/no/such/location'; -ERROR: directory "/no/such/location" does not exist +ERROR: invalid directory "/no/such/location": No such file or directory +-- Will fail with data directory +CREATE TABLESPACE regress_badspace LOCATION '.'; +ERROR: tablespace location must not be inside the data directory -- No such tablespace CREATE TABLE bar (i int) TABLESPACE regress_nosuchspace; ERROR: tablespace "regress_nosuchspace" does not exist -- 2.16.3 From 4855a9c7b0b5f67b3c6fba667c5c1892eb58199c Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Mon, 22 Apr 2019 20:58:53 +0900 Subject: [PATCH 2/4] Allow TAP test to excecise tablespace. To perform tablespace related checks, this patch lets PostgresNode::backup have a new parameter "tablespace_mapping", and make init_from_backup handle capable to handle a backup created using tablespace_mapping. --- src/test/perl/PostgresNode.pm | 45 +++++++++++++++++++++++++++++++++++++----- src/test/perl/RecursiveCopy.pm | 38 +++++++++++++++++++++++++++++++---- 2 files changed, 74 insertions(+), 9 deletions(-) diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index 76874141c5..e951acc461 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -157,6 +157,7 @@ sub new _host => $pghost, _basedir => "$TestLib::tmp_check/t_${testname}_${name}_data", _name => $name, + _tablespaces => [], _logfile_generation => 0, _logfile_base => "$TestLib::log_path/${testname}_${name}", _logfile => "$TestLib::log_path/${testname}_${name}.log" @@ -342,6 +343,26 @@ sub backup_dir =pod +=item $node->make_tablespace_dir() + +make a tablespace directory + +=cut + +sub make_tablespace_dir +{ + my ($self, $name) = @_; + my $basedir = $self->basedir; + + die "tablespace name contains '/'" if ($name =~ m#/#); + my $reldir = "../$name"; + mkdir $self->data_dir() . "/". $reldir; + push($self->{_tablespaces}, $reldir); + return $reldir; +} + +=pod + =item $node->info() Return a string containing human-readable diagnostic information (paths, etc) @@ -540,13 +561,27 @@ target server since it isn't done by default. sub backup { - my ($self, $backup_name) = @_; - my $backup_path = $self->backup_dir . '/' . $backup_name; + my ($self, $backup_name, %params) = @_; + my $backup_path = $self->backup_dir . '/' . $backup_name . '/' . "pgdata"; my $name = $self->name; + my @rest = (); + + if (defined $params{tablespace_mapping}) + { + foreach my $p (split /,/, $params{tablespace_mapping}) + { + push(@rest, "--tablespace-mapping=$p"); + } + } + + foreach my $p ($self->{_tablespaces}) + { + mkdir "$backup_path/$p"; + } print "# Taking pg_basebackup $backup_name from node \"$name\"\n"; TestLib::system_or_bail('pg_basebackup', '-D', $backup_path, '-h', - $self->host, '-p', $self->port, '--no-sync'); + $self->host, '-p', $self->port, '--no-sync', @rest); print "# Backup finished\n"; return; } @@ -592,7 +627,7 @@ sub backup_fs_cold sub _backup_fs { my ($self, $backup_name, $hot) = @_; - my $backup_path = $self->backup_dir . '/' . $backup_name; + my $backup_path = $self->backup_dir . '/' . $backup_name . '/' . "pgdata"; my $port = $self->port; my $name = $self->name; @@ -655,7 +690,7 @@ unconditionally set to enable replication connections. sub init_from_backup { my ($self, $root_node, $backup_name, %params) = @_; - my $backup_path = $root_node->backup_dir . '/' . $backup_name; + my $backup_path = $root_node->backup_dir . '/' . $backup_name . '/' . "pgdata"; my $host = $self->host; my $port = $self->port; my $node_name = $self->name; diff --git a/src/test/perl/RecursiveCopy.pm b/src/test/perl/RecursiveCopy.pm index baf5d0ac63..f165db7348 100644 --- a/src/test/perl/RecursiveCopy.pm +++ b/src/test/perl/RecursiveCopy.pm @@ -22,6 +22,7 @@ use warnings; use Carp; use File::Basename; use File::Copy; +use TestLib; =pod @@ -97,14 +98,43 @@ sub _copypath_recurse # invoke the filter and skip all further operation if it returns false return 1 unless &$filterfn($curr_path); - # Check for symlink -- needed only on source dir - # (note: this will fall through quietly if file is already gone) - croak "Cannot operate on symlink \"$srcpath\"" if -l $srcpath; - # Abort if destination path already exists. Should we allow directories # to exist already? croak "Destination path \"$destpath\" already exists" if -e $destpath; + # Check for symlink -- needed only on source dir + # (note: this will fall through quietly if file is already gone) + if (-l $srcpath) + { + croak "Cannot operate on symlink \"$srcpath\"" + if ($srcpath !~ /\/(pg_tblspc\/[0-9]+)$/); + + # We have mapped tablespaces. Copy them individually + my $linkname = $1; + my $rpath = my $target = readlink($srcpath); + + #convert to pgdata-based if relative + $rpath =~ s#^\.\./##; + + my $srcrealdir = "$base_src_dir/$rpath"; + my $dstrealdir = "$base_dest_dir/$rpath"; + + mkdir $dstrealdir; + opendir(my $dh, $srcrealdir); + while (readdir $dh) + { + next if (/^\.\.?$/); + my $spath = "$srcrealdir/$_"; + my $dpath = "$dstrealdir/$_"; + + copypath($spath, $dpath); + } + closedir $dh; + + symlink $target, $destpath; + return 1; + } + # If this source path is a file, simply copy it to destination with the # same name and we're done. if (-f $srcpath) -- 2.16.3 From 51e5cb0996b71ee3cc32ea00e0abd3184df7e82d Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Mon, 22 Apr 2019 20:10:25 +0900 Subject: [PATCH 3/4] Add check for recovery failure caused by tablespace. Removal of a tablespace on master can cause recovery failure on standby. This patch adds the check for the case. --- src/test/recovery/t/011_crash_recovery.pl | 45 ++++++++++++++++++++++++++++++- 1 file changed, 44 insertions(+), 1 deletion(-) diff --git a/src/test/recovery/t/011_crash_recovery.pl b/src/test/recovery/t/011_crash_recovery.pl index 5dc52412ca..40cf1d0cc3 100644 --- a/src/test/recovery/t/011_crash_recovery.pl +++ b/src/test/recovery/t/011_crash_recovery.pl @@ -15,7 +15,7 @@ if ($Config{osname} eq 'MSWin32') } else { - plan tests => 3; + plan tests => 4; } my $node = get_new_node('master'); @@ -66,3 +66,46 @@ is($node->safe_psql('postgres', qq[SELECT txid_status('$xid');]), 'aborted', 'xid is aborted after crash'); $tx->kill_kill; + + +# Ensure that tablespace removal doesn't cause error while recoverying +# the preceding create datbase or objects. + +my $node_master = get_new_node('master2'); +$node_master->init(allows_streaming => 1); +$node_master->start; + +# Create tablespace +my $tspdir = $node_master->make_tablespace_dir('ts1'); +$node_master->safe_psql('postgres', "CREATE TABLESPACE ts1 LOCATION '$tspdir'"); + +# Take backup +my $backup_name = 'my_backup'; +$node_master->backup($backup_name); +my $node_standby = get_new_node('standby'); +$node_standby->init_from_backup($node_master, $backup_name, has_streaming => 1); +$node_standby->start; + +# Make sure connection is made +$node_master->poll_query_until( + 'postgres', 'SELECT count(*) = 1 FROM pg_stat_replication'); + +# Make sure to perform restartpoint after tablespace creation +$node_master->wait_for_catchup($node_standby, 'replay', + $node_master->lsn('replay')); +$node_standby->safe_psql('postgres', 'CHECKPOINT'); + +# Do immediate shutdown just after a sequence of CREAT DATABASE / DROP +# DATABASE / DROP TABLESPACE. This leaves a CREATE DATBASE WAL record +# that is to be applied to already-removed tablespace. +$node_master->safe_psql('postgres', + q[CREATE DATABASE db1 WITH TABLESPACE ts1; + DROP DATABASE db1; + DROP TABLESPACE ts1;]); +$node_master->wait_for_catchup($node_standby, 'replay', + $node_master->lsn('replay')); +system("ls -l ". $node_standby->data_dir() . "/../ts1"); +$node_standby->stop('immediate'); + +# Should restart ignoring directory creation error. +is($node_standby->start(fail_ok => 1), 1); -- 2.16.3 From 51a80429c5b7e67779071400e43b0c7fc08c857c Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Mon, 22 Apr 2019 20:59:15 +0900 Subject: [PATCH 4/4] Fix failure of standby startup caused by tablespace removal When standby restarts after a crash after drop of a tablespace, there's a possibility that recovery fails trying an object-creation in already removed tablespace directory. Allow recovery to continue by ignoring the error if not reaching consistency point. --- src/backend/access/transam/xlogutils.c | 34 ++++++++++++++++++++++++++++++++++ src/backend/commands/tablespace.c | 12 ++++++------ src/backend/storage/file/copydir.c | 12 +++++++----- src/include/access/xlogutils.h | 1 + 4 files changed, 48 insertions(+), 11 deletions(-) diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index 10a663bae6..75cdb882cd 100644 --- a/src/backend/access/transam/xlogutils.c +++ b/src/backend/access/transam/xlogutils.c @@ -522,6 +522,40 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum, return buffer; } +/* + * XLogMakePGDirectory + * + * There is a possibility that WAL replay causes an error by creation of a + * directory under a directory removed before the previous crash. Issuing + * ERROR prevents the caller from continuing recovery. + * + * To prevent that case, this function issues WARNING instead of ERROR on + * error if consistency is not reached yet. + */ +int +XLogMakePGDirectory(const char *directoryName) +{ + int ret; + + ret = MakePGDirectory(directoryName); + + if (ret != 0) + { + int elevel = ERROR; + + /* Don't issue ERROR for this failure before reaching consistency. */ + if (InRecovery && !reachedConsistency) + elevel = WARNING; + + ereport(elevel, + (errcode_for_file_access(), + errmsg("could not create directory \"%s\": %m", directoryName))); + return ret; + } + + return 0; +} + /* * Struct actually returned by XLogFakeRelcacheEntry, though the declared * return type is Relation. diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c index 66a70871e6..c9fb2af015 100644 --- a/src/backend/commands/tablespace.c +++ b/src/backend/commands/tablespace.c @@ -303,12 +303,6 @@ CreateTableSpace(CreateTableSpaceStmt *stmt) (errcode(ERRCODE_INVALID_NAME), errmsg("tablespace location cannot contain single quotes"))); - /* Reject tablespaces in the data directory. */ - if (is_in_data_directory(location)) - ereport(ERROR, - (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), - errmsg("tablespace location must not be inside the data directory"))); - /* * Check that location isn't too long. Remember that we're going to append * 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'. In the relative path case, we @@ -323,6 +317,12 @@ CreateTableSpace(CreateTableSpaceStmt *stmt) errmsg("tablespace location \"%s\" is too long", location))); + /* Reject tablespaces in the data directory. */ + if (is_in_data_directory(location)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("tablespace location must not be inside the data directory"))); + /* * Disallow creation of tablespaces named "pg_xxx"; we reserve this * namespace for system purposes. diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c index 30f6200a86..0216270dd3 100644 --- a/src/backend/storage/file/copydir.c +++ b/src/backend/storage/file/copydir.c @@ -22,11 +22,11 @@ #include <unistd.h> #include <sys/stat.h> +#include "access/xlogutils.h" #include "storage/copydir.h" #include "storage/fd.h" #include "miscadmin.h" #include "pgstat.h" - /* * copydir: copy a directory * @@ -41,10 +41,12 @@ copydir(char *fromdir, char *todir, bool recurse) char fromfile[MAXPGPATH * 2]; char tofile[MAXPGPATH * 2]; - if (MakePGDirectory(todir) != 0) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not create directory \"%s\": %m", todir))); + /* + * We might have to skip copydir to continue recovery. See the function + * for details. + */ + if (XLogMakePGDirectory(todir) != 0) + return; xldir = AllocateDir(fromdir); diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h index 0ab5ba62f5..46a7596315 100644 --- a/src/include/access/xlogutils.h +++ b/src/include/access/xlogutils.h @@ -43,6 +43,7 @@ extern XLogRedoAction XLogReadBufferForRedoExtended(XLogReaderState *record, extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum, BlockNumber blkno, ReadBufferMode mode); +extern int XLogMakePGDirectory(const char *directoryName); extern Relation CreateFakeRelcacheEntry(RelFileNode rnode); extern void FreeFakeRelcacheEntry(Relation fakerel); -- 2.16.3
Sorry, I was in haste. At Wed, 24 Apr 2019 13:18:45 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20190424.131845.116224815.horiguchi.kyotaro@lab.ntt.co.jp> > At Tue, 23 Apr 2019 10:05:03 -0700, Andres Freund <andres@anarazel.de> wrote in <20190423170503.uw5jxrujqlozg23l@alap3.anarazel.de> > > Hi, > > > > On 2019-04-23 16:08:18 +0900, Michael Paquier wrote: > > > On Mon, Apr 22, 2019 at 11:00:03PM -0700, Andres Freund wrote: > > > > FWIW, I think the right fix for this is to simply drop the requirement > > > > that tablespace paths need to be absolute. It's not buying us anything, > > > > it's just making things more complicated. We should just do a simple > > > > check against the tablespace being inside PGDATA, and leave it at > > > > that. Yes, that can be tricked, but so can the current system. > > > > > > convert_and_check_filename() checks after that already, mostly. For > > > TAP tests I am not sure that this would help much though as all the > > > nodes of a given test use the same root path for their data folders, > > > so you cannot just use "../hoge/" as location. > > > > I don't see the problem here. Putting the primary and standby PGDATAs > > into a subdirectory that also can contain a relatively referenced > > tablespace seems trivial? > > > > > I'm not We already generate a warning when a tablespace is in a data > > > folder, as this causes issues with recursion lookups of base backups. > > > What do you mean in this case? Forbidding the behavior? -- Michael > > > > I mostly am talking about replacing > > > > Oid > > CreateTableSpace(CreateTableSpaceStmt *stmt) > > { > > ... > > /* > > * Allowing relative paths seems risky > > * > > * this also helps us ensure that location is not empty or whitespace > > */ > > if (!is_absolute_path(location)) > > ereport(ERROR, > > (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), > > errmsg("tablespace location must be an absolute path"))); > > > > with a check that forces relative paths to be outside of PGDATA (baring > > symlinks). As far as I can tell convert_and_check_filename() would check > > just about the opposite. We need to adjust relative path between PGDATA-based and pg_tblspc based. The attached first patch does that. - I'm not sure it is OK to use getcwd this way. Another issue here is is_in_data_directory canonicalizes DataDir on-the-fly. It is needed when DataDir contains '/./' or such. I think the canonicalization should be done far earlier. The second attached is TAP change to support tablespaces using relative tablespaces. - This is tentative or sample. I'll visit the current discussion thread. The third is test for this issue. - Tablespace handling gets easier. The fourth is the fix for the issue here. - Not all possible simliar issue is not checked. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Hello. At Wed, 24 Apr 2019 13:23:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20190424.132304.40676137.horiguchi.kyotaro@lab.ntt.co.jp> > > > with a check that forces relative paths to be outside of PGDATA (baring > > > symlinks). As far as I can tell convert_and_check_filename() would check > > > just about the opposite. > > We need to adjust relative path between PGDATA-based and > pg_tblspc based. The attached first patch does that. > > - I'm not sure it is OK to use getcwd this way. Another issue > here is is_in_data_directory canonicalizes DataDir > on-the-fly. It is needed when DataDir contains '/./' or such. I > think the canonicalization should be done far earlier. > > The second attached is TAP change to support tablespaces using > relative tablespaces. > > - This is tentative or sample. I'll visit the current discussion thread. > > The third is test for this issue. > > - Tablespace handling gets easier. > > The fourth is the fix for the issue here. > > - Not all possible simliar issue is not checked. This is new version. Adjusted pg_basebackup's behavior to allow relative mappings. But.. This is apparently out of a bug fix. What should I do with it? Should we applying only 0004 (after further checking) or something as bug fix, then register the rest for v13? regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 16b158756a8e51279604bf9f8995b9392052b274 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Wed, 24 Apr 2019 11:50:41 +0900 Subject: [PATCH 1/5] Allow relative tablespace location paths Currently relative paths are not allowed as tablespace directory. That makes tests about tablespaces bothersome but doens't prevent them points into the data directory perfectly. This patch allows relative direcotry based on the data directory as tablespace directory. Then makes the check more strict. --- src/backend/access/transam/xlog.c | 8 ++ src/backend/commands/tablespace.c | 76 ++++++++--- src/backend/replication/basebackup.c | 7 ++ src/backend/utils/adt/misc.c | 7 ++ src/bin/pg_basebackup/pg_basebackup.c | 180 +++++++++++++++++++++++---- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 17 ++- src/test/regress/input/tablespace.source | 3 + src/test/regress/output/tablespace.source | 5 +- 8 files changed, 258 insertions(+), 45 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index c00b63c751..abc2fd951f 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -10419,6 +10419,14 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, } linkpath[rllen] = '\0'; + /* + * In the relative path cases, the link target is always prefixed + * by "../" to convert from data directory-based. So we just do + * the reverse here. + */ + if (strncmp(s, "../", 3) == 0) + s += 3; + /* * Add the escape character '\\' before newline in a string to * ensure that we can distinguish between the newline in the diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c index 3784ea4b4f..66a70871e6 100644 --- a/src/backend/commands/tablespace.c +++ b/src/backend/commands/tablespace.c @@ -94,6 +94,42 @@ static void create_tablespace_directories(const char *location, const Oid tablespaceoid); static bool destroy_tablespace_directories(Oid tablespaceoid, bool redo); +/* + * Check if the path is in the data directory strictly. + */ +static bool +is_in_data_directory(const char *path) +{ + char cwd[MAXPGPATH]; + char abspath[MAXPGPATH]; + char absdatadir[MAXPGPATH]; + + getcwd(cwd, MAXPGPATH); + if (chdir(path) < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid directory \"%s\": %m", path))); + + /* getcwd is defined as returning absolute path */ + getcwd(abspath, MAXPGPATH); + + /* DataDir needs to be canonicalized */ + if (chdir(DataDir)) + ereport(FATAL, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("could not chdir to the data directory \"%s\": %m", + DataDir))); + getcwd(absdatadir, MAXPGPATH); + + /* this must succeed */ + if (chdir(cwd)) + ereport(FATAL, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("could not chdir to the current working directory \"%s\": %m", + cwd))); + + return path_is_prefix_of_path(absdatadir, abspath); +} /* * Each database using a table space is isolated into its own name space @@ -267,35 +303,26 @@ CreateTableSpace(CreateTableSpaceStmt *stmt) (errcode(ERRCODE_INVALID_NAME), errmsg("tablespace location cannot contain single quotes"))); - /* - * Allowing relative paths seems risky - * - * this also helps us ensure that location is not empty or whitespace - */ - if (!is_absolute_path(location)) + /* Reject tablespaces in the data directory. */ + if (is_in_data_directory(location)) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), - errmsg("tablespace location must be an absolute path"))); + errmsg("tablespace location must not be inside the data directory"))); /* * Check that location isn't too long. Remember that we're going to append - * 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'. FYI, we never actually + * 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'. In the relative path case, we + * attach "../" at the beginning of the path. FYI, we never actually * reference the whole path here, but MakePGDirectory() uses the first two * parts. */ - if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + + if (3 + strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1 + OIDCHARS > MAXPGPATH) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("tablespace location \"%s\" is too long", location))); - /* Warn if the tablespace is in the data directory. */ - if (path_is_prefix_of_path(DataDir, location)) - ereport(WARNING, - (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), - errmsg("tablespace location should not be inside the data directory"))); - /* * Disallow creation of tablespaces named "pg_xxx"; we reserve this * namespace for system purposes. @@ -570,7 +597,9 @@ static void create_tablespace_directories(const char *location, const Oid tablespaceoid) { char *linkloc; + char *linktarget; char *location_with_version_dir; + bool free_linktarget = false; struct stat st; linkloc = psprintf("pg_tblspc/%u", tablespaceoid); @@ -639,13 +668,28 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid) /* * Create the symlink under PGDATA + + * Relative tablespace location is based on PGDATA directory. The symlink + * is created in PGDATA/pg_tblspc. So adjust relative paths by attaching + * "../" at the head. */ - if (symlink(location, linkloc) < 0) + if (is_absolute_path(location)) + linktarget = unconstify(char *, location); + else + { + linktarget = psprintf("../%s", location); + free_linktarget = true; + } + + if (symlink(linktarget, linkloc) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create symbolic link \"%s\": %m", linkloc))); + if (free_linktarget) + pfree(linktarget); + pfree(linkloc); pfree(location_with_version_dir); } diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 36dcb28754..82ff4703d9 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1238,6 +1238,13 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces, pathbuf))); linkpath[rllen] = '\0'; + /* + * Relative link target is always prefixed by "../". We need to + * remove it to obtain a relative tablespace directory. + */ + if (strncmp(linkpath, "../", 3) == 0) + memmove(linkpath, linkpath + 3, strlen(linkpath) + 1 - 3); + size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath, &statbuf, sizeonly); #else diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index d330a88e3c..59f987e5ae 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -330,6 +330,13 @@ pg_tablespace_location(PG_FUNCTION_ARGS) sourcepath))); targetpath[rllen] = '\0'; + /* + * Relative target paths are always prefixed by "../". We need to remove + * it to obtain tablespace directory. + */ + if (strncmp(targetpath, "../", 3) == 0) + PG_RETURN_TEXT_P(cstring_to_text(targetpath + 3)); + PG_RETURN_TEXT_P(cstring_to_text(targetpath)); #else ereport(ERROR, diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 1a735b8046..0cd7a89c56 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -269,26 +269,6 @@ tablespace_list_append(const char *arg) exit(1); } - /* - * This check isn't absolutely necessary. But all tablespaces are created - * with absolute directories, so specifying a non-absolute path here would - * just never match, possibly confusing users. It's also good to be - * consistent with the new_dir check. - */ - if (!is_absolute_path(cell->old_dir)) - { - pg_log_error("old directory is not an absolute path in tablespace mapping: %s", - cell->old_dir); - exit(1); - } - - if (!is_absolute_path(cell->new_dir)) - { - pg_log_error("new directory is not an absolute path in tablespace mapping: %s", - cell->new_dir); - exit(1); - } - /* * Comparisons done with these values should involve similarly * canonicalized path values. This is particularly sensitive on Windows @@ -1399,9 +1379,20 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum) if (basetablespace) strlcpy(current_path, basedir, sizeof(current_path)); else - strlcpy(current_path, - get_tablespace_mapping(PQgetvalue(res, rownum, 1)), - sizeof(current_path)); + { + const char *path = get_tablespace_mapping(PQgetvalue(res, rownum, 1)); + + if (is_absolute_path(path)) + strlcpy(current_path, path, sizeof(current_path)); + else + { + /* Relative tablespace locations need to be prefixed by basedir. */ + strlcpy(current_path, basedir, sizeof(current_path)); + if (current_path[strlen(current_path) - 1] != '/') + strlcat(current_path, "/", sizeof(current_path)); + strlcat(current_path, path, sizeof(current_path)); + } + } /* * Get the COPY data @@ -1525,15 +1516,34 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum) * are, you can call it an undocumented feature that you * can map them too.) */ + bool free_path = false; + filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */ mapped_tblspc_path = get_tablespace_mapping(©buf[157]); + if (!is_absolute_path(mapped_tblspc_path)) + { + /* + * Convert relative tablespace location based on data + * directory into path link target based on pg_tblspc + * directory. + */ + char *p = malloc(3 + strlen(mapped_tblspc_path) + 1); + + strcpy(p, "../"); + strcat(p, mapped_tblspc_path); + mapped_tblspc_path = p; + free_path = true; + } if (symlink(mapped_tblspc_path, filename) != 0) { pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m", filename, mapped_tblspc_path); exit(1); } + + if (free_path) + free(unconstify(char *, mapped_tblspc_path)); } else { @@ -1957,8 +1967,28 @@ BaseBackup(void) if (format == 'p' && !PQgetisnull(res, i, 1)) { char *path = unconstify(char *, get_tablespace_mapping(PQgetvalue(res, i, 1))); + bool freeit = false; + + if (!is_absolute_path(path)) + { + /* + * Relative tablespace locations need to be converted + * attaching basedir. + */ + char *p = malloc(strlen(basedir) + 1 + strlen(path) + 1); + + strcpy(p, basedir); + if (p[strlen(p) - 1] != '/') + strcat(p, "/"); + strcat(p, path); + path = p; + freeit = true; + } verify_dir_is_empty_or_create(path, &made_tablespace_dirs, &found_tablespace_dirs); + + if (freeit) + free(path); } } @@ -2163,6 +2193,104 @@ BaseBackup(void) pg_log_info("base backup completed"); } +/* functions to avoid duplicate error handling */ +static void +simple_chdir(const char *path) +{ + if (chdir(path) < 0) + { + pg_log_error("chdir failed to \"%s\": %m", path); + exit(1); + } +} + +static void +simple_getcwd(char *buf, int len) +{ + if (getcwd(buf, len) < 0) + { + pg_log_error("getcwd failed: %m"); + exit(1); + } +} + +/* Sanity check of tablespace mappings */ +static void +check_tablespace_mappings(void) +{ + TablespaceListCell *cell; + char cwd[MAXPGPATH]; + char canonbase[MAXPGPATH]; + + simple_getcwd(cwd, MAXPGPATH); + + /* canonicalize basedir */ + simple_chdir(basedir); + simple_getcwd(canonbase, MAXPGPATH); + + for (cell = tablespace_dirs.head ; cell ; cell = cell->next) + { + char path[MAXPGPATH]; + + if (is_absolute_path(cell->new_dir)) + strlcpy(path, cell->new_dir, MAXPGPATH); + else + { + /* construct absolute path of relative new_dir */ + path[0] = 0; + if (!is_absolute_path(basedir)) + { + strlcpy(path, cwd, MAXPGPATH); + if (path[strlen(path) - 1] != '/') + strlcat(path, "/", MAXPGPATH); + } + + strlcat(path, basedir, MAXPGPATH); + if (path[strlen(path) - 1] != '/') + strlcat(path, "/", MAXPGPATH); + strlcat(path, cell->new_dir, MAXPGPATH); + } + + /* strip off trailing separators */ + while (path[strlen(path) - 1] == '/') + path[strlen(path) - 1] = 0; + + /* + * We allow new_dir is a nonexistent path. Even in that case, the + * parent directory must be exist. If chdir failed, go up one step + * then recheck then repeat. + */ + while (chdir(path) < 0) + { + /* only one step up is needed */ + if (errno == ENOENT) + { + /* Find the last separator */ + char *p = strrchr(path, '/'); + + /* We cannot go up further */ + if (p == NULL || p == path) + break; + + *p = 0; + continue; + } + + pg_log_error("invalid new_dir \"%s\" of mapping for \"%s\": %m", + cell->new_dir, cell->old_dir); + exit(1); + } + + simple_getcwd(path, MAXPGPATH); + if (strncmp(canonbase, path, strlen(canonbase)) == 0) + { + pg_log_error("new_dir \"%s\" of mapping for \"%s\" is inside target data directory \"%s\"", cell->new_dir, cell->old_dir,canonbase); + exit(1); + } + } + + simple_chdir(cwd); +} int main(int argc, char **argv) @@ -2513,6 +2641,12 @@ main(int argc, char **argv) if (format == 'p' || strcmp(basedir, "-") != 0) verify_dir_is_empty_or_create(basedir, &made_new_pgdata, &found_existing_pgdata); + /* + * Sanity check of tablespace mapping. This can be perform after creating + * basedir. + */ + check_tablespace_mappings(); + /* determine remote server's xlog segment size */ if (!RetrieveWalSegSize(conn)) exit(1); diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index 33869fecc9..bd1a92df36 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -6,7 +6,7 @@ use File::Basename qw(basename dirname); use File::Path qw(rmtree); use PostgresNode; use TestLib; -use Test::More tests => 106; +use Test::More tests => 108; program_help_ok('pg_basebackup'); program_version_ok('pg_basebackup'); @@ -186,12 +186,19 @@ $node->command_fails( "-T/foo=/bar=/baz" ], '-T with multiple = fails'); +$node->command_ok( + [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo=../bar/baz/foo" ], + '-T with old directory not absolute succeeds'); +rmtree("$tempdir/backup_foo"); $node->command_fails( - [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo=/bar" ], - '-T with old directory not absolute fails'); + [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=$tempdir/backup_foo/bar" ], + '-T with new absolute directory inside data dir fails'); $node->command_fails( - [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=bar" ], - '-T with new directory not absolute fails'); + [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=../backup_foo/bar/baz" ], + '-T with new relative directory inside data dir fails'); +$node->command_fails( + [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=/foo/bar" ], + '-T with new directory under nonexistent directory fails'); $node->command_fails( [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo" ], '-T with invalid format fails'); diff --git a/src/test/regress/input/tablespace.source b/src/test/regress/input/tablespace.source index 14ce0e7e04..cf596d607f 100644 --- a/src/test/regress/input/tablespace.source +++ b/src/test/regress/input/tablespace.source @@ -125,6 +125,9 @@ SELECT COUNT(*) FROM testschema.atable; -- checks heap -- Will fail with bad path CREATE TABLESPACE regress_badspace LOCATION '/no/such/location'; +-- Will fail with data directory +CREATE TABLESPACE regress_badspace LOCATION '.'; + -- No such tablespace CREATE TABLE bar (i int) TABLESPACE regress_nosuchspace; diff --git a/src/test/regress/output/tablespace.source b/src/test/regress/output/tablespace.source index 8ebe08b9b2..c404d599f0 100644 --- a/src/test/regress/output/tablespace.source +++ b/src/test/regress/output/tablespace.source @@ -257,7 +257,10 @@ SELECT COUNT(*) FROM testschema.atable; -- checks heap -- Will fail with bad path CREATE TABLESPACE regress_badspace LOCATION '/no/such/location'; -ERROR: directory "/no/such/location" does not exist +ERROR: invalid directory "/no/such/location": No such file or directory +-- Will fail with data directory +CREATE TABLESPACE regress_badspace LOCATION '.'; +ERROR: tablespace location must not be inside the data directory -- No such tablespace CREATE TABLE bar (i int) TABLESPACE regress_nosuchspace; ERROR: tablespace "regress_nosuchspace" does not exist -- 2.16.3 From 007a366f3a51b02bdbce3f505b26c0340c47b67f Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Mon, 22 Apr 2019 20:58:53 +0900 Subject: [PATCH 2/5] Allow TAP test to excecise tablespace. To perform tablespace related checks, this patch lets PostgresNode::backup have a new parameter "tablespace_mapping", and make init_from_backup handle capable to handle a backup created using tablespace_mapping. --- src/test/perl/PostgresNode.pm | 45 +++++++++++++++++++++++++++++++++++++----- src/test/perl/RecursiveCopy.pm | 38 +++++++++++++++++++++++++++++++---- 2 files changed, 74 insertions(+), 9 deletions(-) diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index 76874141c5..e951acc461 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -157,6 +157,7 @@ sub new _host => $pghost, _basedir => "$TestLib::tmp_check/t_${testname}_${name}_data", _name => $name, + _tablespaces => [], _logfile_generation => 0, _logfile_base => "$TestLib::log_path/${testname}_${name}", _logfile => "$TestLib::log_path/${testname}_${name}.log" @@ -342,6 +343,26 @@ sub backup_dir =pod +=item $node->make_tablespace_dir() + +make a tablespace directory + +=cut + +sub make_tablespace_dir +{ + my ($self, $name) = @_; + my $basedir = $self->basedir; + + die "tablespace name contains '/'" if ($name =~ m#/#); + my $reldir = "../$name"; + mkdir $self->data_dir() . "/". $reldir; + push($self->{_tablespaces}, $reldir); + return $reldir; +} + +=pod + =item $node->info() Return a string containing human-readable diagnostic information (paths, etc) @@ -540,13 +561,27 @@ target server since it isn't done by default. sub backup { - my ($self, $backup_name) = @_; - my $backup_path = $self->backup_dir . '/' . $backup_name; + my ($self, $backup_name, %params) = @_; + my $backup_path = $self->backup_dir . '/' . $backup_name . '/' . "pgdata"; my $name = $self->name; + my @rest = (); + + if (defined $params{tablespace_mapping}) + { + foreach my $p (split /,/, $params{tablespace_mapping}) + { + push(@rest, "--tablespace-mapping=$p"); + } + } + + foreach my $p ($self->{_tablespaces}) + { + mkdir "$backup_path/$p"; + } print "# Taking pg_basebackup $backup_name from node \"$name\"\n"; TestLib::system_or_bail('pg_basebackup', '-D', $backup_path, '-h', - $self->host, '-p', $self->port, '--no-sync'); + $self->host, '-p', $self->port, '--no-sync', @rest); print "# Backup finished\n"; return; } @@ -592,7 +627,7 @@ sub backup_fs_cold sub _backup_fs { my ($self, $backup_name, $hot) = @_; - my $backup_path = $self->backup_dir . '/' . $backup_name; + my $backup_path = $self->backup_dir . '/' . $backup_name . '/' . "pgdata"; my $port = $self->port; my $name = $self->name; @@ -655,7 +690,7 @@ unconditionally set to enable replication connections. sub init_from_backup { my ($self, $root_node, $backup_name, %params) = @_; - my $backup_path = $root_node->backup_dir . '/' . $backup_name; + my $backup_path = $root_node->backup_dir . '/' . $backup_name . '/' . "pgdata"; my $host = $self->host; my $port = $self->port; my $node_name = $self->name; diff --git a/src/test/perl/RecursiveCopy.pm b/src/test/perl/RecursiveCopy.pm index baf5d0ac63..f165db7348 100644 --- a/src/test/perl/RecursiveCopy.pm +++ b/src/test/perl/RecursiveCopy.pm @@ -22,6 +22,7 @@ use warnings; use Carp; use File::Basename; use File::Copy; +use TestLib; =pod @@ -97,14 +98,43 @@ sub _copypath_recurse # invoke the filter and skip all further operation if it returns false return 1 unless &$filterfn($curr_path); - # Check for symlink -- needed only on source dir - # (note: this will fall through quietly if file is already gone) - croak "Cannot operate on symlink \"$srcpath\"" if -l $srcpath; - # Abort if destination path already exists. Should we allow directories # to exist already? croak "Destination path \"$destpath\" already exists" if -e $destpath; + # Check for symlink -- needed only on source dir + # (note: this will fall through quietly if file is already gone) + if (-l $srcpath) + { + croak "Cannot operate on symlink \"$srcpath\"" + if ($srcpath !~ /\/(pg_tblspc\/[0-9]+)$/); + + # We have mapped tablespaces. Copy them individually + my $linkname = $1; + my $rpath = my $target = readlink($srcpath); + + #convert to pgdata-based if relative + $rpath =~ s#^\.\./##; + + my $srcrealdir = "$base_src_dir/$rpath"; + my $dstrealdir = "$base_dest_dir/$rpath"; + + mkdir $dstrealdir; + opendir(my $dh, $srcrealdir); + while (readdir $dh) + { + next if (/^\.\.?$/); + my $spath = "$srcrealdir/$_"; + my $dpath = "$dstrealdir/$_"; + + copypath($spath, $dpath); + } + closedir $dh; + + symlink $target, $destpath; + return 1; + } + # If this source path is a file, simply copy it to destination with the # same name and we're done. if (-f $srcpath) -- 2.16.3 From 743bd557db00be16e5d59324b82b62ccf49e87ab Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Mon, 22 Apr 2019 20:10:25 +0900 Subject: [PATCH 3/5] Add check for recovery failure caused by tablespace. Removal of a tablespace on master can cause recovery failure on standby. This patch adds the check for the case. --- src/test/recovery/t/011_crash_recovery.pl | 45 ++++++++++++++++++++++++++++++- 1 file changed, 44 insertions(+), 1 deletion(-) diff --git a/src/test/recovery/t/011_crash_recovery.pl b/src/test/recovery/t/011_crash_recovery.pl index 5dc52412ca..40cf1d0cc3 100644 --- a/src/test/recovery/t/011_crash_recovery.pl +++ b/src/test/recovery/t/011_crash_recovery.pl @@ -15,7 +15,7 @@ if ($Config{osname} eq 'MSWin32') } else { - plan tests => 3; + plan tests => 4; } my $node = get_new_node('master'); @@ -66,3 +66,46 @@ is($node->safe_psql('postgres', qq[SELECT txid_status('$xid');]), 'aborted', 'xid is aborted after crash'); $tx->kill_kill; + + +# Ensure that tablespace removal doesn't cause error while recoverying +# the preceding create datbase or objects. + +my $node_master = get_new_node('master2'); +$node_master->init(allows_streaming => 1); +$node_master->start; + +# Create tablespace +my $tspdir = $node_master->make_tablespace_dir('ts1'); +$node_master->safe_psql('postgres', "CREATE TABLESPACE ts1 LOCATION '$tspdir'"); + +# Take backup +my $backup_name = 'my_backup'; +$node_master->backup($backup_name); +my $node_standby = get_new_node('standby'); +$node_standby->init_from_backup($node_master, $backup_name, has_streaming => 1); +$node_standby->start; + +# Make sure connection is made +$node_master->poll_query_until( + 'postgres', 'SELECT count(*) = 1 FROM pg_stat_replication'); + +# Make sure to perform restartpoint after tablespace creation +$node_master->wait_for_catchup($node_standby, 'replay', + $node_master->lsn('replay')); +$node_standby->safe_psql('postgres', 'CHECKPOINT'); + +# Do immediate shutdown just after a sequence of CREAT DATABASE / DROP +# DATABASE / DROP TABLESPACE. This leaves a CREATE DATBASE WAL record +# that is to be applied to already-removed tablespace. +$node_master->safe_psql('postgres', + q[CREATE DATABASE db1 WITH TABLESPACE ts1; + DROP DATABASE db1; + DROP TABLESPACE ts1;]); +$node_master->wait_for_catchup($node_standby, 'replay', + $node_master->lsn('replay')); +system("ls -l ". $node_standby->data_dir() . "/../ts1"); +$node_standby->stop('immediate'); + +# Should restart ignoring directory creation error. +is($node_standby->start(fail_ok => 1), 1); -- 2.16.3 From bc97e195f21af5d715d85424efc21fcbe8bb902c Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Mon, 22 Apr 2019 20:59:15 +0900 Subject: [PATCH 4/5] Fix failure of standby startup caused by tablespace removal When standby restarts after a crash after drop of a tablespace, there's a possibility that recovery fails trying an object-creation in already removed tablespace directory. Allow recovery to continue by ignoring the error if not reaching consistency point. --- src/backend/access/transam/xlogutils.c | 34 ++++++++++++++++++++++++++++++++++ src/backend/commands/tablespace.c | 12 ++++++------ src/backend/storage/file/copydir.c | 12 +++++++----- src/include/access/xlogutils.h | 1 + 4 files changed, 48 insertions(+), 11 deletions(-) diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index 10a663bae6..75cdb882cd 100644 --- a/src/backend/access/transam/xlogutils.c +++ b/src/backend/access/transam/xlogutils.c @@ -522,6 +522,40 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum, return buffer; } +/* + * XLogMakePGDirectory + * + * There is a possibility that WAL replay causes an error by creation of a + * directory under a directory removed before the previous crash. Issuing + * ERROR prevents the caller from continuing recovery. + * + * To prevent that case, this function issues WARNING instead of ERROR on + * error if consistency is not reached yet. + */ +int +XLogMakePGDirectory(const char *directoryName) +{ + int ret; + + ret = MakePGDirectory(directoryName); + + if (ret != 0) + { + int elevel = ERROR; + + /* Don't issue ERROR for this failure before reaching consistency. */ + if (InRecovery && !reachedConsistency) + elevel = WARNING; + + ereport(elevel, + (errcode_for_file_access(), + errmsg("could not create directory \"%s\": %m", directoryName))); + return ret; + } + + return 0; +} + /* * Struct actually returned by XLogFakeRelcacheEntry, though the declared * return type is Relation. diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c index 66a70871e6..c9fb2af015 100644 --- a/src/backend/commands/tablespace.c +++ b/src/backend/commands/tablespace.c @@ -303,12 +303,6 @@ CreateTableSpace(CreateTableSpaceStmt *stmt) (errcode(ERRCODE_INVALID_NAME), errmsg("tablespace location cannot contain single quotes"))); - /* Reject tablespaces in the data directory. */ - if (is_in_data_directory(location)) - ereport(ERROR, - (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), - errmsg("tablespace location must not be inside the data directory"))); - /* * Check that location isn't too long. Remember that we're going to append * 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'. In the relative path case, we @@ -323,6 +317,12 @@ CreateTableSpace(CreateTableSpaceStmt *stmt) errmsg("tablespace location \"%s\" is too long", location))); + /* Reject tablespaces in the data directory. */ + if (is_in_data_directory(location)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("tablespace location must not be inside the data directory"))); + /* * Disallow creation of tablespaces named "pg_xxx"; we reserve this * namespace for system purposes. diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c index 30f6200a86..0216270dd3 100644 --- a/src/backend/storage/file/copydir.c +++ b/src/backend/storage/file/copydir.c @@ -22,11 +22,11 @@ #include <unistd.h> #include <sys/stat.h> +#include "access/xlogutils.h" #include "storage/copydir.h" #include "storage/fd.h" #include "miscadmin.h" #include "pgstat.h" - /* * copydir: copy a directory * @@ -41,10 +41,12 @@ copydir(char *fromdir, char *todir, bool recurse) char fromfile[MAXPGPATH * 2]; char tofile[MAXPGPATH * 2]; - if (MakePGDirectory(todir) != 0) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not create directory \"%s\": %m", todir))); + /* + * We might have to skip copydir to continue recovery. See the function + * for details. + */ + if (XLogMakePGDirectory(todir) != 0) + return; xldir = AllocateDir(fromdir); diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h index 0ab5ba62f5..46a7596315 100644 --- a/src/include/access/xlogutils.h +++ b/src/include/access/xlogutils.h @@ -43,6 +43,7 @@ extern XLogRedoAction XLogReadBufferForRedoExtended(XLogReaderState *record, extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum, BlockNumber blkno, ReadBufferMode mode); +extern int XLogMakePGDirectory(const char *directoryName); extern Relation CreateFakeRelcacheEntry(RelFileNode rnode); extern void FreeFakeRelcacheEntry(Relation fakerel); -- 2.16.3 From d699571de1d0fe3665d28026bcbe720e8f6ba738 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Wed, 24 Apr 2019 16:02:47 +0900 Subject: [PATCH 5/5] Documentation update --- doc/src/sgml/ref/create_tablespace.sgml | 5 +++-- doc/src/sgml/ref/pg_basebackup.sgml | 4 ++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/ref/create_tablespace.sgml b/doc/src/sgml/ref/create_tablespace.sgml index c621ec2c6b..e2e9625483 100644 --- a/doc/src/sgml/ref/create_tablespace.sgml +++ b/doc/src/sgml/ref/create_tablespace.sgml @@ -94,8 +94,9 @@ CREATE TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> The directory that will be used for the tablespace. The directory must exist (<command>CREATE TABLESPACE</command> will not create it), should be empty, and must be owned by the - <productname>PostgreSQL</productname> system user. The directory must be - specified by an absolute path name. + <productname>PostgreSQL</productname> system user. The directory may + be specified by a relative path name based on the data directory but + must not be inside the data directory. </para> </listitem> </varlistentry> diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index fc9e222f8d..ba9ba13b4c 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -238,8 +238,8 @@ PostgreSQL documentation path specification of the tablespace as it is currently defined. (But it is not an error if there is no tablespace in <replaceable>olddir</replaceable> contained in the backup.) - Both <replaceable>olddir</replaceable> - and <replaceable>newdir</replaceable> must be absolute paths. If a + If <replaceable>newdir</replaceable> is a relative path, + it is translated based on the output directory. If a path happens to contain a <literal>=</literal> sign, escape it with a backslash. This option can be specified multiple times for multiple tablespaces. See examples below. -- 2.16.3
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > At Wed, 24 Apr 2019 13:23:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in<20190424.132304.40676137.horiguchi.kyotaro@lab.ntt.co.jp> >> We need to adjust relative path between PGDATA-based and >> pg_tblspc based. The attached first patch does that. > This is new version. Adjusted pg_basebackup's behavior to allow > relative mappings. But.. I can't say that I like 0001 at all. It adds a bunch of complication and new failure modes (e.g., having to panic on chdir failure) in order to do what exactly? I've not been following the thread closely, but the original problem is surely just a dont-do-that misconfiguration. I also suspect that this is assuming way too much about the semantics of getcwd --- some filesystem configurations may have funny situations like multiple paths to the same place. 0004 also seems like it's adding at least as many failure modes as it removes. Moreover, isn't it just postponing the failure a little? Later WAL might well try to touch the directory you skipped creation of. We can't realistically decide that all WAL-application errors are ignorable, but that seems like the direction this would have us go in. regards, tom lane
Hi, On 2019-04-24 10:13:09 -0400, Tom Lane wrote: > Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > > At Wed, 24 Apr 2019 13:23:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in<20190424.132304.40676137.horiguchi.kyotaro@lab.ntt.co.jp> > >> We need to adjust relative path between PGDATA-based and > >> pg_tblspc based. The attached first patch does that. > > > This is new version. Adjusted pg_basebackup's behavior to allow > > relative mappings. But.. > > I can't say that I like 0001 at all. It adds a bunch of complication and > new failure modes (e.g., having to panic on chdir failure) in order to do > what exactly? I've not been following the thread closely, but the > original problem is surely just a dont-do-that misconfiguration. I also > suspect that this is assuming way too much about the semantics of getcwd > --- some filesystem configurations may have funny situations like multiple > paths to the same place. I'm not at all defending the conrete patch. But I think allowing relative paths to tablespaces would solve a whole lot of practical problems, while not meaningfully increasing failure modes. The inability to reasonably test master/standby setups on a single machine is pretty jarring (yes, one can use basebackup tablespace maps - but that doesn't work well for new tablespaces). And for a lot of production setups absolute paths suck too - it's far from a given that primary / standby databases need to have the same exact path layout. Greetings, Andres Freund
Hi, On 2019-04-24 17:02:28 +0900, Kyotaro HORIGUCHI wrote: > +/* > + * Check if the path is in the data directory strictly. > + */ > +static bool > +is_in_data_directory(const char *path) > +{ > + char cwd[MAXPGPATH]; > + char abspath[MAXPGPATH]; > + char absdatadir[MAXPGPATH]; > + > + getcwd(cwd, MAXPGPATH); > + if (chdir(path) < 0) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("invalid directory \"%s\": %m", path))); > + > + /* getcwd is defined as returning absolute path */ > + getcwd(abspath, MAXPGPATH); > + > + /* DataDir needs to be canonicalized */ > + if (chdir(DataDir)) > + ereport(FATAL, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("could not chdir to the data directory \"%s\": %m", > + DataDir))); > + getcwd(absdatadir, MAXPGPATH); > + > + /* this must succeed */ > + if (chdir(cwd)) > + ereport(FATAL, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("could not chdir to the current working directory \"%s\": %m", > + cwd))); > + > + return path_is_prefix_of_path(absdatadir, abspath); > +} This seems like a bad idea to me. Why don't we just use make_absolute_path() on the proposed tablespace path, and then check path_is_prefix_of() or such? Sure, that can be tricked using symlinks etc, but that's already the case. Greetings, Andres Freund
On Wed, Apr 24, 2019 at 9:25 AM Andres Freund <andres@anarazel.de> wrote:
The inability
to reasonably test master/standby setups on a single machine is pretty
jarring (yes, one can use basebackup tablespace maps - but that doesn't
work well for new tablespaces).
+1 agree. Feature which can't be easily tested becomes hurdle for development and this is one of them. For reference the bug reported in [1] is hard to test and fix without having easy ability to setup master/standby on same node. We discussed few ways to eliminate the issue in thread [2] but I wasn't able to find a workable solution. It would be really helpful to lift this testing limitation.
Andres Freund <andres@anarazel.de> writes: > On 2019-04-24 10:13:09 -0400, Tom Lane wrote: >> I can't say that I like 0001 at all. It adds a bunch of complication and >> new failure modes (e.g., having to panic on chdir failure) in order to do >> what exactly? I've not been following the thread closely, but the >> original problem is surely just a dont-do-that misconfiguration. I also >> suspect that this is assuming way too much about the semantics of getcwd >> --- some filesystem configurations may have funny situations like multiple >> paths to the same place. > I'm not at all defending the conrete patch. But I think allowing > relative paths to tablespaces would solve a whole lot of practical > problems, while not meaningfully increasing failure modes. I'm not against allowing relative tablespace paths. But I did not like the chdir and getcwd-semantics hazards --- why do we have to buy into all that to allow relative paths? I think it would likely be sufficient to state plainly in the docs that a relative path had better point outside $PGDATA, and maybe have some *simple* tests on the canonicalized form of the path to prevent obvious mistakes. Going further than that is likely to add more problems than it removes. regards, tom lane
At Wed, 24 Apr 2019 09:30:12 -0700, Andres Freund <andres@anarazel.de> wrote in <20190424163012.7wzdl6j2v73cufip@alap3.anarazel.de> > Hi, > > On 2019-04-24 17:02:28 +0900, Kyotaro HORIGUCHI wrote: > > +/* > > + * Check if the path is in the data directory strictly. > > + */ > > +static bool > > +is_in_data_directory(const char *path) > > +{ <hinde the ugly thing:p> > > + errmsg("could not chdir to the current working directory \"%s\": %m", > > + cwd))); > > + > > + return path_is_prefix_of_path(absdatadir, abspath); > > +} > > This seems like a bad idea to me. Why don't we just use > make_absolute_path() on the proposed tablespace path, and then check > path_is_prefix_of() or such? Sure, that can be tricked using symlinks > etc, but that's already the case. Thanks for the suggestions, Tom, Andres. For clarity, as I mentioned in the post, I didn't like the getcwd in 0001. The reason for the previous patch are: 1. canonicalize_path doesn't process '/.' and '/..' in the middle of a path. That prevents correct checking of directory inclusiveness. Actually regression test suffers that. 2. Simply I missed make_absolute_path.. So, I rewote canonicalize_path to process '.' and '..'s not only at the end of a path but all occurrances in a path. This makes the strange loop of chdir-getwen useless. But the new canonicalize_path is a bit complex. The modified canonicalize_path works filesystem-access-free so it doesn't follow symlinks. Thus it makes a misjudge when two paths are in an inclusion relationship after resolving symlinks in the paths. But I don't think we don't need to treat such a malicious situation. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From b5d620940c70d59b70617e129eab72d54f010384 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Thu, 25 Apr 2019 15:08:54 +0900 Subject: [PATCH] Allow relative tablespace location paths Currently relative paths are not allowed as tablespace directory. That makes tests about tablespaces bothersome but doesn't prevent them points into the data directory perfectly. This patch allows relative direcotry based on the data directory as tablespace directory. Then makes the check more strict. --- src/backend/access/transam/xlog.c | 8 ++ src/backend/commands/tablespace.c | 55 ++++++-- src/backend/replication/basebackup.c | 11 +- src/backend/utils/adt/misc.c | 10 +- src/bin/pg_basebackup/pg_basebackup.c | 113 ++++++++++++--- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 18 ++- src/port/path.c | 204 +++++++++++++++++---------- 7 files changed, 300 insertions(+), 119 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index c00b63c751..abc2fd951f 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -10419,6 +10419,14 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, } linkpath[rllen] = '\0'; + /* + * In the relative path cases, the link target is always prefixed + * by "../" to convert from data directory-based. So we just do + * the reverse here. + */ + if (strncmp(s, "../", 3) == 0) + s += 3; + /* * Add the escape character '\\' before newline in a string to * ensure that we can distinguish between the newline in the diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c index 3784ea4b4f..98d9e65629 100644 --- a/src/backend/commands/tablespace.c +++ b/src/backend/commands/tablespace.c @@ -94,6 +94,25 @@ static void create_tablespace_directories(const char *location, const Oid tablespaceoid); static bool destroy_tablespace_directories(Oid tablespaceoid, bool redo); +/* + * Check if the path is in the data directory strictly. + */ +static bool +is_in_data_directory(const char *path) +{ + char *tmp_path = make_absolute_path(path); + char tmp_datadir[MAXPGPATH]; + bool in_datadir; + + Assert (tmp_path != NULL); + strlcpy(tmp_datadir, DataDir, MAXPGPATH); + canonicalize_path(tmp_datadir); + + in_datadir = path_is_prefix_of_path(tmp_datadir, tmp_path); + free(tmp_path); + + return in_datadir; +} /* * Each database using a table space is isolated into its own name space @@ -272,30 +291,26 @@ CreateTableSpace(CreateTableSpaceStmt *stmt) * * this also helps us ensure that location is not empty or whitespace */ - if (!is_absolute_path(location)) + /* Reject tablespaces in the data directory. */ + if (is_in_data_directory(location)) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), - errmsg("tablespace location must be an absolute path"))); + errmsg("tablespace location must not be inside the data directory"))); /* * Check that location isn't too long. Remember that we're going to append - * 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'. FYI, we never actually + * 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'. In the relative path case, we + * attach "../" at the beginning of the path. FYI, we never actually * reference the whole path here, but MakePGDirectory() uses the first two * parts. */ - if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + + if (3 + strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1 + OIDCHARS > MAXPGPATH) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("tablespace location \"%s\" is too long", location))); - /* Warn if the tablespace is in the data directory. */ - if (path_is_prefix_of_path(DataDir, location)) - ereport(WARNING, - (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), - errmsg("tablespace location should not be inside the data directory"))); - /* * Disallow creation of tablespaces named "pg_xxx"; we reserve this * namespace for system purposes. @@ -570,8 +585,10 @@ static void create_tablespace_directories(const char *location, const Oid tablespaceoid) { char *linkloc; + char *linktarget; char *location_with_version_dir; - struct stat st; + bool free_linktarget = false; + struct stat st; linkloc = psprintf("pg_tblspc/%u", tablespaceoid); location_with_version_dir = psprintf("%s/%s", location, @@ -639,13 +656,27 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid) /* * Create the symlink under PGDATA + * Relative tablespace location is based on PGDATA directory. The symlink + * is created in PGDATA/pg_tblspc. So adjust relative paths by attaching + * "../" at the head. */ - if (symlink(location, linkloc) < 0) + if (is_absolute_path(location)) + linktarget = unconstify(char *, location); + else + { + linktarget = psprintf("../%s", location); + free_linktarget = true; + } + + if (symlink(linktarget, linkloc) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create symbolic link \"%s\": %m", linkloc))); + if (free_linktarget) + pfree(linktarget); + pfree(linkloc); pfree(location_with_version_dir); } diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 36dcb28754..98e34e21c1 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1224,6 +1224,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces, #if defined(HAVE_READLINK) || defined(WIN32) char linkpath[MAXPGPATH]; int rllen; + int linkoffset = 0; rllen = readlink(pathbuf, linkpath, sizeof(linkpath)); if (rllen < 0) @@ -1238,7 +1239,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces, pathbuf))); linkpath[rllen] = '\0'; - size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath, + /* + * Relative link target is always prefixed by "../". Remove it to + * obtain a tablespace directory relative to the data directory. + */ + if (strncmp(linkpath, "../", 3) == 0) + linkoffset = 3; + + size += _tarWriteHeader(pathbuf + basepathlen + 1, + linkpath + linkoffset, &statbuf, sizeonly); #else diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index d330a88e3c..205c8ac903 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -293,6 +293,7 @@ pg_tablespace_location(PG_FUNCTION_ARGS) char sourcepath[MAXPGPATH]; char targetpath[MAXPGPATH]; int rllen; + int pathoffset = 0; /* * It's useful to apply this function to pg_class.reltablespace, wherein @@ -330,7 +331,14 @@ pg_tablespace_location(PG_FUNCTION_ARGS) sourcepath))); targetpath[rllen] = '\0'; - PG_RETURN_TEXT_P(cstring_to_text(targetpath)); + /* + * Relative target paths are always prefixed by "../". Remove it to obtain + * tablespace directory relative to the data directory. + */ + if (strncmp(targetpath, "../", 3) == 0) + pathoffset = 3; + + PG_RETURN_TEXT_P(cstring_to_text(targetpath + pathoffset)); #else ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 1a735b8046..2a9a10e3f4 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -269,26 +269,6 @@ tablespace_list_append(const char *arg) exit(1); } - /* - * This check isn't absolutely necessary. But all tablespaces are created - * with absolute directories, so specifying a non-absolute path here would - * just never match, possibly confusing users. It's also good to be - * consistent with the new_dir check. - */ - if (!is_absolute_path(cell->old_dir)) - { - pg_log_error("old directory is not an absolute path in tablespace mapping: %s", - cell->old_dir); - exit(1); - } - - if (!is_absolute_path(cell->new_dir)) - { - pg_log_error("new directory is not an absolute path in tablespace mapping: %s", - cell->new_dir); - exit(1); - } - /* * Comparisons done with these values should involve similarly * canonicalized path values. This is particularly sensitive on Windows @@ -1399,9 +1379,20 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum) if (basetablespace) strlcpy(current_path, basedir, sizeof(current_path)); else - strlcpy(current_path, - get_tablespace_mapping(PQgetvalue(res, rownum, 1)), - sizeof(current_path)); + { + const char *path = get_tablespace_mapping(PQgetvalue(res, rownum, 1)); + + if (is_absolute_path(path)) + strlcpy(current_path, path, sizeof(current_path)); + else + { + /* Relative tablespace locations need to be prefixed by basedir. */ + strlcpy(current_path, basedir, sizeof(current_path)); + if (current_path[strlen(current_path) - 1] != '/') + strlcat(current_path, "/", sizeof(current_path)); + strlcat(current_path, path, sizeof(current_path)); + } + } /* * Get the COPY data @@ -1525,15 +1516,35 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum) * are, you can call it an undocumented feature that you * can map them too.) */ + bool free_path = false; + filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */ mapped_tblspc_path = get_tablespace_mapping(©buf[157]); + if (!is_absolute_path(mapped_tblspc_path)) + { + /* + * Convert relative tablespace location based on data + * directory into path link target based on pg_tblspc + * directory. This must be in sync with + * create_tablespace_directories(). + */ + char *p = malloc(3 + strlen(mapped_tblspc_path) + 1); + + strcpy(p, "../"); + strcat(p, mapped_tblspc_path); + mapped_tblspc_path = p; + free_path = true; + } if (symlink(mapped_tblspc_path, filename) != 0) { pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m", filename, mapped_tblspc_path); exit(1); } + + if (free_path) + free(unconstify(char *, mapped_tblspc_path)); } else { @@ -1957,8 +1968,28 @@ BaseBackup(void) if (format == 'p' && !PQgetisnull(res, i, 1)) { char *path = unconstify(char *, get_tablespace_mapping(PQgetvalue(res, i, 1))); + bool freeit = false; + + if (!is_absolute_path(path)) + { + /* + * Relative tablespace locations need to be converted + * attaching basedir. + */ + char *p = malloc(strlen(basedir) + 1 + strlen(path) + 1); + + strcpy(p, basedir); + if (p[strlen(p) - 1] != '/') + strcat(p, "/"); + strcat(p, path); + path = p; + freeit = true; + } verify_dir_is_empty_or_create(path, &made_tablespace_dirs, &found_tablespace_dirs); + + if (freeit) + free(path); } } @@ -2163,6 +2194,39 @@ BaseBackup(void) pg_log_info("base backup completed"); } +/* Sanity check of tablespace mappings */ +static void +check_tablespace_mappings(void) +{ + TablespaceListCell *cell; + char *absbasedir = make_absolute_path(basedir); + + for (cell = tablespace_dirs.head ; cell ; cell = cell->next) + { + char pathbuf[MAXPGPATH]; + char *path = cell->new_dir; + + /* + * absbasedir doesn't have trailing '/'. new_dir is already + * canonicalized. + */ + if (!is_absolute_path(cell->new_dir)) + { + /* but concatenated path needs re-canonicalization */ + snprintf(pathbuf, MAXPGPATH, "%s/%s", absbasedir, cell->new_dir); + canonicalize_path(pathbuf); + path = pathbuf; + } + + if (path_is_prefix_of_path(absbasedir, path)) + { + pg_log_error("new_dir \"%s\" of mapping for \"%s\" is inside target data directory \"%s\"", cell->new_dir, cell->old_dir,absbasedir); + exit(1); + } + } + + free(absbasedir); +} int main(int argc, char **argv) @@ -2478,6 +2542,9 @@ main(int argc, char **argv) } } + /* Sanity check of tablespace mapping */ + check_tablespace_mappings(); + #ifndef HAVE_LIBZ if (compresslevel != 0) { diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index 33869fecc9..239569a4c3 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -6,7 +6,7 @@ use File::Basename qw(basename dirname); use File::Path qw(rmtree); use PostgresNode; use TestLib; -use Test::More tests => 106; +use Test::More tests => 108; program_help_ok('pg_basebackup'); program_version_ok('pg_basebackup'); @@ -186,12 +186,20 @@ $node->command_fails( "-T/foo=/bar=/baz" ], '-T with multiple = fails'); +$node->command_ok( + [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo=../bar/baz/foo" ], + '-T with relative old directory not rejected'); +rmtree("$tempdir/backup_foo"); $node->command_fails( - [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo=/bar" ], - '-T with old directory not absolute fails'); + [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=../backup_foo/bar/baz" ], + '-T with new relative directory inside data dir fails'); +$node->command_ok( + [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=/foo/bar" ], + '-T with new directory under nonexistent directory succeeds'); +rmtree("$tempdir/backup_foo"); $node->command_fails( - [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=bar" ], - '-T with new directory not absolute fails'); + [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=../backup_foo/bar/baz" ], + '-T with new relative directory inside data dir fails'); $node->command_fails( [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo" ], '-T with invalid format fails'); diff --git a/src/port/path.c b/src/port/path.c index 4b214e89e4..8a03ea065b 100644 --- a/src/port/path.c +++ b/src/port/path.c @@ -247,19 +247,17 @@ join_path_components(char *ret_path, * o remove trailing quote on Win32 * o remove trailing slash * o remove duplicate adjacent separators - * o remove trailing '.' - * o process trailing '..' ourselves + * o process '.' and '..' ourselves */ void canonicalize_path(char *path) { - char *p, - *to_p; - char *spath; - bool was_sep = false; - int pending_strips; + char *src; + char *dst; + int totallen PG_USED_FOR_ASSERTS_ONLY; #ifdef WIN32 + char *p; /* * The Windows command processor will accept suitably quoted paths with @@ -277,91 +275,143 @@ canonicalize_path(char *path) */ if (p > path && *(p - 1) == '"') *(p - 1) = '/'; + + /* Skip drive component or UNC header */ + path = skip_drive(path); +#endif + +#ifdef USE_ASSERT_CHECKING + totallen = strlen(path); #endif /* - * Removing the trailing slash on a path means we never get ugly double - * trailing slashes. Also, Win32 can't stat() a directory with a trailing - * slash. Don't remove a leading slash, though. - */ - trim_trailing_separator(path); - - /* - * Remove duplicate adjacent separators - */ - p = path; -#ifdef WIN32 - /* Don't remove leading double-slash on Win32 */ - if (*p) - p++; -#endif - to_p = p; - for (; *p; p++, to_p++) - { - /* Handle many adjacent slashes, like "/a///b" */ - while (*p == '/' && was_sep) - p++; - if (to_p != p) - *to_p = *p; - was_sep = (*p == '/'); - } - *to_p = '\0'; - - /* - * Remove any trailing uses of "." and process ".." ourselves + * Process any uses of "." and ".." ourselves * * Note that "/../.." should reduce to just "/", while "../.." has to be - * kept as-is. In the latter case we put back mistakenly trimmed ".." - * components below. Also note that we want a Windows drive spec to be - * visible to trim_directory(), but it's not part of the logic that's - * looking at the name components; hence distinction between path and - * spath. + * kept as-is. */ - spath = skip_drive(path); - pending_strips = 0; - for (;;) - { - int len = strlen(spath); + src = dst = path; - if (len >= 2 && strcmp(spath + len - 2, "/.") == 0) - trim_directory(path); - else if (strcmp(spath, ".") == 0) + while (*src) + { + char *p; + int elen; + + Assert(dst >= path && src >= path && + dst - path <= totallen && src - path <= totallen); + + /* copy separator, removing duplicate adjacent separators */ + if (IS_DIR_SEP(*src)) { - /* Want to leave "." alone, but "./.." has to become ".." */ - if (pending_strips > 0) - *spath = '\0'; - break; + if (dst > path && IS_DIR_SEP(dst[-1])) + { + /* skip duplicates */ + src++; + continue; + } + *dst++ = *src++; + continue; } - else if ((len >= 3 && strcmp(spath + len - 3, "/..") == 0) || - strcmp(spath, "..") == 0) + + /* locate the next element */ + for (p = src ; *p && !IS_DIR_SEP(*p) ; p++); + elen = p - src; + + Assert (elen != 0); + + /* process "." */ + if (elen == 1 && src[0] == '.') { - trim_directory(path); - pending_strips++; + /* just skip the element and succeeding separators */ + for (src += elen ; IS_DIR_SEP(*src) ; src++); + + /* If the path was exactly ".", don't skip it */ + if (*src || src != path + 1) + continue; + + src = path; } - else if (pending_strips > 0 && *spath != '\0') + /* process ".." not at the beginning of a relative path */ + else if (elen == 2 && src[0] == '.' && src[1] == '.' && dst != path) { - /* trim a regular directory name canceled by ".." */ - trim_directory(path); - pending_strips--; - /* foo/.. should become ".", not empty */ - if (*spath == '\0') - strcpy(spath, "."); + /* + * Found ".." and there's something up-step, step back. + * But only when we have the upper element to back up to. + * Also don't annihilate two successive ".."s. + */ + if (dst == path + 1) + { + /* + * dst is absolute root. skip the ".." and trailing separator + * if any + */ + src += elen; + if (IS_DIR_SEP(*src)) + src++; + continue; + } + else + { + /* Find the previous element. dst[-1] == '/' here. */ + char *pelm; + + for (pelm = dst - 1 ; + pelm > path && !IS_DIR_SEP(pelm[-1]) ; + pelm--); + + if (dst - pelm == 2 && *pelm == '.') + { + /* + * Convert "./.." into ".." then process this ".." + * again. + */ + dst = pelm; + continue; + } + else if (dst - pelm > 3 || strncmp(pelm, "..", 2) != 0) + { + /* The previous element is not a relative movement */ + if (pelm == path) + { + /* + * The relative path is being translated as + * nothing. Insert "./" instead. + */ + memcpy(path, "./", 2); + dst = path + 2; + src += elen; + if (IS_DIR_SEP(*src)) + src++; + } + else + { + /* annihilate involving the previous element */ + dst = pelm; + src += elen; + if (IS_DIR_SEP(*src)) + src++; + } + continue; + } + } } - else - break; + else if ((dst == path + 2 && *path == '.') || + (dst > path + 2 && IS_DIR_SEP(dst[-3]) && dst[-2] == '.')) + { + /* the previous element is removable ".", eliminate it */ + dst -= 2; + } + + /* copy it, then move */ + memcpy(dst, src, elen); + dst += elen; + src += elen; } - if (pending_strips > 0) - { - /* - * We could only get here if path is now totally empty (other than a - * possible drive specifier on Windows). We have to put back one or - * more ".."'s that we took off. - */ - while (--pending_strips > 0) - strcat(path, "../"); - strcat(path, ".."); - } + /* trim trailing separators */ + while (dst > path + 1 && dst[-1] == '/') + dst--; + *dst = 0; } /* -- 2.16.3
Hello. Win32 implement cannot have symbolic link feature as Linux-like OSes for some restrictions. (Windows 7 and 10 behave differently, as I heard.) So the 0002 patch implemnets "fake" symbolic link as mentioned in its commit message. Also I fixed 0001 slightly. regards. At Thu, 25 Apr 2019 17:08:55 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in > Thanks for the suggestions, Tom, Andres. For clarity, as I > mentioned in the post, I didn't like the getcwd in 0001. The > reason for the previous patch are: > > 1. canonicalize_path doesn't process '/.' and '/..' in the middle > of a path. That prevents correct checking of directory > inclusiveness. Actually regression test suffers that. > > 2. Simply I missed make_absolute_path.. > > So, I rewote canonicalize_path to process '.' and '..'s not only > at the end of a path but all occurrances in a path. This makes > the strange loop of chdir-getwen useless. But the new > canonicalize_path is a bit complex. > > The modified canonicalize_path works filesystem-access-free so it > doesn't follow symlinks. Thus it makes a misjudge when two paths > are in an inclusion relationship after resolving symlinks in the > paths. But I don't think we don't need to treat such a malicious > situation. reagrds. -- Kyotaro Horiguchi NTT Open Source Software Center From 762ec1cbea1a6a11b6f4569549611a7c2c31a3dc Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Thu, 25 Apr 2019 15:08:54 +0900 Subject: [PATCH 1/2] Allow relative tablespace location paths Currently relative paths are not allowed as tablespace directory. That makes tests about tablespaces bothersome but doesn't prevent them points into the data directory perfectly. This patch allows relative direcotry based on the data directory as tablespace directory. Then makes the check more strict. --- src/backend/access/transam/xlog.c | 8 ++ src/backend/commands/tablespace.c | 55 ++++++-- src/backend/replication/basebackup.c | 11 +- src/backend/utils/adt/misc.c | 10 +- src/bin/pg_basebackup/pg_basebackup.c | 105 +++++++++++--- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 18 ++- src/port/path.c | 204 +++++++++++++++++---------- 7 files changed, 292 insertions(+), 119 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index c00b63c751..abc2fd951f 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -10419,6 +10419,14 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, } linkpath[rllen] = '\0'; + /* + * In the relative path cases, the link target is always prefixed + * by "../" to convert from data directory-based. So we just do + * the reverse here. + */ + if (strncmp(s, "../", 3) == 0) + s += 3; + /* * Add the escape character '\\' before newline in a string to * ensure that we can distinguish between the newline in the diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c index 8ec963f1cf..fada719687 100644 --- a/src/backend/commands/tablespace.c +++ b/src/backend/commands/tablespace.c @@ -94,6 +94,25 @@ static void create_tablespace_directories(const char *location, const Oid tablespaceoid); static bool destroy_tablespace_directories(Oid tablespaceoid, bool redo); +/* + * Check if the path is in the data directory strictly. + */ +static bool +is_in_data_directory(const char *path) +{ + char *tmp_path = make_absolute_path(path); + char tmp_datadir[MAXPGPATH]; + bool in_datadir; + + Assert (tmp_path != NULL); + strlcpy(tmp_datadir, DataDir, MAXPGPATH); + canonicalize_path(tmp_datadir); + + in_datadir = path_is_prefix_of_path(tmp_datadir, tmp_path); + free(tmp_path); + + return in_datadir; +} /* * Each database using a table space is isolated into its own name space @@ -272,30 +291,26 @@ CreateTableSpace(CreateTableSpaceStmt *stmt) * * this also helps us ensure that location is not empty or whitespace */ - if (!is_absolute_path(location)) + /* Reject tablespaces in the data directory. */ + if (is_in_data_directory(location)) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), - errmsg("tablespace location must be an absolute path"))); + errmsg("tablespace location must not be inside the data directory"))); /* * Check that location isn't too long. Remember that we're going to append - * 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'. FYI, we never actually + * 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'. In the relative path case, we + * attach "../" at the beginning of the path. FYI, we never actually * reference the whole path here, but MakePGDirectory() uses the first two * parts. */ - if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + + if (3 + strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1 + OIDCHARS > MAXPGPATH) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("tablespace location \"%s\" is too long", location))); - /* Warn if the tablespace is in the data directory. */ - if (path_is_prefix_of_path(DataDir, location)) - ereport(WARNING, - (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), - errmsg("tablespace location should not be inside the data directory"))); - /* * Disallow creation of tablespaces named "pg_xxx"; we reserve this * namespace for system purposes. @@ -570,8 +585,10 @@ static void create_tablespace_directories(const char *location, const Oid tablespaceoid) { char *linkloc; + char *linktarget; char *location_with_version_dir; - struct stat st; + bool free_linktarget = false; + struct stat st; linkloc = psprintf("pg_tblspc/%u", tablespaceoid); location_with_version_dir = psprintf("%s/%s", location, @@ -639,13 +656,27 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid) /* * Create the symlink under PGDATA + * Relative tablespace location is based on PGDATA directory. The symlink + * is created in PGDATA/pg_tblspc. So adjust relative paths by attaching + * "../" at the head. */ - if (symlink(location, linkloc) < 0) + if (is_absolute_path(location)) + linktarget = unconstify(char *, location); + else + { + linktarget = psprintf("../%s", location); + free_linktarget = true; + } + + if (symlink(linktarget, linkloc) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create symbolic link \"%s\": %m", linkloc))); + if (free_linktarget) + pfree(linktarget); + pfree(linkloc); pfree(location_with_version_dir); } diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 36dcb28754..98e34e21c1 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1224,6 +1224,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces, #if defined(HAVE_READLINK) || defined(WIN32) char linkpath[MAXPGPATH]; int rllen; + int linkoffset = 0; rllen = readlink(pathbuf, linkpath, sizeof(linkpath)); if (rllen < 0) @@ -1238,7 +1239,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces, pathbuf))); linkpath[rllen] = '\0'; - size += _tarWriteHeader(pathbuf + basepathlen + 1, linkpath, + /* + * Relative link target is always prefixed by "../". Remove it to + * obtain a tablespace directory relative to the data directory. + */ + if (strncmp(linkpath, "../", 3) == 0) + linkoffset = 3; + + size += _tarWriteHeader(pathbuf + basepathlen + 1, + linkpath + linkoffset, &statbuf, sizeonly); #else diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index d330a88e3c..205c8ac903 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -293,6 +293,7 @@ pg_tablespace_location(PG_FUNCTION_ARGS) char sourcepath[MAXPGPATH]; char targetpath[MAXPGPATH]; int rllen; + int pathoffset = 0; /* * It's useful to apply this function to pg_class.reltablespace, wherein @@ -330,7 +331,14 @@ pg_tablespace_location(PG_FUNCTION_ARGS) sourcepath))); targetpath[rllen] = '\0'; - PG_RETURN_TEXT_P(cstring_to_text(targetpath)); + /* + * Relative target paths are always prefixed by "../". Remove it to obtain + * tablespace directory relative to the data directory. + */ + if (strncmp(targetpath, "../", 3) == 0) + pathoffset = 3; + + PG_RETURN_TEXT_P(cstring_to_text(targetpath + pathoffset)); #else ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 1a735b8046..428e2340cc 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -269,26 +269,6 @@ tablespace_list_append(const char *arg) exit(1); } - /* - * This check isn't absolutely necessary. But all tablespaces are created - * with absolute directories, so specifying a non-absolute path here would - * just never match, possibly confusing users. It's also good to be - * consistent with the new_dir check. - */ - if (!is_absolute_path(cell->old_dir)) - { - pg_log_error("old directory is not an absolute path in tablespace mapping: %s", - cell->old_dir); - exit(1); - } - - if (!is_absolute_path(cell->new_dir)) - { - pg_log_error("new directory is not an absolute path in tablespace mapping: %s", - cell->new_dir); - exit(1); - } - /* * Comparisons done with these values should involve similarly * canonicalized path values. This is particularly sensitive on Windows @@ -1399,9 +1379,15 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum) if (basetablespace) strlcpy(current_path, basedir, sizeof(current_path)); else - strlcpy(current_path, - get_tablespace_mapping(PQgetvalue(res, rownum, 1)), - sizeof(current_path)); + { + const char *path = get_tablespace_mapping(PQgetvalue(res, rownum, 1)); + + /* Relative tablespace locations need to be prefixed by basedir. */ + if (is_absolute_path(path)) + strlcpy(current_path, path, sizeof(current_path)); + else + join_path_components(currennt_path, basedir, path); + } /* * Get the COPY data @@ -1525,15 +1511,35 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum) * are, you can call it an undocumented feature that you * can map them too.) */ + bool free_path = false; + filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */ mapped_tblspc_path = get_tablespace_mapping(©buf[157]); + if (!is_absolute_path(mapped_tblspc_path)) + { + /* + * Convert relative tablespace location based on data + * directory into path link target based on pg_tblspc + * directory. This must be in sync with + * create_tablespace_directories(). + */ + char *p = malloc(3 + strlen(mapped_tblspc_path) + 1); + + strcpy(p, "../"); + strcat(p, mapped_tblspc_path); + mapped_tblspc_path = p; + free_path = true; + } if (symlink(mapped_tblspc_path, filename) != 0) { pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m", filename, mapped_tblspc_path); exit(1); } + + if (free_path) + free(unconstify(char *, mapped_tblspc_path)); } else { @@ -1957,8 +1963,25 @@ BaseBackup(void) if (format == 'p' && !PQgetisnull(res, i, 1)) { char *path = unconstify(char *, get_tablespace_mapping(PQgetvalue(res, i, 1))); + bool freeit = false; + + if (!is_absolute_path(path)) + { + /* + * Relative tablespace locations need to be converted + * attaching basedir. + */ + char *p = malloc(strlen(basedir) + 1 + strlen(path) + 1); + + join_path_components(p, basedir, path); + path = p; + freeit = true; + } verify_dir_is_empty_or_create(path, &made_tablespace_dirs, &found_tablespace_dirs); + + if (freeit) + free(path); } } @@ -2163,6 +2186,39 @@ BaseBackup(void) pg_log_info("base backup completed"); } +/* Sanity check of tablespace mappings */ +static void +check_tablespace_mappings(void) +{ + TablespaceListCell *cell; + char *absbasedir = make_absolute_path(basedir); + + for (cell = tablespace_dirs.head ; cell ; cell = cell->next) + { + char pathbuf[MAXPGPATH]; + char *path = cell->new_dir; + + /* + * absbasedir doesn't have trailing '/'. new_dir is already + * canonicalized. + */ + if (!is_absolute_path(cell->new_dir)) + { + /* but concatenated path needs re-canonicalization */ + snprintf(pathbuf, MAXPGPATH, "%s/%s", absbasedir, cell->new_dir); + canonicalize_path(pathbuf); + path = pathbuf; + } + + if (path_is_prefix_of_path(absbasedir, path)) + { + pg_log_error("new_dir \"%s\" of mapping for \"%s\" is inside target data directory \"%s\"", cell->new_dir, cell->old_dir,absbasedir); + exit(1); + } + } + + free(absbasedir); +} int main(int argc, char **argv) @@ -2478,6 +2534,9 @@ main(int argc, char **argv) } } + /* Sanity check of tablespace mapping */ + check_tablespace_mappings(); + #ifndef HAVE_LIBZ if (compresslevel != 0) { diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index 33869fecc9..239569a4c3 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -6,7 +6,7 @@ use File::Basename qw(basename dirname); use File::Path qw(rmtree); use PostgresNode; use TestLib; -use Test::More tests => 106; +use Test::More tests => 108; program_help_ok('pg_basebackup'); program_version_ok('pg_basebackup'); @@ -186,12 +186,20 @@ $node->command_fails( "-T/foo=/bar=/baz" ], '-T with multiple = fails'); +$node->command_ok( + [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo=../bar/baz/foo" ], + '-T with relative old directory not rejected'); +rmtree("$tempdir/backup_foo"); $node->command_fails( - [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo=/bar" ], - '-T with old directory not absolute fails'); + [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=../backup_foo/bar/baz" ], + '-T with new relative directory inside data dir fails'); +$node->command_ok( + [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=/foo/bar" ], + '-T with new directory under nonexistent directory succeeds'); +rmtree("$tempdir/backup_foo"); $node->command_fails( - [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=bar" ], - '-T with new directory not absolute fails'); + [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=../backup_foo/bar/baz" ], + '-T with new relative directory inside data dir fails'); $node->command_fails( [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo" ], '-T with invalid format fails'); diff --git a/src/port/path.c b/src/port/path.c index 4b214e89e4..dd8ae40e0f 100644 --- a/src/port/path.c +++ b/src/port/path.c @@ -247,19 +247,17 @@ join_path_components(char *ret_path, * o remove trailing quote on Win32 * o remove trailing slash * o remove duplicate adjacent separators - * o remove trailing '.' - * o process trailing '..' ourselves + * o process '.' and '..' ourselves */ void canonicalize_path(char *path) { - char *p, - *to_p; - char *spath; - bool was_sep = false; - int pending_strips; + char *src; + char *dst; + int totallen PG_USED_FOR_ASSERTS_ONLY; #ifdef WIN32 + char *p; /* * The Windows command processor will accept suitably quoted paths with @@ -277,91 +275,143 @@ canonicalize_path(char *path) */ if (p > path && *(p - 1) == '"') *(p - 1) = '/'; + + /* Skip drive component or UNC header */ + path = skip_drive(path); +#endif + +#ifdef USE_ASSERT_CHECKING + totallen = strlen(path); #endif /* - * Removing the trailing slash on a path means we never get ugly double - * trailing slashes. Also, Win32 can't stat() a directory with a trailing - * slash. Don't remove a leading slash, though. - */ - trim_trailing_separator(path); - - /* - * Remove duplicate adjacent separators - */ - p = path; -#ifdef WIN32 - /* Don't remove leading double-slash on Win32 */ - if (*p) - p++; -#endif - to_p = p; - for (; *p; p++, to_p++) - { - /* Handle many adjacent slashes, like "/a///b" */ - while (*p == '/' && was_sep) - p++; - if (to_p != p) - *to_p = *p; - was_sep = (*p == '/'); - } - *to_p = '\0'; - - /* - * Remove any trailing uses of "." and process ".." ourselves + * Process any uses of "." and ".." ourselves * * Note that "/../.." should reduce to just "/", while "../.." has to be - * kept as-is. In the latter case we put back mistakenly trimmed ".." - * components below. Also note that we want a Windows drive spec to be - * visible to trim_directory(), but it's not part of the logic that's - * looking at the name components; hence distinction between path and - * spath. + * kept as-is. */ - spath = skip_drive(path); - pending_strips = 0; - for (;;) - { - int len = strlen(spath); + src = dst = path; - if (len >= 2 && strcmp(spath + len - 2, "/.") == 0) - trim_directory(path); - else if (strcmp(spath, ".") == 0) + while (*src) + { + char *p; + int elen; + + Assert(dst >= path && src >= path && + dst - path <= totallen && src - path <= totallen); + + /* copy separator, removing duplicate adjacent separators */ + if (IS_DIR_SEP(*src)) { - /* Want to leave "." alone, but "./.." has to become ".." */ - if (pending_strips > 0) - *spath = '\0'; - break; + if (dst > path && IS_DIR_SEP(dst[-1])) + { + /* skip duplicates */ + src++; + continue; + } + *dst++ = *src++; + continue; } - else if ((len >= 3 && strcmp(spath + len - 3, "/..") == 0) || - strcmp(spath, "..") == 0) + + /* locate the next element */ + for (p = src ; *p && !IS_DIR_SEP(*p) ; p++); + elen = p - src; + + Assert (elen != 0); + + /* process "." */ + if (elen == 1 && src[0] == '.') { - trim_directory(path); - pending_strips++; + /* just skip the element and succeeding separators */ + for (src += elen ; IS_DIR_SEP(*src) ; src++); + + /* If the path was exactly ".", don't skip it */ + if (*src || src != path + 1) + continue; + + src = path; } - else if (pending_strips > 0 && *spath != '\0') + /* process ".." not at the beginning of a relative path */ + else if (elen == 2 && src[0] == '.' && src[1] == '.' && dst != path) { - /* trim a regular directory name canceled by ".." */ - trim_directory(path); - pending_strips--; - /* foo/.. should become ".", not empty */ - if (*spath == '\0') - strcpy(spath, "."); + /* + * Found ".." and there's something up-step, step back. + * But only when we have the upper element to back up to. + * Also don't annihilate two successive ".."s. + */ + if (dst == path + 1) + { + /* + * dst is absolute root. skip the ".." and trailing separator + * if any + */ + src += elen; + if (IS_DIR_SEP(*src)) + src++; + continue; + } + else + { + /* Find the previous element. dst[-1] == '/' here. */ + char *pelm; + + for (pelm = dst - 1 ; + pelm > path && !IS_DIR_SEP(pelm[-1]) ; + pelm--); + + if (dst - pelm == 2 && *pelm == '.') + { + /* + * Convert "./.." into ".." then process this ".." + * again. + */ + dst = pelm; + continue; + } + else if (dst - pelm > 3 || strncmp(pelm, "..", 2) != 0) + { + /* The previous element is not a relative movement */ + if (pelm == path) + { + /* + * The relative path is being translated as + * nothing. Insert "./" instead. + */ + memcpy(path, "./", 2); + dst = path + 2; + src += elen; + if (IS_DIR_SEP(*src)) + src++; + } + else + { + /* annihilate involving the previous element */ + dst = pelm; + src += elen; + if (IS_DIR_SEP(*src)) + src++; + } + continue; + } + } } - else - break; + else if ((dst == path + 2 && *path == '.') || + (dst > path + 2 && IS_DIR_SEP(dst[-3]) && dst[-2] == '.')) + { + /* the previous element is removable ".", eliminate it */ + dst -= 2; + } + + /* copy it, then move */ + memcpy(dst, src, elen); + dst += elen; + src += elen; } - if (pending_strips > 0) - { - /* - * We could only get here if path is now totally empty (other than a - * possible drive specifier on Windows). We have to put back one or - * more ".."'s that we took off. - */ - while (--pending_strips > 0) - strcat(path, "../"); - strcat(path, ".."); - } + /* trim trailing separators */ + while (dst > path + 1 && IS_DIR_SEP(dst[-1])) + dst--; + *dst = 0; } /* -- 2.16.3 From e89a60f0451f1c63f53d3ecb609556785ce44930 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Fri, 26 Apr 2019 17:19:01 +0900 Subject: [PATCH 2/2] win32 tentative implement This is a patch for discussion. Win32 implement doesn't have relative symbolic links. AFAICS relative symbolic links on Win32 requires special setting to use CreateSymbolicLink with non-administrator accounts, and there's no way to use DeviceIoControl for IO_REPARSE_TAG_SYMLINK with non-admistrator accounts. So, this patch creates "fake" symbolic links, which is, it's really a mount point but looks as if a symbolic links. The difference is revealed (that is, the "relative" link will be broken) when moving the data directory and tablespace directory to another place. --- src/port/dirmod.c | 75 ++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 58 insertions(+), 17 deletions(-) diff --git a/src/port/dirmod.c b/src/port/dirmod.c index d7932401ef..f8aaae2c64 100644 --- a/src/port/dirmod.c +++ b/src/port/dirmod.c @@ -159,10 +159,13 @@ int pgsymlink(const char *oldpath, const char *newpath) { HANDLE dirhandle; - DWORD len; - char buffer[MAX_PATH * sizeof(WCHAR) + offsetof(REPARSE_JUNCTION_DATA_BUFFER, PathBuffer)]; + DWORD len, plen; + char buffer[MAX_PATH * sizeof(WCHAR) * 2 + offsetof(REPARSE_JUNCTION_DATA_BUFFER, PathBuffer)]; char nativeTarget[MAX_PATH]; + char nativePrint[MAX_PATH]; + char tmp[MAX_PATH]; char *p = nativeTarget; + char *poldpath = oldpath; REPARSE_JUNCTION_DATA_BUFFER *reparseBuf = (REPARSE_JUNCTION_DATA_BUFFER *) buffer; CreateDirectory(newpath, 0); @@ -173,25 +176,50 @@ pgsymlink(const char *oldpath, const char *newpath) if (dirhandle == INVALID_HANDLE_VALUE) return -1; - /* make sure we have an unparsed native win32 path */ - if (memcmp("\\??\\", oldpath, 4) != 0) - snprintf(nativeTarget, sizeof(nativeTarget), "\\??\\%s", oldpath); + /* make absolute path if oldpath is not */ + if (!is_absolute_path(poldpath)) + { + char *absnew = make_absolute_path(newpath); + /* back up to pg_tblspc */ + get_parent_directory(absnew); + join_path_components(tmp, absnew, oldpath); + free(absnew); + + canonicalize_path(tmp); + + /* Copy oldpath as print name */ + strlcpy(nativePrint, oldpath, sizeof(nativePrint)); + + poldpath = tmp; + } else - strlcpy(nativeTarget, oldpath, sizeof(nativeTarget)); + nativePrint[0] = 0; + + /* make sure we have an unparsed native win32 path */ + if (is_absolute_path(poldpath)) + { + if (memcmp("\\??\\", poldpath, 4) != 0) + snprintf(nativeTarget, sizeof(nativeTarget), "\\??\\%s", poldpath); + else + strlcpy(nativeTarget, poldpath, sizeof(nativeTarget)); + } while ((p = strchr(p, '/')) != NULL) *p++ = '\\'; len = strlen(nativeTarget) * sizeof(WCHAR); + plen = strlen(nativePrint) * sizeof(WCHAR); reparseBuf->ReparseTag = IO_REPARSE_TAG_MOUNT_POINT; - reparseBuf->ReparseDataLength = len + 12; + reparseBuf->ReparseDataLength = len + plen + 12; reparseBuf->Reserved = 0; reparseBuf->SubstituteNameOffset = 0; reparseBuf->SubstituteNameLength = len; reparseBuf->PrintNameOffset = len + sizeof(WCHAR); - reparseBuf->PrintNameLength = 0; + reparseBuf->PrintNameLength = plen; MultiByteToWideChar(CP_ACP, 0, nativeTarget, -1, reparseBuf->PathBuffer, MAX_PATH); + MultiByteToWideChar(CP_ACP, 0, nativePrint, -1, + &reparseBuf->PathBuffer[reparseBuf->PrintNameOffset/2], MAX_PATH); /* * FSCTL_SET_REPARSE_POINT is coded differently depending on SDK version; @@ -308,15 +336,27 @@ pgreadlink(const char *path, char *buf, size_t size) /* Got it, let's get some results from this */ if (reparseBuf->ReparseTag != IO_REPARSE_TAG_MOUNT_POINT) { - errno = EINVAL; - return -1; + errno = EINVAL; + return -1; } - r = WideCharToMultiByte(CP_ACP, 0, - reparseBuf->PathBuffer, -1, - buf, - size, - NULL, NULL); + /* + * if PrintName is registered, this link is emulated relative + * symbolic link. Return the PrintName rather than SubstituteName. + */ + if (reparseBuf->PrintNameLength > 0) + r = WideCharToMultiByte(CP_ACP, 0, + &reparseBuf->PathBuffer[reparseBuf->PrintNameOffset / 2], + -1, + buf, + size, + NULL, NULL); + else + r = WideCharToMultiByte(CP_ACP, 0, + reparseBuf->PathBuffer, -1, + buf, + size, + NULL, NULL); if (r <= 0) { @@ -328,8 +368,9 @@ pgreadlink(const char *path, char *buf, size_t size) * If the path starts with "\??\", which it will do in most (all?) cases, * strip those out. */ - if (r > 4 && strncmp(buf, "\\??\\", 4) == 0) - { + if (reparseBuf->PrintNameLength == 0 && + r > 4 && strncmp(buf, "\\??\\", 4) == 0) + { memmove(buf, buf + 4, strlen(buf + 4) + 1); r -= 4; } -- 2.16.3
Hi, On 2019-04-26 17:29:56 +0900, Kyotaro HORIGUCHI wrote: > Win32 implement cannot have symbolic link feature as Linux-like > OSes for some restrictions. (Windows 7 and 10 behave differently, > as I heard.) > > So the 0002 patch implemnets "fake" symbolic link as mentioned in > its commit message. I'm confused - what does this have to do with the topic at hand? Also, don't we already emulate symlinks with junction points? - Andres
Hello. At Fri, 26 Apr 2019 12:25:10 -0700, Andres Freund <andres@anarazel.de> wrote in <20190426192510.dndtaxslneoh4rs5@alap3.anarazel.de> > On 2019-04-26 17:29:56 +0900, Kyotaro HORIGUCHI wrote: > > Win32 implement cannot have symbolic link feature as Linux-like > > OSes for some restrictions. (Windows 7 and 10 behave differently, > > as I heard.) > > > > So the 0002 patch implemnets "fake" symbolic link as mentioned in > > its commit message. > > I'm confused - what does this have to do with the topic at hand? Also, > don't we already emulate symlinks with junction points? Just to know how we have, or whether we can have relative tablespaces on Windows. The answer for the second question is "no" for relative symbolic links. The current implement based on reparse point emulates *nix symlinks partly. It is using "mount point"(= junktion point) which accepts only absolute paths (to a directory). Windows has an API CreateSymbolincLink() but it needs administrator privilege at least on Win7. Giving a flag allows unprivileged creation if the OS is running under "Developer Mode". On Windows10 (I don't have one), AFAIK CreateSymbolicLink() is changed not to need the privilege, but the flag in turn leads to error that "invalid flag". Reparse point also can implement symbolic link but it needs administrator privilege at least on Windows7. The fake symlinks need correction after the data directory and tablespsce directory are moved. Maybe needs to call CorrectSymlink() or something at startup... Or relative tablespaces should be rejected on Windows? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, May 07, 2019 at 10:16:54AM +0900, Kyotaro HORIGUCHI wrote: > The fake symlinks need correction after the data directory and > tablespsce directory are moved. Maybe needs to call > CorrectSymlink() or something at startup... Or relative > tablespaces should be rejected on Windows? It took enough sweat and tears to have an implementation with junction points done correctly on Windows and we know that it works, so I am not sure that we need an actual wrapper for readlink() and such for the backend code to replace junction points. The issue with Windows is that perl's symlink() is not directly available on Windows. -- Michael
Attachment
At Tue, 7 May 2019 10:55:06 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190507015506.GC1499@paquier.xyz> > On Tue, May 07, 2019 at 10:16:54AM +0900, Kyotaro HORIGUCHI wrote: > > The fake symlinks need correction after the data directory and > > tablespsce directory are moved. Maybe needs to call > > CorrectSymlink() or something at startup... Or relative > > tablespaces should be rejected on Windows? > > It took enough sweat and tears to have an implementation with junction > points done correctly on Windows and we know that it works, so I am Indeed. It is very ill documented and complex. > not sure that we need an actual wrapper for readlink() and such for > the backend code to replace junction points. The issue with Windows > is that perl's symlink() is not directly available on Windows. Ugg. If we want to run tablespace-related tests involving replication on Windows, we need to make the tests using absolute tablespace paths. Period...? regards. -- Kyotaro Horiguchi NTT Open Source Software Center