Thread: Moving tablespaces
Do we have any documentation about how to move a tablespace to a new directory? If not, I think we should write some. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Sun, Dec 4, 2011 at 00:43, Bruce Momjian <bruce@momjian.us> wrote: > Do we have any documentation about how to move a tablespace to a new > directory? If not, I think we should write some. Do we have any support for doing it? (Yes, it works, but anything that requires manual hacking of system catalogs really can't be considered supported, can it?) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander wrote: > On Sun, Dec 4, 2011 at 00:43, Bruce Momjian <bruce@momjian.us> wrote: > > Do we have any documentation about how to move a tablespace to a new > > directory? ?If not, I think we should write some. > > Do we have any support for doing it? (Yes, it works, but anything that > requires manual hacking of system catalogs really can't be considered > supported, can it?) True. It is something we just don't support? They have to dump, edit the dump, and reload, to change a tablespace directory? Yikes. Is that the state we are in? Has no one complained about this? They just use symlinks? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Sun, Dec 4, 2011 at 17:12, Bruce Momjian <bruce@momjian.us> wrote: > Magnus Hagander wrote: >> On Sun, Dec 4, 2011 at 00:43, Bruce Momjian <bruce@momjian.us> wrote: >> > Do we have any documentation about how to move a tablespace to a new >> > directory? ?If not, I think we should write some. >> >> Do we have any support for doing it? (Yes, it works, but anything that >> requires manual hacking of system catalogs really can't be considered >> supported, can it?) > > True. It is something we just don't support? They have to dump, edit > the dump, and reload, to change a tablespace directory? Yikes. Is that > the state we are in? Has no one complained about this? They just use > symlinks? AFAIK, we don't. What you can do is take the db offline, change the symlink, start it up agin and manually change pg_tablespace.spclocation. That seems quite ugly though. And if you forget one step, everything seems to work, but you have two inconsistent definitions of the tablespace. And IIRC, we don't actually *use* spclocation anywhere. How about we just get rid of them as independents? We could either: 1) Remove the column. Rely on the symlink. Create a pg_get_tablespace_location(oid) function, that could be used by pg_dumpall and friends, that just reads the symlink. 2) Forcibly update the spclocation column when we start the server to be whatever the symlink points to. That will at least automatically restore the system to being consistent. Option 1 would also make it a lot easier to in a supported way allow tablespaces to have different locations on replica masters and slaves. A tool like pg_basebackup could easily provide for something like --relocate-tablespace mytblspc=/new/path and just rewrite the symlink on the fly. But we cannot modify the pg_tablespace system catalog to be different on the slave and the master... It does seem rather obvious to me that this would be a win, so I'm most likely missing something here. So please shoot a hole in the theory for me ;) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > And IIRC, we don't actually *use* spclocation anywhere. Just for pg_dump, I think. > How about we > just get rid of them as independents? We could either: > 1) Remove the column. Rely on the symlink. Create a > pg_get_tablespace_location(oid) function, that could be used by > pg_dumpall and friends, that just reads the symlink. Hm, how portable is symlink-reading? If we can actually do that without big headaches, then +1. > 2) Forcibly update the spclocation column when we start the server to > be whatever the symlink points to. That will at least automatically > restore the system to being consistent. -1, running a transaction at that level will be a giant pita. And how would you do it at all on a hot standby slave? regards, tom lane
On Sun, Dec 4, 2011 at 17:41, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Magnus Hagander <magnus@hagander.net> writes: >> And IIRC, we don't actually *use* spclocation anywhere. > > Just for pg_dump, I think. pg_dumpall :-) It's also used in pg_upgrade and pg_basebackup, but those are easily dealt with if we define a function for it. >> How about we >> just get rid of them as independents? We could either: > >> 1) Remove the column. Rely on the symlink. Create a >> pg_get_tablespace_location(oid) function, that could be used by >> pg_dumpall and friends, that just reads the symlink. > > Hm, how portable is symlink-reading? If we can actually do that > without big headaches, then +1. We already use readlink in a couple of places - including getting called from find_my_exec. It's also used in pg_basebackup. The use in find_my_exec is conditional on HAVE_READLINK, but not the one in backend/replication/basebackup.c. An oversight from my side when putting that in, but it shows that every single bf member we have now has readlink - else the whole compilation would fail, AFAICT. It's not directly portable to win32, but we have a wrapper function for it in port/ already. So I think we're fairly committed to having readlink already. >> 2) Forcibly update the spclocation column when we start the server to >> be whatever the symlink points to. That will at least automatically >> restore the system to being consistent. > > -1, running a transaction at that level will be a giant pita. > And how would you do it at all on a hot standby slave? yeah, it would break that usage scenario, so I'm -1 for that one as well. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On 12/04/2011 11:41 AM, Tom Lane wrote: > >> 1) Remove the column. Rely on the symlink. Create a >> pg_get_tablespace_location(oid) function, that could be used by >> pg_dumpall and friends, that just reads the symlink. > Hm, how portable is symlink-reading? If we can actually do that > without big headaches, then +1. > I wondered that, specifically about Windows junction points, but we seem to have support for it already in dirmod.c::pgreadlink(). Surely there's no other currently supported platform where it would even be a question? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 12/04/2011 11:41 AM, Tom Lane wrote: >> Hm, how portable is symlink-reading? If we can actually do that >> without big headaches, then +1. > I wondered that, specifically about Windows junction points, but we seem > to have support for it already in dirmod.c::pgreadlink(). Surely there's > no other currently supported platform where it would even be a question? readlink is required by Single Unix Spec v2 (1997), which is what we've been treating as our baseline expectation for Unix-oid platforms for awhile now. Given that we dealt with the Windows side already, I don't see a problem with making this assumption. At worst we'd end up needing a couple more emulations in src/port, since surely there's *some* way to do it on any platform with symlinks. regards, tom lane
On Sun, Dec 4, 2011 at 18:07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 12/04/2011 11:41 AM, Tom Lane wrote: >>> Hm, how portable is symlink-reading? If we can actually do that >>> without big headaches, then +1. > >> I wondered that, specifically about Windows junction points, but we seem >> to have support for it already in dirmod.c::pgreadlink(). Surely there's >> no other currently supported platform where it would even be a question? > > readlink is required by Single Unix Spec v2 (1997), which is what we've > been treating as our baseline expectation for Unix-oid platforms for > awhile now. Given that we dealt with the Windows side already, I don't > see a problem with making this assumption. At worst we'd end up needing > a couple more emulations in src/port, since surely there's *some* way to > do it on any platform with symlinks. AFAICT, it should be as simple as the attached. Doesn't include the required fixes for pg_upgrade, I'll get on those next. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Attachment
Magnus Hagander <magnus@hagander.net> writes: > + snprintf(sourcepath, sizeof(sourcepath), "pg_tblspc/%d", tablespaceOid); %u for an OID, please. Otherwise seems reasonably sane on first glance. regards, tom lane
Magnus Hagander <magnus@hagander.net> writes: > AFAICT, it should be as simple as the attached. Oh, one other thought is that the function body has to be conditionalized on HAVE_READLINK (the fact that you forgot that somewhere else isn't an excuse for not doing it here). IIRC, only the two built-in tablespaces can exist when not HAVE_READLINK, so it seems sufficient to handle those cases and then PG_RETURN_NULL (or maybe throw error) when not HAVE_READLINK. regards, tom lane
On Tue, Dec 6, 2011 at 16:12, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Magnus Hagander <magnus@hagander.net> writes: >> AFAICT, it should be as simple as the attached. > > Oh, one other thought is that the function body has to be > conditionalized on HAVE_READLINK (the fact that you forgot that > somewhere else isn't an excuse for not doing it here). IIRC, > only the two built-in tablespaces can exist when not HAVE_READLINK, > so it seems sufficient to handle those cases and then PG_RETURN_NULL > (or maybe throw error) when not HAVE_READLINK. Hmm. good point. Throwing an error seems a lot more safe in this case than just returning NULL. Since it's a situtation that really shouldn't happen. Maybe an assert, but I think a regular ereport(ERROR) would be the best. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > Throwing an error seems a lot more safe in this case than just > returning NULL. Since it's a situtation that really shouldn't happen. > Maybe an assert, but I think a regular ereport(ERROR) would be the > best. Not an assert, since it's trivially user-triggerable. regards, tom lane
On Tue, Dec 6, 2011 at 16:17, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Magnus Hagander <magnus@hagander.net> writes: >> Throwing an error seems a lot more safe in this case than just >> returning NULL. Since it's a situtation that really shouldn't happen. >> Maybe an assert, but I think a regular ereport(ERROR) would be the >> best. > > Not an assert, since it's trivially user-triggerable. Right. There is some nice precedent in the CREATE TABLESPACE command (though dependent on HAVE_SYMLINK and not HAVE_READLINK), so I'm just going to copy the error message from there. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > There is some nice precedent in the CREATE TABLESPACE command (though > dependent on HAVE_SYMLINK and not HAVE_READLINK), so I'm just going to > copy the error message from there. Fair enough. Looking at the existing readlink use in port/exec.c, it strikes me that another thing you'd better do is include a check for buffer overrun, ie the test needs to be more like rllen = readlink(fname, link_buf, sizeof(link_buf)); if (rllen < 0 || rllen >= sizeof(link_buf)) ... fail ... Also, you're assuming that the result is already null-terminated, which is incorrect. regards, tom lane
On Tue, Dec 6, 2011 at 17:07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Magnus Hagander <magnus@hagander.net> writes: >> There is some nice precedent in the CREATE TABLESPACE command (though >> dependent on HAVE_SYMLINK and not HAVE_READLINK), so I'm just going to >> copy the error message from there. > > Fair enough. > > Looking at the existing readlink use in port/exec.c, it strikes me that > another thing you'd better do is include a check for buffer overrun, > ie the test needs to be more like > > rllen = readlink(fname, link_buf, sizeof(link_buf)); > if (rllen < 0 || rllen >= sizeof(link_buf)) > ... fail ... Seems reasonable, yeah. I'll go put a similar check in the basebackup.c file as well when I'm done here. > Also, you're assuming that the result is already null-terminated, > which is incorrect. No, I'm not - I'm MemSet()ing the whole buffer to 0 before I start. But I'll change that to work the same way as the on in port/exec.c, for consistency. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Wed, Dec 7, 2011 at 10:05, Magnus Hagander <magnus@hagander.net> wrote: > On Tue, Dec 6, 2011 at 17:07, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Magnus Hagander <magnus@hagander.net> writes: >>> There is some nice precedent in the CREATE TABLESPACE command (though >>> dependent on HAVE_SYMLINK and not HAVE_READLINK), so I'm just going to >>> copy the error message from there. >> >> Fair enough. >> >> Looking at the existing readlink use in port/exec.c, it strikes me that >> another thing you'd better do is include a check for buffer overrun, >> ie the test needs to be more like >> >> rllen = readlink(fname, link_buf, sizeof(link_buf)); >> if (rllen < 0 || rllen >= sizeof(link_buf)) >> ... fail ... > > Seems reasonable, yeah. I'll go put a similar check in the > basebackup.c file as well when I'm done here. To close this thread (hopefully): Fixed and applied. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/