Thread: Moving tablespaces

Moving tablespaces

From
Bruce Momjian
Date:
Do we have any documentation about how to move a tablespace to a new
directory?  If not, I think we should write some.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: Moving tablespaces

From
Magnus Hagander
Date:
On Sun, Dec 4, 2011 at 00:43, Bruce Momjian <bruce@momjian.us> wrote:
> Do we have any documentation about how to move a tablespace to a new
> directory?  If not, I think we should write some.

Do we have any support for doing it? (Yes, it works, but anything that
requires manual hacking of system catalogs really can't be considered
supported, can it?)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Moving tablespaces

From
Bruce Momjian
Date:
Magnus Hagander wrote:
> On Sun, Dec 4, 2011 at 00:43, Bruce Momjian <bruce@momjian.us> wrote:
> > Do we have any documentation about how to move a tablespace to a new
> > directory? ?If not, I think we should write some.
>
> Do we have any support for doing it? (Yes, it works, but anything that
> requires manual hacking of system catalogs really can't be considered
> supported, can it?)

True.  It is something we just don't support?  They have to dump, edit
the dump, and reload, to change a tablespace directory?  Yikes.  Is that
the state we are in?  Has no one complained about this?  They just use
symlinks?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: Moving tablespaces

From
Magnus Hagander
Date:
On Sun, Dec 4, 2011 at 17:12, Bruce Momjian <bruce@momjian.us> wrote:
> Magnus Hagander wrote:
>> On Sun, Dec 4, 2011 at 00:43, Bruce Momjian <bruce@momjian.us> wrote:
>> > Do we have any documentation about how to move a tablespace to a new
>> > directory? ?If not, I think we should write some.
>>
>> Do we have any support for doing it? (Yes, it works, but anything that
>> requires manual hacking of system catalogs really can't be considered
>> supported, can it?)
>
> True.  It is something we just don't support?  They have to dump, edit
> the dump, and reload, to change a tablespace directory?  Yikes.  Is that
> the state we are in?  Has no one complained about this?  They just use
> symlinks?

AFAIK, we don't. What you can do is take the db offline, change the
symlink, start it up agin and manually change
pg_tablespace.spclocation. That seems quite ugly though. And if you
forget one step, everything seems to work, but you have two
inconsistent definitions of the tablespace.

And IIRC, we don't actually *use* spclocation anywhere. How about we
just get rid of them as independents? We could either:

1) Remove the column. Rely on the symlink. Create a
pg_get_tablespace_location(oid) function, that could be used by
pg_dumpall and friends, that just reads the symlink.

2) Forcibly update the spclocation column when we start the server to
be whatever the symlink points to. That will at least automatically
restore the system to being consistent.

Option 1 would also make it a lot easier to in a supported way allow
tablespaces to have different locations on replica masters and slaves.
A tool like pg_basebackup could easily provide for something like
--relocate-tablespace mytblspc=/new/path and just rewrite the symlink
on the fly. But we cannot modify the pg_tablespace system catalog to
be different on the slave and the master...

It does seem rather obvious to me that this would be a win, so I'm
most likely missing something here. So please shoot a hole in the
theory for me ;)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Moving tablespaces

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> And IIRC, we don't actually *use* spclocation anywhere.

Just for pg_dump, I think.

> How about we
> just get rid of them as independents? We could either:

> 1) Remove the column. Rely on the symlink. Create a
> pg_get_tablespace_location(oid) function, that could be used by
> pg_dumpall and friends, that just reads the symlink.

Hm, how portable is symlink-reading?  If we can actually do that
without big headaches, then +1.

> 2) Forcibly update the spclocation column when we start the server to
> be whatever the symlink points to. That will at least automatically
> restore the system to being consistent.

-1, running a transaction at that level will be a giant pita.
And how would you do it at all on a hot standby slave?

            regards, tom lane

Re: Moving tablespaces

From
Magnus Hagander
Date:
On Sun, Dec 4, 2011 at 17:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> And IIRC, we don't actually *use* spclocation anywhere.
>
> Just for pg_dump, I think.

pg_dumpall :-)

It's also used in pg_upgrade and pg_basebackup, but those are easily
dealt with if we define a function for it.


>> How about we
>> just get rid of them as independents? We could either:
>
>> 1) Remove the column. Rely on the symlink. Create a
>> pg_get_tablespace_location(oid) function, that could be used by
>> pg_dumpall and friends, that just reads the symlink.
>
> Hm, how portable is symlink-reading?  If we can actually do that
> without big headaches, then +1.

We already use readlink in a couple of places - including getting
called from find_my_exec. It's also used in pg_basebackup. The use in
find_my_exec is conditional on HAVE_READLINK, but not the one in
backend/replication/basebackup.c. An oversight from my side when
putting that in, but it shows that every single bf member we have now
has readlink - else the whole compilation would fail, AFAICT.

It's not directly portable to win32, but we have a wrapper function
for it in port/ already.

So I think we're fairly committed to having readlink already.


>> 2) Forcibly update the spclocation column when we start the server to
>> be whatever the symlink points to. That will at least automatically
>> restore the system to being consistent.
>
> -1, running a transaction at that level will be a giant pita.
> And how would you do it at all on a hot standby slave?

yeah, it would break that usage scenario, so I'm -1 for that one as well.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Moving tablespaces

From
Andrew Dunstan
Date:

On 12/04/2011 11:41 AM, Tom Lane wrote:
>
>> 1) Remove the column. Rely on the symlink. Create a
>> pg_get_tablespace_location(oid) function, that could be used by
>> pg_dumpall and friends, that just reads the symlink.
> Hm, how portable is symlink-reading?  If we can actually do that
> without big headaches, then +1.
>


I wondered that, specifically about Windows junction points, but we seem
to have support for it already in dirmod.c::pgreadlink(). Surely there's
no other currently supported platform where it would even be a question?

cheers

andrew

Re: [HACKERS] Moving tablespaces

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 12/04/2011 11:41 AM, Tom Lane wrote:
>> Hm, how portable is symlink-reading?  If we can actually do that
>> without big headaches, then +1.

> I wondered that, specifically about Windows junction points, but we seem
> to have support for it already in dirmod.c::pgreadlink(). Surely there's
> no other currently supported platform where it would even be a question?

readlink is required by Single Unix Spec v2 (1997), which is what we've
been treating as our baseline expectation for Unix-oid platforms for
awhile now.  Given that we dealt with the Windows side already, I don't
see a problem with making this assumption.  At worst we'd end up needing
a couple more emulations in src/port, since surely there's *some* way to
do it on any platform with symlinks.

            regards, tom lane

Re: [HACKERS] Moving tablespaces

From
Magnus Hagander
Date:
On Sun, Dec 4, 2011 at 18:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 12/04/2011 11:41 AM, Tom Lane wrote:
>>> Hm, how portable is symlink-reading?  If we can actually do that
>>> without big headaches, then +1.
>
>> I wondered that, specifically about Windows junction points, but we seem
>> to have support for it already in dirmod.c::pgreadlink(). Surely there's
>> no other currently supported platform where it would even be a question?
>
> readlink is required by Single Unix Spec v2 (1997), which is what we've
> been treating as our baseline expectation for Unix-oid platforms for
> awhile now.  Given that we dealt with the Windows side already, I don't
> see a problem with making this assumption.  At worst we'd end up needing
> a couple more emulations in src/port, since surely there's *some* way to
> do it on any platform with symlinks.

AFAICT, it should be as simple as the attached.

Doesn't include the required fixes for pg_upgrade, I'll get on those next.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment

Re: [HACKERS] Moving tablespaces

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> +     snprintf(sourcepath, sizeof(sourcepath), "pg_tblspc/%d", tablespaceOid);

%u for an OID, please.  Otherwise seems reasonably sane on first glance.

            regards, tom lane

Re: [HACKERS] Moving tablespaces

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> AFAICT, it should be as simple as the attached.

Oh, one other thought is that the function body has to be
conditionalized on HAVE_READLINK (the fact that you forgot that
somewhere else isn't an excuse for not doing it here).  IIRC,
only the two built-in tablespaces can exist when not HAVE_READLINK,
so it seems sufficient to handle those cases and then PG_RETURN_NULL
(or maybe throw error) when not HAVE_READLINK.

            regards, tom lane

Re: [HACKERS] Moving tablespaces

From
Magnus Hagander
Date:
On Tue, Dec 6, 2011 at 16:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> AFAICT, it should be as simple as the attached.
>
> Oh, one other thought is that the function body has to be
> conditionalized on HAVE_READLINK (the fact that you forgot that
> somewhere else isn't an excuse for not doing it here).  IIRC,
> only the two built-in tablespaces can exist when not HAVE_READLINK,
> so it seems sufficient to handle those cases and then PG_RETURN_NULL
> (or maybe throw error) when not HAVE_READLINK.

Hmm. good point.

Throwing an error seems a lot more safe in this case than just
returning NULL. Since it's a situtation that really shouldn't happen.
Maybe an assert, but I think a regular ereport(ERROR) would be the
best.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Moving tablespaces

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> Throwing an error seems a lot more safe in this case than just
> returning NULL. Since it's a situtation that really shouldn't happen.
> Maybe an assert, but I think a regular ereport(ERROR) would be the
> best.

Not an assert, since it's trivially user-triggerable.

            regards, tom lane

Re: [HACKERS] Moving tablespaces

From
Magnus Hagander
Date:
On Tue, Dec 6, 2011 at 16:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Throwing an error seems a lot more safe in this case than just
>> returning NULL. Since it's a situtation that really shouldn't happen.
>> Maybe an assert, but I think a regular ereport(ERROR) would be the
>> best.
>
> Not an assert, since it's trivially user-triggerable.

Right.

There is some nice precedent in the CREATE TABLESPACE command (though
dependent on HAVE_SYMLINK and not HAVE_READLINK), so I'm just going to
copy the error message from there.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Moving tablespaces

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> There is some nice precedent in the CREATE TABLESPACE command (though
> dependent on HAVE_SYMLINK and not HAVE_READLINK), so I'm just going to
> copy the error message from there.

Fair enough.

Looking at the existing readlink use in port/exec.c, it strikes me that
another thing you'd better do is include a check for buffer overrun,
ie the test needs to be more like

        rllen = readlink(fname, link_buf, sizeof(link_buf));
        if (rllen < 0 || rllen >= sizeof(link_buf))
            ... fail ...

Also, you're assuming that the result is already null-terminated,
which is incorrect.

            regards, tom lane

Re: [HACKERS] Moving tablespaces

From
Magnus Hagander
Date:
On Tue, Dec 6, 2011 at 17:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> There is some nice precedent in the CREATE TABLESPACE command (though
>> dependent on HAVE_SYMLINK and not HAVE_READLINK), so I'm just going to
>> copy the error message from there.
>
> Fair enough.
>
> Looking at the existing readlink use in port/exec.c, it strikes me that
> another thing you'd better do is include a check for buffer overrun,
> ie the test needs to be more like
>
>                rllen = readlink(fname, link_buf, sizeof(link_buf));
>                if (rllen < 0 || rllen >= sizeof(link_buf))
>                        ... fail ...

Seems reasonable, yeah. I'll go put a similar check in the
basebackup.c file as well when I'm done here.


> Also, you're assuming that the result is already null-terminated,
> which is incorrect.

No, I'm not - I'm MemSet()ing the whole buffer to 0 before I start.
But I'll change that to work the same way as the on in port/exec.c,
for consistency.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Moving tablespaces

From
Magnus Hagander
Date:
On Wed, Dec 7, 2011 at 10:05, Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Dec 6, 2011 at 17:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Magnus Hagander <magnus@hagander.net> writes:
>>> There is some nice precedent in the CREATE TABLESPACE command (though
>>> dependent on HAVE_SYMLINK and not HAVE_READLINK), so I'm just going to
>>> copy the error message from there.
>>
>> Fair enough.
>>
>> Looking at the existing readlink use in port/exec.c, it strikes me that
>> another thing you'd better do is include a check for buffer overrun,
>> ie the test needs to be more like
>>
>>                rllen = readlink(fname, link_buf, sizeof(link_buf));
>>                if (rllen < 0 || rllen >= sizeof(link_buf))
>>                        ... fail ...
>
> Seems reasonable, yeah. I'll go put a similar check in the
> basebackup.c file as well when I'm done here.

To close this thread (hopefully): Fixed and applied.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/