Thread: allow_in_place_tablespaces vs. pg_basebackup

allow_in_place_tablespaces vs. pg_basebackup

From
Robert Haas
Date:
Commit 7170f2159fb21b62c263acd458d781e2f3c3f8bb, which introduced
in-place tablespaces, didn't make any adjustments to pg_basebackup.
The resulting behavior is pretty bizarre.

If you take a plain-format backup using pg_basebackup -Fp, then the
files in the in-place tablespace are backed up, but if you take a
tar-format backup using pg_basebackup -Ft, then they aren't.

I had at first wondered how this could even be possible, since after
all a plain format backup is little more than a tar-format backup that
pg_basebackup chooses to extract for you. The answer turns out to be
that a tar-format backup activates the server's TABLESPACE_MAP option,
and a plain-format backup doesn't, and so pg_basebackup can handle
this case differently depending on the value of that flag, and does.
Even for a plain-format backup, the server's behavior is not really
correct, because I think what's happening is that the tablespace files
are getting included in the base.tar file generated by the server,
rather than in a separate ${TSOID}.tar file as would normally happen
for a user-defined tablespace, but because the tar files get extracted
before the user lays eyes on them, the user-visible consequences are
masked, at least in the cases I've tested so far.

I'm not sure how messy it's going to be to clean this up. I think each
tablespace really needs to go into a separate ${TSOID}.tar file,
because we've got tons of code both on the server side and in
pg_basebackup that assumes that's how things work, and having in-place
tablespaces be a rare exception to that rule seems really unappealing.
This of course also implies that files in an in-place tablespace must
not go missing from the backup altogether.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: allow_in_place_tablespaces vs. pg_basebackup

From
Kyotaro Horiguchi
Date:
At Wed, 8 Mar 2023 16:30:05 -0500, Robert Haas <robertmhaas@gmail.com> wrote in 
> Commit 7170f2159fb21b62c263acd458d781e2f3c3f8bb, which introduced
> in-place tablespaces, didn't make any adjustments to pg_basebackup.
> The resulting behavior is pretty bizarre.
> 
> If you take a plain-format backup using pg_basebackup -Fp, then the
> files in the in-place tablespace are backed up, but if you take a
> tar-format backup using pg_basebackup -Ft, then they aren't.
>
> I had at first wondered how this could even be possible, since after
> all a plain format backup is little more than a tar-format backup that
> pg_basebackup chooses to extract for you. The answer turns out to be
> that a tar-format backup activates the server's TABLESPACE_MAP option,
> and a plain-format backup doesn't, and so pg_basebackup can handle
> this case differently depending on the value of that flag, and does.
> Even for a plain-format backup, the server's behavior is not really
> correct, because I think what's happening is that the tablespace files
> are getting included in the base.tar file generated by the server,
> rather than in a separate ${TSOID}.tar file as would normally happen
> for a user-defined tablespace, but because the tar files get extracted
> before the user lays eyes on them, the user-visible consequences are
> masked, at least in the cases I've tested so far.

In my understading, in-place tablespaces are a feature for testing
purpose only. Therefore, the feature was intentionally designed to
have minimal impact on the code and to provide minimum-necessary
behavior for that purpose. I believe it is reasonable to make
basebackup error-out when it encounters an in-place tablespace
directory when TABLESPACE_MAP is activated.

> I'm not sure how messy it's going to be to clean this up. I think each
> tablespace really needs to go into a separate ${TSOID}.tar file,
> because we've got tons of code both on the server side and in
> pg_basebackup that assumes that's how things work, and having in-place
> tablespaces be a rare exception to that rule seems really unappealing.
> This of course also implies that files in an in-place tablespace must
> not go missing from the backup altogether.

The doc clearly describes the purpose of in-place tablespaces and the
potential confusion they can cause for backup tools not excluding
pg_basebackup. Does this not provide sufficient information?

https://www.postgresql.org/docs/devel/runtime-config-developer.html

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: allow_in_place_tablespaces vs. pg_basebackup

From
Thomas Munro
Date:
On Thu, Mar 9, 2023 at 2:58 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> At Wed, 8 Mar 2023 16:30:05 -0500, Robert Haas <robertmhaas@gmail.com> wrote in
> > I'm not sure how messy it's going to be to clean this up. I think each
> > tablespace really needs to go into a separate ${TSOID}.tar file,
> > because we've got tons of code both on the server side and in
> > pg_basebackup that assumes that's how things work, and having in-place
> > tablespaces be a rare exception to that rule seems really unappealing.
> > This of course also implies that files in an in-place tablespace must
> > not go missing from the backup altogether.
>
> The doc clearly describes the purpose of in-place tablespaces and the
> potential confusion they can cause for backup tools not excluding
> pg_basebackup. Does this not provide sufficient information?
>
> https://www.postgresql.org/docs/devel/runtime-config-developer.html

Yeah.  We knew that this didn't work (was discussed in a couple of
other threads), but you might be right that an error would be better
for now.  It's absolutely not a user facing mode of operation, it was
intended just for the replication tests.  Clearly it might be useful
for testing purposes in the backup area too, which is probably how
Robert got here.  I will think about what changes would be needed as I
am looking at backup code currently, but I'm not sure when I'll be
able to post a patch so don't let that stop anyone else who sees how
to get it working...



Re: allow_in_place_tablespaces vs. pg_basebackup

From
Kyotaro Horiguchi
Date:
At Thu, 09 Mar 2023 10:58:41 +0900 (JST), I wrote
> behavior for that purpose. I believe it is reasonable to make
> basebackup error-out when it encounters an in-place tablespace
> directory when TABLESPACE_MAP is activated.

It turned out to be not as simple as I thought, though...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment

Re: allow_in_place_tablespaces vs. pg_basebackup

From
Kyotaro Horiguchi
Date:
At Thu, 09 Mar 2023 11:53:26 +0900 (JST), I wrote
> It turned out to be not as simple as I thought, though...

The error message and the location where the error condition is
checked don't match, but making the messages more generic may not be
helpful for users..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: allow_in_place_tablespaces vs. pg_basebackup

From
Robert Haas
Date:
On Wed, Mar 8, 2023 at 9:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Yeah.  We knew that this didn't work (was discussed in a couple of
> other threads), but you might be right that an error would be better
> for now.  It's absolutely not a user facing mode of operation, it was
> intended just for the replication tests.  Clearly it might be useful
> for testing purposes in the backup area too, which is probably how
> Robert got here.  I will think about what changes would be needed as I
> am looking at backup code currently, but I'm not sure when I'll be
> able to post a patch so don't let that stop anyone else who sees how
> to get it working...

If there had been an error message like "ERROR: pg_basebackup cannot
back up a database that contains in-place tablespaces," it would have
saved me a lot of time yesterday. If there had at least been a
documentation mention, I would have found it eventually (but not
nearly as quickly). As it is, the only reference to this topic in the
repository seems to be c6f2f01611d4f2c412e92eb7893f76fa590818e8, "Fix
pg_basebackup with in-place tablespaces," which makes it look like
this is supposed to be working.

I also think that if we're going to have in-place tablespaces, it's a
good idea for them to work with pg_basebackup. I wasn't really looking
to test pg_basebackup with this feature (although it's a good idea); I
was just trying to make sure I didn't break in-place tablespaces while
working on something else. I think it's sometimes OK to add stuff for
testing that doesn't work with absolutely everything we have in the
system, but the tablespace code is awfully closely related to the
pg_basebackup code for them to just not work together at all.

Now that I'm done grumbling, here's a patch.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

Re: allow_in_place_tablespaces vs. pg_basebackup

From
Robert Haas
Date:
On Thu, Mar 9, 2023 at 4:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Now that I'm done grumbling, here's a patch.

Anyone want to comment on this?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: allow_in_place_tablespaces vs. pg_basebackup

From
Michael Paquier
Date:
On Mon, Mar 20, 2023 at 07:56:42AM -0400, Robert Haas wrote:
> Anyone want to comment on this?

I have not checked the patch in details, but perhaps this needs at
least one test?
--
Michael

Attachment

Re: allow_in_place_tablespaces vs. pg_basebackup

From
Robert Haas
Date:
On Tue, Mar 21, 2023 at 7:59 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Mar 20, 2023 at 07:56:42AM -0400, Robert Haas wrote:
> > Anyone want to comment on this?
>
> I have not checked the patch in details, but perhaps this needs at
> least one test?

Sure. I was sort of hoping to get feedback on the overall plan first,
but it's easy enough to add a test so I've done that in the attached
version. I've put it in a separate file because 010_pg_basebackup.pl
is pushing a thousand lines and it's got a ton of unrelated tests in
there already. The test included here fails on master, but passes with
the patch.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

Re: allow_in_place_tablespaces vs. pg_basebackup

From
Thomas Munro
Date:
The commit message explains prettty well, and it seems to work in
simple testing, and yeah commit c6f2f016 was not a work of art.
pg_basebackeup --format=plain "worked", but your way is better.  I
guess we should add a test of -Fp too, to keep it working?  Here's one
of those.

I know it's not your patch's fault, but I wonder if this might break
something.  We have this strange beast ti->rpath (commit b168c5ef in
2014):

+                               /*
+                                * Relpath holds the relative path of
the tablespace directory
+                                * when it's located within PGDATA, or
NULL if it's located
+                                * elsewhere.
+                                */

That's pretty confusing, because relative paths have been banned since
the birth of tablespaces (commit 2467394ee15, 2004):

+       /*
+        * Allowing relative paths seems risky
+        *
+        * this also helps us ensure that location is not empty or whitespace
+        */
+       if (!is_absolute_path(location))
+               ereport(ERROR,
+                               (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                                errmsg("tablespace location must be
an absolute path")));

The discussion that produced the contradiction:
https://www.postgresql.org/message-id/flat/m2ob3vl3et.fsf%402ndQuadrant.fr

I guess what I'm wondering here is if there is a hazard where we
confuse these outlawed tablespaces that supposedly roam the plains
somewhere in your code that is assuming that "relative implies
in-place".  Or if not now, maybe in future changes.  I wonder if these
"semi-supported-but-don't-tell-anyone" relative symlinks are worthy of
a defensive test (or is it in there somewhere already?).

Originally when trying to implement allow_in_place_tablespaces, I
instead tried allowing limited relative tablespaces, so you could use
LOCATION 'pg_tblspc/my_directory', and then it would still create a
symlink 1234 -> my_directory, which probably would have All Just
Worked™ given the existing ti->rpath stuff, and possibly made the
users that thread was about happy, but I had to give that up because
it didn't work on Windows (no relative symlinks).  Oh well.

Attachment

Re: allow_in_place_tablespaces vs. pg_basebackup

From
Thomas Munro
Date:
On Tue, Mar 28, 2023 at 5:15 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> I guess we should add a test of -Fp too, to keep it working?

Oops, that was already tested in existing tests, so I take that back.



Re: allow_in_place_tablespaces vs. pg_basebackup

From
Michael Paquier
Date:
On Tue, Mar 28, 2023 at 05:15:25PM +1300, Thomas Munro wrote:
> The commit message explains pretty well, and it seems to work in
> simple testing, and yeah commit c6f2f016 was not a work of art.
> pg_basebackeup --format=plain "worked", but your way is better.  I
> guess we should add a test of -Fp too, to keep it working?  Here's one
> of those.

The patch is larger than it is complicated.  Particularly, in xlog.c,
this is just adding one block for the PGFILETYPE_DIR case to store a
relative path.

> I know it's not your patch's fault, but I wonder if this might break
> something.  We have this strange beast ti->rpath (commit b168c5ef in
> 2014):
>
> +            /*
> +             * Relpath holds the relative path of the tablespace directory
> +             * when it's located within PGDATA, or NULL if it's located
> +             * elsewhere.
> +             */
>
> That's pretty confusing, because relative paths have been banned since
> the birth of tablespaces (commit 2467394ee15, 2004):

I think that this comes down to people manipulating pg_tblspc/ by
themselves post-creation with CREATE, because we don't track the
location anywhere post-creation and rely on the structure of
pg_tblspc/ for any future decision taken by the backend?  I recall
this specific case, where users were creating tablespaces in PGDATA
itself to make "cleaner" for them the structure in the data folder,
even if it makes no sense as the mount point is the same..  33cb8ff
added a warning about that in tablespace.c, but we could be more
aggressive and outright drop this case entirely when in-place
tablespaces are not involved.

Tablespace maps defined in pg_basebackup -T require both the old and
new locations to be absolute, so it seems shaky to me to assume that
this should always be fine in the backend..

> I guess what I'm wondering here is if there is a hazard where we
> confuse these outlawed tablespaces that supposedly roam the plains
> somewhere in your code that is assuming that "relative implies
> in-place".  Or if not now, maybe in future changes.  I wonder if these
> "semi-supported-but-don't-tell-anyone" relative symlinks are worthy of
> a defensive test (or is it in there somewhere already?).

Yeah, it makes for surprising and sometimes undocumented behaviors,
which is confusing for users and developers at the end.  I'd like to
think that we'd live happier long-term if the borders are clearer, aka
switch to more aggressive ERRORs instead of WARNINGs in some places
where the boundaries are blurry.  A good thing about in-place
tablespaces taken in isolation is that the borders of what you can do
are well-defined, which comes down to the absolute vs relative path
handling.

Looking at the patch, nothing really stands out..
--
Michael

Attachment

Re: allow_in_place_tablespaces vs. pg_basebackup

From
Robert Haas
Date:
On Tue, Mar 28, 2023 at 9:40 PM Michael Paquier <michael@paquier.xyz> wrote:
> Looking at the patch, nothing really stands out..

It doesn't seem like anyone's unhappy about this patch. I don't think
it's necessary to back-patch it, given that in-place tablespaces are
intended for developer use, not real-world use, and also given that
the patch requires changing both a bit of server-side behavior and
some client-side behavior and it seems unfriendly to create behavior
skew of that sort in minor release. However, I would like to get it
committed to master.

Do people think it's OK to do that now, or should I wait until we've
branched? I personally think this is bug-fix-ish enough that now is
OK, but I'll certainly forebear if others disagree.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: allow_in_place_tablespaces vs. pg_basebackup

From
Michael Paquier
Date:
On Fri, Apr 14, 2023 at 04:11:47PM -0400, Robert Haas wrote:
> Do people think it's OK to do that now, or should I wait until we've
> branched? I personally think this is bug-fix-ish enough that now is
> OK, but I'll certainly forebear if others disagree.

FWIW, doing that now rather than the beginning of July is OK for me
for this stuff.
--
Michael

Attachment

Re: allow_in_place_tablespaces vs. pg_basebackup

From
Robert Haas
Date:
On Mon, Apr 17, 2023 at 1:30 AM Michael Paquier <michael@paquier.xyz> wrote:
> FWIW, doing that now rather than the beginning of July is OK for me
> for this stuff.

OK, committed.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: allow_in_place_tablespaces vs. pg_basebackup

From
Michael Paquier
Date:
On Tue, Apr 18, 2023 at 11:35:41AM -0400, Robert Haas wrote:
> On Mon, Apr 17, 2023 at 1:30 AM Michael Paquier <michael@paquier.xyz> wrote:
>> FWIW, doing that now rather than the beginning of July is OK for me
>> for this stuff.
>
> OK, committed.

Thanks!
--
Michael

Attachment