Thread: pg_rewind fails on Windows where tablespaces are used

pg_rewind fails on Windows where tablespaces are used

From
Chris Travers
Date:
Hi,

Setup is PostgreSQL on Windows with a tablespace on a separate drive.  When I go to run pg_rewind it consistently fails with the following error:

pg_rewind: servers diverged at WAL location 39B/7EC6F60 on timeline 2
pg_rewind: rewinding from last common checkpoint at 39B/7E8E3F8 on timeline 2
pg_rewind: error: file "pg_tblspc/34244696" is of different type in source and target

The file is confirmed to be a JUNCTION to the correct location on both the source and target.  So the error looks like a problem interacting with Windows and detecting JUNCTION types in this case.

I came across the following which looks like it would fix this problem but don't have a proper build environment.  Please consider backporting the fix at least as far as Postgres 15 as this bug fix does apply to non-in-place tablespaces on Windows.  The thread is https://postgrespro.com/list/thread-id/2657122

Best Regards,
Chris Travers

Re: pg_rewind fails on Windows where tablespaces are used

From
Michael Paquier
Date:
On Wed, May 08, 2024 at 03:02:21PM +0700, Chris Travers wrote:
> Setup is PostgreSQL on Windows with a tablespace on a separate drive.  When
> I go to run pg_rewind it consistently fails with the following error:

(Chris has poked me regarding this issue last week in Vancouver.)

> pg_rewind: servers diverged at WAL location 39B/7EC6F60 on timeline 2
> pg_rewind: rewinding from last common checkpoint at 39B/7E8E3F8 on timeline
> 2
> pg_rewind: error: file "pg_tblspc/34244696" is of different type in source
> and target
>
> The file is confirmed to be a JUNCTION to the correct location on both the
> source and target.  So the error looks like a problem interacting with
> Windows and detecting JUNCTION types in this case.

I am not completely sure I follow here.  Aren't you making use of an
in-place tablespace here?  Could you provide more details about the
structure of the data folders, because these are on separate hosts,
right?  When rewinding from a live server, readlink() returns an
absolute path for a junction point, meaning that the result would not
be influenced by bf227926d22b as we would always handle such an entry
with FILE_TYPE_SYMLINK.  On Windows, the link creation would be
covered by pgsymlink(), which would create the link as a junction
point.
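
For readers following along, here is a minimal sketch of the kind of check behind the "is of different type in source and target" error. It is not pg_rewind's actual code: the FileType names are hypothetical (loosely modeled on the FILE_TYPE_SYMLINK value mentioned above), the tablespace OID 16384 is made up, and POSIX symlinks stand in for Windows junction points. It only shows that if one side classifies the pg_tblspc entry as a link and the other as a plain directory, the comparison fails even though both entries lead to valid data.

```python
import os
import stat
import tempfile
from enum import Enum


class FileType(Enum):
    # Hypothetical names for illustration only.
    REGULAR = "regular"
    DIRECTORY = "directory"
    SYMLINK = "symlink"
    UNDEFINED = "undefined"


def classify(path):
    """Classify a path with lstat(), i.e. without following links."""
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return FileType.SYMLINK
    if stat.S_ISDIR(mode):
        return FileType.DIRECTORY
    if stat.S_ISREG(mode):
        return FileType.REGULAR
    return FileType.UNDEFINED


def check_same_type(relpath, source_root, target_root):
    """Raise if the same relative path has a different type on each side."""
    src = classify(os.path.join(source_root, relpath))
    tgt = classify(os.path.join(target_root, relpath))
    if src != tgt:
        raise RuntimeError(
            f'file "{relpath}" is of different type in source and target'
        )
    return src


def demo():
    """Build two toy data directories whose pg_tblspc entries disagree in type."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source")
        target = os.path.join(tmp, "target")
        location = os.path.join(tmp, "tablespace_location")
        os.makedirs(os.path.join(source, "pg_tblspc"))
        os.makedirs(os.path.join(target, "pg_tblspc"))
        os.makedirs(location)
        # Source side: the entry is a link to the tablespace location.
        os.symlink(location, os.path.join(source, "pg_tblspc", "16384"))
        # Target side: the same entry shows up as a plain directory.
        os.makedirs(os.path.join(target, "pg_tblspc", "16384"))
        try:
            check_same_type("pg_tblspc/16384", source, target)
        except RuntimeError as exc:
            return str(exc)
        return "types match"
```

Calling demo() returns the error text rather than "types match", because the two sides disagree on the entry's type. On Windows the interesting question is what the lstat() equivalent reports for a junction point, which this POSIX-flavored sketch does not model.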

Note that I do not object to a backpatch of bf227926d22b.  I held back
only for the sake of caution, as in-place tablespaces are a developer
feature.  If you use it for tests of your own on stable branches,
well, why not.

> I came across the following which looks like it would fix this problem but
> don't have a proper build environment.  Please consider backporting the fix
> at least as far as Postgres 15 as this bug fix does apply to non-in-place
> tablespaces on Windows.  The thread is
> https://postgrespro.com/list/thread-id/2657122

I'd suggest using the postgresql.org reference.  This refers to
commit bf227926d22b, for the following thread:
https://www.postgresql.org/message-id/2b79d2a8-b2d5-4bd7-a15b-31e485100980.xiyuan.zr@alibaba-inc.com

Thanks,
--
Michael


Re: pg_rewind fails on Windows where tablespaces are used

From
Andrew Dunstan
Date:


On 2024-06-04 Tu 12:53 AM, Michael Paquier wrote:
> On Wed, May 08, 2024 at 03:02:21PM +0700, Chris Travers wrote:
>> Setup is PostgreSQL on Windows with a tablespace on a separate drive.  When
>> I go to run pg_rewind it consistently fails with the following error:
>
> (Chris has poked me regarding this issue last week in Vancouver.)
>
>> pg_rewind: servers diverged at WAL location 39B/7EC6F60 on timeline 2
>> pg_rewind: rewinding from last common checkpoint at 39B/7E8E3F8 on timeline
>> 2
>> pg_rewind: error: file "pg_tblspc/34244696" is of different type in source
>> and target
>>
>> The file is confirmed to be a JUNCTION to the correct location on both the
>> source and target.  So the error looks like a problem interacting with
>> Windows and detecting JUNCTION types in this case.


An EDB customer has encountered the same issue. They are not using in-place tablespaces.

I have reproduced this problem on release 15, not using an in-place tablespace.

The solution I came up with was to backpatch commits c5cb8f3b, 387803d8 and 5fc88c5d53.

I don't think we need to do anything relating to in-place tablespaces. These are documented as a developer only option and not for production.

The only question in my mind is whether those patches should be backpatched. It's a couple of hundred lines, and I think it's safe, but I'd welcome other opinions. If we are going to backpatch them we should also look at adding tests for use of tablespaces with pg_rewind on the back branches. Ideally we'd get this done in time for the next maintenance release.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: pg_rewind fails on Windows where tablespaces are used

From
Michael Paquier
Date:
On Tue, Jul 09, 2024 at 12:01:17PM -0400, Andrew Dunstan wrote:
> The solution I came up with was to backpatch commits c5cb8f3b, 387803d8 and
> 5fc88c5d53.

The lstat() wrapper for Windows, noted.

> I don't think we need to do anything relating to in-place tablespaces. These
> are documented as a developer only option and not for production.

Okay, cool.

> The only question in my mind is whether those patches should be
> backpatched.
>
> It's a couple of hundred lines, and I think it's safe, but I'd welcome other
> opinions. If we are going to backpatch them we should also look at
> adding tests for use of tablespaces with pg_rewind on the back branches.
> Ideally we'd get this done in time for the next maintenance release.

Seeing that the commits all go down to v16, meaning that these have
brewed across 3 minor releases already, I'd like to assume that we
would have already heard about problems related to them.  So that
seems like a rather safe thing to do at this stage.
--
Michael


Re: pg_rewind fails on Windows where tablespaces are used

From
Chris Travers
Date:

Sorry for the late reply.  I apparently had it buried.

On Wed, Jul 10, 2024 at 6:12 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Jul 09, 2024 at 12:01:17PM -0400, Andrew Dunstan wrote:
>> The solution I came up with was to backpatch commits c5cb8f3b, 387803d8 and
>> 5fc88c5d53.
>
> The lstat() wrapper for Windows, noted.
>
>> I don't think we need to do anything relating to in-place tablespaces. These
>> are documented as a developer only option and not for production.
>
> Okay, cool.


Yeah, in-place tablespaces are not used.  They did have to be briefly enabled due to another issue, probably with the same wrapper, but they were never used.  They were disabled again shortly after.


 
>> The only question in my mind is whether those patches should be
>> backpatched.
>>
>> It's a couple of hundred lines, and I think it's safe, but I'd welcome other
>> opinions. If we are going to backpatch them we should also look at
>> adding tests for use of tablespaces with pg_rewind on the back branches.
>> Ideally we'd get this done in time for the next maintenance release.
>
> Seeing that the commits all go down to v16, meaning that these have
> brewed across 3 minor releases already, I'd like to assume that we
> would have already heard about problems related to them.  So that
> seems like a rather safe thing to do at this stage.

Just as some added context, I have noticed that manually moving a tablespace and creating the junction with mklink /d sometimes causes PostgreSQL to decide this must be an in-place tablespace, even though dir clearly shows it as a junction.  I don't have the resources to determine whether this is limited to some builds of Windows or patch levels, but the problem goes away after a pg_basebackup rebuild, so I don't think it is urgent.  This is more of a note that there seem to be some issues in this area on Windows, at least in 15.  I don't know if that affects the discussion, but it is worth noting.


Re: pg_rewind fails on Windows where tablespaces are used

From
Andrew Dunstan
Date:
On 2024-10-23 We 7:03 PM, Michael Paquier wrote:
> On Wed, Oct 23, 2024 at 11:19:14AM -0500, Alexandra Wang wrote:
>> I encountered this issue while working on the fix for branch 14 and
>> running the tablespace regress test. This simple test is not covered
>> in branch 15’s regress tests, as we started setting
>> allow_in_place_tablespaces = true since commit d6d317db.
> Yes, for the reasons stated in this commit because we rely on
> everything to be on the same host, and tablespace paths would overlap
> across the primary and its replica.
>
>> I also had to backpatch additional commits for branches 12 to 14, as
>> follows:
>>
>> branch 14: e2f0f8ed, af9e6331, and the commits for branch 15
>> [f357233c, c5cb8f3b, 387803d8, and 5fc88c5d53].
>> branches 12 & 13: bed90759, 54fb8c7d, de8feb1f, 101c37cd, and the
>> commits for branch 14.
>>
>> With these additional commits for branches 12 to 14, I’m not sure if
>> it’s worth backpatching, or should we backpatch only to branch 15?
> 12 is going to be EOL in a couple of days, so I'd rather leave it out.
> If it were down to me, I'd leave out 13 and 14 as well, based on
> e2f0f8ed25 to let the beast sleep there.  Perhaps others have a
> different opinion, though.


Well, it seems like it's clearly a bug. I'm never happy leaving bugs 
unfixed. As for 12, what's the point of putting out one last release if 
it's not to fix bugs?


EDB's customer will probably be happy if we just fix 15, but I would 
rather take a broader view and fix it for other possible users too.


I'm traveling for a few days but it was my intention to work on these 
when I am back.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pg_rewind fails on Windows where tablespaces are used

From
Andrew Dunstan
Date:


On Tue, Oct 29, 2024 at 9:48 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Oct 28, 2024 at 03:00:12PM -0400, Andrew Dunstan wrote:
>> Well, it seems like it's clearly a bug. I'm never happy leaving bugs
>> unfixed. As for 12, what's the point of putting out one last release if it's
>> not to fix bugs?
>
> There is always a risk of breaking something that worked previously,
> and we would be out of options to address these once the branch is
> EOL'd.  The risk/reward ratio for v12 is really different, so I'd
> advise some caution particularly with this area of the code.
>
>> EDB's customer will probably be happy if we just fix 15, but I would rather
>> take a broader view and fix it for other possible users too.
>>
>> I'm traveling for a few days but it was my intention to work on these when I
>> am back.




As you requested, I didn't push the fixes for release 12, but I pushed the rest.

Unfortunately for various reasons I don't currently have a Windows test instance, but these were previously tested by Alexandra and me so I'm pretty confident they will be ok.

When I get my hands on a Windows machine again I will work on adding some Windows tests for pg_rewind with tablespaces.

cheers

andrew

Re: pg_rewind fails on Windows where tablespaces are used

From
Thomas Munro
Date:
It might be worth checking if 4517358e and f71007fb should be
back-patched too.  There was a brief discussion[1], but no one with
Windows-testing capabilities was around and it didn't seem too
serious, and then there was the whole re-wrap and it seemed best to
keep out of the way of that.  But now that the coast is clear...

[1] https://www.postgresql.org/message-id/CAD5tBcKnE3C1hycBYZYtYpNssQR_e%2Bu2%3DCmDhGRFvDMEg3onRg%40mail.gmail.com



Re: pg_rewind fails on Windows where tablespaces are used

From
Andrew Dunstan
Date:
On 2024-11-26 Tu 8:58 PM, Thomas Munro wrote:
> It might be worth checking if 4517358e and f71007fb should be
> back-patched too.  There was a brief discussion[1], but no one with
> Windows-testing capabilities was around and it didn't seem too
> serious, and then there was the whole re-wrap and it seemed best to
> keep out of the way of that.  But now that the coast is clear...
>
> [1] https://www.postgresql.org/message-id/CAD5tBcKnE3C1hycBYZYtYpNssQR_e%2Bu2%3DCmDhGRFvDMEg3onRg%40mail.gmail.com



Yes, it's on my TODO list. Have been waiting for a) the releases to 
settle down b) getting availability of Windows resources again and c) 
being back at my home location. All three are now met, so will work on it.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pg_rewind fails on Windows where tablespaces are used

From
Andrew Dunstan
Date:
On 2024-11-27 We 7:52 AM, Andrew Dunstan wrote:
>
> On 2024-11-26 Tu 8:58 PM, Thomas Munro wrote:
>> It might be worth checking if 4517358e and f71007fb should be
>> back-patched too.  There was a brief discussion[1], but no one with
>> Windows-testing capabilities was around and it didn't seem too
>> serious, and then there was the whole re-wrap and it seemed best to
>> keep out of the way of that.  But now that the coast is clear...
>>
>> [1] 
>> https://www.postgresql.org/message-id/CAD5tBcKnE3C1hycBYZYtYpNssQR_e%2Bu2%3DCmDhGRFvDMEg3onRg%40mail.gmail.com
>
>
>
> Yes, it's on my TODO list. Have been waiting for a) the releases to 
> settle down b) getting availability of Windows resources again and c) 
> being back at my home location. All three are now met, so will work on 
> it.
>
>
>

Those patches didn't actually include any tests. I guess the best test 
would be to create a chain of several junction points and then run 
initdb on the leaf of the chain?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pg_rewind fails on Windows where tablespaces are used

From
Thomas Munro
Date:
On Thu, Jan 9, 2025 at 3:45 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> Those patches didn't actually include any tests. I guess the best test
> would be to create a chain of several junction points and then run
> initdb on the leaf of the chain?

Yeah I think the three interesting cases were initdb when run under
junctions like these:

1.  Volume GUID format: mklink /J foo \\?\Volume{12341234-1234...},
expected to break without patch
2.  Chain: mklink /J C:\\aaa1 C:\\aaa2, mklink /J C:\\aaa2 c:\\aaa3,
expected to break without patch
3.  Chain of length > 8, expected to fail with ELOOP once the patch is applied.

(Syntax may be off, I just googled it but don't have Windows to try).
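
As a rough illustration of cases 2 and 3, here is a minimal sketch of following a chain of links by hand with a depth limit and failing with ELOOP when the chain is too long. It is an assumption-laden model, not the code from the patches: resolve_chain, make_chain and MAX_LINK_DEPTH are hypothetical names, the limit of 8 is taken from case 3 above, and POSIX symlinks stand in for Windows junction points.

```python
import errno
import os
import tempfile

# Limit taken from case 3 above; the real constant name is not given in this thread.
MAX_LINK_DEPTH = 8


def resolve_chain(path, max_depth=MAX_LINK_DEPTH):
    """Follow a chain of links by hand, failing with ELOOP if it is too long."""
    hops = 0
    while os.path.islink(path):
        if hops >= max_depth:
            raise OSError(errno.ELOOP, os.strerror(errno.ELOOP), path)
        target = os.readlink(path)
        # Relative link targets are resolved against the link's own directory;
        # an absolute target simply replaces the path.
        path = os.path.join(os.path.dirname(path), target)
        hops += 1
    return path


def make_chain(root, length):
    """Create link_1 -> link_2 -> ... -> link_N -> real directory; return link_1."""
    end = os.path.join(root, "real")
    os.mkdir(end)
    previous = end
    for i in range(length, 0, -1):
        link = os.path.join(root, f"link_{i}")
        os.symlink(previous, link)
        previous = link
    return previous


def try_chain(length):
    """Return the resolved path for a chain of the given length, or 'ELOOP'."""
    with tempfile.TemporaryDirectory() as root:
        start = make_chain(root, length)
        try:
            return os.path.relpath(resolve_chain(start), root)
        except OSError as exc:
            if exc.errno == errno.ELOOP:
                return "ELOOP"
            raise
```

In this model try_chain(2) and try_chain(8) resolve to the real directory, while try_chain(9) reports ELOOP. A real Windows test would need to create the junctions with mklink /J (or the equivalent API) and exercise initdb rather than a toy resolver.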

The way to get decent tests for this stuff and all the rest of the
wrappers would probably be to develop this test suite further:

https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com