Thread: pg_rewind fails on Windows where tablespaces are used
pg_rewind: servers diverged at WAL location 39B/7EC6F60 on timeline 2
pg_rewind: rewinding from last common checkpoint at 39B/7E8E3F8 on timeline 2
pg_rewind: error: file "pg_tblspc/34244696" is of different type in source and target
On Wed, May 08, 2024 at 03:02:21PM +0700, Chris Travers wrote:
> Setup is PostgreSQL on Windows with a tablespace on a separate drive. When
> I go to run pg_rewind it consistently fails with the following error:
(Chris has poked me regarding this issue last week in Vancouver.)
> pg_rewind: servers diverged at WAL location 39B/7EC6F60 on timeline 2
> pg_rewind: rewinding from last common checkpoint at 39B/7E8E3F8 on timeline
> 2
> pg_rewind: error: file "pg_tblspc/34244696" is of different type in source
> and target
>
> The file is confirmed to be a JUNCTION to the correct location on both the
> source and target. So the error looks like a problem interacting with
> Windows and detecting JUNCTION types in this case.
I am not completely sure to follow here. Aren't you making use of an
in-place tablespace here? Could you provide more details about the
structure of the data folders, because these are on separate hosts,
right?
When rewinding from a live server, readlink() returns an absolute path
for a junction point, meaning that the result would not be influenced
by bf227926d22b as we would always handle such an entry with
FILE_TYPE_SYMLINK. On Windows, the link creation would be covered by
pgsymlink(), which would create the link as a junction point.
Note that I do not object to a backpatch of bf227926d22b, as I did not
do it for the sake of caution as in-place tablespaces are a developer
feature. If you use it for tests of your own on stable branches,
well, why not.
> I came across the following which looks like it would fix this problem but
> don't have a proper build environment. Please consider backporting the fix
> at least as far as Postgres 15 as this bug fix does apply to non-in-place
> tablespaces on Windows. The thread is
> https://postgrespro.com/list/thread-id/2657122
I'd suggest to use the postgresql.org reference. This refers to
commit bf227926d22b, for the following thread:
https://www.postgresql.org/message-id/2b79d2a8-b2d5-4bd7-a15b-31e485100980.xiyuan.zr@alibaba-inc.com
Thanks,
--
Michael
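The error in the report comes from comparing how each side classifies the same directory entry. The following is a minimal sketch of that kind of check, written in Python against POSIX symlinks as a stand-in for Windows junction points. It is not pg_rewind's C code: the function names classify() and check_same_type() and the string labels are hypothetical, chosen only for illustration. The point it shows is that if one side reports a tablespace link as a link and the other reports it as a plain directory, a type comparison fails even though both entries lead to the correct location.

```python
import os
import stat


def classify(path):
    """Classify a directory entry without following links (lstat semantics)."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return "symlink"
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    if stat.S_ISREG(st.st_mode):
        return "regular"
    return "undefined"


def check_same_type(source_root, target_root, relpath):
    """Raise if the same relative path has a different type on each side.

    Hypothetical helper: the message only mirrors the wording of the
    error quoted in the thread.
    """
    src = classify(os.path.join(source_root, relpath))
    tgt = classify(os.path.join(target_root, relpath))
    if src != tgt:
        raise RuntimeError(
            f'file "{relpath}" is of different type in source and target'
        )
    return src
```

Called with two data directories and a relative path such as "pg_tblspc/34244696", check_same_type() returns the shared type when both sides agree and raises otherwise. A platform lstat() that cannot recognize junction points would make classify() answer "directory" for an entry that the other side sees as a link.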
On Wed, May 08, 2024 at 03:02:21PM +0700, Chris Travers wrote:
> Setup is PostgreSQL on Windows with a tablespace on a separate drive. When
> I go to run pg_rewind it consistently fails with the following error:
(Chris has poked me regarding this issue last week in Vancouver.)
> pg_rewind: servers diverged at WAL location 39B/7EC6F60 on timeline 2
> pg_rewind: rewinding from last common checkpoint at 39B/7E8E3F8 on timeline 2
> pg_rewind: error: file "pg_tblspc/34244696" is of different type in source and target
>
> The file is confirmed to be a JUNCTION to the correct location on both the
> source and target. So the error looks like a problem interacting with
> Windows and detecting JUNCTION types in this case.
An EDB customer has encountered the same issue. They are not using in-place tablespaces.
I have reproduced this problem on release 15, not using an in-place tablespace.
The solution I came up with was to backpatch commits c5cb8f3b, 387803d8 and 5fc88c5d53.
I don't think we need to do anything relating to in-place tablespaces. These are documented as a developer-only option and not for production.
The only question in my mind is whether those patches should be backpatched. It's a couple of hundred lines, and I think it's safe, but I'd welcome other opinions. If we are going to backpatch them we should also look at adding tests for use of tablespaces with pg_rewind on the back branches. Ideally we'd get this done in time for the next maintenance release.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On Tue, Jul 09, 2024 at 12:01:17PM -0400, Andrew Dunstan wrote:
> The solution I came up with was to backpatch commits c5cb8f3b, 387803d8 and
> 5fc88c5d53.
The lstat() wrapper for Windows, noted.
> I don't think we need to do anything relating to in-place tablespaces. These
> are documented as a developer only option and not for production.
Okay, cool.
> The only question in my mind is whether those patches should be
> backpatched.
>
> It's a couple of hundred lines, and I think it's safe, but I'd welcome other
> opinions. If we are going to backpatch them we should also look at adding
> tests for use of tablespaces with pg_rewind on the back branches.
> Ideally we'd get this done in time for the next maintenance release.
Seeing that the commits all go down to v16, meaning that these have
brewed across 3 minor releases already, I'd like to assume that we
would have already heard about problems related to them. So that
seems like a rather safe thing to do at this stage.
Michael
On 2024-10-23 We 7:03 PM, Michael Paquier wrote:
> On Wed, Oct 23, 2024 at 11:19:14AM -0500, Alexandra Wang wrote:
>> I encountered this issue while working on the fix for branch 14 and
>> running the tablespace regress test. This simple test is not covered
>> in branch 15’s regress tests, as we started setting
>> allow_in_place_tablespaces = true since commit d6d317db.
> Yes, for the reasons stated in this commit because we rely on
> everything to be on the same host, and tablespace paths would overlap
> across the primary and its replica.
>
>> I also had to backpatch additional commits for branches 12 to 14, as
>> follows:
>>
>> branch 14: e2f0f8ed, af9e6331, and the commits for branch 15
>> [f357233c, c5cb8f3b, 387803d8, and 5fc88c5d53].
>> branches 12 & 13: bed90759, 54fb8c7d, de8feb1f, 101c37cd, and the
>> commits for branch 14.
>>
>> With these additional commits for branches 12 to 14, I’m not sure if
>> it’s worth backpatching, or should we backpatch only to branch 15?
> 12 is going to be EOL in a couple of days, so I'd rather leave it out.
> If it were down to me, I'd also leave 13 and 14 as well, based on
> e2f0f8ed25 to let the beast sleep there. Perhaps others have a
> different opinion, though.
Well, it seems like it's clearly a bug. I'm never happy leaving bugs unfixed. As for 12, what's the point of putting out one last release if it's not to fix bugs?
EDB's customer will probably be happy if we just fix 15, but I would rather take a broader view and fix it for other possible users too.
I'm traveling for a few days but it was my intention to work on these when I am back.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
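The per-branch commit lists quoted above are cumulative: branch 14 needs its own commits plus everything listed for branch 15, and branches 12 and 13 need theirs plus everything for branch 14. A small Python sketch of that bookkeeping follows. The commit hashes are the ones quoted in the message; the function name commits_for() is hypothetical, and the ordering within each list simply follows the message, which does not state the order in which the commits would actually be applied.

```python
# Commit hashes exactly as quoted in the message above.
BRANCH_15 = ["f357233c", "c5cb8f3b", "387803d8", "5fc88c5d53"]
EXTRA_14 = ["e2f0f8ed", "af9e6331"]
EXTRA_12_13 = ["bed90759", "54fb8c7d", "de8feb1f", "101c37cd"]


def commits_for(branch):
    """Return the full list of commits named for a given back branch.

    The order is the listing order from the message, which is an
    assumption and not a verified cherry-pick order.
    """
    if branch == 15:
        return list(BRANCH_15)
    if branch == 14:
        return EXTRA_14 + commits_for(15)
    if branch in (12, 13):
        return EXTRA_12_13 + commits_for(14)
    raise ValueError(f"no backpatch list given for branch {branch}")


if __name__ == "__main__":
    for branch in (15, 14, 13, 12):
        print(branch, len(commits_for(branch)), commits_for(branch))
```

Laid out this way, the growth in scope is easy to see: four commits for branch 15, six for branch 14, and ten for branches 12 and 13.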
On Mon, Oct 28, 2024 at 03:00:12PM -0400, Andrew Dunstan wrote:
> Well, it seems like it's clearly a bug. I'm never happy leaving bugs
> unfixed. As for 12, what's the point of putting out one last release if it's
> not to fix bugs?
There is always a risk of breaking something that worked previously,
and we would be out of options to address these once the branch is
EOL'd. The risk/reward ratio for v12 is really different, so I'd
advise some caution particularly with this area of the code.
> EDB's customer will probably be happy if we just fix 15, but I would rather
> take a broader view and fix it for other possible users too.
>
> I'm traveling for a few days but it was my intention to work on these when I
> am back.
It might be worth checking if 4517358e and f71007fb should be
back-patched too. There was a brief discussion[1], but no one with
Windows-testing capabilities was around and it didn't seem too
serious, and then there was the whole re-wrap and it seemed best to
keep out of the way of that. But now that the coast is clear...
[1] https://www.postgresql.org/message-id/CAD5tBcKnE3C1hycBYZYtYpNssQR_e%2Bu2%3DCmDhGRFvDMEg3onRg%40mail.gmail.com
On 2024-11-26 Tu 8:58 PM, Thomas Munro wrote:
> It might be worth checking if 4517358e and f71007fb should be
> back-patched too. There was a brief discussion[1], but no one with
> Windows-testing capabilities was around and it didn't seem too
> serious, and then there was the whole re-wrap and it seemed best to
> keep out of the way of that. But now that the coast is clear...
>
> [1] https://www.postgresql.org/message-id/CAD5tBcKnE3C1hycBYZYtYpNssQR_e%2Bu2%3DCmDhGRFvDMEg3onRg%40mail.gmail.com
Yes, it's on my TODO list. Have been waiting for a) the releases to settle down b) getting availability of Windows resources again and c) being back at my home location. All three are now met, so will work on it.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2024-11-27 We 7:52 AM, Andrew Dunstan wrote:
> On 2024-11-26 Tu 8:58 PM, Thomas Munro wrote:
>> It might be worth checking if 4517358e and f71007fb should be
>> back-patched too. There was a brief discussion[1], but no one with
>> Windows-testing capabilities was around and it didn't seem too
>> serious, and then there was the whole re-wrap and it seemed best to
>> keep out of the way of that. But now that the coast is clear...
>>
>> [1] https://www.postgresql.org/message-id/CAD5tBcKnE3C1hycBYZYtYpNssQR_e%2Bu2%3DCmDhGRFvDMEg3onRg%40mail.gmail.com
>
> Yes, it's on my TODO list. Have been waiting for a) the releases to
> settle down b) getting availability of Windows resources again and c)
> being back at my home location. All three are now met, so will work on
> it.
Those patches didn't actually include any tests. I guess the best test would be to create a chain of several junction points and then run initdb on the leaf of the chain?
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On Thu, Jan 9, 2025 at 3:45 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> Those patches didn't actually include any tests. I guess the best test
> would be to create a chain of several junction points and then run
> initdb on the leaf of the chain?
Yeah I think the three interesting cases were initdb when run under
junctions like these:
1. Volume GUID format: mklink /J foo \\?\Volume{12341234-1234...}, expected to break without patch
2. Chain: mklink /J C:\\aaa1 C:\\aaa2, mklink /J C:\\aaa2 c:\\aaa3, expected to break without patch
3. Chain of length > 8, expected to fail with ELOOP once the patch is applied.
(Syntax may be off, I just googled it but don't have Windows to try).
The way to get decent tests for this stuff and all the rest of the
wrappers would probably be to develop this test suite further:
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com
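Cases 2 and 3 can be illustrated with a short Python sketch that follows a chain of links one hop at a time and gives up with ELOOP after a fixed number of hops. This is only a model of the behaviour described above, not the PostgreSQL wrapper code: it uses POSIX symlinks as a stand-in for Windows junction points, the names resolve_chain() and make_chain() are hypothetical, and while the limit of 8 comes from the thread, the exact boundary handling (a chain of exactly 8 resolves, 9 fails) is an assumption.

```python
import errno
import os

MAX_HOPS = 8  # limit mentioned in the thread; boundary handling is an assumption


def resolve_chain(path, max_hops=MAX_HOPS):
    """Follow a chain of links by hand, failing with ELOOP if it is too long."""
    hops = 0
    while os.path.islink(path):
        if hops >= max_hops:
            raise OSError(errno.ELOOP, os.strerror(errno.ELOOP), path)
        target = os.readlink(path)
        if not os.path.isabs(target):
            # Relative link targets are resolved against the link's directory.
            target = os.path.join(os.path.dirname(path), target)
        path = target
        hops += 1
    return path


def make_chain(base, length):
    """Create link_1 -> link_2 -> ... -> link_<length> -> real directory.

    Returns (first link in the chain, real directory at the end).
    """
    leaf = os.path.join(base, f"data_{length}")
    os.mkdir(leaf)
    target = leaf
    for i in range(length, 0, -1):
        link = os.path.join(base, f"link_{length}_{i}")
        os.symlink(target, link)
        target = link
    return target, leaf
```

A test along the lines suggested in the thread would build chains of different lengths with make_chain(), then check that short chains resolve to the real directory and that an over-long chain raises OSError with errno set to ELOOP. The Volume GUID case (case 1) is not modelled here because it has no POSIX equivalent.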