Thread: [PATCH] Add native windows on arm64 support
Compilation steps are consistent and similar to other windows platforms.
The change has been tested on windows/x86_64 and windows/arm64 and all regression tests passes on both platforms.
Thanks,
Niyas
Attachment
On Wed, Feb 23, 2022 at 11:09 PM Niyas Sait <niyas.sait@linaro.org> wrote: > I have created a patch that adds support for building Postgres for the windows/arm64 platform using the MSVC toolchain.Following changes have been included > > 1. Extend MSVC scripts to handle ARM64 platform. > 2. Add arm64 definition of spin_delay function. > 3. Exclude arm_acle.h import with MSVC compiler. > > Compilation steps are consistent and similar to other windows platforms. > > The change has been tested on windows/x86_64 and windows/arm64 and all regression tests passes on both platforms. + # arm64 linker only supports dynamic base address + my $cfgrandbaseaddress = $self->{platform} eq 'ARM64' ? 'true' : 'false'; ... + <RandomizedBaseAddress>$cfgrandbaseaddress</RandomizedBaseAddress> Does that mean that you can't turn off ASLR on this arch? Does it cause random backend failures due to inability to map shared memory at the expected address? Our experience with EXEC_BACKEND on various Unix systems tells us that you have to turn off ASLR if you want 100% success rate on starting backends, but I guess it depends on the details of how the randomisation is done. Any interest in providing a build farm animal, so that we could know that this works?
On Wed, Feb 23, 2022 at 11:09 PM Niyas Sait <niyas.sait@linaro.org> wrote:
> I have created a patch that adds support for building Postgres for the windows/arm64 platform using the MSVC toolchain. Following changes have been included
>
> 1. Extend MSVC scripts to handle ARM64 platform.
> 2. Add arm64 definition of spin_delay function.
> 3. Exclude arm_acle.h import with MSVC compiler.
>
> Compilation steps are consistent and similar to other windows platforms.
>
> The change has been tested on windows/x86_64 and windows/arm64 and all regression tests passes on both platforms.
+ # arm64 linker only supports dynamic base address
+ my $cfgrandbaseaddress = $self->{platform} eq 'ARM64' ? 'true' : 'false';
...
+ <RandomizedBaseAddress>$cfgrandbaseaddress</RandomizedBaseAddress>
Does that mean that you can't turn off ASLR on this arch? Does it
cause random backend failures due to inability to map shared memory at
the expected address? Our experience with EXEC_BACKEND on various
Unix systems tells us that you have to turn off ASLR if you want 100%
success rate on starting backends, but I guess it depends on the
details of how the randomisation is done.
Any interest in providing a build farm animal, so that we could know
that this works?
Hi, Please don't top-post here. See https://wiki.postgresql.org/wiki/Mailing_Lists#Email_etiquette_mechanics. On Tue, Mar 22, 2022 at 09:37:46AM +0000, Niyas Sait wrote: > > Yes, we could look into providing a build machine. Do you have any > reference to what the CI system looks like now for PostgresSQL and how to > add new workers etc.? It's all documented at https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto.
On Tue, Mar 22, 2022 at 11:30 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > On Tue, Mar 22, 2022 at 09:37:46AM +0000, Niyas Sait wrote: > > Yes, we could look into providing a build machine. Do you have any > > reference to what the CI system looks like now for PostgresSQL and how to > > add new workers etc.? > > It's all documented at > https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto. It seems likely there will be more and more Windows/ARM users, so yeah having a machine to test that combination would be great. I wonder if ASLR isn't breaking for you by good luck only...
Hi, On 2022-03-23 16:30:58 +1300, Thomas Munro wrote: > On Tue, Mar 22, 2022 at 11:30 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Tue, Mar 22, 2022 at 09:37:46AM +0000, Niyas Sait wrote: > > > Yes, we could look into providing a build machine. Do you have any > > > reference to what the CI system looks like now for PostgresSQL and how to > > > add new workers etc.? > > > > It's all documented at > > https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto. > > It seems likely there will be more and more Windows/ARM users, so yeah > having a machine to test that combination would be great. I wonder if > ASLR isn't breaking for you by good luck only... I think we've generally seen the ASLR issue become less prominent on windows. Whether that's because of the silent retries we added, or because just about everyone moved to 64bit windows / PG, I don't know. I'd guess both, with 64bit being the larger influence. Wonder if it's worth adding some debug logging to the retry code and stop disabling ASLR on 64bit windows... It's imo pretty crazy that we loop up to 100 times in internal_forkexec() around CreateProcess() && pgwin32_ReserveSharedMemoryRegion() without, as far as I can see, a single debug message. I don't think we can infer too much about the failure rate on windows from the !windows EXEC_BACKEND rates. The two internal_forkexec() implementations behaves just too differently. Greetings, Andres Freund
On Thu, Mar 24, 2022 at 2:15 PM Andres Freund <andres@anarazel.de> wrote: > I think we've generally seen the ASLR issue become less prominent on > windows. Whether that's because of the silent retries we added, or because > just about everyone moved to 64bit windows / PG, I don't know. I'd guess both, > with 64bit being the larger influence. > > Wonder if it's worth adding some debug logging to the retry code and stop > disabling ASLR on 64bit windows... It's imo pretty crazy that we loop up to > 100 times in internal_forkexec() around CreateProcess() && > pgwin32_ReserveSharedMemoryRegion() without, as far as I can see, a single > debug message. Yeah. I think we should commit this patch, but decree that Windows/aarch64 support is experimental only for now. That allows a build farm animal to be set up. Then we add a bit of extra logging and see how it does running our test suite over time and learn more.
On Fri, Mar 25, 2022 at 11:38:44AM +1300, Thomas Munro wrote: > Yeah. I think we should commit this patch, but decree that > Windows/aarch64 support is experimental only for now. That allows a > build farm animal to be set up. Then we add a bit of extra logging > and see how it does running our test suite over time and learn more. I don't have such a setup so my testing capabilities are limited. Does anybody have one? I think that we could be flexible for this patch, even after feature freeze as it introduces something entirely new without impacting the existing code. The patch has been moved to the next CF for now. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Fri, Mar 25, 2022 at 11:38:44AM +1300, Thomas Munro wrote: >> Yeah. I think we should commit this patch, but decree that >> Windows/aarch64 support is experimental only for now. That allows a >> build farm animal to be set up. Then we add a bit of extra logging >> and see how it does running our test suite over time and learn more. > I don't have such a setup so my testing capabilities are limited. > Does anybody have one? I think that we could be flexible for this > patch, even after feature freeze as it introduces something entirely > new without impacting the existing code. The patch has been moved to > the next CF for now. I dunno, the lack of any in-house capability for this makes me very nervous. If it causes problems down the road, how will we debug it? So it seems like the sort of patch to put in at the beginning of a development cycle, not post-feature-freeze. regards, tom lane
> On 7 Apr 2022, at 15:40, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Michael Paquier <michael@paquier.xyz> writes: >> On Fri, Mar 25, 2022 at 11:38:44AM +1300, Thomas Munro wrote: >>> Yeah. I think we should commit this patch, but decree that >>> Windows/aarch64 support is experimental only for now. That allows a >>> build farm animal to be set up. Then we add a bit of extra logging >>> and see how it does running our test suite over time and learn more. > >> I don't have such a setup so my testing capabilities are limited. >> Does anybody have one? I think that we could be flexible for this >> patch, even after feature freeze as it introduces something entirely >> new without impacting the existing code. The patch has been moved to >> the next CF for now. > > I dunno, the lack of any in-house capability for this makes me very > nervous. If it causes problems down the road, how will we debug it? If those with an interest in such platform support cannot spare the cycles to at least run a buildfarm member, then it seems a stretch for us to maintain such support with any confidence. > So it seems like the sort of patch to put in at the beginning of a > development cycle, not post-feature-freeze. +1 -- Daniel Gustafsson https://vmware.com/
Hi, On 2022-04-07 09:40:43 -0400, Tom Lane wrote: > If it causes problems down the road, how will we debug it? If what causes problems down the road? Afaics the patch doesn't change anything outside of windows-on-arm, so it shouldn't cause any breakage we care about until we get a buildfarm animal. We've traditionally been somewhat relaxed about adding support for new platforms, on similar notions. That said: > So it seems like the sort of patch to put in at the beginning of a > development cycle, not post-feature-freeze. There doesn't seem to be a great urgency, and there's plenty stuff going on right now. I can see us merging it post branching off, and then backpatching it a bit later? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2022-04-07 09:40:43 -0400, Tom Lane wrote: >> If it causes problems down the road, how will we debug it? > If what causes problems down the road? Afaics the patch doesn't change > anything outside of windows-on-arm, so it shouldn't cause any breakage we care > about until we get a buildfarm animal. Do we have a volunteer to run a buildfarm animal? I don't see much point in thinking about this till there is one ready to go. regards, tom lane
Hi, On 2022-04-07 13:42:49 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2022-04-07 09:40:43 -0400, Tom Lane wrote: > >> If it causes problems down the road, how will we debug it? > > > If what causes problems down the road? Afaics the patch doesn't change > > anything outside of windows-on-arm, so it shouldn't cause any breakage we care > > about until we get a buildfarm animal. > > Do we have a volunteer to run a buildfarm animal? I don't see much > point in thinking about this till there is one ready to go. OP said that they might be able to: https://www.postgresql.org/message-id/CAFPTBD_NZb%3Dq_6uE-QV%2BS%2Bpm%3DRc9sBKrg8CQ_Y3dc27Awrm7cQ%40mail.gmail.com Niyas, any updates on that? Greetings, Andres Freund
Hi,
On 2022-04-07 13:42:49 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-04-07 09:40:43 -0400, Tom Lane wrote:
> >> If it causes problems down the road, how will we debug it?
>
> > If what causes problems down the road? Afaics the patch doesn't change
> > anything outside of windows-on-arm, so it shouldn't cause any breakage we care
> > about until we get a buildfarm animal.
>
> Do we have a volunteer to run a buildfarm animal? I don't see much
> point in thinking about this till there is one ready to go.
OP said that they might be able to:
https://www.postgresql.org/message-id/CAFPTBD_NZb%3Dq_6uE-QV%2BS%2Bpm%3DRc9sBKrg8CQ_Y3dc27Awrm7cQ%40mail.gmail.com
Niyas, any updates on that?
Greetings,
Andres Freund
On Tue, Apr 19, 2022 at 03:22:30PM +0100, Niyas Sait wrote: > Sorry for the delay! Configuring the scripts took some time. I have > successfully run the builfarm animal script using my git repository [1] > which contains the proposed patch on a Windows Arm64 machine. > > I made a request to add a new machine to build farm through [2]. Have you tested with the amount of coverage provided by vcregress.pl? Another thing I was wondering about is if it would be possible to have this option in Travis, but that does not seem to be the case: https://docs.travis-ci.com/user/reference/windows/#windows-version > I believe we should be able to enable the build farm machine once the > change has been merged. Better to wait for the beginning of the development cycle at this stage, based on all the replies received. That would bring us to the beginning of July. + if ($solution->{platform} eq 'ARM64') { + push(@pgportfiles, 'pg_crc32c_armv8_choose.c'); + push(@pgportfiles, 'pg_crc32c_armv8.c'); + } else { + push(@pgportfiles, 'pg_crc32c_sse42_choose.c'); + push(@pgportfiles, 'pg_crc32c_sse42.c'); + } +++ b/src/port/pg_crc32c_armv8.c +#ifndef _MSC_VER #include <arm_acle.h> +#endif [ ... ] +#ifdef _M_ARM64 + /* + * arm64 way of hinting processor for spin loops optimisations + * ref: https://community.arm.com/support-forums/f/infrastructure-solutions-forum/48654/ssetoneon-faq + */ + __isb(_ARM64_BARRIER_SY); +#else I think that such extra optimizations had better be in a separate patch, and we should focus on getting the build done first. + # arm64 linker only supports dynamic base address + my $cfgrandbaseaddress = $self->{platform} eq 'ARM64' ? 'true' : 'false'; This issue is still lying around, and you may have been lucky. Would there be any issues to remove this change to get a basic support in? As mentioned upthread, there is a long history of Postgres with ASLR. This would mean that a basic patch could be done with just the changes for gendef.pl, and the first part of the changes inMSBuildProject.pm. Would that not be enough? -- Michael
Attachment
> this option in Travis, but that does not seem to be the case:
> https://docs.travis-ci.com/user/reference/windows/#windows-version
> + push(@pgportfiles, 'pg_crc32c_armv8_choose.c');
> + push(@pgportfiles, 'pg_crc32c_armv8.c');
> + } else {
> + push(@pgportfiles, 'pg_crc32c_sse42_choose.c');
> + push(@pgportfiles, 'pg_crc32c_sse42.c');
> + }
>
> +++ b/src/port/pg_crc32c_armv8.c
> +#ifndef _MSC_VER
> #include <arm_acle.h>
> +#endif
> [ ... ]
> +#ifdef _M_ARM64
> + /*
> + * arm64 way of hinting processor for spin loops optimisations
> + * ref: https://community.arm.com/support-forums/f/infrastructure-solutions-forum/48654/ssetoneon-faq
> + */
> + __isb(_ARM64_BARRIER_SY);
> +#else
> I think that such extra optimizations had better be in a separate
> patch, and we should focus on getting the build done first.
> for gendef.pl, and the first part of the changes inMSBuildProject.pm.
> Would that not be enough?
> + my $cfgrandbaseaddress = $self->{platform} eq 'ARM64' ? 'true' : 'false';
> This issue is still lying around, and you may have been lucky. Would
> there be any issues to remove this change to get a basic support in?
> As mentioned upthread, there is a long history of Postgres with ASLR.
On Wed, Apr 20, 2022 at 10:43:06AM +0100, Niyas Sait wrote: >> This issue is still lying around, and you may have been lucky. Would >> there be any issues to remove this change to get a basic support in? >> As mentioned upthread, there is a long history of Postgres with ASLR. > > MSVC linker doesn't allow non-random base addresses for Arm64 platforms. > It is needed for basic support. Why does it not allow that? Is that just a limitation of the compiler? If yes, what is the error happening? This question is not exactly answered yet as of this thread. I may be missing a reference about that in the upstream docs, but I see nowhere an explanation about the reason and the why. That's also one of the first questions from Thomas upthread. -- Michael
Attachment
> compiler? If yes, what is the error happening? This question is not
> exactly answered yet as of this thread. I may be missing a reference
> about that in the upstream docs, but I see nowhere an explanation
> about the reason and the why. That's also one of the first questions
> from Thomas upthread.
On Wed, Apr 20, 2022 at 10:43:06AM +0100, Niyas Sait wrote:
>> This issue is still lying around, and you may have been lucky. Would
>> there be any issues to remove this change to get a basic support in?
>> As mentioned upthread, there is a long history of Postgres with ASLR.
>
> MSVC linker doesn't allow non-random base addresses for Arm64 platforms.
> It is needed for basic support.
Why does it not allow that? Is that just a limitation of the
compiler? If yes, what is the error happening? This question is not
exactly answered yet as of this thread. I may be missing a reference
about that in the upstream docs, but I see nowhere an explanation
about the reason and the why. That's also one of the first questions
from Thomas upthread.
--
Michael
On Thu, Apr 21, 2022 at 10:21:04AM +0100, Niyas Sait wrote: > The following error occurs: > > LINK : fatal error LNK1246: '/DYNAMICBASE:NO' not compatible with 'ARM64' > target machine; link without '/DYNAMICBASE:NO Okay, that's interesting. In light of things like 7f3e17b, that may be annoying. Perhaps newer Windows versions are able to handle that better. I am wondering if it would be worth re-evaluating this change, and attempt to re-enable that in environments other than arm64. This could become interesting if we consider that there have been talks to cut the support for a bunch of Windows versions to focus on the newer ones only. > This seems to be a deliberate restriction for Arm64 targets. However, no > references were provided. To clarify, I have posted a question [1] on the > community channel of Visual Studio. Thanks for doing that! Your post is just a couple of days old, let's see if we get a reply on that. -- Michael
Attachment
> see if we get a reply on that.
Because ASLR can't be disabled on ARM, ARM64, or ARM64EC architectures, /DYNAMICBASE:NO
isn't supported for these targets.
On Thu, Apr 21, 2022 at 10:21:04AM +0100, Niyas Sait wrote:
> The following error occurs:
>
> LINK : fatal error LNK1246: '/DYNAMICBASE:NO' not compatible with 'ARM64'
> target machine; link without '/DYNAMICBASE:NO
Okay, that's interesting. In light of things like 7f3e17b, that may
be annoying. Perhaps newer Windows versions are able to handle that
better. I am wondering if it would be worth re-evaluating this
change, and attempt to re-enable that in environments other than
arm64. This could become interesting if we consider that there have
been talks to cut the support for a bunch of Windows versions to focus
on the newer ones only.
> This seems to be a deliberate restriction for Arm64 targets. However, no
> references were provided. To clarify, I have posted a question [1] on the
> community channel of Visual Studio.
Thanks for doing that! Your post is just a couple of days old, let's
see if we get a reply on that.
--
Michael
On Mon, May 09, 2022 at 12:44:22PM +0100, Niyas Sait wrote: > Microsoft updated documentation [1] and clarified that ASLR cannot be > disabled for Arm64 targets. > > Because ASLR can't be disabled on ARM, ARM64, or ARM64EC architectures, > > /DYNAMICBASE:NO isn't supported for these targets. Better than nothing, I guess, though this does not stand as a reason explaining why this choice has been made. And it looks like we will never know. We are going to need more advanced testing to check that DYNAMICBASE is stable enough if we begin to enable it. Perhaps we should really look at that for all the build targets we currently support rather than just ARM, for the most modern Win32 environments if we are going to cut support for most of the oldest releases.. -- Michael
Attachment
On Mon, May 09, 2022 at 12:44:22PM +0100, Niyas Sait wrote:
> Microsoft updated documentation [1] and clarified that ASLR cannot be
> disabled for Arm64 targets.
>
> Because ASLR can't be disabled on ARM, ARM64, or ARM64EC architectures,
> > /DYNAMICBASE:NO isn't supported for these targets.
Better than nothing, I guess, though this does not stand as a reason
explaining why this choice has been made. And it looks like we will
never know. We are going to need more advanced testing to check that
DYNAMICBASE is stable enough if we begin to enable it. Perhaps we
should really look at that for all the build targets we currently
support rather than just ARM, for the most modern Win32 environments
if we are going to cut support for most of the oldest releases..
--
Michael
Linaro Limited
Attachment
Hi, On 2022-05-10 10:01:55 +0900, Michael Paquier wrote: > On Mon, May 09, 2022 at 12:44:22PM +0100, Niyas Sait wrote: > > Microsoft updated documentation [1] and clarified that ASLR cannot be > > disabled for Arm64 targets. > > > > Because ASLR can't be disabled on ARM, ARM64, or ARM64EC architectures, > > > /DYNAMICBASE:NO isn't supported for these targets. > > Better than nothing, I guess, though this does not stand as a reason > explaining why this choice has been made. And it looks like we will > never know. We are going to need more advanced testing to check that > DYNAMICBASE is stable enough if we begin to enable it. Perhaps we > should really look at that for all the build targets we currently > support rather than just ARM, for the most modern Win32 environments > if we are going to cut support for most of the oldest releases.. I accidentally did a lot of testing with DYNAMICBASE - I accidentally mistranslated MSBuildProject.pm's <RandomizedBaseAddress>false</> to adding /DYNAMICBASE in the meson port. Survived several hundred CI cycles and dozens of local runs without observing any problems related to that. Which doesn't surprise me, given the way we reserve space in the new process. Greetings, Andres Freund
On Thu, Aug 25, 2022 at 06:29:07PM -0700, Andres Freund wrote: > I accidentally did a lot of testing with DYNAMICBASE - I accidentally > mistranslated MSBuildProject.pm's <RandomizedBaseAddress>false</> to adding > /DYNAMICBASE in the meson port. Survived several hundred CI cycles and dozens > of local runs without observing any problems related to that. > > Which doesn't surprise me, given the way we reserve space in the new process. Still, we had problems with that in the past and I don't recall huge changes in the way we allocate shmem on WIN32. Could there be some stuff in Windows itself that would explain more stability? I would not mind switching back to DYNAMICBASE on HEAD seeing your results. That would simplify this thread's patch a bit as well. -- Michael
Attachment
Hi, On 2022-08-26 11:09:41 +0900, Michael Paquier wrote: > On Thu, Aug 25, 2022 at 06:29:07PM -0700, Andres Freund wrote: > > I accidentally did a lot of testing with DYNAMICBASE - I accidentally > > mistranslated MSBuildProject.pm's <RandomizedBaseAddress>false</> to adding > > /DYNAMICBASE in the meson port. Survived several hundred CI cycles and dozens > > of local runs without observing any problems related to that. > > > > Which doesn't surprise me, given the way we reserve space in the new process. > > Still, we had problems with that in the past and I don't recall huge > changes in the way we allocate shmem on WIN32. It's really not great that 7f3e17b4827 disabled randomization without even a comment as to why... There actually seems to have been a lot of work in that area. See 617dc6d299c / bcbf2346d69 and as well as the retrying in 45e004fb4e39. Those weren't prevented by us disabling ASLR. Greetings, Andres Freund
On Thu, Aug 25, 2022 at 08:29:43PM -0700, Andres Freund wrote: > It's really not great that 7f3e17b4827 disabled randomization without even a > comment as to why... This story is on this thread, with some processes not able to start: https://www.postgresql.org/message-id/BD0D89EC2438455C9DE0DC94D36912F4@maumau > There actually seems to have been a lot of work in that area. See 617dc6d299c > / bcbf2346d69 and as well as the retrying in 45e004fb4e39. Those weren't > prevented by us disabling ASLR. Indeed. 45e004f looks like the most interesting bit here. FWIW, I would not mind re-enabling that on HEAD, as of something like the attached. I have done a dozen of runs without seeing a test failure, and knowing that we don't support anything older than Win10 makes me feel safer about this change. Any objections? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > Indeed. 45e004f looks like the most interesting bit here. FWIW, I > would not mind re-enabling that on HEAD, as of something like the > attached. I have done a dozen of runs without seeing a test failure, > and knowing that we don't support anything older than Win10 makes me > feel safer about this change. Any objections? We're early enough in the v16 cycle to have plenty of time to detect any problems, so I see little reason not to try it. regards, tom lane
Hi, On 2022-08-27 11:33:51 +0900, Michael Paquier wrote: > FWIW, I would not mind re-enabling that on HEAD, as of something like the > attached. I have done a dozen of runs without seeing a test failure, and > knowing that we don't support anything older than Win10 makes me feel safer > about this change. Any objections? +1. We don't have a choice about it on other architectures and it seems like a bad idea to disable on its own. I checked, and it looks like we didn't add --disable-dynamicbase, so ASLR wasn't disabled for mingw builds. Greetings, Andres Freund
On Sat, Aug 27, 2022 at 12:27:57PM -0700, Andres Freund wrote: > I checked, and it looks like we didn't add --disable-dynamicbase, so ASLR > wasn't disabled for mingw builds. True enough. I have applied the patch to re-enable that on HEAD. Let's now see what happens in the next couple of days. Popcorn is ready here. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > True enough. I have applied the patch to re-enable that on HEAD. > Let's now see what happens in the next couple of days. Popcorn is > ready here. Hmm: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2022-08-28%2010%3A30%3A29 Maybe that's just unrelated noise, but it sure looks weird: after passing the core regression tests in "make check", it failed in pg_upgrade's run with diff -w -U3 H:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/xml.out H:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/results/xml.out --- H:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/xml.out Wed May 18 18:30:12 2022 +++ H:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/results/xml.out Sun Aug 28 06:55:14 2022 @@ -1557,7 +1557,7 @@ \\x SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&deep</n2>post</e>' COLUMNS x xmlPATH 'node()', y xml PATH '/'); -[ RECORD 1 ]----------------------------------------------------------- -x | pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&deep</n2>post +x | pre<?pi arg?><![CDATA[&ent1]]><!--c1--><n2>&deep</n2>post y | <e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&deep</n2>post</e>+ | I don't recall ever seeing a failure quite like this. regards, tom lane
Hi, On 2022-08-28 10:09:53 -0400, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: > > True enough. I have applied the patch to re-enable that on HEAD. > > Let's now see what happens in the next couple of days. Popcorn is > > ready here. > > Hmm: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2022-08-28%2010%3A30%3A29 > > Maybe that's just unrelated noise, but it sure looks weird: after > passing the core regression tests in "make check", it failed in > pg_upgrade's run with > > diff -w -U3 H:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/xml.out H:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/results/xml.out > --- H:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/xml.out Wed May 18 18:30:12 2022 > +++ H:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/results/xml.out Sun Aug 28 06:55:14 2022 > @@ -1557,7 +1557,7 @@ > \\x > SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&deep</n2>post</e>' COLUMNS x xmlPATH 'node()', y xml PATH '/'); > -[ RECORD 1 ]----------------------------------------------------------- > -x | pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&deep</n2>post > +x | pre<?pi arg?><![CDATA[&ent1]]><!--c1--><n2>&deep</n2>post > y | <e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&deep</n2>post</e>+ > | Pretty weird, agreed. But hard to see how it could be caused by the randomization change, except that perhaps it could highlight a preexisting bug? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2022-08-28 10:09:53 -0400, Tom Lane wrote: >> -x | pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&deep</n2>post >> +x | pre<?pi arg?><![CDATA[&ent1]]><!--c1--><n2>&deep</n2>post > Pretty weird, agreed. But hard to see how it could be caused by the > randomization change, except that perhaps it could highlight a preexisting > bug? I have no idea either. I agree there *shouldn't* be any connection, so if ASLR is somehow triggering this then whatever is failing is almost certainly buggy on its own terms. But there's a lot of moving parts here (mumble libxml mumble). I'm going to wait to see if it reproduces before spending much effort. regards, tom lane
On Sun, Aug 28, 2022 at 11:51:19AM -0400, Tom Lane wrote: > I have no idea either. I agree there *shouldn't* be any connection, > so if ASLR is somehow triggering this then whatever is failing is > almost certainly buggy on its own terms. But there's a lot of > moving parts here (mumble libxml mumble). I'm going to wait to see > if it reproduces before spending much effort. I have noticed that yesterday, but cannot think much about it. This basically changes the position of "<!--c1-->" for the first record, leaving the second one untouched: <!--c1--><?pi arg?><![CDATA[&ent1]]> <?pi arg?><![CDATA[&ent1]]><!--c1--> I am not used to xmltable(), but I wonder if there is something in one of these support functions in xml.c that gets influenced by the randomization. That sounds a bit hairy as make check passed in bowerbird, and I have noticed at least two other Windows hosts running TAP that passed. Or that's just something with libxml itself. -- Michael
Attachment
On Mon, Aug 29, 2022 at 10:12:20AM +0900, Michael Paquier wrote: > I have noticed that yesterday, but cannot think much about it. This > basically changes the position of "<!--c1-->" for the first record, > leaving the second one untouched: > <!--c1--><?pi arg?><![CDATA[&ent1]]> > <?pi arg?><![CDATA[&ent1]]><!--c1--> > > I am not used to xmltable(), but I wonder if there is something in one > of these support functions in xml.c that gets influenced by the > randomization. That sounds a bit hairy as make check passed in > bowerbird, and I have noticed at least two other Windows hosts running > TAP that passed. Or that's just something with libxml itself. This is amazing. The issue has showed up a second time in a row in bowerbird, as of: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2022-08-29%2013%3A30%3A32 I don't know what to think about ASLR that manipulates the comment in this XML object under VS 2017 (perhaps a compiler issue?), but it should be possible to go back to green simply by removing "<!--c1-->" from the input string. Creating an extra output pattern here would be very costly, as xml_2.out and xml.out have outputs for --with-libxml. Would people object if I do that for now? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > This is amazing. The issue has showed up a second time in a row in > bowerbird, as of: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2022-08-29%2013%3A30%3A32 That is fascinating. > I don't know what to think about ASLR that manipulates the comment in > this XML object under VS 2017 (perhaps a compiler issue?), but it > should be possible to go back to green simply by removing "<!--c1-->" > from the input string. Creating an extra output pattern here would be > very costly, as xml_2.out and xml.out have outputs for --with-libxml. > Would people object if I do that for now? Let's let it go for a few more runs. I want to know whether it reproduces 100% or only sometimes. regards, tom lane
Hi, On 2022-08-30 08:48:38 +0900, Michael Paquier wrote: > On Mon, Aug 29, 2022 at 10:12:20AM +0900, Michael Paquier wrote: > > I have noticed that yesterday, but cannot think much about it. This > > basically changes the position of "<!--c1-->" for the first record, > > leaving the second one untouched: > > <!--c1--><?pi arg?><![CDATA[&ent1]]> > > <?pi arg?><![CDATA[&ent1]]><!--c1--> > > > > I am not used to xmltable(), but I wonder if there is something in one > > of these support functions in xml.c that gets influenced by the > > randomization. That sounds a bit hairy as make check passed in > > bowerbird, and I have noticed at least two other Windows hosts running > > TAP that passed. Or that's just something with libxml itself. > > This is amazing. The issue has showed up a second time in a row in > bowerbird, as of: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2022-08-29%2013%3A30%3A32 > > I don't know what to think about ASLR that manipulates the comment in > this XML object under VS 2017 (perhaps a compiler issue?), but it > should be possible to go back to green simply by removing "<!--c1-->" > from the input string. Creating an extra output pattern here would be > very costly, as xml_2.out and xml.out have outputs for --with-libxml. > > Would people object if I do that for now? I do. Clearly something is wrong, what do we gain by averting our eyes? An alternative output file strikes me as an equally bad idea - it's not like this is a case where there are legitimately different outputs? Bowerbird is the only animal testing libxml with visual studio afaics. And CI doesn't have libxml either. So we have no idea if it's the msvc version that matters. Andrew, what version of libxml is installed? Afaict we don't see that anywhere with msvc. Greetings, Andres Freund
Hi, On 2022-08-29 19:58:31 -0400, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: > > This is amazing. The issue has showed up a second time in a row in > > bowerbird, as of: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2022-08-29%2013%3A30%3A32 > > That is fascinating. > > I don't know what to think about ASLR that manipulates the comment in > > this XML object under VS 2017 (perhaps a compiler issue?), but it > > should be possible to go back to green simply by removing "<!--c1-->" > > from the input string. Creating an extra output pattern here would be > > very costly, as xml_2.out and xml.out have outputs for --with-libxml. > > > Would people object if I do that for now? > > Let's let it go for a few more runs. I want to know whether it > reproduces 100% or only sometimes. The weirdest part is that it only happens as part of the the pg_upgrade test. I just tested it in my test windows vm (win 10, vs 2019), with a build of libxml I had around (2.9.7), and the regression tests passed both "normally" and within pg_upgrade. Greetings, Andres Freund
On Mon, Aug 29, 2022 at 05:33:56PM -0700, Andres Freund wrote: > The weirdest part is that it only happens as part of the the pg_upgrade test. make check has just failed: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2022-08-30%2001%3A15%3A13 > I just tested it in my test windows vm (win 10, vs 2019), with a build of > libxml I had around (2.9.7), and the regression tests passed both "normally" > and within pg_upgrade. bowerbird is feeling from c:\\prog\\3p64\\include\\libxml2 and c:\\prog\\3p64\\lib\\libxml2.lib. I am not sure which version of libxml this is, and the other animals of MSVC don't use libxml so it is not possible to correlate that only to VS 2017, either. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Aug 29, 2022 at 05:33:56PM -0700, Andres Freund wrote: >> The weirdest part is that it only happens as part of the the pg_upgrade test. > make check has just failed: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2022-08-30%2001%3A15%3A13 So it *is* probabilistic, which is pretty much what you'd expect if ASLR triggers it. That brings us no closer to understanding what the mechanism is, though. regards, tom lane
On Mon, Aug 29, 2022 at 10:00:10PM -0400, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: > > On Mon, Aug 29, 2022 at 05:33:56PM -0700, Andres Freund wrote: > >> The weirdest part is that it only happens as part of the the pg_upgrade test. > > > make check has just failed: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2022-08-30%2001%3A15%3A13 > > So it *is* probabilistic, which is pretty much what you'd expect > if ASLR triggers it. That brings us no closer to understanding > what the mechanism is, though. There have been more failures, always switching the input from "pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&deep</n2>post" to "pre<?pi arg?><![CDATA[&ent1]]><!--c1--><n2>&deep</n2>post". Using a PATH of node() influences the output. I am not verse unto XMLTABLE, but could it be an issue where each node is parsed and we have something like a qsort() applied on the pointer addresses for each part or something like that, causing the output to become unstable? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > Using a PATH of node() influences the output. I am not verse unto > XMLTABLE, but could it be an issue where each node is parsed and we > have something like a qsort() applied on the pointer addresses for > each part or something like that, causing the output to become > unstable? I'm not sure. I dug into this enough to find that the output from this query is generated in xml.c lines 4635ff: /* Concatenate serialized values */ initStringInfo(&str); for (int i = 0; i < count; i++) { textstr = xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i], xtCxt->xmlerrcxt); appendStringInfoText(&str, textstr); } cstr = str.data; So evidently, the problem occurs because the elements of the nodesetval->nodeTab[] array are in a different order than we expect. I looked into <xpath.h> and found the definition of the "nodesetval" struct, and the comments are eye-opening to say the least: /* * A node-set (an unordered collection of nodes without duplicates). */ typedef struct _xmlNodeSet xmlNodeSet; typedef xmlNodeSet *xmlNodeSetPtr; struct _xmlNodeSet { int nodeNr; /* number of nodes in the set */ int nodeMax; /* size of the array as allocated */ xmlNodePtr *nodeTab; /* array of nodes in no particular order */ /* @@ with_ns to check wether namespace nodes should be looked at @@ */ }; It seems like maybe we're relying on an ordering we should not. Yet surely the ordering of the pieces of this output is meaningful? Are we using the wrong libxml API to create this result? Anyway, I'm now suspicious that we've accidentally exposed a logic bug in the XMLTABLE code, rather than anything wrong with the ASLR stuff. regards, tom lane
On Wed, Aug 31, 2022 at 08:36:01AM +0900, Michael Paquier wrote: > There have been more failures, always switching the input from > "pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&deep</n2>post" > to "pre<?pi arg?><![CDATA[&ent1]]><!--c1--><n2>&deep</n2>post". > > Using a PATH of node() influences the output. I am not verse unto > XMLTABLE, but could it be an issue where each node is parsed and we > have something like a qsort() applied on the pointer addresses for > each part or something like that, causing the output to become > unstable? Hmm. I think that I may have an idea here after looking at our xml.c and xpath.c in libxml2/. From what I understand, we process the PATH through XmlTableGetValue() that builds a XML node path in xmlXPathCompiledEval(). The interesting part comes from libxml2's xmlXPathCompiledEvalInternal(), where I think we don't apply a sort on the contents generated. Hence, I am wondering if the solution here would be to do one xmlXPathNodeSetSort(xpathobj->nodesetval) after compiling the path with xmlXPathCompiledEval() in XmlTableGetValue(). This should ensure that the items are ordered even if ASLR mixes if the pointer positions. A complete solution would involve more code paths, but we'd need only one change in XmlTableGetValue() for the current regression tests to work. I don't have an environment where I can reproduce that, so that would be up to the buildfarm to stress this solution.. Thoughts? -- Michael
Attachment
On Tue, Aug 30, 2022 at 08:07:27PM -0400, Tom Lane wrote: > It seems like maybe we're relying on an ordering we should not. > Yet surely the ordering of the pieces of this output is meaningful? > Are we using the wrong libxml API to create this result? > > Anyway, I'm now suspicious that we've accidentally exposed a logic > bug in the XMLTABLE code, rather than anything wrong with the > ASLR stuff. Funny. I have reached basically the same conclusion as you a couple of minutes ago, but I also think that I have found what we need to do here to ensure the ordering of the nodes generated by the libxml code. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > Hence, I am wondering if the solution here > would be to do one xmlXPathNodeSetSort(xpathobj->nodesetval) after > compiling the path with xmlXPathCompiledEval() in XmlTableGetValue(). Hmm, I see that function declared in <xpathInternals.h>, which sure doesn't give me a warm feeling that we're supposed to call it directly. But maybe there's an approved way to get the result. Or perhaps this test case is wrong, and instead of "node()" we need to write something that specifies a sorted result? regards, tom lane
On Tue, Aug 30, 2022 at 08:28:58PM -0400, Tom Lane wrote: > Hmm, I see that function declared in <xpathInternals.h>, which > sure doesn't give me a warm feeling that we're supposed to call it > directly. But maybe there's an approved way to get the result. > > Or perhaps this test case is wrong, and instead of "node()" we > need to write something that specifies a sorted result? I am no specialist on this matter.. Still, looking around, this looks to be the kind of job that xsl:sort would do? We cannot use it here for an XML-only context, though, and it does not seem like there is a direct way to enforce the ordering per the nature of the XPath language, either. I may be missing something of course. Please note that I am also getting a bit the cold feet with the idea of enforcing a sorting of the elements in the generated output, actually, as this could have a performance penalty for all users particularly on large blobs of data. At the end, I'd like to think that it would be wiser to just remove entirely the test for node() or reduce the contents of the string to be able to strictly control the output order (say a simple '<e>pre<![CDATA[&ent1]]>post</e>' would do the trick). I cannot think about a better idea now. Note that removing the test case where we have node() has no impact on the coverage of xml.c. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > At the end, I'd like to think that it would be wiser to just remove > entirely the test for node() or reduce the contents of the string to > be able to strictly control the output order (say a simple > '<e>pre<![CDATA[&ent1]]>post</e>' would do the trick). I cannot think > about a better idea now. Note that removing the test case where we > have node() has no impact on the coverage of xml.c. Yeah, I confirm that: local code-coverage testing shows no change in the number of lines reached in xml.c when I remove that column: -SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&deep</n2>post</e>' COLUMNS x xmlPATH 'node()', y xml PATH '/'); +SELECT * FROM XMLTABLE('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&deep</n2>post</e>' COLUMNS y xmlPATH '/'); Dropping the query altogether does result in a reduction in the number of lines hit. I did not check the other XMLTABLE infrastructure, so what probably is best to do is keep the two output columns but change 'node()' to something with a more stable result; any preferences? regards, tom lane
On Wed, Aug 31, 2022 at 01:33:53PM -0400, Tom Lane wrote: > I did not check the other XMLTABLE infrastructure, so what probably > is best to do is keep the two output columns but change 'node()' > to something with a more stable result; any preferences? The input object could also be reworked so as we would not have any ordering issues, say "<e>pre<f><g><n2>&deep</n2></g></f>post</e>". Changing only the path, you could use "/e/n2" instead of "node()", so as we'd always get the most internal member. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > The input object could also be reworked so as we would not have any > ordering issues, say "<e>pre<f><g><n2>&deep</n2></g></f>post</e>". > Changing only the path, you could use "/e/n2" instead of "node()", so > as we'd always get the most internal member. Done that way. regards, tom lane
On Wed, Aug 31, 2022 at 10:22:06PM -0400, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: > > The input object could also be reworked so as we would not have any > > ordering issues, say "<e>pre<f><g><n2>&deep</n2></g></f>post</e>". > > Changing only the path, you could use "/e/n2" instead of "node()", so > > as we'd always get the most internal member. > > Done that way. Cool, thanks. bowerbird looks happy after its first run. An argument in favor of backpatching is if one decides to build the code with MSVC and patch the scripts to enable ASLR. However, nobody has complained about that yet, so fine by me to leave this change only on HEAD for now. -- Michael
Attachment
2022年9月1日(木) 13:42 Michael Paquier <michael@paquier.xyz>: > > On Wed, Aug 31, 2022 at 10:22:06PM -0400, Tom Lane wrote: > > Michael Paquier <michael@paquier.xyz> writes: > > > The input object could also be reworked so as we would not have any > > > ordering issues, say "<e>pre<f><g><n2>&deep</n2></g></f>post</e>". > > > Changing only the path, you could use "/e/n2" instead of "node()", so > > > as we'd always get the most internal member. > > > > Done that way. > > Cool, thanks. bowerbird looks happy after its first run. > > An argument in favor of backpatching is if one decides to build the > code with MSVC and patch the scripts to enable ASLR. However, nobody > has complained about that yet, so fine by me to leave this change only > on HEAD for now. Apologies for the thread bump, but there is an entry for this thread in the current CF previously marked "Needs review": https://commitfest.postgresql.org/40/3561/ I've gone ahead and marked it as "Committed", as that appears to have happened back in August as 36389a060c. If for whatever reason that was the Wrong Thing To Do, please let me know. Regards Ian Barwick
> If for whatever reason that was the Wrong Thing To Do, please let me know.
2022年9月1日(木) 13:42 Michael Paquier <michael@paquier.xyz>:
>
> On Wed, Aug 31, 2022 at 10:22:06PM -0400, Tom Lane wrote:
> > Michael Paquier <michael@paquier.xyz> writes:
> > > The input object could also be reworked so as we would not have any
> > > ordering issues, say "<e>pre<f><g><n2>&deep</n2></g></f>post</e>".
> > > Changing only the path, you could use "/e/n2" instead of "node()", so
> > > as we'd always get the most internal member.
> >
> > Done that way.
>
> Cool, thanks. bowerbird looks happy after its first run.
>
> An argument in favor of backpatching is if one decides to build the
> code with MSVC and patch the scripts to enable ASLR. However, nobody
> has complained about that yet, so fine by me to leave this change only
> on HEAD for now.
Apologies for the thread bump, but there is an entry for this thread
in the current CF
previously marked "Needs review":
https://commitfest.postgresql.org/40/3561/
I've gone ahead and marked it as "Committed", as that appears to have happened
back in August as 36389a060c.
If for whatever reason that was the Wrong Thing To Do, please let me know.
Regards
Ian Barwick
Linaro Limited
Attachment
On Thu, Nov 03, 2022 at 11:06:46AM +0000, Niyas Sait wrote: > > I've gone ahead and marked it as "Committed", as that appears to have > happened > > back in August as 36389a060c. > > If for whatever reason that was the Wrong Thing To Do, please let me know. > > 36389a060c commit enables ASLR for windows but does not include other > changes required for Arm64. Yes, marking this patch as committed is incorrect. That's just me lagging behind for the review :) I have switched that back as "Needs review" for the moment. > I've attached a new version of the patch which excludes the already merged > ASLR changes and add > small changes to handle latest changes in the build scripts. Thanks. -- Michael
Attachment
Hi, On 2022-11-03 11:06:46 +0000, Niyas Sait wrote: > I've attached a new version of the patch which excludes the already merged > ASLR changes and add > small changes to handle latest changes in the build scripts. Note that we're planning to remove the custom windows build scripts before the next release, relying on the meson build instead. > diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h > index 8b19ab160f..bf6a6dba35 100644 > --- a/src/include/storage/s_lock.h > +++ b/src/include/storage/s_lock.h > @@ -708,13 +708,21 @@ typedef LONG slock_t; > #define SPIN_DELAY() spin_delay() > > /* If using Visual C++ on Win64, inline assembly is unavailable. > - * Use a _mm_pause intrinsic instead of rep nop. > + * Use _mm_pause (x64) or __isb(arm64) intrinsic instead of rep nop. > */ > #if defined(_WIN64) > static __forceinline void > spin_delay(void) > { > +#ifdef _M_ARM64 > + /* > + * arm64 way of hinting processor for spin loops optimisations > + * ref: https://community.arm.com/support-forums/f/infrastructure-solutions-forum/48654/ssetoneon-faq > + */ > + __isb(_ARM64_BARRIER_SY); > +#else > _mm_pause(); > +#endif > } > #else > static __forceinline void I think we should just apply this, there seems very little risk of making anything worse, given the gating to _WIN64 && _M_ARM64. > diff --git a/src/port/pg_crc32c_armv8.c b/src/port/pg_crc32c_armv8.c > index 9e301f96f6..981718752f 100644 > --- a/src/port/pg_crc32c_armv8.c > +++ b/src/port/pg_crc32c_armv8.c > @@ -14,7 +14,9 @@ > */ > #include "c.h" > > +#ifndef _MSC_VER > #include <arm_acle.h> > +#endif > > #include "port/pg_crc32c.h" This won't suffice with the meson build, since the relevant configure test also uses arm_acle.h: elif host_cpu == 'arm' or host_cpu == 'aarch64' prog = ''' #include <arm_acle.h> int main(void) { unsigned int crc = 0; crc = __crc32cb(crc, 0); crc = __crc32ch(crc, 0); crc = __crc32cw(crc, 0); crc = __crc32cd(crc, 0); /* return computed value, to prevent the above being optimized away */ return crc == 0; } ''' if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc', args: test_c_args) # Use ARM CRC Extension unconditionally cdata.set('USE_ARMV8_CRC32C', 1) have_optimized_crc = true elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc', args: test_c_args + ['-march=armv8-a+crc']) # Use ARM CRC Extension, with runtime check cflags_crc += '-march=armv8-a+crc' cdata.set('USE_ARMV8_CRC32C', false) cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) have_optimized_crc = true endif endif The meson checking logic is used both for msvc and other compilers, so this will need to work with both. > diff --git a/src/tools/msvc/gendef.pl b/src/tools/msvc/gendef.pl > index d6bed1ce15..f1e0ff446b 100644 > --- a/src/tools/msvc/gendef.pl > +++ b/src/tools/msvc/gendef.pl > @@ -120,9 +120,9 @@ sub writedef > { > my $isdata = $def->{$f} eq 'data'; > > - # Strip the leading underscore for win32, but not x64 > + # Strip the leading underscore for win32, but not x64 and arm64 > $f =~ s/^_// > - unless ($arch eq "x86_64"); > + unless ($arch ne "x86"); > > # Emit just the name if it's a function symbol, or emit the name > # decorated with the DATA option for variables. > @@ -143,7 +143,7 @@ sub writedef > sub usage > { > die("Usage: gendef.pl --arch <arch> --deffile <deffile> --tempdir <tempdir> files-or-directories\n" > - . " arch: x86 | x86_64\n" > + . " arch: x86 | x86_64 | arm64 \n" > . " deffile: path of the generated file\n" > . " tempdir: directory for temporary files\n" > . " files or directories: object files or directory containing object files\n" > @@ -160,7 +160,7 @@ GetOptions( > 'tempdir:s' => \$tempdir,) or usage(); > > usage("arch: $arch") > - unless ($arch eq 'x86' || $arch eq 'x86_64'); > + unless ($arch eq 'x86' || $arch eq 'x86_64' || $arch eq 'arm64'); > > my @files; > Seems reasonable. Greetings, Andres Freund
On Sat, Nov 05, 2022 at 11:31:36AM -0700, Andres Freund wrote: > On 2022-11-03 11:06:46 +0000, Niyas Sait wrote: > Note that we're planning to remove the custom windows build scripts before the > next release, relying on the meson build instead. Youpi. >> #if defined(_WIN64) >> static __forceinline void >> spin_delay(void) >> { >> +#ifdef _M_ARM64 >> + /* >> + * arm64 way of hinting processor for spin loops optimisations >> + * ref: https://community.arm.com/support-forums/f/infrastructure-solutions-forum/48654/ssetoneon-faq >> + */ >> + __isb(_ARM64_BARRIER_SY); >> +#else >> _mm_pause(); >> +#endif >> } >> #else >> static __forceinline void > > I think we should just apply this, there seems very little risk of making > anything worse, given the gating to _WIN64 && _M_ARM64. Seems so. Hmm, where does _ARM64_BARRIER_SY come from? Perhaps it would be better to have a comment referring to it from a different place than the forums of arm, like some actual docs? > The meson checking logic is used both for msvc and other compilers, so this > will need to work with both. The patch has been switched as waiting on author for now. -- Michael
Attachment
On Mon, Nov 07, 2022 at 03:58:13PM +0900, Michael Paquier wrote: > The patch has been switched as waiting on author for now. Seeing no replies after three weeks, I have marked the patch as returned with feedback for now. -- Michael
Attachment
On 05/11/2022 18:31, Andres Freund wrote: > On 2022-11-03 11:06:46 +0000, Niyas Sait wrote: >> I've attached a new version of the patch which excludes the already merged >> ASLR changes and add >> small changes to handle latest changes in the build scripts. > Note that we're planning to remove the custom windows build scripts before the > next release, relying on the meson build instead. > Thanks. I will add changes to add meson build support. > This won't suffice with the meson build, since the relevant configure test > also uses arm_acle.h: > elif host_cpu == 'arm' or host_cpu == 'aarch64' > > prog = ''' > #include <arm_acle.h> > > int main(void) > { > unsigned int crc = 0; > crc = __crc32cb(crc, 0); > crc = __crc32ch(crc, 0); > crc = __crc32cw(crc, 0); > crc = __crc32cd(crc, 0); > > /* return computed value, to prevent the above being optimized away */ > return crc == 0; > } > ''' > > if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc', > args: test_c_args) > # Use ARM CRC Extension unconditionally > cdata.set('USE_ARMV8_CRC32C', 1) > have_optimized_crc = true > elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc', > args: test_c_args + ['-march=armv8-a+crc']) > # Use ARM CRC Extension, with runtime check > cflags_crc += '-march=armv8-a+crc' > cdata.set('USE_ARMV8_CRC32C', false) > cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) > have_optimized_crc = true > endif > endif > > The meson checking logic is used both for msvc and other compilers, so this > will need to work with both. Yes, will handle that. -- Niyas
On 07/11/2022 06:58, Michael Paquier wrote: >>> #if defined(_WIN64) >>> static __forceinline void >>> spin_delay(void) >>> { >>> +#ifdef _M_ARM64 >>> + /* >>> + * arm64 way of hinting processor for spin loops optimisations >>> + * ref: https://community.arm.com/support-forums/f/infrastructure-solutions-forum/48654/ssetoneon-faq >>> + */ >>> + __isb(_ARM64_BARRIER_SY); >>> +#else >>> _mm_pause(); >>> +#endif >>> } >>> #else >>> static __forceinline void >> >> I think we should just apply this, there seems very little risk of making >> anything worse, given the gating to _WIN64 && _M_ARM64. > > Seems so. Hmm, where does _ARM64_BARRIER_SY come from? Perhaps it > would be better to have a comment referring to it from a different > place than the forums of arm, like some actual docs? _ARM64_BARRIER_SY is defined in Microsoft Arm64 intrinsic documentation - https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#BarrierRestrictions I couldn't find something more official for the sse2neon library part. -- Niyas
Hello, I've attached a new revision of the patch (v4) and includes following changes, 1. Add support for meson build system 2. Extend MSVC scripts to handle ARM64 platform. 3. Add arm64 definition of spin_delay function. 4. Exclude arm_acle.h import with MSVC compiler. V3->V4: Add support for meson build system V2->V3: Removed ASLR enablement and rebased on master. V1->V2: Rebased on top of master -- Niyas
Attachment
On Thu, Dec 01, 2022 at 05:53:47PM +0000, Niyas Sait wrote: > I've attached a new revision of the patch (v4) and includes following > changes, > > 1. Add support for meson build system > 2. Extend MSVC scripts to handle ARM64 platform. > 3. Add arm64 definition of spin_delay function. > 4. Exclude arm_acle.h import with MSVC compiler. > > V3->V4: Add support for meson build system > V2->V3: Removed ASLR enablement and rebased on master. > V1->V2: Rebased on top of master Thanks for the updated version. I have been looking at it closely and it looks like it should be able to do the job (no arm64 machine for Windows here, sigh). I have one tiny comment about this part: - USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK => undef, + USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK => 1, Shouldn't we only enable this flag when we are under aarch64? Similarly, I don't think that it is a good idea to switch on USE_SSE42_CRC32C_WITH_RUNTIME_CHECK all the time. We should only set it when building under x86 and x86_64. I would also add your link [1] in s_lock.h. [1]: https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics#BarrierRestrictions -- Michael
Attachment
>
>
> On 07/11/2022 06:58, Michael Paquier wrote:
> > Seems so. Hmm, where does _ARM64_BARRIER_SY come from? Perhaps it
> > would be better to have a comment referring to it from a different
> > place than the forums of arm, like some actual docs?
>
>
> _ARM64_BARRIER_SY is defined in Microsoft Arm64 intrinsic documentation
> -
> https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#BarrierRestrictions
In particular, at the bottom of that section:
"For the __isb intrinsic, the only restriction that is currently valid is _ARM64_BARRIER_SY; all other values are reserved by the architecture."
This corresponds to
https://developer.arm.com/documentation/ddi0596/2021-06/Base-Instructions/ISB--Instruction-Synchronization-Barrier-
which says
"SY Full system barrier operation, encoded as CRm = 0b1111. Can be omitted."
> I couldn't find something more official for the sse2neon library part.
Not quite sure what this is referring to, but it seems we can just point to the __aarch64__ section in the same file, which uses the same instruction:
spin_delay(void)
{
__asm__ __volatile__(
" isb; \n");
}
...and which already explains the choice with a comment.
About v4:
+ * Use _mm_pause (x64) or __isb(arm64) intrinsic instead of rep nop.
Need a space here after __isb.
+ if cc.get_id() == 'msvc'
+ cdata.set('USE_ARMV8_CRC32C', false)
+ cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
+ have_optimized_crc = true
+ else
That seems like a heavy-handed way to force it. Could we just use the same gating in the test program that the patch puts in the code of interest? Namely:
+#ifndef _MSC_VER
#include <arm_acle.h>
+#endif
--
John Naylor
EDB: http://www.enterprisedb.com
On 02/12/2022 05:02, Michael Paquier wrote: > Thanks for the updated version. I have been looking at it closely and > it looks like it should be able to do the job (no arm64 machine for > Windows here, sigh). > > I have one tiny comment about this part: > > - USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK => undef, > + USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK => 1, > > Shouldn't we only enable this flag when we are under aarch64? > Similarly, I don't think that it is a good idea to switch on > USE_SSE42_CRC32C_WITH_RUNTIME_CHECK all the time. We should only set > it when building under x86 and x86_64. > Ok, I will fix ARM64 specific one in the next revision. -- Niyas
On 02/12/2022 05:41, John Naylor wrote: >> I couldn't find something more official for the sse2neon library part. > Not quite sure what this is referring to, but it seems we can just point to > the __aarch64__ section in the same file, which uses the same instruction: > > spin_delay(void) > { > __asm__ __volatile__( > " isb; \n"); > } > > ...and which already explains the choice with a comment. Good point. Will add the comment. > + if cc.get_id() == 'msvc' > + cdata.set('USE_ARMV8_CRC32C', false) > + cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) > + have_optimized_crc = true > + else > > That seems like a heavy-handed way to force it. Could we just use the same > gating in the test program that the patch puts in the code of interest? > Namely: > > +#ifndef _MSC_VER > #include <arm_acle.h> > +#endif I took a similar approach as x86 MSVC code. I don't think the test program would work with MSVC. The compiler options are not MSVC friendly. -- Niyas
Hello, I've attached a new revision of the patch (v5) and includes following changes, 1. Add support for meson build system 2. Extend MSVC scripts to handle ARM64 platform. 3. Add arm64 definition of spin_delay function. 4. Exclude arm_acle.h import with MSVC compiler. V4->V5: * Added reference to Microsoft arm64 intrinsic documentation * Conditionally enable USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK * Fixed styling issue spotted by Ian Lawerence Barwick V3->V4: Add support for meson build system V2->V3: Removed ASLR enablement and rebased on master. V1->V2: Rebased on top of master -- Niyas
Attachment
On Fri, Dec 02, 2022 at 11:09:15AM +0000, Niyas Sait wrote: > I've attached a new revision of the patch (v5) and includes following > changes, > > 1. Add support for meson build system > 2. Extend MSVC scripts to handle ARM64 platform. > 3. Add arm64 definition of spin_delay function. > 4. Exclude arm_acle.h import with MSVC compiler. Hmm. There are still a few things that need some attention here: - USE_SSE42_CRC32C_WITH_RUNTIME_CHECK should not be set for aarch64. - This is missing updates for ICU. Looking at the upstream code, Build.Windows.ProjectConfiguration.props uses libARM64 and binARM64 for the output library and binary paths. - This is missing updates for krb5. For this case, I am seeing no traces of packages for aarch64, so I guess that we could just fail hard until someone cares enough to ping us about what to do here. - There were zero changes in the docs, but we need to update at least the section about architectures supported for the 64-bit builds. - Last comes OpenSSL, that supports amd64_arm64 as build target (see NOTES-WINDOWS.md), and the library names are libssl.lib and libcrypto.lib. Looking at https://slproweb.com/products/Win32OpenSSL.html, there are experimental builds for arm64 with OpenSSL 3.0. Niyas or somebody else, could you look at the contents of lib/VC/ and see what we could rely on for the debug builds after installing this MSI? We should rely on something like lib/VC/sslcrypto64MD.lib or lib/VC/sslcrypto32MD.lib, but for arm64. With meson gaining in maturity, perhaps that's not the most urgent thing as we will likely remove src/tools/msvc/ soon but I'd rather do that right anyway as much as I can to avoid an incorrect state in the tree at any time in its history. - USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK => undef, + USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK => $self->{platform} eq "ARM64" ? : 1 : undef, Did you actually test this patch? This won't work at all with perl, per se the double colon after the question mark. For now, please find attached an updated patch with all the fixes I could come up with. -- Michael
Attachment
On 05/12/2022 05:12, Michael Paquier wrote: > - Last comes OpenSSL, that supports amd64_arm64 as build target (see > NOTES-WINDOWS.md), and the library names are libssl.lib and > libcrypto.lib. Looking at > https://slproweb.com/products/Win32OpenSSL.html, there are > experimental builds for arm64 with OpenSSL 3.0. Niyas or somebody > else, could you look at the contents of lib/VC/ and see what we could > rely on for the debug builds after installing this MSI? We should > rely on something like lib/VC/sslcrypto64MD.lib or > lib/VC/sslcrypto32MD.lib, but for arm64. I tried that installer. And I can see following libraries installed in lib/VC location. libcryptoarm64MD.lib libcryptoarm64MDd.lib libcryptoarm64MT.lib libcryptoarm64MTd.lib libsslarm64MD.lib libsslarm64MDd.lib libsslarm64MT.lib libsslarm64MTd.lib > - USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK => undef, > + USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK => $self->{platform} eq "ARM64" ? : 1 : undef, > Did you actually test this patch? This won't work at all with perl, > per se the double colon after the question mark. Yes I did a full build. I am not sure why I didn't see any error with that. My perl skills are very limited and I started with something bit more naive like "$self->{platform} == "ARM64" ? : 1 : undef" But that didn't work and I changed to using eq and the compilation was fine. Thanks for fixing the patch. > For now, please find attached an updated patch with all the fixes I > could come up with. Thanks. I did a quick sanity build with your updated patch and looks fine. -- Niyas
Hi, On 2022-12-05 14:12:41 +0900, Michael Paquier wrote: > With meson gaining in maturity, perhaps that's not the most urgent > thing as we will likely remove src/tools/msvc/ soon but I'd rather do > that right anyway as much as I can to avoid an incorrect state in the > tree at any time in its history. I'd actually argue that we should just not add win32 support to src/tools/msvc/. > --- a/src/include/storage/s_lock.h > +++ b/src/include/storage/s_lock.h > @@ -708,13 +708,21 @@ typedef LONG slock_t; > #define SPIN_DELAY() spin_delay() > > /* If using Visual C++ on Win64, inline assembly is unavailable. > - * Use a _mm_pause intrinsic instead of rep nop. > + * Use _mm_pause (x64) or __isb (arm64) intrinsic instead of rep nop. > */ > #if defined(_WIN64) > static __forceinline void > spin_delay(void) > { > +#ifdef _M_ARM64 > + /* > + * See spin_delay aarch64 inline assembly definition above for details > + * ref: https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics#BarrierRestrictions > + */ > + __isb(_ARM64_BARRIER_SY); > +#else > _mm_pause(); > +#endif > } > #else > static __forceinline void This looks somewhat wrong to me. We end up with some ifdefs on the function defintion level, and some others inside the function body. I think it should be either or. > diff --git a/meson.build b/meson.build > index 725e10d815..e354ad7650 100644 > --- a/meson.build > +++ b/meson.build > @@ -1944,7 +1944,13 @@ int main(void) > > elif host_cpu == 'arm' or host_cpu == 'aarch64' > > - prog = ''' > + if cc.get_id() == 'msvc' > + cdata.set('USE_ARMV8_CRC32C', false) > + cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) > + have_optimized_crc = true > + else > + > + prog = ''' > #include <arm_acle.h> > > int main(void) Why does this need to be hardcoded? The compiler probe should just work for msvc. > @@ -1960,18 +1966,19 @@ int main(void) > } > ''' > > - if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc', > - args: test_c_args) > - # Use ARM CRC Extension unconditionally > - cdata.set('USE_ARMV8_CRC32C', 1) > - have_optimized_crc = true > - elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc', > - args: test_c_args + ['-march=armv8-a+crc']) > - # Use ARM CRC Extension, with runtime check > - cflags_crc += '-march=armv8-a+crc' > - cdata.set('USE_ARMV8_CRC32C', false) > - cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) > - have_optimized_crc = true > + if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc', > + args: test_c_args) > + # Use ARM CRC Extension unconditionally > + cdata.set('USE_ARMV8_CRC32C', 1) > + have_optimized_crc = true > + elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc', > + args: test_c_args + ['-march=armv8-a+crc']) > + # Use ARM CRC Extension, with runtime check > + cflags_crc += '-march=armv8-a+crc' > + cdata.set('USE_ARMV8_CRC32C', false) > + cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) > + have_optimized_crc = true > + endif > endif > endif And then this reindent wouldn't be needed. Greetings, Andres Freund
On Mon, Dec 05, 2022 at 10:14:49AM -0800, Andres Freund wrote: > On 2022-12-05 14:12:41 +0900, Michael Paquier wrote: >> With meson gaining in maturity, perhaps that's not the most urgent >> thing as we will likely remove src/tools/msvc/ soon but I'd rather do >> that right anyway as much as I can to avoid an incorrect state in the >> tree at any time in its history. > > I'd actually argue that we should just not add win32 support to > src/tools/msvc/. Are you arguing about ripping out the existing win32 or do nothing for arm64? I guess the later, but these words also sound like the former to me. -- Michael
Attachment
Hi, On 2022-12-06 08:31:16 +0900, Michael Paquier wrote: > On Mon, Dec 05, 2022 at 10:14:49AM -0800, Andres Freund wrote: > > On 2022-12-05 14:12:41 +0900, Michael Paquier wrote: > >> With meson gaining in maturity, perhaps that's not the most urgent > >> thing as we will likely remove src/tools/msvc/ soon but I'd rather do > >> that right anyway as much as I can to avoid an incorrect state in the > >> tree at any time in its history. > > > > I'd actually argue that we should just not add win32 support to > > src/tools/msvc/. > > Are you arguing about ripping out the existing win32 or do nothing for > arm64? I guess the later, but these words also sound like the former > to me. Yes, that's a typo. I meant that we shouldn't add arm64 support to src/tools/msvc/. I do think we should rip out src/tools/msvc/ soon-ish, but we need buildfarm support first. Greetings, Andres Freund
On Mon, Dec 05, 2022 at 03:48:45PM -0800, Andres Freund wrote: > Yes, that's a typo. I meant that we shouldn't add arm64 support to > src/tools/msvc/. Okay, thanks. > I do think we should rip out src/tools/msvc/ soon-ish, but we need > buildfarm support first. Sure, let's see where it goes. -- Michael
Attachment
Hi, On 05/12/2022 18:14, Andres Freund wrote: >> With meson gaining in maturity, perhaps that's not the most urgent >> thing as we will likely remove src/tools/msvc/ soon but I'd rather do >> that right anyway as much as I can to avoid an incorrect state in the >> tree at any time in its history. > > I'd actually argue that we should just not add win32 support to > src/tools/msvc/. > > I think the old build system specific part is really minimal in the patch. I can strip out those if that's preferred. >> --- a/src/include/storage/s_lock.h >> +++ b/src/include/storage/s_lock.h >> @@ -708,13 +708,21 @@ typedef LONG slock_t; >> #define SPIN_DELAY() spin_delay() >> >> /* If using Visual C++ on Win64, inline assembly is unavailable. >> - * Use a _mm_pause intrinsic instead of rep nop. >> + * Use _mm_pause (x64) or __isb (arm64) intrinsic instead of rep nop. >> */ >> #if defined(_WIN64) >> static __forceinline void >> spin_delay(void) >> { >> +#ifdef _M_ARM64 >> + /* >> + * See spin_delay aarch64 inline assembly definition above for details >> + * ref: https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics#BarrierRestrictions >> + */ >> + __isb(_ARM64_BARRIER_SY); >> +#else >> _mm_pause(); >> +#endif >> } >> #else >> static __forceinline void > > This looks somewhat wrong to me. We end up with some ifdefs on the function > defintion level, and some others inside the function body. I think it should > be either or. > Ok, I can add an MSVC/ARM64 specific function. >> diff --git a/meson.build b/meson.build >> index 725e10d815..e354ad7650 100644 >> --- a/meson.build >> +++ b/meson.build >> @@ -1944,7 +1944,13 @@ int main(void) >> >> elif host_cpu == 'arm' or host_cpu == 'aarch64' >> >> - prog = ''' >> + if cc.get_id() == 'msvc' >> + cdata.set('USE_ARMV8_CRC32C', false) >> + cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) >> + have_optimized_crc = true >> + else >> + >> + prog = ''' >> #include <arm_acle.h> >> >> int main(void) > Why does this need to be hardcoded? The compiler probe should just work for > msvc. > There are couple of minor issues in the code probe with MSVC such as arm_acle.h needs to be removed and requires an explicit import of intrin.h. But even with those fixes, USE_ARMV8_CRC32C would be set and no runtime CRC extension check will be performed. Since CRC extension is optional in ARMv8, It would be better to use the CRC variant with runtime check. So I end up following the x64 variant and hardcoded the flags in case of ARM64 and MSVC. -- Niyas
On Mon, Dec 12, 2022 at 01:38:37PM +0000, Niyas Sait wrote: > On 05/12/2022 18:14, Andres Freund wrote: > I think the old build system specific part is really minimal in the patch. I > can strip out those if that's preferred. Removing all the changes from src/tools/msvc/ is an approach that works for me. >> Why does this need to be hardcoded? The compiler probe should just work for >> msvc. > > There are couple of minor issues in the code probe with MSVC such as > arm_acle.h needs to be removed and requires an explicit import of intrin.h. > But even with those fixes, USE_ARMV8_CRC32C would be set and no runtime CRC > extension check will be performed. Since CRC extension is optional in ARMv8, > It would be better to use the CRC variant with runtime check. So I end up > following the x64 variant and hardcoded the flags in case of ARM64 and MSVC. Hm.. Andres, do you have access to Windows hosts with ARM64 underneath to validate any of that? No way to blame Cirrus for not having this option, they already do a lot :) -- Michael
Attachment
On 12/12/2022 23:56, Michael Paquier wrote: > On Mon, Dec 12, 2022 at 01:38:37PM +0000, Niyas Sait wrote: >> On 05/12/2022 18:14, Andres Freund wrote: >> I think the old build system specific part is really minimal in the patch. I >> can strip out those if that's preferred. > Removing all the changes from src/tools/msvc/ is an approach that > works for me. > I've attached a new version (v7) that removes changes to support old MSVC build system. -- Niyas
Attachment
On 16/12/2022 10:52, Niyas Sait wrote: > On 12/12/2022 23:56, Michael Paquier wrote: >> On Mon, Dec 12, 2022 at 01:38:37PM +0000, Niyas Sait wrote: >>> On 05/12/2022 18:14, Andres Freund wrote: >>> I think the old build system specific part is really minimal in the >>> patch. I >>> can strip out those if that's preferred. >> Removing all the changes from src/tools/msvc/ is an approach that >> works for me. >> > > I've attached a new version (v7) that removes changes to support old > MSVC build system. Hello, Can I get review for the latest patch please ? -- Niyas
Hi, On 2022-12-16 10:52:23 +0000, Niyas Sait wrote: > Subject: [PATCH v7] Enable postgres native build for windows-arm64 platform > elif host_cpu == 'arm' or host_cpu == 'aarch64' > > - prog = ''' > + if cc.get_id() == 'msvc' > + cdata.set('USE_ARMV8_CRC32C', false) > + cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) > + have_optimized_crc = true I dimly recall that windows might actually require the relevant extension on arm? > + else > + prog = ''' > #include <arm_acle.h> I'd just make this include #ifdef _MSV_VER (or whatever it is). > int main(void) > @@ -1960,18 +1966,19 @@ int main(void) > } > ''' > > - if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc', > - args: test_c_args) > - # Use ARM CRC Extension unconditionally > - cdata.set('USE_ARMV8_CRC32C', 1) > - have_optimized_crc = true > - elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc', > - args: test_c_args + ['-march=armv8-a+crc']) > - # Use ARM CRC Extension, with runtime check > - cflags_crc += '-march=armv8-a+crc' > - cdata.set('USE_ARMV8_CRC32C', false) > - cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) > - have_optimized_crc = true > + if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc', > + args: test_c_args) Seems like it'd be easier to read if you don't re-indent this, but just have the cc.get_id() == 'msvc' part of this if/else-if. Greetings, Andres Freund
On 17/01/2023 22:51, Andres Freund wrote: > Hi, > > On 2022-12-16 10:52:23 +0000, Niyas Sait wrote: >> Subject: [PATCH v7] Enable postgres native build for windows-arm64 platform > >> elif host_cpu == 'arm' or host_cpu == 'aarch64' >> >> - prog = ''' >> + if cc.get_id() == 'msvc' >> + cdata.set('USE_ARMV8_CRC32C', false) >> + cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) >> + have_optimized_crc = true > > I dimly recall that windows might actually require the relevant extension on > arm? Do you mean we don't need the runtime checks for CRC ? CRC is an optional extension for ARMv8 base architecture. I am not sure if windows make it mandatory to have this implementation. > >> + else >> + prog = ''' >> #include <arm_acle.h> > > I'd just make this include #ifdef _MSV_VER (or whatever it is). The code snippet is not used for MSVC part. I am not sure why we need to add the #ifdef _MSC_VER. >> int main(void) >> @@ -1960,18 +1966,19 @@ int main(void) >> } >> ''' >> >> - if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc', >> - args: test_c_args) >> - # Use ARM CRC Extension unconditionally >> - cdata.set('USE_ARMV8_CRC32C', 1) >> - have_optimized_crc = true >> - elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc', >> - args: test_c_args + ['-march=armv8-a+crc']) >> - # Use ARM CRC Extension, with runtime check >> - cflags_crc += '-march=armv8-a+crc' >> - cdata.set('USE_ARMV8_CRC32C', false) >> - cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) >> - have_optimized_crc = true >> + if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc', >> + args: test_c_args) > > Seems like it'd be easier to read if you don't re-indent this, but just have > the cc.get_id() == 'msvc' part of this if/else-if. > Yes that looks better. will do it in next patch. Thanks Andres for the review. -- Niyas
On 19/01/2023 10:09, Niyas Sait wrote: > > > On 17/01/2023 22:51, Andres Freund wrote: >>> int main(void) >>> @@ -1960,18 +1966,19 @@ int main(void) >>> } >>> ''' >>> - if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and >>> __crc32cd without -march=armv8-a+crc', >>> - args: test_c_args) >>> - # Use ARM CRC Extension unconditionally >>> - cdata.set('USE_ARMV8_CRC32C', 1) >>> - have_optimized_crc = true >>> - elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and >>> __crc32cd with -march=armv8-a+crc', >>> - args: test_c_args + ['-march=armv8-a+crc']) >>> - # Use ARM CRC Extension, with runtime check >>> - cflags_crc += '-march=armv8-a+crc' >>> - cdata.set('USE_ARMV8_CRC32C', false) >>> - cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) >>> - have_optimized_crc = true >>> + if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and >>> __crc32cd without -march=armv8-a+crc', >>> + args: test_c_args) >> >> Seems like it'd be easier to read if you don't re-indent this, but >> just have >> the cc.get_id() == 'msvc' part of this if/else-if. >> > I've attached a new version (v8) to fix the above indentation issue. Could someone please help with the review ? -- Niyas
Attachment
On Tue, May 02, 2023 at 08:51:09AM +0100, Niyas Sait wrote: > I've attached a new version (v8) to fix the above indentation issue. > > Could someone please help with the review ? Seeing how simple this has become, I would be really tempted to get that applied, especially if there's a buildfarm machine able to follow up.. On the one hand, we are in a stability period for v16, so it may not be the best moment ever. On the other hand, waiting one more year sounds like a waste for a patch that just adds new code paths. RMT members, any thoughts? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > Seeing how simple this has become, I would be really tempted to get > that applied, especially if there's a buildfarm machine able to follow > up.. On the one hand, we are in a stability period for v16, so it may > not be the best moment ever. On the other hand, waiting one more year > sounds like a waste for a patch that just adds new code paths. Indeed, there's not much in this patch ... but it's impossible to see how "run on an entirely new platform" isn't a new feature. Moreover, it's a platform that very few of us will have any ability to support or debug for. I can't see how it's a good idea to shove this in post-feature-freeze. Let's plan to push it in when v17 opens. regards, tom lane
On Sat, May 06, 2023 at 12:35:40AM -0400, Tom Lane wrote: > Indeed, there's not much in this patch ... but it's impossible to see > how "run on an entirely new platform" isn't a new feature. Moreover, > it's a platform that very few of us will have any ability to support > or debug for. I can't see how it's a good idea to shove this in > post-feature-freeze. Let's plan to push it in when v17 opens. Fine by me. I have added an entry in the CF app to remember about this stuff as there was nothing present yet: https://commitfest.postgresql.org/43/4309/ -- Michael
Attachment
Hi, On 2023-05-06 00:35:40 -0400, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: > > Seeing how simple this has become, I would be really tempted to get > > that applied, especially if there's a buildfarm machine able to follow > > up.. On the one hand, we are in a stability period for v16, so it may > > not be the best moment ever. On the other hand, waiting one more year > > sounds like a waste for a patch that just adds new code paths. > > Indeed, there's not much in this patch ... but it's impossible to see > how "run on an entirely new platform" isn't a new feature. Moreover, > it's a platform that very few of us will have any ability to support > or debug for. I can't see how it's a good idea to shove this in > post-feature-freeze. Let's plan to push it in when v17 opens. I don't really have feelings either way - but haven't we gone further and even backpatched things like spinlock support for new arches in the past? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > I don't really have feelings either way - but haven't we gone further and even > backpatched things like spinlock support for new arches in the past? Mmmm ... don't really think those cases were comparable. We weren't adding support for a whole new OS. Now, you might argue that Windows on arm64 will be just like Windows on x86_64, but I think the jury is still out on that. Microsoft was so Intel-only for so many years that I bet they've had to change quite a bit to make it go on ARM. Also, the cases of back-patched spinlock support that I can find in the last few years were pretty low-risk. I'll grant that c32fcac56 was a bit blue-sky because hardly anybody had RISC-V at that point, but by the same token anybody relying on it at the time would be dealing with a beta-grade OS too. On the other hand, 1c72d82c2 was immediately testable in the buildfarm, and f3bd00c01 was importing code already verified by our OpenBSD packagers. As I said upthread, this seems like something to put in at the beginning of a dev cycle, not post-feature-freeze. regards, tom lane
Andres Freund <andres@anarazel.de> writes:I don't really have feelings either way - but haven't we gone further and even backpatched things like spinlock support for new arches in the past?Mmmm ... don't really think those cases were comparable. We weren't adding support for a whole new OS. Now, you might argue that Windows on arm64 will be just like Windows on x86_64, but I think the jury is still out on that. Microsoft was so Intel-only for so many years that I bet they've had to change quite a bit to make it go on ARM. Also, the cases of back-patched spinlock support that I can find in the last few years were pretty low-risk. I'll grant that c32fcac56 was a bit blue-sky because hardly anybody had RISC-V at that point, but by the same token anybody relying on it at the time would be dealing with a beta-grade OS too. On the other hand, 1c72d82c2 was immediately testable in the buildfarm, and f3bd00c01 was importing code already verified by our OpenBSD packagers. As I said upthread, this seems like something to put in at the beginning of a dev cycle, not post-feature-freeze.
We will definitely want buildfarm support. I don't have such a machine and am not likely to have one any time soon. I do run drongo and fairywren on an EC2 instance, but AWS doesn't seem to have support for Windows on ARM. Maybe it's available on Azure (maybe then some of our colleagues working at Microsoft could arrange such a beast for me to set up potential buildfarm animal, or else run one themselves.)
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On Wed, May 10, 2023 at 09:24:39AM -0400, Andrew Dunstan wrote: > We will definitely want buildfarm support. I don't have such a machine and > am not likely to have one any time soon. I do run drongo and fairywren on an > EC2 instance, but AWS doesn't seem to have support for Windows on ARM. Maybe > it's available on Azure (maybe then some of our colleagues working at > Microsoft could arrange such a beast for me to set up potential buildfarm > animal, or else run one themselves.) Unfortunately AWS does not offer ARM on Windows with an EC2, that's something I have also looked at setting up :/ Anyway, Niyat has mentioned me that he was working on that, but it seems that it was off-list. Niyat, could you confirm this part? -- Michael
Attachment
On Thu, May 11, 2023 at 11:34 AM Michael Paquier <michael@paquier.xyz> wrote: > On Wed, May 10, 2023 at 09:24:39AM -0400, Andrew Dunstan wrote: > > We will definitely want buildfarm support. I don't have such a machine and > > am not likely to have one any time soon. I do run drongo and fairywren on an > > EC2 instance, but AWS doesn't seem to have support for Windows on ARM. Maybe > > it's available on Azure (maybe then some of our colleagues working at > > Microsoft could arrange such a beast for me to set up potential buildfarm > > animal, or else run one themselves.) > > Unfortunately AWS does not offer ARM on Windows with an EC2, that's > something I have also looked at setting up :/ Azure does have an image "Microsoft Windows 11 Preview arm64" to run on Ampere CPUs, which says it's a pre-release version intended for validation, which sounds approximately like what we want. I will try to find out more.
On Thu, May 11, 2023 at 12:17:25PM +1200, Thomas Munro wrote: > Azure does have an image "Microsoft Windows 11 Preview arm64" to run > on Ampere CPUs, which says it's a pre-release version intended for > validation, which sounds approximately like what we want. I will try > to find out more. Interesting. Thanks for mentioning it. -- Michael
Attachment
On 11/05/2023 00:34, Michael Paquier wrote: > On Wed, May 10, 2023 at 09:24:39AM -0400, Andrew Dunstan wrote: >> We will definitely want buildfarm support. I don't have such a machine and >> am not likely to have one any time soon. I do run drongo and fairywren on an >> EC2 instance, but AWS doesn't seem to have support for Windows on ARM. Maybe >> it's available on Azure (maybe then some of our colleagues working at >> Microsoft could arrange such a beast for me to set up potential buildfarm >> animal, or else run one themselves.) > Unfortunately AWS does not offer ARM on Windows with an EC2, that's > something I have also looked at setting up :/ > > Anyway, Niyat has mentioned me that he was working on that, but it > seems that it was off-list. Niyat, could you confirm this part? > -- > Michael If we have Azure VM credits, we could use that to deploy buildfarm machines. Otherwise, I can try to allocate a machine from Linaro Windows Lab for the buildbot. -- Niyas
Hello, Just wanted to give a heads up that I am moving to a new role outside Linaro and as a result wouldn't be able to continue contributing. Anthony Roberts from Linaro is going to support the enablement work. -- Niyas
On Thu, May 11, 2023 at 01:19:54PM +0900, Michael Paquier wrote: > On Thu, May 11, 2023 at 12:17:25PM +1200, Thomas Munro wrote: >> Azure does have an image "Microsoft Windows 11 Preview arm64" to run >> on Ampere CPUs, which says it's a pre-release version intended for >> validation, which sounds approximately like what we want. I will try >> to find out more. > > Interesting. Thanks for mentioning it. Now that v17 is open, I was looking at v8 posted at [1] and I don't have much more to add about it. The problem about the lack of buildfarm machine is still annoying, but I don't see a reason not to move forward and let people play with this stuff on HEAD. At least that would be progress. Any thoughts? Thomas, what's the state of ARM support for Windows on Azure? Is that still in preview? [1]: https://www.postgresql.org/message-id/dbee741f-b9b7-a0d5-1b1b-f9b532bb6f56%40linaro.org -- Michael
Attachment
Hi All, Just following up on this, has there been any movement? I did see another message go into the thread here with no reply: https://www.postgresql.org/message-id/ZKPLjLjIjwN2lxkg%40paquier.xyz Thanks, Anthony On 14/06/2023 13:25, Niyas Sait wrote: > Hello, > > Just wanted to give a heads up that I am moving to a new role outside > Linaro and as a result wouldn't be able to continue contributing. > > Anthony Roberts from Linaro is going to support the enablement work. >
On Thu, Aug 17, 2023 at 09:41:44AM +0100, Anthony Roberts wrote: > Just following up on this, has there been any movement? > > I did see another message go into the thread here with no reply: > https://www.postgresql.org/message-id/ZKPLjLjIjwN2lxkg%40paquier.xyz I don't have an environment to test the patch, but I don't object to it per se. However, I don't really want to move forward completely blindly as well in the long-term. As mentioned to Niyas, could it be possible to provide to the community a buildfarm machine that would be able to test this environment? I am not sure what's your status on that. Perhaps this is already set up and you are just waiting for the patch to be merged? -- Michael
Attachment
Hi, Which of these are you looking for us to provide: * A machine for you to directly access (via a VPN) * A machine we just run a worker script on that automatically picks up the builds as required * Us to do downstream CI All are possible, but preferably not option 1, as it would mean straight up pulling out a machine from our build farm, and it has to go through all sorts of approvals internally. If it's the only way forward I can kick it up the chain though. Option 2 and 3 are ones we do for various other projects (ie. 2 - CMake, 3 - OpenSSL) Thanks, Anthony On 17/08/2023 23:28, Michael Paquier wrote: > On Thu, Aug 17, 2023 at 09:41:44AM +0100, Anthony Roberts wrote: >> Just following up on this, has there been any movement? >> >> I did see another message go into the thread here with no reply: >> https://www.postgresql.org/message-id/ZKPLjLjIjwN2lxkg%40paquier.xyz > I don't have an environment to test the patch, but I don't object to > it per se. However, I don't really want to move forward completely > blindly as well in the long-term. > > As mentioned to Niyas, could it be possible to provide to the > community a buildfarm machine that would be able to test this > environment? I am not sure what's your status on that. Perhaps this > is already set up and you are just waiting for the patch to be merged? > -- > Michael
On Fri, Aug 25, 2023 at 10:40:39AM +0100, Anthony Roberts wrote: > Which of these are you looking for us to provide: > > * A machine for you to directly access (via a VPN) > * A machine we just run a worker script on that automatically picks up the > builds as required > * Us to do downstream CI > > All are possible, but preferably not option 1, as it would mean straight up > pulling out a machine from our build farm, and it has to go through all > sorts of approvals internally. If it's the only way forward I can kick it up > the chain though. > > Option 2 and 3 are ones we do for various other projects (ie. 2 - CMake, 3 - > OpenSSL) The community has its own CI facility. Here is a link of how to set up a machine: https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto And all the results are published by the maintainers of the machines on a periodic basis where the buildfarm client code is set: https://buildfarm.postgresql.org/cgi-bin/show_status.pl The final result would be for you to maintain the machine so as we are able to see if anything breaks with it. Adding extra dependencies to the build like OpenSSL would be nice, but based on the contents of the patch to add this port that does not seem mandatory to me, either. Once the machine is ready, you will need to request a buildfarm machine name and a key to be able to begin publishing the reports of the builds to the community buildfarm. Once the machine is ready to go, just let me know and I'd be OK to merge the patch (I still want to do a final review of it in case I've missed something, but I can move on with that before the buildfarm machine is set up). -- Michael
Attachment
Thanks for the link - that looks basically the same as what we do for CMake (option 2), so should be relatively easy. I will have a chat to relevant people about setting a machine up properly for it. Thanks, Anthony On 25/08/2023 11:17, Michael Paquier wrote: > On Fri, Aug 25, 2023 at 10:40:39AM +0100, Anthony Roberts wrote: >> Which of these are you looking for us to provide: >> >> * A machine for you to directly access (via a VPN) >> * A machine we just run a worker script on that automatically picks up the >> builds as required >> * Us to do downstream CI >> >> All are possible, but preferably not option 1, as it would mean straight up >> pulling out a machine from our build farm, and it has to go through all >> sorts of approvals internally. If it's the only way forward I can kick it up >> the chain though. >> >> Option 2 and 3 are ones we do for various other projects (ie. 2 - CMake, 3 - >> OpenSSL) > The community has its own CI facility. Here is a link of how to set > up a machine: > https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto > > And all the results are published by the maintainers of the machines > on a periodic basis where the buildfarm client code is set: > https://buildfarm.postgresql.org/cgi-bin/show_status.pl > > The final result would be for you to maintain the machine so as we are > able to see if anything breaks with it. Adding extra dependencies to > the build like OpenSSL would be nice, but based on the contents of the > patch to add this port that does not seem mandatory to me, either. > > Once the machine is ready, you will need to request a buildfarm > machine name and a key to be able to begin publishing the reports of > the builds to the community buildfarm. Once the machine is ready to > go, just let me know and I'd be OK to merge the patch (I still want to > do a final review of it in case I've missed something, but I can move > on with that before the buildfarm machine is set up). > -- > Michael
> On 25 Aug 2023, at 11:40, Anthony Roberts <anthony.roberts@linaro.org> wrote: > * A machine for you to directly access (via a VPN) That could be very helpful to provide to developers who are trying to debug very wicked bugs which requires access for analysis. Such things are on a case by case basis of course at your discretion, but it would most likely be very appreciated when/if the need exists. > * A machine we just run a worker script on that automatically picks up the builds as required This sounds like what we have for the buildfarm, where machines independently builds the branches from the Git repo by using the buildfarm client as worker. > * Us to do downstream CI This sounds like the CFBot, where we (currently) use Cirrus with our own workers for building and testing the patches which are being worked on. http://cfbot.cputube.org/ Both the two latter options are very helpful for the community, and serve two distincly different purposes. Option 2 is required for a platform to be considered supported whereas option 3 assists developers in being proactive rather than just reactive. -- Daniel Gustafsson
Thanks for the link - that looks basically the same as what we do for
CMake (option 2), so should be relatively easy.
I will have a chat to relevant people about setting a machine up
properly for it.
Thanks,
Anthony
On 25/08/2023 11:17, Michael Paquier wrote:
> On Fri, Aug 25, 2023 at 10:40:39AM +0100, Anthony Roberts wrote:
>> Which of these are you looking for us to provide:
>>
>> * A machine for you to directly access (via a VPN)
>> * A machine we just run a worker script on that automatically picks up the
>> builds as required
>> * Us to do downstream CI
>>
>> All are possible, but preferably not option 1, as it would mean straight up
>> pulling out a machine from our build farm, and it has to go through all
>> sorts of approvals internally. If it's the only way forward I can kick it up
>> the chain though.
>>
>> Option 2 and 3 are ones we do for various other projects (ie. 2 - CMake, 3 -
>> OpenSSL)
> The community has its own CI facility. Here is a link of how to set
> up a machine:
> https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto
>
> And all the results are published by the maintainers of the machines
> on a periodic basis where the buildfarm client code is set:
> https://buildfarm.postgresql.org/cgi-bin/show_status.pl
>
> The final result would be for you to maintain the machine so as we are
> able to see if anything breaks with it. Adding extra dependencies to
> the build like OpenSSL would be nice, but based on the contents of the
> patch to add this port that does not seem mandatory to me, either.
>
> Once the machine is ready, you will need to request a buildfarm
> machine name and a key to be able to begin publishing the reports of
> the builds to the community buildfarm. Once the machine is ready to
> go, just let me know and I'd be OK to merge the patch (I still want to
> do a final review of it in case I've missed something, but I can move
> on with that before the buildfarm machine is set up).
> --
> Michael
On Wed, Aug 30, 2023 at 04:12:16PM +0100, Anthony Roberts wrote: > Just a follow-up: having spoken to the relevant people, sadly, right now, > we do not have the capacity to be pulling machines out of our day-to-day CI > tasks to dedicate to specific projects. Okay, thanks for letting us know. > I can, however, offer a few alternatives: > > 1. If you have capacity on your linux x64 machines, it is possible to > emulate WoA, and run tests that way: > https://www.linaro.org/blog/emulate-windows-on-arm/ > This is something that works well for projects that we have contributed it > to (ie, sse2neon uses it via github actions) - if you run into any issues > with this, we are able to try and fix them on our side, as likely more than > postgres would benefit. > > 2. If there is any scope for it on your side at all, a Samsung Galaxy Book > Go 14" can be had from ebay for ~£140. > Understandably, this might not be an option for you. > > If it helps, if this is merged, we do have the possibility of running our > own downstream CI builds nightly for the tip of your main branch here: > https://gitlab.com/Linaro/windowsonarm/nightly > This is something we do for a few projects that do not have full upstream > CI. Honestly, I am not sure how to proceed here. Having something in the buildfarm is necessary IMO, because this is the central place that community members look at when it comes to monitoring the stability of a patch committed. Any opinions from the others? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Wed, Aug 30, 2023 at 04:12:16PM +0100, Anthony Roberts wrote: >> Just a follow-up: having spoken to the relevant people, sadly, right now, >> we do not have the capacity to be pulling machines out of our day-to-day CI >> tasks to dedicate to specific projects. > Okay, thanks for letting us know. Too bad. > Honestly, I am not sure how to proceed here. Having something in the > buildfarm is necessary IMO, because this is the central place that > community members look at when it comes to monitoring the stability of > a patch committed. > Any opinions from the others? I agree. I'm really uncomfortable with claiming support for Windows-on-ARM if we don't have a buildfarm member testing it. For other platforms that have a track record of multiple hardware support, it might not be a stretch ... but Windows was so resolutely Intel-only for so long that "it works on ARM" is a proposition that I won't trust without hard evidence. There are too many bits of that system that might not have gotten the word yet, or at least not gotten sufficient testing. My vote for this is we don't commit without a buildfarm member. (As a comparison point, I'd still be unsure about our support for macOS-on-ARM if we didn't have buildfarm support for that. That exists, and is being paid for out of my own pocket. I do not have much sympathy for a corporation that claims it can't afford to support one measly buildfarm animal.) regards, tom lane
On Thu, Aug 31, 2023 at 4:22 PM Michael Paquier <michael@paquier.xyz> wrote: > Honestly, I am not sure how to proceed here. Having something in the > buildfarm is necessary IMO, because this is the central place that > community members look at when it comes to monitoring the stability of > a patch committed. It's tricky, because usually the org that wants the thing to work supplies the thing, and I personally think that provides a meaningful "demand" signal too. For example, we have external Solaris representation, but no users of AIX represented at this point, and I consider that to be quite interesting information. Plenty of people want PG to work well on their OS or architecture. But, supposing Azure credits were suddenly available (just between you, me and the mailing list, I heard that this should be imminent as some kind of grant to PG US, as a result of the Cirrus thing), do we have a Windows-savvy volunteer to feed and water an animal? I feel like this case is inevitable anyway because people in our own community are probably going to be using Windows ARM laptops sooner or later...
Thomas Munro <thomas.munro@gmail.com> writes: > But, supposing Azure credits were suddenly available (just between > you, me and the mailing list, I heard that this should be imminent as > some kind of grant to PG US, as a result of the Cirrus thing), do we > have a Windows-savvy volunteer to feed and water an animal? I feel > like this case is inevitable anyway because people in our own > community are probably going to be using Windows ARM laptops sooner or > later... That's a good point. The hardware resources to support a buildfarm animal are really pretty trivial these days. What is important is an animal owner who is willing to help chase down problems when they arise. Maybe we just need to wait for some community members to acquire those laptops. regards, tom lane
On Thu, Aug 31, 2023 at 01:06:53AM -0400, Tom Lane wrote: > That's a good point. The hardware resources to support a buildfarm > animal are really pretty trivial these days. What is important is > an animal owner who is willing to help chase down problems when they > arise. Unfortunately I don't see myself running that at home. It is too hot and humid in Summer so the machine would not last. Something in the cloud would be OK for me, but AWS has no option for a Windows host with ARM yet. -- Michael
Attachment
On 02.05.23 09:51, Niyas Sait wrote: > diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml > index cbc70a039c..2ecd5fcf38 100644 > --- a/doc/src/sgml/install-windows.sgml > +++ b/doc/src/sgml/install-windows.sgml > @@ -352,7 +352,8 @@ $ENV{MSBFLAGS}="/m"; > <title>Special Considerations for 64-Bit Windows</title> > > <para> > - PostgreSQL will only build for the x64 architecture on 64-bit Windows. > + PostgreSQL will only build for the x64 and ARM64 architectures on 64-bit > + Windows. > </para> Are there other possible architectures besides x64 and ARM64? Itanium? Or should we just delete this sentence? > diff --git a/meson.build b/meson.build > index 096044628c..6cca212fae 100644 > --- a/meson.build > +++ b/meson.build > @@ -2045,8 +2045,11 @@ int main(void) > elif host_cpu == 'arm' or host_cpu == 'aarch64' > > prog = ''' > +#ifdef _MSC_VER > +#include <intrin.h> > +#else > #include <arm_acle.h> > - > +#endif Maybe keep the whitespace here. > - if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc', > + if cc.get_id() == 'msvc' > + cdata.set('USE_ARMV8_CRC32C', false) > + cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) > + have_optimized_crc = true > + elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc', Add a comment here. > diff --git a/src/port/pg_crc32c_armv8.c b/src/port/pg_crc32c_armv8.c > index d8fae510cf..3d7eb748ff 100644 > --- a/src/port/pg_crc32c_armv8.c > +++ b/src/port/pg_crc32c_armv8.c > @@ -14,7 +14,9 @@ > */ > #include "c.h" > > +#ifndef _MSC_VER > #include <arm_acle.h> > +#endif Also add a comment here. > > #include "port/pg_crc32c.h" > > diff --git a/src/tools/msvc/gendef.pl b/src/tools/msvc/gendef.pl > index e7cbefcbc3..934dc17b96 100644 > --- a/src/tools/msvc/gendef.pl > +++ b/src/tools/msvc/gendef.pl > @@ -120,9 +120,9 @@ sub writedef > { > my $isdata = $def->{$f} eq 'data'; > > - # Strip the leading underscore for win32, but not x64 > + # Strip the leading underscore for win32, but not x64 and aarch64 Is x64 the opposite of win32? Does this make sense? Should we reverse the logic here and single out the one variant where the stripping is necessary?
On 31.08.23 06:44, Tom Lane wrote: > I agree. I'm really uncomfortable with claiming support for > Windows-on-ARM if we don't have a buildfarm member testing it. > For other platforms that have a track record of multiple > hardware support, it might not be a stretch ... but Windows was > so resolutely Intel-only for so long that "it works on ARM" is > a proposition that I won't trust without hard evidence. There > are too many bits of that system that might not have gotten the > word yet, or at least not gotten sufficient testing. > > My vote for this is we don't commit without a buildfarm member. I think we can have a multi-tiered approach, where we can commit support but consider it experimental until we have buildfarm coverage.
Hi, On 2023-08-31 14:33:56 +0900, Michael Paquier wrote: > On Thu, Aug 31, 2023 at 01:06:53AM -0400, Tom Lane wrote: > > That's a good point. The hardware resources to support a buildfarm > > animal are really pretty trivial these days. What is important is > > an animal owner who is willing to help chase down problems when they > > arise. > > Unfortunately I don't see myself running that at home. It is too hot > and humid in Summer so the machine would not last. Something in the > cloud would be OK for me, but AWS has no option for a Windows host > with ARM yet. FWIW, we could run ARM windows as part of CI. It'd be a bit of scripting work, as google doesn't yet have readymade VM images with windows for ARM. Right now the CI VM images for windows are based on top of the google images. Scripting generation of ARM images would be some one-off work, but after that... Greetings, Andres Freund
> On 13 Sep 2023, at 21:12, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 31.08.23 06:44, Tom Lane wrote: >> I agree. I'm really uncomfortable with claiming support for >> Windows-on-ARM if we don't have a buildfarm member testing it. >> For other platforms that have a track record of multiple >> hardware support, it might not be a stretch ... but Windows was >> so resolutely Intel-only for so long that "it works on ARM" is >> a proposition that I won't trust without hard evidence. There >> are too many bits of that system that might not have gotten the >> word yet, or at least not gotten sufficient testing. >> My vote for this is we don't commit without a buildfarm member. > > I think we can have a multi-tiered approach, where we can commit support but consider it experimental until we have buildfarmcoverage. If it's experimental it should probably be behind an opt-in flag in autoconf/meson, or be reverted by the time REL_17_STABLE branches unless coverage has materialized by then. -- Daniel Gustafsson
On 14.09.23 11:39, Daniel Gustafsson wrote: >> On 13 Sep 2023, at 21:12, Peter Eisentraut <peter@eisentraut.org> wrote: >> >> On 31.08.23 06:44, Tom Lane wrote: >>> I agree. I'm really uncomfortable with claiming support for >>> Windows-on-ARM if we don't have a buildfarm member testing it. >>> For other platforms that have a track record of multiple >>> hardware support, it might not be a stretch ... but Windows was >>> so resolutely Intel-only for so long that "it works on ARM" is >>> a proposition that I won't trust without hard evidence. There >>> are too many bits of that system that might not have gotten the >>> word yet, or at least not gotten sufficient testing. >>> My vote for this is we don't commit without a buildfarm member. >> >> I think we can have a multi-tiered approach, where we can commit support but consider it experimental until we have buildfarmcoverage. > > If it's experimental it should probably be behind an opt-in flag in > autoconf/meson, or be reverted by the time REL_17_STABLE branches unless > coverage has materialized by then. The author's email is bouncing now, due to job change, so it's unlikely we will see any progress on this anymore. I am setting it to returned with feedback.
Hi, This was covered earlier in the thread - I have taken this on in Niyas' stead. Was there an explicit request for something there? I was under the impression that this was all just suggestion/theory at the moment. Thanks, Anthony On 19/09/2023 09:33, Peter Eisentraut wrote: > On 14.09.23 11:39, Daniel Gustafsson wrote: >>> On 13 Sep 2023, at 21:12, Peter Eisentraut <peter@eisentraut.org> >>> wrote: >>> >>> On 31.08.23 06:44, Tom Lane wrote: >>>> I agree. I'm really uncomfortable with claiming support for >>>> Windows-on-ARM if we don't have a buildfarm member testing it. >>>> For other platforms that have a track record of multiple >>>> hardware support, it might not be a stretch ... but Windows was >>>> so resolutely Intel-only for so long that "it works on ARM" is >>>> a proposition that I won't trust without hard evidence. There >>>> are too many bits of that system that might not have gotten the >>>> word yet, or at least not gotten sufficient testing. >>>> My vote for this is we don't commit without a buildfarm member. >>> >>> I think we can have a multi-tiered approach, where we can commit >>> support but consider it experimental until we have buildfarm coverage. >> >> If it's experimental it should probably be behind an opt-in flag in >> autoconf/meson, or be reverted by the time REL_17_STABLE branches unless >> coverage has materialized by then. > > The author's email is bouncing now, due to job change, so it's > unlikely we will see any progress on this anymore. I am setting it to > returned with feedback. >
On Tue, Sep 19, 2023 at 01:35:17PM +0100, Anthony Roberts wrote: > Was there an explicit request for something there? I was under the > impression that this was all just suggestion/theory at the moment. Yes. The addition of a buildfarm member registered into the community facilities, so as it is possible to provide back to developers dynamic feedback to the new configuration setup you would like to see supported in PostgreSQL. This has been mentioned a few times on this thread, around these places: https://www.postgresql.org/message-id/20220322103011.i6z2tuj4hle23wgx@jrouhaud https://www.postgresql.org/message-id/DBD80715-E56B-40EB-84AA-ACE70198E7AD@yesql.se https://www.postgresql.org/message-id/6D1E2719-FA49-42A5-A6FF-0B0072BF631D@yesql.se -- Michael
Attachment
On Tue, Sep 19, 2023 at 01:35:17PM +0100, Anthony Roberts wrote:
> Was there an explicit request for something there? I was under the
> impression that this was all just suggestion/theory at the moment.
Yes. The addition of a buildfarm member registered into the community
facilities, so as it is possible to provide back to developers dynamic
feedback to the new configuration setup you would like to see
supported in PostgreSQL. This has been mentioned a few times on this
thread, around these places:
https://www.postgresql.org/message-id/20220322103011.i6z2tuj4hle23wgx@jrouhaud
https://www.postgresql.org/message-id/DBD80715-E56B-40EB-84AA-ACE70198E7AD@yesql.se
https://www.postgresql.org/message-id/6D1E2719-FA49-42A5-A6FF-0B0072BF631D@yesql.se
--
Michael
Attachment
On Tue, 19 Sept 2023 at 23:48, Michael Paquier <michael@paquier.xyz> wrote:On Tue, Sep 19, 2023 at 01:35:17PM +0100, Anthony Roberts wrote:
> Was there an explicit request for something there? I was under the
> impression that this was all just suggestion/theory at the moment.
Yes. The addition of a buildfarm member registered into the community
facilities, so as it is possible to provide back to developers dynamic
feedback to the new configuration setup you would like to see
supported in PostgreSQL. This has been mentioned a few times on this
thread, around these places:
https://www.postgresql.org/message-id/20220322103011.i6z2tuj4hle23wgx@jrouhaud
https://www.postgresql.org/message-id/DBD80715-E56B-40EB-84AA-ACE70198E7AD@yesql.se
https://www.postgresql.org/message-id/6D1E2719-FA49-42A5-A6FF-0B0072BF631D@yesql.se
--
MichaelThe attached patch works with v17. I will work on getting a buildfarm animal up shortly
On Tue, Jan 23, 2024 at 04:13:05PM -0500, Dave Cramer wrote: > On Tue, 23 Jan 2024 at 08:46, Dave Cramer <davecramer@postgres.rocks> wrote: >> The attached patch works with v17. I will work on getting a buildfarm >> animal up shortly Thanks for doing that! > I can build using meson manually, but for some reason building with the > buildfarm fails. > > I'm not sure which files to attach to this to help. Andrew, what files do > you need ? Probably build.ninja and the contents of meson-logs/ would offer enough information? -- Michael
Attachment
On Tue, Jan 23, 2024 at 04:13:05PM -0500, Dave Cramer wrote:
> On Tue, 23 Jan 2024 at 08:46, Dave Cramer <davecramer@postgres.rocks> wrote:
>> The attached patch works with v17. I will work on getting a buildfarm
>> animal up shortly
Thanks for doing that!
> I can build using meson manually, but for some reason building with the
> buildfarm fails.
>
> I'm not sure which files to attach to this to help. Andrew, what files do
> you need ?
Probably build.ninja and the contents of meson-logs/ would offer
enough information?
--
Michael
On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote: > I managed to get it to build the vcvarsall arch needs to be x64. I need to > add some options, but the patch above needs to be applied to build it. Nice. If I may ask, what kind of host and/or configuration have you used to reach a state where the code can be compiled and run tests with meson? If you have found specific steps, it may be a good thing to document that on the wiki, say around [1]. Perhaps you have not included TAP? It may be fine in terms of runtime checks and coverage. [1]: https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto#Running_on_Windows -- Michael
Attachment
On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote:
> I managed to get it to build the vcvarsall arch needs to be x64. I need to
> add some options, but the patch above needs to be applied to build it.
Nice. If I may ask, what kind of host and/or configuration have you
used to reach a state where the code can be compiled and run tests
with meson? If you have found specific steps, it may be a good thing
to document that on the wiki, say around [1].
Perhaps you have not included TAP? It may be fine in terms of runtime
checks and coverage.
On Wed, 24 Jan 2024 at 19:03, Michael Paquier <michael@paquier.xyz> wrote:On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote:
> I managed to get it to build the vcvarsall arch needs to be x64. I need to
> add some options, but the patch above needs to be applied to build it.
Nice. If I may ask, what kind of host and/or configuration have you
used to reach a state where the code can be compiled and run tests
with meson? If you have found specific steps, it may be a good thing
to document that on the wiki, say around [1].I am running a mac-mini M2 with parallels running windows pro 11The only thing required is this patch. Running in a native developer promptmeson setup build --prefix=c:\postgrescd buildninjaninja installninja testall run without errorsI also have buildfarm running and will run that todayDave
Perhaps you have not included TAP? It may be fine in terms of runtime
checks and coverage.Does TAP have to be explicitly added to the meson build ?Dave
On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote:I managed to get it to build the vcvarsall arch needs to be x64. I need to add some options, but the patch above needs to be applied to build it.Nice. If I may ask, what kind of host and/or configuration have you used to reach a state where the code can be compiled and run tests with meson? If you have found specific steps, it may be a good thing to document that on the wiki, say around [1]. Perhaps you have not included TAP? It may be fine in terms of runtime checks and coverage. [1]: https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto#Running_on_Windows
I now have an ARM64 machine, so I set up a W11 ARM64 VM. I think we really want to build with x64_arm64, i.e. to generate native arm64 binaries. Setting just x64 will not do that, AIUI.
I tried that with the buidfarm, setting that in the config file's call to PGBuild::VSenv::getenv().
That upset msvc_gendef.pl, so I added this there to keep it happy:
$arch = 'x86_64' if $arch eq 'aarch64';
After that things went ok until I got this:
[1453/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
Creating library src\backend\postgres.exe.lib
storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol _mm_pause referenced in function perform_spin_delay
src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals
I haven't made further progress, but I will return to it in the next day or so.
While this will be nice to have, I think it won't really matter until there is ARM64 support in released versions of Windows Server. AFAICT they still only sell versions for x86_64
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2024-01-24 We 19:02, Michael Paquier wrote:On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote:I managed to get it to build the vcvarsall arch needs to be x64. I need to add some options, but the patch above needs to be applied to build it.Nice. If I may ask, what kind of host and/or configuration have you used to reach a state where the code can be compiled and run tests with meson? If you have found specific steps, it may be a good thing to document that on the wiki, say around [1]. Perhaps you have not included TAP? It may be fine in terms of runtime checks and coverage. [1]: https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto#Running_on_Windows
I now have an ARM64 machine, so I set up a W11 ARM64 VM. I think we really want to build with x64_arm64, i.e. to generate native arm64 binaries. Setting just x64 will not do that, AIUI.
I tried that with the buidfarm, setting that in the config file's call to PGBuild::VSenv::getenv().
That upset msvc_gendef.pl, so I added this there to keep it happy:
$arch = 'x86_64' if $arch eq 'aarch64';
After that things went ok until I got this:
[1453/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
Creating library src\backend\postgres.exe.lib
storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol _mm_pause referenced in function perform_spin_delay
src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals
I haven't made further progress, but I will return to it in the next day or so.
While this will be nice to have, I think it won't really matter until there is ARM64 support in released versions of Windows Server. AFAICT they still only sell versions for x86_64
I tried running my buildfarm using my git repo and my branch, but get the following errorStatus Line: 492 bad branch parameterContent:bad branch parameter fix_armWeb txn failed with status: 1
You can't use your own branch with the buildfarm, you need to let it set up its own branches.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2024-01-25 Th 08:45, Dave Cramer wrote:I tried running my buildfarm using my git repo and my branch, but get the following errorStatus Line: 492 bad branch parameterContent:bad branch parameter fix_armWeb txn failed with status: 1
You can't use your own branch with the buildfarm, you need to let it set up its own branches.
On 2024-01-24 We 19:02, Michael Paquier wrote:On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote:I managed to get it to build the vcvarsall arch needs to be x64. I need to add some options, but the patch above needs to be applied to build it.Nice. If I may ask, what kind of host and/or configuration have you used to reach a state where the code can be compiled and run tests with meson? If you have found specific steps, it may be a good thing to document that on the wiki, say around [1]. Perhaps you have not included TAP? It may be fine in terms of runtime checks and coverage. [1]: https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto#Running_on_Windows
I now have an ARM64 machine, so I set up a W11 ARM64 VM. I think we really want to build with x64_arm64, i.e. to generate native arm64 binaries. Setting just x64 will not do that, AIUI.
I tried that with the buidfarm, setting that in the config file's call to PGBuild::VSenv::getenv().
That upset msvc_gendef.pl, so I added this there to keep it happy:
$arch = 'x86_64' if $arch eq 'aarch64';
After that things went ok until I got this:
[1453/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
Creating library src\backend\postgres.exe.lib
storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol _mm_pause referenced in function perform_spin_delay
src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals
I haven't made further progress, but I will return to it in the next day or so.
While this will be nice to have, I think it won't really matter until there is ARM64 support in released versions of Windows Server. AFAICT they still only sell versions for x86_64
On Thu, 25 Jan 2024 at 12:30, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-24 We 19:02, Michael Paquier wrote:On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote:I managed to get it to build the vcvarsall arch needs to be x64. I need to add some options, but the patch above needs to be applied to build it.Nice. If I may ask, what kind of host and/or configuration have you used to reach a state where the code can be compiled and run tests with meson? If you have found specific steps, it may be a good thing to document that on the wiki, say around [1]. Perhaps you have not included TAP? It may be fine in terms of runtime checks and coverage. [1]: https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto#Running_on_Windows
I now have an ARM64 machine, so I set up a W11 ARM64 VM. I think we really want to build with x64_arm64, i.e. to generate native arm64 binaries. Setting just x64 will not do that, AIUI.
I tried that with the buidfarm, setting that in the config file's call to PGBuild::VSenv::getenv().
That upset msvc_gendef.pl, so I added this there to keep it happy:
$arch = 'x86_64' if $arch eq 'aarch64';
After that things went ok until I got this:
[1453/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
Creating library src\backend\postgres.exe.lib
storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol _mm_pause referenced in function perform_spin_delay
src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals
I haven't made further progress, but I will return to it in the next day or so.
While this will be nice to have, I think it won't really matter until there is ARM64 support in released versions of Windows Server. AFAICT they still only sell versions for x86_64
I've tried it with my patch attached previously and x64_arm64 and it works fine. builds using the buildfarm as well.Is there a definitive way to figure out if the binaries are x64_arm64Dave
On Thu, 25 Jan 2024 at 14:31, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-25 Th 08:45, Dave Cramer wrote:I tried running my buildfarm using my git repo and my branch, but get the following errorStatus Line: 492 bad branch parameterContent:bad branch parameter fix_armWeb txn failed with status: 1
You can't use your own branch with the buildfarm, you need to let it set up its own branches.
So I guess I have to wait until this patch is merged in ?
Not quite.
There's a switch that lets you test your own code. If you say --from-source /path/to/repo it will run in whatever state the repo is in. It won't care what branch you are on, and it won't try to update the repo. It won't report the results to the server, but it will build and test everything just like in a regular run. (On the "eat your own dog food" principle I use this mode a lot.) If you use that mode you probably also want to specify --verbose as well.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi David,Unix "file" or "dumpbin /headers" in vcvarsall are your best bets.Thanks,Anthony
On Thu, 25 Jan 2024 at 16:04, Anthony Roberts <anthony.roberts@linaro.org> wrote:Hi David,Unix "file" or "dumpbin /headers" in vcvarsall are your best bets.Thanks,AnthonySo there is another way, select the file in Windows Explorer and right click, in the compatibility tab if the "Windows on ARM" is greyed out it is an arm binary.So far mine are not :(
Yeah, I think the default Developer Command Prompt for VS2022 is set up for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2024-01-25 Th 16:17, Dave Cramer wrote:On Thu, 25 Jan 2024 at 16:04, Anthony Roberts <anthony.roberts@linaro.org> wrote:Hi David,Unix "file" or "dumpbin /headers" in vcvarsall are your best bets.Thanks,AnthonySo there is another way, select the file in Windows Explorer and right click, in the compatibility tab if the "Windows on ARM" is greyed out it is an arm binary.So far mine are not :(
Yeah, I think the default Developer Command Prompt for VS2022 is set up for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".
On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote: > On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan <andrew@dunslane.net> wrote: >> On 2024-01-25 Th 16:17, Dave Cramer wrote: >> Yeah, I think the default Developer Command Prompt for VS2022 is set up >> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64". > > Yup, now I'm in the same state you are Wait a minute here. Based on [1], x64_arm64 means you can use a x64 host and you'll be able to produce ARM64 builds, still these will not be able to run on the host where they were built. How much of the patch posted upthread is required to produce such builds? Basically everything from it, I guess, so as build dependencies can be satisfied? [1]: https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170 -- Michael
Attachment
On 2024-01-25 Th 20:32, Michael Paquier wrote: > On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote: >> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan <andrew@dunslane.net> wrote: >>> On 2024-01-25 Th 16:17, Dave Cramer wrote: >>> Yeah, I think the default Developer Command Prompt for VS2022 is set up >>> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64". >> Yup, now I'm in the same state you are > Wait a minute here. Based on [1], x64_arm64 means you can use a x64 > host and you'll be able to produce ARM64 builds, still these will not > be able to run on the host where they were built. How much of the > patch posted upthread is required to produce such builds? Basically > everything from it, I guess, so as build dependencies can be > satisfied? > > [1]: https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170 If you look at the table here x86 and x64 are the only supported host architectures. But that's OK, the x64 binaries will run on arm64 (W11 ARM64 has x64 emulation builtin). If that didn't work Dave and I would not have got as far as we have. But you want the x64_arm64 argument to vcvarsall so you will get ARM64 output. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2024-01-25 Th 20:32, Michael Paquier wrote:
> On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote:
>> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>>> Yeah, I think the default Developer Command Prompt for VS2022 is set up
>>> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".
>> Yup, now I'm in the same state you are
> Wait a minute here. Based on [1], x64_arm64 means you can use a x64
> host and you'll be able to produce ARM64 builds, still these will not
> be able to run on the host where they were built. How much of the
> patch posted upthread is required to produce such builds? Basically
> everything from it, I guess, so as build dependencies can be
> satisfied?
>
> [1]: https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170
If you look at the table here x86 and x64 are the only supported host
architectures. But that's OK, the x64 binaries will run on arm64 (W11
ARM64 has x64 emulation builtin). If that didn't work Dave and I would
not have got as far as we have. But you want the x64_arm64 argument to
vcvarsall so you will get ARM64 output.
Attachment
On Fri, 26 Jan 2024 at 07:36, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-25 Th 20:32, Michael Paquier wrote:
> On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote:
>> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>>> Yeah, I think the default Developer Command Prompt for VS2022 is set up
>>> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".
>> Yup, now I'm in the same state you are
> Wait a minute here. Based on [1], x64_arm64 means you can use a x64
> host and you'll be able to produce ARM64 builds, still these will not
> be able to run on the host where they were built. How much of the
> patch posted upthread is required to produce such builds? Basically
> everything from it, I guess, so as build dependencies can be
> satisfied?
>
> [1]: https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170
If you look at the table here x86 and x64 are the only supported host
architectures. But that's OK, the x64 binaries will run on arm64 (W11
ARM64 has x64 emulation builtin). If that didn't work Dave and I would
not have got as far as we have. But you want the x64_arm64 argument to
vcvarsall so you will get ARM64 output.I've rebuilt it using x64_arm64 and with the attached (very naive patch) and I still get an x64 binary :(
I am definitely getting ARM64 binaries (e.g. for initdb.exe the Windows on ARM compatibility setting is greyed out)
I'll try your patch.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On Fri, 26 Jan 2024 at 07:36, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-25 Th 20:32, Michael Paquier wrote:
> On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote:
>> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>>> Yeah, I think the default Developer Command Prompt for VS2022 is set up
>>> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".
>> Yup, now I'm in the same state you are
> Wait a minute here. Based on [1], x64_arm64 means you can use a x64
> host and you'll be able to produce ARM64 builds, still these will not
> be able to run on the host where they were built. How much of the
> patch posted upthread is required to produce such builds? Basically
> everything from it, I guess, so as build dependencies can be
> satisfied?
>
> [1]: https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170
If you look at the table here x86 and x64 are the only supported host
architectures. But that's OK, the x64 binaries will run on arm64 (W11
ARM64 has x64 emulation builtin). If that didn't work Dave and I would
not have got as far as we have. But you want the x64_arm64 argument to
vcvarsall so you will get ARM64 output.I've rebuilt it using x64_arm64 and with the attached (very naive patch) and I still get an x64 binary :(
With this patch I still get a build error, but it's different :-)
[1406/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
Creating library src\backend\postgres.exe.lib
storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol spin_delay referenced in function perform_spin_delay
src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2024-01-26 Fr 09:18, Dave Cramer wrote:On Fri, 26 Jan 2024 at 07:36, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-25 Th 20:32, Michael Paquier wrote:
> On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote:
>> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>>> Yeah, I think the default Developer Command Prompt for VS2022 is set up
>>> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".
>> Yup, now I'm in the same state you are
> Wait a minute here. Based on [1], x64_arm64 means you can use a x64
> host and you'll be able to produce ARM64 builds, still these will not
> be able to run on the host where they were built. How much of the
> patch posted upthread is required to produce such builds? Basically
> everything from it, I guess, so as build dependencies can be
> satisfied?
>
> [1]: https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170
If you look at the table here x86 and x64 are the only supported host
architectures. But that's OK, the x64 binaries will run on arm64 (W11
ARM64 has x64 emulation builtin). If that didn't work Dave and I would
not have got as far as we have. But you want the x64_arm64 argument to
vcvarsall so you will get ARM64 output.I've rebuilt it using x64_arm64 and with the attached (very naive patch) and I still get an x64 binary :(
With this patch I still get a build error, but it's different :-)
[1406/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
Creating library src\backend\postgres.exe.lib
storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol spin_delay referenced in function perform_spin_delay
src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals
On Mon, 29 Jan 2024 at 11:16, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-26 Fr 09:18, Dave Cramer wrote:On Fri, 26 Jan 2024 at 07:36, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-25 Th 20:32, Michael Paquier wrote:
> On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote:
>> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>>> Yeah, I think the default Developer Command Prompt for VS2022 is set up
>>> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".
>> Yup, now I'm in the same state you are
> Wait a minute here. Based on [1], x64_arm64 means you can use a x64
> host and you'll be able to produce ARM64 builds, still these will not
> be able to run on the host where they were built. How much of the
> patch posted upthread is required to produce such builds? Basically
> everything from it, I guess, so as build dependencies can be
> satisfied?
>
> [1]: https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170
If you look at the table here x86 and x64 are the only supported host
architectures. But that's OK, the x64 binaries will run on arm64 (W11
ARM64 has x64 emulation builtin). If that didn't work Dave and I would
not have got as far as we have. But you want the x64_arm64 argument to
vcvarsall so you will get ARM64 output.I've rebuilt it using x64_arm64 and with the attached (very naive patch) and I still get an x64 binary :(
With this patch I still get a build error, but it's different :-)
[1406/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
Creating library src\backend\postgres.exe.lib
storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol spin_delay referenced in function perform_spin_delay
src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externalsDid you add the latest lock.patch ?
I'm a bit confused about exactly what needs to be applied. Can you supply a complete patch to be applied to a pristine checkout that will let me build?
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2024-01-29 Mo 11:20, Dave Cramer wrote:On Mon, 29 Jan 2024 at 11:16, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-26 Fr 09:18, Dave Cramer wrote:On Fri, 26 Jan 2024 at 07:36, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-25 Th 20:32, Michael Paquier wrote:
> On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote:
>> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>>> Yeah, I think the default Developer Command Prompt for VS2022 is set up
>>> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".
>> Yup, now I'm in the same state you are
> Wait a minute here. Based on [1], x64_arm64 means you can use a x64
> host and you'll be able to produce ARM64 builds, still these will not
> be able to run on the host where they were built. How much of the
> patch posted upthread is required to produce such builds? Basically
> everything from it, I guess, so as build dependencies can be
> satisfied?
>
> [1]: https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170
If you look at the table here x86 and x64 are the only supported host
architectures. But that's OK, the x64 binaries will run on arm64 (W11
ARM64 has x64 emulation builtin). If that didn't work Dave and I would
not have got as far as we have. But you want the x64_arm64 argument to
vcvarsall so you will get ARM64 output.I've rebuilt it using x64_arm64 and with the attached (very naive patch) and I still get an x64 binary :(
With this patch I still get a build error, but it's different :-)
[1406/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
Creating library src\backend\postgres.exe.lib
storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol spin_delay referenced in function perform_spin_delay
src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externalsDid you add the latest lock.patch ?
I'm a bit confused about exactly what needs to be applied. Can you supply a complete patch to be applied to a pristine checkout that will let me build?
cheers
Attachment
On Tue, 30 Jan 2024 at 08:38, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-29 Mo 11:20, Dave Cramer wrote:On Mon, 29 Jan 2024 at 11:16, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-26 Fr 09:18, Dave Cramer wrote:On Fri, 26 Jan 2024 at 07:36, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-25 Th 20:32, Michael Paquier wrote:
> On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote:
>> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>>> Yeah, I think the default Developer Command Prompt for VS2022 is set up
>>> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".
>> Yup, now I'm in the same state you are
> Wait a minute here. Based on [1], x64_arm64 means you can use a x64
> host and you'll be able to produce ARM64 builds, still these will not
> be able to run on the host where they were built. How much of the
> patch posted upthread is required to produce such builds? Basically
> everything from it, I guess, so as build dependencies can be
> satisfied?
>
> [1]: https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170
If you look at the table here x86 and x64 are the only supported host
architectures. But that's OK, the x64 binaries will run on arm64 (W11
ARM64 has x64 emulation builtin). If that didn't work Dave and I would
not have got as far as we have. But you want the x64_arm64 argument to
vcvarsall so you will get ARM64 output.I've rebuilt it using x64_arm64 and with the attached (very naive patch) and I still get an x64 binary :(
With this patch I still get a build error, but it's different :-)
[1406/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
Creating library src\backend\postgres.exe.lib
storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol spin_delay referenced in function perform_spin_delay
src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externalsDid you add the latest lock.patch ?
I'm a bit confused about exactly what needs to be applied. Can you supply a complete patch to be applied to a pristine checkout that will let me build?
cheers
See attached.
No, that is what is giving me the error shown above (just tried again to be certain). And it's not surprising, as patch 2 #ifdef's out the definition of spin_delay().
If you can get a complete build with these patches then I suspect you're not doing a proper ARM64 build.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2024-01-30 Tu 09:50, Dave Cramer wrote:On Tue, 30 Jan 2024 at 08:38, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-29 Mo 11:20, Dave Cramer wrote:On Mon, 29 Jan 2024 at 11:16, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-26 Fr 09:18, Dave Cramer wrote:On Fri, 26 Jan 2024 at 07:36, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-25 Th 20:32, Michael Paquier wrote:
> On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote:
>> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>>> Yeah, I think the default Developer Command Prompt for VS2022 is set up
>>> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".
>> Yup, now I'm in the same state you are
> Wait a minute here. Based on [1], x64_arm64 means you can use a x64
> host and you'll be able to produce ARM64 builds, still these will not
> be able to run on the host where they were built. How much of the
> patch posted upthread is required to produce such builds? Basically
> everything from it, I guess, so as build dependencies can be
> satisfied?
>
> [1]: https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170
If you look at the table here x86 and x64 are the only supported host
architectures. But that's OK, the x64 binaries will run on arm64 (W11
ARM64 has x64 emulation builtin). If that didn't work Dave and I would
not have got as far as we have. But you want the x64_arm64 argument to
vcvarsall so you will get ARM64 output.I've rebuilt it using x64_arm64 and with the attached (very naive patch) and I still get an x64 binary :(
With this patch I still get a build error, but it's different :-)
[1406/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
Creating library src\backend\postgres.exe.lib
storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol spin_delay referenced in function perform_spin_delay
src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externalsDid you add the latest lock.patch ?
I'm a bit confused about exactly what needs to be applied. Can you supply a complete patch to be applied to a pristine checkout that will let me build?
cheers
See attached.
No, that is what is giving me the error shown above (just tried again to be certain). And it's not surprising, as patch 2 #ifdef's out the definition of spin_delay().
If you can get a complete build with these patches then I suspect you're not doing a proper ARM64 build.
On Tue, Jan 30, 2024 at 4:56 PM Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-30 Tu 09:50, Dave Cramer wrote:On Tue, 30 Jan 2024 at 08:38, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-29 Mo 11:20, Dave Cramer wrote:On Mon, 29 Jan 2024 at 11:16, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-26 Fr 09:18, Dave Cramer wrote:On Fri, 26 Jan 2024 at 07:36, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-25 Th 20:32, Michael Paquier wrote:
> On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote:
>> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>>> Yeah, I think the default Developer Command Prompt for VS2022 is set up
>>> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".
>> Yup, now I'm in the same state you are
> Wait a minute here. Based on [1], x64_arm64 means you can use a x64
> host and you'll be able to produce ARM64 builds, still these will not
> be able to run on the host where they were built. How much of the
> patch posted upthread is required to produce such builds? Basically
> everything from it, I guess, so as build dependencies can be
> satisfied?
>
> [1]: https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170
If you look at the table here x86 and x64 are the only supported host
architectures. But that's OK, the x64 binaries will run on arm64 (W11
ARM64 has x64 emulation builtin). If that didn't work Dave and I would
not have got as far as we have. But you want the x64_arm64 argument to
vcvarsall so you will get ARM64 output.I've rebuilt it using x64_arm64 and with the attached (very naive patch) and I still get an x64 binary :(
With this patch I still get a build error, but it's different :-)
[1406/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
Creating library src\backend\postgres.exe.lib
storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol spin_delay referenced in function perform_spin_delay
src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externalsDid you add the latest lock.patch ?
I'm a bit confused about exactly what needs to be applied. Can you supply a complete patch to be applied to a pristine checkout that will let me build?
cheers
See attached.
No, that is what is giving me the error shown above (just tried again to be certain). And it's not surprising, as patch 2 #ifdef's out the definition of spin_delay().
If you can get a complete build with these patches then I suspect you're not doing a proper ARM64 build.
Okay I will look when I get home in a week
I made some progress. The attached is mostly taken from <https://postgr.es/m/dbee741f-b9b7-a0d5-1b1b-f9b532bb6f56@linaro.org>
With it applied I was able to get a successful build using the buildfarm client. However, there are access violations when running some tests, so there is still some work to do, apparently.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Attachment
On 31.01.24 16:20, Andrew Dunstan wrote: > - PostgreSQL will only build for the x64 architecture on 64-bit Windows. > + PostgreSQL will only build for the x64 and ARM64 architecture on > 64-bit Windows. Are there any other 64-bit architectures for Windows? Possibly, the original sentence was meant to communicate that ARM was not supported, in which case it could now be removed?
On 2024-01-31 We 10:34, Peter Eisentraut wrote: > On 31.01.24 16:20, Andrew Dunstan wrote: >> - PostgreSQL will only build for the x64 architecture on 64-bit >> Windows. + PostgreSQL will only build for the x64 and ARM64 >> architecture on 64-bit Windows. > > Are there any other 64-bit architectures for Windows? > > Possibly, the original sentence was meant to communicate that ARM was > not supported, in which case it could now be removed? x86? That is in fact the default setting for VS even on ARM64. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2024-01-30 Tu 17:54, Dave Cramer wrote:On Tue, Jan 30, 2024 at 4:56 PM Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-30 Tu 09:50, Dave Cramer wrote:On Tue, 30 Jan 2024 at 08:38, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-29 Mo 11:20, Dave Cramer wrote:On Mon, 29 Jan 2024 at 11:16, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-26 Fr 09:18, Dave Cramer wrote:On Fri, 26 Jan 2024 at 07:36, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-01-25 Th 20:32, Michael Paquier wrote:
> On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote:
>> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>>> Yeah, I think the default Developer Command Prompt for VS2022 is set up
>>> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".
>> Yup, now I'm in the same state you are
> Wait a minute here. Based on [1], x64_arm64 means you can use a x64
> host and you'll be able to produce ARM64 builds, still these will not
> be able to run on the host where they were built. How much of the
> patch posted upthread is required to produce such builds? Basically
> everything from it, I guess, so as build dependencies can be
> satisfied?
>
> [1]: https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170
If you look at the table here x86 and x64 are the only supported host
architectures. But that's OK, the x64 binaries will run on arm64 (W11
ARM64 has x64 emulation builtin). If that didn't work Dave and I would
not have got as far as we have. But you want the x64_arm64 argument to
vcvarsall so you will get ARM64 output.I've rebuilt it using x64_arm64 and with the attached (very naive patch) and I still get an x64 binary :(
With this patch I still get a build error, but it's different :-)
[1406/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
Creating library src\backend\postgres.exe.lib
storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol spin_delay referenced in function perform_spin_delay
src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externalsDid you add the latest lock.patch ?
I'm a bit confused about exactly what needs to be applied. Can you supply a complete patch to be applied to a pristine checkout that will let me build?
cheers
See attached.
No, that is what is giving me the error shown above (just tried again to be certain). And it's not surprising, as patch 2 #ifdef's out the definition of spin_delay().
If you can get a complete build with these patches then I suspect you're not doing a proper ARM64 build.
Okay I will look when I get home in a week
I made some progress. The attached is mostly taken from <https://postgr.es/m/dbee741f-b9b7-a0d5-1b1b-f9b532bb6f56@linaro.org>
With it applied I was able to get a successful build using the buildfarm client. However, there are access violations when running some tests, so there is still some work to do, apparently.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On Tue, Feb 06, 2024 at 07:01:49AM -0500, Dave Cramer wrote: > Thanks, this patch works and > testing with meson passes. Only with the version posted at [1]? Interesting, that's the same contents as v8 posted upthread, minus src/tools/ because we don't need to care about them anymore. Andrew, what's happening on the test side? It does not seem you've mentioned any details about what is going wrong, or I have just missed them. > I'll try the buildfarm next. [1]: https://www.postgresql.org/message-id/ea42654a-3dc4-98b0-335b-56b7ec5e5687@dunslane.net -- Michael
Attachment
On Tue, Feb 06, 2024 at 07:01:49AM -0500, Dave Cramer wrote:
> Thanks, this patch works and
> testing with meson passes.
Only with the version posted at [1]? Interesting, that's the same
contents as v8 posted upthread, minus src/tools/ because we don't need
to care about them anymore.
Andrew, what's happening on the test side? It does not seem you've
mentioned any details about what is going wrong, or I have just missed
them.
> I'll try the buildfarm next.
[1]: https://www.postgresql.org/message-id/ea42654a-3dc4-98b0-335b-56b7ec5e5687@dunslane.net
On Fri, 9 Feb 2024 at 00:26, Michael Paquier <michael@paquier.xyz> wrote:On Tue, Feb 06, 2024 at 07:01:49AM -0500, Dave Cramer wrote:
> Thanks, this patch works and
> testing with meson passes.
Only with the version posted at [1]? Interesting, that's the same
contents as v8 posted upthread, minus src/tools/ because we don't need
to care about them anymore.
Andrew, what's happening on the test side? It does not seem you've
mentioned any details about what is going wrong, or I have just missed
them.
> I'll try the buildfarm next.
[1]: https://www.postgresql.org/message-id/ea42654a-3dc4-98b0-335b-56b7ec5e5687@dunslane.netinterestingly meson test does not produce any errorThe buildfarm produces the following error for me:-SELECT relname, attname, coltypes, get_columns_length(coltypes)- FROM check_columns- WHERE get_columns_length(coltypes) % 8 != 0 OR- 'name'::regtype::oid = ANY(coltypes);- relname | attname | coltypes | get_columns_length----------+---------+----------+---------------------(0 rows)-+server closed the connection unexpectedly+ This probably means the server terminated abnormally+ before or while processing the request.+connection to server was lost
Dave
Hi, On 2024-02-09 14:23:46 -0500, Dave Cramer wrote: > > interestingly meson test does not produce any error > > The buildfarm produces the following error for me: > > > > -SELECT relname, attname, coltypes, get_columns_length(coltypes) > > - FROM check_columns > > - WHERE get_columns_length(coltypes) % 8 != 0 OR > > - 'name'::regtype::oid = ANY(coltypes); > > - relname | attname | coltypes | get_columns_length > > ----------+---------+----------+-------------------- > > -(0 rows) > > - > > +server closed the connection unexpectedly > > + This probably means the server terminated abnormally > > + before or while processing the request. > > +connection to server was lost > > > > Actually digging some more, here is the actual error > > 2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: server process (PID > 11204) was terminated by exception 0xC0000005 > 2024-02-09 13:31:11.008 -05 postmaster[10672] DETAIL: Failed process was > running: VACUUM; > 2024-02-09 13:31:11.008 -05 postmaster[10672] HINT: See C include file > "ntstatus.h" for a description of the hexadecimal value. That's something like a segfault. One suspicion I have is that src/port/pg_crc32c_armv8_choose.c possibly doesn't properly support msvc. It seems to assume that SIGILL can be trapped, but that IIRC doesn't work on windows. I'd check if the problem persists if you change cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) to cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 0) Also, yikes, that's an ugly way of doing hardware detection. Jumping out of a signal handler into normal code. Brrr. Greetings, Andres Freund
Hi,
On 2024-02-09 14:23:46 -0500, Dave Cramer wrote:
> > interestingly meson test does not produce any error
> > The buildfarm produces the following error for me:
> >
> > -SELECT relname, attname, coltypes, get_columns_length(coltypes)
> > - FROM check_columns
> > - WHERE get_columns_length(coltypes) % 8 != 0 OR
> > - 'name'::regtype::oid = ANY(coltypes);
> > - relname | attname | coltypes | get_columns_length
> > ----------+---------+----------+--------------------
> > -(0 rows)
> > -
> > +server closed the connection unexpectedly
> > + This probably means the server terminated abnormally
> > + before or while processing the request.
> > +connection to server was lost
> >
>
> Actually digging some more, here is the actual error
>
> 2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: server process (PID
> 11204) was terminated by exception 0xC0000005
> 2024-02-09 13:31:11.008 -05 postmaster[10672] DETAIL: Failed process was
> running: VACUUM;
> 2024-02-09 13:31:11.008 -05 postmaster[10672] HINT: See C include file
> "ntstatus.h" for a description of the hexadecimal value.
That's something like a segfault.
One suspicion I have is that src/port/pg_crc32c_armv8_choose.c possibly
doesn't properly support msvc. It seems to assume that SIGILL can be trapped,
but that IIRC doesn't work on windows.
I'd check if the problem persists if you change
cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
to
cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 0)
Also, yikes, that's an ugly way of doing hardware detection. Jumping out of a
signal handler into normal code. Brrr.
Greetings,
Andres Freund
On Fri, 9 Feb 2024 at 07:18, Dave Cramer <davecramer@postgres.rocks> wrote:On Fri, 9 Feb 2024 at 00:26, Michael Paquier <michael@paquier.xyz> wrote:On Tue, Feb 06, 2024 at 07:01:49AM -0500, Dave Cramer wrote:
> Thanks, this patch works and
> testing with meson passes.
Only with the version posted at [1]? Interesting, that's the same
contents as v8 posted upthread, minus src/tools/ because we don't need
to care about them anymore.
Andrew, what's happening on the test side? It does not seem you've
mentioned any details about what is going wrong, or I have just missed
them.
> I'll try the buildfarm next.
[1]: https://www.postgresql.org/message-id/ea42654a-3dc4-98b0-335b-56b7ec5e5687@dunslane.netinterestingly meson test does not produce any errorThe buildfarm produces the following error for me:-SELECT relname, attname, coltypes, get_columns_length(coltypes)- FROM check_columns- WHERE get_columns_length(coltypes) % 8 != 0 OR- 'name'::regtype::oid = ANY(coltypes);- relname | attname | coltypes | get_columns_length----------+---------+----------+---------------------(0 rows)-+server closed the connection unexpectedly+ This probably means the server terminated abnormally+ before or while processing the request.+connection to server was lostActually digging some more, here is the actual error2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: server process (PID 11204) was terminated by exception 0xC00000052024-02-09 13:31:11.008 -05 postmaster[10672] DETAIL: Failed process was running: VACUUM;2024-02-09 13:31:11.008 -05 postmaster[10672] HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: terminating any other active server processes2024-02-09 13:31:11.013 -05 postmaster[10672] LOG: all server processes terminated; reinitializing2024-02-09 13:31:11.034 -05 startup[6152] LOG: database system was interrupted; last known up at 2024-02-09 13:31:01 -05
Yes, this is pretty much what I saw.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2024-02-09 Fr 14:23, Dave Cramer wrote:On Fri, 9 Feb 2024 at 07:18, Dave Cramer <davecramer@postgres.rocks> wrote:On Fri, 9 Feb 2024 at 00:26, Michael Paquier <michael@paquier.xyz> wrote:On Tue, Feb 06, 2024 at 07:01:49AM -0500, Dave Cramer wrote:
> Thanks, this patch works and
> testing with meson passes.
Only with the version posted at [1]? Interesting, that's the same
contents as v8 posted upthread, minus src/tools/ because we don't need
to care about them anymore.
Andrew, what's happening on the test side? It does not seem you've
mentioned any details about what is going wrong, or I have just missed
them.
> I'll try the buildfarm next.
[1]: https://www.postgresql.org/message-id/ea42654a-3dc4-98b0-335b-56b7ec5e5687@dunslane.netinterestingly meson test does not produce any errorThe buildfarm produces the following error for me:-SELECT relname, attname, coltypes, get_columns_length(coltypes)- FROM check_columns- WHERE get_columns_length(coltypes) % 8 != 0 OR- 'name'::regtype::oid = ANY(coltypes);- relname | attname | coltypes | get_columns_length----------+---------+----------+---------------------(0 rows)-+server closed the connection unexpectedly+ This probably means the server terminated abnormally+ before or while processing the request.+connection to server was lostActually digging some more, here is the actual error2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: server process (PID 11204) was terminated by exception 0xC00000052024-02-09 13:31:11.008 -05 postmaster[10672] DETAIL: Failed process was running: VACUUM;2024-02-09 13:31:11.008 -05 postmaster[10672] HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: terminating any other active server processes2024-02-09 13:31:11.013 -05 postmaster[10672] LOG: all server processes terminated; reinitializing2024-02-09 13:31:11.034 -05 startup[6152] LOG: database system was interrupted; last known up at 2024-02-09 13:31:01 -05
On Sat, 10 Feb 2024 at 11:19, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-02-09 Fr 14:23, Dave Cramer wrote:On Fri, 9 Feb 2024 at 07:18, Dave Cramer <davecramer@postgres.rocks> wrote:On Fri, 9 Feb 2024 at 00:26, Michael Paquier <michael@paquier.xyz> wrote:On Tue, Feb 06, 2024 at 07:01:49AM -0500, Dave Cramer wrote:
> Thanks, this patch works and
> testing with meson passes.
Only with the version posted at [1]? Interesting, that's the same
contents as v8 posted upthread, minus src/tools/ because we don't need
to care about them anymore.
Andrew, what's happening on the test side? It does not seem you've
mentioned any details about what is going wrong, or I have just missed
them.
> I'll try the buildfarm next.
[1]: https://www.postgresql.org/message-id/ea42654a-3dc4-98b0-335b-56b7ec5e5687@dunslane.netinterestingly meson test does not produce any errorThe buildfarm produces the following error for me:-SELECT relname, attname, coltypes, get_columns_length(coltypes)- FROM check_columns- WHERE get_columns_length(coltypes) % 8 != 0 OR- 'name'::regtype::oid = ANY(coltypes);- relname | attname | coltypes | get_columns_length----------+---------+----------+---------------------(0 rows)-+server closed the connection unexpectedly+ This probably means the server terminated abnormally+ before or while processing the request.+connection to server was lostActually digging some more, here is the actual error2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: server process (PID 11204) was terminated by exception 0xC00000052024-02-09 13:31:11.008 -05 postmaster[10672] DETAIL: Failed process was running: VACUUM;2024-02-09 13:31:11.008 -05 postmaster[10672] HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: terminating any other active server processes2024-02-09 13:31:11.013 -05 postmaster[10672] LOG: all server processes terminated; reinitializing2024-02-09 13:31:11.034 -05 startup[6152] LOG: database system was interrupted; last known up at 2024-02-09 13:31:01 -05
So how does one debug this ?Also if I `run meson test` I don't see this error. What does the buildfarm do differently?
First it does this:
meson test -C $pgsql --no-rebuild --suite setup
Then it does this (jflag is for the number of jobs):
meson test -t $meson_test_timeout $jflag -C $pgsql --logbase checklog --print-errorlogs --no-rebuild --suite regress --test-args=--no-locale
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2024-02-10 Sa 12:20, Dave Cramer wrote:On Sat, 10 Feb 2024 at 11:19, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-02-09 Fr 14:23, Dave Cramer wrote:On Fri, 9 Feb 2024 at 07:18, Dave Cramer <davecramer@postgres.rocks> wrote:On Fri, 9 Feb 2024 at 00:26, Michael Paquier <michael@paquier.xyz> wrote:On Tue, Feb 06, 2024 at 07:01:49AM -0500, Dave Cramer wrote:
> Thanks, this patch works and
> testing with meson passes.
Only with the version posted at [1]? Interesting, that's the same
contents as v8 posted upthread, minus src/tools/ because we don't need
to care about them anymore.
Andrew, what's happening on the test side? It does not seem you've
mentioned any details about what is going wrong, or I have just missed
them.
> I'll try the buildfarm next.
[1]: https://www.postgresql.org/message-id/ea42654a-3dc4-98b0-335b-56b7ec5e5687@dunslane.netinterestingly meson test does not produce any errorThe buildfarm produces the following error for me:-SELECT relname, attname, coltypes, get_columns_length(coltypes)- FROM check_columns- WHERE get_columns_length(coltypes) % 8 != 0 OR- 'name'::regtype::oid = ANY(coltypes);- relname | attname | coltypes | get_columns_length----------+---------+----------+---------------------(0 rows)-+server closed the connection unexpectedly+ This probably means the server terminated abnormally+ before or while processing the request.+connection to server was lostActually digging some more, here is the actual error2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: server process (PID 11204) was terminated by exception 0xC00000052024-02-09 13:31:11.008 -05 postmaster[10672] DETAIL: Failed process was running: VACUUM;2024-02-09 13:31:11.008 -05 postmaster[10672] HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: terminating any other active server processes2024-02-09 13:31:11.013 -05 postmaster[10672] LOG: all server processes terminated; reinitializing2024-02-09 13:31:11.034 -05 startup[6152] LOG: database system was interrupted; last known up at 2024-02-09 13:31:01 -05
So how does one debug this ?Also if I `run meson test` I don't see this error. What does the buildfarm do differently?
First it does this:
meson test -C $pgsql --no-rebuild --suite setup
Then it does this (jflag is for the number of jobs):
meson test -t $meson_test_timeout $jflag -C $pgsql --logbase checklog --print-errorlogs --no-rebuild --suite regress --test-args=--no-locale
On Sat, 10 Feb 2024 at 13:28, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-02-10 Sa 12:20, Dave Cramer wrote:On Sat, 10 Feb 2024 at 11:19, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-02-09 Fr 14:23, Dave Cramer wrote:On Fri, 9 Feb 2024 at 07:18, Dave Cramer <davecramer@postgres.rocks> wrote:On Fri, 9 Feb 2024 at 00:26, Michael Paquier <michael@paquier.xyz> wrote:On Tue, Feb 06, 2024 at 07:01:49AM -0500, Dave Cramer wrote:
> Thanks, this patch works and
> testing with meson passes.
Only with the version posted at [1]? Interesting, that's the same
contents as v8 posted upthread, minus src/tools/ because we don't need
to care about them anymore.
Andrew, what's happening on the test side? It does not seem you've
mentioned any details about what is going wrong, or I have just missed
them.
> I'll try the buildfarm next.
[1]: https://www.postgresql.org/message-id/ea42654a-3dc4-98b0-335b-56b7ec5e5687@dunslane.netinterestingly meson test does not produce any errorThe buildfarm produces the following error for me:-SELECT relname, attname, coltypes, get_columns_length(coltypes)- FROM check_columns- WHERE get_columns_length(coltypes) % 8 != 0 OR- 'name'::regtype::oid = ANY(coltypes);- relname | attname | coltypes | get_columns_length----------+---------+----------+---------------------(0 rows)-+server closed the connection unexpectedly+ This probably means the server terminated abnormally+ before or while processing the request.+connection to server was lostActually digging some more, here is the actual error2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: server process (PID 11204) was terminated by exception 0xC00000052024-02-09 13:31:11.008 -05 postmaster[10672] DETAIL: Failed process was running: VACUUM;2024-02-09 13:31:11.008 -05 postmaster[10672] HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: terminating any other active server processes2024-02-09 13:31:11.013 -05 postmaster[10672] LOG: all server processes terminated; reinitializing2024-02-09 13:31:11.034 -05 startup[6152] LOG: database system was interrupted; last known up at 2024-02-09 13:31:01 -05
So how does one debug this ?Also if I `run meson test` I don't see this error. What does the buildfarm do differently?
First it does this:
meson test -C $pgsql --no-rebuild --suite setup
Then it does this (jflag is for the number of jobs):
meson test -t $meson_test_timeout $jflag -C $pgsql --logbase checklog --print-errorlogs --no-rebuild --suite regress --test-args=--no-locale
running the above manually produces no errors ??
Not for me. I get the error I previously reported, It's an access violation error.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On 2024-02-12 Mo 08:51, Dave Cramer wrote:On Sat, 10 Feb 2024 at 13:28, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-02-10 Sa 12:20, Dave Cramer wrote:On Sat, 10 Feb 2024 at 11:19, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-02-09 Fr 14:23, Dave Cramer wrote:On Fri, 9 Feb 2024 at 07:18, Dave Cramer <davecramer@postgres.rocks> wrote:On Fri, 9 Feb 2024 at 00:26, Michael Paquier <michael@paquier.xyz> wrote:On Tue, Feb 06, 2024 at 07:01:49AM -0500, Dave Cramer wrote:
> Thanks, this patch works and
> testing with meson passes.
Only with the version posted at [1]? Interesting, that's the same
contents as v8 posted upthread, minus src/tools/ because we don't need
to care about them anymore.
Andrew, what's happening on the test side? It does not seem you've
mentioned any details about what is going wrong, or I have just missed
them.
> I'll try the buildfarm next.
[1]: https://www.postgresql.org/message-id/ea42654a-3dc4-98b0-335b-56b7ec5e5687@dunslane.netinterestingly meson test does not produce any errorThe buildfarm produces the following error for me:-SELECT relname, attname, coltypes, get_columns_length(coltypes)- FROM check_columns- WHERE get_columns_length(coltypes) % 8 != 0 OR- 'name'::regtype::oid = ANY(coltypes);- relname | attname | coltypes | get_columns_length----------+---------+----------+---------------------(0 rows)-+server closed the connection unexpectedly+ This probably means the server terminated abnormally+ before or while processing the request.+connection to server was lostActually digging some more, here is the actual error2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: server process (PID 11204) was terminated by exception 0xC00000052024-02-09 13:31:11.008 -05 postmaster[10672] DETAIL: Failed process was running: VACUUM;2024-02-09 13:31:11.008 -05 postmaster[10672] HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: terminating any other active server processes2024-02-09 13:31:11.013 -05 postmaster[10672] LOG: all server processes terminated; reinitializing2024-02-09 13:31:11.034 -05 startup[6152] LOG: database system was interrupted; last known up at 2024-02-09 13:31:01 -05
So how does one debug this ?Also if I `run meson test` I don't see this error. What does the buildfarm do differently?
First it does this:
meson test -C $pgsql --no-rebuild --suite setup
Then it does this (jflag is for the number of jobs):
meson test -t $meson_test_timeout $jflag -C $pgsql --logbase checklog --print-errorlogs --no-rebuild --suite regress --test-args=--no-locale
running the above manually produces no errors ??
Not for me. I get the error I previously reported, It's an access violation error.
cheers
andrew
On Mon, 12 Feb 2024 at 09:19, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-02-12 Mo 08:51, Dave Cramer wrote:On Sat, 10 Feb 2024 at 13:28, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-02-10 Sa 12:20, Dave Cramer wrote:On Sat, 10 Feb 2024 at 11:19, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-02-09 Fr 14:23, Dave Cramer wrote:On Fri, 9 Feb 2024 at 07:18, Dave Cramer <davecramer@postgres.rocks> wrote:On Fri, 9 Feb 2024 at 00:26, Michael Paquier <michael@paquier.xyz> wrote:On Tue, Feb 06, 2024 at 07:01:49AM -0500, Dave Cramer wrote:
> Thanks, this patch works and
> testing with meson passes.
Only with the version posted at [1]? Interesting, that's the same
contents as v8 posted upthread, minus src/tools/ because we don't need
to care about them anymore.
Andrew, what's happening on the test side? It does not seem you've
mentioned any details about what is going wrong, or I have just missed
them.
> I'll try the buildfarm next.
[1]: https://www.postgresql.org/message-id/ea42654a-3dc4-98b0-335b-56b7ec5e5687@dunslane.netinterestingly meson test does not produce any errorThe buildfarm produces the following error for me:-SELECT relname, attname, coltypes, get_columns_length(coltypes)- FROM check_columns- WHERE get_columns_length(coltypes) % 8 != 0 OR- 'name'::regtype::oid = ANY(coltypes);- relname | attname | coltypes | get_columns_length----------+---------+----------+---------------------(0 rows)-+server closed the connection unexpectedly+ This probably means the server terminated abnormally+ before or while processing the request.+connection to server was lostActually digging some more, here is the actual error2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: server process (PID 11204) was terminated by exception 0xC00000052024-02-09 13:31:11.008 -05 postmaster[10672] DETAIL: Failed process was running: VACUUM;2024-02-09 13:31:11.008 -05 postmaster[10672] HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.2024-02-09 13:31:11.008 -05 postmaster[10672] LOG: terminating any other active server processes2024-02-09 13:31:11.013 -05 postmaster[10672] LOG: all server processes terminated; reinitializing2024-02-09 13:31:11.034 -05 startup[6152] LOG: database system was interrupted; last known up at 2024-02-09 13:31:01 -05
So how does one debug this ?Also if I `run meson test` I don't see this error. What does the buildfarm do differently?
First it does this:
meson test -C $pgsql --no-rebuild --suite setup
Then it does this (jflag is for the number of jobs):
meson test -t $meson_test_timeout $jflag -C $pgsql --logbase checklog --print-errorlogs --no-rebuild --suite regress --test-args=--no-locale
running the above manually produces no errors ??
Not for me. I get the error I previously reported, It's an access violation error.
OK, so I have managed to get a debugger attached to postgres.exe when it faults and the fault occurs atspan is pointing to 0x0
Further data point. If I select a build type of 'debug' instead of 'debugoptimized' the error disappears.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi, On 2024-02-12 13:28:40 -0500, Andrew Dunstan wrote: > On 2024-02-12 Mo 11:44, Dave Cramer wrote: > > OK, so I have managed to get a debugger attached to postgres.exe when it > > faults and the fault occurs at > > https://github.com/postgres/postgres/blob/09eb633e1baa3b7cd7929f3cc77f9c46f63c20b1/src/backend/utils/mmgr/dsa.c#L869 > > span is pointing to 0x0 > > Further data point. If I select a build type of 'debug' instead of > 'debugoptimized' the error disappears. Oh, this is quite interesting. Dave, could you post the backtrace? I wonder if this indicates that we are either missing memory barriers somewhere or that the memory barriers we end up with on msvc + arm aren't correct? Either could explain why the problem doesn't occur when building with optimizations. Greetings, Andres Freund
Hi, On 2024-02-09 15:32:10 -0500, Dave Cramer wrote: > On Fri, 9 Feb 2024 at 14:36, Andres Freund <andres@anarazel.de> wrote: > > That's something like a segfault. > > > > One suspicion I have is that src/port/pg_crc32c_armv8_choose.c possibly > > doesn't properly support msvc. It seems to assume that SIGILL can be > > trapped, > > but that IIRC doesn't work on windows. > > > > I'd check if the problem persists if you change > > cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) > > to > > cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 0) > > > > This results in > > FAILED: src/bin/pg_checksums/pg_checksums.exe > src/bin/pg_checksums/pg_checksums.pdb > "link" /MACHINE:ARM64 /OUT:src/bin/pg_checksums/pg_checksums.exe > src/bin/pg_checksums/pg_checksums.exe.p/win32ver.res > src/bin/pg_checksums/pg_checksums.exe.p/pg_checksums.c.obj "/release" > "/nologo" "/DEBUG" "/PDB:src\bin\pg_checksums\pg_checksums.pdb" > "/INCREMENTAL:NO" "/STACK:4194304" "/NOEXP" "src/fe_utils/libpgfeutils.a" > "src/common/libpgcommon.a" "src/port/libpgport.a" "ws2_32.lib" "ws2_32.lib" > "ws2_32.lib" "ws2_32.lib" "/SUBSYSTEM:CONSOLE" "kernel32.lib" "user32.lib" > "gdi32.lib" "winspool.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" > "uuid.lib" "comdlg32.lib" "advapi32.lib" > libpgcommon.a(controldata_utils.c.obj) : error LNK2001: unresolved external > symbol pg_comp_crc32c That's because have_optimized_crc ends up being set. I'm somewhat unhappy about the msvc specific path in meson.build. Shouldn't at the very least the "Use ARM CRC Extension unconditionally" path be first. I guess it's not really this patch's fault that the runtime detection for armv8 crc is implemented this shoddily :(. Greetings, Andres Freund
Hi,
On 2024-02-12 13:28:40 -0500, Andrew Dunstan wrote:
> On 2024-02-12 Mo 11:44, Dave Cramer wrote:
> > OK, so I have managed to get a debugger attached to postgres.exe when it
> > faults and the fault occurs at
> > https://github.com/postgres/postgres/blob/09eb633e1baa3b7cd7929f3cc77f9c46f63c20b1/src/backend/utils/mmgr/dsa.c#L869
> > span is pointing to 0x0
>
> Further data point. If I select a build type of 'debug' instead of
> 'debugoptimized' the error disappears.
Oh, this is quite interesting. Dave, could you post the backtrace?
I wonder if this indicates that we are either missing memory barriers
somewhere or that the memory barriers we end up with on msvc + arm aren't
correct? Either could explain why the problem doesn't occur when building
with optimizations.
Greetings,
Andres Freund
Hi, On 2024-02-12 12:50:12 -0800, Andres Freund wrote: > On 2024-02-12 13:28:40 -0500, Andrew Dunstan wrote: > I wonder if this indicates that we are either missing memory barriers > somewhere or that the memory barriers we end up with on msvc + arm aren't > correct? Either could explain why the problem doesn't occur when building > with optimizations. I think I might have been on to something - if my human emulation of a preprocessor isn't wrong, we'd end up with #define S_UNLOCK(lock) \ do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0) on msvc + arm. And that's entirely insufficient - _ReadWriteBarrier() just limits *compiler* level reordering, not CPU level reordering. I think it's even insufficient on x86[-64], but it's definitely insufficient on arm. Another issue I see is that we have a bunch of code that's dependant on __aarch64__ being set - but it doesn't look to me that it is set on msvc. One obvious consequence of that is that 64bit atomics won't be used (see src/include/port/atomics/arch-arm.h) - but that shouldn't cause this issue. Greetings, Andres Freund
On Sat, Feb 10, 2024 at 8:36 AM Andres Freund <andres@anarazel.de> wrote: > Also, yikes, that's an ugly way of doing hardware detection. Jumping out of a > signal handler into normal code. Brrr. Maybe it's a little baroque but what's actually wrong with it? OpenSSL does something similar during initialisation as a fallback (see armcap.c), and I borrowed that particular idea because I didn't want the rat's nest of unportable strategies required to detect the feature otherwise. As people who ran OpenSSL-linked stuff under gdb on 32 bit ARM discover (on e.g. early 32 bit RPis and early iOS etc systems the instruction was missing, and the signal dropped you into the debugger, so there are lots of reports of that if you google this topic). FWIW psql also long-jumps out of signal handlers, arguably more questionably because it's an async signal and the program counter may be inside readline... As for Windows, let's see... SIGILL doesn't even work (the identifier is defined for compatibility, but illegal instruction exception is not connected up to it), not to mention that in the backend we redirect sigaction() to our own fake co-operative signal system so this technique wouldn't work even if SIGILL did. But I would be surprised if anyone actually has a platform that can run Windows that doesn't have this instruction (if they did I think we'd see a crash with untrapped illegal instruction exception 0xC000001D, let me put that here to help google to find this message if it ever happens...). CRC32C is required by ARMv8.1[1], and was optional but AFAIK present in general purpose ARMv8.0 cores including lowly RPis. In other words anything < 8 years old[2] has it certainly, and older stuff probably does too if it is a 64 bit server/PC/laptop/RPi? If you look for reports of systems without it they seem to be pre-ARMv8, or early phones/tablets (?)? It looks like Windows 11 requires ARMv8.1[3] ("the Windows kernel dropped ARMv8.0 support shortly after builds 22621 [= 2022], now requiring and enforcing a minimum of ARMv8.1 and thus the new atomics instructions", that atomic/lock stuff being a pretty solid reason to want to do so as discussed on this list before...). What if we just assumed that the same must be effectively true in practice for Windows 10 until we hear otherwise? If a reasonable non-hypothetical system shows up that rejects this instruction in the short time before we disclaim support for 10 (whose EOL is next year?), we can decide whether to just turn off the CRC32 fast path for that OS version, or figure out the runtime feature-probe patch. Clearly we need to minimise Windows magic in this project due to lack of hackers, so writing code proactively for imaginary users seems like a bad plan... [1] https://developer.arm.com/documentation/ddi0597/2023-12/Base-Instructions/CRC32C--CRC32C- [2] https://en.wikipedia.org/wiki/Comparison_of_ARM_processors#ARMv8-A [3] http://www.emulators.com/docs/abc_history_of_woa.htm
Hi,
On 2024-02-12 12:50:12 -0800, Andres Freund wrote:
> On 2024-02-12 13:28:40 -0500, Andrew Dunstan wrote:
> I wonder if this indicates that we are either missing memory barriers
> somewhere or that the memory barriers we end up with on msvc + arm aren't
> correct? Either could explain why the problem doesn't occur when building
> with optimizations.
I think I might have been on to something - if my human emulation of a
preprocessor isn't wrong, we'd end up with
#define S_UNLOCK(lock) \
do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0)
on msvc + arm. And that's entirely insufficient - _ReadWriteBarrier() just
limits *compiler* level reordering, not CPU level reordering. I think it's
even insufficient on x86[-64], but it's definitely insufficient on arm.
Hi, On 2024-02-13 12:49:33 -0500, Dave Cramer wrote: > > I think I might have been on to something - if my human emulation of a > > preprocessor isn't wrong, we'd end up with > > > > #define S_UNLOCK(lock) \ > > do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0) > > > > on msvc + arm. And that's entirely insufficient - _ReadWriteBarrier() just > > limits *compiler* level reordering, not CPU level reordering. I think it's > > even insufficient on x86[-64], but it's definitely insufficient on arm. > > > In fact ReadWriteBarrier has been deprecated _ReadWriteBarrier | Microsoft > Learn > <https://learn.microsoft.com/en-us/cpp/intrinsics/readwritebarrier?view=msvc-170> I'd just ignore that, that's just pushing towards more modern stuff that's more applicable to C++ than C. > I did try using atomic_thread_fence as per atomic_thread_fence - > cppreference.com > <https://en.cppreference.com/w/c/atomic/atomic_thread_fence> The semantics of atomic_thread_fence are, uh, very odd. I'd just use MemoryBarrier(). Greetings, Andres Freund
Hi,
On 2024-02-13 12:49:33 -0500, Dave Cramer wrote:
> > I think I might have been on to something - if my human emulation of a
> > preprocessor isn't wrong, we'd end up with
> >
> > #define S_UNLOCK(lock) \
> > do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0)
> >
> > on msvc + arm. And that's entirely insufficient - _ReadWriteBarrier() just
> > limits *compiler* level reordering, not CPU level reordering. I think it's
> > even insufficient on x86[-64], but it's definitely insufficient on arm.
> >
> In fact ReadWriteBarrier has been deprecated _ReadWriteBarrier | Microsoft
> Learn
> <https://learn.microsoft.com/en-us/cpp/intrinsics/readwritebarrier?view=msvc-170>
I'd just ignore that, that's just pushing towards more modern stuff that's
more applicable to C++ than C.
> I did try using atomic_thread_fence as per atomic_thread_fence -
> cppreference.com
> <https://en.cppreference.com/w/c/atomic/atomic_thread_fence>
The semantics of atomic_thread_fence are, uh, very odd. I'd just use
MemoryBarrier().
On Tue, 13 Feb 2024 at 12:52, Andres Freund <andres@anarazel.de> wrote:Hi,
On 2024-02-13 12:49:33 -0500, Dave Cramer wrote:
> > I think I might have been on to something - if my human emulation of a
> > preprocessor isn't wrong, we'd end up with
> >
> > #define S_UNLOCK(lock) \
> > do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0)
> >
> > on msvc + arm. And that's entirely insufficient - _ReadWriteBarrier() just
> > limits *compiler* level reordering, not CPU level reordering. I think it's
> > even insufficient on x86[-64], but it's definitely insufficient on arm.
> >
> In fact ReadWriteBarrier has been deprecated _ReadWriteBarrier | Microsoft
> Learn
> <https://learn.microsoft.com/en-us/cpp/intrinsics/readwritebarrier?view=msvc-170>
I'd just ignore that, that's just pushing towards more modern stuff that's
more applicable to C++ than C.
> I did try using atomic_thread_fence as per atomic_thread_fence -
> cppreference.com
> <https://en.cppreference.com/w/c/atomic/atomic_thread_fence>
The semantics of atomic_thread_fence are, uh, very odd. I'd just use
MemoryBarrier().#define S_UNLOCK(lock) \do { MemoryBarrier(); (*(lock)) = 0; } while (0)#endifHas no effect.I have no idea if that is what you meant that I should do ?Dave
On Tue, 13 Feb 2024 at 16:28, Dave Cramer <davecramer@postgres.rocks> wrote:On Tue, 13 Feb 2024 at 12:52, Andres Freund <andres@anarazel.de> wrote:Hi,
On 2024-02-13 12:49:33 -0500, Dave Cramer wrote:
> > I think I might have been on to something - if my human emulation of a
> > preprocessor isn't wrong, we'd end up with
> >
> > #define S_UNLOCK(lock) \
> > do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0)
> >
> > on msvc + arm. And that's entirely insufficient - _ReadWriteBarrier() just
> > limits *compiler* level reordering, not CPU level reordering. I think it's
> > even insufficient on x86[-64], but it's definitely insufficient on arm.
> >
> In fact ReadWriteBarrier has been deprecated _ReadWriteBarrier | Microsoft
> Learn
> <https://learn.microsoft.com/en-us/cpp/intrinsics/readwritebarrier?view=msvc-170>
I'd just ignore that, that's just pushing towards more modern stuff that's
more applicable to C++ than C.
> I did try using atomic_thread_fence as per atomic_thread_fence -
> cppreference.com
> <https://en.cppreference.com/w/c/atomic/atomic_thread_fence>
The semantics of atomic_thread_fence are, uh, very odd. I'd just use
MemoryBarrier().#define S_UNLOCK(lock) \do { MemoryBarrier(); (*(lock)) = 0; } while (0)#endifHas no effect.I have no idea if that is what you meant that I should do ?DaveRevisiting this:Andrew, can you explain the difference between ninja test (which passes) and what the build farm does. The buildfarm crashes.
Dave
On 2024-09-24 16:01:30 -0400, Dave Cramer wrote: > I have a dmp file with a stack trace if anyone is interested /me raises his hand
On Tue, 13 Feb 2024 at 16:28, Dave Cramer <davecramer@postgres.rocks> wrote:On Tue, 13 Feb 2024 at 12:52, Andres Freund <andres@anarazel.de> wrote:Hi,
On 2024-02-13 12:49:33 -0500, Dave Cramer wrote:
> > I think I might have been on to something - if my human emulation of a
> > preprocessor isn't wrong, we'd end up with
> >
> > #define S_UNLOCK(lock) \
> > do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0)
> >
> > on msvc + arm. And that's entirely insufficient - _ReadWriteBarrier() just
> > limits *compiler* level reordering, not CPU level reordering. I think it's
> > even insufficient on x86[-64], but it's definitely insufficient on arm.
> >
> In fact ReadWriteBarrier has been deprecated _ReadWriteBarrier | Microsoft
> Learn
> <https://learn.microsoft.com/en-us/cpp/intrinsics/readwritebarrier?view=msvc-170>
I'd just ignore that, that's just pushing towards more modern stuff that's
more applicable to C++ than C.
> I did try using atomic_thread_fence as per atomic_thread_fence -
> cppreference.com
> <https://en.cppreference.com/w/c/atomic/atomic_thread_fence>
The semantics of atomic_thread_fence are, uh, very odd. I'd just use
MemoryBarrier().#define S_UNLOCK(lock) \do { MemoryBarrier(); (*(lock)) = 0; } while (0)#endifHas no effect.I have no idea if that is what you meant that I should do ?DaveRevisiting this:Andrew, can you explain the difference between ninja test (which passes) and what the build farm does. The buildfarm crashes.
The buildfarm client performs these steps:
meson test -C $pgsql --no-rebuild --suite setup meson test -t $meson_test_timeout $jflag -C $pgsql --logbase checklog --print-errorlogs --no-rebuild --suite regress --test-args=--no-locale meson test -t $meson_test_timeout $jflag -C $pgsql --print-errorlogs --no-rebuild --logbase checkworld --no-suite setup --no-suite regress foreach tested locale: meson test -t $meson_test_timeout $jflag -v -C $pgsql --no-rebuild --print-errorlogs --setup running --suite regress-running --logbase regress-installcheck-$locale
$pgsql is the build root, $jflag is setting the number of jobs
IOW, we do test suite setup, run the core regression tests, run all the remaining non-install tests, then run the install tests for each locale.
We don't call ninja directly, but I don't see why that should make a difference.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On Tue, Feb 13, 2024 at 10:01 AM Dave Cramer <davecramer@postgres.rocks> wrote:
> > postgres.exe!dsa_free(dsa_area * area, unsigned __int64 dp) Line 869 C
> postgres.exe!resize(dshash_table * hash_table, unsigned __int64 new_size_log2) Line 879 C
> postgres.exe!dshash_find_or_insert(dshash_table * hash_table, const void * key, bool * found) Line 480 C
> postgres.exe!pgstat_get_entry_ref(PgStat_Kind kind, unsigned int dboid, unsigned int objoid, bool create, bool * created_entry) Line 455 C
Hmm, we can see that the spinlock stuff is too weak (assumes TSO), but
there aren't any spinlocks near that code so I think something else is
wrong. What does pg_read_barrier() compile to? I mention that
because it's used near there in a somewhat complicated way. But I'm a
bit confused because by reading through the code it looks like it
should be too strong, not too weak. I think it should fall back to
pg_memory_barrier_impl() for lack of pg_read_barrier_impl(), which
should produce MemoryBarrier() and I guess DMB SY (ARM instruction for
full system data memory barrier). So maybe some other pg_atomic_XXX
operations are falling back to code that doesn't work.
Just for experimentation, you could see what happens if you redirect
all that stuff to C11's <stdatomic.h>. Here's a quick-and-dirty patch
for that, which works on my ARM Mac, but I have no idea if it'll work
on MSVC (our CI uses MSVC 2019; they only added <stdatomic.h> in MSVC
2022, but that's the same edition that added ARM so you must have it).
It's not a very finely tuned patch (for example I think several calls
should use the _explicit() variants and be weaker, but they err on the
side of being too strong, so the code they generate is probably not as
fast as it could be; I lost interested when the general idea was
rejected for now, but it might help learn something about the MSVC+ARM
situation).
Combined with the patch that redirects spinlocks to atomics[1], there
would be no 'hand rolled' code relating to memory order or atomics,
just standard modern C. There are obviously other architecture tests
here and there, and some of them will be wrong (some of them are wrong
already for MSVC on x86), and that might be fixed by project
standardisation[2].
[1] https://www.postgresql.org/message-id/CA%2BhUKGJ%2BoA%2B62iUZ-EQb5R2cAOW3Y942ZoOtzOD%3D1sQ05iNg6Q%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA%2BhUKG%2BOu%2BC%2BpXioo_W1gKcwRCEN1WNrLBBPSMJyxjgc%2B1JEJA%40mail.gmail.com
On Sat, 28 Sept 2024 at 20:03, Thomas Munro <thomas.munro@gmail.com> wrote:On Tue, Feb 13, 2024 at 10:01 AM Dave Cramer <davecramer@postgres.rocks> wrote:
> > postgres.exe!dsa_free(dsa_area * area, unsigned __int64 dp) Line 869 C
> postgres.exe!resize(dshash_table * hash_table, unsigned __int64 new_size_log2) Line 879 C
> postgres.exe!dshash_find_or_insert(dshash_table * hash_table, const void * key, bool * found) Line 480 C
> postgres.exe!pgstat_get_entry_ref(PgStat_Kind kind, unsigned int dboid, unsigned int objoid, bool create, bool * created_entry) Line 455 C
Hmm, we can see that the spinlock stuff is too weak (assumes TSO), but
there aren't any spinlocks near that code so I think something else is
wrong. What does pg_read_barrier() compile to? I mention that
because it's used near there in a somewhat complicated way. But I'm a
bit confused because by reading through the code it looks like it
should be too strong, not too weak. I think it should fall back to
pg_memory_barrier_impl() for lack of pg_read_barrier_impl(), which
should produce MemoryBarrier() and I guess DMB SY (ARM instruction for
full system data memory barrier). So maybe some other pg_atomic_XXX
operations are falling back to code that doesn't work.
Just for experimentation, you could see what happens if you redirect
all that stuff to C11's <stdatomic.h>. Here's a quick-and-dirty patch
for that, which works on my ARM Mac, but I have no idea if it'll work
on MSVC (our CI uses MSVC 2019; they only added <stdatomic.h> in MSVC
2022, but that's the same edition that added ARM so you must have it).
It's not a very finely tuned patch (for example I think several calls
should use the _explicit() variants and be weaker, but they err on the
side of being too strong, so the code they generate is probably not as
fast as it could be; I lost interested when the general idea was
rejected for now, but it might help learn something about the MSVC+ARM
situation).
Combined with the patch that redirects spinlocks to atomics[1], there
would be no 'hand rolled' code relating to memory order or atomics,
just standard modern C. There are obviously other architecture tests
here and there, and some of them will be wrong (some of them are wrong
already for MSVC on x86), and that might be fixed by project
standardisation[2].
[1] https://www.postgresql.org/message-id/CA%2BhUKGJ%2BoA%2B62iUZ-EQb5R2cAOW3Y942ZoOtzOD%3D1sQ05iNg6Q%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA%2BhUKG%2BOu%2BC%2BpXioo_W1gKcwRCEN1WNrLBBPSMJyxjgc%2B1JEJA%40mail.gmail.com
On Wed, Oct 9, 2024 at 12:58 PM Dave Cramer <davecramer@postgres.rocks> wrote: > ..\src\include/port/atomics/arch-x86.h:39: warning: "pg_memory_barrier_impl" redefined It shouldn't be including that. Does the patch here help? https://www.postgresql.org/message-id/CA%2BhUKG%2BOu%2BC%2BpXioo_W1gKcwRCEN1WNrLBBPSMJyxjgc%2B1JEJA%40mail.gmail.com