Thread: AIX support
Hello Team,
We are working on AIX systems and noticed the thread on removing AIX support from Postgres going forward.
https://github.com/postgres/postgres/commit/0b16bb8776bb834eb1ef8204ca95dd7667ab948b
We would be glad to understand any outstanding issues hindering support on AIX.
It is important for us that Postgres remains supported on AIX, as we use Postgres extensively on AIX.
We would also like to provide whatever support is feasible from our end to enable AIX support.
We request that the community continue to support AIX.
Thanks & regards,
Sriram.
On 2024-Mar-21, Sriram RK wrote:
> Hello Team,
> We are working on AIX systems and noticed that the thread on removing AIX support in Postgres going forward.
> https://github.com/postgres/postgres/commit/0b16bb8776bb834eb1ef8204ca95dd7667ab948b
> We would be glad to understand any outstanding issues hindering the support on AIX.
There's a Discussion link at the bottom of that commit message. I
suggest you read that discussion completely, and consider how much
effort you or your company are willing to spend on doing the
maintenance of the port yourselves for the community.
Maybe ponder this question: would it be less onerous to migrate your
Postgres servers to Linux, like Phil Florent described on the
currently-last message of that thread?
-- 
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"To have more, you have to desire less"
Sriram RK <sriram.rk@outlook.com> writes:
> We are working on AIX systems and noticed that the thread on removing AIX support in Postgres going forward.
> https://github.com/postgres/postgres/commit/0b16bb8776bb834eb1ef8204ca95dd7667ab948b
> We would be glad to understand any outstanding issues hindering the
> support on AIX.
Did you read the linked thread? Starting say about here:
https://www.postgresql.org/message-id/flat/20240224172345.32%40rfd.leadboat.com#8b41e50c2190c82c861d91644eed9c30
> Also we would like to provide any feasible support from our end for enabling the support on AIX.
Who is "we", and how much resources are you prepared to put into this?
> We would request the community to extend the support on AIX ..
The community, in the sense of the existing people doing significant
work on Postgres, are absolutely not going to do that. If you can
bring a bunch of work to fix all the problems noted in the discussion
thread, and commit to providing ongoing development manpower and
hardware to keep it working, maybe something could happen. But I
suspect you will find it cheaper to start thinking about migrating
off AIX.
regards, tom lane
Thanks, Tom and Alvaro, for the info.
We shall look into the details and get back to you.
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thursday, 21 March 2024 at 7:27 PM
To: Sriram RK <sriram.rk@outlook.com>
Cc: pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>
Subject: Re: AIX support
Hi Team,
We are setting up the build environment, building the source, and analyzing the assert from the AIX point of view.
We would also like to know whether we can access the buildfarm (POWER machines) to run the specific tests that hit this assert.
Thanks & regards,
Sriram.
> From: Sriram RK <sriram.rk@outlook.com>
> Date: Thursday, 21 March 2024 at 10:05 PM
> To: Tom Lane <tgl@sss.pgh.pa.us>, Alvaro Herrera <alvherre@alvh.no-ip.org>
> Cc: pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>
> Subject: Re: AIX support
> Thanks, Tom and Alvaro, for the info.
> We shall look into the details and get back to you.
On Thu, Mar 28, 2024 at 11:09:43AM +0000, Sriram RK wrote:
> We are setting up the build environment and trying to build the source and also trying to analyze the assert from the Aix point of view.
The thread Alvaro and Tom cited contains an analysis. It's a compiler bug.
You can get past the compiler bug by upgrading your compiler; both ibm-clang
17.1.1.2 and gcc 13.2.0 are free from the bug.
> Also, would like to know if we can access the buildfarm(power machines) to run any of the specific tests to hit this assert.
https://portal.cfarm.net/users/new/ is the form to request access. It lists
the eligibility criteria.
On Fri, Mar 29, 2024 at 3:48 PM Noah Misch <noah@leadboat.com> wrote:
> On Thu, Mar 28, 2024 at 11:09:43AM +0000, Sriram RK wrote:
> > We are setting up the build environment and trying to build the source and also trying to analyze the assert from the Aix point of view.
>
> The thread Alvaro and Tom cited contains an analysis. It's a compiler bug.
> You can get past the compiler bug by upgrading your compiler; both ibm-clang
> 17.1.1.2 and gcc 13.2.0 are free from the bug.
For the specific issue that triggered that, I strongly suspect that it
would go away if we just used smgrzeroextend() instead of smgrextend()
using that variable with the alignment requirement, since, as far as I
can tell from build farm clues, the otherwise similar function-local
static variable used by the former (ie one that the linker must still
control the location of AFAIK?) seems to work fine.
But we didn't remove AIX because of that, it was just the straw that
broke the camel's back.
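To check whether a particular compiler honors an alignment request on static buffers (the kind of variable described in the message above), a small standalone C program can print where those buffers land. This is a minimal sketch, not PostgreSQL code. The names `file_scope_block`, `function_local_block`, and `check_static_alignment` are hypothetical. The constants 4096 and 8192 are assumptions standing in for `PG_IO_ALIGN_SIZE` and the default `BLCKSZ`, in the spirit of PostgreSQL's `PGIOAlignedBlock`. The program compares a file-scope static with a function-local static because the message above contrasts those two placements. It is not claimed to reproduce the exact failure analyzed in the linked thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4096 stands in for PostgreSQL's PG_IO_ALIGN_SIZE and 8192
 * for the default BLCKSZ; adjust both if your build differs. */
#define IO_ALIGN_SIZE 4096
#define BLOCK_SIZE 8192

/* Hypothetical file-scope, zero-initialized const buffer that requests
 * I/O alignment (a rough analogue of a static "zero block"). */
static const _Alignas(IO_ALIGN_SIZE) char file_scope_block[BLOCK_SIZE] = {0};

/* True if p is a multiple of align; align must be a power of two. */
static bool
is_aligned(const void *p, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t) p & (align - 1)) == 0;
}

/* Hypothetical function-local static buffer with the same requirement. */
static const char *
function_local_block(void)
{
	static _Alignas(IO_ALIGN_SIZE) char local_block[BLOCK_SIZE];

	return local_block;
}

/* Prints where each buffer landed and returns true only if the compiler
 * and linker honored the requested alignment for both placements. */
static bool
check_static_alignment(void)
{
	const char *local = function_local_block();
	bool		file_ok = is_aligned(file_scope_block, IO_ALIGN_SIZE);
	bool		local_ok = is_aligned(local, IO_ALIGN_SIZE);

	printf("file-scope static:     %p %s\n", (void *) file_scope_block,
		   file_ok ? "aligned" : "MISALIGNED");
	printf("function-local static: %p %s\n", (void *) local,
		   local_ok ? "aligned" : "MISALIGNED");
	return file_ok && local_ok;
}
```

Call `check_static_alignment()` from a `main()` built with each compiler under test, for example `return check_static_alignment() ? 0 : 1;`. A MISALIGNED line means the requested alignment was not honored for that placement.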
Noah Misch <noah@leadboat.com> writes:
> On Thu, Mar 28, 2024 at 11:09:43AM +0000, Sriram RK wrote:
>> Also, would like to know if we can access the buildfarm(power machines) to run any of the specific tests to hit this assert.
> https://portal.cfarm.net/users/new/ is the form to request access. It lists
> the eligibility criteria.
There might be some confusion here about what system we are talking
about. The Postgres buildfarm is described at
https://buildfarm.postgresql.org/index.html
but it consists of a large number of individual machines run by
individual owners. There would not be a lot of point in adding a
new AIX machine to the Postgres buildfarm right now, because it
would surely fail to build HEAD.
What Noah is referencing is the GCC compile farm, which happens
to include some AIX machines. The existing AIX entries in the
Postgres buildfarm are run (by Noah) on those GCC compile farm
machines, which really the GCC crowd have been *very* forgiving
about letting us abuse like that. If you have your own AIX
hardware there's not a lot of reason that you should need to
access the GCC farm.
What you do need to do to reproduce the described problems is
check out the Postgres git tree and rewind to just before
commit 0b16bb877, where we deleted AIX support. Any attempt
to restore AIX support would have to start with reverting that
commit (and perhaps the followup f0827b443).
regards, tom lane
On Fri, Mar 29, 2024 at 4:00 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Mar 29, 2024 at 3:48 PM Noah Misch <noah@leadboat.com> wrote:
> > The thread Alvaro and Tom cited contains an analysis. It's a compiler bug.
> > You can get past the compiler bug by upgrading your compiler; both ibm-clang
> > 17.1.1.2 and gcc 13.2.0 are free from the bug.
>
> For the specific issue that triggered that, I strongly suspect that it
> would go away if we just used smgrzeroextend() instead of smgrextend()
> using that variable with the alignment requirement, since, as far as I
> can tell from build farm clues, the otherwise similar function-local
> static variable used by the former (ie one that the linker must still
> control the location of AFAIK?) seems to work fine.
Oh, sorry, I had missed the part where newer compilers fix the issue
too. Old out-of-support versions of AIX running old compilers, what
fun.
Thomas Munro <thomas.munro@gmail.com> writes:
> Oh, sorry, I had missed the part where newer compilers fix the issue
> too. Old out-of-support versions of AIX running old compilers, what
> fun.
Indeed. One of the topics that needs investigation if you want to
pursue this is which AIX system and compiler versions still deserve
support, and which of the AIX hacks we had been carrying still need
to be there based on that analysis. For context, we've been pruning
support for extinct-in-the-wild OS versions pretty aggressively
over the past couple of years, and I'd expect to apply the same
standard to AIX.
regards, tom lane
> What you do need to do to reproduce the described problems is
> check out the Postgres git tree and rewind to just before
> commit 0b16bb877, where we deleted AIX support. Any attempt
> to restore AIX support would have to start with reverting that
> commit (and perhaps the followup f0827b443).
> regards, tom lane
Hi Team, thank you for all the info.
We built the source on our nodes, and the build succeeded with the configuration below.
Postgres - github-bcdfa5f2e2f
AIX - 71c
Xlc - 13.1.0
Bison - 3.0.5
Next, we want to build with the changes that were removed in “0b16bb8776bb8” restored, using the latest xlc and gcc versions.
We have been building the source from the Postgres FTP server (https://ftp.postgresql.org/pub/source/). Are there any source-level differences between the source on the FTP server and the source on GitHub?
Regards,
Sriram.
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Friday, 29 March 2024 at 9:03 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Noah Misch <noah@leadboat.com>, Sriram RK <sriram.rk@outlook.com>, Alvaro Herrera <alvherre@alvh.no-ip.org>, pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>
Subject: Re: AIX support
On Fri, Apr 05, 2024 at 04:12:06PM +0000, Sriram RK wrote:
> > What you do need to do to reproduce the described problems is
> > check out the Postgres git tree and rewind to just before
> > commit 0b16bb877, where we deleted AIX support. Any attempt
> > to restore AIX support would have to start with reverting that
> > commit (and perhaps the followup f0827b443).
> Going ahead, we want to build the changes that were removed as part of “0b16bb8776bb8”, with latest Xlc and gcc version.
>
> We were building the source from the postgres ftp server(https://ftp.postgresql.org/pub/source/), would like to understand if there are any source level changes between the ftp server and the source on github?
To succeed in this endeavor, you'll need to develop fluency in the tools to
answer questions like that, or bring in someone who is fluent. In this case,
GNU diff is the standard tool for answering your question. These resources
cover other topics you would need to learn:
https://wiki.postgresql.org/wiki/Developer_FAQ
https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
Thanks Noah and Team,
We (IBM-AIX team) looked into this issue
https://www.postgresql.org/message-id/20240225194322.a5@rfd.leadboat.com
This is the compiler issue: xlc (13.1) and gcc (8.0) are affected, but we verified that it is resolved in newer compiler versions, Open XL (xlc 17.1) and gcc 12.0 onwards.
We reported this issue to the xlc team, and they have acknowledged it; a fix for xlc v16 may be available in May. Could the community start using the latest compilers to build the source?
As part of this support, we will help fix all AIX-related issues and continue to support Postgres on AIX. If another CI environment is needed, we can work to make one available. In the meantime, can we start by reverting the patch that removed AIX support?
We would like to note that Postgres is used extensively in our IBM product and by multiple customers.
Please let us know if there are any specific details you'd like to discuss further.
Regards,
Sriram.
On 18 April 2024 14:15:43 GMT+03:00, Sriram RK <sriram.rk@outlook.com> wrote:
>Thanks Noah and Team,
>
>We (IBM-AIX team) looked into this issue
>
>https://www.postgresql.org/message-id/20240225194322.a5@rfd.leadboat.com
>
>This is related to the compiler issue. The compilers xlc(13.1) and gcc(8.0) have issues. But we verified that this issue is resolved with the newer compiler versions openXL(xlc17.1) and gcc(12.0) onwards.
>
>We reported this issue to the xlc team and they have noted this issue. A fix might be possible in May for this issue in xlc v16. We would like to understand if the community can start using the latest compilers to build the source.
>
>Also as part of the support, we will help in fixing all the issues related to AIX and continue to support AIX for Postgres. If we need another CI environment we can work to make one available. But for time being can we start reverting the patch that has removed AIX support.
Let's start by setting up a new AIX buildfarm member. Regardless of what we do with v17, we continue to support AIX on the stable branches, and we really need a buildfarm member to keep the stable branches working anyway.
>We want to make a note that postgres is used extensively in our IBM product and is being exploited by multiple customers.
Noted. I'm glad to hear you are interested to put in some effort for this. The situation from the current maintainers is that none of us have much interest, or resources or knowledge to keep the AIX build working, so we'll definitely need the help.
No promises on v17, but let's at least make sure the stable branches keep working. And with the patches and buildfarm support from you, maybe v17 is feasible too.
- Heikki
> Let's start by setting up a new AIX buildfarm member. Regardless of what we do with v17, we continue to support AIX on the stable branches, and we really need a buildfarm member to keep the stable branches working anyway.
Thanks Heikki. We have already built the source code (v17+ bcdfa5f2e2f) on our local nodes. We will try to set up a buildfarm member and let you know.
Is there any specific configuration you are looking for?
Regards,
Sriram.
Hi,
On 2024-04-18 11:15:43 +0000, Sriram RK wrote:
> We (IBM-AIX team) looked into this issue
>
> https://www.postgresql.org/message-id/20240225194322.a5@rfd.leadboat.com
>
> This is related to the compiler issue. The compilers xlc(13.1) and gcc(8.0)
> have issues. But we verified that this issue is resolved with the newer
> compiler versions openXL(xlc17.1) and gcc(12.0) onwards.
The reason we used these compilers was that those were the only ones we had
kinda somewhat reasonable access to, via the gcc projects' "compile farm"
https://portal.cfarm.net/
We have to rely on whatever the aix machines there provide. They're not
particularly plentiful resource-wise either.
This is generally one of the big issues with AIX support. There are other
niche-y OSs that don't have a lot of users, e.g. openbsd, but in contrast to
AIX I can just start an openbsd VM within a few minutes and iron out whatever
portability issue I'm dealing with.
Not being AIX customers we also can't raise bugs about compiler bugs, so we're
stuck doing bad workarounds.
> Also as part of the support, we will help in fixing all the issues related
> to AIX and continue to support AIX for Postgres. If we need another CI
> environment we can work to make one available. But for time being can we
> start reverting the patch that has removed AIX support.
The state when it was removed was not in a state that I am OK with adding back.
> We want to make a note that postgres is used extensively in our IBM product
> and is being exploited by multiple customers.
To be blunt: Then it'd have been nice to see some investment in that before
now. Both on the code level and the infrastructure level (i.e. access to
machines etc).
Greetings,
Andres Freund
On Fri, Apr 19, 2024 at 6:01 AM Andres Freund <andres@anarazel.de> wrote:
> On 2024-04-18 11:15:43 +0000, Sriram RK wrote:
> > We (IBM-AIX team) looked into this issue
> >
> > https://www.postgresql.org/message-id/20240225194322.a5@rfd.leadboat.com
> >
> > This is related to the compiler issue. The compilers xlc(13.1) and gcc(8.0)
> > have issues. But we verified that this issue is resolved with the newer
> > compiler versions openXL(xlc17.1) and gcc(12.0) onwards.
>
> The reason we used these compilers was that those were the only ones we had
> kinda somewhat reasonable access to, via the gcc projects'
> "compile farm" https://portal.cfarm.net/
> We have to rely on whatever the aix machines there provide. They're not
> particularly plentiful resource-wise either.
To be fair, those OSUOSL machines are donated by IBM:
https://osuosl.org/services/powerdev/
It's just that they seem to be mostly focused on supporting Linux on
POWER, with only a couple of AIX hosts (partitions/virtual machines?)
made available via portal.cfarm.net, and they only very recently added
a modern AIX 7.3 host. That's cfarm119, upgraded in September-ish,
long after many threads on this list that between-the-lines threatened
to drop support.
> This is generally one of the big issues with AIX support. There are other
> niche-y OSs that don't have a lot of users, e.g. openbsd, but in contrast to
> AIX I can just start an openbsd VM within a few minutes and iron out whatever
> portability issue I'm dealing with.
Yeah. It is a known secret that you can run AIX inside Qemu/kvm (it
appears that IBM has made changes to it to make that possible, because
earlier AIX versions didn't like Qemu's POWER emulation or
virtualisation, there are blog posts about it), but IBM doesn't
actually make the images available to non-POWER-hardware owners (you
need a serial number). If I were an OS vendor and wanted developers
to target my OS for free, at the very top of my TODO list I would
have: provide an easy to use image for developers to be able to spin
something up in minutes and possibly even use in CI systems. That's
the reason I can fix any minor portability issue on Linux, illumos,
*BSD quickly and Windows with only moderate extra pain. Even Oracle
knows this, see Solaris CBE.
> > We want to make a note that postgres is used extensively in our IBM product
> > and is being exploited by multiple customers.
>
> To be blunt: Then it'd have been nice to see some investment in that before
> now. Both on the code level and the infrastructure level (i.e. access to
> machines etc).
In the AIX space generally, there were even clues that funding had
been completely cut even for packaging PostgreSQL. I was aware of two
packaging projects (not sure how they were related):
1. The ATOS packaging group, who used to show up on our mailing lists
   and discuss code changes, which announced it was shutting down:
   https://github.com/power-devops/bullfreeware
2. And last time I looked a few months back, the IBM AIX Toolbox
   packaging project only had PostgreSQL 10 or 11 packages, already out
   of support by us, meaning that their maintainer had given up, too:
   https://www.ibm.com/support/pages/aix-toolbox-open-source-software-downloads-alpha
   However I see that recently (last month?) someone has added PostgreSQL
   15, so something has only just reawoken there?
There are quite a lot of threads about AIX problems, but they are
almost all just us non-AIX-users trying to unbreak stupid stuff on the
build farm, which at some points began to seem distinctly quixotic:
chivalric hackers valiantly trying to keep the entire Unix family tree
working even though we don't remember why and the versions involved
are out of support even by the vendor.
Of the three old giant commercial Unixes, HP-UX was dropped without another mention (it really was a windmill after all), Solaris is somehow easier to deal with (I could guess it's because it influenced Linux and BSD so much, ELF and linker details spring to mind), while AIX fails on every dimension: unrepresented by users, lacking in build farm, unavailable to non-customers, and unusual among Unixen.
For any compiler- or hardware-related issue, we should be able to provide support.
We are in the process of identifying the AIX systems that can be added to the CI/buildfarm environment.
Regards,
Sriram.
On 19.04.24 13:04, Sriram RK wrote:
> For any complier/hardware related issue we should able to provide support.
>
> We are in the process of identifying the AIX systems that can be added
> to the CI/buildfarm environment.
I think we should manage expectations here, if there is any hope of
getting AIX support back into PG17.
I have some sympathy for this. The discussion about removing AIX
support had a very short turnaround and happened in an unrelated thread,
without any sort of public announcement or consultation. So this report
of "hey, we were still using that" is timely and fair.
But the underlying issue that led to the removal (something to do with
direct I/O support and alignment) would still need to be addressed.
And this probably wouldn't just need some infrastructure support; it
would require contributions from someone who actively knows how to
develop on this platform. Now, direct I/O is currently sort of an
experimental feature, so disabling it on AIX, as was initially suggested
in that discussion, might be okay for now, but the issue will come up
again.
Even if this new buildfarm support is forthcoming, there has to be some
sort of deadline in any resurrection attempts for PG17. The first beta
date has been set for 23 May. If we are making the reinstatement of AIX
support contingent on new buildfarm support, those machines need to be
available, at least initially, at least for backbranches, like in a
week. Which seems tight.
I can see several ways going forward:
1. We revert the removal of AIX support and carry on with the status quo
ante. (The removal of AIX is a regression; it is timely and in scope
now to revert the change.)
2. Like (1), but we consider that notice has been given, and we will
remove it early in PG18 (like August) unless the situation improves.
3. We leave it out of PG17 and consider a new AIX port for PG18 on its
own merits.
Note that such a "new" port would probably require quite a bit of
development and research work, to clean up all the cruft that had
accumulated over the years in the old port. Another looming issue is
that the meson build system only supported AIX with gcc before the
removal. I don't know what it would take to expand that to support
xclang, but if it requires meson upstream work, you have that to do, too.
Peter Eisentraut <peter@eisentraut.org> writes:
> I have some sympathy for this. The discussion about removing AIX
> support had a very short turnaround and happened in an unrelated thread,
> without any sort of public announcement or consultation. So this report
> of "hey, we were still using that" is timely and fair.
Yup, that's a totally fair complaint. Still ...
> I can see several ways going forward:
> 1. We revert the removal of AIX support and carry on with the status quo
> ante. (The removal of AIX is a regression; it is timely and in scope
> now to revert the change.)
> 2. Like (1), but we consider that notice has been given, and we will
> remove it early in PG18 (like August) unless the situation improves.
> 3. We leave it out of PG17 and consider a new AIX port for PG18 on its
> own merits.
Andres has ably summarized the reasons why the status quo ante was
getting untenable. The direct-I/O problem could have been tolerable
on its own, but in reality it was the straw that broke the camel's
back so far as our willingness to maintain AIX support went. There
were just too many hacks and workarounds for too many problems,
with too few people interested in looking for better answers.
So I'm totally not in favor of #1, at least not without some hard
commitments and follow-through on really cleaning up the mess
(which maybe looks more like your #2). What's needed here, as
you said, is for someone with a decent amount of expertise in
modern AIX to review all the issues. Maybe framing that as a
"new port" per #3 would be a good way to think about it. But
I don't want to just revert the AIX-ectomy and continue drifting.
On the whole, it wouldn't be the worst thing in the world if PG 17
lacks AIX support but that comes back in PG 18. That approach would
solve the schedule-crunch aspect and give time for considered review
of how many of the hacks removed in 0b16bb877 really need to be put
back, versus being obsolete or amenable to a nicer solution in
late-model AIX. If we take a "new port" mindset then it would be
totally reasonable to say that it only supports very recent AIX
releases, so I'd hope at least some of the cruft could be removed.
regards, tom lane
Hi Team,
> I have some sympathy for this. The discussion about removing AIX
> support had a very short turnaround and happened in an unrelated thread,
> without any sort of public announcement or consultation. So this report
> of "hey, we were still using that" is timely and fair.
We would really like to thank you and the team for considering our request,
and we really appreciate your laying out all the possible options for supporting AIX.
> But the underlying issue that led to the removal (something to do with
> direct I/O support and alignment) would still need to be addressed.
We have already validated that these alignment-specific issues are resolved
with the latest versions of the compilers (gcc/ibm-clang), so we would request
that you use those versions for the build.
> If we are making the reinstatement of AIX
> support contingent on new buildfarm support, those machines need to be
> available, at least initially, at least for back branches, like in a
> week. Which seems tight.
We are already working with the internal team on procuring nodes
for the buildfarm that will be accessible to the community.
> I can see several ways going forward:
> 1. We revert the removal of AIX support and carry on with the status quo
> ante. (The removal of AIX is a regression; it is timely and in scope
> now to revert the change.)
> 2. Like (1), but we consider that notice has been given, and we will
> remove it early in PG18 (like August) unless the situation improves.
We really appreciate your providing these options,
and we are very much inclined toward the approaches above.
Regards,
Sriram.
On Sat Apr 20, 2024 at 10:42 AM CDT, Peter Eisentraut wrote:
>
> 3. We leave it out of PG17 and consider a new AIX port for PG18 on its
> own merits.
>
> Note that such a "new" port would probably require quite a bit of
> development and research work, to clean up all the cruft that had
> accumulated over the years in the old port. Another looming issue is
> that the meson build system only supported AIX with gcc before the
> removal. I don't know what it would take to expand that to support
> xclang, but if it requires meson upstream work, you have that to do, too.
Happy to help advocate for any PRs from AIX folks on the Meson side. You
can find me as @tristan957 on github.
-- 
Tristan Partin
Neon (https://neon.tech)
On Sat, Apr 20, 2024 at 12:25:47PM -0400, Tom Lane wrote:
> > I can see several ways going forward:
> > 1. We revert the removal of AIX support and carry on with the status quo
> > ante. (The removal of AIX is a regression; it is timely and in scope
> > now to revert the change.)
> > 2. Like (1), but we consider that notice has been given, and we will
> > remove it early in PG18 (like August) unless the situation improves.
> > 3. We leave it out of PG17 and consider a new AIX port for PG18 on its
> > own merits.
>
> Andres has ably summarized the reasons why the status quo ante was
> getting untenable. The direct-I/O problem could have been tolerable
> on its own, but in reality it was the straw that broke the camel's
> back so far as our willingness to maintain AIX support went. There
> were just too many hacks and workarounds for too many problems,
> with too few people interested in looking for better answers.
>
> So I'm totally not in favor of #1, at least not without some hard
> commitments and follow-through on really cleaning up the mess
> (which maybe looks more like your #2). What's needed here, as
> you said, is for someone with a decent amount of expertise in
> modern AIX to review all the issues. Maybe framing that as a
> "new port" per #3 would be a good way to think about it. But
> I don't want to just revert the AIX-ectomy and continue drifting.
>
> On the whole, it wouldn't be the worst thing in the world if PG 17
> lacks AIX support but that comes back in PG 18. That approach would
> solve the schedule-crunch aspect and give time for considered review
> of how many of the hacks removed in 0b16bb877 really need to be put
> back, versus being obsolete or amenable to a nicer solution in
> late-model AIX. If we take a "new port" mindset then it would be
> totally reasonable to say that it only supports very recent AIX
> releases, so I'd hope at least some of the cruft could be removed.
I agree that targeting PG 18 for a new-er AIX port is the reasonable
approach. If there is huge demand, someone can create an AIX fork for
PG 17 using the reverted patches --- yeah, lots of pain there, but we
have carried the AIX pain for too long with too little support.
-- 
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.
On Wed, Apr 24, 2024 at 11:39:37PM -0400, Bruce Momjian wrote:
> On Sat, Apr 20, 2024 at 12:25:47PM -0400, Tom Lane wrote:
>> So I'm totally not in favor of #1, at least not without some hard
>> commitments and follow-through on really cleaning up the mess
>> (which maybe looks more like your #2). What's needed here, as
>> you said, is for someone with a decent amount of expertise in
>> modern AIX to review all the issues. Maybe framing that as a
>> "new port" per #3 would be a good way to think about it. But
>> I don't want to just revert the AIX-ectomy and continue drifting.
>>
>> On the whole, it wouldn't be the worst thing in the world if PG 17
>> lacks AIX support but that comes back in PG 18. That approach would
>> solve the schedule-crunch aspect and give time for considered review
>> of how many of the hacks removed in 0b16bb877 really need to be put
>> back, versus being obsolete or amenable to a nicer solution in
>> late-model AIX. If we take a "new port" mindset then it would be
>> totally reasonable to say that it only supports very recent AIX
>> releases, so I'd hope at least some of the cruft could be removed.
>
> I agree that targeting PG 18 for a new-er AIX port is the reasonable
> approach. If there is huge demand, someone can create an AIX fork for
> PG 17 using the reverted patches --- yeah, lots of pain there, but we
> have carried the AIX pain for too long with too little support.
Some of the portability changes removed in 0b16bb877 feel indeed
obsolete, so it may not hurt to start an analysis from scratch to see
the minimum amount of work that would be really required with the
latest versions of xlc, using the newest compilers as a supported
base. I'd like to think backporting these to stable branches should
be OK at some point, once the new port is proving baked enough.
Anyway, getting access to such compilers to be able to debug issues
on hosts that take less than 12h to just compile the code would
certainly help its adoption. So seeing commitment in the form of
patches and access to environments would help a lot. Overall,
studying that afresh with v18 looks like a good idea, assuming that
anybody who commits such patches has access to hosts to evaluate
them, with buildfarm members running on top, of course.
My 2c.
--
Michael
Michael Paquier <michael@paquier.xyz> writes: > Some of the portability changes removed in 0b16bb877 feel indeed > obsolete, so it may not hurt to start an analysis from scratch to see > the minimum amount of work that would be really required with the > latest versions of xlc, using the newest compilers as a supported > base. Something I've been mulling over is whether to suggest that the proposed "new port" should only target building with gcc. On the one hand, that would (I think) remove a number of annoying issues, and the average end user is unlikely to care which compiler their database server was built with. On the other hand, I'm a strong proponent of avoiding software monocultures, and xlc is one of the few C compilers still standing that aren't gcc or clang. It would definitely make sense for a new port to start by getting things going with gcc only, and then look at resurrecting xlc support. > I'd like to think backporting these to stable branches should > be OK at some point, once the new port is proving baked enough. If things go as I expect, the "new port" would effectively drop support for older AIX and/or older compiler versions. So back- porting seems like an unlikely decision. > Anyway, getting an access to such compilers to be able to debug issues > on hosts that take less than 12h to just compile the code would > certainly help its adoption. +many regards, tom lane
On Thu, Apr 25, 2024 at 12:20:05AM -0400, Tom Lane wrote: > It would definitely make sense for a new port to start by getting > things going with gcc only, and then look at resurrecting xlc > support. Sriram mentioned upthread that he was looking at both of them. I'd be ready to assume that most of the interest is in xlc, not gcc. But I may be wrong. Saying that, dividing the effort into more successive steps is sensible here (didn't consider that previously, you have a good point). -- Michael
On 25.04.24 06:20, Tom Lane wrote: > Something I've been mulling over is whether to suggest that the > proposed "new port" should only target building with gcc. > > On the one hand, that would (I think) remove a number of annoying > issues, and the average end user is unlikely to care which compiler > their database server was built with. On the other hand, I'm a strong > proponent of avoiding software monocultures, and xlc is one of the few > C compilers still standing that aren't gcc or clang. My understanding is that the old xlc is dead and has been replaced by "xlclang", which is presumably an xlc-compatible frontend on top of clang/llvm. Hopefully, that will have fewer issues.
On 2024-Apr-24, Bruce Momjian wrote: > I agree that targeting PG 18 for a new-er AIX port is the reasonable > approach. If there is huge demand, someone can create an AIX fork for > PG 17 using the reverted patches --- yeah, lots of pain there, but we > have carried the AIX pain for too long with too little support. I'm not sure how large the demand would be for an AIX port specifically of 17, though. I mean, people using older versions can continue to use 16 until 18 is released. Upgrading past one major version is hardly unheard of. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "If you want to have good ideas, you must have many ideas. Most of them will be wrong, and what you have to learn is which ones to throw away." (Linus Pauling)
Peter Eisentraut <peter@eisentraut.org> writes: > On 25.04.24 06:20, Tom Lane wrote: >> Something I've been mulling over is whether to suggest that the >> proposed "new port" should only target building with gcc. > My understanding is that the old xlc is dead and has been replaced by > "xlclang", which is presumably an xlc-compatible frontend on top of > clang/llvm. Hopefully, that will have fewer issues. [ googles... ] Actually it seems to be the other way around: per [1] xlclang is a clang-based front end to IBM's existing codegen+optimization logic, and the xlc front end is still there too. It's not at all clear that they have any intention of killing off xlc. Not sure where that leaves us in terms of either one being an interesting target to support. xlclang is presumably an easier lift to get working, but that also makes it much less interesting from the preserve-our-portability standpoint. regards, tom lane [1] https://www.ibm.com/docs/en/xl-c-and-cpp-aix/16.1?topic=new-clang-based-front-end
Hi, On 2024-04-25 00:20:05 -0400, Tom Lane wrote: > Something I've been mulling over is whether to suggest that the > proposed "new port" should only target building with gcc. Yes. I also wonder if such a port should only support building with sysv style shared library support, rather than the AIX (and windows) style. That'd make it considerably less impactful on the buildsystem level. I don't know what the performance difference is these days. Greetings, Andres Freund
On Thu, Apr 25, 2024 at 10:16:34AM +0200, Álvaro Herrera wrote: > On 2024-Apr-24, Bruce Momjian wrote: > > > I agree that targeting PG 18 for a new-er AIX port is the reasonable > > approach. If there is huge demand, someone can create an AIX fork for > > PG 17 using the reverted patches --- yeah, lots of pain there, but we > > have carried the AIX pain for too long with too little support. > > I'm not sure how large the demand would be for an AIX port specifically > of 17, though. I mean, people using older versions can continue to use > 16 until 18 is released. Upgrading past one major version is hardly > unheard of. Agreed. They seem to have packages for 11/12, and only 15 recently. I don't see how PG 17 would be missed, unless there are many people compiling from source. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On Thu, Apr 25, 2024 at 01:06:24PM +0900, Michael Paquier wrote: > Anyway, getting an access to such compilers to be able to debug issues > on hosts that take less than 12h to just compile the code would > certainly help its adoption. So seeing commitment in the form of > patches and access to environments would help a lot. Overall, > studying that afresh with v18 looks like a good idea, assuming that > anybody who commits such patches has access to hosts to evaluate them, > with buildfarm members running on top, of course. Agreed. They can't even have buildfarm member for PG 17 since it doesn't compile anymore, so someone has to go over the reverted patch, figure out which ones are still valid, and report back. Trying to add a port, with possible breakage, during beta seems too risky compared to the value. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
> > It would definitely make sense for a new port to start by getting
> > things going with gcc only, and then look at resurrecting xlc
> > support.
> Sriram mentioned upthread that he was looking at both of them. I'd be
> ready to assume that most of the interest is in xlc, not gcc. But I
> may be wrong.
Just a heads-up: we got a node in the OSU lab for the buildfarm. We will let you know once we have the buildfarm set up on that node.
We are also making progress on setting up the buildfarm on a local node.
However, some tests are currently failing; it seems to be an issue with plperl.
aix01::build-farm-17# ./run_build.pl --keepall --nosend --nostatus --verbose=5 --force REL_16_STABLE
Fri Apr 26 00:53:50 2024: buildfarm run for AIXnode01:REL_16_STABLE starting
AIXnode01:REL_16_STABLE [00:53:50] checking out source ...
AIXnode01:REL_16_STABLE [00:53:56] checking if build run needed ...
AIXnode01:REL_16_STABLE [00:53:56] copying source to pgsql.build ...
AIXnode01:REL_16_STABLE [00:54:08] running configure ...
AIXnode01:REL_16_STABLE [00:55:01] running build ...
AIXnode01:REL_16_STABLE [01:08:09] running basic regression tests ...
AIXnode01:REL_16_STABLE [01:09:51] running make contrib ...
AIXnode01:REL_16_STABLE [01:11:08] running make testmodules ...
AIXnode01:REL_16_STABLE [01:11:19] running install ...
AIXnode01:REL_16_STABLE [01:11:48] running make contrib install ...
AIXnode01:REL_16_STABLE [01:12:01] running testmodules install ...
AIXnode01:REL_16_STABLE [01:12:06] checking test-decoding
gmake: gcc: A file or directory in the path name does not exist.
AIXnode01:REL_16_STABLE [01:12:28] running make check miscellaneous modules ...
gmake: gcc: A file or directory in the path name does not exist.
AIXnode01:REL_16_STABLE [01:13:50] setting up db cluster (C)...
AIXnode01:REL_16_STABLE [01:13:53] starting db (C)...
AIXnode01:REL_16_STABLE [01:13:53] running installcheck (C)...
AIXnode01:REL_16_STABLE [01:15:05] restarting db (C)...
AIXnode01:REL_16_STABLE [01:15:07] running make isolation check ...
AIXnode01:REL_16_STABLE [01:15:51] restarting db (C)...
AIXnode01:REL_16_STABLE [01:15:56] running make PL installcheck (C)...
Branch: REL_16_STABLE
Stage PLCheck-C failed with status 2
Hi Team,
There are a couple of updates. First, we got an AIX node in the OSU lab.
Please feel free to reach out to me so that we can provide access to the node.
We have started working on setting up the buildfarm on that node.
Second, as part of the buildfarm setup on our local nodes, we are hitting
an issue with the plperl extension. In the logs we noticed that when the
plperl extension is being created, it fails to load the perl library.
- CREATE EXTENSION plperlu;
+ server closed the connection unexpectedly
+ This probably means the server terminated abnormally
+ before or while processing the request.
+ connection to server was lost
In the logfile we could see these
2024-05-04 05:05:17.537 CDT [3997786:17] pg_regress/plperl_setup LOG: statement: CREATE EXTENSION plperl;
Util.c: loadable library and perl binaries are mismatched (got first handshake key 9b80080, needed 9a80080)
We tried some of the suggestions mentioned here, but could not resolve the issue.
Any input here would be greatly appreciated.
Regards,
Sriram.
Hi Team, on further investigation we were able to resolve the perl issue by setting the PERL environment variable to the right location. It had been pointing to the 32-bit perl, which appears to have caused the library mismatch.
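A quick way to confirm which word size a given perl binary was built for is to read its XCOFF magic number. The sketch below assumes the standard AIX values 0x01DF (32-bit, U802TOCMAGIC) and 0x01F7 (64-bit, U64_TOCMAGIC); the function names are hypothetical, and the AIX "file" command should report the same information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Assumed XCOFF file-header magic numbers (first two bytes, big-endian):
 * 0x01DF for 32-bit objects, 0x01F7 for 64-bit objects.
 */
#define XCOFF_MAGIC_32 0x01DF
#define XCOFF_MAGIC_64 0x01F7

/* Return 32 or 64 for a recognized XCOFF header, or 0 otherwise. */
static int
xcoff_bits(const unsigned char *hdr, size_t len)
{
	unsigned int magic;

	if (hdr == NULL || len < 2)
		return 0;
	magic = ((unsigned int) hdr[0] << 8) | hdr[1];
	if (magic == XCOFF_MAGIC_32)
		return 32;
	if (magic == XCOFF_MAGIC_64)
		return 64;
	return 0;
}

/* Read the first two bytes of a file; returns 0 if unreadable or not XCOFF. */
static int
xcoff_file_bits(const char *path)
{
	unsigned char hdr[2];
	size_t		n;
	FILE	   *f = fopen(path, "rb");

	if (f == NULL)
		return 0;
	n = fread(hdr, 1, sizeof hdr, f);
	fclose(f);
	return xcoff_bits(hdr, n);
}
```

Pointing xcoff_file_bits() at the perl executable that PERL refers to should return 64 for a 64-bit PostgreSQL build; a result of 32 would be consistent with the handshake-key mismatch above.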
Now we have successfully built the release 15 and 16 stable branches on the OSU lab node.
p9aix (OSU)
OS: AIX 72Z
RELEASE 16
p9aix:REL_16_STABLE [08:31:26] OK
======== log passed to send_result ===========
Branch: REL_16_STABLE
All stages succeeded
RELEASE 15
p9aix:REL_15_STABLE [08:55:37] OK
======== log passed to send_result ===========
Branch: REL_15_STABLE
All stages succeeded
We have also successfully built the release 16 branch on our local nodes.
OS: AIX 71C
pgsql-aix71C:REL_16_STABLE [02:25:32] OK
======== log passed to send_result ===========
Branch: REL_16_STABLE
All stages succeeded
OS: AIX72Z
pgsql-aix72Z:REL_16_STABLE [02:35:03] OK
======== log passed to send_result ===========
Branch: REL_16_STABLE
All stages succeeded
OS: AIX73D
pgsql-aix73D:REL_16_STABLE [05:32:29] OK
======== log passed to send_result ===========
Branch: REL_16_STABLE
All stages succeeded
Regards,
Sriram.
Hi Team, we have the AIX node ready in the OSU lab, and branches 15 and 16 were built on it. We have raised a request to register this node as a buildfarm member but have not yet received approval.
We would like to understand your inputs/plans on reverting the changes for AIX.
Thanks,
Sriram.
On 08.05.24 13:39, Sriram RK wrote: > We would like to understand your inputs/plans on reverting the changes > for AIX. I think the ship has sailed for PG17. The way forward would be that you submit a patch for new, modernized AIX support for PG18.
On Wed, May 8, 2024 at 03:44:12PM +0200, Peter Eisentraut wrote: > On 08.05.24 13:39, Sriram RK wrote: > > We would like to understand your inputs/plans on reverting the changes > > for AIX. > > I think the ship has sailed for PG17. The way forward would be that you > submit a patch for new, modernized AIX support for PG18. Yes, I think we were clear that someone needs to review the reverted patch and figure out which parts are still needed, and why. We have no "plans" to restore support. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Hi Team, we have an update from the XLC team: the alignment issue is fixed,
and XLC has released the fix as part of 16.1.0.18. The PTF is available at the location below.
You can also find a link here:
https://www.ibm.com/support/pages/fix-list-xl-cc-aix.
>>/opt/IBM/xlC/16.1.0/bin/xlC align.c -o align.xl
>>./align.xl
al4096 4096 @ 0x20008000 (mod 0)
al4096_initialized 4096 @ 0x20004000 (mod 0)
al4096_const 4096 @ 0x2000b000 (mod 0)
al4096_const_initialized 4096 @ 0x10008000 (mod 0)
al4096_static 4096 @ 0x2000e000 (mod 0)
al4096_static_initialized 4096 @ 0x20001000 (mod 0)
al4096_static_const 4096 @ 0x20011000 (mod 0)
al4096_static_const_initialized 4096 @ 0x10001000 (mod 0)
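For anyone who wants to reproduce this kind of check, here is a minimal sketch of what an align.c-style test might look like. The actual align.c was not posted, so the set of variables and the helper names addr_mod(), report() and check_all() are hypothetical. Each object requests 4096-byte alignment with __attribute__((aligned(4096))), the same kind of request PostgreSQL makes for PGIOAlignedBlock, and the check reports the address modulo 4096 (0 means the request was honored). It assumes a GCC-compatible compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define IO_ALIGN 4096

/* Objects in different storage classes, each requesting 4096-byte alignment. */
char		al4096[IO_ALIGN] __attribute__((aligned(IO_ALIGN)));
char		al4096_initialized[IO_ALIGN] __attribute__((aligned(IO_ALIGN))) = {1};
const char	al4096_const_initialized[IO_ALIGN] __attribute__((aligned(IO_ALIGN))) = {1};
static char al4096_static[IO_ALIGN] __attribute__((aligned(IO_ALIGN)));
static const char al4096_static_const_initialized[IO_ALIGN] __attribute__((aligned(IO_ALIGN))) = {1};

/* Remainder of an address modulo the requested alignment; 0 means aligned. */
static unsigned long
addr_mod(const void *p, unsigned long align)
{
	return (unsigned long) ((uintptr_t) p % align);
}

/* Print one line in the same shape as the output above. */
static void
report(const char *name, const void *p)
{
	printf("%s %d @ %p (mod %lu)\n", name, IO_ALIGN, p, addr_mod(p, IO_ALIGN));
}

/* Report every object and return how many are misaligned. */
static int
check_all(void)
{
	const struct
	{
		const char *name;
		const void *p;
	}			objs[] = {
		{"al4096", al4096},
		{"al4096_initialized", al4096_initialized},
		{"al4096_const_initialized", al4096_const_initialized},
		{"al4096_static", al4096_static},
		{"al4096_static_const_initialized", al4096_static_const_initialized},
	};
	int			bad = 0;

	for (size_t i = 0; i < sizeof(objs) / sizeof(objs[0]); i++)
	{
		report(objs[i].name, objs[i].p);
		if (addr_mod(objs[i].p, IO_ALIGN) != 0)
			bad++;
	}
	return bad;
}
```

check_all() prints one line per object and returns the number of misaligned objects; a non-zero result with an older xlc would point to the alignment bug that the PTF is reported to fix.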
We would also like some information about the request we raised for buildfarm access to register the node in the OSU lab. Where can I check the status of the request, and whom can I contact to get it approved, so that we can add the node to the buildfarm?
Regards,
Sriram.
On Wed, May 15, 2024 at 03:33:25PM +0000, Sriram RK wrote: > Hi Team, we have any updated from the XLC team, the issue specific to the alignment is fixed > and XLC had released it as part of 16.1.0.18. The PTF is available at the below location, > > You can also find a link here: > https://www.ibm.com/support/pages/fix-list-xl-cc-aix. > > >>/opt/IBM/xlC/16.1.0/bin/xlC align.c -o align.xl > > >>./align.xl > al4096 4096 @ 0x20008000 (mod 0) > al4096_initialized 4096 @ 0x20004000 (mod 0) > al4096_const 4096 @ 0x2000b000 (mod 0) > al4096_const_initialized 4096 @ 0x10008000 (mod 0) > al4096_static 4096 @ 0x2000e000 (mod 0) > al4096_static_initialized 4096 @ 0x20001000 (mod 0) > al4096_static_const 4096 @ 0x20011000 (mod 0) > al4096_static_const_initialized 4096 @ 0x10001000 (mod 0) That is good news. PGIOAlignedBlock is now in the IBM publication, https://www.ibm.com/support/pages/apar/IJ51032 > Also would like to know some info related to the request raised for buildfarm access, to register the node in OSU lab. Where can I get the status of the request? Whom can I contact to get the request approved? So that we can add the node to the buildfarm. I assume you filled out the form at https://buildfarm.postgresql.org/cgi-bin/register-form.pl? It can take a few weeks, so I wouldn't worry yet.
> > Also would like to know some info related to the request raised for buildfarm access, to register the
> > node in OSU lab. Where can I get the status of the request? Whom can I contact to get the request
> > approved? So that we can add the node to the buildfarm.
> I assume you filled out the form at
> https://buildfarm.postgresql.org/cgi-bin/register-form.pl? It can take a few
> weeks, so I wouldn't worry yet.
Thanks Noah, I submitted the form a week ago; I expect it may take another couple of weeks to be approved.
Hi Team,
We have an update regarding the PG17 AIX port.
We have reverted the removal of the AIX-specific changes (commit 0b16bb8776bb8) on top of the latest PG17 (HEAD).
The buildfarm run succeeded with these changes; all the tests passed.
System config
OS level : AIX-73D
Compiler : gcc-12 & xlc(16.1.0.18)
Wed May 15 21:26:00 2024: buildfarm run for AIXnode01:HEAD starting
AIXnode01:HEAD [21:26:00] running configure ...
AIXnode01:HEAD [21:27:03] running build ...
AIXnode01:HEAD [21:27:27] running basic regression tests ...
AIXnode01:HEAD [21:34:41] running make contrib ...
AIXnode01:HEAD [21:34:43] running make testmodules ...
AIXnode01:HEAD [21:34:44] running install ...
AIXnode01:HEAD [21:34:58] running make contrib install ...
AIXnode01:HEAD [21:35:05] running testmodules install ...
AIXnode01:HEAD [21:35:08] checking pg_upgrade
AIXnode01:HEAD [21:35:08] checking test-decoding
AIXnode01:HEAD [21:35:29] running make check miscellaneous modules ...
AIXnode01:HEAD [21:36:16] setting up db cluster (C)...
AIXnode01:HEAD [21:36:19] starting db (C)...
AIXnode01:HEAD [21:36:19] running installcheck (C)...
AIXnode01:HEAD [21:46:27] restarting db (C)...
AIXnode01:HEAD [21:46:29] running make isolation check ...
AIXnode01:HEAD [21:49:57] restarting db (C)...
AIXnode01:HEAD [21:50:02] running make PL installcheck (C)...
AIXnode01:HEAD [21:50:09] restarting db (C)...
AIXnode01:HEAD [21:50:12] running make contrib installcheck (C)...
AIXnode01:HEAD [21:53:53] restarting db (C)...
AIXnode01:HEAD [21:53:56] running make test-modules installcheck (C)...
AIXnode01:HEAD [21:54:28] stopping db (C)...
AIXnode01:HEAD [21:54:29] running make ecpg check ...
AIXnode01:HEAD [21:54:45] OK
Branch: HEAD
All stages succeeded
The changes below are applied on top of this commit:
commit 54b69f1bd730a228a666441592a12d2a0cbe2a06 (HEAD -> pgAIX, origin/master, origin/HEAD, master)
On branch pgAIX
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: src/backend/port/aix/mkldexport.sh
new file: src/include/port/aix.h
new file: src/makefiles/Makefile.aix
new file: src/template/aix
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: Makefile
modified: config/c-compiler.m4
modified: configure
modified: configure.ac
modified: doc/src/sgml/dfunc.sgml
modified: doc/src/sgml/installation.sgml
modified: doc/src/sgml/runtime.sgml
modified: meson.build
modified: src/Makefile.shlib
modified: src/backend/Makefile
modified: src/backend/meson.build
modified: src/backend/storage/buffer/bufmgr.c
modified: src/backend/utils/error/elog.c
modified: src/backend/utils/misc/ps_status.c
modified: src/bin/pg_basebackup/t/010_pg_basebackup.pl
modified: src/bin/pg_verifybackup/t/008_untar.pl
modified: src/bin/pg_verifybackup/t/010_client_untar.pl
modified: src/include/c.h
modified: src/include/port/atomics.h
modified: src/include/storage/s_lock.h
modified: src/interfaces/libpq/Makefile
modified: src/interfaces/libpq/meson.build
modified: src/port/README
modified: src/port/strerror.c
modified: src/test/regress/Makefile
modified: src/test/regress/expected/sanity_check.out
modified: src/test/regress/expected/test_setup.out
modified: src/test/regress/regress.c
modified: src/test/regress/sql/sanity_check.sql
modified: src/test/regress/sql/test_setup.sql
modified: src/tools/gen_export.pl
modified: src/tools/pginclude/headerscheck
Can you please let us know the process for posting the changes for review?
Regards,
Sriram.
On 2024-May-16, Sriram RK wrote: > Hi Team, > > We have an update wrt to the PG17 AIX port. > > We have reverted the changes specific to AIX (that were removed in 0b16bb8776bb8) to the latest PG17 (head). > > The Buildfarm succeeded for these changes. All the tests passed. Excellent. > Can you please let us know, the process to post the changes for review? Here's some very good advice https://postgr.es/m/20240405172649.d1@rfd.leadboat.com Regards -- Álvaro Herrera
Thanks Alvaro, for the info…
Hi Team,
We followed the links below to prepare this patch:
https://wiki.postgresql.org/wiki/Submitting_a_Patch
https://peter.eisentraut.org/blog/2023/05/09/how-to-submit-a-patch-by-email-2023-edition
Please find the attached patch.
Apart from the AIX-specific changes, there is one minor change related to XLC in the file below: we removed "inline" because of the error shown.
After that, the build and tests passed with both XLC (16.1.0.18) and gcc (12).
src/backend/storage/buffer/bufmgr.c
"bufmgr.c", line 811.39: 1506-780 (S) Reference to "RelationGetSmgr" with internal linkage is not allowed within inline definition of "ReadBufferExtended".
"bufmgr.c", line 811.15: 1506-780 (S) Reference to "ReadBuffer_common" with internal linkage is not allowed within inline definition of "ReadBufferExtended".
gmake[4]: *** [<builtin>: bufmgr.o] Error 1
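xlc appears to be enforcing the C99 constraint (6.7.4p3) that an inline definition of a function with external linkage must not reference an identifier with internal linkage. The sketch below uses hypothetical stand-ins, read_common() for ReadBuffer_common() and read_extended() for ReadBufferExtended(), to show the workaround the patch applied: with "inline" removed, the caller is an ordinary external definition and may call a static helper.

```c
#include <assert.h>

/* Hypothetical stand-in for ReadBuffer_common(): internal linkage. */
static int
read_common(int blockno)
{
	return blockno * 2;
}

/*
 * Hypothetical stand-in for ReadBufferExtended().  Written as
 *     inline int read_extended(int blockno) { ... }
 * xlc 16.1 reports error 1506-780, because the body refers to
 * read_common(), which has internal linkage.  Dropping "inline",
 * as the patch does, makes this a plain external definition.
 */
int
read_extended(int blockno)
{
	return read_common(blockno) + 1;
}
```

An alternative would be to keep the inlining but give the caller internal linkage as well (static inline), which is only possible if no other translation unit needs to call it.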
Please let us know your feedback.
Thanks,
Sriram.
Attachment
On 22.05.24 18:15, Sriram RK wrote: > Please find the attached patch. > > Apart from the AIX specific changes, there is a minor change in this > file wrt to XLC, below is the error for which we removed inline. > > Later, the build and tests passed for both XLC(16.1.0.18) and gcc(12) as > well. I think what you should do next is aggressively trim anything that does not apply to current versions of AIX or the current compiler. For example, + # Old xlc versions (<13.1) don't have support for -qvisibility. Use expfull to force + <para> + <productname>AIX</productname> versions before 7.1 are no longer + tested nor supported by the <productname>PostgreSQL</productname> + community. + </para> (Probably most of that section needs to be retested and rewritten.) + # Native memset() is faster, tested on: + # - AIX 5.1 and 5.2, XLC 6.0 (IBM's cc) + # - AIX 5.3 ML3, gcc 4.0.1 + memset_loop_limit = 0 + # for the base executable (AIX 4.2 and up) + * "IBM XL C/C++ for AIX, V12.1" miscompiles, for 32-bit, some inline One of the reasons that the AIX port ultimately became unmaintainable was that so many hacks and caveats were accumulated over the years. A new port should set a more recent baseline and trim all those hacks.
Hi Peter, thanks for your feedback.
We are eager to help resolve the issues specific to AIX and the corresponding
compilers (XLC and Clang).
However, since there are no issues with the patch after reverting the changes (with the latest compilers,
gcc 12 and xlc 16.1.0.18), we were wondering whether this patch could be merged into the current release, 17.
That said, we are committed to resolving the hacks and caveats specific to AIX that have
accumulated over time, picking and resolving them one after another,
rather than waiting for all the hacks to be fixed first.
> One of the reasons that the AIX port ultimately became unmaintainable
> was that so many hacks and caveats were accumulated over the years. A
> new port should set a more recent baseline and trim all those hacks.
Please help me understand this: with respect to the AIX-specific hacks, should we look for
all the locations where the _AIX macro is used, or can we look only at the changes in the patch, since all
of those changes are specific to AIX? If neither, is there another place where
we could find all the hacks that need to be resolved?
Can you provide some more details on the expectations here?
Warm regards,
Sriram.
On 23/05/2024 18:36, Srirama Kucherlapati wrote: > Hi Peter, thanks for your feedback. > > We are eager to extend our support in resolving the issues specific > to AIX or corresponding compilers (XLC and cLang). > > But as there are no issues with the patch after reverting the > changes(with the latest compilers gcc12 and xlc-16.0.1.18), we were > wondering if this patch can be merged with the current release 17?? > > Having said that, we are committed to resolve all the hacks and > caveats that got accumulated specific to AIX over the period by > picking and resolving one after the other, rather than waiting for > all the hacks to be fixed. I'm not eager to put back those hacks just to have them be removed again. So I'd like to see a minimal patch, with the *minimal* changes required for AIX support. And perhaps split that into two patches: First add back AIX support with GCC, and second patch to add XLC support. I'd like to see how much of the changes are because of the different compiler and how much from the OS. No promises for v17, but if the patch is small and non-intrusive, I would consider it at least. But let's see what it looks like first. It's the same work that needs to be done whether it goes into v17 or v18 anyway. >> One of the reasons that the AIX port ultimately became >> unmaintainable was that so many hacks and caveats were accumulated >> over the years. A new port should set a more recent baseline and >> trim all those hacks. > > Please help me understand this, with respect to the AIX specific > hacks, is it just we can find all the location where _AIX macros are > involved OR can we just look at the patch changes only, as all the > changes that were made were specific to AIX. If not, is there any > other location where we could find all the hacks to be resolved. > > Can you provide some more details on the expectations here? Smallest possible patch that makes Postgres work on AIX again. 
Perhaps start with the patch you posted yesterday, but remove hunks from it one by one, to see which ones are still needed. -- Heikki Linnakangas Neon (https://neon.tech)
On Thu, May 23, 2024 at 07:03:20PM +0300, Heikki Linnakangas wrote: > > Can you provide some more details on the expectations here? > > Smallest possible patch that makes Postgres work on AIX again. > > Perhaps start with the patch you posted yesterday, but remove hunks from it > one by one, to see which ones are still needed. Yes, bingo, that is exactly what needs to be done, and for the minimal compiler, gcc, and the most recently supported versions of AIX. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Hi Team, we are working on trimming the AIX changes. So far we have trimmed
the XLC-specific changes, and with those changes trimmed the
buildfarm script passed (build and all the regression tests).
The XLC changes were trimmed only in the files below:
modified: configure
modified: configure.ac
We are looking further into the changes to the other files as well.
Warm regards,
Sriram.
On Fri, 2024-06-07 at 16:30 +0000, Srirama Kucherlapati wrote: > Hi Team, We are pursuing to trim the changes wrt AIX. As of now we trimmed > the changes with respect to XLC and currently with trimmed changes the > buildfarm script passed (build and all the regression tests) > The XLC changes were trimmed only in the below file > modified: configure > modified: configure.ac Did you forget an attachment? Yours, Laurenz Albe
Hi Laurenz, we are still working on the changes to the other files and will post an update soon.
Warm regards,
Sriram.
Hi Team,
Please find the attached patch, which restores AIX support with gcc only. We have
removed the XLC-specific changes for AIX.
We are also continuing to work on a patch specific to XLC and IBM-clang (openXLC).
Once the above patch is approved, we can submit a follow-up patch to support
XLC/IBM-clang.
Kindly let us know your input/feedback.
Warm regards,
Sriram.
Attachment
On 19/06/2024 17:55, Srirama Kucherlapati wrote: > +/* Commenting for XLC > + * "IBM XL C/C++ for AIX, V12.1" miscompiles, for 32-bit, some inline > + * expansions of ginCompareItemPointers() "long long" arithmetic. To take > + * advantage of inlining, build a 64-bit PostgreSQL. > +#if defined(__ILP32__) && defined(__IBMC__) > +#define PG_FORCE_DISABLE_INLINE > +#endif > + */ This seems irrelevant. > + * Ordinarily, we'd code the branches here using GNU-style local symbols, that > + * is "1f" referencing "1:" and so on. But some people run gcc on AIX with > + * IBM's assembler as backend, and IBM's assembler doesn't do local symbols. > + * So hand-code the branch offsets; fortunately, all PPC instructions are > + * exactly 4 bytes each, so it's not too hard to count. Could you use GCC assembler to avoid this? > @@ -662,6 +666,21 @@ tas(volatile slock_t *lock) > > #if !defined(HAS_TEST_AND_SET) /* We didn't trigger above, let's try here */ > > +#if defined(_AIX) /* AIX */ > +/* > + * AIX (POWER) > + */ > +#define HAS_TEST_AND_SET > + > +#include <sys/atomic_op.h> > + > +typedef int slock_t; > + > +#define TAS(lock) _check_lock((slock_t *) (lock), 0, 1) > +#define S_UNLOCK(lock) _clear_lock((slock_t *) (lock), 0) > +#endif /* _AIX */ > + > + > /* These are in sunstudio_(sparc|x86).s */ > > #if defined(__SUNPRO_C) && (defined(__i386) || defined(__x86_64__) || defined(__sparc__) || defined(__sparc)) What CPU/compiler/OS configuration is this for, exactly? Could we rely on GCC-provided __sync_lock_test_and_set() builtin function instead? > +# Allow platforms with buggy compilers to force restrict to not be > +# used by setting $FORCE_DISABLE_RESTRICT=yes in the relevant > +# template. Surely we don't need that anymore? Or is the compiler still buggy? Do you still care about 32-bit binaries on AIX? If not, let's make that the default in configure or a check for it, and remove the instructions on building 32-bit binaries from the docs. 
Please try hard to remove any changes from the diff that are not absolutely necessary. - Heikki
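For comparison with the question about the GCC builtin, here is a minimal sketch of a test-and-set spinlock built on GCC's __sync_lock_test_and_set() and __sync_lock_release(), similar in shape to the existing HAVE_GCC__SYNC_INT32_TAS path in s_lock.h. The function names are hypothetical, and unlike PostgreSQL's real s_lock() there is no backoff or spin-delay logic.

```c
#include <assert.h>

typedef int slock_t;

/*
 * Atomically store 1 and return the previous value (acquire barrier).
 * 0 means the caller now holds the lock.
 */
static inline int
tas(volatile slock_t *lock)
{
	return __sync_lock_test_and_set(lock, 1);
}

/* Store 0 with release semantics. */
static inline void
s_unlock(volatile slock_t *lock)
{
	__sync_lock_release(lock);
}

/* Busy-wait until the lock is acquired (no backoff). */
static void
s_lock_spin(volatile slock_t *lock)
{
	while (tas(lock))
		;
}
```

Because tas() returns the previous value, it follows the same "0 means acquired" convention that TAS() uses throughout s_lock.h, and the compiler chooses the POWER instruction sequence itself.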
Thanks, Heikki, for going through the changes.
> +/* Commenting for XLC
> + * "IBM XL C/C++ for AIX, V12.1" miscompiles, for 32-bit, some inline
> + * expansions of ginCompareItemPointers() "long long" arithmetic. To take
> + * advantage of inlining, build a 64-bit PostgreSQL.
> +#if defined(__ILP32__) && defined(__IBMC__)
> +#define PG_FORCE_DISABLE_INLINE
> +#endif
> + */
I can remove these unneeded comments.
I need to analyze the changes for the rest of your comments and will get back to you.
Warm regards,
Sriram.
We are continuing to work on the changes…
> Do you still care about 32-bit binaries on AIX? If not, let's make that
> the default in configure or a check for it, and remove the instructions
> on building 32-bit binaries from the docs.
As most products are moving towards 64-bit, we will try to remove the
32-bit support.
Warm regards,
Sriram.
Hi Heikki & Team,
Together with our team, I looked at the assembly code changes in the file below.
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 29ac6cdcd9..69582f4ae7 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
static __inline__ int
tas(volatile slock_t *lock)
@@ -424,17 +430,15 @@ tas(volatile slock_t *lock)
__asm__ __volatile__(
" lwarx %0,0,%3,1 \n"
" cmpwi %0,0 \n"
" bne $+16 \n" /* branch to li %1,1 */
" addi %0,%0,1 \n"
" stwcx. %0,0,%3 \n"
" beq $+12 \n" /* branch to lwsync */
" li %1,1 \n"
" b $+12 \n" /* branch to end of asm sequence */
" lwsync \n"
" li %1,0 \n"
: "=&b"(_t), "=r"(_res), "+m"(*lock)
: "r"(lock)
: "memory", "cc");
The code in the above file is very specific to the POWER architecture, so we need to use the IBM POWER-specific asm code rather than the GNU assembler. Also, all of this asm-specific code is under the __ppc__ macro, so it should not affect any other platform. I see there is also a GCC-specific implementation in the same file (under #if defined(HAVE_GCC__SYNC_INT32_TAS)).
+#define TAS(lock) _check_lock((slock_t *) (lock), 0, 1)
+#define S_UNLOCK(lock) _clear_lock((slock_t *) (lock), 0)
The above changes are specific to the AIX kernel and operate on fixed kernel memory. This is more like compare-and-swap functionality with synchronization. For all of the assembly code, I think it would be better to use the IBM POWER-specific asm code to gain additional performance.
Regarding both assembly changes, I was trying to understand whether you are looking for anything specific to the architecture.
Attached is the patch addressing the previous comments; please let me know your feedback.
Warm regards,
Sriram.
On 14/08/2024 06:31, Srirama Kucherlapati wrote: > Hi Heikki & Team, > > I tried to look at the assembly code changes with our team, in the below > file. > > diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h > index 29ac6cdcd9..69582f4ae7 100644 > --- a/src/include/storage/s_lock.h > +++ b/src/include/storage/s_lock.h > static __inline__ int > tas(volatile slock_t *lock) > @@ -424,17 +430,15 @@ tas(volatile slock_t *lock) > __asm__ __volatile__( > " lwarx %0,0,%3,1 \n" > " cmpwi %0,0 \n" > " bne $+16 \n" /* branch to li > %1,1 */ > " addi %0,%0,1 \n" > " stwcx. %0,0,%3 \n" > " beq $+12 \n" /* branch to > lwsync */ > " li %1,1 \n" > " b $+12 \n" /* branch to end > of asm sequence */ > " lwsync \n" > " li %1,0 \n" > : "=&b"(_t), "=r"(_res), "+m"(*lock) > : "r"(lock) > : "memory", "cc"); > > For the changes in the above file, this code is very specific to power > architecture we need to use the IBM Power specific asm code only, rather > than using the GNU assembler. Also, all these asm specific code is under > the macro __ppc__, which should not impact any other platforms. I see > there is a GCC specific implementation (under this macro #if > defined(HAVE_GCC__SYNC_INT32_TAS)) in the same file as well. I'm sorry, I don't understand what you're saying here. Do you mean that we don't need to do anything here, and the code we have in s_lock.h in 'master' now will work fine on AIX? Or do we need to (re-)do some changes to support AIX again? If we only support GCC, can we use the __sync_lock_test_and_set() builtin instead? If any changes are required, please include them in the patch. That'll make it clear what exactly you're proposing. > +#define TAS(lock) _check_lock((slock_t *) (lock), > 0, 1) > > +#define S_UNLOCK(lock) _clear_lock((slock_t *) (lock), 0) > > The above changes are specific to AIX kernel and it operates on fixed > kernel memory. This is more like a compare_and_swap functionality with > sync capability. 
For all the assemble code I think it would be better to > use the IBM Power specific asm code to gain additional performance. You mean we don't need the above? Ok, good. > I was trying to understand here wrt to both the assemble changes if you > are looking for anything specific to the architecture. I don't know. You tell me what makes most sense on AIX / powerpc. > Attached is the patch for the previous comments, kindly please let me > know your comments. Is this all that's needed to resurrect AIX support? -- Heikki Linnakangas Neon (https://neon.tech)
On 14/08/2024 18:22, Srirama Kucherlapati wrote: > Hi Heikki, > > I have attached the merged patch with all the changes, the earlier patch was > > just only the changes specific to older review comments. > > > I'm sorry, I don't understand what you're saying here. Do you > mean that > > we don't need to do anything here, and the code we have in > s_lock.h in > > 'master' now will work fine on AIX? Or do we need to (re-)do some > > changes to support AIX again? If we only support GCC, can we use the > > __sync_lock_test_and_set() builtin instead? > > Here we need these changes for ppc. These changes are not for > enabling the AIX support, but this is implementing “Enhanced PowerPC > Architecture”. This routine is more of compare_and_increment, which > is different from GCC __sync_lock_test_and_set(). Also I tried to > write a sample function to check the assemble generated by > __sync_lock_test_and_set(), which turned out to be different set of > assemble code. I still don't understand. We have Linux powerpc systems running happily in the buildfarm. They are happy with the current spinlock implementation. Why is this needed? What happens without it? How is this different from __sync_lock_test_and_set()? Is the __sync_lock_test_and_set() on that platform broken, less efficient, or just different but equally good? > > > +#define TAS(lock) _check_lock((slock_t *) > (lock), > > > 0, 1) > > > > >> +#define S_UNLOCK(lock) _clear_lock((slock_t *) (lock), 0) > >> > > > The above changes are specific to AIX kernel and it operates on > fixed > > > kernel memory. This is more like a compare_and_swap > functionality with > > > sync capability. For all the assemble code I think it would be > better to > > > use the IBM Power specific asm code to gain additional performance. > > > You mean we don't need the above? Ok, good. > > I mean this part of the code is needed as this is specific to AIX > kernel memory operation which is different from > __sync_lock_test_and_set(). 
How is it different from __sync_lock_test_and_set()? Why is it needed? What is AIX kernel memory operation? > I would like to mention that the changes made in > src/include/storage/s_lock.h > > are pretty much required and need to be operated in assemble specific to IBM > Power architecture. Note that your patch both modifies the existing powerpc implementation, and introduces a new AIX-specific one. They cannot *both* be required, because only one of them will ever be compiled on a given platform. Which is it? Or are you trying to make this work on multiple different CPUs on AIX, so that different implementation gets chosen on different CPUs? Is the mkldexport stuff still needed on modern AIX? Or was it specific to XLC and never needed on GCC? How do other products do that? On a general note: it's your responsibility to explain all the changes in a way that others will understand and can verify. It is especially important for something critical and platform-specific like spinlocks, because others don't have easy access to the hardware to test these things independently. I also want to remind you that from the Postgres community point of view, you are introducing support for a new platform, AIX, not merely resurrecting the old stuff. Every change needs to be justified anew. -- Heikki Linnakangas Neon (https://neon.tech)
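One way to frame the "how is it different" question: the PowerPC asm only stores when the lock word is 0 (compare-and-set), while `__sync_lock_test_and_set()` always stores 1 (exchange). The non-atomic models below, with hypothetical names and usable only for reasoning, check that for a lock word that is only ever 0 or 1 the two give the same result and leave the same final state. Any difference between them would then be in performance (for example, the extra store when the lock is busy), not in correctness.

```c
#include <assert.h>

/*
 * Non-atomic models (for reasoning only, NOT usable as locks) of the two
 * TAS flavours discussed in this thread.  Both return 0 when the lock was
 * acquired and nonzero when it was busy.
 */

/* Like the lwarx/cmpwi/stwcx. sequence: store old+1 only if the word was 0. */
static int
model_tas_compare(int *word)
{
    int old = *word;

    if (old != 0)
        return 1;           /* busy: no store, word left unchanged */
    *word = old + 1;        /* the asm does "addi %0,%0,1" then stwcx. */
    return 0;
}

/* Like __sync_lock_test_and_set(word, 1): always store 1, return old value. */
static int
model_tas_exchange(int *word)
{
    int old = *word;

    *word = 1;
    return old;
}

/* Do both models report the same outcome and leave the same state? */
static int
models_agree(int initial)
{
    int a = initial;
    int b = initial;
    int ra = model_tas_compare(&a);
    int rb = model_tas_exchange(&b);

    assert(initial == 0 || initial == 1);   /* a spinlock word is 0 or 1 */
    return (ra != 0) == (rb != 0) && a == b;
}
```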
On 11/09/2024 15:38, Srirama Kucherlapati wrote: >> I still don't understand. We have Linux powerpc systems running >> happily in the buildfarm. They are happy with the current spinlock >> implementation. Why is this needed? What happens without it? > > Not sure, by the time the below commits were made if there was a > consideration to use the gcc routines. The PPC asm code was originally written in 2002, and the first use of __sync_lock_test_and_set(), for ARM, appeared in 2012. The commit that made __sync_lock_test_and_set() be chosen automatically for platforms that don't specify anything else was added in 2022. > I tried to replace the AIX asm (under__ppc__ macro) with the gcc > routine __sync_lock_test_and_set(), and all the buildfarm tests > passed. Attached patch and the buildfarm output. Please let me know > your feedback. Ok, if we don't need the assembler code at all, that's good. A patch to introduce AIX support should not change it for non-AIX powerpc systems though. That might be a good change, but would need to be justified separately, e.g. by some performance testing, and should be a separate patch. If you make no changes to s_lock.h at all, will it work? Why not? You said earlier: > I mean this part of the code is needed as this is specific to AIX kernel memory > operation which is different from __sync_lock_test_and_set(). > > I would like to mention that the changes made in src/include/storage/s_lock.h > are pretty much required and need to be operated in assemble specific to IBM > Power architecture. Was that earlier statement incorrect? Is the manual wrong or outdated or not applicable to us? Moving on.. Do you still need mkldexport.sh? Surely there's a better way to do that in year 2024. Some quick googling says there's a '-bexpall' option to 'ld', which kind of sounds like what we want. Will that work? How do other programs do this? -- Heikki Linnakangas Neon (https://neon.tech)
On Thu, Sep 12, 2024 at 9:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > If you make no changes to s_lock.h at all, will it work? Why not? It's good to keep the work independent and I don't want to hold up anything happening in this thread, but just for information: I have been poking around at the idea of entirely removing the old spinlock code and pointing spin.h's function-like-macros to the atomics code. We couldn't do that before, because atomics were sometimes implemented with spinlocks, but now that pg_atomic_flag is never implemented with spinlocks we can flip that around, and then have only one place where we know how to do this stuff. What is needed for that to progress is, I guess, to determine through assembler analysis or experimentation across a bunch of targets that it works out at least as good... https://www.postgresql.org/message-id/flat/CA%2BhUKGJ%2BoA%2B62iUZ-EQb5R2cAOW3Y942ZoOtzOD%3D1sQ05iNg6Q%40mail.gmail.com#23598cafac3dd08ca94fa9e2228a4764
> The PPC asm code was originally written in 2002, and the first use of
> __sync_lock_test_and_set(), for ARM, appeared in 2012. The commit that
> made __sync_lock_test_and_set() be chosen automatically for platforms
> that don't specify anything else was added in 2022.
Thanks for the info.
------------------
> Ok, if we don't need the assembler code at all, that's good. A patch to
> introduce AIX support should not change it for non-AIX powerpc systems
> though. That might be a good change, but would need to be justified
> separately, e.g. by some performance testing, and should be a separate
> patch.
> If you make no changes to s_lock.h at all, will it work? Why not?
With the existing asm code, we hit some assembler syntax errors. After reverting
to the earlier hand-coded branch offsets, the errors went away. The diff is below.
static __inline__ int
tas(volatile slock_t *lock)
{
@@ -424,17 +413,15 @@ tas(volatile slock_t *lock)
__asm__ __volatile__(
" lwarx %0,0,%3,1 \n"
" cmpwi %0,0 \n"
-" bne 1f \n"
+" bne $+16 \n" /* branch to li %1,1 */
" addi %0,%0,1 \n"
" stwcx. %0,0,%3 \n"
-" beq 2f \n"
-"1: \n"
+" beq $+12 \n" /* branch to lwsync */
" li %1,1 \n"
-" b 3f \n"
-"2: \n"
+" b $+12 \n" /* branch to end of asm sequence */
" lwsync \n"
" li %1,0 \n"
-"3: \n"
+
: "=&b"(_t), "=r"(_res), "+m"(*lock)
: "r"(lock)
: "memory", "cc");
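Since every PowerPC instruction is exactly 4 bytes, the hand-coded `$+N` targets can be checked mechanically. The sketch below (a hypothetical table, not part of s_lock.h) encodes the ten instructions of the `tas()` sequence and computes where each branch lands, assuming `$` denotes the address of the branch instruction itself, as the comments in the diff imply. The targets should be the same places the GNU-style labels `1:`, `2:` and `3:` marked.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical table of the tas() sequence from the diff above, one entry
 * per 4-byte PowerPC instruction.  rel_bytes is the "$+N" displacement of
 * a branch, or 0 if the instruction is not a branch.
 */
struct insn
{
    const char *text;
    int         rel_bytes;
};

static const struct insn tas_seq[] = {
    {"lwarx %0,0,%3,1", 0},     /* 0 */
    {"cmpwi %0,0", 0},          /* 1 */
    {"bne $+16", 16},           /* 2: was "bne 1f" */
    {"addi %0,%0,1", 0},        /* 3 */
    {"stwcx. %0,0,%3", 0},      /* 4 */
    {"beq $+12", 12},           /* 5: was "beq 2f" */
    {"li %1,1", 0},             /* 6: was label 1: */
    {"b $+12", 12},             /* 7: was "b 3f" */
    {"lwsync", 0},              /* 8: was label 2: */
    {"li %1,0", 0},             /* 9; label 3: followed this */
};

#define N_INSNS (sizeof(tas_seq) / sizeof(tas_seq[0]))
#define INSN_BYTES 4            /* every PowerPC instruction is 4 bytes */

/*
 * Index of the instruction the branch at index i lands on; a result of
 * N_INSNS means "just past the end of the asm sequence".
 */
static size_t
branch_target(size_t i)
{
    assert(i < N_INSNS);
    assert(tas_seq[i].rel_bytes > 0 && tas_seq[i].rel_bytes % INSN_BYTES == 0);
    return i + (size_t) (tas_seq[i].rel_bytes / INSN_BYTES);
}
```

`branch_target(2)` gives 6 (`li %1,1`), `branch_target(5)` gives 8 (`lwsync`), and `branch_target(7)` gives 10, one past the last instruction, i.e. the end of the sequence. The weakness of this style is that inserting or removing an instruction silently breaks the offsets, which is why local labels are preferred where the assembler supports them.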
Please let me know if I should run any performance tools to measure the impact
of the __sync_lock_test_and_set() change.
---------------
> > I mean this part of the code is needed as this is specific to AIX kernel memory
> > operation which is different from __sync_lock_test_and_set().
> >
> > I would like to mention that the changes made in src/include/storage/s_lock.h
> > are pretty much required and need to be operated in assemble specific to IBM
> > Power architecture.
> Was that earlier statement incorrect? Is the manual wrong or outdated or
> not applicable to us?
That change is specific to AIX, but since we are compiling with gcc, it does not
apply here. I will try the __sync* routines and check.
---------------
> Do you still need mkldexport.sh? Surely there's a better way to do that
> in year 2024. Some quick googling says there's a '-bexpall' option to
> 'ld', which kind of sounds like what we want. Will that work? How do
> other programs do this?
Thanks for looking into this. I'm working on it and will let you know.
Thanks,
Sriram.
> Do you still need mkldexport.sh? Surely there's a better way to do that
> in year 2024. Some quick googling says there's a '-bexpall' option to
> 'ld', which kind of sounds like what we want. Will that work? How do
> other programs do this?
We have noticed a couple of caveats with the -bexpall/-bexpfull flags in other
open-source tools on AIX. These options export too many symbols, which causes
problems: a shared library may re-export symbols from another library, leading
to confused dependencies and duplicate symbols.
There is a similar discussion about these flags in CMake:
https://gitlab.kitware.com/cmake/cmake/-/issues/19163
I also tried a sample program to verify this:
>> cat foo.c
#include <stdio.h>
#include <string.h>
int func1()
{
char str1[32] = "Hello ", str2[] = "world! "; /* str1 must have room for the concatenated string */
strcat(str1,str2);
puts(str1);
return 0;
}
>> gcc -c foo.c -o foo.o
>> gcc -shared -Wl,-bexpall -o foo.so foo.o
>> dump -Tov foo.so
foo.so:
***Object Module Header***
# Sections Symbol Ptr # Symbols Opt Hdr Len Flags
4 0x00000d88 120 72 0x3002
Flags=( EXEC DYNLOAD SHROBJ DEP_SYSTEM )
Timestamp = "Sep 17 10:17:35 2024"
Magic = 0x1df (32-bit XCOFF)
***Optional Header***
Tsize Dsize Bsize Tstart Dstart
0x00000548 0x0000010c 0x00000004 0x10000128 0x20000670
SNloader SNentry SNtext SNtoc SNdata
0x0004 0x0000 0x0001 0x0002 0x0002
TXTalign DATAalign TOC vstamp entry
0x0005 0x0004 0x20000750 0x0001 0xffffffff
maxSTACK maxDATA SNbss magic modtype
0x00000000 0x00000000 0x0003 0x010b RE
***Loader Section***
***Loader Symbol Table Information***
[Index] Value Scn IMEX Sclass Type IMPid Name
[0] 0x2000068c .data RW SECdef [noIMid] __rtinit
[1] 0x00000000 undef IMP DS EXTref libgcc_s.a(shr.o) __cxa_finalize
[2] 0x00000000 undef IMP DS EXTref libgcc_s.a(shr.o) _GLOBAL__AIXI_shr_o
[3] 0x00000000 undef IMP DS EXTref libgcc_s.a(shr.o) _GLOBAL__AIXD_shr_o
[4] 0x00000000 undef IMP DS EXTref libc.a(shr.o) __strtollmax
[5] 0x00000000 undef IMP DS EXTref libc.a(shr.o) puts
[6] 0x200006f4 .data EXP DS Ldef [noIMid] __init_aix_libgcc_cxa_atexit
[7] 0x20000724 .data EXP DS Ldef [noIMid] _GLOBAL__AIXI_foo_so
[8] 0x20000730 .data EXP DS Ldef [noIMid] _GLOBAL__AIXD_foo_so
>> [9] 0x2000073c .data EXP DS SECdef [noIMid] strcat
[10] 0x20000744 .data EXP DS Ldef [noIMid] func1
The code uses strcat from libc, but because of -bexpall, foo.so re-exports that
symbol (entry [9] above).
Given this limitation of the -bexpall/-bexpfull flags, our approach for now is to
keep extracting the symbols from the object files and pass the resulting export
file when building the shared library, i.e. to keep using the mkldexport.sh
script to generate the exported symbols.
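The principle behind generating an export file, as opposed to `-bexpall`, can be sketched as a filter over symbol-table output: export only global symbols that our own object files define, and skip undefined references such as `strcat`. This is a hypothetical illustration, not the logic of mkldexport.sh. It assumes input lines that begin with the symbol name followed by a one-letter type, as in POSIX `nm -P` output (uppercase T/D/B for global text/data/bss, U for undefined), and it writes one symbol name per line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: decide whether one "name type ..." line describes a global
 * symbol defined in our own objects.  On success copies the name and
 * returns 1; undefined ('U') and local (lowercase) symbols are rejected.
 */
static int
is_exportable(const char *nm_line, char *name, size_t name_len)
{
    char sym[256];
    char type;

    if (sscanf(nm_line, "%255s %c", sym, &type) != 2)
        return 0;
    if (type != 'T' && type != 'D' && type != 'B')
        return 0;               /* undefined or local: do not re-export */
    if (strlen(sym) + 1 > name_len)
        return 0;
    strcpy(name, sym);
    return 1;
}

/* Write the export list for the given lines; returns the symbols written. */
static size_t
write_export_list(FILE *out, const char *const *lines, size_t n)
{
    char name[256];
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (is_exportable(lines[i], name, sizeof(name)))
        {
            fprintf(out, "%s\n", name);
            count++;
        }
    }
    return count;
}
```

With the foo.c example, `func1` would be exported and `strcat`/`puts` would not, which is the behaviour the export-file approach is meant to preserve.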
Thanks,
Sriram.
On 24/09/2024 14:25, Srirama Kucherlapati wrote: > Hi Heikki & team, > > Could you please let me know your comments on the previous details? > > Attached are the individual patches for AIX and gcc(__sync) routines. Repeating what I said earlier: > Ok, if we don't need the assembler code at all, that's good. A patch > to introduce AIX support should not change it for non-AIX powerpc > systems though. That might be a good change, but would need to be > justified separately, e.g. by some performance testing, and should > be a separate patch -- Heikki Linnakangas Neon (https://neon.tech)
On Tue, Sep 24, 2024 at 04:09:39PM +0300, Heikki Linnakangas wrote: > On 24/09/2024 14:25, Srirama Kucherlapati wrote: > > Hi Heikki & team, > > > > Could you please let me know your comments on the previous details? > > > > Attached are the individual patches for AIX and gcc(__sync) routines. > > Repeating what I said earlier: > > > Ok, if we don't need the assembler code at all, that's good. A patch > > to introduce AIX support should not change it for non-AIX powerpc > > systems though. That might be a good change, but would need to be > > justified separately, e.g. by some performance testing, and should > > be a separate patch Agreed. Srirama Kucherlapati, you seem to be doing the minimum amount of work and then repeatedly asking the same questions. I suggest you start to take this task more seriously. I would go back and read the entire thread, and take the things we have told you more seriously. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Hi Heikki,
As you requested earlier, we need to measure the performance of the __sync spinlock routines; I need some help from the Postgres side to identify a tool or test case for this.
I see the following benchmarking tools:
- Pgbench https://www.postgresql.org/docs/current/pgbench.html
- Pg_test_fsync https://www.postgresql.org/docs/current/pgtestfsync.html
- pg_test_timing https://www.postgresql.org/docs/current/pgtesttiming.html
Please let me know whether these tools are suitable, or suggest any additional tools we should use for benchmarking.
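pgbench measures whole transactions, so spinlock differences can be diluted. If a lower-level comparison is wanted, a micro-benchmark along these lines could complement it: several threads repeatedly take a test-and-set spinlock and increment a shared counter. This is a hypothetical sketch using POSIX threads and the GCC `__sync` builtins (compile with `-pthread`). To compare the asm variant, the two builtins in `worker()` would be replaced by the candidate `tas()`/unlock code; results from such a test do not necessarily carry over to PostgreSQL as a whole.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical parameters for the micro-benchmark. */
#define N_THREADS 4
#define ITERS_PER_THREAD 100000

static volatile int bench_lock = 0;
static long bench_counter = 0;      /* protected by bench_lock */

static void *
worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < ITERS_PER_THREAD; i++)
    {
        while (__sync_lock_test_and_set(&bench_lock, 1))
        {
            while (bench_lock)
                ;                   /* spin on a plain read until it looks free */
        }
        bench_counter++;
        __sync_lock_release(&bench_lock);
    }
    return NULL;
}

/* Run all threads; returns elapsed seconds and stores the final count. */
static double
run_bench(long *final_count)
{
    pthread_t th[N_THREADS];
    struct timespec t0, t1;

    bench_counter = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N_THREADS; i++)
    {
        int rc = pthread_create(&th[i], NULL, worker, NULL);

        assert(rc == 0);
        (void) rc;
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(th[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *final_count = bench_counter;
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

The final count doubles as a correctness check: if the lock ever failed to provide mutual exclusion, increments would be lost and the count would fall short of `N_THREADS * ITERS_PER_THREAD`.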
> > Ok, if we don't need the assembler code at all, that's good. A patch to
> > introduce AIX support should not change it for non-AIX powerpc systems
> > though. That might be a good change, but would need to be justified
> > separately, e.g. by some performance testing, and should be a separate
> > patch.
> > If you make no changes to s_lock.h at all, will it work? Why not?
> With the existing asm code I see there are some syntax errors, being hit.
> But after reverting the old changes the issues resolved. Below are diffs.
> Let me know if I need to run any perf tools to check the performance of
> the __sync_lock_test_and_set change.
Thanks,
Sriram.
Hi Heikki and team,
A few updates…
> > Ok, if we don't need the assembler code at all, that's good. A patch to
> > introduce AIX support should not change it for non-AIX powerpc systems
> > though. That might be a good change, but would need to be justified
> > separately, e.g. by some performance testing, and should be a separate
> > patch.
We ran pgbench with both patches on PowerPC Linux; the results are below, and
they are close.
PPCLE sync: the patch that implements the spinlock with the gcc __sync* routines.
PPCLE asm: the patch with the assembly code.
>> pgbench -c 100 -p 5432 -d postgres -T 180 -r -P 10 -L 10 -j 20
OS/type                                  PPCLE sync       PPCLE asm
---------------------------------------------------------------------
latency average (ms)                     136.257          138.552
latency stddev (ms)                      234.74           238.603
initial connection time (ms)             101.791          88.411
TPS (without initial connection time)    733.633924       721.440648
No. of transactions actually processed   132080           129893
No. of transactions above the            124235/132080    122183/129893
  10.0 ms latency limit                  (94.060%)        (94.064%)
---------------------------------------------------------------------
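Expressed as relative differences, using the asm patch as the baseline (computed from the table above):

```latex
\frac{733.633924 - 721.440648}{721.440648} \approx +1.69\%\ \text{(TPS)}, \qquad
\frac{136.257 - 138.552}{138.552} \approx -1.66\%\ \text{(latency average)}
```

Both differences are under 2% and slightly favour the __sync variant, but with a single 180-second run per variant these numbers alone cannot show whether the gap exceeds run-to-run noise.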
Please let us know your comments.
BTW we are working on the other review comments as well.
Warm regards,
Sriram.
Hi Team, here are a few updates.
We have now removed all the old changes and, building from scratch, made only the
changes that are actually required. We had a few issues with our hardware, so it
took a while to build the code.
Below are the changes made so far.
commit d2b4b4c2259e21ceaf05e393769b69728bfbee99 (HEAD -> master, origin/master, origin/HEAD)
>> git status
On branch master
Your branch is up to date with 'origin/master'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: src/backend/port/aix/mkldexport.sh
new file: src/include/port/aix.h
new file: src/makefiles/Makefile.aix
new file: src/template/aix
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: Makefile
modified: configure
modified: doc/src/sgml/dfunc.sgml
modified: src/Makefile.shlib
modified: src/backend/Makefile
modified: src/backend/port/aix/mkldexport.sh
modified: src/backend/utils/error/elog.c
modified: src/include/storage/s_lock.h
modified: src/port/strerror.c
modified: src/test/regress/expected/jsonb_jsonpath.out
So far, we see one issue with 'gmake check':
diff -U3 /home /postgres/src/test/regress/expected/jsonb_jsonpath.out /home/ /postgres/src/test/regress/results/jsonb_jsonpath.out
--- /home /postgres/src/test/regress/expected/jsonb_jsonpath.out 2024-10-22 02:30:00.545814423 -0500
+++ /home/ postgres/src/test/regress/results/jsonb_jsonpath.out 2024-11-12 03:31:52.251125056 -0600
@@ -2687,7 +2687,7 @@
select jsonb_path_query_tz('"12:34:56"', '$.time_tz().string()');
jsonb_path_query_tz
---------------------
- "12:34:56-07:00"
+ "12:34:56-08:00"
(1 row)

select jsonb_path_query('"12:34:56"', '$.time().string()');
After changing the expected output for this test case as shown below, the test
passed. We are still trying to find the root cause.
"12:34:56-08:00"
We are working on the buildfarm setup to check the actual tests.
I shall keep you posted.
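A plausible reading of the diff above: the expected output shows -07:00 (US Pacific daylight time), while the failing run on 2024-11-12 took place after US daylight saving time ended on 2024-11-03, when the Pacific offset is -08:00. The sketch below shows that date dependence with the POSIX `TZ=PST8PDT` setting. `pst8pdt_offset_hours()` is a hypothetical helper, and the sketch assumes the regression session uses US Pacific rules, as the two offsets suggest.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper: UTC offset in hours for 12:34:56 local time on the
 * given date (month is 1-12) in the PST8PDT zone.  Returns 0 on failure.
 */
static int
pst8pdt_offset_hours(int year, int month, int day)
{
    struct tm tm = {0};

    setenv("TZ", "PST8PDT", 1);
    tzset();

    tm.tm_year = year - 1900;
    tm.tm_mon = month - 1;
    tm.tm_mday = day;
    tm.tm_hour = 12;
    tm.tm_min = 34;
    tm.tm_sec = 56;
    tm.tm_isdst = -1;           /* let mktime() decide whether DST applies */

    if (mktime(&tm) == (time_t) -1)
        return 0;
    assert(tm.tm_isdst >= 0);
    /* PDT (daylight time) is UTC-7; PST (standard time) is UTC-8. */
    return tm.tm_isdst > 0 ? -7 : -8;
}
```

A test whose output depends on "today's" offset in a DST zone will therefore flip twice a year regardless of platform, which is consistent with this not being AIX-specific.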
Warm regards,
Sriram.
On Mon, Nov 25, 2024 at 11:27 AM Srirama Kucherlapati <sriram.rk@in.ibm.com> wrote: > After modifying the expected output for this testcase as below, the issue was > resolved and the test case passed. But we are trying to see the root cause of this. > > "12:34:56-08:00" This is not an AIX-specific issue. It was fixed in commit af21152268317323480caa790c4a6347110f8085, committed October 30th. -- Robert Haas EDB: http://www.enterprisedb.com
>> This is not an AIX-specific issue. It was fixed in commit
>> af21152268317323480caa790c4a6347110f8085, committed October 30th.
Thanks Robert, it worked after applying this change.
Warm regards,
Sriram.
Hi Team, a few more updates on the buildfarm: all the tests have passed, and
below is the gist of the logs. We are working on cleaning up the changes and
will post them for review.
Tue Dec 3 04:08:29 2024: buildfarm run for CHANGEME:HEAD starting
CHANGEME:HEAD [04:09:49] running build ...
======== make log ===========
CHANGEME:HEAD [04:09:54] running basic regression tests ...
======== make check logs ===========
CHANGEME:HEAD [04:19:47] running make contrib ...
======== make contrib log ===========
CHANGEME:HEAD [04:19:48] running make testmodules ...
======== make testmodules log ===========
CHANGEME:HEAD [04:19:49] running install ...
======== make install log ===========
CHANGEME:HEAD [04:20:51] running make contrib install ...
======== make contrib install log ===========
CHANGEME:HEAD [04:21:20] running testmodules install ...
======== testmodules install log ===========
CHANGEME:HEAD [04:21:33] checking pg_upgrade
======== pg_upgrade check log ===========
CHANGEME:HEAD [04:21:33] checking test-decoding
======== test-decoding check log ===========
========================== output_iso/log/initdb.log ================
========================== output_iso/log/postmaster.log ================
========================== test_decoding/log/initdb.log ================
========================== test_decoding/log/postmaster.log ================
CHANGEME:HEAD [04:23:40] running make check miscellaneous modules ...
=========== Module commit_ts check =============
=========== Module injection_points check =============
=========== Module test_oat_hooks check =============
=========== Module test_slru check =============
=========== Module unsafe_tests check =============
=========== Module basic_archive check =============
=========== Module pg_freespacemap check =============
=========== Module pg_logicalinspect check =============
=========== Module pg_stat_statements check =============
=========== Module pg_walinspect check =============
=========== Module test_decoding check =============
CHANGEME:HEAD [04:31:54] setting up db cluster (C)...
======== initdb log (C) ===========
CHANGEME:HEAD [04:32:13] starting db (C)...
======== start db (C) : 1 log ========
=========== db log file ==========
CHANGEME:HEAD [04:32:14] running installcheck (C)...
======== make installcheck log ===========
CHANGEME:HEAD [04:44:14] restarting db (C)...
======== stop db (C): 1 log ==========
=========== db log file ==========
======== start db (C) : 2 log ========
=========== db log file ==========
CHANGEME:HEAD [04:44:16] running make isolation check ...
======== make isolation check logs ===========
CHANGEME:HEAD [04:48:22] restarting db (C)...
======== stop db (C): 2 log ==========
=========== db log file ==========
======== start db (C) : 3 log ========
=========== db log file ==========
CHANGEME:HEAD [04:49:24] running make PL installcheck (C)...
======== make pl installcheck log ===========
CHANGEME:HEAD [04:49:44] restarting db (C)...
======== stop db (C): 3 log ==========
=========== db log file ==========
======== start db (C) : 4 log ========
=========== db log file ==========
CHANGEME:HEAD [04:49:47] running make contrib installcheck (C)...
======== make contrib installcheck log ===========
CHANGEME:HEAD [04:55:01] restarting db (C)...
======== stop db (C): 4 log ==========
=========== db log file ==========
======== start db (C) : 5 log ========
=========== db log file ==========
CHANGEME:HEAD [04:55:09] running make test-modules installcheck (C)...
======== make testmodules installcheck log ===========
CHANGEME:HEAD [04:56:54] stopping db (C)...
======== stop db (C): 5 log ==========
=========== db log file ==========
CHANGEME:HEAD [04:56:57] running make ecpg check ...
======== make ecpg check logs ===========
CHANGEME:HEAD [04:58:38] OK
======== log passed to send_result ===========
======== log passed to send_result ===========
All stages succeeded
Generated logs files
aixpg-lpar01:~/build-farm-17/buildroot/HEAD/CHANGEME.fromsource-logs]
>> ls
build.log pl-install-check-C.log
check-pg_upgrade.log startdb-C-1.log
check.log startdb-C-2.log
configure.log startdb-C-3.log
contrib-install-check-C.log startdb-C-4.log
ecpg-check.log startdb-C-5.log
initdb-C.log stopdb-C-1.log
install-check-C.log stopdb-C-2.log
install-contrib.log stopdb-C-3.log
install-testmodules.log stopdb-C-4.log
isolation-check.log stopdb-C-5.log
make-contrib.log test-decoding-check.log
make-install.log testmodules-install-check-C.log
make-testmodules.log web-txn.data
misc-check.log
Warm regards,
Sriram.
On 23/12/2024 17:55, Srirama Kucherlapati wrote: > Hello Team, > > In response to our previous discussions, we have refined the code to address > > only the necessary adjustments focusing solely on the essential > modifications > > for AIX compatibility. The codebase was pulled from the GitHub repository > > (https://github.com/postgres/postgres.git <https://github.com/postgres/ > postgres.git>) and we resolved all issues > > encountered during the configure and build processes. The following are the > > minimal alterations needed to ensure AIX is supported. Thanks, getting better. Please generate the patches against 'master', rather than v17. We might choose to backport this to v17 later, but let's focus on the current development branch for now. > Overview of Changes > > We consulted with internal teams (GCC/Linker and Open Source) for each > > modification. Details of the changes for the relevant files are as follows: > > - configure: > > - Addressed AIX-specific alignment issues to support MAXALIGN_ALIGNOF. > > - Added comments regarding AIX alignment rules in the file. The patch modifies 'configure', but that file is generated from 'configure.ac'. Please show the changes to 'configure.ac' Please also update the meson build system accordingly. > +# AIX alignment info: We assume long's alignment is at least as strong as char, short, or > +# int; but we must check long long (if it is being used for int64) and double. > +# > +# The AIX 'power' alignment rules apply the natural alignment of the "first > +# member" if it is of a floating-point data type (or is an aggregate whose > +# recursively "first" member or element is such a type). The alignment > +# associated with these types for subsequent members use an alignment value > +# where the floating-point data type is considered to have 4-byte alignment. > +# > +# The double is aligned to 4-bytes on AIX in aggregates. But to maintain > +# alignement across platforms the max alignment of long should be considered. 
I don't quite understand this comment. Maybe some rephrasing or examples would help. What is an aggregate in this context? A struct or union? We use C99 int64_t for the int64 datatype nowadays, not "long long", so this seems a little outdated. (Since commit 962da900ac) > - doc/src/sgml/dfunc.sgml: > > - Updated related documentation. > diff --git a/doc/src/sgml/dfunc.sgml b/doc/src/sgml/dfunc.sgml > index b94aefcd0c..554f9fac4c 100644 > --- a/doc/src/sgml/dfunc.sgml > +++ b/doc/src/sgml/dfunc.sgml > @@ -202,4 +202,23 @@ gcc -G -o foo.so foo.o > server expects to find the shared library files. > </para> > > +<!-- > +Under AIX, object files are compiled normally but building the shared > +library requires a couple of steps. First, create the object file: > +.nf > +cc <other flags> -c foo.c > +.fi > +You must then create a symbol \*(lqexports\*(rq file for the object > +file: > +.nf > +mkldexport foo.o `pwd` > foo.exp > +.fi > +Finally, you can create the shared library: > +.nf > +ld <other flags> -H512 -T512 -o foo.so -e _nostart \e > + -bI:.../lib/postgres.exp -bE:foo.exp foo.o \e > + -lm -lc 2>/dev/null > +.fi > + --> > + > </sect2> Why is this commented out? You say "updated related documentation", but these instructions were first added to the tree in year 1999, in commit f2f43efbe1, which apparently copied them from old man pages. Are they still current and relevant? > - Shared Libraries Build Process: > > - Files Affected: > > src/Makefile.shlib > > src/backend/Makefile > > - src/backend/port/aix/mkldexport.sh > > - Implemented AIX-specific changes for shared library creation using > export > > files. > - As per discussions with the linker team, the current design > mandates using > > the script method to extract symbols and build shared libraries. > > - This approach aligns with practices in other open-source projects like > > Python, OpenBLAS and other gnu tools(references, link). 
> > https://github.com/python/cpython/blob/main/Modules/ld_so_aix.in > <https://github.com/python/cpython/blob/main/Modules/ld_so_aix.in> > > https://github.com/OpenMathLib/OpenBLAS/ > commit/892f8ff3e55e24fda9af3f6364319cce3f60116b#diff-4011a114c75fe804332139868806cbe23f275963e4e6afa9b57e51b2b9817293R261 <https://github.com/OpenMathLib/OpenBLAS/commit/892f8ff3e55e24fda9af3f6364319cce3f60116b#diff-4011a114c75fe804332139868806cbe23f275963e4e6afa9b57e51b2b9817293R261> Thanks for the links. It's disappointing there isn't a standard way to do this. It's nice to see the comments in cpython's makeexp_aix script explaining the "hidden tricks". I'd love to see similar comments in mkldexport.sh explaining what it does. And also why the script is needed on AIX in the first place. I wonder if meson's AIX support would have something built-in to do this? > - src/makefiles/Makefile.aix: Introduced AIX-specific makefile for > building. > +# when building with gcc, need to make sure that libgcc can be found > +ifeq ($(GCC), yes) > +libpath := $(libpath):$(dir $(shell gcc -print-libgcc-file-name)) > +endif > + > +rpath = -Wl,-blibpath:'$(rpathdir)$(libpath)' Is this still needed? I have no idea, but it seems surprising if gcc and ld don't handle this automatically. > +LDFLAGS_SL += -Wl,-bnoentry -Wl,-H512 -Wl,-bM:SRE What do all these options do? Why are they needed? > - src/template/aix: Defined memset flags for AIX. Why? Did you benchmark it? How big is the difference? I don't know if the default we have for MEMSET_LOOP_LIMIT is any good on modern compilers and systems in general. Perhaps we should just always use the system memset(). So for this patch, the key question is whether there's something AIX specific here. I'd assume no. I'd assume it to depend on the hardware rather than the OS. -- Heikki Linnakangas Neon (https://neon.tech)
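An example may help with the question about the alignment comment. In C, "aggregate" means a structure or array type. The two hypothetical structs below differ only in member order. Under the AIX "power" rules described in the quoted comment, `double_first` should keep 8-byte alignment while the `double` in `double_later` should land at offset 4; on x86-64 Linux both doubles are 8-byte aligned. As far as I know, Autoconf's `AC_CHECK_ALIGNOF` measures alignment with essentially the `double_later` layout (a `char` followed by the type), which would explain how configure ends up with a smaller alignment for double on AIX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical structs: a double as the first member vs. a later member. */
struct double_first
{
    double d;
    char   c;
};

struct double_later
{
    char   c;
    double d;
};

/* Print the layout facts that the AIX alignment rule is about. */
static void
print_layout(void)
{
    printf("alignof(double)              = %zu\n", _Alignof(double));
    printf("offsetof(double_later, d)    = %zu\n",
           offsetof(struct double_later, d));
    printf("alignof(struct double_first) = %zu\n",
           _Alignof(struct double_first));
    printf("alignof(struct double_later) = %zu\n",
           _Alignof(struct double_later));
}
```

Running this on AIX and on Linux would give concrete numbers to put into the configure comment instead of the prose description.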
Hi Heikki, Thanks for your feedback.
> Please generate the patches against 'master', rather than v17. We might
> choose to backport this to v17 later, but let's focus on the current
> development branch for now.
All the changes were made on top of the master (development) branch pulled from the GitHub repository. Below is the log info.
I’ll pull the latest master and apply the changes for the next patch.
github repository: https://github.com/postgres/postgres.git
commit 103760b2d065081447efd27f4ecc20c5dbb7abcf (HEAD -> aixSupport)
Author: Sriram RK <sriram.rk@in.ibm.com>
Date: Fri Dec 6 10:36:29 2024 -0600
AIX support
commit d2b4b4c2259e21ceaf05e393769b69728bfbee99 (origin/master, origin/HEAD, gitibm/master, master)
Author: Peter Eisentraut <peter@eisentraut.org>
Date: Tue Oct 22 08:12:43 2024 +0200
Fix C23 compiler warning
…
commit 45e0ba30fc40581f320fac17ad8b4e0676e1b3b5
Author: Michael Paquier <michael@paquier.xyz>
Date: Tue Oct 22 13:05:51 2024 +0900
pg_stat_statements: Add tests for nested queries with level tracking
I shall continue to work on the other comments and will keep you posted.
Warm regards,
Sriram.