Thread: Speed up clean meson builds by ~25%
Building the generated ecpg preproc file can take a long time. You can check how long using:

ninja -C build src/interfaces/ecpg/preproc/ecpg.p/meson-generated_.._preproc.c.o

The attached patch moves that file much closer to the top of our build order, so building it can be pipelined much better with other files. It improved clean build times on my machine (10 cores/20 threads) from ~40 seconds to ~30 seconds. You can check the improvement for yourself with:

ninja -C build clean && ninja -C build all
Attachment
On Fri, 5 Apr 2024 at 00:45, Jelte Fennema-Nio <me@jeltef.nl> wrote:
> It improved clean build times on my machine (10 cores/20 threads) from ~40
> seconds to ~30 seconds.

After discussing this off-list with Bilal, I realized that this gain only happens for clang builds on my system, because those take a long time, as was also recently discussed in [1]. My builds don't take nearly as long, though: I tried clang 15 through 18 and they all took 10-22 seconds to run. My clang comes from apt.llvm.org on Ubuntu 22.04.

[1]: https://www.postgresql.org/message-id/flat/CA%2BhUKGLvJ7-%3DfS-J9kN%3DaZWrpyqykwqCBbxXLEhUa9831dPFcg%40mail.gmail.com
Hi, On 2024-04-05 15:36:34 +0200, Jelte Fennema-Nio wrote: > On Fri, 5 Apr 2024 at 00:45, Jelte Fennema-Nio <me@jeltef.nl> wrote: > > It improved clean build times on my machine (10 cores/20 threads) from ~40 > > seconds to ~30 seconds. > > After discussing this off-list with Bilal, I realized that this gain > is only happening for clang builds on my system. Because those take a > long time as was also recently discussed in[1]. My builds don't take > nearly as long though. I tried with clang 15 through 18 and they all > took 10-22 seconds to run and clang comes from apt.llvm.org on Ubuntu > 22.04 > > [1]: https://www.postgresql.org/message-id/flat/CA%2BhUKGLvJ7-%3DfS-J9kN%3DaZWrpyqykwqCBbxXLEhUa9831dPFcg%40mail.gmail.com I recommend opening a bug report for clang, best with an already preprocessed input file. We're going to need to do something about this from our side as well, I suspect. The times aren't great with gcc either, even if not as bad as with clang. Greetings, Andres Freund
On Fri, 5 Apr 2024 at 17:24, Andres Freund <andres@anarazel.de> wrote:
> I recommend opening a bug report for clang, best with an already preprocessed
> input file.
>
> We're going to need to do something about this from our side as well, I
> suspect. The times aren't great with gcc either, even if not as bad as with
> clang.

Agreed. While not a full solution, I think this patch is still good to address some of the pain: waiting 10 seconds at the end of my build with only 1 of my 10 cores doing anything.

So while this doesn't decrease the CPU time spent, it does decrease wall-clock time significantly in some cases, with afaict no downsides.
On Fri, Apr 05, 2024 at 06:19:20PM +0200, Jelte Fennema-Nio wrote: > Agreed. While not a full solution, I think this patch is still good to > address some of the pain: Waiting 10 seconds at the end of my build > with only 1 of my 10 cores doing anything. > > So while this doesn't decrease CPU time spent it does decrease > wall-clock time significantly in some cases, with afaict no downsides. Well, this is also painful with ./configure. So, even if we are going to move away from it at this point, we still need to support it for a couple of years. It looks to me that letting the clang folks know about the situation is the best way forward. -- Michael
Attachment
Hello Michael, 08.04.2024 08:23, Michael Paquier wrote: > On Fri, Apr 05, 2024 at 06:19:20PM +0200, Jelte Fennema-Nio wrote: >> Agreed. While not a full solution, I think this patch is still good to >> address some of the pain: Waiting 10 seconds at the end of my build >> with only 1 of my 10 cores doing anything. >> >> So while this doesn't decrease CPU time spent it does decrease >> wall-clock time significantly in some cases, with afaict no downsides. > Well, this is also painful with ./configure. So, even if we are going > to move away from it at this point, we still need to support it for a > couple of years. It looks to me that letting the clang folks know > about the situation is the best way forward. > As I wrote in [1], I didn't observe the issue with clang-18, so maybe it is fixed already. Perhaps it's worth rechecking... [1] https://www.postgresql.org/message-id/d2bf3727-bae4-3aee-65f6-caec2c4ebaa8%40gmail.com Best regards, Alexander
On 05.04.24 18:19, Jelte Fennema-Nio wrote:
> Agreed. While not a full solution, I think this patch is still good to
> address some of the pain: Waiting 10 seconds at the end of my build
> with only 1 of my 10 cores doing anything.
>
> So while this doesn't decrease CPU time spent it does decrease
> wall-clock time significantly in some cases, with afaict no downsides.

I have tested this with various compilers, and I can confirm that this shaves off about 5 seconds from the build wall-clock time, which represents about 10%-20% of the total time. I think this is a good patch.

Interestingly, if I apply the analogous changes to the makefiles, I don't get any significant improvements. (Depends on local circumstances, of course.) But I would still suggest keeping the makefiles aligned.
On Mon, 8 Apr 2024 at 10:00, Alexander Lakhin <exclusion@gmail.com> wrote:
> As I wrote in [1], I didn't observe the issue with clang-18, so maybe it
> is fixed already.
> Perhaps it's worth rechecking...

Using the attached script I got these timings. Clang is significantly slower in all of them, but especially with -Og the difference is huge.

gcc 11.4.0: 7.276s
clang 18.1.3: 17.216s
gcc 11.4.0 --debug: 7.441s
clang 18.1.3 --debug: 18.164s
gcc 11.4.0 --debug -Og: 2.418s
clang 18.1.3 --debug -Og: 14.864s

I reported this same issue to the LLVM project here:
https://github.com/llvm/llvm-project/issues/87973
Attachment
On Mon, 8 Apr 2024 at 10:02, Peter Eisentraut <peter@eisentraut.org> wrote:
> I have tested this with various compilers, and I can confirm that this
> shaves off about 5 seconds from the build wall-clock time, which
> represents about 10%-20% of the total time. I think this is a good patch.

Great to hear.

> Interestingly, if I apply the analogous changes to the makefiles, I
> don't get any significant improvements. (Depends on local
> circumstances, of course.) But I would still suggest to keep the
> makefiles aligned.

Attached is a patch that also updates the Makefiles, but indeed I don't get any perf gains there either.

On Mon, 8 Apr 2024 at 07:23, Michael Paquier <michael@paquier.xyz> wrote:
> Well, this is also painful with ./configure. So, even if we are going
> to move away from it at this point, we still need to support it for a
> couple of years. It looks to me that letting the clang folks know
> about the situation is the best way forward.

I reported the issue to the clang folks:
https://github.com/llvm/llvm-project/issues/87973

But even if my patch doesn't help for ./configure, it still seems like a good idea to me to reduce compile times when using meson while we wait for the clang folks to address the issue.
Attachment
Hello Jelte,

08.04.2024 11:36, Jelte Fennema-Nio wrote:
> Using the attached script I got these timings. Clang is significantly
> slower in all of them, but especially with -Og the difference is huge.
>
> gcc 11.4.0: 7.276s
> clang 18.1.3: 17.216s
> gcc 11.4.0 --debug: 7.441s
> clang 18.1.3 --debug: 18.164s
> gcc 11.4.0 --debug -Og: 2.418s
> clang 18.1.3 --debug -Og: 14.864s
>
> I reported this same issue to the LLVM project here:
> https://github.com/llvm/llvm-project/issues/87973

Maybe we're talking about different problems. At [1], Thomas (and then I) was unhappy with builds taking more than 200 seconds...

[1]: https://www.postgresql.org/message-id/CA%2BhUKGLvJ7-%3DfS-J9kN%3DaZWrpyqykwqCBbxXLEhUa9831dPFcg%40mail.gmail.com

Best regards,
Alexander
Hi,

On Mon, 8 Apr 2024 at 11:00, Alexander Lakhin <exclusion@gmail.com> wrote:
> As I wrote in [1], I didn't observe the issue with clang-18, so maybe it
> is fixed already.
> Perhaps it's worth rechecking...
>
> [1] https://www.postgresql.org/message-id/d2bf3727-bae4-3aee-65f6-caec2c4ebaa8%40gmail.com

I had this problem on my local computer. My build times are:

gcc: 20s
clang-15: 24s
clang-16: 105s
clang-17: 111s
clang-18: 25s

--
Regards,
Nazir Bilal Yavuz
Microsoft
On Mon, Apr 08, 2024 at 12:23:56PM +0300, Nazir Bilal Yavuz wrote:
> I had this problem on my local computer. My build times are:
>
> gcc: 20s
> clang-15: 24s
> clang-16: 105s
> clang-17: 111s
> clang-18: 25s

Interesting. A parallel build of ecpg shows similar numbers here:

clang-16: 101s
clang-17: 112s
clang-18: 14s
gcc: 10s

Most of the time is still spent on preproc.c with clang-18, but that's much, much faster (the default version of clang is 16 on Debian sid, where I've run these numbers).
--
Michael
Attachment
On Tue, Apr 9, 2024 at 5:01 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Apr 08, 2024 at 12:23:56PM +0300, Nazir Bilal Yavuz wrote:
> > I had this problem on my local computer. My build times are:
> >
> > gcc: 20s
> > clang-15: 24s
> > clang-16: 105s
> > clang-17: 111s
> > clang-18: 25s
>
> Interesting. A parallel build of ecpg shows similar numbers here:
> clang-16: 101s
> clang-17: 112s
> clang-18: 14s
> gcc: 10s

I don't expect it to get fixed BTW, because it's present in 16.0.6, and .6 is the terminal release, if I understand their system correctly. They're currently only doing bug fixes for 18, and even there not for much longer. Interesting that not everyone saw this at first; perhaps the bug arrived in a minor release that some people didn't have yet? Or perhaps there is something special required to trigger it?
Hi,

On 2024-04-09 17:13:52 +1200, Thomas Munro wrote:
> I don't expect it to get fixed BTW, because it's present in 16.0.6,
> and .6 is the terminal release, if I understand their system
> correctly. They're currently only doing bug fixes for 18, and even
> there not for much longer. Interesting that not everyone saw this at
> first; perhaps the bug arrived in a minor release that some people
> didn't have yet? Or perhaps there is something special required to
> trigger it?

I think we need to do something about the compile time of this file, even with gcc. Our main grammar already is an issue, and stacking all the ecpg stuff on top makes it considerably worse.

ISTM there's a bunch of pretty pointless stuff in the generated preproc.y, which does seem to have some impact on compile time. E.g. a good bit of the file is just stuff like:

reserved_keyword:
          ALL { $$ = mm_strdup("all"); }
        ...

Why are we strduping all of these? We could instead just use the value of the token, instead of forcing the compiler to generate branches for all individual keywords etc. I don't know off-hand if the keyword lookup machinery ends up with an uppercase keyword, but if so, that'd be easy enough to change.
It actually looks to me like the many calls to mm_strdup() might be what's driving clang nuts. I hacked up preproc.y to not need those calls for

  unreserved_keyword
  col_name_keyword
  type_func_name_keyword
  reserved_keyword
  bare_label_keyword

by removing the actions and defining those tokens to be of type str. There are many more such calls that could be dealt with similarly.

That alone reduced compile times with
  clang-16 -O1 from 18.268s to 12.516s
  clang-16 -O2 from 345.188s to 158.084s
  clang-19 -O2 from 26.018s to 15.200s

I suspect what is happening is that clang tries to optimize the number of calls to mm_strdup() by separating the argument setup from the function call. Which leads to a control flow graph with *many* incoming edges to the basic block containing the function call to mm_strdup(), triggering a normally harmless O(N^2) or such.

Greetings,

Andres Freund
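The kind of change described above might look roughly like the following. This is a hypothetical fragment for illustration only, not the actual hacked-up preproc.y (and a real bison file would need the %token declarations in the prologue, before the rules):

```
/* Before: every keyword gets its own action, i.e. its own call site
 * feeding mm_strdup(): */
unreserved_keyword:
          ABORT_P     { $$ = mm_strdup("abort"); }
        | ABSOLUTE_P  { $$ = mm_strdup("absolute"); }
        /* ... hundreds more branches ... */
        ;

/* After: have the scanner hand back the keyword's spelling as the
 * token's semantic value, declare the tokens as <str>, and let
 * bison's implicit default action ($$ = $1) apply: */
%token <str> ABORT_P ABSOLUTE_P /* ... */

unreserved_keyword: ABORT_P | ABSOLUTE_P /* ... */ ;
```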
Andres Freund <andres@anarazel.de> writes:
> I think we need to do something about the compile time of this file, even with
> gcc. Our main grammar already is an issue and stacking all the ecpg stuff on
> top makes it considerably worse.

Seems reasonable, if we can.

> Why are strduping all of these?

IIRC, the issue is that the mechanism for concatenating the tokens back together frees the input strings:

static char *
cat2_str(char *str1, char *str2)
{
    char *res_str = (char *) mm_alloc(strlen(str1) + strlen(str2) + 2);

    strcpy(res_str, str1);
    if (strlen(str1) != 0 && strlen(str2) != 0)
        strcat(res_str, " ");
    strcat(res_str, str2);
    free(str1);        <------------------
    free(str2);        <------------------
    return res_str;
}

So that ought to dump core if you don't make all the productions return malloc'd strings. How did you work around that? (Maybe it'd be okay to just leak all the strings?)

regards, tom lane
Hi,

On 2024-04-09 15:33:10 -0700, Andres Freund wrote:
> Which leads to a control flow graph with *many* incoming edges to the basic
> block containing the function call to mm_strdup(), triggering a normally
> harmless O(N^2) or such.

With clang-16 -O2 there is a basic block with 3904 incoming basic blocks. With the hacked-up preproc.y it's 2968. A 30% increase leading to a doubling of runtime imo seems consistent with my theory of there being some ~quadratic behaviour.

I suspect that this is also what's causing gram.c compilation to be fairly slow, with both clang and gcc. There aren't as many pstrdup()s in gram.y as there are mm_strdup()s in preproc.y, but there still are many.

ISTM that there are many pstrdup()s that really make very little sense. I think it's largely because there are many rules declared %type <str>, which prevents us from returning a string constant without a warning. There may be (?) some rules that modify strings returned by subsidiary rules, but if so, it can't be many.

Greetings,

Andres
Hi,

On 2024-04-09 19:00:41 -0400, Tom Lane wrote:
> IIRC, the issue is that the mechanism for concatenating the tokens
> back together frees the input strings

Ah, that explains it - but also seems somewhat unnecessary.

> So that ought to dump core if you don't make all the productions
> return malloc'd strings. How did you work around that?

I just tried to get to the point of understanding the reasons for slow compilation, not to actually keep it working :). I.e. I didn't.

> (Maybe it'd be okay to just leak all the strings?)

Hm. The input to ecpg can be fairly large, I guess. And we have fun code like cat_str(), which afaict is O(arguments^2) in its memory usage if we didn't free. Not immediately sure what the right path is.

Greetings,

Andres Freund
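The O(arguments^2) concern is easy to model with a back-of-envelope calculation. The standalone C sketch below is hypothetical, not ecpg code: it only counts bytes rather than actually allocating anything, and it tallies how much memory would stay live if n single-character tokens were folded together by cat2_str-style pairwise concatenation with no frees.

```c
#include <stddef.h>

/*
 * Model of left-folding n one-character tokens through a cat2_str()
 * that never frees its inputs.  We only track byte counts; nothing
 * is really allocated.
 */
static size_t
leaked_if_not_freed(int ntokens)
{
    size_t live = 2;        /* first strdup'd token: 1 char + NUL */
    size_t acc_len = 1;     /* length of the accumulated string */
    int    i;

    for (i = 1; i < ntokens; i++)
    {
        size_t newlen = acc_len + 1 + 1;    /* old + space + token */

        live += 2;              /* the freshly strdup'd token */
        live += newlen + 1;     /* the new concatenation result */
        acc_len = newlen;
    }
    return live;
}
```

Under this model, 100 tokens leave about 10 kB allocated and 200 tokens about 40 kB: doubling the input roughly quadruples the waste, which is why simply leaking everything looks unattractive for large ecpg inputs.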
Andres Freund <andres@anarazel.de> writes: > On 2024-04-09 19:00:41 -0400, Tom Lane wrote: >> Andres Freund <andres@anarazel.de> writes: >>> Why are strduping all of these? >> IIRC, the issue is that the mechanism for concatenating the tokens >> back together frees the input strings > Ah, that explains it - but also seems somewhat unnecessary. I experimented with replacing mm_strdup() with #define mm_strdup(x) (x) As you did, I wasn't trying to get to a working result, so I didn't do anything about removing all the free's or fixing the cast-away-const warnings. The result was disappointing though. On my Mac laptop (Apple clang version 15.0.0), the compile time for preproc.o went from 6.7sec to 5.5sec. Which is better, but not enough better to persuade me to do all the janitorial work of restructuring ecpg's string-slinging. I think we haven't really identified the problem. As a comparison point, compiling gram.o on the same machine takes 1.3sec. So I am seeing a problem here, sure enough, although not as bad as it is in some other clang versions. regards, tom lane
Hi,

On 2024-04-09 19:44:03 -0400, Tom Lane wrote:
> As you did, I wasn't trying to get to a working result, so I didn't do
> anything about removing all the free's or fixing the cast-away-const
> warnings. The result was disappointing though. On my Mac laptop
> (Apple clang version 15.0.0), the compile time for preproc.o went from
> 6.7sec to 5.5sec. Which is better, but not enough better to persuade
> me to do all the janitorial work of restructuring ecpg's
> string-slinging. I think we haven't really identified the problem.

With what level of optimization was that? It kinda looks like their version might be from before the worst of the issue...

FWIW, just redefining mm_strdup() that way doesn't help much here either, even with an affected compiler. The gain increases substantially after simplifying unreserved_keyword etc. to just use the default action. I think having the non-default actions for those branches leaves you with a similar issue: each of the actions just sets a register, and storing that and going to the next loop iteration is the same for every branch.

FWIW, with clang-19 -O2:

"plain": 0m24.354s
mm_strdup redefined: 0m23.741s
+ use default action: 0m14.218s

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2024-04-09 19:44:03 -0400, Tom Lane wrote: >> As you did, I wasn't trying to get to a working result, so I didn't do >> anything about removing all the free's or fixing the cast-away-const >> warnings. The result was disappointing though. On my Mac laptop >> (Apple clang version 15.0.0), the compile time for preproc.o went from >> 6.7sec to 5.5sec. Which is better, but not enough better to persuade >> me to do all the janitorial work of restructuring ecpg's >> string-slinging. I think we haven't really identified the problem. > With what level of optimization was that? It kinda looks like their version > might be from before the worst of the issue... Just the autoconf-default -O2. > FWIW, just redefining mm_strdup() that way doesn't help much here either, > even with an affected compiler. The gain increases substantially after > simplifying unreserved_keyword etc to just use the default action. Hm. In any case, this is all moot unless we can come to a new design for how ecpg does its string-mashing. Thoughts? I thought for a bit about not allocating strings as such, but just passing around pointers into the source text plus lengths, and reassembling the string data only at the end when we need to output it. Not sure how well that would work, but it could be a starting point. regards, tom lane
Hi,

On 2024-04-09 20:12:48 -0400, Tom Lane wrote:
> In any case, this is all moot unless we can come to a new design for
> how ecpg does its string-mashing. Thoughts?

There might be a quick-n-dirty way: we could make pgc.l return dynamically allocated keywords.

Am I missing something, or is ecpg's string handling almost comically inefficient? Building up strings in tiny increments, which then get mashed together to get slightly larger pieces, just to then be mashed together again? It's like an intentional allocator stress test. It's particularly absurd because in the end we just print those strings, after carefully assembling them...

> I thought for a bit about not allocating strings as such, but just
> passing around pointers into the source text plus lengths, and
> reassembling the string data only at the end when we need to output it.
> Not sure how well that would work, but it could be a starting point.

I was wondering about something vaguely similar: instead of concatenating individual strings, append them to a stringinfo. The stringinfo can be reset individually...

Greetings,

Andres Freund
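A minimal sketch of the append-to-one-buffer idea, under the assumption that a stripped-down growable buffer suffices to show the shape (the StrBuf/sb_* names here are invented; PostgreSQL's actual StringInfo API differs): tokens are appended to a single buffer instead of being strdup'd and pairwise concatenated, so each byte is copied O(1) amortized times.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; not PostgreSQL's StringInfo. */
typedef struct
{
    char   *data;
    size_t  len;
    size_t  cap;
} StrBuf;

static void
sb_init(StrBuf *sb)
{
    sb->cap = 64;
    sb->len = 0;
    sb->data = malloc(sb->cap);
    sb->data[0] = '\0';
}

/*
 * Append one token, inserting a single separating space, growing the
 * buffer geometrically so appends are O(1) amortized.
 */
static void
sb_append_token(StrBuf *sb, const char *tok)
{
    size_t  tlen = strlen(tok);
    size_t  need = sb->len + (sb->len > 0 ? 1 : 0) + tlen + 1;

    while (sb->cap < need)
    {
        sb->cap *= 2;
        sb->data = realloc(sb->data, sb->cap);
    }
    if (sb->len > 0)
        sb->data[sb->len++] = ' ';
    memcpy(sb->data + sb->len, tok, tlen + 1);
    sb->len += tlen;
}
```

Appending "select", "*", "from", "t" in turn yields "select * from t" in data, and the whole run of appends costs O(total length) rather than the O(total^2) of repeated pairwise concatenation.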
Andres Freund <andres@anarazel.de> writes: > On 2024-04-09 20:12:48 -0400, Tom Lane wrote: >> In any case, this is all moot unless we can come to a new design for >> how ecpg does its string-mashing. Thoughts? > Am I missing something, or is ecpg string handling almost comically > inefficient? Building up strings in tiny increments, which then get mashed > together to get slightly larger pieces, just to then be mashed together again? > It's like an intentional allocator stress test. > It's particularly absurd because in the end we just print those strings, after > carefully assembling them... It is that. Here's what I'm thinking: probably 90% of what ecpg does is to verify that a chunk of its input represents a valid bit of SQL (or C) syntax and then emit a representation of that chunk. Currently, that representation tends to case-normalize tokens and smash inter-token whitespace and comments to a single space. I propose though that neither of those behaviors is mission-critical, or even all that desirable. I think few users would complain if ecpg preserved the input's casing and spacing and comments. Given that definition, most of ecpg's productions (certainly just about all the auto-generated ones) would simply need to return a pointer and length describing a part of the input string. There are places where ecpg wants to insert some text it generates, and I think it might need to re-order text in a few places, so we need a production result representation that can cope with those cases --- but if we can make "regurgitate the input" cases efficient, I think we'll have licked the performance problem. With that in mind, I wonder whether we couldn't make the simple cases depend on bison's existing support for location tracking. In which case, the actual actions for all those cases could be default, achieving one of the goals you mention. Obviously, this is not going to be a small lift, but it kind of seems do-able. regards, tom lane
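The pointer-plus-length idea can be sketched like this (hypothetical types and names, covering only the simple "regurgitate the input" case): a production's result is a window into the unmodified source text, two adjacent windows are "concatenated" by simply widening the window, and bytes are copied exactly once, at output time. Note that this preserves the input's casing, spacing, and comments verbatim.

```c
#include <stdlib.h>
#include <string.h>

/* A production's result: a window into the unmodified source text. */
typedef struct
{
    const char *start;
    size_t      len;
} TextSpan;

/*
 * Default "concatenate" for adjacent input chunks: since both spans
 * point into the same source buffer, joining them just widens the
 * window -- no allocation, no copying.
 */
static TextSpan
span_join(TextSpan a, TextSpan b)
{
    TextSpan    res;

    res.start = a.start;
    res.len = (size_t) (b.start + b.len - a.start);
    return res;
}

/* Copy out the bytes only when we finally emit the output. */
static char *
span_to_cstring(TextSpan s)
{
    char       *res = malloc(s.len + 1);

    memcpy(res, s.start, s.len);
    res[s.len] = '\0';
    return res;
}
```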
On Wed, Apr 10, 2024 at 11:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > ... On my Mac laptop > (Apple clang version 15.0.0), the compile time for preproc.o went from > 6.7sec to 5.5sec. Having seen multi-minute compile times on FreeBSD (where clang is the system compiler) and Debian (where I get packages from apt.llvm.org), I have been quietly waiting for this issue to hit Mac users too (where a clang with unknown proprietary changes is the system compiler), but it never did. Huh. I tried to understand a bit more about Apple's version soup. This seems to be an up-to-date table (though I don't understand their source of information): https://en.wikipedia.org/wiki/Xcode#Xcode_15.0_-_(since_visionOS_support)_2 According to cc -v on my up-to-date MacBook Air, it has "Apple clang version 15.0.0 (clang-1500.3.9.4)", which, if the table is correct, means that it's using LLVM 16.0.0 (note, not 16.0.6, the final version of that branch of [open] LLVM, and the version I saw the issue with on FreeBSD and Debian). They relabel everything to match the Xcode version that shipped it, and they're currently off by one. I wondered if perhaps the table just wasn't accurate in the final digits, so I looked for clues in strings in the binary, and sure enough it contains "LLVM 15.0.0". My guess would be that they've clobbered the major version, but not the rest: the Xcode version is 15.3, and I don't see a 3, so I guess this is really derived from LLVM 16.0.0. One explanation would be that they rebase their proprietary bits and pieces over the .0 version of each major release, and then cherry-pick urgent fixes and stuff later, not pulling in the whole minor release; they also presumably have to maintain it for much longer than the LLVM project's narrow support window. Who knows. So now I wonder if it could be that LLVM 16.0.6 does this, but LLVM 16.0.0 doesn't. I installed clang-16 (16.0.6) with MacPorts, and it does show the problem.
Thomas Munro <thomas.munro@gmail.com> writes: > On Wed, Apr 10, 2024 at 11:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... On my Mac laptop >> (Apple clang version 15.0.0), the compile time for preproc.o went from >> 6.7sec to 5.5sec. > Having seen multi-minute compile times on FreeBSD (where clang is the > system compiler) and Debian (where I get packages from apt.llvm.org), > I have been quietly waiting for this issue to hit Mac users too (where > a clang with unknown proprietary changes is the system compiler), but > it never did. Huh. I don't doubt that there are other clang versions where the problem bites a lot harder. What result do you get from the test I tried (turning mm_strdup into a no-op macro)? regards, tom lane
On Wed, Apr 10, 2024 at 5:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't doubt that there are other clang versions where the problem
> bites a lot harder. What result do you get from the test I tried
> (turning mm_strdup into a no-op macro)?

#define mm_strdup(x) (x) does this:

Apple clang 15: master: 14s -> 9s
MacPorts clang 16: master: 170s -> 10s
Thomas Munro <thomas.munro@gmail.com> writes: > On Wed, Apr 10, 2024 at 5:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I don't doubt that there are other clang versions where the problem >> bites a lot harder. What result do you get from the test I tried >> (turning mm_strdup into a no-op macro)? > #define mm_strdup(x) (x) does this: > Apple clang 15: master: 14s -> 9s > MacPorts clang 16, master: 170s -> 10s Wow. So (a) it's definitely worth doing something about this; but (b) anything that would move the needle would probably require significant refactoring of ecpg's string handling, and I doubt we want to consider that post-feature-freeze. The earliest we could land such a fix would be ~ July, if we follow past schedules. The immediate question then is do we want to take Jelte's patch as a way to ameliorate the pain meanwhile. I'm kind of down on it, because AFAICS what would happen if you break the core grammar is that (most likely) the failure would be reported against ecpg first. That seems pretty confusing. However, maybe it's tolerable as a short-term band-aid that we plan to revert later. I would not commit the makefile side of the patch though, as that creates the same problem for makefile users while providing little benefit. As for the longer-term fix, I'd be willing to have a go at implementing the sketch I wrote last night; but I'm also happy to defer to anyone else who's hot to work on this. regards, tom lane
On 10.04.24 17:33, Tom Lane wrote: > The immediate question then is do we want to take Jelte's patch > as a way to ameliorate the pain meanwhile. I'm kind of down on > it, because AFAICS what would happen if you break the core > grammar is that (most likely) the failure would be reported > against ecpg first. That seems pretty confusing. Yeah that would be confusing. I suppose we could just take the part of the patch that moves up preproc among the subdirectories of ecpg, but I don't know if that by itself would really buy anything.
On Wed, 17 Apr 2024 at 13:41, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 10.04.24 17:33, Tom Lane wrote: > > The immediate question then is do we want to take Jelte's patch > > as a way to ameliorate the pain meanwhile. I'm kind of down on > > it, because AFAICS what would happen if you break the core > > grammar is that (most likely) the failure would be reported > > against ecpg first. That seems pretty confusing. > > Yeah that would be confusing. How can I test if this actually happens? Because it sounds like if that indeed happens it should be fixable fairly easily.
Jelte Fennema-Nio <postgres@jeltef.nl> writes: > How can I test if this actually happens? Because it sounds like if > that indeed happens it should be fixable fairly easily. Break gram.y (say, misspell some token in a production) and see what happens. The behavior's likely to be timing sensitive though. regards, tom lane
On Wed, 17 Apr 2024 at 16:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Break gram.y (say, misspell some token in a production) and
> see what happens. The behavior's likely to be timing sensitive
> though.

Thanks for clarifying. It took me a little while to break gram.y in such a way that I could reproduce the failure consistently, but I managed in the end using the attached small diff, and then running ninja in non-parallel mode:

ninja -C build all -j1

As I expected, this problem was indeed fairly easy to address by still building "backend/parser" before "interfaces". See attached patch.
Attachment
Jelte Fennema-Nio <postgres@jeltef.nl> writes: > As I expected this problem was indeed fairly easy to address by still > building "backend/parser" before "interfaces". See attached patch. I think we should hold off on this. I found a simpler way to address ecpg's problem than what I sketched upthread. I have a not-ready-to- show-yet patch that allows the vast majority of ecpg's grammar productions to use the default semantic action. Testing on my M1 Macbook with clang 16.0.6 from MacPorts, I see the compile time for preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec. The core idea of the patch is to get rid of <str> results from grammar nonterminals and instead pass the strings back as yylloc results, which we can do by redefining YYLTYPE as "char *"; since ecpg isn't using Bison's location logic for anything, this is free. Then we can implement a one-size-fits-most token concatenation rule in YYLLOC_DEFAULT, and only the various handmade rules that don't want to just concatenate their inputs need to do something different. The patch presently passes regression tests, but its memory management is shoddy as can be (basically "leak like there's no tomorrow"), and I want to fix that before presenting it. One could almost argue that we don't care about memory consumption of the ecpg preprocessor; but I think it's possible to do better. regards, tom lane
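As a toy model of the mechanism described above (a standalone sketch that mimics what a YYLLOC_DEFAULT-style concatenation rule would do; it is not the actual patch, and real bison would expand the macro inline rather than call a function): each "location" is simply the production's text, and the one-size-fits-most default rule space-joins the right-hand-side strings.

```c
#include <stdlib.h>
#include <string.h>

typedef char *YYLTYPE;          /* a "location" is just the text */

/*
 * What a YYLLOC_DEFAULT-style rule would do for a production with n
 * right-hand-side symbols: join their strings with single spaces.
 */
static YYLTYPE
join_rhs(YYLTYPE *rhs, int n)
{
    size_t  len = 0;
    char   *res;
    int     i;

    for (i = 0; i < n; i++)
        len += strlen(rhs[i]) + 1;      /* text + space/terminator */
    res = malloc(len + 1);
    res[0] = '\0';
    for (i = 0; i < n; i++)
    {
        if (i > 0)
            strcat(res, " ");
        strcat(res, rhs[i]);
    }
    return res;
}
```

Only the handmade rules that don't want plain concatenation would need explicit actions; everything else falls through to this default.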
On 2024-04-17 23:10:53 -0400, Tom Lane wrote: > Jelte Fennema-Nio <postgres@jeltef.nl> writes: > > As I expected this problem was indeed fairly easy to address by still > > building "backend/parser" before "interfaces". See attached patch. > > I think we should hold off on this. I found a simpler way to address > ecpg's problem than what I sketched upthread. I have a > not-ready-to-show-yet patch that allows the vast majority of ecpg's grammar > productions to use the default semantic action. Testing on my M1 > MacBook with clang 16.0.6 from MacPorts, I see the compile time for > preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec. That's pretty amazing.
Andres Freund <andres@anarazel.de> writes: > On 2024-04-17 23:10:53 -0400, Tom Lane wrote: >> I think we should hold off on this. I found a simpler way to address >> ecpg's problem than what I sketched upthread. I have a >> not-ready-to-show-yet patch that allows the vast majority of ecpg's grammar >> productions to use the default semantic action. Testing on my M1 >> MacBook with clang 16.0.6 from MacPorts, I see the compile time for >> preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec. > That's pretty amazing. Patch posted at [1]. regards, tom lane [1] https://www.postgresql.org/message-id/2011420.1713493114%40sss.pgh.pa.us
On Wed, Apr 17, 2024 at 11:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think we should hold off on this. I found a simpler way to address > ecpg's problem than what I sketched upthread. I have a not-ready-to- > show-yet patch that allows the vast majority of ecpg's grammar > productions to use the default semantic action. Testing on my M1 > Macbook with clang 16.0.6 from MacPorts, I see the compile time for > preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec. If this is the consensus opinion, then https://commitfest.postgresql.org/48/4914/ should be marked Rejected. However, while I think the improvements that Tom was able to make here sound fantastic, I don't understand why that's an argument against Jelte's patches. After all, Tom's work will only go into v18, but this patch could be adopted in v17 and back-patched to all releases that support meson builds, saving oodles of compile time for as long as those releases are supported. The most obvious beneficiary of that course of action would seem to be Tom himself, since he back-patches more fixes than anybody, last I checked, but it'd also be useful to get slightly quicker results from the buildfarm and slightly quicker results for anyone using CI on back-branches and for other hackers who are looking to back-patch bug fixes. I don't quite understand why we want to throw those potential benefits out the window just because we have a better fix for the future. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > If this is the consensus opinion, then > https://commitfest.postgresql.org/48/4914/ should be marked Rejected. > However, while I think the improvements that Tom was able to make here > sound fantastic, I don't understand why that's an argument against > Jelte's patches. After all, Tom's work will only go into v18, but this > patch could be adopted in v17 and back-patched to all releases that > support meson builds, saving oodles of compile time for as long as > those releases are supported. The most obvious beneficiary of that > course of action would seem to be Tom himself, since he back-patches > more fixes than anybody, last I checked, but it'd be also be useful to > get slightly quicker results from the buildfarm and slightly quicker > results for anyone using CI on back-branches and for other hackers who > are looking to back-patch bug fixes. I don't quite understand why we > want to throw those potential benefits out the window just because we > have a better fix for the future. As I mentioned upthread, I'm more worried about confusing error reports than the machine time. It would save me personally exactly nada, since (a) I usually develop with gcc not clang, (b) when I do use clang it's not a heavily-affected version, and (c) since we *very* seldom change the grammar in stable branches, ccache will hide the problem pretty effectively anyway in the back branches. (If you're not using ccache, please don't complain about build time.) I grant that there are people who are more affected, but still, I'd just as soon not contort the build rules for a temporary problem. regards, tom lane
On Fri, May 17, 2024 at 4:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > As I mentioned upthread, I'm more worried about confusing error > reports than the machine time. Well, Jelte fixed that. > I grant that there are people who are more affected, but still, I'd > just as soon not contort the build rules for a temporary problem. Arguably we are contorting the build rules by doing this, but I don't think it's enough of a contortion to get excited about. Anyone else want to vote? -- Robert Haas EDB: http://www.enterprisedb.com
On 17.05.24 23:01, Robert Haas wrote: > On Fri, May 17, 2024 at 4:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> As I mentioned upthread, I'm more worried about confusing error >> reports than the machine time. > > Well, Jelte fixed that. > >> I grant that there are people who are more affected, but still, I'd >> just as soon not contort the build rules for a temporary problem. > > Arguably by doing this, but I don't think it's enough of a contortion > to get excited about. > > Anyone else want to vote? I retested the patch from 2024-04-07 (I think that's the one that "fixed that"? There are multiple "v1" patches in this thread.) using gcc-14 and clang-18, with ccache disabled of course. The measured effects of the patch are: gcc-14: 1% slower clang-18: 3% faster So with that, it doesn't seem very interesting.
On 2024-May-17, Robert Haas wrote: > Anyone else want to vote? I had pretty much the same thought as you. It seems a waste to let the code in existing branches be slow only because we have a much better approach for a branch that doesn't even exist yet. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "We're here to devour each other alive" (Hobbes)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2024-May-17, Robert Haas wrote: >> Anyone else want to vote? > I had pretty much the same thought as you. It seems a waste to leave > the code in existing branches be slow only because we have a much better > approach for a branch that doesn't even exist yet. I won't complain too loudly as long as we remember to revert the patch from HEAD once the ecpg fix goes in. regards, tom lane
On Sat, May 18, 2024 at 6:09 PM Andres Freund <andres@anarazel.de> wrote: > A few tests with ccache disabled: These tests seem to show no difference between the two releases, so I wonder what you're intending to demonstrate here. Locally, I have ninja 1.11.1. I'm sure Andres will be absolutely shocked, shocked I say, to hear that I haven't upgraded to the very latest. Anyway, I tried a clean build with CC=clang without the patch and then with the patch and got: unpatched - real 1m9.872s patched - real 1m6.130s That's the time to run my script that first calls meson setup and then afterward runs ninja. I tried ninja -v and put the output to a file. With the patch: [292/2402] /opt/local/bin/perl ../pgsql/src/interfaces/ecpg/preproc/parse.pl --srcdir ../pgsql/src/interfaces/ecpg/preproc --parser ../pgsql/src/interfaces/ecpg/preproc/../../../backend/parser/gram.y --output src/interfaces/ecpg/preproc/preproc.y And without: [1854/2402] /opt/local/bin/perl ../pgsql/src/interfaces/ecpg/preproc/parse.pl --srcdir ../pgsql/src/interfaces/ecpg/preproc --parser ../pgsql/src/interfaces/ecpg/preproc/../../../backend/parser/gram.y --output src/interfaces/ecpg/preproc/preproc.y With my usual CC='ccache clang', I get real 0m37.500s unpatched and real 0m37.786s patched. I also tried this with the original v1 patch (not to be confused with the revised, still-v1 patch) and got real 37.950s. So I guess I'm now feeling pretty unexcited about this. If the patch really did what the subject line says, that would be nice to have. However, since Tom will patch the underlying problem in v18, this will only help people building the back-branches. Of those, it will only help the people using a slow compiler; I read the thread as suggesting that you need to be using clang rather than gcc and also not using ccache. Plus, it sounds like you also need an old ninja version. Even then, the highest reported savings is 10s and I only save 3s. 
I think that's a small enough savings with enough conditionals that we should not bother. It seems fine to say that people who need the fastest possible builds of back-branches should use at least one of gcc, ccache, and ninja >= 1.12. I'll go mark this patch Rejected in the CommitFest. Incidentally, this thread is an excellent object lesson in why it's so darn hard to cut down the size of the CF. I mean, this is a 7-line patch and we've now got a 30+ email thread about it. In my role as temporary self-appointed wielder of the CommitFest mace, I have now spent probably a good 2 hours trying to figure out what to do about it. There are more than 230 active patches in the CommitFest. That means CommitFest management would be more than a full-time job for an experienced committer even if every patch were as simple as this one, which is definitely not the case. If we want to restore some sanity here, we're going to have to find some way of distributing the burden across more people, including the patch authors. Note that Jelte's last update to the thread is over a month ago. I'm not saying that he's done something wrong, but I do strongly suspect that the time that the community as whole has spent on this patch is a pretty significant multiple of the time that he personally spent on it, and such dynamics are bound to create scaling issues. I don't know how we do better: I just want to highlight the problem. -- Robert Haas EDB: http://www.enterprisedb.com
On 2024-05-20 09:37:46 -0400, Robert Haas wrote: > On Sat, May 18, 2024 at 6:09 PM Andres Freund <andres@anarazel.de> wrote: > > A few tests with ccache disabled: > > These tests seem to show no difference between the two releases, so I > wonder what you're intending to demonstrate here. They show a few seconds of win for 'real' time. -O0: 0m21.577s->0m19.529s -O3: 0m59.730s->0m54.853s
On Mon, May 20, 2024 at 11:37 AM Andres Freund <andres@anarazel.de> wrote: > On 2024-05-20 09:37:46 -0400, Robert Haas wrote: > > On Sat, May 18, 2024 at 6:09 PM Andres Freund <andres@anarazel.de> wrote: > > > A few tests with ccache disabled: > > > > These tests seem to show no difference between the two releases, so I > > wonder what you're intending to demonstrate here. > > They show a few seconds of win for 'real' time. > -O0: 0m21.577s->0m19.529s > -O3: 0m59.730s->0m54.853s Ah, OK, I think I misinterpreted. -- Robert Haas EDB: http://www.enterprisedb.com
On 19.05.24 00:09, Andres Freund wrote: > On 2024-05-18 00:35:08 +0200, Peter Eisentraut wrote: >> I retested the patch from 2024-04-07 (I think that's the one that "fixed >> that"? There are multiple "v1" patches in this thread.) using gcc-14 and >> clang-18, with ccache disabled of course. The measured effects of the patch >> are: >> >> gcc-14: 1% slower >> clang-18: 3% faster >> >> So with that, it doesn't seem very interesting. > I wonder whether the reason you're seeing less benefit than Jelte is that - I'm > guessing - you now used ninja 1.12 and Jelte something older. Ninja 1.12 > prioritizes build edges using a "critical path" heuristic, scheduling nodes > with more incoming dependencies, and those deeper in the dependency graph, > earlier. Yes! Very interesting! With ninja 1.11 and gcc-14, I see the patch gives about a 17% speedup.