Thread: Speed up clean meson builds by ~25%
Building the generated ecpg preproc file can take a long time. You can check how long using:

ninja -C build src/interfaces/ecpg/preproc/ecpg.p/meson-generated_.._preproc.c.o

The attached patch moves that file much closer to the top of our build order, so building it can be pipelined much better with other files. It improved clean build times on my machine (10 cores/20 threads) from ~40 seconds to ~30 seconds. You can check the improvement for yourself with:

ninja -C build clean && ninja -C build all
Attachment
On Fri, 5 Apr 2024 at 00:45, Jelte Fennema-Nio <me@jeltef.nl> wrote:
> It improved clean build times on my machine (10 cores/20 threads) from ~40
> seconds to ~30 seconds.

After discussing this off-list with Bilal, I realized that this gain only happens for clang builds on my system, because those take a long time, as was also recently discussed in [1]. My builds don't take nearly as long, though: I tried clang 15 through 18 and they all took 10-22 seconds to run. My clang comes from apt.llvm.org on Ubuntu 22.04.

[1]: https://www.postgresql.org/message-id/flat/CA%2BhUKGLvJ7-%3DfS-J9kN%3DaZWrpyqykwqCBbxXLEhUa9831dPFcg%40mail.gmail.com
Hi, On 2024-04-05 15:36:34 +0200, Jelte Fennema-Nio wrote: > On Fri, 5 Apr 2024 at 00:45, Jelte Fennema-Nio <me@jeltef.nl> wrote: > > It improved clean build times on my machine (10 cores/20 threads) from ~40 > > seconds to ~30 seconds. > > After discussing this off-list with Bilal, I realized that this gain > is only happening for clang builds on my system. Because those take a > long time as was also recently discussed in[1]. My builds don't take > nearly as long though. I tried with clang 15 through 18 and they all > took 10-22 seconds to run and clang comes from apt.llvm.org on Ubuntu > 22.04 > > [1]: https://www.postgresql.org/message-id/flat/CA%2BhUKGLvJ7-%3DfS-J9kN%3DaZWrpyqykwqCBbxXLEhUa9831dPFcg%40mail.gmail.com I recommend opening a bug report for clang, best with an already preprocessed input file. We're going to need to do something about this from our side as well, I suspect. The times aren't great with gcc either, even if not as bad as with clang. Greetings, Andres Freund
On Fri, 5 Apr 2024 at 17:24, Andres Freund <andres@anarazel.de> wrote:
> I recommend opening a bug report for clang, best with an already preprocessed
> input file.
>
> We're going to need to do something about this from our side as well, I
> suspect. The times aren't great with gcc either, even if not as bad as with
> clang.

Agreed. While not a full solution, I think this patch is still good to address some of the pain: waiting 10 seconds at the end of my build with only 1 of my 10 cores doing anything.

So while this doesn't decrease the CPU time spent, it does decrease wall-clock time significantly in some cases, with afaict no downsides.
On Fri, Apr 05, 2024 at 06:19:20PM +0200, Jelte Fennema-Nio wrote: > Agreed. While not a full solution, I think this patch is still good to > address some of the pain: Waiting 10 seconds at the end of my build > with only 1 of my 10 cores doing anything. > > So while this doesn't decrease CPU time spent it does decrease > wall-clock time significantly in some cases, with afaict no downsides. Well, this is also painful with ./configure. So, even if we are going to move away from it at this point, we still need to support it for a couple of years. It looks to me that letting the clang folks know about the situation is the best way forward. -- Michael
Attachment
Hello Michael, 08.04.2024 08:23, Michael Paquier wrote: > On Fri, Apr 05, 2024 at 06:19:20PM +0200, Jelte Fennema-Nio wrote: >> Agreed. While not a full solution, I think this patch is still good to >> address some of the pain: Waiting 10 seconds at the end of my build >> with only 1 of my 10 cores doing anything. >> >> So while this doesn't decrease CPU time spent it does decrease >> wall-clock time significantly in some cases, with afaict no downsides. > Well, this is also painful with ./configure. So, even if we are going > to move away from it at this point, we still need to support it for a > couple of years. It looks to me that letting the clang folks know > about the situation is the best way forward. > As I wrote in [1], I didn't observe the issue with clang-18, so maybe it is fixed already. Perhaps it's worth rechecking... [1] https://www.postgresql.org/message-id/d2bf3727-bae4-3aee-65f6-caec2c4ebaa8%40gmail.com Best regards, Alexander
On 05.04.24 18:19, Jelte Fennema-Nio wrote:
> Agreed. While not a full solution, I think this patch is still good to
> address some of the pain: Waiting 10 seconds at the end of my build
> with only 1 of my 10 cores doing anything.
>
> So while this doesn't decrease CPU time spent it does decrease
> wall-clock time significantly in some cases, with afaict no downsides.

I have tested this with various compilers, and I can confirm that this shaves off about 5 seconds from the build wall-clock time, which represents about 10%-20% of the total time. I think this is a good patch.

Interestingly, if I apply the analogous changes to the makefiles, I don't get any significant improvements. (Depends on local circumstances, of course.) But I would still suggest keeping the makefiles aligned.
On Mon, 8 Apr 2024 at 10:00, Alexander Lakhin <exclusion@gmail.com> wrote:
> As I wrote in [1], I didn't observe the issue with clang-18, so maybe it
> is fixed already.
> Perhaps it's worth rechecking...

Using the attached script I got these timings. Clang is significantly slower in all of them, but especially with -Og the difference is huge.

gcc 11.4.0: 7.276s
clang 18.1.3: 17.216s
gcc 11.4.0 --debug: 7.441s
clang 18.1.3 --debug: 18.164s
gcc 11.4.0 --debug -Og: 2.418s
clang 18.1.3 --debug -Og: 14.864s

I reported this same issue to the LLVM project here:
https://github.com/llvm/llvm-project/issues/87973
Attachment
On Mon, 8 Apr 2024 at 10:02, Peter Eisentraut <peter@eisentraut.org> wrote:
> I have tested this with various compilers, and I can confirm that this
> shaves off about 5 seconds from the build wall-clock time, which
> represents about 10%-20% of the total time. I think this is a good patch.

Great to hear.

> Interestingly, if I apply the analogous changes to the makefiles, I
> don't get any significant improvements. (Depends on local
> circumstances, of course.) But I would still suggest to keep the
> makefiles aligned.

Attached is a patch that also updates the Makefiles, but indeed I don't get any perf gains there either.

On Mon, 8 Apr 2024 at 07:23, Michael Paquier <michael@paquier.xyz> wrote:
> Well, this is also painful with ./configure. So, even if we are going
> to move away from it at this point, we still need to support it for a
> couple of years. It looks to me that letting the clang folks know
> about the situation is the best way forward.

I reported the issue to the clang folks:
https://github.com/llvm/llvm-project/issues/87973

But even if my patch doesn't help for ./configure, it still seems like a good idea to me to reduce compile times when using meson while we wait for the clang folks to address the issue.
Attachment
Hello Jelte,

08.04.2024 11:36, Jelte Fennema-Nio wrote:
> Using the attached script I got these timings. Clang is significantly
> slower in all of them, but especially with -Og the difference is huge.
>
> gcc 11.4.0: 7.276s
> clang 18.1.3: 17.216s
> gcc 11.4.0 --debug: 7.441s
> clang 18.1.3 --debug: 18.164s
> gcc 11.4.0 --debug -Og: 2.418s
> clang 18.1.3 --debug -Og: 14.864s
>
> I reported this same issue to the LLVM project here:
> https://github.com/llvm/llvm-project/issues/87973

Maybe we're talking about different problems. At [1], Thomas (and then I) was unhappy with builds taking more than 200 seconds...

[1]: https://www.postgresql.org/message-id/CA%2BhUKGLvJ7-%3DfS-J9kN%3DaZWrpyqykwqCBbxXLEhUa9831dPFcg%40mail.gmail.com

Best regards,
Alexander
Hi,

On Mon, 8 Apr 2024 at 11:00, Alexander Lakhin <exclusion@gmail.com> wrote:
> As I wrote in [1], I didn't observe the issue with clang-18, so maybe it
> is fixed already.
> Perhaps it's worth rechecking...
>
> [1] https://www.postgresql.org/message-id/d2bf3727-bae4-3aee-65f6-caec2c4ebaa8%40gmail.com

I had this problem on my local computer. My build times are:

gcc: 20s
clang-15: 24s
clang-16: 105s
clang-17: 111s
clang-18: 25s

--
Regards,
Nazir Bilal Yavuz
Microsoft
On Mon, Apr 08, 2024 at 12:23:56PM +0300, Nazir Bilal Yavuz wrote:
> I had this problem on my local computer. My build times are:
>
> gcc: 20s
> clang-15: 24s
> clang-16: 105s
> clang-17: 111s
> clang-18: 25s

Interesting. A parallel build of ecpg shows similar numbers here:

clang-16: 101s
clang-17: 112s
clang-18: 14s
gcc: 10s

Most of the time is still spent on preproc.c with clang-18, but that's much, much faster (the default version of clang is 16 on Debian sid, where I've run these numbers).
--
Michael
Attachment
On Tue, Apr 9, 2024 at 5:01 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Apr 08, 2024 at 12:23:56PM +0300, Nazir Bilal Yavuz wrote:
> > I had this problem on my local computer. My build times are:
> >
> > gcc: 20s
> > clang-15: 24s
> > clang-16: 105s
> > clang-17: 111s
> > clang-18: 25s
>
> Interesting. A parallel build of ecpg shows similar numbers here:
> clang-16: 101s
> clang-17: 112s
> clang-18: 14s
> gcc: 10s

I don't expect it to get fixed BTW, because it's present in 16.0.6, and .6 is the terminal release, if I understand their system correctly. They're currently only doing bug fixes for 18, and even there not for much longer. Interesting that not everyone saw this at first; perhaps the bug arrived in a minor release that some people didn't have yet? Or perhaps there is something special required to trigger it?
Hi,

On 2024-04-09 17:13:52 +1200, Thomas Munro wrote:
> I don't expect it to get fixed BTW, because it's present in 16.0.6,
> and .6 is the terminal release, if I understand their system
> correctly. They're currently only doing bug fixes for 18, and even
> there not for much longer. Interesting that not everyone saw this at
> first; perhaps the bug arrived in a minor release that some people
> didn't have yet? Or perhaps there is something special required to
> trigger it?

I think we need to do something about the compile time of this file, even with gcc. Our main grammar already is an issue, and stacking all the ecpg stuff on top makes it considerably worse.

ISTM there's a bunch of pretty pointless stuff in the generated preproc.y, which does seem to have some impact on compile time. E.g. a good bit of the file is just stuff like:

reserved_keyword:
          ALL { $$ = mm_strdup("all"); }
        ...

Why are we strduping all of these? We could instead just use the value of the token, instead of forcing the compiler to generate branches for all individual keywords etc. I don't know off-hand if the keyword lookup machinery ends up with an uppercase keyword, but if so, that'd be easy enough to change.
It actually looks to me like the many calls to mm_strdup() might be what's driving clang nuts. I hacked up preproc.y to not need those calls for

  unreserved_keyword
  col_name_keyword
  type_func_name_keyword
  reserved_keyword
  bare_label_keyword

by removing the actions and defining those tokens to be of type str. There are many more such calls that could be dealt with similarly.

That alone reduced compile times with
  clang-16 -O1 from 18.268s to 12.516s
  clang-16 -O2 from 345.188s to 158.084s
  clang-19 -O2 from 26.018s to 15.200s

I suspect what is happening is that clang tries to optimize the number of calls to mm_strdup() by separating the argument setup from the function call. Which leads to a control flow graph with *many* incoming edges to the basic block containing the function call to mm_strdup(), triggering a normally harmless O(N^2) or such.

Greetings,

Andres Freund
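The kind of change described above might look roughly like the following. This is a hypothetical fragment for illustration only, not the actual hacked-up preproc.y (and a real bison file would need the %token declarations in the prologue, before the rules):

```
/* Before: every keyword gets its own action, i.e. its own call site
 * feeding mm_strdup(): */
unreserved_keyword:
          ABORT_P     { $$ = mm_strdup("abort"); }
        | ABSOLUTE_P  { $$ = mm_strdup("absolute"); }
        /* ... hundreds more branches ... */
        ;

/* After: have the scanner hand back the keyword's spelling as the
 * token's semantic value, declare the tokens as <str>, and let
 * bison's implicit default action ($$ = $1) apply: */
%token <str> ABORT_P ABSOLUTE_P /* ... */

unreserved_keyword: ABORT_P | ABSOLUTE_P /* ... */ ;
```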
Andres Freund <andres@anarazel.de> writes:
> I think we need to do something about the compile time of this file, even with
> gcc. Our main grammar already is an issue and stacking all the ecpg stuff on
> top makes it considerably worse.

Seems reasonable, if we can.

> Why are strduping all of these?

IIRC, the issue is that the mechanism for concatenating the tokens back together frees the input strings:

static char *
cat2_str(char *str1, char *str2)
{
    char *res_str = (char *) mm_alloc(strlen(str1) + strlen(str2) + 2);

    strcpy(res_str, str1);
    if (strlen(str1) != 0 && strlen(str2) != 0)
        strcat(res_str, " ");
    strcat(res_str, str2);
    free(str1);        <------------------
    free(str2);        <------------------
    return res_str;
}

So that ought to dump core if you don't make all the productions return malloc'd strings. How did you work around that? (Maybe it'd be okay to just leak all the strings?)

regards, tom lane
Hi,

On 2024-04-09 15:33:10 -0700, Andres Freund wrote:
> Which leads to a control flow graph with *many* incoming edges to the basic
> block containing the function call to mm_strdup(), triggering a normally
> harmless O(N^2) or such.

With clang-16 -O2 there is a basic block with 3904 incoming basic blocks. With the hacked-up preproc.y it's 2968. A 30% increase leading to a doubling of runtime imo seems consistent with my theory of there being some ~quadratic behaviour.

I suspect that this is also what's causing gram.c compilation to be fairly slow, with both clang and gcc. There aren't as many pstrdup()s in gram.y as there are mm_strdup()s in preproc.y, but there still are many.

ISTM that there are many pstrdup()s that really make very little sense. I think it's largely because there are many rules declared %type <str>, which prevents us from returning a string constant without a warning. There may be (?) some rules that modify strings returned by subsidiary rules, but if so, it can't be many.

Greetings,

Andres
Hi,

On 2024-04-09 19:00:41 -0400, Tom Lane wrote:
> IIRC, the issue is that the mechanism for concatenating the tokens
> back together frees the input strings

Ah, that explains it - but also seems somewhat unnecessary.

> So that ought to dump core if you don't make all the productions
> return malloc'd strings. How did you work around that?

I just tried to get to the point of understanding the reasons for slow compilation, not to actually keep it working :). I.e. I didn't.

> (Maybe it'd be okay to just leak all the strings?)

Hm. The input to ecpg can be fairly large, I guess. And we have fun code like cat_str(), which afaict is O(arguments^2) in its memory usage if we didn't free. Not immediately sure what the right path is.

Greetings,

Andres Freund
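The O(arguments^2) concern is easy to model with a back-of-envelope calculation. The standalone C sketch below is hypothetical, not ecpg code: it only counts bytes rather than actually allocating anything, and it tallies how much memory would stay live if n single-character tokens were folded together by cat2_str-style pairwise concatenation with no frees.

```c
#include <stddef.h>

/*
 * Model of left-folding n one-character tokens through a cat2_str()
 * that never frees its inputs.  We only track byte counts; nothing
 * is really allocated.
 */
static size_t
leaked_if_not_freed(int ntokens)
{
    size_t live = 2;        /* first strdup'd token: 1 char + NUL */
    size_t acc_len = 1;     /* length of the accumulated string */
    int    i;

    for (i = 1; i < ntokens; i++)
    {
        size_t newlen = acc_len + 1 + 1;    /* old + space + token */

        live += 2;              /* the freshly strdup'd token */
        live += newlen + 1;     /* the new concatenation result */
        acc_len = newlen;
    }
    return live;
}
```

Under this model, 100 tokens leave about 10 kB allocated and 200 tokens about 40 kB: doubling the input roughly quadruples the waste, which is why simply leaking everything looks unattractive for large ecpg inputs.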
Andres Freund <andres@anarazel.de> writes: > On 2024-04-09 19:00:41 -0400, Tom Lane wrote: >> Andres Freund <andres@anarazel.de> writes: >>> Why are strduping all of these? >> IIRC, the issue is that the mechanism for concatenating the tokens >> back together frees the input strings > Ah, that explains it - but also seems somewhat unnecessary. I experimented with replacing mm_strdup() with #define mm_strdup(x) (x) As you did, I wasn't trying to get to a working result, so I didn't do anything about removing all the free's or fixing the cast-away-const warnings. The result was disappointing though. On my Mac laptop (Apple clang version 15.0.0), the compile time for preproc.o went from 6.7sec to 5.5sec. Which is better, but not enough better to persuade me to do all the janitorial work of restructuring ecpg's string-slinging. I think we haven't really identified the problem. As a comparison point, compiling gram.o on the same machine takes 1.3sec. So I am seeing a problem here, sure enough, although not as bad as it is in some other clang versions. regards, tom lane
Hi,

On 2024-04-09 19:44:03 -0400, Tom Lane wrote:
> As you did, I wasn't trying to get to a working result, so I didn't do
> anything about removing all the free's or fixing the cast-away-const
> warnings. The result was disappointing though. On my Mac laptop
> (Apple clang version 15.0.0), the compile time for preproc.o went from
> 6.7sec to 5.5sec. Which is better, but not enough better to persuade
> me to do all the janitorial work of restructuring ecpg's
> string-slinging. I think we haven't really identified the problem.

With what level of optimization was that? It kinda looks like their version might be from before the worst of the issue...

FWIW, just redefining mm_strdup() that way doesn't help much here either, even with an affected compiler. The gain increases substantially after simplifying unreserved_keyword etc. to just use the default action. I think having the non-default actions for those branches leaves you with a similar issue: each of the actions just sets a register, and storing that and going to the next loop iteration is the same for every branch.

FWIW, with clang-19 -O2:

"plain": 0m24.354s
mm_strdup redefined: 0m23.741s
+ use default action: 0m14.218s

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2024-04-09 19:44:03 -0400, Tom Lane wrote: >> As you did, I wasn't trying to get to a working result, so I didn't do >> anything about removing all the free's or fixing the cast-away-const >> warnings. The result was disappointing though. On my Mac laptop >> (Apple clang version 15.0.0), the compile time for preproc.o went from >> 6.7sec to 5.5sec. Which is better, but not enough better to persuade >> me to do all the janitorial work of restructuring ecpg's >> string-slinging. I think we haven't really identified the problem. > With what level of optimization was that? It kinda looks like their version > might be from before the worst of the issue... Just the autoconf-default -O2. > FWIW, just redefining mm_strdup() that way doesn't help much here either, > even with an affected compiler. The gain increases substantially after > simplifying unreserved_keyword etc to just use the default action. Hm. In any case, this is all moot unless we can come to a new design for how ecpg does its string-mashing. Thoughts? I thought for a bit about not allocating strings as such, but just passing around pointers into the source text plus lengths, and reassembling the string data only at the end when we need to output it. Not sure how well that would work, but it could be a starting point. regards, tom lane
Hi,

On 2024-04-09 20:12:48 -0400, Tom Lane wrote:
> In any case, this is all moot unless we can come to a new design for
> how ecpg does its string-mashing. Thoughts?

There might be a quick-n-dirty way: we could make pgc.l return dynamically allocated keywords.

Am I missing something, or is ecpg's string handling almost comically inefficient? Building up strings in tiny increments, which then get mashed together to get slightly larger pieces, just to then be mashed together again? It's like an intentional allocator stress test. It's particularly absurd because in the end we just print those strings, after carefully assembling them...

> I thought for a bit about not allocating strings as such, but just
> passing around pointers into the source text plus lengths, and
> reassembling the string data only at the end when we need to output it.
> Not sure how well that would work, but it could be a starting point.

I was wondering about something vaguely similar: instead of concatenating individual strings, append them to a stringinfo. The stringinfo can be reset individually...

Greetings,

Andres Freund
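A minimal sketch of the append-to-one-buffer idea, under the assumption that a stripped-down growable buffer suffices to show the shape (the StrBuf/sb_* names here are invented; PostgreSQL's actual StringInfo API differs): tokens are appended to a single buffer instead of being strdup'd and pairwise concatenated, so each byte is copied O(1) amortized times.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; not PostgreSQL's StringInfo. */
typedef struct
{
    char   *data;
    size_t  len;
    size_t  cap;
} StrBuf;

static void
sb_init(StrBuf *sb)
{
    sb->cap = 64;
    sb->len = 0;
    sb->data = malloc(sb->cap);
    sb->data[0] = '\0';
}

/*
 * Append one token, inserting a single separating space, growing the
 * buffer geometrically so appends are O(1) amortized.
 */
static void
sb_append_token(StrBuf *sb, const char *tok)
{
    size_t  tlen = strlen(tok);
    size_t  need = sb->len + (sb->len > 0 ? 1 : 0) + tlen + 1;

    while (sb->cap < need)
    {
        sb->cap *= 2;
        sb->data = realloc(sb->data, sb->cap);
    }
    if (sb->len > 0)
        sb->data[sb->len++] = ' ';
    memcpy(sb->data + sb->len, tok, tlen + 1);
    sb->len += tlen;
}
```

Appending "select", "*", "from", "t" in turn yields "select * from t" in data, and the whole run of appends costs O(total length) rather than the O(total^2) of repeated pairwise concatenation.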
Andres Freund <andres@anarazel.de> writes: > On 2024-04-09 20:12:48 -0400, Tom Lane wrote: >> In any case, this is all moot unless we can come to a new design for >> how ecpg does its string-mashing. Thoughts? > Am I missing something, or is ecpg string handling almost comically > inefficient? Building up strings in tiny increments, which then get mashed > together to get slightly larger pieces, just to then be mashed together again? > It's like an intentional allocator stress test. > It's particularly absurd because in the end we just print those strings, after > carefully assembling them... It is that. Here's what I'm thinking: probably 90% of what ecpg does is to verify that a chunk of its input represents a valid bit of SQL (or C) syntax and then emit a representation of that chunk. Currently, that representation tends to case-normalize tokens and smash inter-token whitespace and comments to a single space. I propose though that neither of those behaviors is mission-critical, or even all that desirable. I think few users would complain if ecpg preserved the input's casing and spacing and comments. Given that definition, most of ecpg's productions (certainly just about all the auto-generated ones) would simply need to return a pointer and length describing a part of the input string. There are places where ecpg wants to insert some text it generates, and I think it might need to re-order text in a few places, so we need a production result representation that can cope with those cases --- but if we can make "regurgitate the input" cases efficient, I think we'll have licked the performance problem. With that in mind, I wonder whether we couldn't make the simple cases depend on bison's existing support for location tracking. In which case, the actual actions for all those cases could be default, achieving one of the goals you mention. Obviously, this is not going to be a small lift, but it kind of seems do-able. regards, tom lane
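The pointer-plus-length idea can be sketched like this (hypothetical types and names, covering only the simple "regurgitate the input" case): a production's result is a window into the unmodified source text, two adjacent windows are "concatenated" by simply widening the window, and bytes are copied exactly once, at output time. Note that this preserves the input's casing, spacing, and comments verbatim.

```c
#include <stdlib.h>
#include <string.h>

/* A production's result: a window into the unmodified source text. */
typedef struct
{
    const char *start;
    size_t      len;
} TextSpan;

/*
 * Default "concatenate" for adjacent input chunks: since both spans
 * point into the same source buffer, joining them just widens the
 * window -- no allocation, no copying.
 */
static TextSpan
span_join(TextSpan a, TextSpan b)
{
    TextSpan    res;

    res.start = a.start;
    res.len = (size_t) (b.start + b.len - a.start);
    return res;
}

/* Copy out the bytes only when we finally emit the output. */
static char *
span_to_cstring(TextSpan s)
{
    char       *res = malloc(s.len + 1);

    memcpy(res, s.start, s.len);
    res[s.len] = '\0';
    return res;
}
```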
On Wed, Apr 10, 2024 at 11:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > ... On my Mac laptop > (Apple clang version 15.0.0), the compile time for preproc.o went from > 6.7sec to 5.5sec. Having seen multi-minute compile times on FreeBSD (where clang is the system compiler) and Debian (where I get packages from apt.llvm.org), I have been quietly waiting for this issue to hit Mac users too (where a clang with unknown proprietary changes is the system compiler), but it never did. Huh. I tried to understand a bit more about Apple's version soup. This seems to be an up-to-date table (though I don't understand their source of information): https://en.wikipedia.org/wiki/Xcode#Xcode_15.0_-_(since_visionOS_support)_2 According to cc -v on my up-to-date MacBook Air, it has "Apple clang version 15.0.0 (clang-1500.3.9.4)", which, if the table is correct, means that it's using LLVM 16.0.0 (note, not 16.0.6, the final version of that branch of [open] LLVM, and the version I saw the issue with on FreeBSD and Debian). They relabel everything to match the Xcode version that shipped it, and they're currently off by one. I wondered if perhaps the table just wasn't accurate in the final digits, so I looked for clues in strings in the binary, and sure enough it contains "LLVM 15.0.0". My guess would be that they've clobbered the major version, but not the rest: the Xcode version is 15.3, and I don't see a 3, so I guess this is really derived from LLVM 16.0.0. One explanation would be that they rebase their proprietary bits and pieces over the .0 version of each major release, and then cherry-pick urgent fixes and stuff later, not pulling in the whole minor release; they also presumably have to maintain it for much longer than the LLVM project's narrow support window. Who knows. So now I wonder if it could be that LLVM 16.0.6 does this, but LLVM 16.0.0 doesn't. I installed clang-16 (16.0.6) with MacPorts, and it does show the problem.
Thomas Munro <thomas.munro@gmail.com> writes: > On Wed, Apr 10, 2024 at 11:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... On my Mac laptop >> (Apple clang version 15.0.0), the compile time for preproc.o went from >> 6.7sec to 5.5sec. > Having seen multi-minute compile times on FreeBSD (where clang is the > system compiler) and Debian (where I get packages from apt.llvm.org), > I have been quietly waiting for this issue to hit Mac users too (where > a clang with unknown proprietary changes is the system compiler), but > it never did. Huh. I don't doubt that there are other clang versions where the problem bites a lot harder. What result do you get from the test I tried (turning mm_strdup into a no-op macro)? regards, tom lane
On Wed, Apr 10, 2024 at 5:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't doubt that there are other clang versions where the problem
> bites a lot harder. What result do you get from the test I tried
> (turning mm_strdup into a no-op macro)?

#define mm_strdup(x) (x) does this:

Apple clang 15: master: 14s -> 9s
MacPorts clang 16: master: 170s -> 10s
Thomas Munro <thomas.munro@gmail.com> writes: > On Wed, Apr 10, 2024 at 5:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I don't doubt that there are other clang versions where the problem >> bites a lot harder. What result do you get from the test I tried >> (turning mm_strdup into a no-op macro)? > #define mm_strdup(x) (x) does this: > Apple clang 15: master: 14s -> 9s > MacPorts clang 16, master: 170s -> 10s Wow. So (a) it's definitely worth doing something about this; but (b) anything that would move the needle would probably require significant refactoring of ecpg's string handling, and I doubt we want to consider that post-feature-freeze. The earliest we could land such a fix would be ~ July, if we follow past schedules. The immediate question then is do we want to take Jelte's patch as a way to ameliorate the pain meanwhile. I'm kind of down on it, because AFAICS what would happen if you break the core grammar is that (most likely) the failure would be reported against ecpg first. That seems pretty confusing. However, maybe it's tolerable as a short-term band-aid that we plan to revert later. I would not commit the makefile side of the patch though, as that creates the same problem for makefile users while providing little benefit. As for the longer-term fix, I'd be willing to have a go at implementing the sketch I wrote last night; but I'm also happy to defer to anyone else who's hot to work on this. regards, tom lane
On 10.04.24 17:33, Tom Lane wrote: > The immediate question then is do we want to take Jelte's patch > as a way to ameliorate the pain meanwhile. I'm kind of down on > it, because AFAICS what would happen if you break the core > grammar is that (most likely) the failure would be reported > against ecpg first. That seems pretty confusing. Yeah that would be confusing. I suppose we could just take the part of the patch that moves up preproc among the subdirectories of ecpg, but I don't know if that by itself would really buy anything.
On Wed, 17 Apr 2024 at 13:41, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 10.04.24 17:33, Tom Lane wrote: > > The immediate question then is do we want to take Jelte's patch > > as a way to ameliorate the pain meanwhile. I'm kind of down on > > it, because AFAICS what would happen if you break the core > > grammar is that (most likely) the failure would be reported > > against ecpg first. That seems pretty confusing. > > Yeah that would be confusing. How can I test if this actually happens? Because it sounds like if that indeed happens it should be fixable fairly easily.
Jelte Fennema-Nio <postgres@jeltef.nl> writes: > How can I test if this actually happens? Because it sounds like if > that indeed happens it should be fixable fairly easily. Break gram.y (say, misspell some token in a production) and see what happens. The behavior's likely to be timing sensitive though. regards, tom lane
On Wed, 17 Apr 2024 at 16:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Break gram.y (say, misspell some token in a production) and
> see what happens. The behavior's likely to be timing sensitive
> though.

Thanks for clarifying. It took me a little while to break gram.y in such a way that I could reproduce the failure consistently, but I managed in the end using the attached small diff, and then running ninja in non-parallel mode:

ninja -C build all -j1

As I expected, this problem was indeed fairly easy to address by still building "backend/parser" before "interfaces". See attached patch.
Attachment
Jelte Fennema-Nio <postgres@jeltef.nl> writes: > As I expected this problem was indeed fairly easy to address by still > building "backend/parser" before "interfaces". See attached patch. I think we should hold off on this. I found a simpler way to address ecpg's problem than what I sketched upthread. I have a not-ready-to- show-yet patch that allows the vast majority of ecpg's grammar productions to use the default semantic action. Testing on my M1 Macbook with clang 16.0.6 from MacPorts, I see the compile time for preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec. The core idea of the patch is to get rid of <str> results from grammar nonterminals and instead pass the strings back as yylloc results, which we can do by redefining YYLTYPE as "char *"; since ecpg isn't using Bison's location logic for anything, this is free. Then we can implement a one-size-fits-most token concatenation rule in YYLLOC_DEFAULT, and only the various handmade rules that don't want to just concatenate their inputs need to do something different. The patch presently passes regression tests, but its memory management is shoddy as can be (basically "leak like there's no tomorrow"), and I want to fix that before presenting it. One could almost argue that we don't care about memory consumption of the ecpg preprocessor; but I think it's possible to do better. regards, tom lane
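As a toy model of the mechanism described above (a standalone sketch that mimics what a YYLLOC_DEFAULT-style concatenation rule would do; it is not the actual patch, and real bison would expand the macro inline rather than call a function): each "location" is simply the production's text, and the one-size-fits-most default rule space-joins the right-hand-side strings.

```c
#include <stdlib.h>
#include <string.h>

typedef char *YYLTYPE;          /* a "location" is just the text */

/*
 * What a YYLLOC_DEFAULT-style rule would do for a production with n
 * right-hand-side symbols: join their strings with single spaces.
 */
static YYLTYPE
join_rhs(YYLTYPE *rhs, int n)
{
    size_t  len = 0;
    char   *res;
    int     i;

    for (i = 0; i < n; i++)
        len += strlen(rhs[i]) + 1;      /* text + space/terminator */
    res = malloc(len + 1);
    res[0] = '\0';
    for (i = 0; i < n; i++)
    {
        if (i > 0)
            strcat(res, " ");
        strcat(res, rhs[i]);
    }
    return res;
}
```

Only the handmade rules that don't want plain concatenation would need explicit actions; everything else falls through to this default.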
On 2024-04-17 23:10:53 -0400, Tom Lane wrote: > Jelte Fennema-Nio <postgres@jeltef.nl> writes: > > As I expected this problem was indeed fairly easy to address by still > > building "backend/parser" before "interfaces". See attached patch. > > I think we should hold off on this. I found a simpler way to address > ecpg's problem than what I sketched upthread. I have a > not-ready-to-show-yet patch that allows the vast majority of ecpg's grammar > productions to use the default semantic action. Testing on my M1 > MacBook with clang 16.0.6 from MacPorts, I see the compile time for > preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec. That's pretty amazing.
Andres Freund <andres@anarazel.de> writes: > On 2024-04-17 23:10:53 -0400, Tom Lane wrote: >> I think we should hold off on this. I found a simpler way to address >> ecpg's problem than what I sketched upthread. I have a >> not-ready-to-show-yet patch that allows the vast majority of ecpg's grammar >> productions to use the default semantic action. Testing on my M1 >> MacBook with clang 16.0.6 from MacPorts, I see the compile time for >> preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec. > That's pretty amazing. Patch posted at [1]. regards, tom lane [1] https://www.postgresql.org/message-id/2011420.1713493114%40sss.pgh.pa.us
On Wed, Apr 17, 2024 at 11:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think we should hold off on this. I found a simpler way to address > ecpg's problem than what I sketched upthread. I have a not-ready-to- > show-yet patch that allows the vast majority of ecpg's grammar > productions to use the default semantic action. Testing on my M1 > Macbook with clang 16.0.6 from MacPorts, I see the compile time for > preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec. If this is the consensus opinion, then https://commitfest.postgresql.org/48/4914/ should be marked Rejected. However, while I think the improvements that Tom was able to make here sound fantastic, I don't understand why that's an argument against Jelte's patches. After all, Tom's work will only go into v18, but this patch could be adopted in v17 and back-patched to all releases that support meson builds, saving oodles of compile time for as long as those releases are supported. The most obvious beneficiary of that course of action would seem to be Tom himself, since he back-patches more fixes than anybody, last I checked, but it'd also be useful to get slightly quicker results from the buildfarm and slightly quicker results for anyone using CI on back-branches and for other hackers who are looking to back-patch bug fixes. I don't quite understand why we want to throw those potential benefits out the window just because we have a better fix for the future. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > If this is the consensus opinion, then > https://commitfest.postgresql.org/48/4914/ should be marked Rejected. > However, while I think the improvements that Tom was able to make here > sound fantastic, I don't understand why that's an argument against > Jelte's patches. After all, Tom's work will only go into v18, but this > patch could be adopted in v17 and back-patched to all releases that > support meson builds, saving oodles of compile time for as long as > those releases are supported. The most obvious beneficiary of that > course of action would seem to be Tom himself, since he back-patches > more fixes than anybody, last I checked, but it'd be also be useful to > get slightly quicker results from the buildfarm and slightly quicker > results for anyone using CI on back-branches and for other hackers who > are looking to back-patch bug fixes. I don't quite understand why we > want to throw those potential benefits out the window just because we > have a better fix for the future. As I mentioned upthread, I'm more worried about confusing error reports than the machine time. It would save me personally exactly nada, since (a) I usually develop with gcc not clang, (b) when I do use clang it's not a heavily-affected version, and (c) since we *very* seldom change the grammar in stable branches, ccache will hide the problem pretty effectively anyway in the back branches. (If you're not using ccache, please don't complain about build time.) I grant that there are people who are more affected, but still, I'd just as soon not contort the build rules for a temporary problem. regards, tom lane
On Fri, May 17, 2024 at 4:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > As I mentioned upthread, I'm more worried about confusing error > reports than the machine time. Well, Jelte fixed that. > I grant that there are people who are more affected, but still, I'd > just as soon not contort the build rules for a temporary problem. Arguably we are contorting the build rules by doing this, but I don't think it's enough of a contortion to get excited about. Anyone else want to vote? -- Robert Haas EDB: http://www.enterprisedb.com
On 17.05.24 23:01, Robert Haas wrote: > On Fri, May 17, 2024 at 4:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> As I mentioned upthread, I'm more worried about confusing error >> reports than the machine time. > > Well, Jelte fixed that. > >> I grant that there are people who are more affected, but still, I'd >> just as soon not contort the build rules for a temporary problem. > > Arguably by doing this, but I don't think it's enough of a contortion > to get excited about. > > Anyone else want to vote? I retested the patch from 2024-04-07 (I think that's the one that "fixed that"? There are multiple "v1" patches in this thread.) using gcc-14 and clang-18, with ccache disabled of course. The measured effects of the patch are: gcc-14: 1% slower clang-18: 3% faster So with that, it doesn't seem very interesting.
On 2024-May-17, Robert Haas wrote: > Anyone else want to vote? I had pretty much the same thought as you. It seems a waste to let the code in existing branches be slow only because we have a much better approach for a branch that doesn't even exist yet. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "We're here to devour each other alive" (Hobbes)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2024-May-17, Robert Haas wrote: >> Anyone else want to vote? > I had pretty much the same thought as you. It seems a waste to leave > the code in existing branches be slow only because we have a much better > approach for a branch that doesn't even exist yet. I won't complain too loudly as long as we remember to revert the patch from HEAD once the ecpg fix goes in. regards, tom lane
On Sat, May 18, 2024 at 6:09 PM Andres Freund <andres@anarazel.de> wrote: > A few tests with ccache disabled: These tests seem to show no difference between the two releases, so I wonder what you're intending to demonstrate here. Locally, I have ninja 1.11.1. I'm sure Andres will be absolutely shocked, shocked I say, to hear that I haven't upgraded to the very latest. Anyway, I tried a clean build with CC=clang without the patch and then with the patch and got: unpatched - real 1m9.872s patched - real 1m6.130s That's the time to run my script that first calls meson setup and then afterward runs ninja. I tried ninja -v and put the output to a file. With the patch: [292/2402] /opt/local/bin/perl ../pgsql/src/interfaces/ecpg/preproc/parse.pl --srcdir ../pgsql/src/interfaces/ecpg/preproc --parser ../pgsql/src/interfaces/ecpg/preproc/../../../backend/parser/gram.y --output src/interfaces/ecpg/preproc/preproc.y And without: [1854/2402] /opt/local/bin/perl ../pgsql/src/interfaces/ecpg/preproc/parse.pl --srcdir ../pgsql/src/interfaces/ecpg/preproc --parser ../pgsql/src/interfaces/ecpg/preproc/../../../backend/parser/gram.y --output src/interfaces/ecpg/preproc/preproc.y With my usual CC='ccache clang', I get real 0m37.500s unpatched and real 0m37.786s patched. I also tried this with the original v1 patch (not to be confused with the revised, still-v1 patch) and got real 37.950s. So I guess I'm now feeling pretty unexcited about this. If the patch really did what the subject line says, that would be nice to have. However, since Tom will patch the underlying problem in v18, this will only help people building the back-branches. Of those, it will only help the people using a slow compiler; I read the thread as suggesting that you need to be using clang rather than gcc and also not using ccache. Plus, it sounds like you also need an old ninja version. Even then, the highest reported savings is 10s and I only save 3s. 
I think that's a small enough savings with enough conditionals that we should not bother. It seems fine to say that people who need the fastest possible builds of back-branches should use at least one of gcc, ccache, and ninja >= 1.12. I'll go mark this patch Rejected in the CommitFest. Incidentally, this thread is an excellent object lesson in why it's so darn hard to cut down the size of the CF. I mean, this is a 7-line patch and we've now got a 30+ email thread about it. In my role as temporary self-appointed wielder of the CommitFest mace, I have now spent probably a good 2 hours trying to figure out what to do about it. There are more than 230 active patches in the CommitFest. That means CommitFest management would be more than a full-time job for an experienced committer even if every patch were as simple as this one, which is definitely not the case. If we want to restore some sanity here, we're going to have to find some way of distributing the burden across more people, including the patch authors. Note that Jelte's last update to the thread is over a month ago. I'm not saying that he's done something wrong, but I do strongly suspect that the time that the community as whole has spent on this patch is a pretty significant multiple of the time that he personally spent on it, and such dynamics are bound to create scaling issues. I don't know how we do better: I just want to highlight the problem. -- Robert Haas EDB: http://www.enterprisedb.com
On 2024-05-20 09:37:46 -0400, Robert Haas wrote: > On Sat, May 18, 2024 at 6:09 PM Andres Freund <andres@anarazel.de> wrote: > > A few tests with ccache disabled: > > These tests seem to show no difference between the two releases, so I > wonder what you're intending to demonstrate here. They show a few seconds of win for 'real' time. -O0: 0m21.577s->0m19.529s -O3: 0m59.730s->0m54.853s
On Mon, May 20, 2024 at 11:37 AM Andres Freund <andres@anarazel.de> wrote: > On 2024-05-20 09:37:46 -0400, Robert Haas wrote: > > On Sat, May 18, 2024 at 6:09 PM Andres Freund <andres@anarazel.de> wrote: > > > A few tests with ccache disabled: > > > > These tests seem to show no difference between the two releases, so I > > wonder what you're intending to demonstrate here. > > They show a few seconds of win for 'real' time. > -O0: 0m21.577s->0m19.529s > -O3: 0m59.730s->0m54.853s Ah, OK, I think I misinterpreted. -- Robert Haas EDB: http://www.enterprisedb.com
On 19.05.24 00:09, Andres Freund wrote: > On 2024-05-18 00:35:08 +0200, Peter Eisentraut wrote: >> I retested the patch from 2024-04-07 (I think that's the one that "fixed >> that"? There are multiple "v1" patches in this thread.) using gcc-14 and >> clang-18, with ccache disabled of course. The measured effects of the patch >> are: >> >> gcc-14: 1% slower >> clang-18: 3% faster >> >> So with that, it doesn't seem very interesting. > I wonder whether the reason you're seeing less benefit than Jelte is that - I'm > guessing - you now used ninja 1.12 and Jelte something older. Ninja 1.12 > prioritizes build edges using a "critical path" heuristic, scheduling nodes > with more incoming dependencies, and those deeper in the dependency graph, > earlier. Yes! Very interesting! With ninja 1.11 and gcc-14, I see the patch gives about a 17% speedup.