Thread: snprintf.c hammering memset()

snprintf.c hammering memset()

From
Thomas Munro
Date:
Hello hackers,

Mateusz Guzik was benchmarking PostgreSQL on FreeBSD investigating the
kqueue thread and complained off-list about a lot of calls to memset()
of size 256KB from dopr() in our snprintf.c code.

Yeah, that says:

    PrintfArgType argtypes[NL_ARGMAX + 2];
...
    MemSet(argtypes, 0, sizeof(argtypes));

PrintfArgType is an enum, and we define NL_ARGMAX as 16 if the OS
didn't already define it.  On FreeBSD 11, NL_ARGMAX was defined as 99
in <limits.h>.  On FreeBSD 12, it is defined as 65536... ouch.  On a
Debian box I see it is 4096.

Is there any reason to use the OS definition here?

-- 
Thomas Munro
http://www.enterprisedb.com


Re: snprintf.c hammering memset()

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Mateusz Guzik was benchmarking PostgreSQL on FreeBSD investigating the
> kqueue thread and complained off-list about a lot of calls to memset()
> of size 256KB from dopr() in our snprintf.c code.

> Yeah, that says:
>     PrintfArgType argtypes[NL_ARGMAX + 2];
> ...
>     MemSet(argtypes, 0, sizeof(argtypes));

> PrintfArgType is an enum, and we define NL_ARGMAX as 16 if the OS
> didn't already define it.  On FreeBSD 11, NL_ARGMAX was defined as 99
> in <limits.h>.  On FreeBSD 12, it is defined as 65536... ouch.  On a
> Debian box I see it is 4096.

> Is there any reason to use the OS definition here?

Ouch indeed.  Quite aside from cycles wasted, that's way more stack than
we want this to consume.  I'm good with forcing this to 16 or so ...
any objections?

(For reference, POSIX doesn't require NL_ARGMAX to be more than 9.)

(I wonder if this has anything to do with Andres' performance gripes.)

            regards, tom lane


Re: snprintf.c hammering memset()

From
Andres Freund
Date:
Hi,

On 2018-10-01 19:52:40 -0400, Tom Lane wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > Mateusz Guzik was benchmarking PostgreSQL on FreeBSD investigating the
> > kqueue thread and complained off-list about a lot of calls to memset()
> > of size 256KB from dopr() in our snprintf.c code.
> 
> > Yeah, that says:
> >     PrintfArgType argtypes[NL_ARGMAX + 2];
> > ...
> >     MemSet(argtypes, 0, sizeof(argtypes));
> 
> > PrintfArgType is an enum, and we define NL_ARGMAX as 16 if the OS
> > didn't already define it.  On FreeBSD 11, NL_ARGMAX was defined as 99
> > in <limits.h>.  On FreeBSD 12, it is defined as 65536... ouch.  On a
> > Debian box I see it is 4096.
> 
> > Is there any reason to use the OS definition here?
> 
> Ouch indeed.  Quite aside from cycles wasted, that's way more stack than
> we want this to consume.  I'm good with forcing this to 16 or so ...
> any objections?

Especially after your performance patch, shouldn't we actually be able
to get rid of that memset entirely?

And if not, shouldn't we be able to reduce the per-element size of
argtypes considerably, by using a uint8 as the base, rather than 4 byte
per element?


> (I wonder if this has anything to do with Andres' performance gripes.)

It probably plays some role, but the profile didn't show it as *too* large
a part.

Greetings,

Andres Freund


Re: snprintf.c hammering memset()

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-10-01 19:52:40 -0400, Tom Lane wrote:
>> Ouch indeed.  Quite aside from cycles wasted, that's way more stack than
>> we want this to consume.  I'm good with forcing this to 16 or so ...
>> any objections?

> Especially after your performance patch, shouldn't we actually be able
> to get rid of that memset entirely?

That patch takes the memset out of the main line, but it'd still be
a performance problem for formats using argument reordering; and the
stack-space concern would remain the same.

> And if not, shouldn't we be able to reduce the per-element size of
> argtypes considerably, by using a uint8 as the base, rather than 4 byte
> per element?

argtypes is only a small part of the stack-space issue, there's also
argvalues which is (at least) twice as big.  I don't think second-guessing
the compiler about the most efficient representation for an enum is
going to buy us much here.

            regards, tom lane


Re: snprintf.c hammering memset()

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> PrintfArgType is an enum, and we define NL_ARGMAX as 16 if the OS
> didn't already define it.  On FreeBSD 11, NL_ARGMAX was defined as 99
> in <limits.h>.  On FreeBSD 12, it is defined as 65536... ouch.  On a
> Debian box I see it is 4096.

Some further research:

* My Red Hat boxes also think it's 4096.

* macOS thinks it's just 9.

* Assuming I've grepped the .po files correctly, we have no translatable
messages today that use more than 9 %'s.  That's not a totally accurate
result because I didn't try to count "*" precision/width specs, which'd
also count against ARGMAX.  Still, we couldn't be needing much more than
9 slots.

* It's completely silly to imagine that anybody would write a printf call
with more than, perhaps, a couple dozen arguments.  So these OS values
must be getting set with an eye to some other use-case for NL_ARGMAX
besides printf field order control.

Setting snprintf's limit to 16 might be a bit tight based on the observed
results for translatable messages, but I'd be entirely comfortable with
32.  The values we're getting from the OS are just silly.

            regards, tom lane


Re: snprintf.c hammering memset()

From
Andres Freund
Date:
Hi,

On 2018-10-01 20:19:16 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-10-01 19:52:40 -0400, Tom Lane wrote:
> >> Ouch indeed.  Quite aside from cycles wasted, that's way more stack than
> >> we want this to consume.  I'm good with forcing this to 16 or so ...
> >> any objections?
> 
> > Especially after your performance patch, shouldn't we actually be able
> > to get rid of that memset entirely?
> 
> That patch takes the memset out of the main line, but it'd still be
> a performance problem for formats using argument reordering; and the
> stack-space concern would remain the same.

What I mean is that it shouldn't be that hard to only zero out the
portions of the array that are actually used, and thus could refrain
from introducing the limit.


> > And if not, shouldn't we be able to reduce the per-element size of
> > argtypes considerably, by using a uint8 as the base, rather than 4 byte
> > per element?
> 
> argtypes is only a small part of the stack-space issue, there's also
> argvalues which is (at least) twice as big.  I don't think second-guessing
> the compiler about the most efficient representation for an enum is
> going to buy us much here.

Sure, but argvalues isn't zeroed out.  As long as the memory on the
stack isn't used (as is the case for most of arvalues elements), the
size of the stack allocation doesn't have a significant efficiency
impact - it's still just an %rsp adjustment.  If we're going to continue
to zero out argtypes, it's sure going to be better to zero out 16 rather
than 64bytes (after limiting to 16 args).

Greetings,

Andres Freund


Re: snprintf.c hammering memset()

From
Andres Freund
Date:
Hi,

On 2018-10-01 17:46:55 -0700, Andres Freund wrote:
> On 2018-10-01 20:19:16 -0400, Tom Lane wrote:
> > argtypes is only a small part of the stack-space issue, there's also
> > argvalues which is (at least) twice as big.  I don't think second-guessing
> > the compiler about the most efficient representation for an enum is
> > going to buy us much here.
> 
> Sure, but argvalues isn't zeroed out.  As long as the memory on the
> stack isn't used (as is the case for most of arvalues elements), the
> size of the stack allocation doesn't have a significant efficiency
> impact - it's still just an %rsp adjustment.  If we're going to continue
> to zero out argtypes, it's sure going to be better to zero out 16 rather
> than 64bytes (after limiting to 16 args).

Btw, I don't think any common compiler optimizes common enum sizes for
efficiency.  Their size is part of the struct layout, so doing so would
not generally be trivial (although you get to larger enums if you go
above either INT32_MAX or UINT32_MAX element values, but ...).

Greetings,

Andres Freund


Re: snprintf.c hammering memset()

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-10-01 20:19:16 -0400, Tom Lane wrote:
>> That patch takes the memset out of the main line, but it'd still be
>> a performance problem for formats using argument reordering; and the
>> stack-space concern would remain the same.

> What I mean is that it shouldn't be that hard to only zero out the
> portions of the array that are actually used, and thus could refrain
> from introducing the limit.

Well, we use the zeroing exactly to detect which entries have been used.
Probably there's another way, but I doubt it'd be faster.

In any case, the stack-space-consumption problem remains, and IMO that
is a *far* greater concern than the cycles.  Keep in mind that we'll be
calling this code when we have already hit our stack space consumption
limit.  I'm a bit surprised that we've not seen any buildfarm members
fall over in the error recursion test.  We have not designed the system
on the assumption that ereport() could eat half a meg of stack.

            regards, tom lane