Thread: snprintf.c hammering memset()
Hello hackers, Mateusz Guzik was benchmarking PostgreSQL on FreeBSD investigating the kqueue thread and complained off-list about a lot of calls to memset() of size 256KB from dopr() in our snprintf.c code. Yeah, that says: PrintfArgType argtypes[NL_ARGMAX + 2]; ... MemSet(argtypes, 0, sizeof(argtypes)); PrintfArgType is an enum, and we define NL_ARGMAX as 16 if the OS didn't already define it. On FreeBSD 11, NL_ARGMAX was defined as 99 in <limits.h>. On FreeBSD 12, it is defined as 65536... ouch. On a Debian box I see it is 4096. Is there any reason to use the OS definition here? -- Thomas Munro http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes: > Mateusz Guzik was benchmarking PostgreSQL on FreeBSD investigating the > kqueue thread and complained off-list about a lot of calls to memset() > of size 256KB from dopr() in our snprintf.c code. > Yeah, that says: > PrintfArgType argtypes[NL_ARGMAX + 2]; > ... > MemSet(argtypes, 0, sizeof(argtypes)); > PrintfArgType is an enum, and we define NL_ARGMAX as 16 if the OS > didn't already define it. On FreeBSD 11, NL_ARGMAX was defined as 99 > in <limits.h>. On FreeBSD 12, it is defined as 65536... ouch. On a > Debian box I see it is 4096. > Is there any reason to use the OS definition here? Ouch indeed. Quite aside from cycles wasted, that's way more stack than we want this to consume. I'm good with forcing this to 16 or so ... any objections? (For reference, POSIX doesn't require NL_ARGMAX to be more than 9.) (I wonder if this has anything to do with Andres' performance gripes.) regards, tom lane
Hi, On 2018-10-01 19:52:40 -0400, Tom Lane wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: > > Mateusz Guzik was benchmarking PostgreSQL on FreeBSD investigating the > > kqueue thread and complained off-list about a lot of calls to memset() > > of size 256KB from dopr() in our snprintf.c code. > > > Yeah, that says: > > PrintfArgType argtypes[NL_ARGMAX + 2]; > > ... > > MemSet(argtypes, 0, sizeof(argtypes)); > > > PrintfArgType is an enum, and we define NL_ARGMAX as 16 if the OS > > didn't already define it. On FreeBSD 11, NL_ARGMAX was defined as 99 > > in <limits.h>. On FreeBSD 12, it is defined as 65536... ouch. On a > > Debian box I see it is 4096. > > > Is there any reason to use the OS definition here? > > Ouch indeed. Quite aside from cycles wasted, that's way more stack than > we want this to consume. I'm good with forcing this to 16 or so ... > any objections? Especially after your performance patch, shouldn't we actually be able to get rid of that memset entirely? And if not, shouldn't we be able to reduce the per-element size of argtypes considerably, by using a uint8 as the base, rather than 4 byte per element? > (I wonder if this has anything to do with Andres' performance gripes.) It probably plays some role, but the profile didn't show it as *too* large a part. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-10-01 19:52:40 -0400, Tom Lane wrote: >> Ouch indeed. Quite aside from cycles wasted, that's way more stack than >> we want this to consume. I'm good with forcing this to 16 or so ... >> any objections? > Especially after your performance patch, shouldn't we actually be able > to get rid of that memset entirely? That patch takes the memset out of the main line, but it'd still be a performance problem for formats using argument reordering; and the stack-space concern would remain the same. > And if not, shouldn't we be able to reduce the per-element size of > argtypes considerably, by using a uint8 as the base, rather than 4 byte > per element? argtypes is only a small part of the stack-space issue, there's also argvalues which is (at least) twice as big. I don't think second-guessing the compiler about the most efficient representation for an enum is going to buy us much here. regards, tom lane
Thomas Munro <thomas.munro@enterprisedb.com> writes: > PrintfArgType is an enum, and we define NL_ARGMAX as 16 if the OS > didn't already define it. On FreeBSD 11, NL_ARGMAX was defined as 99 > in <limits.h>. On FreeBSD 12, it is defined as 65536... ouch. On a > Debian box I see it is 4096. Some further research: * My Red Hat boxes also think it's 4096. * macOS thinks it's just 9. * Assuming I've grepped the .po files correctly, we have no translatable messages today that use more than 9 %'s. That's not a totally accurate result because I didn't try to count "*" precision/width specs, which'd also count against ARGMAX. Still, we couldn't be needing much more than 9 slots. * It's completely silly to imagine that anybody would write a printf call with more than, perhaps, a couple dozen arguments. So these OS values must be getting set with an eye to some other use-case for NL_ARGMAX besides printf field order control. Setting snprintf's limit to 16 might be a bit tight based on the observed results for translatable messages, but I'd be entirely comfortable with 32. The values we're getting from the OS are just silly. regards, tom lane
Hi, On 2018-10-01 20:19:16 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2018-10-01 19:52:40 -0400, Tom Lane wrote: > >> Ouch indeed. Quite aside from cycles wasted, that's way more stack than > >> we want this to consume. I'm good with forcing this to 16 or so ... > >> any objections? > > > Especially after your performance patch, shouldn't we actually be able > > to get rid of that memset entirely? > > That patch takes the memset out of the main line, but it'd still be > a performance problem for formats using argument reordering; and the > stack-space concern would remain the same. What I mean is that it shouldn't be that hard to only zero out the portions of the array that are actually used, and thus could refrain from introducing the limit. > > And if not, shouldn't we be able to reduce the per-element size of > > argtypes considerably, by using a uint8 as the base, rather than 4 byte > > per element? > > argtypes is only a small part of the stack-space issue, there's also > argvalues which is (at least) twice as big. I don't think second-guessing > the compiler about the most efficient representation for an enum is > going to buy us much here. Sure, but argvalues isn't zeroed out. As long as the memory on the stack isn't used (as is the case for most of arvalues elements), the size of the stack allocation doesn't have a significant efficiency impact - it's still just an %rsp adjustment. If we're going to continue to zero out argtypes, it's sure going to be better to zero out 16 rather than 64bytes (after limiting to 16 args). Greetings, Andres Freund
Hi, On 2018-10-01 17:46:55 -0700, Andres Freund wrote: > On 2018-10-01 20:19:16 -0400, Tom Lane wrote: > > argtypes is only a small part of the stack-space issue, there's also > > argvalues which is (at least) twice as big. I don't think second-guessing > > the compiler about the most efficient representation for an enum is > > going to buy us much here. > > Sure, but argvalues isn't zeroed out. As long as the memory on the > stack isn't used (as is the case for most of arvalues elements), the > size of the stack allocation doesn't have a significant efficiency > impact - it's still just an %rsp adjustment. If we're going to continue > to zero out argtypes, it's sure going to be better to zero out 16 rather > than 64bytes (after limiting to 16 args). Btw, I don't think any common compiler optimizes common enum sizes for efficiency. Their size is part of the struct layout, so doing so would not generally be trivial (although you get to larger enums if you go above either INT32_MAX or UINT32_MAX element values, but ...). Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-10-01 20:19:16 -0400, Tom Lane wrote: >> That patch takes the memset out of the main line, but it'd still be >> a performance problem for formats using argument reordering; and the >> stack-space concern would remain the same. > What I mean is that it shouldn't be that hard to only zero out the > portions of the array that are actually used, and thus could refrain > from introducing the limit. Well, we use the zeroing exactly to detect which entries have been used. Probably there's another way, but I doubt it'd be faster. In any case, the stack-space-consumption problem remains, and IMO that is a *far* greater concern than the cycles. Keep in mind that we'll be calling this code when we have already hit our stack space consumption limit. I'm a bit surprised that we've not seen any buildfarm members fall over in the error recursion test. We have not designed the system on the assumption that ereport() could eat half a meg of stack. regards, tom lane