Thread: appendBinaryStringInfo stuff
I found a couple of adjacent weird things:

There are a bunch of places in the json code that use
appendBinaryStringInfo() where appendStringInfoString() could be used,
e.g.,

    appendBinaryStringInfo(buf, ".size()", 7);

Is there a reason for this? Are we that stretched for performance? I
find this kind of code very fragile.

Also, the argument type of appendBinaryStringInfo() is char *. There is
some code that uses this function to assemble some kind of packed binary
layout, which requires a bunch of casts because of this. I think
functions taking binary data plus length should take void * instead,
like memcpy() for example.

Attached are two patches that illustrate these issues and show proposed
changes.
Hi,

On 2022-12-19 07:13:40 +0100, Peter Eisentraut wrote:
> I found a couple of adjacent weird things:
>
> There are a bunch of places in the json code that use
> appendBinaryStringInfo() where appendStringInfoString() could be used, e.g.,
>
>     appendBinaryStringInfo(buf, ".size()", 7);
>
> Is there a reason for this? Are we that stretched for performance?

strlen() isn't that cheap, so it doesn't generally seem unreasonable. I
don't think we should add the strlen overhead in places that can
conceivably be a bottleneck - and some of the jsonb code clearly can be
that.

> I find this kind of code very fragile.

But this is obviously an issue. Perhaps we should make
appendStringInfoString() a static inline function - most compilers can
compute strlen() of a constant string at compile time.

Greetings,

Andres Freund
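To make the idea concrete, here is a minimal standalone sketch of a static inline appendStringInfoString(). The StringInfo structure, the fixed initial buffer size, and the append functions here are simplified stand-ins for the real PostgreSQL definitions, not the actual implementation:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's StringInfo; names mirror the
 * real API but the definitions are assumptions of this sketch. */
typedef struct
{
	char   *data;
	int		len;
	int		maxlen;
} StringInfoData, *StringInfo;

static void
initStringInfo(StringInfo str)
{
	str->maxlen = 1024;			/* fixed size: sketch only, no growth */
	str->data = malloc(str->maxlen);
	str->len = 0;
	str->data[0] = '\0';
}

static void
appendBinaryStringInfo(StringInfo str, const void *data, int datalen)
{
	/* the real code grows the buffer here; the sketch assumes room */
	memcpy(str->data + str->len, data, datalen);
	str->len += datalen;
	str->data[str->len] = '\0';
}

/* With a static inline definition, strlen() of a string-literal
 * argument can be folded to a constant at compile time by most
 * compilers, so callers keep the readable form without paying for a
 * runtime strlen(). */
static inline void
appendStringInfoString(StringInfo str, const char *s)
{
	appendBinaryStringInfo(str, s, strlen(s));
}
```

With this shape, `appendStringInfoString(buf, ".size()")` keeps the readable call while a compiler that folds strlen() of a literal can generate the same code as `appendBinaryStringInfo(buf, ".size()", 7)`.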
On Mon, 19 Dec 2022 at 21:12, Andres Freund <andres@anarazel.de> wrote:
> Perhaps we should make appendStringInfoString() a static inline function
> - most compilers can compute strlen() of a constant string at compile
> time.

I had wondered about that, but last time I looked into it there was a
small increase in the size of the binary from doing it. Perhaps it does
not matter, but it's something to consider.

Re-thinking, I wonder if we could use the same macro trick used in
ereport_domain(). Something like:

#ifdef HAVE__BUILTIN_CONSTANT_P
#define appendStringInfoString(str, s) \
	(__builtin_constant_p(s) ? \
	 appendBinaryStringInfo(str, s, sizeof(s) - 1) : \
	 appendStringInfoStringInternal(str, s))
#else
#define appendStringInfoString(str, s) \
	appendStringInfoStringInternal(str, s)
#endif

and rename the existing function to appendStringInfoStringInternal.

Because __builtin_constant_p is a known compile-time constant, it
should be folded to just call the corresponding function during
compilation.

Just looking at the binary sizes for postgres, I see:

unpatched       = 9972128 bytes
inline function = 9990064 bytes
macro trick     = 9984968 bytes

I'm currently not sure why the macro trick increases the binary at
all. I understand why the inline function does.

David
David Rowley <dgrowleyml@gmail.com> writes:
> I'm currently not sure why the macro trick increases the binary at
> all. I understand why the inline function does.

In the places where it changes the code at all, you're replacing

	appendStringInfoString(buf, s);

with

	appendBinaryStringInfo(buf, s, n);

Even if n is a constant, the latter surely requires more instructions
per call site. Whether this is a win seems to depend on how many of
these are performance-critical.

			regards, tom lane
On 19.12.22 09:12, Andres Freund wrote:
>> There are a bunch of places in the json code that use
>> appendBinaryStringInfo() where appendStringInfoString() could be used, e.g.,
>>
>>     appendBinaryStringInfo(buf, ".size()", 7);
>>
>> Is there a reason for this? Are we that stretched for performance?
>
> strlen() isn't that cheap, so it doesn't generally seem unreasonable. I
> don't think we should add the strlen overhead in places that can
> conceivably be a bottleneck - and some of the jsonb code clearly can be
> that.

AFAICT, the code in question is for the text output of the jsonpath
type, which is used ... for barely anything?
On Tue, 20 Dec 2022 at 09:23, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> AFAICT, the code in question is for the text output of the jsonpath
> type, which is used ... for barely anything?

I think the performance of a type's output function is quite critical.
I've seen huge performance gains in COPY TO performance from optimising
output functions in the past (see dad75eb4a and aa2387e2f).

It would be good to see some measurements to find out how much adding
the strlen calls back in would cost us. If we're unable to measure the
change, then maybe the cleanup patch would be nice. If it's going to
slow COPY TO down 10-20%, then we need to leave this, or consider the
inline function mentioned by Andres or the macro trick mentioned by me.

David
David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 20 Dec 2022 at 09:23, Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>> AFAICT, the code in question is for the text output of the jsonpath
>> type, which is used ... for barely anything?

> I think the performance of a type's output function is quite critical.

I think Peter is entirely right to question whether *this* type's
output function is performance-critical. Who's got large tables with
jsonpath columns? It seems to me the type would mostly only exist as
constants within queries.

			regards, tom lane
On Tue, 20 Dec 2022 at 11:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think Peter is entirely right to question whether *this* type's
> output function is performance-critical. Who's got large tables with
> jsonpath columns? It seems to me the type would mostly only exist
> as constants within queries.

The patch touches code in the path of jsonb's output function too. I
don't think you could claim the same for that.

David
On 2022-12-19 Mo 17:48, David Rowley wrote:
> On Tue, 20 Dec 2022 at 11:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think Peter is entirely right to question whether *this* type's
>> output function is performance-critical. Who's got large tables with
>> jsonpath columns? It seems to me the type would mostly only exist
>> as constants within queries.
>
> The patch touches code in the path of jsonb's output function too. I
> don't think you could claim the same for that.

I agree that some of the uses in the jsonpath code could reasonably
just be converted to use appendStringInfoString().

There are 5 uses in the jsonb code where the length param is a compile
time constant:

andrew@ub22:adt $ grep appendBinary.*[0-9] jsonb*
jsonb.c:	appendBinaryStringInfo(out, "null", 4);
jsonb.c:	appendBinaryStringInfo(out, "true", 4);
jsonb.c:	appendBinaryStringInfo(out, "false", 5);
jsonb.c:	appendBinaryStringInfo(out, ": ", 2);
jsonb.c:	appendBinaryStringInfo(out, "    ", 4);

None of these really bother me much, TBH. In fact the last one is
arguably nicer because it tells you without counting how many spaces
there are.

Changing the type of the second argument of appendBinaryStringInfo()
to void * seems reasonable.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
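The cast-removal point can be illustrated with two hypothetical append helpers (stand-ins for the sketch, not the real PostgreSQL functions): one taking char * as appendBinaryStringInfo does today, one taking void * as proposed:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* toy output buffer standing in for a StringInfo */
static char outbuf[64];
static int	outlen;

/* old-style signature: takes char *, so callers appending packed
 * binary data must cast their pointers */
static void
append_old(const char *data, int datalen)
{
	memcpy(outbuf + outlen, data, datalen);
	outlen += datalen;
}

/* proposed signature: takes void *, like memcpy(), so any object
 * pointer converts implicitly and no cast is needed */
static void
append_new(const void *data, int datalen)
{
	memcpy(outbuf + outlen, data, datalen);
	outlen += datalen;
}
```

With a packed header such as `uint32_t header`, the old signature forces `append_old((const char *) &header, sizeof(header))`, while the new one allows plain `append_new(&header, sizeof(header))`.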
Hi,

On 2022-12-19 21:29:10 +1300, David Rowley wrote:
> On Mon, 19 Dec 2022 at 21:12, Andres Freund <andres@anarazel.de> wrote:
> > Perhaps we should make appendStringInfoString() a static inline function
> > - most compilers can compute strlen() of a constant string at compile
> > time.
>
> I had wondered about that, but last time I looked into it there was a
> small increase in the size of the binary from doing it. Perhaps it
> does not matter, but it's something to consider.

I'd not be too worried about that in this case.

> Re-thinking, I wonder if we could use the same macro trick used in
> ereport_domain(). Something like:
>
> #ifdef HAVE__BUILTIN_CONSTANT_P
> #define appendStringInfoString(str, s) \
> 	(__builtin_constant_p(s) ? \
> 	 appendBinaryStringInfo(str, s, sizeof(s) - 1) : \
> 	 appendStringInfoStringInternal(str, s))
> #else
> #define appendStringInfoString(str, s) \
> 	appendStringInfoStringInternal(str, s)
> #endif
>
> and rename the existing function to appendStringInfoStringInternal.
>
> Because __builtin_constant_p is a known compile-time constant, it
> should be folded to just call the corresponding function during
> compilation.

Several compilers can optimize away repeated strlen() calls, even if
the string isn't a compile-time constant. So I'm not really convinced
that tying inlining-strlen to __builtin_constant_p() is a good idea.

> Just looking at the binary sizes for postgres. I see:
>
> unpatched       = 9972128 bytes
> inline function = 9990064 bytes

That seems acceptable to me.

> macro trick     = 9984968 bytes
>
> I'm currently not sure why the macro trick increases the binary at
> all. I understand why the inline function does.

I think Tom's explanation is on point.
I've in the past looked at stringinfo.c being the bottleneck in a bunch
of places and concluded that we really need to remove the function call
in the happy path entirely - we should have an enlargeStringInfo() that
we can call externally iff needed, and then implement the rest of
appendBinaryStringInfo() etc. in an inline function. That allows the
compiler to e.g. optimize out the repeated maintenance of the '\0'
write, etc.

Greetings,

Andres Freund
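A standalone sketch of the split described above - an out-of-line grow function plus an inline no-check append - under simplified, hypothetical names (this is not the real StringInfo code):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* toy buffer; Buf, buf_enlarge, buf_append_nocheck are stand-ins for
 * StringInfo, enlargeStringInfo, and an inlined append */
typedef struct
{
	char   *data;
	int		len;
	int		maxlen;
} Buf;

static void
buf_init(Buf *b)
{
	b->maxlen = 8;
	b->len = 0;
	b->data = malloc(b->maxlen);
}

/* out-of-line: called only when more room may be needed */
static void
buf_enlarge(Buf *b, int needed)
{
	while (b->maxlen < b->len + needed + 1)
		b->maxlen *= 2;
	b->data = realloc(b->data, b->maxlen);
}

/* inline happy path: the caller has already reserved the space, so
 * there is no growth check and no per-call '\0' maintenance */
static inline void
buf_append_nocheck(Buf *b, const void *data, int datalen)
{
	memcpy(b->data + b->len, data, datalen);
	b->len += datalen;
}

/* example caller: reserve once, append several pieces, terminate once */
static void
emit_pair(Buf *b, const char *key, const char *val)
{
	int		klen = strlen(key);
	int		vlen = strlen(val);

	buf_enlarge(b, klen + 2 + vlen);	/* one check for the sequence */
	buf_append_nocheck(b, key, klen);
	buf_append_nocheck(b, ": ", 2);
	buf_append_nocheck(b, val, vlen);
	b->data[b->len] = '\0';				/* '\0' written once, not per append */
}
```

The point of the shape is that the compiler sees all three appends inline and can batch the bookkeeping, rather than paying a function call and bounds check per append.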
On Wed, 21 Dec 2022 at 04:47, Andrew Dunstan <andrew@dunslane.net> wrote:
> jsonb.c:	appendBinaryStringInfo(out, "    ", 4);
>
> None of these really bother me much, TBH. In fact the last one is
> arguably nicer because it tells you without counting how many spaces
> there are.

appendStringInfoSpaces() might be even better.

David
On Tue, Dec 20, 2022 at 10:47 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> There are 5 uses in the jsonb code where the length param is a compile
> time constant:
>
> andrew@ub22:adt $ grep appendBinary.*[0-9] jsonb*
> jsonb.c:	appendBinaryStringInfo(out, "null", 4);
> jsonb.c:	appendBinaryStringInfo(out, "true", 4);
> jsonb.c:	appendBinaryStringInfo(out, "false", 5);
> jsonb.c:	appendBinaryStringInfo(out, ": ", 2);
> jsonb.c:	appendBinaryStringInfo(out, "    ", 4);
>
> None of these really bother me much, TBH. In fact the last one is
> arguably nicer because it tells you without counting how many spaces
> there are.

+1. There are certainly cases where this kind of style can create
confusion, but I have a hard time putting any of these instances into
that category. It's obvious at a glance that null is 4 bytes, false is
5, etc.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, 20 Dec 2022 at 11:26, David Rowley <dgrowleyml@gmail.com> wrote:
> It would be good to see some measurements to find out how much adding
> the strlen calls back in would cost us.

I tried this out. I'm not pretending I found the best test which
highlights how much the performance will change in a real-world case. I
just wanted to try to get an indication of whether changing jsonb's
output function to make more use of appendStringInfoString instead of
appendBinaryStringInfo is likely to affect performance.

Also, in test 2, I picked a use case that makes quite a bit of use of
appendStringInfoString already and checked if inlining that function
would help improve things. I imagine test 2 really is not bottlenecked
on appendStringInfoString enough to get a true idea of how much
inlining appendStringInfoString could really help (spoiler: it helps
quite a bit).

Test 1: See if using appendStringInfoString instead of
appendBinaryStringInfo hinders jsonb output performance.

setup:
create table jb (j jsonb);
insert into jb select row_to_json(pg_class) from pg_class;
vacuum freeze analyze jb;

bench.sql:
select sum(length(j::text)) from jb;

master (@3f28bd73):
$ pgbench -n -T 60 -f bench.sql -M prepared postgres | grep latency
latency average = 1.896 ms
latency average = 1.885 ms
latency average = 1.899 ms

  22.57%  postgres  [.] escape_json
  21.83%  postgres  [.] pg_utf_mblen
   9.23%  postgres  [.] JsonbIteratorNext.part.0
   7.12%  postgres  [.] AllocSetAlloc
   4.07%  postgres  [.] pg_mbstrlen_with_len
   3.71%  postgres  [.] JsonbToCStringWorker
   3.70%  postgres  [.] fillJsonbValue
   3.17%  postgres  [.] appendBinaryStringInfo
   2.95%  postgres  [.] enlargeStringInfo
   2.09%  postgres  [.] jsonb_put_escaped_value
   1.89%  postgres  [.] palloc

master + 0001-Use-appendStringInfoString-instead-of-appendBinarySt.patch:
$ pgbench -n -T 60 -f bench.sql -M prepared postgres | grep latency
latency average = 1.912 ms
latency average = 1.912 ms
latency average = 1.912 ms (~1% slower)

  22.38%  postgres  [.] escape_json
  21.98%  postgres  [.] pg_utf_mblen
   9.07%  postgres  [.] JsonbIteratorNext.part.0
   5.93%  postgres  [.] AllocSetAlloc
   4.11%  postgres  [.] pg_mbstrlen_with_len
   3.87%  postgres  [.] fillJsonbValue
   3.66%  postgres  [.] JsonbToCStringWorker
   2.28%  postgres  [.] enlargeStringInfo
   2.15%  postgres  [.] appendStringInfoString
   1.98%  postgres  [.] jsonb_put_escaped_value
   1.92%  postgres  [.] palloc
   1.58%  postgres  [.] appendBinaryStringInfo
   1.42%  postgres  [.] pnstrdup

Test 2: Test if inlining appendStringInfoString helps.

bench.sql:
select sum(length(pg_get_ruledef(oid))) from pg_rewrite;

master (@3f28bd73):
$ pgbench -n -T 60 -f bench.sql postgres | grep latency
latency average = 16.355 ms
latency average = 16.290 ms
latency average = 16.303 ms

static inline appendStringInfoString:
$ pgbench -n -T 60 -f bench.sql postgres | grep latency
latency average = 15.690 ms
latency average = 15.575 ms
latency average = 15.604 ms (~4.4% faster)

David
David Rowley <dgrowleyml@gmail.com> writes:
>   22.57%  postgres  [.] escape_json

Hmm ... shouldn't we do something like

-		appendStringInfoString(buf, "\\b");
+		appendStringInfoCharMacro(buf, '\\');
+		appendStringInfoCharMacro(buf, 'b');

and so on in that function? I'm not convinced that this one hotspot
justifies inlining appendStringInfoString everywhere.

			regards, tom lane
On Thu, 22 Dec 2022 at 20:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> >   22.57%  postgres  [.] escape_json
>
> Hmm ... shouldn't we do something like
>
> -		appendStringInfoString(buf, "\\b");
> +		appendStringInfoCharMacro(buf, '\\');
> +		appendStringInfoCharMacro(buf, 'b');
>
> and so on in that function? I'm not convinced that this one
> hotspot justifies inlining appendStringInfoString everywhere.

It improves things slightly:

Test 1 (from earlier)

master + escape_json using appendStringInfoCharMacro:
$ pgbench -n -T 60 -f bench.sql -M prepared postgres | grep latency
latency average = 1.807 ms
latency average = 1.800 ms
latency average = 1.812 ms (~4.8% faster than master)

  23.05%  postgres  [.] pg_utf_mblen
  22.55%  postgres  [.] escape_json
   8.58%  postgres  [.] JsonbIteratorNext.part.0
   6.80%  postgres  [.] AllocSetAlloc
   4.23%  postgres  [.] pg_mbstrlen_with_len
   3.88%  postgres  [.] JsonbToCStringWorker
   3.79%  postgres  [.] fillJsonbValue
   3.18%  postgres  [.] appendBinaryStringInfo
   2.43%  postgres  [.] enlargeStringInfo
   2.02%  postgres  [.] palloc
   1.61%  postgres  [.] jsonb_put_escaped_value

David
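For reference, the per-character approach Tom suggested looks roughly like the following standalone sketch. Here append_char is a simplified stand-in for appendStringInfoCharMacro, and only two of escape_json()'s escape cases are shown:

```c
#include <assert.h>
#include <string.h>

/* toy output buffer; no growth handling, sketch only */
static char buf[128];
static int	buflen;

/* stand-in for appendStringInfoCharMacro: append one byte and keep
 * the buffer NUL-terminated */
static inline void
append_char(char c)
{
	buf[buflen++] = c;
	buf[buflen] = '\0';
}

/* minimal escape loop: two single-character appends replace each
 * two-byte string append; the real escape_json() handles more cases */
static void
escape_fragment(const char *s)
{
	for (; *s; s++)
	{
		switch (*s)
		{
			case '\b':
				append_char('\\');
				append_char('b');
				break;
			case '\n':
				append_char('\\');
				append_char('n');
				break;
			default:
				append_char(*s);
		}
	}
}
```

The win comes from replacing a call that must compute or carry a length with two cheap single-byte stores in the hot loop.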
On Thu, Dec 22, 2022 at 4:19 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> Test 1 (from earlier)
>
> master + escape_json using appendStringInfoCharMacro
> $ pgbench -n -T 60 -f bench.sql -M prepared postgres | grep latency
> latency average = 1.807 ms
> latency average = 1.800 ms
> latency average = 1.812 ms (~4.8% faster than master)
> 23.05% postgres [.] pg_utf_mblen
I get about 20% improvement by adding an ascii fast path in pg_mbstrlen_with_len, which I think would work with all encodings we support:
@@ -1064,7 +1064,12 @@ pg_mbstrlen_with_len(const char *mbstr, int limit)
 	while (limit > 0 && *mbstr)
 	{
-		int			l = pg_mblen(mbstr);
+		int			l;
+
+		if (!IS_HIGHBIT_SET(*mbstr))
+			l = 1;
+		else
+			l = pg_mblen(mbstr);

--
John Naylor
EDB: http://www.enterprisedb.com
On 19.12.22 23:48, David Rowley wrote:
> On Tue, 20 Dec 2022 at 11:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think Peter is entirely right to question whether *this* type's
>> output function is performance-critical. Who's got large tables with
>> jsonpath columns? It seems to me the type would mostly only exist
>> as constants within queries.
>
> The patch touches code in the path of jsonb's output function too. I
> don't think you could claim the same for that.

Ok, let's leave the jsonb output alone. The jsonb output code also
won't change a lot, but there is a bunch of stuff for jsonpath on the
horizon, so having a more robust coding style to imitate there seems
useful. Here is another patch set with the jsonb changes omitted.
On Fri, 23 Dec 2022 at 22:04, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> Ok, let's leave the jsonb output alone. The jsonb output code also
> won't change a lot, but there is a bunch of stuff for jsonpath on the
> horizon, so having a more robust coding style to imitate there seems
> useful. Here is another patch set with the jsonb changes omitted.

Maybe if there's concern that inlining appendStringInfoString is going
to bloat the binary too much, then how about we just invent an inlined
version of it under some other name that we can use when performance
matters? We could then safely replace the offending
appendBinaryStringInfo calls in both places without any concern about
regressing performance.

FWIW, I just did a few compilation runs of our supported versions to
see how much the postgres binary grew release to release:

branch          postgres binary size   growth bytes
REL_10_STABLE    8230232                     0
REL_11_STABLE    8586024                355792
REL_12_STABLE    8831664                245640
REL_13_STABLE    8990824                159160
REL_14_STABLE    9484848                494024
REL_15_STABLE    9744680                259832
master           9977896                233216
inline_asis     10004032                 26136

(inline_asis = inlined appendStringInfoString)

On the other hand, if we went with inlining the existing function, then
it looks to be about 10% of the growth we saw between v14 and v15. That
seems quite large.

David
On 23.12.22 10:04, Peter Eisentraut wrote:
> Ok, let's leave the jsonb output alone. The jsonb output code also
> won't change a lot, but there is a bunch of stuff for jsonpath on the
> horizon, so having a more robust coding style to imitate there seems
> useful. Here is another patch set with the jsonb changes omitted.

I have committed these.
On 23.12.22 14:01, David Rowley wrote:
> Maybe if there's concern that inlining appendStringInfoString is going
> to bloat the binary too much, then how about we just invent an inlined
> version of it under some other name that we can use when performance
> matters? We could then safely replace the offending
> appendBinaryStringInfo calls in both places without any concern about
> regressing performance.

The jsonpath output routines don't appear to be written with deep
concern about performance now, so I'm not sure this is the place to
start tweaking.

For the jsonb parts, there are only a handful of strings this affects
("true", "false", "null"), so using appendBinaryStringInfo() there a
few times doesn't seem so bad.

So I'm not too worried about this altogether.
On 19.12.22 07:13, Peter Eisentraut wrote:
> Also, the argument type of appendBinaryStringInfo() is char *. There is
> some code that uses this function to assemble some kind of packed binary
> layout, which requires a bunch of casts because of this. I think
> functions taking binary data plus length should take void * instead,
> like memcpy() for example.

I found a little follow-up for this one: Make the same change to
pq_sendbytes(), which is a thin wrapper around appendBinaryStringInfo().
This would allow getting rid of further casts at call sites.
On Fri, Feb 10, 2023 at 7:16 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> On 19.12.22 07:13, Peter Eisentraut wrote:
>> Also, the argument type of appendBinaryStringInfo() is char *. There is
>> some code that uses this function to assemble some kind of packed binary
>> layout, which requires a bunch of casts because of this. I think
>> functions taking binary data plus length should take void * instead,
>> like memcpy() for example.
>
> I found a little follow-up for this one: Make the same change to
> pq_sendbytes(), which is a thin wrapper around appendBinaryStringInfo().
> This would allow getting rid of further casts at call sites.

+1

Has all the benefits that 54a177a948b0a773c25c6737d1cc3cc49222a526 had.

Passes make check-world.
On 10.02.23 20:08, Corey Huinker wrote:
> On Fri, Feb 10, 2023 at 7:16 AM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>> On 19.12.22 07:13, Peter Eisentraut wrote:
>>> Also, the argument type of appendBinaryStringInfo() is char *. There is
>>> some code that uses this function to assemble some kind of packed binary
>>> layout, which requires a bunch of casts because of this. I think
>>> functions taking binary data plus length should take void * instead,
>>> like memcpy() for example.
>>
>> I found a little follow-up for this one: Make the same change to
>> pq_sendbytes(), which is a thin wrapper around appendBinaryStringInfo().
>> This would allow getting rid of further casts at call sites.
>
> +1
>
> Has all the benefits that 54a177a948b0a773c25c6737d1cc3cc49222a526 had.
>
> Passes make check-world.

committed, thanks