Thread: [PATCH] Optimize json_lex_string by batching character copying
When parsing JSON, strings need to be converted from the JSON string format to a C-style string. A simple copy of the buffer does not suffice because of the various escape sequences that JSON supports. Because of this, our JSON parser wrote characters into the C-style string buffer one at a time. However, that is only necessary for escape sequences that map to another character. This patch changes the behaviour for non-escaped characters: these are now copied in batches instead of one character at a time.

To test performance of this change I used COPY BINARY from a JSONB table into another, containing fairly large JSONB values of ~15kB. The JSONB values are single-level JSON objects. They contain a few small keys and values, but one very big value that is a stringified JSON blob, so it contains a relatively high number of escape characters (to escape all the " characters). This change improves performance for this workload on my machine by ~18% (going from 1m24s to 1m09s).

@Andres, there was indeed some low hanging fruit. @John Naylor, SSE2 indeed sounds like another nice improvement. I'll leave that to you.
Attachment
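For readers following along without the attachment, here is a minimal standalone sketch of the batching idea described above. It is illustrative only: the names are made up, the escape handling is a placeholder, and the real patch appends into lex->strval with appendBinaryStringInfo() rather than into a plain buffer.

#include <stddef.h>
#include <string.h>

/*
 * Illustration only: copy runs of plain bytes at once instead of one
 * byte at a time.  "out" is assumed to be large enough.
 */
static size_t
copy_batched(const char *s, const char *end, char *out)
{
    char       *d = out;
    const char *run_start = s;

    while (s < end)
    {
        if (*s == '\\' || *s == '"')
        {
            /* flush the batch of copyable characters seen so far */
            memcpy(d, run_start, s - run_start);
            d += s - run_start;

            /* an escape sequence would be decoded here; just skip it */
            s++;
            run_start = s;
        }
        else
            s++;                /* plain byte: extend the current batch */
    }

    /* flush the trailing batch, if any */
    memcpy(d, run_start, s - run_start);
    d += s - run_start;

    return d - out;
}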
Hi,
Looking at the patch,
+			if (copyable_characters_length)
+			{
+				/* flush copyable characters */
+				appendBinaryStringInfo(
+					lex->strval,
+					s - copyable_characters_length,
+					copyable_characters_length);
+
+			}
 			break;
I wonder why copyable_characters_length is not reset after flushing.
Cheers
> +			if (copyable_characters_length)
> +			{
> +				/* flush copyable characters */
> +				appendBinaryStringInfo(
> +					lex->strval,
> +					s - copyable_characters_length,
> +					copyable_characters_length);
> +
> +			}
> 			break;
>
> I wonder why copyable_characters_length is not reset after flushing.

It breaks from the loop right after. So copyable_characters_length isn't used again and thus resetting is not necessary. But I agree this could use a comment.
Hi,

On 2022-06-24 08:47:09 +0000, Jelte Fennema wrote:
> To test performance of this change I used COPY BINARY from a JSONB table
> into another, containing fairly large JSONB values of ~15kB.

This will have a lot of other costs included (DML is expensive). I'd suggest storing the json in a text column and casting it to json[b], with a filter on top of the json[b] result that cheaply filters it away. That should end up spending nearly all the time somewhere around json parsing.

It's useful for things like this to include a way for others to use the same benchmark...

I tried your patch with:

DROP TABLE IF EXISTS json_as_text;
CREATE TABLE json_as_text AS SELECT (SELECT json_agg(row_to_json(pd)) as t FROM pg_description pd) FROM generate_series(1,100);
VACUUM FREEZE json_as_text;

SELECT 1 FROM json_as_text WHERE jsonb_typeof(t::jsonb) = 'not me';

Which the patch improves from 846ms to 754ms (best of three). A bit smaller than your improvement, but still nice.

I think your patch doesn't quite go far enough - we still end up looping for each character, and have the added complication of needing to flush the "buffer". I'd be surprised if a "dedicated" loop to see until where the string lasts isn't faster. That then obviously could be SIMDified.

Separately, it seems pretty awful efficiency / code density wise to have the NULL checks for ->strval all over. Might be worth forcing json_lex() and json_lex_string() to be inlined, with a constant parameter deciding whether ->strval is expected. That'd likely be enough to get the compiler to specialize the code for us.

Might also be worth maintaining ->strval using appendBinaryStringInfoNT().

Greetings,

Andres Freund
Hi,

On 2022-06-24 17:18:10 -0700, Andres Freund wrote:
> On 2022-06-24 08:47:09 +0000, Jelte Fennema wrote:
> > To test performance of this change I used COPY BINARY from a JSONB table
> > into another, containing fairly large JSONB values of ~15kB.
>
> This will have a lot of other costs included (DML is expensive). I'd suggest
> storing the json in a text column and casting it to json[b], with a filter
> on top of the json[b] result that cheaply filters it away. That should end up
> spending nearly all the time somewhere around json parsing.
>
> It's useful for things like this to include a way for others to use the same
> benchmark...
>
> I tried your patch with:
>
> DROP TABLE IF EXISTS json_as_text;
> CREATE TABLE json_as_text AS SELECT (SELECT json_agg(row_to_json(pd)) as t FROM pg_description pd) FROM generate_series(1,100);
> VACUUM FREEZE json_as_text;
>
> SELECT 1 FROM json_as_text WHERE jsonb_typeof(t::jsonb) = 'not me';
>
> Which the patch improves from 846ms to 754ms (best of three). A bit smaller
> than your improvement, but still nice.
>
> I think your patch doesn't quite go far enough - we still end up looping for
> each character, and have the added complication of needing to flush the
> "buffer". I'd be surprised if a "dedicated" loop to see until where the string
> lasts isn't faster. That then obviously could be SIMDified.

A naive implementation (attached) of that gets me down to 706ms.

Greetings,

Andres Freund
Attachment
On Sat, Jun 25, 2022 at 8:05 AM Andres Freund <andres@anarazel.de> wrote:
> > I tried your patch with:
> >
> > DROP TABLE IF EXISTS json_as_text;
> > CREATE TABLE json_as_text AS SELECT (SELECT json_agg(row_to_json(pd)) as t FROM pg_description pd) FROM generate_series(1,100);
> > VACUUM FREEZE json_as_text;
> >
> > SELECT 1 FROM json_as_text WHERE jsonb_typeof(t::jsonb) = 'not me';
> >
> > Which the patch improves from 846ms to 754ms (best of three). A bit smaller
> > than your improvement, but still nice.
> >
> > I think your patch doesn't quite go far enough - we still end up looping for
> > each character, and have the added complication of needing to flush the
> > "buffer". I'd be surprised if a "dedicated" loop to see until where the string
> > lasts isn't faster. That then obviously could be SIMDified.
>
> A naive implementation (attached) of that gets me down to 706ms.

Taking this a step further, I modified json_lex and json_lex_string to use a const end pointer instead of maintaining the length (0001). The observed speedup is small enough that it might not be real, but the code is simpler this way, and it makes 0002 and 0003 easier to reason about. Then I modified your patch to do the same (0002). Hackish SSE2 support is in 0003.

To exercise the SIMD code a bit, I added a second test:

DROP TABLE IF EXISTS long_json_as_text;
CREATE TABLE long_json_as_text AS
with long as (
  select repeat(description, 10) from pg_description pd
)
SELECT (select json_agg(row_to_json(long)) as t from long)
from generate_series(1, 100);
VACUUM FREEZE long_json_as_text;

SELECT 1 FROM long_json_as_text WHERE jsonb_typeof(t::jsonb) = 'not me';

With this, I get (turbo disabled, min of 3):

short test:
master: 769ms
0001: 751ms
0002: 703ms
0003: 701ms

long test:
master: 939ms
0001: 883ms
0002: 537ms
0003: 439ms

I think 0001/2 are mostly in committable shape. With 0003, I'd want to make the portability check a bit nicer and more centralized. I'm thinking of modifying the CRC check to report that the host cpu/compiler understands SSE4.2 x86 intrinsics, and then the compile-time SSE2 check can piggyback on top of that without a runtime check. This is conceptually easy but a bit of work to not look like a hack (which probably means the ARM CRC check should look more generic somehow...). The regression tests will likely need some work as well.

> Separately, it seems pretty awful efficiency / code density wise to have the
> NULL checks for ->strval all over. Might be worth forcing json_lex() and
> json_lex_string() to be inlined, with a constant parameter deciding whether
> ->strval is expected. That'd likely be enough to get the compiler to specialize
> the code for us.

I had a look at this but it's a bit more invasive than I want to devote time to at this point.

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment
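To illustrate the 0001 change for readers who haven't opened the patch, here is a trivial standalone sketch of the two loop styles. It is not the patch itself; process() is a hypothetical stand-in for the per-byte work.

#include <stddef.h>

static void process(char c) { (void) c; }   /* hypothetical per-byte work */

/* before: a length maintained alongside the cursor */
static void
scan_by_length(const char *s, size_t len)
{
    while (len > 0)
    {
        process(*s);
        s++;
        len--;
    }
}

/* after: a const end pointer computed once, one fewer moving part */
static void
scan_by_end_pointer(const char *s, const char *const end)
{
    while (s < end)
        process(*s++);
}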
Hi,

On 2022-07-06 12:10:20 +0700, John Naylor wrote:
> diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
> index eeedc0645a..ad4858c623 100644
> --- a/src/common/jsonapi.c
> +++ b/src/common/jsonapi.c
> @@ -851,10 +851,26 @@ json_lex_string(JsonLexContext *lex)
>  		}
>  		else if (lex->strval != NULL)
>  		{
> +			/* start lookahead at next byte */
> +			char	   *p = s + 1;
> +
>  			if (hi_surrogate != -1)
>  				return JSON_UNICODE_LOW_SURROGATE;
>
> -			appendStringInfoChar(lex->strval, *s);
> +			while (p < end)
> +			{
> +				if (*p == '\\' || *p == '"' || (unsigned char) *p < 32)
> +					break;
> +				p++;
> +			}
> +
> +			appendBinaryStringInfo(lex->strval, s, p - s);
> +
> +			/*
> +			 * s will be incremented at the top of the loop, so set it to just
> +			 * behind our lookahead position
> +			 */
> +			s = p - 1;
>  		}
>  	}
>
> --
> 2.36.1

I think before committing something along those lines we should make the relevant bits also be applicable when ->strval is NULL, as several functions use that (notably json_in IIRC). Afaics we'd just need to move the strval check to be around the appendBinaryStringInfo(). And it should simplify the function, because some of the relevant code is duplicated outside as well...

Greetings,

Andres Freund
On Wed, Jul 6, 2022 at 12:18 PM Andres Freund <andres@anarazel.de> wrote:
> I think before committing something along those lines we should make the
> relevant bits also be applicable when ->strval is NULL, as several functions
> use that (notably json_in IIRC). Afaics we'd just need to move the strval
> check to be around the appendBinaryStringInfo().

That makes sense and is easy.

> And it should simplify the
> function, because some of the relevant code is duplicated outside as well...

Not sure how far to take this, but I put the returnable paths inside the "other" path, so only backslash will go back to the top.

Both the above changes are split into a new 0003 patch for easier review, but in the end will likely be squashed with 0002.

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment
I've pushed 0001 (although the email seems to have been swallowed again), and pending additional comments on 0002 and 0003 I'll squash and push those next week. 0004 needs some thought on integrating with symbols we discover during configure.

--
John Naylor
EDB: http://www.enterprisedb.com
On Fri, Jul 8, 2022 at 3:06 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> I've pushed 0001 (although the email seems to have been swallowed
> again), and pending additional comments on 0002 and 0003 I'll squash
> and push those next week.

This is done.

> 0004 needs some thought on integrating with
> symbols we discover during configure.

Still needs thought.

--
John Naylor
EDB: http://www.enterprisedb.com
Hi,

On 2022-07-06 15:58:44 +0700, John Naylor wrote:
> From 82e13b6bebd85a152ededcfd75495c0c0f642354 Mon Sep 17 00:00:00 2001
> From: John Naylor <john.naylor@postgresql.org>
> Date: Wed, 6 Jul 2022 15:50:09 +0700
> Subject: [PATCH v4 4/4] Use vectorized lookahead in json_lex_string on x86
>
> ---
>  src/common/jsonapi.c | 48 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
>
> diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
> index 81e176ad8d..44e8ed2b2f 100644
> --- a/src/common/jsonapi.c
> +++ b/src/common/jsonapi.c
> @@ -24,6 +24,12 @@
>  #include "miscadmin.h"
>  #endif
>
> +/* WIP: put somewhere sensible and consider removing CRC from the names */
> +#if (defined (__x86_64__) || defined(_M_AMD64)) && (defined(USE_SSE42_CRC32C) || defined(USE_SSE42_CRC32C_WITH_RUNTIME_CHECK))
> +#include <nmmintrin.h>
> +#define USE_SSE2
> +#endif
> +
>  /*
>   * The context of the parser is maintained by the recursive descent
>   * mechanism, but is passed explicitly to the error reporting routine
> @@ -842,12 +848,54 @@ json_lex_string(JsonLexContext *lex)
>  		}
>  		else
>  		{
> +#ifdef USE_SSE2
> +			__m128i		block,
> +						has_backslash,
> +						has_doublequote,
> +						control,
> +						has_control,
> +						error_cum = _mm_setzero_si128();
> +			const __m128i backslash = _mm_set1_epi8('\\');
> +			const __m128i doublequote = _mm_set1_epi8('"');
> +			const __m128i max_control = _mm_set1_epi8(0x1F);
> +#endif
>  			/* start lookahead at current byte */
>  			char	   *p = s;
>
>  			if (hi_surrogate != -1)
>  				return JSON_UNICODE_LOW_SURROGATE;
>
> +#ifdef USE_SSE2
> +			while (p < end - sizeof(__m128i))
> +			{
> +				block = _mm_loadu_si128((const __m128i *) p);
> +
> +				/* direct comparison to quotes and backslashes */
> +				has_backslash = _mm_cmpeq_epi8(block, backslash);
> +				has_doublequote = _mm_cmpeq_epi8(block, doublequote);
> +
> +				/*
> +				 * use saturation arithmetic to check for <= highest control
> +				 * char
> +				 */
> +				control = _mm_subs_epu8(block, max_control);
> +				has_control = _mm_cmpeq_epi8(control, _mm_setzero_si128());
> +
> +				/*
> +				 * set bits in error_cum where the corresponding lanes in has_*
> +				 * are set
> +				 */
> +				error_cum = _mm_or_si128(error_cum, has_backslash);
> +				error_cum = _mm_or_si128(error_cum, has_doublequote);
> +				error_cum = _mm_or_si128(error_cum, has_control);
> +
> +				if (_mm_movemask_epi8(error_cum))
> +					break;
> +
> +				p += sizeof(__m128i);
> +			}
> +#endif							/* USE_SSE2 */
> +
>  			while (p < end)
>  			{
>  				if (*p == '\\' || *p == '"')
> --
> 2.36.1

I wonder if we can't abstract this at least a bit better. If we go that route a bit further, then add another arch, this code will be pretty much unreadable.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> I wonder if we can't abstract this at least a bit better. If we go that route
> a bit further, then add another arch, this code will be pretty much
> unreadable.

IMO, it's pretty unreadable *now*, for lack of comments about what it's doing and why.

			regards, tom lane
Hi,

On 2022-07-11 11:53:26 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I wonder if we can't abstract this at least a bit better. If we go that route
> > a bit further, then add another arch, this code will be pretty much
> > unreadable.
>
> IMO, it's pretty unreadable *now*, for lack of comments about what it's
> doing and why.

Yea, that could at least be addressed by adding comments. But even with a bunch of comments, it'd still be pretty hard to read once the events above have happened (and they seem kind of inevitable).

I wonder if we can add a somewhat more general function for scanning until some characters are found using SIMD? There's plenty other places that could be useful.

Greetings,

Andres Freund
On Mon, Jul 11, 2022 at 11:07 PM Andres Freund <andres@anarazel.de> wrote:
> I wonder if we can add a somewhat more general function for scanning until
> some characters are found using SIMD? There's plenty other places that could
> be useful.
In simple cases, we could possibly abstract the entire loop. With this particular case, I imagine the most approachable way to write the loop would be a bit more low-level:
while (p < end - VECTOR_WIDTH &&
!vector_has_byte(p, '\\') &&
!vector_has_byte(p, '"') &&
vector_min_byte(p, 0x20))
    p += VECTOR_WIDTH;
I wonder if we'd lose a bit of efficiency here by not accumulating set bits from the three conditions, but it's worth trying.
On 2022-06-24 Fr 20:18, Andres Freund wrote:
> Hi,
>
> On 2022-06-24 08:47:09 +0000, Jelte Fennema wrote:
>> To test performance of this change I used COPY BINARY from a JSONB table
>> into another, containing fairly large JSONB values of ~15kB.
> This will have a lot of other costs included (DML is expensive). I'd suggest
> storing the json in a text column and casting it to json[b], with a filter
> on top of the json[b] result that cheaply filters it away. That should end up
> spending nearly all the time somewhere around json parsing.
>
> It's useful for things like this to include a way for others to use the same
> benchmark...
>
> I tried your patch with:
>
> DROP TABLE IF EXISTS json_as_text;
> CREATE TABLE json_as_text AS SELECT (SELECT json_agg(row_to_json(pd)) as t FROM pg_description pd) FROM generate_series(1,100);
> VACUUM FREEZE json_as_text;
>
> SELECT 1 FROM json_as_text WHERE jsonb_typeof(t::jsonb) = 'not me';

I've been doing some other work related to json parsing and John referred me to this. But it's actually not the best test for pure json parsing - casting to jsonb involves some extra work besides pure parsing. Instead I've been using this query with the same table, which should be almost all json parsing:

select 1 from json_as_text where t::json is null;

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
I wrote:
> On Mon, Jul 11, 2022 at 11:07 PM Andres Freund <andres@anarazel.de> wrote:
>
> > I wonder if we can add a somewhat more general function for scanning until
> > some characters are found using SIMD? There's plenty other places that could
> > be useful.
>
> In simple cases, we could possibly abstract the entire loop. With this particular case, I imagine the most approachable way to write the loop would be a bit more low-level:
>
> while (p < end - VECTOR_WIDTH &&
>        !vector_has_byte(p, '\\') &&
>        !vector_has_byte(p, '"') &&
>        vector_min_byte(p, 0x20))
>     p += VECTOR_WIDTH;
>
> I wonder if we'd lose a bit of efficiency here by not accumulating set bits from the three conditions, but it's worth trying.

The attached implements the above, more or less, using new pg_lfind8() and pg_lfind8_le(), which in turn are based on helper functions that act on a single vector. The pg_lfind* functions have regression tests, but I haven't done the same for json yet. I went the extra step to use bit-twiddling for non-SSE builds using uint64 as a "vector", which still gives a pretty good boost (test below, min of 3):

master: 356ms
v5: 259ms
v5 disable SSE: 288ms

It still needs a bit of polishing and testing, but I think it's a good workout for abstracting SIMD out of the way.

-------------
test:

DROP TABLE IF EXISTS long_json_as_text;
CREATE TABLE long_json_as_text AS
with long as (
  select repeat(description, 11)
  from pg_description
)
select (select json_agg(row_to_json(long))::text as t from long)
from generate_series(1, 100);
VACUUM FREEZE long_json_as_text;

select 1 from long_json_as_text where t::json is null; -- from Andrew upthread

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment
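For reference, a standalone sketch of the non-SSE fallback idea described above: a uint64 treated as a "vector" of eight bytes, using the classic zero-byte test from the Stanford bithacks page. The names are illustrative; the actual pg_lfind8() in the patch differs in detail.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* true if any byte of "chunk" equals c (branch-free per 8 bytes) */
static inline bool
chunk_has_byte(uint64_t chunk, uint8_t c)
{
    /* XOR with c broadcast to all bytes: matching bytes become zero */
    uint64_t    v = chunk ^ (UINT64_C(0x0101010101010101) * c);

    /* classic "haszero": nonzero iff some byte of v is zero */
    return ((v - UINT64_C(0x0101010101010101)) &
            ~v & UINT64_C(0x8080808080808080)) != 0;
}

static bool
lfind8_swar(uint8_t key, const uint8_t *base, size_t nelem)
{
    size_t      i = 0;

    /* process the input eight bytes at a time */
    for (; i + sizeof(uint64_t) <= nelem; i += sizeof(uint64_t))
    {
        uint64_t    chunk;

        memcpy(&chunk, &base[i], sizeof(chunk));
        if (chunk_has_byte(chunk, key))
            return true;
    }

    /* one byte at a time for the remainder */
    for (; i < nelem; i++)
        if (base[i] == key)
            return true;

    return false;
}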
Hi,
I ran this test.
DROP TABLE IF EXISTS long_json_as_text;
CREATE TABLE long_json_as_text AS
with long as (
select repeat(description, 11)
from pg_description
)
select (select json_agg(row_to_json(long))::text as t from long) from
generate_series(1, 100);
VACUUM FREEZE long_json_as_text;
select 1 from long_json_as_text where t::json is null;
head:
Time: 161,741 ms
v5:
Time: 270,298 ms
ubuntu 64 bits
gcc 9.4.0
regards,
Ranier Vilela
On Mon, Aug 15, 2022 at 15:34, Ranier Vilela <ranier.vf@gmail.com> wrote:
Hi,
I ran this test.
DROP TABLE IF EXISTS long_json_as_text;
CREATE TABLE long_json_as_text AS
with long as (
select repeat(description, 11)
from pg_description
)
select (select json_agg(row_to_json(long))::text as t from long) from
generate_series(1, 100);
VACUUM FREEZE long_json_as_text;
select 1 from long_json_as_text where t::json is null;

head:
Time: 161,741 ms
v5:
Time: 270,298 ms

Sorry, too fast: 270,298 ms was with native memchr.
v5
Time: 208,689 ms
regards,
Ranier Vilela
On Mon, Aug 15, 2022 at 08:33:21PM +0700, John Naylor wrote:
> The attached implements the above, more or less, using new pg_lfind8()
> and pg_lfind8_le(), which in turn are based on helper functions that
> act on a single vector. The pg_lfind* functions have regression tests,
> but I haven't done the same for json yet. I went the extra step to use
> bit-twiddling for non-SSE builds using uint64 as a "vector", which
> still gives a pretty good boost (test below, min of 3):

Looks pretty reasonable to me.

> +#ifdef USE_SSE2
> +		chunk = _mm_loadu_si128((const __m128i *) &base[i]);
> +#else
> +		memcpy(&chunk, &base[i], sizeof(chunk));
> +#endif							/* USE_SSE2 */

> +#ifdef USE_SSE2
> +		chunk = _mm_loadu_si128((const __m128i *) &base[i]);
> +#else
> +		memcpy(&chunk, &base[i], sizeof(chunk));
> +#endif							/* USE_SSE2 */

Perhaps there should be a macro or inline function for loading a vector so that these USE_SSE2 checks can be abstracted away, too.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
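One possible shape for such a load wrapper, as a sketch (the names here are made up; whatever lands in simd.h may look different):

#include <stdint.h>
#include <string.h>

#ifdef USE_SSE2
#include <emmintrin.h>
typedef __m128i Vector8;
#else
typedef uint64_t Vector8;
#endif

/*
 * Hypothetical load helper: callers write vector8_load(&chunk, &base[i])
 * and the USE_SSE2 distinction lives in exactly one place.
 */
static inline void
vector8_load(Vector8 *v, const uint8_t *s)
{
#ifdef USE_SSE2
    *v = _mm_loadu_si128((const __m128i *) s);
#else
    memcpy(v, s, sizeof(Vector8));
#endif
}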
On Tue, Aug 16, 2022 at 4:23 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Mon, Aug 15, 2022 at 08:33:21PM +0700, John Naylor wrote:
> > +#ifdef USE_SSE2
> > +		chunk = _mm_loadu_si128((const __m128i *) &base[i]);
> > +#else
> > +		memcpy(&chunk, &base[i], sizeof(chunk));
> > +#endif							/* USE_SSE2 */
>
> Perhaps there should be a macro or inline function for loading a vector so
> that these USE_SSE2 checks can be abstracted away, too.

This is done. Also:
- a complete overhaul of the pg_lfind8* tests
- using a typedef for the vector type
- some refactoring, name changes and other cleanups (a few of these could also be applied to the 32-byte element path, but that is left for future work)

TODO: json-specific tests of the new path

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment
On Fri, Aug 19, 2022 at 03:11:36PM +0700, John Naylor wrote:
> This is done. Also:
> - a complete overhaul of the pg_lfind8* tests
> - using a typedef for the vector type
> - some refactoring, name changes and other cleanups (a few of these
> could also be applied to the 32-byte element path, but that is left
> for future work)
>
> TODO: json-specific tests of the new path

This looks pretty good to me. Should we rename vector_broadcast() and vector_has_zero() to indicate that they are working with bytes (e.g., vector_broadcast_byte())? We might be able to use vector_broadcast_int() in the 32-bit functions, and your other vector functions already have a _byte suffix.

In general, the approach you've taken seems like a decent readability improvement. I'd be happy to try my hand at adjusting the 32-bit path and adding ARM versions of all this stuff.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Fri, Aug 19, 2022 at 01:42:15PM -0700, Nathan Bossart wrote:
> On Fri, Aug 19, 2022 at 03:11:36PM +0700, John Naylor wrote:
>> This is done. Also:
>> - a complete overhaul of the pg_lfind8* tests
>> - using a typedef for the vector type
>> - some refactoring, name changes and other cleanups (a few of these
>> could also be applied to the 32-byte element path, but that is left
>> for future work)
>>
>> TODO: json-specific tests of the new path
>
> This looks pretty good to me. Should we rename vector_broadcast() and
> vector_has_zero() to indicate that they are working with bytes (e.g.,
> vector_broadcast_byte())? We might be able to use vector_broadcast_int()
> in the 32-bit functions, and your other vector functions already have a
> _byte suffix.
>
> In general, the approach you've taken seems like a decent readability
> improvement. I'd be happy to try my hand at adjusting the 32-bit path and
> adding ARM versions of all this stuff.

I spent some more time looking at this one, and I had a few ideas that I thought I'd share. 0001 is your v6 patch with a few additional changes, including simplifying the assertions for readability, splitting out the Vector type into Vector8 and Vector32 (needed for ARM), and adjusting pg_lfind32() to use the new tools in simd.h. 0002 adds ARM versions of everything, which obsoletes the other thread I started [0]. This is still a little rough around the edges (e.g., this should probably be more than 2 patches), but I think it helps demonstrate a more comprehensive design than what I've proposed in the pg_lfind32-for-ARM thread [0].

Apologies if I'm stepping on your toes a bit here.

[0] https://postgr.es/m/20220819200829.GA395728%40nathanxps13

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment
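For a flavor of what the ARM side can look like, here is an illustrative NEON version of the byte-search helper discussed in this thread. The names are hypothetical and the committed code may differ; note that vmaxvq_u8 is an AArch64-only intrinsic.

#include <arm_neon.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint8x16_t Vector8;

/* replicate c into all 16 lanes */
static inline Vector8
vector8_broadcast(const uint8_t c)
{
    return vdupq_n_u8(c);
}

/* true if any lane of v equals c */
static inline bool
vector8_has(const Vector8 v, const uint8_t c)
{
    /*
     * vceqq_u8 sets matching lanes to 0xFF; vmaxvq_u8 (AArch64 only)
     * reduces across lanes, so any match yields a nonzero maximum.
     */
    return vmaxvq_u8(vceqq_u8(v, vector8_broadcast(c))) != 0;
}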
On Sun, Aug 21, 2022 at 12:47 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> I spent some more time looking at this one, and I had a few ideas that I
> thought I'd share. 0001 is your v6 patch with a few additional changes,
> including simplifying the assertions for readability, splitting out the
> Vector type into Vector8 and Vector32 (needed for ARM), and adjusting
> pg_lfind32() to use the new tools in simd.h. 0002 adds ARM versions of
> everything, which obsoletes the other thread I started [0]. This is still
> a little rough around the edges (e.g., this should probably be more than 2
> patches), but I think it helps demonstrate a more comprehensive design than
> what I've proposed in the pg_lfind32-for-ARM thread [0].
>
> Apologies if I'm stepping on your toes a bit here.

Not at all! However, the 32-bit-element changes are irrelevant for json, and make review more difficult. I would suggest keeping those in the other thread starting with whatever refactoring is needed. I can always rebase over that.

Not a full review, but on a brief look:

- I like the idea of simplifying the assertions, but I can't get behind using platform lfind to do it, since it has a different API, requires new functions we don't need, and possibly has portability issues. A simple for-loop is better for assertions.

- A runtime elog is not appropriate for a compile time check -- use #error instead.

--
John Naylor
EDB: http://www.enterprisedb.com
On Mon, Aug 22, 2022 at 09:35:34AM +0700, John Naylor wrote:
> Not at all! However, the 32-bit-element changes are irrelevant for
> json, and make review more difficult. I would suggest keeping those in
> the other thread starting with whatever refactoring is needed. I can
> always rebase over that.

Yeah, I'll remove those to keep this thread focused.

> - I like the idea of simplifying the assertions, but I can't get
> behind using platform lfind to do it, since it has a different API,
> requires new functions we don't need, and possibly has portability
> issues. A simple for-loop is better for assertions.

My main goal with this was improving readability, which is likely possible without lfind(). I'll see what I can do.

> - A runtime elog is not appropriate for a compile time check -- use
> #error instead.

Will do.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Mon, Aug 22, 2022 at 02:22:29PM -0700, Nathan Bossart wrote:
> On Mon, Aug 22, 2022 at 09:35:34AM +0700, John Naylor wrote:
>> Not at all! However, the 32-bit-element changes are irrelevant for
>> json, and make review more difficult. I would suggest keeping those in
>> the other thread starting with whatever refactoring is needed. I can
>> always rebase over that.
>
> Yeah, I'll remove those to keep this thread focused.

Here's a new version of the patch with the 32-bit changes and calls to lfind() removed.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, Aug 23, 2022 at 10:32 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Mon, Aug 22, 2022 at 02:22:29PM -0700, Nathan Bossart wrote:
> > On Mon, Aug 22, 2022 at 09:35:34AM +0700, John Naylor wrote:
> >> Not at all! However, the 32-bit-element changes are irrelevant for
> >> json, and make review more difficult. I would suggest keeping those in
> >> the other thread starting with whatever refactoring is needed. I can
> >> always rebase over that.
> >
> > Yeah, I'll remove those to keep this thread focused.
>
> Here's a new version of the patch with the 32-bit changes and calls to
> lfind() removed.

LGTM overall. My plan is to split out the json piece, adding tests for that, and commit the infrastructure for it fairly soon. Possible bikeshedding: Functions like vector8_eq() might be misunderstood as comparing two vectors, but here we are comparing each lane with a scalar. I wonder if vector8_eq_scalar() et al might be more clear.

--
John Naylor
EDB: http://www.enterprisedb.com
On Tue, Aug 23, 2022 at 01:03:03PM +0700, John Naylor wrote:
> On Tue, Aug 23, 2022 at 10:32 AM Nathan Bossart
>> Here's a new version of the patch with the 32-bit changes and calls to
>> lfind() removed.
>
> LGTM overall. My plan is to split out the json piece, adding tests for
> that, and commit the infrastructure for it fairly soon. Possible
> bikeshedding: Functions like vector8_eq() might be misunderstood as
> comparing two vectors, but here we are comparing each lane with a
> scalar. I wonder if vector8_eq_scalar() et al might be more clear.

Good point. I had used vector32_veq() to denote vector comparison, which would extend to something like vector8_seq(). But that doesn't seem descriptive enough. It might be worth considering vector8_contains() or vector8_has() as well. I don't really have an opinion, but if I had to pick something, I guess I'd choose vector8_contains().

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Wed, Aug 24, 2022 at 12:15 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Aug 23, 2022 at 01:03:03PM +0700, John Naylor wrote:
> > On Tue, Aug 23, 2022 at 10:32 AM Nathan Bossart
> >> Here's a new version of the patch with the 32-bit changes and calls to
> >> lfind() removed.
> >
> > LGTM overall. My plan is to split out the json piece, adding tests for
> > that, and commit the infrastructure for it fairly soon. Possible
> > bikeshedding: Functions like vector8_eq() might be misunderstood as
> > comparing two vectors, but here we are comparing each lane with a
> > scalar. I wonder if vector8_eq_scalar() et al might be more clear.
>
> Good point. I had used vector32_veq() to denote vector comparison, which
> would extend to something like vector8_seq(). But that doesn't seem
> descriptive enough. It might be worth considering vector8_contains() or
> vector8_has() as well. I don't really have an opinion, but if I had to
> pick something, I guess I'd choose vector8_contains().

It seems "scalar" would be a bad choice since it already means (confusingly) operating on the least significant element of a vector. I'm thinking of *_has and *_has_le, matching the *_has_zero that already exists in the earlier patch.

--
John Naylor
EDB: http://www.enterprisedb.com
On Wed, Aug 24, 2022 at 11:59:25AM +0700, John Naylor wrote: > It seems "scalar" would be a bad choice since it already means > (confusingly) operating on the least significant element of a vector. > I'm thinking of *_has and *_has_le, matching the already existing in > the earlier patch *_has_zero. That seems reasonable to me. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Wed, Aug 24, 2022 at 11:56 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Wed, Aug 24, 2022 at 11:59:25AM +0700, John Naylor wrote:
> > It seems "scalar" would be a bad choice since it already means
> > (confusingly) operating on the least significant element of a vector.
> > I'm thinking of *_has and *_has_le, matching the *_has_zero that
> > already exists in the earlier patch.
>
> That seems reasonable to me.

Okay, done that way. Also in v9:

- a convenience macro in the test suite which is handy now and can be used for 32-bit element tests if we like
- more tests
- pgindent and some additional comment smithing
- split out the json piece for a later commit
- For the following comment, pgindent will put spaced operands on a separate line, which is not great for readability. And our other reference to the Stanford bithacks page keeps the in-page link, and I see no reason to exclude it -- if it goes missing, the whole page will still load. So I put back those two details.

+	 * To find bytes <= c, we can use bitwise operations to find bytes < c+1,
+	 * but it only works if c+1 <= 128 and if the highest bit in v is not set.
+	 * Adapted from
+	 * https://graphics.stanford.edu/~seander/bithacks.html#HasLessInWord

I think I'll go ahead and commit 0001 in a couple days pending further comments.

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment
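For reference, a standalone sketch of the bit-twiddling behind that comment, adapted from the bithacks "HasLessInWord" entry. "Bytes <= c" is computed as "bytes < c + 1", falling back to a byte-wise check when the stated preconditions don't hold; the names are illustrative, not the patch's.

#include <stdbool.h>
#include <stdint.h>

static inline bool
chunk_has_byte_le(uint64_t v, uint8_t c)
{
    const uint64_t ones = UINT64_C(0x0101010101010101);
    const uint64_t highbits = UINT64_C(0x8080808080808080);

    /*
     * The subtraction trick finds bytes < c + 1, but is only valid when
     * c + 1 <= 128 and no byte of v has its high bit set.
     */
    if (c < 0x80 && (v & highbits) == 0)
        return ((v - ones * (c + 1)) & ~v & highbits) != 0;

    /* preconditions not met: check one byte at a time */
    for (int i = 0; i < 8; i++)
    {
        if ((uint8_t) (v >> (i * 8)) <= c)
            return true;
    }
    return false;
}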
On Thu, Aug 25, 2022 at 01:35:45PM +0700, John Naylor wrote:
> - For the following comment, pgindent will put spaced operands on a
> separate line, which is not great for readability. And our other
> reference to the Stanford bithacks page keeps the in-page link, and I
> see no reason to exclude it -- if it goes missing, the whole page will
> still load. So I put back those two details.
>
> +	 * To find bytes <= c, we can use bitwise operations to find bytes < c+1,
> +	 * but it only works if c+1 <= 128 and if the highest bit in v is not set.
> +	 * Adapted from
> +	 * https://graphics.stanford.edu/~seander/bithacks.html#HasLessInWord

This was just unnecessary fiddling on my part, sorry about that.

> +test_lfind8_internal(uint8 key)
> +{
> +	uint8		charbuf[LEN_WITH_TAIL(Vector8)];
> +	const int	len_no_tail = LEN_NO_TAIL(Vector8);
> +	const int	len_with_tail = LEN_WITH_TAIL(Vector8);
> +
> +	memset(charbuf, 0xFF, len_with_tail);
> +	/* search tail to test one-byte-at-a-time path */
> +	charbuf[len_with_tail - 1] = key;
> +	if (key > 0x00 && pg_lfind8(key - 1, charbuf, len_with_tail))
> +		elog(ERROR, "pg_lfind8() found nonexistent element <= '0x%x'", key - 1);
> +	if (key < 0xFF && !pg_lfind8(key, charbuf, len_with_tail))
> +		elog(ERROR, "pg_lfind8() did not find existing element <= '0x%x'", key);
> +	if (key < 0xFE && pg_lfind8(key + 1, charbuf, len_with_tail))
> +		elog(ERROR, "pg_lfind8() found nonexistent element <= '0x%x'", key + 1);
> +
> +	memset(charbuf, 0xFF, len_with_tail);
> +	/* search with vector operations */
> +	charbuf[len_no_tail - 1] = key;
> +	if (key > 0x00 && pg_lfind8(key - 1, charbuf, len_no_tail))
> +		elog(ERROR, "pg_lfind8() found nonexistent element <= '0x%x'", key - 1);
> +	if (key < 0xFF && !pg_lfind8(key, charbuf, len_no_tail))
> +		elog(ERROR, "pg_lfind8() did not find existing element <= '0x%x'", key);
> +	if (key < 0xFE && pg_lfind8(key + 1, charbuf, len_no_tail))
> +		elog(ERROR, "pg_lfind8() found nonexistent element <= '0x%x'", key + 1);
> +}

nitpick: Shouldn't the elog() calls use "==" instead of "<=" for this one?

Otherwise, 0001 looks good to me.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Fri, Aug 26, 2022 at 10:14 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> > +test_lfind8_internal(uint8 key)
> [...]
> > +		elog(ERROR, "pg_lfind8() found nonexistent element <= '0x%x'", key + 1);
> > +}
>
> nitpick: Shouldn't the elog() calls use "==" instead of "<=" for this one?

Good catch, will fix.

--
John Naylor
EDB: http://www.enterprisedb.com
On Thu, Aug 25, 2022 at 1:35 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> I think I'll go ahead and commit 0001 in a couple days pending further comments.

Pushed with Nathan's correction and some cosmetic rearrangements.

--
John Naylor
EDB: http://www.enterprisedb.com
> I wonder why copyable_characters_length is not reset after flushing.

It's not necessary because of the break statement right after. But this part of the code was refactored away in John's improved patch that was actually merged:
https://github.com/postgres/postgres/commit/3838fa269c15706df2b85ce2d6af8aacd5611655
On Tue, Aug 23, 2022 at 1:03 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> LGTM overall. My plan is to split out the json piece, adding tests for
> that, and commit the infrastructure for it fairly soon.

Here's the final piece. I debated how many tests to add and decided it was probably enough to add one each for checking quotes and backslashes in the fast path. There is one cosmetic change in the code: before, the vectorized less-equal check compared to 0x1F, but the byte-wise path did so with < 32. I made them both "less-equal 31" for consistency. I'll commit this by the end of the week unless anyone has a better idea about testing.

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment
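As a rough sketch of the fast path being committed here (simplified from the attached patch, assuming the pg_lfind8()/pg_lfind8_le() helpers committed earlier in the thread; casts and the surrounding buffer handling are abridged): advance a vector at a time while no byte needs special handling, then finish byte-wise.

/* simplified sketch of the lookahead in json_lex_string() */
char	   *p = s;

/* vector at a time: stop if any byte is a backslash, quote, or control */
while (p < end - sizeof(Vector8) &&
	   !pg_lfind8('\\', (uint8 *) p, sizeof(Vector8)) &&
	   !pg_lfind8('"', (uint8 *) p, sizeof(Vector8)) &&
	   !pg_lfind8_le(31, (uint8 *) p, sizeof(Vector8)))
	p += sizeof(Vector8);

/* byte at a time for the remainder */
while (p < end && *p != '\\' && *p != '"' && (unsigned char) *p > 31)
	p++;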
On Wed, Aug 31, 2022 at 10:50:39AM +0700, John Naylor wrote:
> Here's the final piece. I debated how many tests to add and decided it
> was probably enough to add one each for checking quotes and
> backslashes in the fast path. There is one cosmetic change in the
> code: before, the vectorized less-equal check compared to 0x1F, but
> the byte-wise path did so with < 32. I made them both "less-equal 31"
> for consistency. I'll commit this by the end of the week unless anyone
> has a better idea about testing.

LGTM

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Wed, Aug 31, 2022 at 11:17 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Wed, Aug 31, 2022 at 10:50:39AM +0700, John Naylor wrote:
> > Here's the final piece. I debated how many tests to add and decided it
> > was probably enough to add one each for checking quotes and
> > backslashes in the fast path. There is one cosmetic change in the
> > code: before, the vectorized less-equal check compared to 0x1F, but
> > the byte-wise path did so with < 32. I made them both "less-equal 31"
> > for consistency. I'll commit this by the end of the week unless anyone
> > has a better idea about testing.
>
> LGTM

Pushed, thanks for looking!

--
John Naylor
EDB: http://www.enterprisedb.com