Re: [PATCH] Optimize json_lex_string by batching character copying - Mailing list pgsql-hackers

From John Naylor
Subject Re: [PATCH] Optimize json_lex_string by batching character copying
Date
Msg-id CAFBsxsESLUyJ5spfOSyPrOvKUEYYNqsBosue9SV1j8ecgNXSKA@mail.gmail.com
In response to Re: [PATCH] Optimize json_lex_string by batching character copying  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: [PATCH] Optimize json_lex_string by batching character copying
List pgsql-hackers
I wrote:

> On Mon, Jul 11, 2022 at 11:07 PM Andres Freund <andres@anarazel.de> wrote:
>
> > I wonder if we can add a somewhat more general function for scanning until
> > some characters are found using SIMD? There's plenty other places that could
> > be useful.
>
> In simple cases, we could possibly abstract the entire loop. With this particular case, I imagine the most
> approachable way to write the loop would be a bit more low-level:
 
>
> while (p < end - VECTOR_WIDTH &&
>        !vector_has_byte(p, '\\') &&
>        !vector_has_byte(p, '"') &&
>        vector_min_byte(p, 0x20))
>     p += VECTOR_WIDTH
>
> I wonder if we'd lose a bit of efficiency here by not accumulating set bits from the three conditions, but it's worth
> trying.

The attached implements the above, more or less, using new pg_lfind8()
and pg_lfind8_le(), which in turn are based on helper functions that
act on a single vector. The pg_lfind* functions have regression tests,
but I haven't done the same for json yet. I went the extra step of
using bit-twiddling on a uint64 "vector" for non-SSE builds, which
still gives a pretty good boost (test below, min of 3):

master:
356ms

v5:
259ms

v5 disable SSE:
288ms
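
For reference, the non-SSE path treats a uint64 as an 8-byte "vector"
and tests all lanes at once with the usual bit-twiddling identities,
something along these lines (the helper names here are mine for
illustration, not necessarily what's in the attachment):

static inline bool
bytes_have_zero(uint64 v)
{
	/* true if any byte in v is zero */
	return ((v - UINT64CONST(0x0101010101010101)) & ~v &
			UINT64CONST(0x8080808080808080)) != 0;
}

static inline bool
bytes_have_byte(uint64 v, uint8 c)
{
	/* true if any byte in v equals c */
	return bytes_have_zero(v ^ (UINT64CONST(0x0101010101010101) * c));
}

static inline bool
bytes_have_le(uint64 v, uint8 c)
{
	/* true if any byte in v is <= c; only valid for c < 0x80 */
	return ((v - UINT64CONST(0x0101010101010101) * (c + 1)) & ~v &
			UINT64CONST(0x8080808080808080)) != 0;
}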

It still needs a bit of polishing and testing, but I think it's a good
workout for abstracting SIMD out of the way.
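
To give an idea of what abstracting SIMD out of the way could look like
on the caller side, the hot loop in json_lex_string() might end up
shaped roughly like the sketch below. The signatures assumed here
(key, buffer, length, returning true on a hit) and the chunk handling
are guesses for illustration rather than exactly what the attachment does:

while (p < end)
{
	int			chunk = Min(16, end - p);

	/* stop batching at a backslash, a quote, or a control byte */
	if (pg_lfind8('\\', (uint8 *) p, chunk) ||
		pg_lfind8('"', (uint8 *) p, chunk) ||
		pg_lfind8_le(0x1F, (uint8 *) p, chunk))
		break;				/* handle the rest byte-at-a-time as before */

	appendBinaryStringInfo(lex->strval, p, chunk);
	p += chunk;
}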

-------------
test:

DROP TABLE IF EXISTS long_json_as_text;
CREATE TABLE long_json_as_text AS
with long as (
        select repeat(description, 11)
        from pg_description
)
select (select json_agg(row_to_json(long))::text as t from long) from
generate_series(1, 100);
VACUUM FREEZE long_json_as_text;

select 1 from long_json_as_text where t::json is null; -- from Andrew upthread

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment
