Hi,
On 2022-06-24 08:47:09 +0000, Jelte Fennema wrote:
> To test performance of this change I used COPY BINARY from a JSONB table
> into another, containing fairly large JSONB values of ~15kB.
This will have a lot of other costs included (DML is expensive). I'd suggest
storing the json in a text column and casting it to json[b], with a filter
on top of the json[b] result that cheaply filters it away. That should end up
spending nearly all the time somewhere around json parsing.
It's useful for things like this to include a way for others to use the same
benchmark...
I tried your patch with:
DROP TABLE IF EXISTS json_as_text;
CREATE TABLE json_as_text AS
  SELECT (SELECT json_agg(row_to_json(pd)) as t FROM pg_description pd)
  FROM generate_series(1, 100);
VACUUM FREEZE json_as_text;
SELECT 1 FROM json_as_text WHERE jsonb_typeof(t::jsonb) = 'not me';
The patch improves this from 846ms to 754ms (best of three). That's a somewhat
smaller gain than yours, but still nice.
I think your patch doesn't quite go far enough - we still end up looping for
each character, and have the added complication of needing to flush the
"buffer". I'd be surprised if a "dedicated" loop that scans ahead to find where
the plain part of the string ends isn't faster. That could then obviously be
SIMDified.
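Something like this (just an untested sketch - scan_plain_chars and the
variable names are made up, not taken from the actual json_lex_string()):

    /*
     * Scan ahead for the next byte that needs special handling and append
     * the whole preceding chunk at once, instead of char-by-char.
     */
    static const char *
    scan_plain_chars(const char *s, const char *end, StringInfo strval)
    {
        const char *p = s;

        /* '"' ends the string, '\\' starts an escape, < 0x20 is an error */
        while (p < end && *p != '"' && *p != '\\' && (unsigned char) *p >= 0x20)
            p++;

        if (strval != NULL && p > s)
            appendBinaryStringInfo(strval, s, p - s);

        return p;               /* caller handles quote / escape / error */
    }

The inner loop has no cross-iteration dependencies, so it should be amenable
to vectorization, whether by hand or by the compiler.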
Separately, it seems pretty awful, efficiency- and code-density-wise, to have
the NULL checks for ->strval all over. Might be worth forcing json_lex() and
json_lex_string() to be inlined, with a constant parameter deciding whether
->strval is expected. That'd likely be enough to get the compiler to
specialize the code for us.
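Roughly along these lines (hand-wavy sketch - json_lex_string_internal and
need_strval are made-up names, and the actual lexing body is omitted):

    static pg_attribute_always_inline JsonParseErrorType
    json_lex_string_internal(JsonLexContext *lex, const bool need_strval)
    {
        /*
         * The existing json_lex_string() body would go here, with every
         * "if (lex->strval != NULL)" replaced by "if (need_strval)", which
         * then constant-folds away in each of the two instantiations.
         */
        return JSON_SUCCESS;
    }

    static JsonParseErrorType
    json_lex_string(JsonLexContext *lex)
    {
        if (lex->strval != NULL)
            return json_lex_string_internal(lex, true);
        else
            return json_lex_string_internal(lex, false);
    }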
Might also be worth maintaining ->strval using appendBinaryStringInfoNT().
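E.g. the chunk append in the sketch above could become

    appendBinaryStringInfoNT(lex->strval, s, p - s);

which skips rewriting the trailing '\0' for every chunk, with

    lex->strval->data[lex->strval->len] = '\0';

done once after the closing quote has been consumed (enlargeStringInfo()
always leaves room for the terminator, so that's safe).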
Greetings,
Andres Freund