Re: Speed up JSON escape processing with SIMD plus other optimisations - Mailing list pgsql-hackers
| From | David Rowley |
|---|---|
| Subject | Re: Speed up JSON escape processing with SIMD plus other optimisations |
| Msg-id | CAApHDvqQr0A2CopL85d0Uphk9uQxBdWcxzfMp1qqKYPe8vJNEw@mail.gmail.com |
| In response to | Re: Speed up JSON escape processing with SIMD plus other optimisations (Heikki Linnakangas <hlinnaka@iki.fi>) |
| Responses | Re: Speed up JSON escape processing with SIMD plus other optimisations |
| List | pgsql-hackers |
On Wed, 24 Jul 2024 at 22:55, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 02/07/2024 07:49, David Rowley wrote:
> > I've attached a rebased set of patches. The previous set no longer applied.
>
> I looked briefly at the first patch. Seems reasonable.
>
> One little thing that caught my eye is that in populate_scalar(), you
> sometimes make a temporary copy of the string to add the
> null-terminator, but then call escape_json() which doesn't need the
> null-terminator anymore. See attached patch to avoid that. However, it's
> not clear to me how to reach that codepath, or if it's reachable at all. I
> tried to add a NOTICE there and ran the regression tests, but got no
> failures.

Thanks for noticing that. It seems like a good simplification regardless. I've incorporated it.

I made another pass over the 0001 and 0003 patches and, after a bit of renaming, pushed the result. I ended up keeping escape_json() as-is and giving the new function the name escape_json_with_len(). The text version is named escape_json_text(). I think I originally did it the other way around because I thought I'd be able to adjust more locations than I did. Having it this way around means slightly less churn.

I did another round of testing on the SIMD patch (attached as v5-0001), as I wondered if the SIMD loop maybe shouldn't wait too long before copying the bytes to the destination string. My concern was that if the JSON string is very large and we look ahead too far, then by the time we flush those bytes out to the destination buffer, we'd already have started evicting L1 cachelines for parts of the buffer that are still to be flushed.

I put this to the test (test 3) and found that with a 1MB JSON string it is faster to flush every 512 bytes than it is to only flush after checking the entire 1MB. With a 10kB JSON string (test 2), the extra code to flush every 512 bytes seems to slow things down.

I'm a bit undecided about whether the flushing is worthwhile or not. It really depends on the length of the JSON strings we'd like to optimise for. It might be possible to get the best of both, but I think that would require manually implementing portions of appendBinaryStringInfo(), and I'd rather not go there. Does anyone have any thoughts about that?

Test 2 (10KB) shows a ~261% performance increase without early flushing, which drops to ~227% when flushing every 512 bytes. Test 3 (1MB) shows a ~99% increase without early flushing, which rises to ~156% when flushing every 512 bytes.
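For readers following along, here is a minimal sketch of how the three variants named above might be called. The signatures are assumptions based on the names mentioned in this thread; the declarations in src/include/utils/json.h are authoritative.

```c
#include "postgres.h"
#include "lib/stringinfo.h"
#include "utils/json.h"

/*
 * Illustrative only: append a JSON-escaped value using each of the
 * variants discussed above.  The exact signatures are assumed here;
 * check src/include/utils/json.h for the committed declarations.
 */
static void
escape_examples(StringInfo buf, const char *cstr, text *t)
{
	/* NUL-terminated C string: original function, name unchanged */
	escape_json(buf, cstr);

	/* explicit length, so no NUL terminator is required */
	escape_json_with_len(buf, cstr, (int) strlen(cstr));

	/* a text datum, avoiding a temporary NUL-terminated copy */
	escape_json_text(buf, t);
}
```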
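To make the flush-every-512-bytes idea concrete, here is a rough sketch of the kind of fast-path loop being discussed, built on the Vector8 helpers from port/simd.h. This is not the v5-0001 patch: the function name and structure are illustrative, and only the ESCAPE_JSON_MAX_LOOKHEAD constant comes from the benchmark labels below.

```c
#include "postgres.h"
#include "lib/stringinfo.h"
#include "port/simd.h"

/* flush the pending clean bytes after at most this much lookahead */
#define ESCAPE_JSON_MAX_LOOKHEAD 512

/*
 * Rough sketch of the SIMD fast path under discussion; not the actual
 * patch.  Scan 'str' a Vector8 at a time and stop at the first chunk
 * containing a character that needs JSON escaping ('"', '\\' or a
 * control character <= 0x1F).  Clean bytes are copied to 'buf' at
 * least every ESCAPE_JSON_MAX_LOOKHEAD bytes so the region being
 * appended is still likely to be L1-resident.  Returns the offset the
 * scalar escaping loop (not shown) should continue from.
 */
static int
escape_json_fast_path(StringInfo buf, const char *str, int len)
{
	int			i = 0;
	int			copypos = 0;	/* start of the not-yet-flushed clean run */

	while (i + (int) sizeof(Vector8) <= len)
	{
		Vector8		chunk;

		vector8_load(&chunk, (const uint8 *) &str[i]);

		/* any quote, backslash or control character in this chunk? */
		if (vector8_has(chunk, '"') ||
			vector8_has(chunk, '\\') ||
			vector8_has_le(chunk, 0x1F))
			break;

		i += sizeof(Vector8);

		/* periodic flush, per the ESCAPE_JSON_MAX_LOOKHEAD experiment */
		if (i - copypos >= ESCAPE_JSON_MAX_LOOKHEAD)
		{
			appendBinaryStringInfo(buf, &str[copypos], i - copypos);
			copypos = i;
		}
	}

	/* flush whatever clean bytes remain before the scalar loop takes over */
	if (i > copypos)
		appendBinaryStringInfo(buf, &str[copypos], i - copypos);

	return i;
}
```

Getting "the best of both" would presumably mean tuning when, and how much, of the clean run is flushed without paying the per-chunk bookkeeping cost on short strings, which is where the temptation to open-code parts of appendBinaryStringInfo() comes from.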
bench.sql: select row_to_json(j1)::jsonb from j1;

## Test 1 (variable JSON strings up to 1KB)
create table j1 (very_long_column_name_to_test_json_escape text);
insert into j1 select repeat('x', x) from generate_series(0,1024)x;
vacuum freeze j1;

master @ 17a5871d:
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps; done
tps = 364.410386 (without initial connection time)
tps = 367.914165 (without initial connection time)
tps = 365.794513 (without initial connection time)

master + v5-0001
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps; done
tps = 683.570613 (without initial connection time)
tps = 685.206578 (without initial connection time)
tps = 679.014056 (without initial connection time)

## Test 2 (10KB JSON strings)
create table j1 (very_long_column_name_to_test_json_escape text);
insert into j1 select repeat('x', 1024*10) from generate_series(0,1024)x;
vacuum freeze j1;

master @ 17a5871d:
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps; done
tps = 23.872630 (without initial connection time)
tps = 26.232014 (without initial connection time)
tps = 26.495739 (without initial connection time)

master + v5-0001
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps; done
tps = 96.813515 (without initial connection time)
tps = 96.023632 (without initial connection time)
tps = 99.630428 (without initial connection time)

master + v5-0001 ESCAPE_JSON_MAX_LOOKHEAD 512
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps; done
tps = 83.597442 (without initial connection time)
tps = 85.045554 (without initial connection time)
tps = 82.105907 (without initial connection time)

## Test 3 (1MB JSON strings)
create table j1 (very_long_column_name_to_test_json_escape text);
insert into j1 select repeat('x', 1024*1024) from generate_series(0,10)x;
vacuum freeze j1;

master @ 17a5871d:
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps; done
tps = 18.885922 (without initial connection time)
tps = 18.829701 (without initial connection time)
tps = 18.889369 (without initial connection time)

master + v5-0001
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps; done
tps = 37.464967 (without initial connection time)
tps = 37.536676 (without initial connection time)
tps = 37.561387 (without initial connection time)

master + v5-0001 ESCAPE_JSON_MAX_LOOKHEAD 512
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps; done
tps = 48.296320 (without initial connection time)
tps = 48.118151 (without initial connection time)
tps = 48.507530 (without initial connection time)

David