Re: WIP Incremental JSON Parser - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: WIP Incremental JSON Parser
Date
Msg-id 1a4d8a4c-ca76-4d58-8d8e-608ad8c6e263@dunslane.net
Whole thread Raw
In response to Re: WIP Incremental JSON Parser  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: WIP Incremental JSON Parser
List pgsql-hackers
On 2024-04-09 Tu 07:54, Andrew Dunstan wrote:
>
>
> On 2024-04-09 Tu 01:23, Michael Paquier wrote:
>> On Tue, Apr 09, 2024 at 09:48:18AM +0900, Michael Paquier wrote:
>>> There is no direct check on test_json_parser_perf.c, either, only a
>>> custom rule in the Makefile without specifying something for meson.
>>> So it looks like you could do short execution check in a TAP test, at
>>> least.
>
>
> Not adding a test for that was deliberate - any sane test takes a 
> while, and I didn't want to spend that much time on it every time 
> someone runs "make check-world" or equivalent. However, adding a test 
> to run it with a trivial number of iterations seems reasonable, so 
> I'll add that. I'll also add a meson target for the binary.
>
>
>> While reading the code, I've noticed a few things, as well:
>>
>> +   /* max delicious line length is less than this */
>> +   char        buff[6001];
>>
>> Delicious applies to the contents, nothing to do with this code even
>> if I'd like to think that these lines of code are edible and good.
>> Using a hardcoded limit for some JSON input sounds like a bad idea to
>> me even if this is a test module.  Comment that applies to both the
>> perf and the incremental tools.  You could use a #define'd buffer size
>> for readability rather than assuming this value in many places.
>
>
> The comment is a remnant of when I hadn't yet added support for 
> incomplete tokens, and so I had to parse the input line by line. I 
> agree it can go, and we can use a manifest constant for the buffer size.
>
>
>> +++ b/src/test/modules/test_json_parser/test_json_parser_incremental.c
>> + * This progam tests incremental parsing of json. The input is fed into
>> + * full range of incement handling, especially in the lexer, is exercised.
>> +++ b/src/test/modules/test_json_parser/test_json_parser_perf.c
>> + *    Performancet est program for both flavors of the JSON parser
>> + * This progam tests either the standard (recursive descent) JSON parser
>> +++ b/src/test/modules/test_json_parser/README
>> +  reads in a file and pases it in very small chunks (60 bytes at a time) to
>>
>> Collection of typos across various files.
>
>
> Will fix. (The older I get the more typos I seem to make and the 
> harder it is to notice them. It's very annoying.)
>
>
>> +       appendStringInfoString(&json, "1+23 trailing junk");
>> What's the purpose here?  Perhaps the intention should be documented
>> in a comment?
>
>
> The purpose is to ensure that if there is not a trailing '\0' on the 
> json chunk the parser will still do the right thing. I'll add a 
> comment to that effect.
>
>
>> At the end, having a way to generate JSON blobs randomly to test this
>> stuff would be more appealing than what you have currently, with
>> perhaps a few factors like:
>> - Array and simple object density.
>> - Max Level of nesting.
>> - Limitation to ASCII characters that can be escaped.
>> - Perhaps more things I cannot think about?
>
>
> No, I disagree. Maybe we need those things as well, but we do need a 
> static test where we can test the output against known results. I have 
> no objection to changing the input and output files.
>
> It's worth noting that t/002_inline.pl does generate some input and 
> test e.g., the maximum nesting levels among other errors. Perhaps you 
> missed that. If you think we need more tests there adding them would 
> be extremely simple.
>
>
>> So the current state of things is kind of disappointing, and the size
>> of the data set added to the tree is not that portable either if you
>> want to test various scenarios depending on the data set.  It seems to
>> me that this has been committed too hastily and that this is not ready
>> for integration, even if that's just a test module.  Tom also has
>> shared some concerns upthread, as far as I can see.
>
>
> I have posted a patch already that addresses the issue Tom raised.
>
>


Here's a consolidated set of cleanup patches, including the memory leak 
patch and Jacob's shrink-tiny patch.

I think the biggest open issue is whether or not we remove the 
performance test program.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: simplehash.h: "SH_SCOPE static" causes warnings
Next
From: Tom Lane
Date:
Subject: Re: macOS Ventura won't generate core dumps