Re: making the backend's json parser work in frontend code - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: making the backend's json parser work in frontend code
Date:
Msg-id: CA+TgmoauCHfN_7+mT9rn9VSSMU55PpXEbLwMZU5cZW=5Je5J5w@mail.gmail.com
In response to: Re: making the backend's json parser work in frontend code (David Steele <david@pgmasters.net>)
Responses: Re: making the backend's json parser work in frontend code (David Steele <david@pgmasters.net>)
List: pgsql-hackers
On Fri, Jan 17, 2020 at 12:36 PM David Steele <david@pgmasters.net> wrote:
> It seems to work just fine.  I didn't stress it too hard but I did put
> in one escape and a multi-byte character and check the various data types.

Cool.

> I used the callbacks because that's the first method I found but it
> seems like json_lex() might be easier to use in practice.

Ugh, really? That doesn't seem like it would be nice at all.
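
To make the comparison concrete, here is a rough sketch of the two styles,
written against the jsonapi interface (makeJsonLexContextCstringLen(),
pg_parse_json() driving a JsonSemAction, and json_lex() pulling tokens).
The include paths and exact signatures below are my assumptions, not quotes
from the patches:

#include "postgres_fe.h"
#include "common/jsonapi.h"
#include "mb/pg_wchar.h"

/* Callback style: pg_parse_json() walks the document and fires callbacks. */
static void
manifest_field_start(void *state, char *fname, bool isnull)
{
	/* stash fname somewhere so the scalar callback knows what it belongs to */
}

static void
parse_with_callbacks(char *buf, int len)
{
	JsonLexContext *lex = makeJsonLexContextCstringLen(buf, len, PG_UTF8, true);
	JsonSemAction sem = {0};

	sem.semstate = NULL;		/* would point at the caller's parse state */
	sem.object_field_start = manifest_field_start;
	(void) pg_parse_json(lex, &sem);	/* error handling omitted here */
}

/* Lexer style: pull one token at a time with json_lex(). */
static void
parse_with_lexer(char *buf, int len)
{
	JsonLexContext *lex = makeJsonLexContextCstringLen(buf, len, PG_UTF8, true);

	while (json_lex(lex) == JSON_SUCCESS && lex->token_type != JSON_TOKEN_END)
	{
		/* dispatch on lex->token_type; string values are in lex->strval */
	}
}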

> I think it's an issue that the entire string must be passed to the lexer
> at once.  That will not be great for large manifests.  However, I don't
> think it will be all that hard to implement an optional "want more"
> callback in the lexer so JSON data can be fed in from the file in chunks.

I thought so initially, but now I'm not so sure. The thing is, you
actually need all the manifest data in memory at once anyway, or so I
think. You're essentially doing a "full join" between the contents of
the manifest and the contents of the file system, so you've got to
scan one (probably the filesystem) and then mark entries in the other
(probably the manifest) as used as you go.

But this might need more thought. The details probably depend on
exactly how you design it all.
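
Just to spell out the "full join" idea, here is a self-contained
illustration; the manifest_file type, the helper names, and the messages
are invented for the example, not taken from any patch:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Invented, minimal in-memory manifest entry for the example. */
typedef struct manifest_file
{
	const char *pathname;
	bool		matched;		/* have we seen this file on disk yet? */
} manifest_file;

/* Called once per file found while walking the backup directory. */
static void
note_file_on_disk(manifest_file *files, int nfiles, const char *path)
{
	for (int i = 0; i < nfiles; i++)
	{
		if (strcmp(files[i].pathname, path) == 0)
		{
			files[i].matched = true;
			return;
		}
	}
	printf("\"%s\" is present on disk but not in the manifest\n", path);
}

/* After the directory walk, anything still unmatched is missing from disk. */
static void
report_missing_files(manifest_file *files, int nfiles)
{
	for (int i = 0; i < nfiles; i++)
	{
		if (!files[i].matched)
			printf("\"%s\" is in the manifest but not on disk\n",
				   files[i].pathname);
	}
}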

> So, that just leaves ereport() as the largest remaining issue?  I'll
> look at that today and Tuesday and see what I can work up.

PFA my work on that topic. As compared with my previous patch series,
the previous 0001 is dropped and what are now 0001 and 0002 are the
same as patches from the previous series. 0003 and 0004 are aiming
toward getting rid of ereport() and, I believe, show a plausible
strategy for doing so. There are, possibly, things not to like here,
and it's certainly incomplete, but I think I kinda like this
direction. Comments appreciated.

0003 nukes lex_accept(), inlining the logic into callers. I found that
the refactoring I wanted to do in 0004 was pretty hard without this,
and it turns out to save code, so I think this is a good idea
independently of anything else.
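
To illustrate what the inlining looks like at a call site, here is a
guessed-at before/after fragment. The lex_accept() shown is a simplified
stand-in for the old static helper (the real one could also hand back the
token's text), so none of this is quoted from 0003:

#include "postgres_fe.h"
#include "common/jsonapi.h"

/* Roughly what the old helper did: on a token match, consume it. */
static bool
lex_accept(JsonLexContext *lex, JsonTokenType token)
{
	if (lex->token_type == token)
	{
		(void) json_lex(lex);
		return true;
	}
	return false;
}

/* Before: call sites go through the helper. */
static void
parse_array_tail_old(JsonLexContext *lex)
{
	while (lex_accept(lex, JSON_TOKEN_COMMA))
	{
		/* parse the next array element */
	}
}

/* After: the same test and advance, written out at the call site. */
static void
parse_array_tail_new(JsonLexContext *lex)
{
	while (lex->token_type == JSON_TOKEN_COMMA)
	{
		(void) json_lex(lex);	/* consume the comma */
		/* parse the next array element */
	}
}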

0004 adjusts many functions in jsonapi.c to return a new enumerated
type, JsonParseErrorType, instead of directly doing ereport(). It adds
a new function that takes this value and a lexing context and throws
an error. The JSON_ESCAPING_INVALID case is wrong and involves a gross
hack, but that's fixable with another field in the lexing context.
More work is needed to really bring this up to scratch, but the idea
is to make this code have a soft dependency on ereport() rather than a
hard one.
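
In other words, the goal is for callers to end up with something along
these lines; json_errdetail() and the other spellings here are my
assumptions about the interface, not quotes from 0004:

#include "postgres_fe.h"
#include "common/jsonapi.h"
#include "mb/pg_wchar.h"

#include <stdio.h>

/* Parse a manifest held in memory; report errors without ereport(). */
static bool
parse_manifest_buffer(char *buf, int len)
{
	JsonLexContext *lex = makeJsonLexContextCstringLen(buf, len, PG_UTF8, true);
	JsonSemAction sem = {0};	/* callbacks omitted for brevity */
	JsonParseErrorType res = pg_parse_json(lex, &sem);

	if (res != JSON_SUCCESS)
	{
		/*
		 * Frontend code can just print the message and bail out; backend
		 * callers can instead wrap the same information in ereport().
		 * That is the "soft dependency" on ereport().
		 */
		fprintf(stderr, "could not parse backup manifest: %s\n",
				json_errdetail(res, lex));
		return false;
	}
	return true;
}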

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment
