Re: making the backend's json parser work in frontend code - Mailing list pgsql-hackers

From Andres Freund
Subject Re: making the backend's json parser work in frontend code
Date
Msg-id 20200115234046.lb4oeyvv7qf7h6vg@alap3.anarazel.de
Whole thread Raw
In response to making the backend's json parser work in frontend code  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: making the backend's json parser work in frontend code  (Robert Haas <robertmhaas@gmail.com>)
Re: making the backend's json parser work in frontend code  (David Steele <david@pgmasters.net>)
List pgsql-hackers
Hi,

On 2020-01-15 16:02:49 -0500, Robert Haas wrote:
> The discussion on the backup manifest thread has gotten bogged down on
> the issue of the format that should be used to store the backup
> manifest file. I want something simple and ad-hoc; David Steele and
> Stephen Frost prefer JSON. That is problematic because our JSON parser
> does not work in frontend code, and I want to be able to validate a
> backup against its manifest, which involves being able to parse the
> manifest from frontend code. The latest development over there is that
> David Steele has posted the JSON parser that he wrote for pgbackrest
> with an offer to try to adapt it for use in front-end PostgreSQL code,
> an offer which I genuinely appreciate. I'll write more about that over
> on that thread.

I'm not sure where I come down between using json and a simple ad-hoc
format, when the dependency for the former is making the existing json
parser work in the frontend. But if the alternative is to add a second
json parser, it very clearly shifts towards using an ad-hoc
format. Having to maintain a simple ad-hoc parser is a lot less
technical debt than having a second full blown json parser. Imo even
when an external project or three also has to have that simple parser.

If the alternative were to use that newly proposed json parser to
*replace* the backend one too, the story would again be different.


> 0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
> missing something, this seems like an overdue cleanup. It's long been
> the case that wchar.c is actually compiled and linked into both
> frontend and backend code. Commit
> 60f11b87a2349985230c08616fa8a34ffde934c8 added code into src/common
> that depends on wchar.c being available, but didn't actually make
> wchar.c part of src/common, which seems like an odd decision: the
> functions in the library are dependent on code that is not part of any
> library but whose source files get copied around where needed. Eh?

Cool.


> 0002 does some basic header cleanup to make it possible to include the
> existing header file jsonapi.h in frontend code. The state of the JSON
> headers today looks generally poor. There seems not to have been much
> attempt to get the prototypes for a given source file, say foo.c, into
> a header file with the same name, say foo.h. Also, dependencies
> between various header files seem to be have added somewhat freely.
> This patch does not come close to fixing all that, but I consider it a
> modest down payment on a cleanup that probably ought to be taken
> further.

Yea, this seems like a necessary cleanup (or well, maybe the start of
it).


> 0003 splits json.c into two files, json.c and jsonapi.c. All the
> lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into
> jsonapi.c, while the stuff that pertains to the 'json' data type
> remains in json.c. This also seems like a good cleanup, because to me,
> at least, it's not a great idea to mix together code that is used by
> both the json and jsonb data types as well as other things in the
> system that want to generate or parse json together with things that
> are specific to the 'json' data type.

+1


> On the other hand, 0004, 0005, and 0006 are charitably described as
> experimental or WIP.  0004 and 0005 hack up jsonapi.c so that it can
> still be compiled even if #include "postgres.h" is changed to #include
> "postgres-fe.h" and 0006 moves it into src/common. Note that I say
> that they make it compile, not work. It's not just untested; it's
> definitely broken. But it gives a feeling for what the remaining
> obstacles to making this code available in a frontend environment are.
> Since I wrote my very first email complaining about the difficulty of
> making the backend's JSON parser work in a frontend environment, one
> obstacle has been knocked down: StringInfo is now available in
> front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The
> remaining problems (that I know about) have to do with error reporting
> and multibyte character support; a read of the patches is suggested
> for those wanting further details.

> From d05e1fc82a51cb583a0367e72b1afc0de561dd00 Mon Sep 17 00:00:00 2001
> From: Robert Haas <rhaas@postgresql.org>
> Date: Wed, 15 Jan 2020 10:36:52 -0500
> Subject: [PATCH 4/6] Introduce json_error() macro.
> 
> ---
>  src/backend/utils/adt/jsonapi.c | 221 +++++++++++++-------------------
>  1 file changed, 90 insertions(+), 131 deletions(-)
> 
> diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
> index fc8af9f861..20f7f0f7ac 100644
> --- a/src/backend/utils/adt/jsonapi.c
> +++ b/src/backend/utils/adt/jsonapi.c
> @@ -17,6 +17,9 @@
>  #include "miscadmin.h"
>  #include "utils/jsonapi.h"
>  
> +#define json_error(rest) \
> +    ereport(ERROR, (rest, report_json_context(lex)))
> +

It's not obvious why the better approach here wouldn't be to just have a
very simple ereport replacement, that needs to be explicitly included
from frontend code. It'd not be meaningfully harder, imo, and it'd
require fewer adaptions, and it'd look more familiar.



>  /* the null action object used for pure validation */
> @@ -701,7 +735,11 @@ json_lex_string(JsonLexContext *lex)
>                          ch = (ch * 16) + (*s - 'A') + 10;
>                      else
>                      {
> +#ifdef FRONTEND
> +                        lex->token_terminator = s + PQmblen(s, PG_UTF8);
> +#else
>                          lex->token_terminator = s + pg_mblen(s);
> +#endif

If we were to go this way, it seems like the ifdef should rather be in a
helper function, rather than all over. It seems like it should be
unproblematic to have a common interface for both frontend/backend?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Enabling B-Tree deduplication by default
Next
From: Jesse Zhang
Date:
Subject: Re: Use compiler intrinsics for bit ops in hash