Thread: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Robert Haas <rhaas@postgresql.org> writes:
> Mark JSON error detail messages for translation.
> Per gripe from Tom Lane.

OK, that was one generically bad thing about json.c's error messages.
The other thing that was bothering me was the style of the errdetail
strings, eg

	ereport(ERROR,
			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
			 errmsg("invalid input syntax for type json"),
			 errdetail("line %d: Invalid escape \"\\%s\".",
					   lex->line_number, extract_mb_char(s))));

I'm not thrilled with the "line %d:" prefixes.  That seems to me to
violate the letter and spirit of the guideline about making errdetail
messages be complete sentences.  Furthermore, the line number is not
the most important part of the detail message, if indeed it has any
usefulness at all, which is debatable.  (It's certainly not going to
be tremendously useful when the error report doesn't show any of the
specific input string being complained of.)

Also, a minority of the errors report the whole input string:

	ereport(ERROR,
			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
			 errmsg("invalid input syntax for type json: \"%s\"",
					lex->input),
			 errdetail("The input string ended unexpectedly.")));

And then there are some that provide the whole input string AND a line
number, just in case you thought there was any intentional design
decision there.

A minimum change IMO would be to recast the detail messages as complete
sentences, say "The escape sequence \"\\%s\" in line %d is invalid."
But I'd like a better-thought-out solution to identifying the location
of the error.

I notice that there's an unfinished attempt to maintain a line_start
pointer; if that were carried through, we could imagine printing the
current line up to the point of an error, which might provide a
reasonable balance between verbosity and insufficient context.
As an example, instead of

regression=# select '{"x":1,
z "y":"txt1"}'::json;
ERROR:  invalid input syntax for type json
LINE 1: select '{"x":1,
               ^
DETAIL:  line 2: Token "z" is invalid.

maybe we could print

regression=# select '{"x":1,
z "y":"txt1"}'::json;
ERROR:  invalid input syntax for type json
LINE 1: select '{"x":1,
               ^
DETAIL:  Token "z" is invalid in line 2: "z".

or perhaps better let it run to the end of the line:

regression=# select '{"x":1,
z "y":"txt1"}'::json;
ERROR:  invalid input syntax for type json
LINE 1: select '{"x":1,
               ^
DETAIL:  Token "z" is invalid in line 2: "z "y":"txt1"}".

Thoughts?

			regards, tom lane
On Tue, Jun 12, 2012 at 2:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <rhaas@postgresql.org> writes:
>> Mark JSON error detail messages for translation.
>> Per gripe from Tom Lane.
>
> OK, that was one generically bad thing about json.c's error messages.
> The other thing that was bothering me was the style of the errdetail
> strings, eg
>
> ereport(ERROR,
>         (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
>          errmsg("invalid input syntax for type json"),
>          errdetail("line %d: Invalid escape \"\\%s\".",
>                    lex->line_number, extract_mb_char(s))));
>
> I'm not thrilled with the "line %d:" prefixes.  That seems to me to
> violate the letter and spirit of the guideline about making errdetail
> messages be complete sentences.  Furthermore, the line number is not
> the most important part of the detail message, if indeed it has any
> usefulness at all which is debatable.  (It's certainly not going to
> be tremendously useful when the error report doesn't show any of the
> specific input string being complained of.)

I am amenable to rephrasing the errdetail to put the line number
elsewhere in the string, but I am strongly opposed to getting rid of
it.  JSON blobs can easily run to dozens or hundreds of lines, and you
will want to know the line number.  Knowing the contents of the line
will in many cases be LESS useful, because the line might contain,
say, a single closing bracket.  That's not gonna help much if it
happens many times in the input value, which is not unlikely.

> Also, a minority of the errors report the whole input string:
>
> ereport(ERROR,
>         (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
>          errmsg("invalid input syntax for type json: \"%s\"",
>                 lex->input),
>          errdetail("The input string ended unexpectedly.")));
>
> And then there are some that provide the whole input string AND a line
> number, just in case you thought there was any intentional design
> decision there.

I did try.

> A minimum change IMO would be to recast the detail messages as complete
> sentences, say "The escape sequence \"\\%s\" in line %d is invalid."
> But I'd like a better-thought-out solution to identifying the location
> of the error.
>
> I notice that there's an unfinished attempt to maintain a line_start
> pointer; if that were carried through, we could imagine printing the
> current line up to the point of an error, which might provide a
> reasonable balance between verbosity and insufficient context.
> As an example, instead of
>
> regression=# select '{"x":1,
> z "y":"txt1"}'::json;
> ERROR:  invalid input syntax for type json
> LINE 1: select '{"x":1,
>                ^
> DETAIL:  line 2: Token "z" is invalid.
>
> maybe we could print
>
> regression=# select '{"x":1,
> z "y":"txt1"}'::json;
> ERROR:  invalid input syntax for type json
> LINE 1: select '{"x":1,
>                ^
> DETAIL:  Token "z" is invalid in line 2: "z".
>
> or perhaps better let it run to the end of the line:
>
> regression=# select '{"x":1,
> z "y":"txt1"}'::json;
> ERROR:  invalid input syntax for type json
> LINE 1: select '{"x":1,
>                ^
> DETAIL:  Token "z" is invalid in line 2: "z "y":"txt1"}".
>
> Thoughts?

I'm not sure I find that an improvement, but I'm open to what other
people think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 12, 2012 at 2:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not thrilled with the "line %d:" prefixes.  That seems to me to
>> violate the letter and spirit of the guideline about making errdetail
>> messages be complete sentences.  Furthermore, the line number is not
>> the most important part of the detail message, if indeed it has any
>> usefulness at all which is debatable.  (It's certainly not going to
>> be tremendously useful when the error report doesn't show any of the
>> specific input string being complained of.)

> I am amenable to rephrasing the errdetail to put the line number
> elsewhere in the string, but I am strongly opposed to getting rid of
> it.  JSON blobs can easily run to dozens or hundreds of lines, and you
> will want to know the line number.

Agreed, the input strings could be quite large, but in that case we
definitely don't want to be including the whole input in the primary
error message (which is supposed to be short).  I doubt it'd even be a
good idea to try to put it into errdetail; thus I'm thinking about
providing one line only.

> Knowing the contents of the line
> will in many cases be LESS useful, because the line might contain,
> say, a single closing bracket.

True, but I don't think the line number alone is adequate.  There are
too many situations where it's not entirely clear what value is being
complained of.  (Considering only syntax errors thrown from literal
constants in SQL commands is quite misleading about the requirements
here.)  So I think we need to include at least some of the input in
the error.

>> I notice that there's an unfinished attempt to maintain a line_start
>> pointer; if that were carried through, we could imagine printing the
>> current line up to the point of an error, which might provide a
>> reasonable balance between verbosity and insufficient context.
>> ...
>> or perhaps better let it run to the end of the line:

> I'm not sure I find that an improvement, but I'm open to what other
> people think.

Anybody here besides the crickets?

			regards, tom lane
Excerpts from Tom Lane's message of mar jun 12 16:52:20 -0400 2012:
> Robert Haas <robertmhaas@gmail.com> writes:
> >> I notice that there's an unfinished attempt to maintain a line_start
> >> pointer; if that were carried through, we could imagine printing the
> >> current line up to the point of an error, which might provide a
> >> reasonable balance between verbosity and insufficient context.
> >> ...
> >> or perhaps better let it run to the end of the line:
>
> > I'm not sure I find that an improvement, but I'm open to what other
> > people think.
>
> Anybody here besides the crickets?

I think providing both partial line contents (so +1 for maintaining
line_start carefully as required) and line number would be useful
enough to track down problems.

I am not sure about the idea of letting the detail run to the end of
the line; that would be problematic should the line be long (there
might not be newlines in the literal at all, which is not that
unusual).  I think it should be truncated at, say, 76 chars or so.

For the case where you have a single } in a line, this isn't all that
helpful; we could consider printing the previous line as well.  But if
you end up with
}
}
then it's not that helpful either.  I am not sure.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I am not sure about the idea of letting the detail run to the end of the
> line; that would be problematic should the line be long (there might not
> be newlines in the literal at all, which is not that unusual).  I think
> it should be truncated at, say, 76 chars or so.

Yeah, I was wondering about trying to provide a given amount of context
instead of fixing it to "one line".  We could do something like

(1) back up N characters;
(2) find the next newline, if there is one at least M characters before
    the error point;
(3) print from there to the error point.

This would give between M and N characters of context, except when the
error point is less than M characters from the start of the input.

Not sure how to display such text together with a line number though;
with a multi-line fragment it would not be obvious which part the line
number refers to.

(Backing up over multibyte data might be a bit complicated too, but
I assume we can think of something workable for that.)

			regards, tom lane
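[A rough standalone sketch of the three-step windowing idea above, for concreteness.  This is not PostgreSQL code; the function name and the single-byte-character assumption are invented here, and a real version would have to advance with pg_mblen() to stay on multibyte character boundaries.]

```c
#include <assert.h>

/*
 * Toy illustration: choose where the displayed context should start.
 * Back up at most max_chars before the error point, then prefer to
 * start just after a newline if one lies at least min_chars before
 * the error point.  Yields between min_chars and max_chars of context,
 * except when the error point is near the start of the input.
 */
static const char *
choose_context_start(const char *input, const char *errpos,
                     int min_chars, int max_chars)
{
    const char *start;

    /* (1) back up N characters, clamped to the start of the input */
    if (errpos - input > max_chars)
        start = errpos - max_chars;
    else
        start = input;

    /* (2) find the next newline, if it leaves at least M chars of context */
    for (const char *p = start; errpos - p > min_chars; p++)
    {
        if (*p == '\n')
            return p + 1;       /* break the context at the line boundary */
    }

    /* (3) no suitable newline: print from the fixed window start */
    return start;
}
```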
I wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> I am not sure about the idea of letting the detail run to the end of the
>> line; that would be problematic should the line be long (there might not
>> be newlines in the literal at all, which is not that unusual).  I think
>> it should be truncated at, say, 76 chars or so.

> Yeah, I was wondering about trying to provide a given amount of context
> instead of fixing it to "one line".  We could do something like
> (1) back up N characters;
> (2) find the next newline, if there is one at least M characters before
> the error point;
> (3) print from there to the error point.

After experimenting with this for awhile I concluded that the above is
overcomplicated, and that we might as well just print up to N characters
of context; in most input, the line breaks are far enough apart that
preferentially breaking at them just leads to not having very much
context.

Also, it seems like it might be a good idea to present the input as a
CONTEXT line, because that provides more space; you can fit 50 or so
characters of data without overrunning standard display width.  This
gives me output like

regression=# select '{"unique1":8800,"unique2":0,"two":0,"four":0,"ten":0,"twenty":0,"hundred":0,"thousand":800, "twothous
and":800,"fivethous":3800,"tenthous":8800,"odd":0,"even":1, "stringu1":"MAAAAA","stringu2":"AAAAAA","string4":"AAAAxx"}'
::json;
ERROR:  invalid input syntax for type json
LINE 1: select '{"unique1":8800,"unique2":0,"two":0,"four":0,"ten":0...
               ^
DETAIL:  Character with value "0x0a" must be escaped.
CONTEXT:  JSON data, line 1: ..."twenty":0,"hundred":0,"thousand":800, "twothous
regression=#

I can't give too many examples because I've only bothered to
context-ify this single error case as yet ;-) ... but does this seem
like a sane way to go?

The code for this is as attached.
Note that I'd rip out the normal-path tracking of line boundaries; it
seems better to have a second scan of the data in the error case and
save the cycles in non-error cases.

...
	ereport(ERROR,
			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
			 errmsg("invalid input syntax for type json"),
			 errdetail("Character with value \"0x%02x\" must be escaped.",
					   (unsigned char) *s),
			 report_json_context(lex)));
...

/*
 * Report a CONTEXT line for bogus JSON input.
 *
 * The return value isn't meaningful, but we make it non-void so that this
 * can be invoked inside ereport().
 */
static int
report_json_context(JsonLexContext *lex)
{
	char	   *context_start;
	char	   *context_end;
	char	   *line_start;
	int			line_number;
	char	   *ctxt;
	int			ctxtlen;

	/* Choose boundaries for the part of the input we will display */
	context_start = lex->input;
	context_end = lex->token_terminator;
	line_start = context_start;
	line_number = 1;
	for (;;)
	{
		/* Always advance over a newline, unless it's the current token */
		if (*context_start == '\n' && context_start < lex->token_start)
		{
			context_start++;
			line_start = context_start;
			line_number++;
			continue;
		}
		/* Otherwise, done as soon as we are close enough to context_end */
		if (context_end - context_start < 50)
			break;
		/* Advance to next multibyte character */
		if (IS_HIGHBIT_SET(*context_start))
			context_start += pg_mblen(context_start);
		else
			context_start++;
	}

	/* Get a null-terminated copy of the data to present */
	ctxtlen = context_end - context_start;
	ctxt = palloc(ctxtlen + 1);
	memcpy(ctxt, context_start, ctxtlen);
	ctxt[ctxtlen] = '\0';

	return errcontext("JSON data, line %d: %s%s",
					  line_number,
					  (context_start > line_start) ? "..." : "",
					  ctxt);
}

			regards, tom lane
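[To see concretely what that scan produces, here is a standalone, ASCII-only toy version of the same context-selection loop.  The name and the snprintf-based output are invented for illustration; the real code advances with pg_mblen() over multibyte characters and emits the result via errcontext().]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Standalone toy version of the context scan above.  Given the whole
 * input and the current token's boundaries, advance context_start past
 * newlines (counting lines) until we are within 50 bytes of the token
 * end, then format "line %d: %s%s" into buf, with "..." when the
 * excerpt does not begin at a line boundary.
 */
static void
toy_json_context(const char *input, const char *token_start,
                 const char *token_terminator, char *buf, size_t buflen)
{
    const char *context_start = input;
    const char *context_end = token_terminator;
    const char *line_start = input;
    int         line_number = 1;

    for (;;)
    {
        /* Always advance over a newline, unless it's the current token */
        if (*context_start == '\n' && context_start < token_start)
        {
            context_start++;
            line_start = context_start;
            line_number++;
            continue;
        }
        /* Otherwise, done as soon as we are close enough to context_end */
        if (context_end - context_start < 50)
            break;
        context_start++;
    }

    snprintf(buf, buflen, "line %d: %s%.*s",
             line_number,
             (context_start > line_start) ? "..." : "",
             (int) (context_end - context_start), context_start);
}
```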
On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The code for this is as attached.  Note that I'd rip out the normal-path
> tracking of line boundaries; it seems better to have a second scan of
> the data in the error case and save the cycles in non-error cases.

Really?!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The code for this is as attached.  Note that I'd rip out the normal-path
>> tracking of line boundaries; it seems better to have a second scan of
>> the data in the error case and save the cycles in non-error cases.

> Really?!

Um ... do you have a problem with that idea, and if so what?  It would
be considerably more complicated to do it without a second pass.

			regards, tom lane
On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The code for this is as attached.  Note that I'd rip out the normal-path
>>> tracking of line boundaries; it seems better to have a second scan of
>>> the data in the error case and save the cycles in non-error cases.
>
>> Really?!
>
> Um ... do you have a problem with that idea, and if so what?  It would
> be considerably more complicated to do it without a second pass.

Could you explain how it's broken now, and why it will be hard to fix?
People may well want to use a cast to JSON within an exception block
as a way of testing whether strings are valid JSON.  We should not
assume that the cost of an exception is totally irrelevant, because
this might be iterated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
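[The validity-testing pattern Robert describes might look like this in PL/pgSQL.  This is a hypothetical sketch: the function name is invented, but invalid_text_representation is the condition name matching the ERRCODE_INVALID_TEXT_REPRESENTATION errcode the JSON parser raises.]

```sql
-- Hypothetical helper: test JSON validity by catching the cast error.
CREATE OR REPLACE FUNCTION is_valid_json(txt text) RETURNS boolean AS $$
BEGIN
    PERFORM txt::json;
    RETURN true;
EXCEPTION WHEN invalid_text_representation THEN
    RETURN false;
END;
$$ LANGUAGE plpgsql;

-- Every invalid input pays for the subtransaction plus whatever work
-- the error path does; that cost matters when the test is iterated.
SELECT is_valid_json('{"x":1}');    -- true
SELECT is_valid_json('{"x":1,');    -- false
```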
On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote:
> On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> The code for this is as attached.  Note that I'd rip out the
> >>> normal-path tracking of line boundaries; it seems better to have a
> >>> second scan of the data in the error case and save the cycles in
> >>> non-error cases.
> >>
> >> Really?!
> >
> > Um ... do you have a problem with that idea, and if so what?  It would
> > be considerably more complicated to do it without a second pass.
>
> Could you explain how it's broken now, and why it will be hard to fix?
> People may well want to use a cast to JSON within an exception block
> as a way of testing whether strings are valid JSON.  We should not
> assume that the cost of an exception is totally irrelevant, because
> this might be iterated.

Exception blocks/subtransactions already are considerably expensive.
I have a hard time believing this additional cost would be measurable.

Andres
-- 
Andres Freund		http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 13, 2012 at 11:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote:
>> On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Robert Haas <robertmhaas@gmail.com> writes:
>> >> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >>> The code for this is as attached.  Note that I'd rip out the
>> >>> normal-path tracking of line boundaries; it seems better to have a
>> >>> second scan of the data in the error case and save the cycles in
>> >>> non-error cases.
>> >>
>> >> Really?!
>> >
>> > Um ... do you have a problem with that idea, and if so what?  It would
>> > be considerably more complicated to do it without a second pass.
>>
>> Could you explain how it's broken now, and why it will be hard to fix?
>> People may well want to use a cast to JSON within an exception block
>> as a way of testing whether strings are valid JSON.  We should not
>> assume that the cost of an exception is totally irrelevant, because
>> this might be iterated.

> Exception blocks/subtransactions already are considerably expensive.  I
> have a hard time believing this additional cost would be measurable.

According to my testing, the main cost of an exception block catching
a division-by-zero error is that of generating the error message,
primarily sprintf()-type stuff.  The cost of scanning a multi-megabyte
string seems likely to be much larger.

Mind you, I'm not going to spend a lot of time crying into my beer if
it turns out that there's no other reasonable way to implement this,
but I do think that it's entirely appropriate to ask why it's not
possible to do better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote:
> On Wed, Jun 13, 2012 at 11:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote:
> >> On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> > Robert Haas <robertmhaas@gmail.com> writes:
> >> >> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> >>> The code for this is as attached.  Note that I'd rip out the
> >> >>> normal-path tracking of line boundaries; it seems better to have a
> >> >>> second scan of the data in the error case and save the cycles in
> >> >>> non-error cases.
> >> >>
> >> >> Really?!
> >> >
> >> > Um ... do you have a problem with that idea, and if so what?  It would
> >> > be considerably more complicated to do it without a second pass.
> >>
> >> Could you explain how it's broken now, and why it will be hard to fix?
> >> People may well want to use a cast to JSON within an exception block
> >> as a way of testing whether strings are valid JSON.  We should not
> >> assume that the cost of an exception is totally irrelevant, because
> >> this might be iterated.
> >
> > Exception blocks/subtransactions already are considerably expensive.  I
> > have a hard time believing this additional cost would be measurable.
>
> According to my testing, the main cost of an exception block catching
> a division-by-zero error is that of generating the error message,
> primarily sprintf()-type stuff.  The cost of scanning a multi-megabyte
> string seems likely to be much larger.

True.  I ignored that there doesn't have to be an xid assigned yet...
I still think it's not very relevant though.

Andres
-- 
Andres Freund		http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote:
>> According to my testing, the main cost of an exception block catching
>> a division-by-zero error is that of generating the error message,
>> primarily sprintf()-type stuff.  The cost of scanning a multi-megabyte
>> string seems likely to be much larger.

> True.  I ignored that there doesn't have to be an xid assigned yet...
> I still think it's not very relevant though.

I don't think it's relevant either.  In the first place, I don't think
that errors occurring "multi megabytes" into a JSON blob are going to
occur often enough to be performance critical.  In the second place,
removing cycles from the non-error case is worth something too, probably
more than making the error case faster is worth.

In any case, the proposed scheme for providing context requires that
you know where the error is before you can identify the context.  I
considered schemes that would keep track of the last N characters or
line breaks in case one of them proved to be the one we need.  That
would add enough cycles to non-error cases to almost certainly not be
desirable.  I also considered trying to back up, but that doesn't look
promising either for arbitrary multibyte encodings.

			regards, tom lane
On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote:
>>> According to my testing, the main cost of an exception block catching
>>> a division-by-zero error is that of generating the error message,
>>> primarily sprintf()-type stuff.  The cost of scanning a multi-megabyte
>>> string seems likely to be much larger.
>
>> True.  I ignored that there doesn't have to be an xid assigned yet...
>> I still think it's not very relevant though.
>
> I don't think it's relevant either.  In the first place, I don't think
> that errors occurring "multi megabytes" into a JSON blob are going to
> occur often enough to be performance critical.  In the second place,
> removing cycles from the non-error case is worth something too, probably
> more than making the error case faster is worth.
>
> In any case, the proposed scheme for providing context requires that
> you know where the error is before you can identify the context.  I
> considered schemes that would keep track of the last N characters or
> line breaks in case one of them proved to be the one we need.  That
> would add enough cycles to non-error cases to almost certainly not be
> desirable.  I also considered trying to back up, but that doesn't look
> promising either for arbitrary multibyte encodings.

Oh, I see.  :-(

Well, I guess I'll have to suck it up, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In any case, the proposed scheme for providing context requires that
>> you know where the error is before you can identify the context.  I
>> considered schemes that would keep track of the last N characters or
>> line breaks in case one of them proved to be the one we need.  That
>> would add enough cycles to non-error cases to almost certainly not be
>> desirable.  I also considered trying to back up, but that doesn't look
>> promising either for arbitrary multibyte encodings.

> Oh, I see.  :-(

Attached is a complete proposed patch for this, with some further
adjustments to make the output look a bit more like what we use for
SQL error context printouts (in particular, "..." at both ends of the
excerpt when appropriate).

One thing I did not touch, but am less than happy with, is the wording
of this detail message:

	errdetail("Expected string, number, object, array, true, false, or null, but found \"%s\".",
			  token),

This seems uselessly verbose to me.  It could be argued that
enumerating all the options is helpful to rank JSON novices ... but
unless you know exactly what an "object" is and why that's different
from the other options, it's not all that helpful.  I'm inclined to
think that something like this would be better:

	errdetail("Expected JSON value, but found \"%s\".",
			  token),

Thoughts?
regards, tom lane diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c index e79c294..59335db 100644 *** a/src/backend/utils/adt/json.c --- b/src/backend/utils/adt/json.c *************** typedef struct /* state of JSON lexe *** 43,50 **** char *token_start; /* start of current token within input */ char *token_terminator; /* end of previous or current token */ JsonValueType token_type; /* type of current token, once it's known */ - int line_number; /* current line number (counting from 1) */ - char *line_start; /* start of current line within input (BROKEN!!) */ } JsonLexContext; typedef enum /* states of JSON parser */ --- 43,48 ---- *************** static void json_lex_string(JsonLexConte *** 78,83 **** --- 76,82 ---- static void json_lex_number(JsonLexContext *lex, char *s); static void report_parse_error(JsonParseStack *stack, JsonLexContext *lex); static void report_invalid_token(JsonLexContext *lex); + static int report_json_context(JsonLexContext *lex); static char *extract_mb_char(char *s); static void composite_to_json(Datum composite, StringInfo result, bool use_line_feeds); *************** json_validate_cstring(char *input) *** 185,192 **** /* Set up lexing context. */ lex.input = input; lex.token_terminator = lex.input; - lex.line_number = 1; - lex.line_start = input; /* Set up parse stack. */ stacksize = 32; --- 184,189 ---- *************** json_lex(JsonLexContext *lex) *** 335,345 **** /* Skip leading whitespace. */ s = lex->token_terminator; while (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r') - { - if (*s == '\n') - lex->line_number++; s++; - } lex->token_start = s; /* Determine token type. */ --- 332,338 ---- *************** json_lex(JsonLexContext *lex) *** 350,356 **** { /* End of string. */ lex->token_start = NULL; ! lex->token_terminator = NULL; } else { --- 343,349 ---- { /* End of string. */ lex->token_start = NULL; ! 
lex->token_terminator = s; } else { *************** json_lex(JsonLexContext *lex) *** 397,403 **** /* * We got some sort of unexpected punctuation or an otherwise * unexpected character, so just complain about that one ! * character. */ lex->token_terminator = s + 1; report_invalid_token(lex); --- 390,397 ---- /* * We got some sort of unexpected punctuation or an otherwise * unexpected character, so just complain about that one ! * character. (It can't be multibyte because the above loop ! * will advance over any multibyte characters.) */ lex->token_terminator = s + 1; report_invalid_token(lex); *************** json_lex_string(JsonLexContext *lex) *** 443,453 **** lex->token_terminator = s; report_invalid_token(lex); } ereport(ERROR, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input syntax for type json"), ! errdetail("line %d: Character with value \"0x%02x\" must be escaped.", ! lex->line_number, (unsigned char) *s))); } else if (*s == '\\') { --- 437,449 ---- lex->token_terminator = s; report_invalid_token(lex); } + lex->token_terminator = s + pg_mblen(s); ereport(ERROR, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input syntax for type json"), ! errdetail("Character with value 0x%02x must be escaped.", ! (unsigned char) *s), ! report_json_context(lex))); } else if (*s == '\\') { *************** json_lex_string(JsonLexContext *lex) *** 465,502 **** for (i = 1; i <= 4; i++) { ! if (s[i] == '\0') { ! lex->token_terminator = s + i; report_invalid_token(lex); } ! else if (s[i] >= '0' && s[i] <= '9') ! ch = (ch * 16) + (s[i] - '0'); ! else if (s[i] >= 'a' && s[i] <= 'f') ! ch = (ch * 16) + (s[i] - 'a') + 10; ! else if (s[i] >= 'A' && s[i] <= 'F') ! ch = (ch * 16) + (s[i] - 'A') + 10; else { ereport(ERROR, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input syntax for type json"), ! errdetail("line %d: \"\\u\" must be followed by four hexadecimal digits.", ! 
  					 lex->line_number)));
  				}
  			}
- 
- 			/* Account for the four additional bytes we just parsed. */
- 			s += 4;
  		}
  		else if (strchr("\"\\/bfnrt", *s) == NULL)
  		{
  			/* Not a valid string escape, so error out. */
  			ereport(ERROR,
  					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  					 errmsg("invalid input syntax for type json"),
! 					 errdetail("line %d: Invalid escape \"\\%s\".",
! 							   lex->line_number, extract_mb_char(s))));
  		}
  	}
  }
--- 461,499 ----
  			for (i = 1; i <= 4; i++)
  			{
! 				s++;
! 				if (*s == '\0')
  				{
! 					lex->token_terminator = s;
  					report_invalid_token(lex);
  				}
! 				else if (*s >= '0' && *s <= '9')
! 					ch = (ch * 16) + (*s - '0');
! 				else if (*s >= 'a' && *s <= 'f')
! 					ch = (ch * 16) + (*s - 'a') + 10;
! 				else if (*s >= 'A' && *s <= 'F')
! 					ch = (ch * 16) + (*s - 'A') + 10;
  				else
  				{
+ 					lex->token_terminator = s + pg_mblen(s);
  					ereport(ERROR,
  							(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  							 errmsg("invalid input syntax for type json"),
! 							 errdetail("\"\\u\" must be followed by four hexadecimal digits."),
! 							 report_json_context(lex)));
  				}
  			}
  		}
  		else if (strchr("\"\\/bfnrt", *s) == NULL)
  		{
  			/* Not a valid string escape, so error out. */
+ 			lex->token_terminator = s + pg_mblen(s);
  			ereport(ERROR,
  					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  					 errmsg("invalid input syntax for type json"),
! 					 errdetail("Invalid string escape \"\\%s\".",
! 							   extract_mb_char(s)),
! 					 report_json_context(lex)));
  		}
  	}
  }
*************** json_lex_number(JsonLexContext *lex, cha
*** 599,666 ****
  /*
   * Report a parse error.
   */
  static void
  report_parse_error(JsonParseStack *stack, JsonLexContext *lex)
  {
! 	char	   *detail = NULL;
! 	char	   *token = NULL;
  	int			toklen;
  
  	/* Handle case where the input ended prematurely. */
  	if (lex->token_start == NULL)
  		ereport(ERROR,
  				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 				 errmsg("invalid input syntax for type json: \"%s\"",
! 						lex->input),
! 				 errdetail("The input string ended unexpectedly.")));
  
! 	/* Separate out the offending token. */
  	toklen = lex->token_terminator - lex->token_start;
  	token = palloc(toklen + 1);
  	memcpy(token, lex->token_start, toklen);
  	token[toklen] = '\0';
  
! 	/* Select correct detail message. */
  	if (stack == NULL)
! 		detail = "line %d: Expected end of input, but found \"%s\".";
  	else
  	{
  		switch (stack->state)
  		{
  			case JSON_PARSE_VALUE:
! 				detail = "line %d: Expected string, number, object, array, true, false, or null, but found \"%s\".";
  				break;
  			case JSON_PARSE_ARRAY_START:
! 				detail = "line %d: Expected array element or \"]\", but found \"%s\".";
  				break;
  			case JSON_PARSE_ARRAY_NEXT:
! 				detail = "line %d: Expected \",\" or \"]\", but found \"%s\".";
  				break;
  			case JSON_PARSE_OBJECT_START:
! 				detail = "line %d: Expected string or \"}\", but found \"%s\".";
  				break;
  			case JSON_PARSE_OBJECT_LABEL:
! 				detail = "line %d: Expected \":\", but found \"%s\".";
  				break;
  			case JSON_PARSE_OBJECT_NEXT:
! 				detail = "line %d: Expected \",\" or \"}\", but found \"%s\".";
  				break;
  			case JSON_PARSE_OBJECT_COMMA:
! 				detail = "line %d: Expected string, but found \"%s\".";
  				break;
  		}
  	}
- 
- 	ereport(ERROR,
- 			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
- 			 errmsg("invalid input syntax for type json: \"%s\"",
- 					lex->input),
- 			 detail ? errdetail(detail, lex->line_number, token) : 0));
  }
  
  /*
   * Report an invalid input token.
   */
  static void
  report_invalid_token(JsonLexContext *lex)
--- 596,703 ----
  /*
   * Report a parse error.
+  *
+  * lex->token_start and lex->token_terminator must identify the current token.
   */
  static void
  report_parse_error(JsonParseStack *stack, JsonLexContext *lex)
  {
! 	char	   *token;
  	int			toklen;
  
  	/* Handle case where the input ended prematurely. */
  	if (lex->token_start == NULL)
  		ereport(ERROR,
  				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 				 errmsg("invalid input syntax for type json"),
! 				 errdetail("The input string ended unexpectedly."),
! 				 report_json_context(lex)));
  
! 	/* Separate out the current token. */
  	toklen = lex->token_terminator - lex->token_start;
  	token = palloc(toklen + 1);
  	memcpy(token, lex->token_start, toklen);
  	token[toklen] = '\0';
  
! 	/* Complain, with the appropriate detail message. */
  	if (stack == NULL)
! 		ereport(ERROR,
! 				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 				 errmsg("invalid input syntax for type json"),
! 				 errdetail("Expected end of input, but found \"%s\".",
! 						   token),
! 				 report_json_context(lex)));
  	else
  	{
  		switch (stack->state)
  		{
  			case JSON_PARSE_VALUE:
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 						 errmsg("invalid input syntax for type json"),
! 						 errdetail("Expected string, number, object, array, true, false, or null, but found \"%s\".",
! 								   token),
! 						 report_json_context(lex)));
  				break;
  			case JSON_PARSE_ARRAY_START:
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 						 errmsg("invalid input syntax for type json"),
! 						 errdetail("Expected array element or \"]\", but found \"%s\".",
! 								   token),
! 						 report_json_context(lex)));
  				break;
  			case JSON_PARSE_ARRAY_NEXT:
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 						 errmsg("invalid input syntax for type json"),
! 						 errdetail("Expected \",\" or \"]\", but found \"%s\".",
! 								   token),
! 						 report_json_context(lex)));
  				break;
  			case JSON_PARSE_OBJECT_START:
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 						 errmsg("invalid input syntax for type json"),
! 						 errdetail("Expected string or \"}\", but found \"%s\".",
! 								   token),
! 						 report_json_context(lex)));
  				break;
  			case JSON_PARSE_OBJECT_LABEL:
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 						 errmsg("invalid input syntax for type json"),
! 						 errdetail("Expected \":\", but found \"%s\".",
! 								   token),
! 						 report_json_context(lex)));
  				break;
  			case JSON_PARSE_OBJECT_NEXT:
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 						 errmsg("invalid input syntax for type json"),
! 						 errdetail("Expected \",\" or \"}\", but found \"%s\".",
! 								   token),
! 						 report_json_context(lex)));
  				break;
  			case JSON_PARSE_OBJECT_COMMA:
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 						 errmsg("invalid input syntax for type json"),
! 						 errdetail("Expected string, but found \"%s\".",
! 								   token),
! 						 report_json_context(lex)));
  				break;
+ 			default:
+ 				elog(ERROR, "unexpected json parse state: %d",
+ 					 (int) stack->state);
  		}
  	}
  }
  
  /*
   * Report an invalid input token.
+  *
+  * lex->token_start and lex->token_terminator must identify the token.
   */
  static void
  report_invalid_token(JsonLexContext *lex)
*************** report_invalid_token(JsonLexContext *lex
*** 668,673 ****
--- 705,711 ----
  	char	   *token;
  	int			toklen;
  
+ 	/* Separate out the offending token. */
  	toklen = lex->token_terminator - lex->token_start;
  	token = palloc(toklen + 1);
  	memcpy(token, lex->token_start, toklen);
*************** report_invalid_token(JsonLexContext *lex
*** 676,683 ****
  	ereport(ERROR,
  			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  			 errmsg("invalid input syntax for type json"),
! 			 errdetail("line %d: Token \"%s\" is invalid.",
! 					   lex->line_number, token)));
  }
  
  /*
--- 714,793 ----
  	ereport(ERROR,
  			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  			 errmsg("invalid input syntax for type json"),
! 			 errdetail("Token \"%s\" is invalid.", token),
! 			 report_json_context(lex)));
! }
! 
! /*
!  * Report a CONTEXT line for bogus JSON input.
!  *
!  * lex->token_terminator must be set to identify the spot where we detected
!  * the error.  Note that lex->token_start might be NULL, in case we recognized
!  * error at EOF.
!  *
!  * The return value isn't meaningful, but we make it non-void so that this
!  * can be invoked inside ereport().
!  */
! static int
! report_json_context(JsonLexContext *lex)
! {
! 	const char *context_start;
! 	const char *context_end;
! 	const char *line_start;
! 	int			line_number;
! 	char	   *ctxt;
! 	int			ctxtlen;
! 	const char *prefix;
! 	const char *suffix;
! 
! 	/* Choose boundaries for the part of the input we will display */
! 	context_start = lex->input;
! 	context_end = lex->token_terminator;
! 	line_start = context_start;
! 	line_number = 1;
! 	for (;;)
! 	{
! 		/* Always advance over newlines (context_end test is just paranoia) */
! 		if (*context_start == '\n' && context_start < context_end)
! 		{
! 			context_start++;
! 			line_start = context_start;
! 			line_number++;
! 			continue;
! 		}
! 		/* Otherwise, done as soon as we are close enough to context_end */
! 		if (context_end - context_start < 50)
! 			break;
! 		/* Advance to next multibyte character */
! 		if (IS_HIGHBIT_SET(*context_start))
! 			context_start += pg_mblen(context_start);
! 		else
! 			context_start++;
! 	}
! 
! 	/*
! 	 * We add "..." to indicate that the excerpt doesn't start at the
! 	 * beginning of the line ... but if we're within 3 characters of the
! 	 * beginning of the line, we might as well just show the whole line.
! 	 */
! 	if (context_start - line_start <= 3)
! 		context_start = line_start;
! 
! 	/* Get a null-terminated copy of the data to present */
! 	ctxtlen = context_end - context_start;
! 	ctxt = palloc(ctxtlen + 1);
! 	memcpy(ctxt, context_start, ctxtlen);
! 	ctxt[ctxtlen] = '\0';
! 
! 	/*
! 	 * Show the context, prefixing "..." if not starting at start of line, and
! 	 * suffixing "..." if not ending at end of line.
! 	 */
! 	prefix = (context_start > line_start) ? "..." : "";
! 	suffix = (*context_end != '\0' && *context_end != '\n' && *context_end != '\r') ? "..." : "";
! 
! 	return errcontext("JSON data, line %d: %s%s%s",
! 					  line_number, prefix, ctxt, suffix);
  }
  
  /*
diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
index 4b1ad89..1f79e05 100644
*** a/src/test/regress/expected/json.out
--- b/src/test/regress/expected/json.out
*************** SELECT $$''$$::json;			-- ERROR, single
*** 9,15 ****
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT $$''$$::json;
                 ^
! DETAIL:  line 1: Token "'" is invalid.
  SELECT '"abc"'::json;			-- OK
   json  
 -------
--- 9,16 ----
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT $$''$$::json;
                 ^
! DETAIL:  Token "'" is invalid.
! CONTEXT:  JSON data, line 1: '...
  SELECT '"abc"'::json;			-- OK
   json  
 -------
*************** SELECT '"abc'::json;			-- ERROR, quotes
*** 20,32 ****
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"abc'::json;
                 ^
! DETAIL:  line 1: Token ""abc" is invalid.
  SELECT '"abc
  def"'::json;			-- ERROR, unescaped newline in string constant
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"abc
                 ^
! DETAIL:  line 1: Character with value "0x0a" must be escaped.
  SELECT '"\n\"\\"'::json;		-- OK, legal escapes
     json   
 ----------
--- 21,36 ----
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"abc'::json;
                 ^
! DETAIL:  Token ""abc" is invalid.
! CONTEXT:  JSON data, line 1: "abc
  SELECT '"abc
  def"'::json;			-- ERROR, unescaped newline in string constant
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"abc
                 ^
! DETAIL:  Character with value 0x0a must be escaped.
! CONTEXT:  JSON data, line 1: "abc
! ...
  SELECT '"\n\"\\"'::json;		-- OK, legal escapes
     json   
 ----------
*************** SELECT '"\v"'::json;			-- ERROR, not a v
*** 37,58 ****
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\v"'::json;
                 ^
! DETAIL:  line 1: Invalid escape "\v".
  SELECT '"\u"'::json;			-- ERROR, incomplete escape
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\u"'::json;
                 ^
! DETAIL:  line 1: "\u" must be followed by four hexadecimal digits.
  SELECT '"\u00"'::json;			-- ERROR, incomplete escape
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\u00"'::json;
                 ^
! DETAIL:  line 1: "\u" must be followed by four hexadecimal digits.
  SELECT '"\u000g"'::json;		-- ERROR, g is not a hex digit
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\u000g"'::json;
                 ^
! DETAIL:  line 1: "\u" must be followed by four hexadecimal digits.
  SELECT '"\u0000"'::json;		-- OK, legal escape
    json   
 ----------
--- 41,66 ----
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\v"'::json;
                 ^
! DETAIL:  Invalid string escape "\v".
! CONTEXT:  JSON data, line 1: "\v...
  SELECT '"\u"'::json;			-- ERROR, incomplete escape
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\u"'::json;
                 ^
! DETAIL:  "\u" must be followed by four hexadecimal digits.
! CONTEXT:  JSON data, line 1: "\u"
  SELECT '"\u00"'::json;			-- ERROR, incomplete escape
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\u00"'::json;
                 ^
! DETAIL:  "\u" must be followed by four hexadecimal digits.
! CONTEXT:  JSON data, line 1: "\u00"
  SELECT '"\u000g"'::json;		-- ERROR, g is not a hex digit
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\u000g"'::json;
                 ^
! DETAIL:  "\u" must be followed by four hexadecimal digits.
! CONTEXT:  JSON data, line 1: "\u000g...
  SELECT '"\u0000"'::json;		-- OK, legal escape
    json   
 ----------
*************** SELECT '01'::json;			-- ERROR, not vali
*** 82,88 ****
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '01'::json;
                 ^
! DETAIL:  line 1: Token "01" is invalid.
  SELECT '0.1'::json;			-- OK
  json 
 ------
--- 90,97 ----
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '01'::json;
                 ^
! DETAIL:  Token "01" is invalid.
! CONTEXT:  JSON data, line 1: 01
  SELECT '0.1'::json;			-- OK
  json 
 ------
*************** SELECT '1f2'::json;			-- ERROR
*** 111,127 ****
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '1f2'::json;
                 ^
! DETAIL:  line 1: Token "1f2" is invalid.
  SELECT '0.x1'::json;			-- ERROR
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '0.x1'::json;
                 ^
! DETAIL:  line 1: Token "0.x1" is invalid.
  SELECT '1.3ex100'::json;		-- ERROR
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '1.3ex100'::json;
                 ^
! DETAIL:  line 1: Token "1.3ex100" is invalid.
  -- Arrays.
  SELECT '[]'::json;			-- OK
  json 
--- 120,139 ----
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '1f2'::json;
                 ^
! DETAIL:  Token "1f2" is invalid.
! CONTEXT:  JSON data, line 1: 1f2
  SELECT '0.x1'::json;			-- ERROR
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '0.x1'::json;
                 ^
! DETAIL:  Token "0.x1" is invalid.
! CONTEXT:  JSON data, line 1: 0.x1
  SELECT '1.3ex100'::json;		-- ERROR
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '1.3ex100'::json;
                 ^
! DETAIL:  Token "1.3ex100" is invalid.
! CONTEXT:  JSON data, line 1: 1.3ex100
  -- Arrays.
  SELECT '[]'::json;			-- OK
  json 
*************** SELECT '[1,2]'::json;			-- OK
*** 142,161 ****
  (1 row)
  
  SELECT '[1,2,]'::json;			-- ERROR, trailing comma
! ERROR:  invalid input syntax for type json: "[1,2,]"
  LINE 1: SELECT '[1,2,]'::json;
                 ^
! DETAIL:  line 1: Expected string, number, object, array, true, false, or null, but found "]".
  SELECT '[1,2'::json;			-- ERROR, no closing bracket
! ERROR:  invalid input syntax for type json: "[1,2"
  LINE 1: SELECT '[1,2'::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
  SELECT '[1,[2]'::json;			-- ERROR, no closing bracket
! ERROR:  invalid input syntax for type json: "[1,[2]"
  LINE 1: SELECT '[1,[2]'::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
  -- Objects.
  SELECT '{}'::json;			-- OK
  json 
--- 154,176 ----
  (1 row)
  
  SELECT '[1,2,]'::json;			-- ERROR, trailing comma
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '[1,2,]'::json;
                 ^
! DETAIL:  Expected string, number, object, array, true, false, or null, but found "]".
! CONTEXT:  JSON data, line 1: [1,2,]
  SELECT '[1,2'::json;			-- ERROR, no closing bracket
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '[1,2'::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
+ CONTEXT:  JSON data, line 1: [1,2
  SELECT '[1,[2]'::json;			-- ERROR, no closing bracket
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '[1,[2]'::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
+ CONTEXT:  JSON data, line 1: [1,[2]
  -- Objects.
  SELECT '{}'::json;			-- OK
  json 
*************** SELECT '{}'::json;			-- OK
*** 164,173 ****
  (1 row)
  
  SELECT '{"abc"}'::json;			-- ERROR, no value
! ERROR:  invalid input syntax for type json: "{"abc"}"
  LINE 1: SELECT '{"abc"}'::json;
                 ^
! DETAIL:  line 1: Expected ":", but found "}".
  SELECT '{"abc":1}'::json;		-- OK
    json   
 -----------
--- 179,189 ----
  (1 row)
  
  SELECT '{"abc"}'::json;			-- ERROR, no value
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{"abc"}'::json;
                 ^
! DETAIL:  Expected ":", but found "}".
! CONTEXT:  JSON data, line 1: {"abc"}
  SELECT '{"abc":1}'::json;		-- OK
    json   
 -----------
*************** SELECT '{"abc":1}'::json;		-- OK
*** 175,199 ****
  (1 row)
  
  SELECT '{1:"abc"}'::json;		-- ERROR, keys must be strings
! ERROR:  invalid input syntax for type json: "{1:"abc"}"
  LINE 1: SELECT '{1:"abc"}'::json;
                 ^
! DETAIL:  line 1: Expected string or "}", but found "1".
  SELECT '{"abc",1}'::json;		-- ERROR, wrong separator
! ERROR:  invalid input syntax for type json: "{"abc",1}"
  LINE 1: SELECT '{"abc",1}'::json;
                 ^
! DETAIL:  line 1: Expected ":", but found ",".
  SELECT '{"abc"=1}'::json;		-- ERROR, totally wrong separator
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{"abc"=1}'::json;
                 ^
! DETAIL:  line 1: Token "=" is invalid.
  SELECT '{"abc"::1}'::json;		-- ERROR, another wrong separator
! ERROR:  invalid input syntax for type json: "{"abc"::1}"
  LINE 1: SELECT '{"abc"::1}'::json;
                 ^
! DETAIL:  line 1: Expected string, number, object, array, true, false, or null, but found ":".
  SELECT '{"abc":1,"def":2,"ghi":[3,4],"hij":{"klm":5,"nop":[6]}}'::json; -- OK
                            json                           
 ---------------------------------------------------------
--- 191,219 ----
  (1 row)
  
  SELECT '{1:"abc"}'::json;		-- ERROR, keys must be strings
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{1:"abc"}'::json;
                 ^
! DETAIL:  Expected string or "}", but found "1".
! CONTEXT:  JSON data, line 1: {1...
  SELECT '{"abc",1}'::json;		-- ERROR, wrong separator
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{"abc",1}'::json;
                 ^
! DETAIL:  Expected ":", but found ",".
! CONTEXT:  JSON data, line 1: {"abc",...
  SELECT '{"abc"=1}'::json;		-- ERROR, totally wrong separator
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{"abc"=1}'::json;
                 ^
! DETAIL:  Token "=" is invalid.
! CONTEXT:  JSON data, line 1: {"abc"=...
  SELECT '{"abc"::1}'::json;		-- ERROR, another wrong separator
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{"abc"::1}'::json;
                 ^
! DETAIL:  Expected string, number, object, array, true, false, or null, but found ":".
! CONTEXT:  JSON data, line 1: {"abc"::...
  SELECT '{"abc":1,"def":2,"ghi":[3,4],"hij":{"klm":5,"nop":[6]}}'::json; -- OK
                            json                           
 ---------------------------------------------------------
*************** SELECT '{"abc":1,"def":2,"ghi":[3,4],"hi
*** 201,215 ****
  (1 row)
  
  SELECT '{"abc":1:2}'::json;		-- ERROR, colon in wrong spot
! ERROR:  invalid input syntax for type json: "{"abc":1:2}"
  LINE 1: SELECT '{"abc":1:2}'::json;
                 ^
! DETAIL:  line 1: Expected "," or "}", but found ":".
  SELECT '{"abc":1,3}'::json;		-- ERROR, no value
! ERROR:  invalid input syntax for type json: "{"abc":1,3}"
  LINE 1: SELECT '{"abc":1,3}'::json;
                 ^
! DETAIL:  line 1: Expected string, but found "3".
  -- Miscellaneous stuff.
  SELECT 'true'::json;			-- OK
  json 
--- 221,237 ----
  (1 row)
  
  SELECT '{"abc":1:2}'::json;		-- ERROR, colon in wrong spot
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{"abc":1:2}'::json;
                 ^
! DETAIL:  Expected "," or "}", but found ":".
! CONTEXT:  JSON data, line 1: {"abc":1:...
  SELECT '{"abc":1,3}'::json;		-- ERROR, no value
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{"abc":1,3}'::json;
                 ^
! DETAIL:  Expected string, but found "3".
! CONTEXT:  JSON data, line 1: {"abc":1,3...
  -- Miscellaneous stuff.
  SELECT 'true'::json;			-- OK
  json 
*************** SELECT ' true '::json;		-- OK, even wit
*** 236,270 ****
  (1 row)
  
  SELECT 'true false'::json;		-- ERROR, too many values
! ERROR:  invalid input syntax for type json: "true false"
  LINE 1: SELECT 'true false'::json;
                 ^
! DETAIL:  line 1: Expected end of input, but found "false".
  SELECT 'true, false'::json;		-- ERROR, too many values
! ERROR:  invalid input syntax for type json: "true, false"
  LINE 1: SELECT 'true, false'::json;
                 ^
! DETAIL:  line 1: Expected end of input, but found ",".
  SELECT 'truf'::json;			-- ERROR, not a keyword
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT 'truf'::json;
                 ^
! DETAIL:  line 1: Token "truf" is invalid.
  SELECT 'trues'::json;			-- ERROR, not a keyword
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT 'trues'::json;
                 ^
! DETAIL:  line 1: Token "trues" is invalid.
  SELECT ''::json;			-- ERROR, no value
! ERROR:  invalid input syntax for type json: ""
  LINE 1: SELECT ''::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
  SELECT ' '::json;			-- ERROR, no value
! ERROR:  invalid input syntax for type json: " "
  LINE 1: SELECT ' '::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
  --constructors
  -- array_to_json
  SELECT array_to_json(array(select 1 as a));
--- 258,298 ----
  (1 row)
  
  SELECT 'true false'::json;		-- ERROR, too many values
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT 'true false'::json;
                 ^
! DETAIL:  Expected end of input, but found "false".
! CONTEXT:  JSON data, line 1: true false
  SELECT 'true, false'::json;		-- ERROR, too many values
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT 'true, false'::json;
                 ^
! DETAIL:  Expected end of input, but found ",".
! CONTEXT:  JSON data, line 1: true,...
  SELECT 'truf'::json;			-- ERROR, not a keyword
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT 'truf'::json;
                 ^
! DETAIL:  Token "truf" is invalid.
! CONTEXT:  JSON data, line 1: truf
  SELECT 'trues'::json;			-- ERROR, not a keyword
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT 'trues'::json;
                 ^
! DETAIL:  Token "trues" is invalid.
! CONTEXT:  JSON data, line 1: trues
  SELECT ''::json;			-- ERROR, no value
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT ''::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
+ CONTEXT:  JSON data, line 1: 
  SELECT ' '::json;			-- ERROR, no value
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT ' '::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
+ CONTEXT:  JSON data, line 1: 
  --constructors
  -- array_to_json
  SELECT array_to_json(array(select 1 as a));
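[Editor's note: the heart of the patch is the excerpt-selection loop in report_json_context(). A simplified standalone model of that loop is sketched below, ASCII-only and heap-free; the real code also steps over multibyte characters with pg_mblen() and allocates with palloc(). The name json_context is invented for this sketch and does not appear in the patch.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Simplified model of report_json_context(): given the whole input and the
 * offset at which the error was detected, build the "JSON data, line N: ..."
 * excerpt.  Writes the result into buf and returns it.
 */
static char *
json_context(const char *input, size_t errpos, char *buf, size_t buflen)
{
	const char *context_start = input;
	const char *context_end = input + errpos;	/* point of error detection */
	const char *line_start = input;
	int			line_number = 1;
	const char *prefix;
	const char *suffix;

	/*
	 * Advance until we are within 50 characters of the error point,
	 * counting the newlines we skip so the excerpt is labeled with the
	 * right line number.
	 */
	for (;;)
	{
		if (*context_start == '\n' && context_start < context_end)
		{
			context_start++;
			line_start = context_start;
			line_number++;
			continue;
		}
		if (context_end - context_start < 50)
			break;
		context_start++;
	}

	/* Within 3 characters of the line start?  Just show the whole line. */
	if (context_start - line_start <= 3)
		context_start = line_start;

	/* "..." prefix if the line was trimmed, "..." suffix if input goes on. */
	prefix = (context_start > line_start) ? "..." : "";
	suffix = (*context_end != '\0' && *context_end != '\n' &&
			  *context_end != '\r') ? "..." : "";

	snprintf(buf, buflen, "JSON data, line %d: %s%.*s%s",
			 line_number, prefix, (int) (context_end - context_start),
			 context_start, suffix);
	return buf;
}
```

For example, json_context("{\"abc\"}", 7, ...) yields `JSON data, line 1: {"abc"}` (error at end of input, no suffix), while json_context("''", 1, ...) yields `JSON data, line 1: '...`, matching the regression outputs above.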
On Wed, Jun 13, 2012 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> In any case, the proposed scheme for providing context requires that
>>> you know where the error is before you can identify the context.  I
>>> considered schemes that would keep track of the last N characters or
>>> line breaks in case one of them proved to be the one we need.  That
>>> would add enough cycles to non-error cases to almost certainly not be
>>> desirable.  I also considered trying to back up, but that doesn't look
>>> promising either for arbitrary multibyte encodings.
>
>> Oh, I see.  :-(
>
> Attached is a complete proposed patch for this, with some further
> adjustments to make the output look a bit more like what we use for
> SQL error context printouts (in particular, "..." at both ends of the
> excerpt when appropriate).

I like some of these changes - in particular, the use of errcontext(),
but some of them still seem off.

! DETAIL:  Token "'" is invalid.
! CONTEXT:  JSON data, line 1: '...

This doesn't make sense to me.

! DETAIL:  Character with value 0x0a must be escaped.
! CONTEXT:  JSON data, line 1: "abc
! ...

This seems an odd way to present this, especially if the goal is to
NOT include the character needing escaping in the log unescaped, which
I thought was the point of saying 0x0a.

! DETAIL:  "\u" must be followed by four hexadecimal digits.
! CONTEXT:  JSON data, line 1: "\u000g...

Why does this end in ... even though there's nothing further in the
input string?

> One thing I did not touch, but am less than happy with, is the wording
> of this detail message:
>
>                 errdetail("Expected string, number, object, array, true, false, or null, but found \"%s\".",
>                           token),
>
> This seems uselessly verbose to me.  It could be argued that enumerating
> all the options is helpful to rank JSON novices ... but unless you know
> exactly what an "object" is and why that's different from the other
> options, it's not all that helpful.  I'm inclined to think that
> something like this would be better:
>
>                 errdetail("Expected JSON value, but found \"%s\".",
>                           token),
>
> Thoughts?

Good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I like some of these changes - in particular, the use of errcontext(),
> but some of them still seem off.

> ! DETAIL:  Token "'" is invalid.
> ! CONTEXT:  JSON data, line 1: '...

> This doesn't make sense to me.

Well, the input is two single quotes and the complaint is that the first
one of them doesn't constitute a valid JSON token.  What would you
expect to see instead?

I considered putting double quotes around the excerpt text, but we do
not do that in SQL error-context reports, and it seems likely to just be
confusing given the prevalence of double quotes in JSON data.  But happy
to consider opposing arguments.

Another thing that could be done here is to print the rest of the line,
rather than "...", if there's not very much of it.  I'm not sure that's
an improvement though.  The advantage of the proposed design is that the
point where the excerpt ends is exactly where the error was detected;
lacking an error cursor, I don't see how else to present that info.

> ! DETAIL:  Character with value 0x0a must be escaped.
> ! CONTEXT:  JSON data, line 1: "abc
> ! ...

> This seems an odd way to present this, especially if the goal is to
> NOT include the character needing escaping in the log unescaped, which
> I thought was the point of saying 0x0a.

Do you think it would be better to present something that isn't what the
user typed?  Again, I don't see an easy improvement here.  If you don't
want newlines in the logged context, what will we do for something like

{"foo": {
"bar":44
}
]

There basically isn't any useful context to present unless we are
willing to back up several lines, and I don't think it'll be more
readable if we escape all the newlines.

> ! DETAIL:  "\u" must be followed by four hexadecimal digits.
> ! CONTEXT:  JSON data, line 1: "\u000g...

> Why does this end in ... even though there's nothing further in the
> input string?

There is something further, namely the trailing double quote.

The examples in the regression tests are not really designed to show
cases where this type of error reporting is an improvement, because
there's hardly any context around the error sites.

			regards, tom lane
On Wed, Jun 13, 2012 at 1:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I like some of these changes - in particular, the use of errcontext(),
>> but some of them still seem off.
>
>> ! DETAIL:  Token "'" is invalid.
>> ! CONTEXT:  JSON data, line 1: '...
>
>> This doesn't make sense to me.
>
> Well, the input is two single quotes and the complaint is that the first
> one of them doesn't constitute a valid JSON token.  What would you
> expect to see instead?

Oh, I see.

> Another thing that could be done here is to print the rest of the line,
> rather than "...", if there's not very much of it.  I'm not sure that's
> an improvement though.  The advantage of the proposed design is that the
> point where the excerpt ends is exactly where the error was detected;
> lacking an error cursor, I don't see how else to present that info.

OK.

>> ! DETAIL:  Character with value 0x0a must be escaped.
>> ! CONTEXT:  JSON data, line 1: "abc
>> ! ...
>
>> This seems an odd way to present this, especially if the goal is to
>> NOT include the character needing escaping in the log unescaped, which
>> I thought was the point of saying 0x0a.
>
> Do you think it would be better to present something that isn't what the
> user typed?  Again, I don't see an easy improvement here.  If you don't
> want newlines in the logged context, what will we do for something like
>
> {"foo": {
> "bar":44
> }
> ]
>
> There basically isn't any useful context to present unless we are
> willing to back up several lines, and I don't think it'll be more
> readable if we escape all the newlines.

Hmm.  If your plan is to trace back to the opening brace you were
expecting to match, I don't think that's going to work either.  What
if there are three pages (or 3MB) of data in between?

> The examples in the regression tests are not really designed to show
> cases where this type of error reporting is an improvement, because
> there's hardly any context around the error sites.

Yeah, true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jun 13, 2012 at 1:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> ! DETAIL:  Character with value 0x0a must be escaped.
>>> ! CONTEXT:  JSON data, line 1: "abc
>>> ! ...
>>>
>>> This seems an odd way to present this, especially if the goal is to
>>> NOT include the character needing escaping in the log unescaped, which
>>> I thought was the point of saying 0x0a.

>> Do you think it would be better to present something that isn't what the
>> user typed?  Again, I don't see an easy improvement here.  If you don't
>> want newlines in the logged context, what will we do for something like
>>
>> {"foo": {
>> "bar":44
>> }
>> ]

> Hmm.  If your plan is to trace back to the opening brace you were
> expecting to match, I don't think that's going to work either.  What
> if there are three pages (or 3MB) of data in between?

No, that's not the proposal; I only anticipate printing a few dozen
characters of context.  But that could still mean printing something
like

DETAIL: expected "," or "}", but found "]".
CONTEXT: JSON data, line 123: ..."bar":44
}
]

which I argue is much more useful than just seeing the "]".  So the
question is whether it's still as useful if we mangle the whitespace.
I'm thinking it's not.  We don't mangle whitespace when printing SQL
statements into the log, anyway.

			regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes:
>>> ! DETAIL:  Character with value 0x0a must be escaped.
>>> ! CONTEXT:  JSON data, line 1: "abc
>>> ! ...
>>>
>>> This seems an odd way to present this, especially if the goal is to
>>> NOT include the character needing escaping in the log unescaped, which
>>> I thought was the point of saying 0x0a.

I thought of a simple way to address that objection for this particular
case: we can just truncate the context display *at* the offending
character, instead of *after* it.  This is playing a bit fast and loose
with the general principle that the context should end at the point of
detection of the error; but given that the character in question is
always unprintable, I think it's probably not going to bother anyone.

I've gone ahead and committed this before branching, though I'm
certainly still willing to entertain suggestions for further
improvement.

			regards, tom lane
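[Editor's note: the difference between truncating the context *after* versus *at* the offending character can be sketched in isolation. The helper below, context_excerpt, is invented for this illustration and fixes the line number at 1; the real logic lives inside report_json_context() in json.c.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a one-line context excerpt ending either just after the character
 * where the error was detected (include_stop = 1, the earlier behavior) or
 * just at it (include_stop = 0, the committed fix for unprintable
 * characters).  Writes the result into buf and returns it.
 */
static char *
context_excerpt(const char *input, const char *stop, int include_stop,
				char *buf, size_t buflen)
{
	int			len = (int) (stop - input) + (include_stop ? 1 : 0);
	const char *end = input + len;
	/* "..." only if the input continues past the excerpt */
	const char *suffix = (*end != '\0' && *end != '\n' && *end != '\r')
		? "..." : "";

	snprintf(buf, buflen, "JSON data, line 1: %.*s%s", len, input, suffix);
	return buf;
}
```

With the unescaped-newline example from the regression tests, the JSON input `"abc<newline>def"` and the error detected at the newline, including the stop character embeds the raw newline in the CONTEXT line, while stopping at it yields the clean `JSON data, line 1: "abc`.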