Thread: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Robert Haas <rhaas@postgresql.org> writes:
> Mark JSON error detail messages for translation.
> Per gripe from Tom Lane.

OK, that was one generically bad thing about json.c's error messages.
The other thing that was bothering me was the style of the errdetail
strings, eg

	ereport(ERROR,
			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
			 errmsg("invalid input syntax for type json"),
			 errdetail("line %d: Invalid escape \"\\%s\".",
					   lex->line_number, extract_mb_char(s))));

I'm not thrilled with the "line %d:" prefixes.  That seems to me to
violate the letter and spirit of the guideline about making errdetail
messages be complete sentences.  Furthermore, the line number is not
the most important part of the detail message, if indeed it has any
usefulness at all, which is debatable.  (It's certainly not going to
be tremendously useful when the error report doesn't show any of the
specific input string being complained of.)

Also, a minority of the errors report the whole input string:

	ereport(ERROR,
			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
			 errmsg("invalid input syntax for type json: \"%s\"",
					lex->input),
			 errdetail("The input string ended unexpectedly.")));

And then there are some that provide the whole input string AND a line
number, just in case you thought there was any intentional design
decision there.

A minimum change IMO would be to recast the detail messages as complete
sentences, say "The escape sequence \"\\%s\" in line %d is invalid."
But I'd like a better-thought-out solution to identifying the location
of the error.

I notice that there's an unfinished attempt to maintain a line_start
pointer; if that were carried through, we could imagine printing the
current line up to the point of an error, which might provide a
reasonable balance between verbosity and insufficient context.
As an example, instead of

regression=# select '{"x":1,
z "y":"txt1"}'::json;
ERROR:  invalid input syntax for type json
LINE 1: select '{"x":1,
               ^
DETAIL:  line 2: Token "z" is invalid.

maybe we could print

regression=# select '{"x":1,
z "y":"txt1"}'::json;
ERROR:  invalid input syntax for type json
LINE 1: select '{"x":1,
               ^
DETAIL:  Token "z" is invalid in line 2: "z".

or perhaps better let it run to the end of the line:

regression=# select '{"x":1,
z "y":"txt1"}'::json;
ERROR:  invalid input syntax for type json
LINE 1: select '{"x":1,
               ^
DETAIL:  Token "z" is invalid in line 2: "z "y":"txt1"}".

Thoughts?

			regards, tom lane
On Tue, Jun 12, 2012 at 2:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <rhaas@postgresql.org> writes:
>> Mark JSON error detail messages for translation.
>> Per gripe from Tom Lane.
>
> OK, that was one generically bad thing about json.c's error messages.
> The other thing that was bothering me was the style of the errdetail
> strings, eg
>
> ereport(ERROR,
>         (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
>          errmsg("invalid input syntax for type json"),
>          errdetail("line %d: Invalid escape \"\\%s\".",
>                    lex->line_number, extract_mb_char(s))));
>
> I'm not thrilled with the "line %d:" prefixes.  That seems to me to
> violate the letter and spirit of the guideline about making errdetail
> messages be complete sentences.  Furthermore, the line number is not
> the most important part of the detail message, if indeed it has any
> usefulness at all which is debatable.  (It's certainly not going to
> be tremendously useful when the error report doesn't show any of the
> specific input string being complained of.)

I am amenable to rephrasing the errdetail to put the line number
elsewhere in the string, but I am strongly opposed to getting rid of
it.  JSON blobs can easily run to dozens or hundreds of lines, and you
will want to know the line number.  Knowing the contents of the line
will in many cases be LESS useful, because the line might contain,
say, a single closing bracket.  That's not gonna help much if it
happens many times in the input value, which is not unlikely.

> Also, a minority of the errors report the whole input string:
>
> ereport(ERROR,
>         (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
>          errmsg("invalid input syntax for type json: \"%s\"",
>                 lex->input),
>          errdetail("The input string ended unexpectedly.")));
>
> And then there are some that provide the whole input string AND a line
> number, just in case you thought there was any intentional design
> decision there.

I did try.

> A minimum change IMO would be to recast the detail messages as complete
> sentences, say "The escape sequence \"\\%s\" in line %d is invalid."
> But I'd like a better-thought-out solution to identifying the location
> of the error.
>
> I notice that there's an unfinished attempt to maintain a line_start
> pointer; if that were carried through, we could imagine printing the
> current line up to the point of an error, which might provide a
> reasonable balance between verbosity and insufficient context.
> As an example, instead of
>
> regression=# select '{"x":1,
> z "y":"txt1"}'::json;
> ERROR:  invalid input syntax for type json
> LINE 1: select '{"x":1,
>                ^
> DETAIL:  line 2: Token "z" is invalid.
>
> maybe we could print
>
> regression=# select '{"x":1,
> z "y":"txt1"}'::json;
> ERROR:  invalid input syntax for type json
> LINE 1: select '{"x":1,
>                ^
> DETAIL:  Token "z" is invalid in line 2: "z".
>
> or perhaps better let it run to the end of the line:
>
> regression=# select '{"x":1,
> z "y":"txt1"}'::json;
> ERROR:  invalid input syntax for type json
> LINE 1: select '{"x":1,
>                ^
> DETAIL:  Token "z" is invalid in line 2: "z "y":"txt1"}".
>
> Thoughts?

I'm not sure I find that an improvement, but I'm open to what other
people think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 12, 2012 at 2:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not thrilled with the "line %d:" prefixes.  That seems to me to
>> violate the letter and spirit of the guideline about making errdetail
>> messages be complete sentences.  Furthermore, the line number is not
>> the most important part of the detail message, if indeed it has any
>> usefulness at all which is debatable.  (It's certainly not going to
>> be tremendously useful when the error report doesn't show any of the
>> specific input string being complained of.)

> I am amenable to rephrasing the errdetail to put the line number
> elsewhere in the string, but I am strongly opposed to getting rid of
> it.  JSON blobs can easily run to dozens or hundreds of lines, and you
> will want to know the line number.

Agreed, the input strings could be quite large, but in that case we
definitely don't want to be including the whole input in the primary
error message (which is supposed to be short).  I doubt it'd even be a
good idea to try to put it into errdetail; thus I'm thinking about
providing one line only.

> Knowing the contents of the line
> will in many cases be LESS useful, because the line might contain,
> say, a single closing bracket.

True, but I don't think the line number alone is adequate.  There are
too many situations where it's not entirely clear what value is being
complained of.  (Considering only syntax errors thrown from literal
constants in SQL commands is quite misleading about the requirements
here.)  So I think we need to include at least some of the input in
the error.

>> I notice that there's an unfinished attempt to maintain a line_start
>> pointer; if that were carried through, we could imagine printing the
>> current line up to the point of an error, which might provide a
>> reasonable balance between verbosity and insufficient context.
>> ...
>> or perhaps better let it run to the end of the line:

> I'm not sure I find that an improvement, but I'm open to what other
> people think.

Anybody here besides the crickets?

			regards, tom lane
Excerpts from Tom Lane's message of mar jun 12 16:52:20 -0400 2012:
> Robert Haas <robertmhaas@gmail.com> writes:
> >> I notice that there's an unfinished attempt to maintain a line_start
> >> pointer; if that were carried through, we could imagine printing the
> >> current line up to the point of an error, which might provide a
> >> reasonable balance between verbosity and insufficient context.
> >> ...
> >> or perhaps better let it run to the end of the line:
>
> > I'm not sure I find that an improvement, but I'm open to what other
> > people think.
>
> Anybody here besides the crickets?

I think providing both partial line contents (so +1 for maintaining
line_start carefully as required) and line number would be useful
enough to track down problems.

I am not sure about the idea of letting the detail run to the end of
the line; that would be problematic should the line be long (there
might not be newlines in the literal at all, which is not that
unusual).  I think it should be truncated at, say, 76 chars or so.

For the case where you have a single } in a line, this isn't all that
helpful; we could consider printing the previous line as well.  But if
you end up with
}
}
then it's not that helpful either.  I am not sure.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I am not sure about the idea of letting the detail run to the end of the
> line; that would be problematic should the line be long (there might not
> be newlines in the literal at all, which is not that unusual).  I think
> it should be truncated at, say, 76 chars or so.

Yeah, I was wondering about trying to provide a given amount of context
instead of fixing it to "one line".  We could do something like

(1) back up N characters;
(2) find the next newline, if there is one at least M characters before
    the error point;
(3) print from there to the error point.

This would give between M and N characters of context, except when the
error point is less than M characters from the start of the input.

Not sure how to display such text together with a line number though;
with a multi-line fragment it would not be obvious which part the line
number refers to.

(Backing up over multibyte data might be a bit complicated too, but
I assume we can think of something workable for that.)

			regards, tom lane
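[A rough standalone sketch of the three-step windowing idea above, for concreteness.  This is not PostgreSQL code; the function name and the single-byte-character assumption are invented here, and a real version would have to advance with pg_mblen() to stay on multibyte character boundaries.]

```c
#include <assert.h>

/*
 * Toy illustration: choose where the displayed context should start.
 * Back up at most max_chars before the error point, then prefer to
 * start just after a newline if one lies at least min_chars before
 * the error point.  Yields between min_chars and max_chars of context,
 * except when the error point is near the start of the input.
 */
static const char *
choose_context_start(const char *input, const char *errpos,
                     int min_chars, int max_chars)
{
    const char *start;

    /* (1) back up N characters, clamped to the start of the input */
    if (errpos - input > max_chars)
        start = errpos - max_chars;
    else
        start = input;

    /* (2) find the next newline, if it leaves at least M chars of context */
    for (const char *p = start; errpos - p > min_chars; p++)
    {
        if (*p == '\n')
            return p + 1;       /* break the context at the line boundary */
    }

    /* (3) no suitable newline: print from the fixed window start */
    return start;
}
```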
I wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> I am not sure about the idea of letting the detail run to the end of the
>> line; that would be problematic should the line be long (there might not
>> be newlines in the literal at all, which is not that unusual).  I think
>> it should be truncated at, say, 76 chars or so.

> Yeah, I was wondering about trying to provide a given amount of context
> instead of fixing it to "one line".  We could do something like
> (1) back up N characters;
> (2) find the next newline, if there is one at least M characters before
> the error point;
> (3) print from there to the error point.

After experimenting with this for awhile I concluded that the above is
overcomplicated, and that we might as well just print up to N characters
of context; in most input, the line breaks are far enough apart that
preferentially breaking at them just leads to not having very much
context.

Also, it seems like it might be a good idea to present the input as a
CONTEXT line, because that provides more space; you can fit 50 or so
characters of data without overrunning standard display width.  This
gives me output like

regression=# select '{"unique1":8800,"unique2":0,"two":0,"four":0,"ten":0,"twenty":0,"hundred":0,"thousand":800, "twothous
and":800,"fivethous":3800,"tenthous":8800,"odd":0,"even":1, "stringu1":"MAAAAA","stringu2":"AAAAAA","string4":"AAAAxx"}'
::json;
ERROR:  invalid input syntax for type json
LINE 1: select '{"unique1":8800,"unique2":0,"two":0,"four":0,"ten":0...
               ^
DETAIL:  Character with value "0x0a" must be escaped.
CONTEXT:  JSON data, line 1: ..."twenty":0,"hundred":0,"thousand":800, "twothous
regression=#

I can't give too many examples because I've only bothered to
context-ify this single error case as yet ;-) ... but does this seem
like a sane way to go?

The code for this is as attached.
Note that I'd rip out the normal-path tracking of line boundaries; it
seems better to have a second scan of the data in the error case and
save the cycles in non-error cases.

...
	ereport(ERROR,
			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
			 errmsg("invalid input syntax for type json"),
			 errdetail("Character with value \"0x%02x\" must be escaped.",
					   (unsigned char) *s),
			 report_json_context(lex)));
...

/*
 * Report a CONTEXT line for bogus JSON input.
 *
 * The return value isn't meaningful, but we make it non-void so that this
 * can be invoked inside ereport().
 */
static int
report_json_context(JsonLexContext *lex)
{
	char	   *context_start;
	char	   *context_end;
	char	   *line_start;
	int			line_number;
	char	   *ctxt;
	int			ctxtlen;

	/* Choose boundaries for the part of the input we will display */
	context_start = lex->input;
	context_end = lex->token_terminator;
	line_start = context_start;
	line_number = 1;
	for (;;)
	{
		/* Always advance over a newline, unless it's the current token */
		if (*context_start == '\n' && context_start < lex->token_start)
		{
			context_start++;
			line_start = context_start;
			line_number++;
			continue;
		}
		/* Otherwise, done as soon as we are close enough to context_end */
		if (context_end - context_start < 50)
			break;
		/* Advance to next multibyte character */
		if (IS_HIGHBIT_SET(*context_start))
			context_start += pg_mblen(context_start);
		else
			context_start++;
	}

	/* Get a null-terminated copy of the data to present */
	ctxtlen = context_end - context_start;
	ctxt = palloc(ctxtlen + 1);
	memcpy(ctxt, context_start, ctxtlen);
	ctxt[ctxtlen] = '\0';

	return errcontext("JSON data, line %d: %s%s",
					  line_number,
					  (context_start > line_start) ? "..." : "",
					  ctxt);
}

			regards, tom lane
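[To see concretely what that scan produces, here is a standalone, ASCII-only toy version of the same context-selection loop.  The name and the snprintf-based output are invented for illustration; the real code advances with pg_mblen() over multibyte characters and emits the result via errcontext().]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Standalone toy version of the context scan above.  Given the whole
 * input and the current token's boundaries, advance context_start past
 * newlines (counting lines) until we are within 50 bytes of the token
 * end, then format "line %d: %s%s" into buf, with "..." when the
 * excerpt does not begin at a line boundary.
 */
static void
toy_json_context(const char *input, const char *token_start,
                 const char *token_terminator, char *buf, size_t buflen)
{
    const char *context_start = input;
    const char *context_end = token_terminator;
    const char *line_start = input;
    int         line_number = 1;

    for (;;)
    {
        /* Always advance over a newline, unless it's the current token */
        if (*context_start == '\n' && context_start < token_start)
        {
            context_start++;
            line_start = context_start;
            line_number++;
            continue;
        }
        /* Otherwise, done as soon as we are close enough to context_end */
        if (context_end - context_start < 50)
            break;
        context_start++;
    }

    snprintf(buf, buflen, "line %d: %s%.*s",
             line_number,
             (context_start > line_start) ? "..." : "",
             (int) (context_end - context_start), context_start);
}
```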
On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The code for this is as attached.  Note that I'd rip out the normal-path
> tracking of line boundaries; it seems better to have a second scan of
> the data in the error case and save the cycles in non-error cases.

Really?!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The code for this is as attached.  Note that I'd rip out the normal-path
>> tracking of line boundaries; it seems better to have a second scan of
>> the data in the error case and save the cycles in non-error cases.

> Really?!

Um ... do you have a problem with that idea, and if so what?  It would
be considerably more complicated to do it without a second pass.

			regards, tom lane
On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The code for this is as attached.  Note that I'd rip out the normal-path
>>> tracking of line boundaries; it seems better to have a second scan of
>>> the data in the error case and save the cycles in non-error cases.
>
>> Really?!
>
> Um ... do you have a problem with that idea, and if so what?  It would
> be considerably more complicated to do it without a second pass.

Could you explain how it's broken now, and why it will be hard to fix?
People may well want to use a cast to JSON within an exception block
as a way of testing whether strings are valid JSON.  We should not
assume that the cost of an exception is totally irrelevant, because
this might be iterated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
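[The validity-testing pattern Robert describes might look like this in PL/pgSQL.  This is a hypothetical sketch: the function name is invented, but invalid_text_representation is the condition name matching the ERRCODE_INVALID_TEXT_REPRESENTATION errcode the JSON parser raises.]

```sql
-- Hypothetical helper: test JSON validity by catching the cast error.
CREATE OR REPLACE FUNCTION is_valid_json(txt text) RETURNS boolean AS $$
BEGIN
    PERFORM txt::json;
    RETURN true;
EXCEPTION WHEN invalid_text_representation THEN
    RETURN false;
END;
$$ LANGUAGE plpgsql;

-- Every invalid input pays for the subtransaction plus whatever work
-- the error path does; that cost matters when the test is iterated.
SELECT is_valid_json('{"x":1}');    -- true
SELECT is_valid_json('{"x":1,');    -- false
```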
On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote:
> On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> The code for this is as attached.  Note that I'd rip out the
> >>> normal-path tracking of line boundaries; it seems better to have a
> >>> second scan of the data in the error case and save the cycles in
> >>> non-error cases.
> >>
> >> Really?!
> >
> > Um ... do you have a problem with that idea, and if so what?  It would
> > be considerably more complicated to do it without a second pass.
>
> Could you explain how it's broken now, and why it will be hard to fix?
> People may well want to use a cast to JSON within an exception block
> as a way of testing whether strings are valid JSON.  We should not
> assume that the cost of an exception is totally irrelevant, because
> this might be iterated.

Exception blocks/subtransactions already are considerably expensive.
I have a hard time believing this additional cost would be measurable.

Andres
-- 
Andres Freund		http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 13, 2012 at 11:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote:
>> On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Robert Haas <robertmhaas@gmail.com> writes:
>> >> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >>> The code for this is as attached.  Note that I'd rip out the
>> >>> normal-path tracking of line boundaries; it seems better to have a
>> >>> second scan of the data in the error case and save the cycles in
>> >>> non-error cases.
>> >>
>> >> Really?!
>> >
>> > Um ... do you have a problem with that idea, and if so what?  It would
>> > be considerably more complicated to do it without a second pass.
>>
>> Could you explain how it's broken now, and why it will be hard to fix?
>> People may well want to use a cast to JSON within an exception block
>> as a way of testing whether strings are valid JSON.  We should not
>> assume that the cost of an exception is totally irrelevant, because
>> this might be iterated.

> Exception blocks/subtransactions already are considerably expensive.  I
> have a hard time believing this additional cost would be measurable.

According to my testing, the main cost of an exception block catching
a division-by-zero error is that of generating the error message,
primarily sprintf()-type stuff.  The cost of scanning a multi-megabyte
string seems likely to be much larger.

Mind you, I'm not going to spend a lot of time crying into my beer if
it turns out that there's no other reasonable way to implement this,
but I do think that it's entirely appropriate to ask why it's not
possible to do better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote:
> On Wed, Jun 13, 2012 at 11:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote:
> >> On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> > Robert Haas <robertmhaas@gmail.com> writes:
> >> >> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> >>> The code for this is as attached.  Note that I'd rip out the
> >> >>> normal-path tracking of line boundaries; it seems better to have a
> >> >>> second scan of the data in the error case and save the cycles in
> >> >>> non-error cases.
> >> >>
> >> >> Really?!
> >> >
> >> > Um ... do you have a problem with that idea, and if so what?  It would
> >> > be considerably more complicated to do it without a second pass.
> >>
> >> Could you explain how it's broken now, and why it will be hard to fix?
> >> People may well want to use a cast to JSON within an exception block
> >> as a way of testing whether strings are valid JSON.  We should not
> >> assume that the cost of an exception is totally irrelevant, because
> >> this might be iterated.
> >
> > Exception blocks/subtransactions already are considerably expensive.  I
> > have a hard time believing this additional cost would be measurable.
>
> According to my testing, the main cost of an exception block catching
> a division-by-zero error is that of generating the error message,
> primarily sprintf()-type stuff.  The cost of scanning a multi-megabyte
> string seems likely to be much larger.

True.  I ignored that there doesn't have to be an xid assigned yet...
I still think it's not very relevant though.

Andres
-- 
Andres Freund		http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote:
>> According to my testing, the main cost of an exception block catching
>> a division-by-zero error is that of generating the error message,
>> primarily sprintf()-type stuff.  The cost of scanning a multi-megabyte
>> string seems likely to be much larger.

> True.  I ignored that there doesn't have to be an xid assigned yet...
> I still think it's not very relevant though.

I don't think it's relevant either.  In the first place, I don't think
that errors occurring "multi megabytes" into a JSON blob are going to
occur often enough to be performance critical.  In the second place,
removing cycles from the non-error case is worth something too, probably
more than making the error case faster is worth.

In any case, the proposed scheme for providing context requires that
you know where the error is before you can identify the context.  I
considered schemes that would keep track of the last N characters or
line breaks in case one of them proved to be the one we need.  That
would add enough cycles to non-error cases to almost certainly not be
desirable.  I also considered trying to back up, but that doesn't look
promising either for arbitrary multibyte encodings.

			regards, tom lane
On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote:
>>> According to my testing, the main cost of an exception block catching
>>> a division-by-zero error is that of generating the error message,
>>> primarily sprintf()-type stuff.  The cost of scanning a multi-megabyte
>>> string seems likely to be much larger.
>
>> True.  I ignored that there doesn't have to be an xid assigned yet...
>> I still think it's not very relevant though.
>
> I don't think it's relevant either.  In the first place, I don't think
> that errors occurring "multi megabytes" into a JSON blob are going to
> occur often enough to be performance critical.  In the second place,
> removing cycles from the non-error case is worth something too, probably
> more than making the error case faster is worth.
>
> In any case, the proposed scheme for providing context requires that
> you know where the error is before you can identify the context.  I
> considered schemes that would keep track of the last N characters or
> line breaks in case one of them proved to be the one we need.  That
> would add enough cycles to non-error cases to almost certainly not be
> desirable.  I also considered trying to back up, but that doesn't look
> promising either for arbitrary multibyte encodings.

Oh, I see.  :-(

Well, I guess I'll have to suck it up, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In any case, the proposed scheme for providing context requires that
>> you know where the error is before you can identify the context.  I
>> considered schemes that would keep track of the last N characters or
>> line breaks in case one of them proved to be the one we need.  That
>> would add enough cycles to non-error cases to almost certainly not be
>> desirable.  I also considered trying to back up, but that doesn't look
>> promising either for arbitrary multibyte encodings.

> Oh, I see.  :-(

Attached is a complete proposed patch for this, with some further
adjustments to make the output look a bit more like what we use for
SQL error context printouts (in particular, "..." at both ends of the
excerpt when appropriate).

One thing I did not touch, but am less than happy with, is the wording
of this detail message:

	errdetail("Expected string, number, object, array, true, false, or null, but found \"%s\".",
			  token),

This seems uselessly verbose to me.  It could be argued that
enumerating all the options is helpful to rank JSON novices ... but
unless you know exactly what an "object" is and why that's different
from the other options, it's not all that helpful.  I'm inclined to
think that something like this would be better:

	errdetail("Expected JSON value, but found \"%s\".",
			  token),

Thoughts?
regards, tom lane diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c index e79c294..59335db 100644 *** a/src/backend/utils/adt/json.c --- b/src/backend/utils/adt/json.c *************** typedef struct /* state of JSON lexe *** 43,50 **** char *token_start; /* start of current token within input */ char *token_terminator; /* end of previous or current token */ JsonValueType token_type; /* type of current token, once it's known */ - int line_number; /* current line number (counting from 1) */ - char *line_start; /* start of current line within input (BROKEN!!) */ } JsonLexContext; typedef enum /* states of JSON parser */ --- 43,48 ---- *************** static void json_lex_string(JsonLexConte *** 78,83 **** --- 76,82 ---- static void json_lex_number(JsonLexContext *lex, char *s); static void report_parse_error(JsonParseStack *stack, JsonLexContext *lex); static void report_invalid_token(JsonLexContext *lex); + static int report_json_context(JsonLexContext *lex); static char *extract_mb_char(char *s); static void composite_to_json(Datum composite, StringInfo result, bool use_line_feeds); *************** json_validate_cstring(char *input) *** 185,192 **** /* Set up lexing context. */ lex.input = input; lex.token_terminator = lex.input; - lex.line_number = 1; - lex.line_start = input; /* Set up parse stack. */ stacksize = 32; --- 184,189 ---- *************** json_lex(JsonLexContext *lex) *** 335,345 **** /* Skip leading whitespace. */ s = lex->token_terminator; while (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r') - { - if (*s == '\n') - lex->line_number++; s++; - } lex->token_start = s; /* Determine token type. */ --- 332,338 ---- *************** json_lex(JsonLexContext *lex) *** 350,356 **** { /* End of string. */ lex->token_start = NULL; ! lex->token_terminator = NULL; } else { --- 343,349 ---- { /* End of string. */ lex->token_start = NULL; ! 
lex->token_terminator = s; } else { *************** json_lex(JsonLexContext *lex) *** 397,403 **** /* * We got some sort of unexpected punctuation or an otherwise * unexpected character, so just complain about that one ! * character. */ lex->token_terminator = s + 1; report_invalid_token(lex); --- 390,397 ---- /* * We got some sort of unexpected punctuation or an otherwise * unexpected character, so just complain about that one ! * character. (It can't be multibyte because the above loop ! * will advance over any multibyte characters.) */ lex->token_terminator = s + 1; report_invalid_token(lex); *************** json_lex_string(JsonLexContext *lex) *** 443,453 **** lex->token_terminator = s; report_invalid_token(lex); } ereport(ERROR, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input syntax for type json"), ! errdetail("line %d: Character with value \"0x%02x\" must be escaped.", ! lex->line_number, (unsigned char) *s))); } else if (*s == '\\') { --- 437,449 ---- lex->token_terminator = s; report_invalid_token(lex); } + lex->token_terminator = s + pg_mblen(s); ereport(ERROR, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input syntax for type json"), ! errdetail("Character with value 0x%02x must be escaped.", ! (unsigned char) *s), ! report_json_context(lex))); } else if (*s == '\\') { *************** json_lex_string(JsonLexContext *lex) *** 465,502 **** for (i = 1; i <= 4; i++) { ! if (s[i] == '\0') { ! lex->token_terminator = s + i; report_invalid_token(lex); } ! else if (s[i] >= '0' && s[i] <= '9') ! ch = (ch * 16) + (s[i] - '0'); ! else if (s[i] >= 'a' && s[i] <= 'f') ! ch = (ch * 16) + (s[i] - 'a') + 10; ! else if (s[i] >= 'A' && s[i] <= 'F') ! ch = (ch * 16) + (s[i] - 'A') + 10; else { ereport(ERROR, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input syntax for type json"), ! errdetail("line %d: \"\\u\" must be followed by four hexadecimal digits.", ! 
  					 lex->line_number)));
  				}
  			}
- 
- 			/* Account for the four additional bytes we just parsed. */
- 			s += 4;
  		}
  		else if (strchr("\"\\/bfnrt", *s) == NULL)
  		{
  			/* Not a valid string escape, so error out. */
  			ereport(ERROR,
  					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  					 errmsg("invalid input syntax for type json"),
! 					 errdetail("line %d: Invalid escape \"\\%s\".",
! 							   lex->line_number, extract_mb_char(s))));
  		}
  	}
  }
--- 461,499 ----
  			for (i = 1; i <= 4; i++)
  			{
! 				s++;
! 				if (*s == '\0')
  				{
! 					lex->token_terminator = s;
  					report_invalid_token(lex);
  				}
! 				else if (*s >= '0' && *s <= '9')
! 					ch = (ch * 16) + (*s - '0');
! 				else if (*s >= 'a' && *s <= 'f')
! 					ch = (ch * 16) + (*s - 'a') + 10;
! 				else if (*s >= 'A' && *s <= 'F')
! 					ch = (ch * 16) + (*s - 'A') + 10;
  				else
  				{
+ 					lex->token_terminator = s + pg_mblen(s);
  					ereport(ERROR,
  							(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  							 errmsg("invalid input syntax for type json"),
! 							 errdetail("\"\\u\" must be followed by four hexadecimal digits."),
! 							 report_json_context(lex)));
  				}
  			}
  		}
  		else if (strchr("\"\\/bfnrt", *s) == NULL)
  		{
  			/* Not a valid string escape, so error out. */
+ 			lex->token_terminator = s + pg_mblen(s);
  			ereport(ERROR,
  					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  					 errmsg("invalid input syntax for type json"),
! 					 errdetail("Invalid string escape \"\\%s\".",
! 							   extract_mb_char(s)),
! 					 report_json_context(lex)));
  		}
  	}
  }
*************** json_lex_number(JsonLexContext *lex, cha
*** 599,666 ****
  /*
   * Report a parse error.
   */
  static void
  report_parse_error(JsonParseStack *stack, JsonLexContext *lex)
  {
! 	char	   *detail = NULL;
! 	char	   *token = NULL;
  	int			toklen;
  
  	/* Handle case where the input ended prematurely. */
  	if (lex->token_start == NULL)
  		ereport(ERROR,
  				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 				 errmsg("invalid input syntax for type json: \"%s\"",
! 						lex->input),
! 				 errdetail("The input string ended unexpectedly.")));
  
! 	/* Separate out the offending token. */
  	toklen = lex->token_terminator - lex->token_start;
  	token = palloc(toklen + 1);
  	memcpy(token, lex->token_start, toklen);
  	token[toklen] = '\0';
  
! 	/* Select correct detail message. */
  	if (stack == NULL)
! 		detail = "line %d: Expected end of input, but found \"%s\".";
  	else
  	{
  		switch (stack->state)
  		{
  			case JSON_PARSE_VALUE:
! 				detail = "line %d: Expected string, number, object, array, true, false, or null, but found \"%s\".";
  				break;
  			case JSON_PARSE_ARRAY_START:
! 				detail = "line %d: Expected array element or \"]\", but found \"%s\".";
  				break;
  			case JSON_PARSE_ARRAY_NEXT:
! 				detail = "line %d: Expected \",\" or \"]\", but found \"%s\".";
  				break;
  			case JSON_PARSE_OBJECT_START:
! 				detail = "line %d: Expected string or \"}\", but found \"%s\".";
  				break;
  			case JSON_PARSE_OBJECT_LABEL:
! 				detail = "line %d: Expected \":\", but found \"%s\".";
  				break;
  			case JSON_PARSE_OBJECT_NEXT:
! 				detail = "line %d: Expected \",\" or \"}\", but found \"%s\".";
  				break;
  			case JSON_PARSE_OBJECT_COMMA:
! 				detail = "line %d: Expected string, but found \"%s\".";
  				break;
  		}
  	}
- 
- 	ereport(ERROR,
- 			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
- 			 errmsg("invalid input syntax for type json: \"%s\"",
- 					lex->input),
- 			 detail ? errdetail(detail, lex->line_number, token) : 0));
  }
  
  /*
   * Report an invalid input token.
   */
  static void
  report_invalid_token(JsonLexContext *lex)
--- 596,703 ----
  /*
   * Report a parse error.
+  *
+  * lex->token_start and lex->token_terminator must identify the current token.
   */
  static void
  report_parse_error(JsonParseStack *stack, JsonLexContext *lex)
  {
! 	char	   *token;
  	int			toklen;
  
  	/* Handle case where the input ended prematurely. */
  	if (lex->token_start == NULL)
  		ereport(ERROR,
  				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 				 errmsg("invalid input syntax for type json"),
! 				 errdetail("The input string ended unexpectedly."),
! 				 report_json_context(lex)));
  
! 	/* Separate out the current token. */
  	toklen = lex->token_terminator - lex->token_start;
  	token = palloc(toklen + 1);
  	memcpy(token, lex->token_start, toklen);
  	token[toklen] = '\0';
  
! 	/* Complain, with the appropriate detail message. */
  	if (stack == NULL)
! 		ereport(ERROR,
! 				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 				 errmsg("invalid input syntax for type json"),
! 				 errdetail("Expected end of input, but found \"%s\".",
! 						   token),
! 				 report_json_context(lex)));
  	else
  	{
  		switch (stack->state)
  		{
  			case JSON_PARSE_VALUE:
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 						 errmsg("invalid input syntax for type json"),
! 						 errdetail("Expected string, number, object, array, true, false, or null, but found \"%s\".",
! 								   token),
! 						 report_json_context(lex)));
  				break;
  			case JSON_PARSE_ARRAY_START:
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 						 errmsg("invalid input syntax for type json"),
! 						 errdetail("Expected array element or \"]\", but found \"%s\".",
! 								   token),
! 						 report_json_context(lex)));
  				break;
  			case JSON_PARSE_ARRAY_NEXT:
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 						 errmsg("invalid input syntax for type json"),
! 						 errdetail("Expected \",\" or \"]\", but found \"%s\".",
! 								   token),
! 						 report_json_context(lex)));
  				break;
  			case JSON_PARSE_OBJECT_START:
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 						 errmsg("invalid input syntax for type json"),
! 						 errdetail("Expected string or \"}\", but found \"%s\".",
! 								   token),
! 						 report_json_context(lex)));
  				break;
  			case JSON_PARSE_OBJECT_LABEL:
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 						 errmsg("invalid input syntax for type json"),
! 						 errdetail("Expected \":\", but found \"%s\".",
! 								   token),
! 						 report_json_context(lex)));
  				break;
  			case JSON_PARSE_OBJECT_NEXT:
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 						 errmsg("invalid input syntax for type json"),
! 						 errdetail("Expected \",\" or \"}\", but found \"%s\".",
! 								   token),
! 						 report_json_context(lex)));
  				break;
  			case JSON_PARSE_OBJECT_COMMA:
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
! 						 errmsg("invalid input syntax for type json"),
! 						 errdetail("Expected string, but found \"%s\".",
! 								   token),
! 						 report_json_context(lex)));
  				break;
+ 			default:
+ 				elog(ERROR, "unexpected json parse state: %d",
+ 					 (int) stack->state);
  		}
  	}
  }
  
  /*
   * Report an invalid input token.
+  *
+  * lex->token_start and lex->token_terminator must identify the token.
   */
  static void
  report_invalid_token(JsonLexContext *lex)
*************** report_invalid_token(JsonLexContext *lex
*** 668,673 ****
--- 705,711 ----
  	char	   *token;
  	int			toklen;
  
+ 	/* Separate out the offending token. */
  	toklen = lex->token_terminator - lex->token_start;
  	token = palloc(toklen + 1);
  	memcpy(token, lex->token_start, toklen);
*************** report_invalid_token(JsonLexContext *lex
*** 676,683 ****
  	ereport(ERROR,
  			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  			 errmsg("invalid input syntax for type json"),
! 			 errdetail("line %d: Token \"%s\" is invalid.",
! 					   lex->line_number, token)));
  }
  
  /*
--- 714,793 ----
  	ereport(ERROR,
  			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  			 errmsg("invalid input syntax for type json"),
! 			 errdetail("Token \"%s\" is invalid.", token),
! 			 report_json_context(lex)));
! }
! 
! /*
!  * Report a CONTEXT line for bogus JSON input.
!  *
!  * lex->token_terminator must be set to identify the spot where we detected
!  * the error.  Note that lex->token_start might be NULL, in case we recognized
!  * error at EOF.
!  *
!  * The return value isn't meaningful, but we make it non-void so that this
!  * can be invoked inside ereport().
!  */
! static int
! report_json_context(JsonLexContext *lex)
! {
! 	const char *context_start;
! 	const char *context_end;
! 	const char *line_start;
! 	int			line_number;
! 	char	   *ctxt;
! 	int			ctxtlen;
! 	const char *prefix;
! 	const char *suffix;
! 
! 	/* Choose boundaries for the part of the input we will display */
! 	context_start = lex->input;
! 	context_end = lex->token_terminator;
! 	line_start = context_start;
! 	line_number = 1;
! 	for (;;)
! 	{
! 		/* Always advance over newlines (context_end test is just paranoia) */
! 		if (*context_start == '\n' && context_start < context_end)
! 		{
! 			context_start++;
! 			line_start = context_start;
! 			line_number++;
! 			continue;
! 		}
! 		/* Otherwise, done as soon as we are close enough to context_end */
! 		if (context_end - context_start < 50)
! 			break;
! 		/* Advance to next multibyte character */
! 		if (IS_HIGHBIT_SET(*context_start))
! 			context_start += pg_mblen(context_start);
! 		else
! 			context_start++;
! 	}
! 
! 	/*
! 	 * We add "..." to indicate that the excerpt doesn't start at the
! 	 * beginning of the line ... but if we're within 3 characters of the
! 	 * beginning of the line, we might as well just show the whole line.
! 	 */
! 	if (context_start - line_start <= 3)
! 		context_start = line_start;
! 
! 	/* Get a null-terminated copy of the data to present */
! 	ctxtlen = context_end - context_start;
! 	ctxt = palloc(ctxtlen + 1);
! 	memcpy(ctxt, context_start, ctxtlen);
! 	ctxt[ctxtlen] = '\0';
! 
! 	/*
! 	 * Show the context, prefixing "..." if not starting at start of line, and
! 	 * suffixing "..." if not ending at end of line.
! 	 */
! 	prefix = (context_start > line_start) ? "..." : "";
! 	suffix = (*context_end != '\0' && *context_end != '\n' && *context_end != '\r') ? "..." : "";
! 
! 	return errcontext("JSON data, line %d: %s%s%s",
! 					  line_number, prefix, ctxt, suffix);
  }
  
  /*
diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
index 4b1ad89..1f79e05 100644
*** a/src/test/regress/expected/json.out
--- b/src/test/regress/expected/json.out
*************** SELECT $$''$$::json;			-- ERROR, single
*** 9,15 ****
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT $$''$$::json;
                 ^
! DETAIL:  line 1: Token "'" is invalid.
  SELECT '"abc"'::json;			-- OK
   json  
 -------
--- 9,16 ----
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT $$''$$::json;
                 ^
! DETAIL:  Token "'" is invalid.
! CONTEXT:  JSON data, line 1: '...
  SELECT '"abc"'::json;			-- OK
   json  
 -------
*************** SELECT '"abc'::json;			-- ERROR, quotes
*** 20,32 ****
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"abc'::json;
                 ^
! DETAIL:  line 1: Token ""abc" is invalid.
  SELECT '"abc
  def"'::json;			-- ERROR, unescaped newline in string constant
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"abc
                 ^
! DETAIL:  line 1: Character with value "0x0a" must be escaped.
  SELECT '"\n\"\\"'::json;		-- OK, legal escapes
     json   
 ----------
--- 21,36 ----
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"abc'::json;
                 ^
! DETAIL:  Token ""abc" is invalid.
! CONTEXT:  JSON data, line 1: "abc
  SELECT '"abc
  def"'::json;			-- ERROR, unescaped newline in string constant
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"abc
                 ^
! DETAIL:  Character with value 0x0a must be escaped.
! CONTEXT:  JSON data, line 1: "abc
! ...
  SELECT '"\n\"\\"'::json;		-- OK, legal escapes
     json   
 ----------
*************** SELECT '"\v"'::json;			-- ERROR, not a v
*** 37,58 ****
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\v"'::json;
                 ^
! DETAIL:  line 1: Invalid escape "\v".
  SELECT '"\u"'::json;			-- ERROR, incomplete escape
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\u"'::json;
                 ^
! DETAIL:  line 1: "\u" must be followed by four hexadecimal digits.
  SELECT '"\u00"'::json;			-- ERROR, incomplete escape
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\u00"'::json;
                 ^
! DETAIL:  line 1: "\u" must be followed by four hexadecimal digits.
  SELECT '"\u000g"'::json;		-- ERROR, g is not a hex digit
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\u000g"'::json;
                 ^
! DETAIL:  line 1: "\u" must be followed by four hexadecimal digits.
  SELECT '"\u0000"'::json;		-- OK, legal escape
    json   
 ----------
--- 41,66 ----
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\v"'::json;
                 ^
! DETAIL:  Invalid string escape "\v".
! CONTEXT:  JSON data, line 1: "\v...
  SELECT '"\u"'::json;			-- ERROR, incomplete escape
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\u"'::json;
                 ^
! DETAIL:  "\u" must be followed by four hexadecimal digits.
! CONTEXT:  JSON data, line 1: "\u"
  SELECT '"\u00"'::json;			-- ERROR, incomplete escape
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\u00"'::json;
                 ^
! DETAIL:  "\u" must be followed by four hexadecimal digits.
! CONTEXT:  JSON data, line 1: "\u00"
  SELECT '"\u000g"'::json;		-- ERROR, g is not a hex digit
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '"\u000g"'::json;
                 ^
! DETAIL:  "\u" must be followed by four hexadecimal digits.
! CONTEXT:  JSON data, line 1: "\u000g...
  SELECT '"\u0000"'::json;		-- OK, legal escape
    json   
 ----------
*************** SELECT '01'::json;			-- ERROR, not vali
*** 82,88 ****
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '01'::json;
                 ^
! DETAIL:  line 1: Token "01" is invalid.
  SELECT '0.1'::json;			-- OK
  json 
 ------
--- 90,97 ----
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '01'::json;
                 ^
! DETAIL:  Token "01" is invalid.
! CONTEXT:  JSON data, line 1: 01
  SELECT '0.1'::json;			-- OK
  json 
 ------
*************** SELECT '1f2'::json;			-- ERROR
*** 111,127 ****
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '1f2'::json;
                 ^
! DETAIL:  line 1: Token "1f2" is invalid.
  SELECT '0.x1'::json;			-- ERROR
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '0.x1'::json;
                 ^
! DETAIL:  line 1: Token "0.x1" is invalid.
  SELECT '1.3ex100'::json;		-- ERROR
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '1.3ex100'::json;
                 ^
! DETAIL:  line 1: Token "1.3ex100" is invalid.
  -- Arrays.
  SELECT '[]'::json;			-- OK
  json 
--- 120,139 ----
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '1f2'::json;
                 ^
! DETAIL:  Token "1f2" is invalid.
! CONTEXT:  JSON data, line 1: 1f2
  SELECT '0.x1'::json;			-- ERROR
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '0.x1'::json;
                 ^
! DETAIL:  Token "0.x1" is invalid.
! CONTEXT:  JSON data, line 1: 0.x1
  SELECT '1.3ex100'::json;		-- ERROR
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '1.3ex100'::json;
                 ^
! DETAIL:  Token "1.3ex100" is invalid.
! CONTEXT:  JSON data, line 1: 1.3ex100
  -- Arrays.
  SELECT '[]'::json;			-- OK
  json 
*************** SELECT '[1,2]'::json;			-- OK
*** 142,161 ****
  (1 row)
  
  SELECT '[1,2,]'::json;			-- ERROR, trailing comma
! ERROR:  invalid input syntax for type json: "[1,2,]"
  LINE 1: SELECT '[1,2,]'::json;
                 ^
! DETAIL:  line 1: Expected string, number, object, array, true, false, or null, but found "]".
  SELECT '[1,2'::json;			-- ERROR, no closing bracket
! ERROR:  invalid input syntax for type json: "[1,2"
  LINE 1: SELECT '[1,2'::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
  SELECT '[1,[2]'::json;			-- ERROR, no closing bracket
! ERROR:  invalid input syntax for type json: "[1,[2]"
  LINE 1: SELECT '[1,[2]'::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
  -- Objects.
  SELECT '{}'::json;			-- OK
  json 
--- 154,176 ----
  (1 row)
  
  SELECT '[1,2,]'::json;			-- ERROR, trailing comma
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '[1,2,]'::json;
                 ^
! DETAIL:  Expected string, number, object, array, true, false, or null, but found "]".
! CONTEXT:  JSON data, line 1: [1,2,]
  SELECT '[1,2'::json;			-- ERROR, no closing bracket
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '[1,2'::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
+ CONTEXT:  JSON data, line 1: [1,2
  SELECT '[1,[2]'::json;			-- ERROR, no closing bracket
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '[1,[2]'::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
+ CONTEXT:  JSON data, line 1: [1,[2]
  -- Objects.
  SELECT '{}'::json;			-- OK
  json 
*************** SELECT '{}'::json;			-- OK
*** 164,173 ****
  (1 row)
  
  SELECT '{"abc"}'::json;			-- ERROR, no value
! ERROR:  invalid input syntax for type json: "{"abc"}"
  LINE 1: SELECT '{"abc"}'::json;
                 ^
! DETAIL:  line 1: Expected ":", but found "}".
  SELECT '{"abc":1}'::json;		-- OK
    json   
 -----------
--- 179,189 ----
  (1 row)
  
  SELECT '{"abc"}'::json;			-- ERROR, no value
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{"abc"}'::json;
                 ^
! DETAIL:  Expected ":", but found "}".
! CONTEXT:  JSON data, line 1: {"abc"}
  SELECT '{"abc":1}'::json;		-- OK
    json   
 -----------
*************** SELECT '{"abc":1}'::json;		-- OK
*** 175,199 ****
  (1 row)
  
  SELECT '{1:"abc"}'::json;		-- ERROR, keys must be strings
! ERROR:  invalid input syntax for type json: "{1:"abc"}"
  LINE 1: SELECT '{1:"abc"}'::json;
                 ^
! DETAIL:  line 1: Expected string or "}", but found "1".
  SELECT '{"abc",1}'::json;		-- ERROR, wrong separator
! ERROR:  invalid input syntax for type json: "{"abc",1}"
  LINE 1: SELECT '{"abc",1}'::json;
                 ^
! DETAIL:  line 1: Expected ":", but found ",".
  SELECT '{"abc"=1}'::json;		-- ERROR, totally wrong separator
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{"abc"=1}'::json;
                 ^
! DETAIL:  line 1: Token "=" is invalid.
  SELECT '{"abc"::1}'::json;		-- ERROR, another wrong separator
! ERROR:  invalid input syntax for type json: "{"abc"::1}"
  LINE 1: SELECT '{"abc"::1}'::json;
                 ^
! DETAIL:  line 1: Expected string, number, object, array, true, false, or null, but found ":".
  SELECT '{"abc":1,"def":2,"ghi":[3,4],"hij":{"klm":5,"nop":[6]}}'::json; -- OK
                            json                           
 ---------------------------------------------------------
--- 191,219 ----
  (1 row)
  
  SELECT '{1:"abc"}'::json;		-- ERROR, keys must be strings
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{1:"abc"}'::json;
                 ^
! DETAIL:  Expected string or "}", but found "1".
! CONTEXT:  JSON data, line 1: {1...
  SELECT '{"abc",1}'::json;		-- ERROR, wrong separator
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{"abc",1}'::json;
                 ^
! DETAIL:  Expected ":", but found ",".
! CONTEXT:  JSON data, line 1: {"abc",...
  SELECT '{"abc"=1}'::json;		-- ERROR, totally wrong separator
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{"abc"=1}'::json;
                 ^
! DETAIL:  Token "=" is invalid.
! CONTEXT:  JSON data, line 1: {"abc"=...
  SELECT '{"abc"::1}'::json;		-- ERROR, another wrong separator
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{"abc"::1}'::json;
                 ^
! DETAIL:  Expected string, number, object, array, true, false, or null, but found ":".
! CONTEXT:  JSON data, line 1: {"abc"::...
  SELECT '{"abc":1,"def":2,"ghi":[3,4],"hij":{"klm":5,"nop":[6]}}'::json; -- OK
                            json                           
 ---------------------------------------------------------
*************** SELECT '{"abc":1,"def":2,"ghi":[3,4],"hi
*** 201,215 ****
  (1 row)
  
  SELECT '{"abc":1:2}'::json;		-- ERROR, colon in wrong spot
! ERROR:  invalid input syntax for type json: "{"abc":1:2}"
  LINE 1: SELECT '{"abc":1:2}'::json;
                 ^
! DETAIL:  line 1: Expected "," or "}", but found ":".
  SELECT '{"abc":1,3}'::json;		-- ERROR, no value
! ERROR:  invalid input syntax for type json: "{"abc":1,3}"
  LINE 1: SELECT '{"abc":1,3}'::json;
                 ^
! DETAIL:  line 1: Expected string, but found "3".
  -- Miscellaneous stuff.
  SELECT 'true'::json;			-- OK
  json 
--- 221,237 ----
  (1 row)
  
  SELECT '{"abc":1:2}'::json;		-- ERROR, colon in wrong spot
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{"abc":1:2}'::json;
                 ^
! DETAIL:  Expected "," or "}", but found ":".
! CONTEXT:  JSON data, line 1: {"abc":1:...
  SELECT '{"abc":1,3}'::json;		-- ERROR, no value
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT '{"abc":1,3}'::json;
                 ^
! DETAIL:  Expected string, but found "3".
! CONTEXT:  JSON data, line 1: {"abc":1,3...
  -- Miscellaneous stuff.
  SELECT 'true'::json;			-- OK
  json 
*************** SELECT ' true '::json;		-- OK, even wit
*** 236,270 ****
  (1 row)
  
  SELECT 'true false'::json;		-- ERROR, too many values
! ERROR:  invalid input syntax for type json: "true false"
  LINE 1: SELECT 'true false'::json;
                 ^
! DETAIL:  line 1: Expected end of input, but found "false".
  SELECT 'true, false'::json;		-- ERROR, too many values
! ERROR:  invalid input syntax for type json: "true, false"
  LINE 1: SELECT 'true, false'::json;
                 ^
! DETAIL:  line 1: Expected end of input, but found ",".
  SELECT 'truf'::json;			-- ERROR, not a keyword
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT 'truf'::json;
                 ^
! DETAIL:  line 1: Token "truf" is invalid.
  SELECT 'trues'::json;			-- ERROR, not a keyword
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT 'trues'::json;
                 ^
! DETAIL:  line 1: Token "trues" is invalid.
  SELECT ''::json;			-- ERROR, no value
! ERROR:  invalid input syntax for type json: ""
  LINE 1: SELECT ''::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
  SELECT ' '::json;			-- ERROR, no value
! ERROR:  invalid input syntax for type json: " "
  LINE 1: SELECT ' '::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
  --constructors
  -- array_to_json
  SELECT array_to_json(array(select 1 as a));
--- 258,298 ----
  (1 row)
  
  SELECT 'true false'::json;		-- ERROR, too many values
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT 'true false'::json;
                 ^
! DETAIL:  Expected end of input, but found "false".
! CONTEXT:  JSON data, line 1: true false
  SELECT 'true, false'::json;		-- ERROR, too many values
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT 'true, false'::json;
                 ^
! DETAIL:  Expected end of input, but found ",".
! CONTEXT:  JSON data, line 1: true,...
  SELECT 'truf'::json;			-- ERROR, not a keyword
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT 'truf'::json;
                 ^
! DETAIL:  Token "truf" is invalid.
! CONTEXT:  JSON data, line 1: truf
  SELECT 'trues'::json;			-- ERROR, not a keyword
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT 'trues'::json;
                 ^
! DETAIL:  Token "trues" is invalid.
! CONTEXT:  JSON data, line 1: trues
  SELECT ''::json;			-- ERROR, no value
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT ''::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
+ CONTEXT:  JSON data, line 1: 
  SELECT ' '::json;			-- ERROR, no value
! ERROR:  invalid input syntax for type json
  LINE 1: SELECT ' '::json;
                 ^
  DETAIL:  The input string ended unexpectedly.
+ CONTEXT:  JSON data, line 1: 
  --constructors
  -- array_to_json
  SELECT array_to_json(array(select 1 as a));
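[Editor's note: the heart of the patch is the excerpt-selection loop in report_json_context(). A simplified standalone model of that loop is sketched below, ASCII-only and heap-free; the real code also steps over multibyte characters with pg_mblen() and allocates with palloc(). The name json_context is invented for this sketch and does not appear in the patch.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Simplified model of report_json_context(): given the whole input and the
 * offset at which the error was detected, build the "JSON data, line N: ..."
 * excerpt.  Writes the result into buf and returns it.
 */
static char *
json_context(const char *input, size_t errpos, char *buf, size_t buflen)
{
	const char *context_start = input;
	const char *context_end = input + errpos;	/* point of error detection */
	const char *line_start = input;
	int			line_number = 1;
	const char *prefix;
	const char *suffix;

	/*
	 * Advance until we are within 50 characters of the error point,
	 * counting the newlines we skip so the excerpt is labeled with the
	 * right line number.
	 */
	for (;;)
	{
		if (*context_start == '\n' && context_start < context_end)
		{
			context_start++;
			line_start = context_start;
			line_number++;
			continue;
		}
		if (context_end - context_start < 50)
			break;
		context_start++;
	}

	/* Within 3 characters of the line start?  Just show the whole line. */
	if (context_start - line_start <= 3)
		context_start = line_start;

	/* "..." prefix if the line was trimmed, "..." suffix if input goes on. */
	prefix = (context_start > line_start) ? "..." : "";
	suffix = (*context_end != '\0' && *context_end != '\n' &&
			  *context_end != '\r') ? "..." : "";

	snprintf(buf, buflen, "JSON data, line %d: %s%.*s%s",
			 line_number, prefix, (int) (context_end - context_start),
			 context_start, suffix);
	return buf;
}
```

For example, json_context("{\"abc\"}", 7, ...) yields `JSON data, line 1: {"abc"}` (error at end of input, no suffix), while json_context("''", 1, ...) yields `JSON data, line 1: '...`, matching the regression outputs above.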
On Wed, Jun 13, 2012 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> In any case, the proposed scheme for providing context requires that
>>> you know where the error is before you can identify the context.  I
>>> considered schemes that would keep track of the last N characters or
>>> line breaks in case one of them proved to be the one we need.  That
>>> would add enough cycles to non-error cases to almost certainly not be
>>> desirable.  I also considered trying to back up, but that doesn't look
>>> promising either for arbitrary multibyte encodings.
>
>> Oh, I see.  :-(
>
> Attached is a complete proposed patch for this, with some further
> adjustments to make the output look a bit more like what we use for
> SQL error context printouts (in particular, "..." at both ends of the
> excerpt when appropriate).

I like some of these changes - in particular, the use of errcontext(),
but some of them still seem off.

! DETAIL:  Token "'" is invalid.
! CONTEXT:  JSON data, line 1: '...

This doesn't make sense to me.

! DETAIL:  Character with value 0x0a must be escaped.
! CONTEXT:  JSON data, line 1: "abc
! ...

This seems an odd way to present this, especially if the goal is to
NOT include the character needing escaping in the log unescaped, which
I thought was the point of saying 0x0a.

! DETAIL:  "\u" must be followed by four hexadecimal digits.
! CONTEXT:  JSON data, line 1: "\u000g...

Why does this end in ... even though there's nothing further in the
input string?

> One thing I did not touch, but am less than happy with, is the wording
> of this detail message:
>
>                 errdetail("Expected string, number, object, array, true, false, or null, but found \"%s\".",
>                           token),
>
> This seems uselessly verbose to me.  It could be argued that enumerating
> all the options is helpful to rank JSON novices ... but unless you know
> exactly what an "object" is and why that's different from the other
> options, it's not all that helpful.  I'm inclined to think that
> something like this would be better:
>
>                 errdetail("Expected JSON value, but found \"%s\".",
>                           token),
>
> Thoughts?

Good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I like some of these changes - in particular, the use of errcontext(),
> but some of them still seem off.

> ! DETAIL:  Token "'" is invalid.
> ! CONTEXT:  JSON data, line 1: '...

> This doesn't make sense to me.

Well, the input is two single quotes and the complaint is that the first
one of them doesn't constitute a valid JSON token.  What would you
expect to see instead?

I considered putting double quotes around the excerpt text, but we do
not do that in SQL error-context reports, and it seems likely to just be
confusing given the prevalence of double quotes in JSON data.  But happy
to consider opposing arguments.

Another thing that could be done here is to print the rest of the line,
rather than "...", if there's not very much of it.  I'm not sure that's
an improvement though.  The advantage of the proposed design is that the
point where the excerpt ends is exactly where the error was detected;
lacking an error cursor, I don't see how else to present that info.

> ! DETAIL:  Character with value 0x0a must be escaped.
> ! CONTEXT:  JSON data, line 1: "abc
> ! ...

> This seems an odd way to present this, especially if the goal is to
> NOT include the character needing escaping in the log unescaped, which
> I thought was the point of saying 0x0a.

Do you think it would be better to present something that isn't what the
user typed?  Again, I don't see an easy improvement here.  If you don't
want newlines in the logged context, what will we do for something like

{"foo": {
"bar":44
}
]

There basically isn't any useful context to present unless we are
willing to back up several lines, and I don't think it'll be more
readable if we escape all the newlines.

> ! DETAIL:  "\u" must be followed by four hexadecimal digits.
> ! CONTEXT:  JSON data, line 1: "\u000g...

> Why does this end in ... even though there's nothing further in the
> input string?

There is something further, namely the trailing double quote.

The examples in the regression tests are not really designed to show
cases where this type of error reporting is an improvement, because
there's hardly any context around the error sites.

			regards, tom lane
On Wed, Jun 13, 2012 at 1:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I like some of these changes - in particular, the use of errcontext(),
>> but some of them still seem off.
>
>> ! DETAIL:  Token "'" is invalid.
>> ! CONTEXT:  JSON data, line 1: '...
>
>> This doesn't make sense to me.
>
> Well, the input is two single quotes and the complaint is that the first
> one of them doesn't constitute a valid JSON token.  What would you
> expect to see instead?

Oh, I see.

> Another thing that could be done here is to print the rest of the line,
> rather than "...", if there's not very much of it.  I'm not sure that's
> an improvement though.  The advantage of the proposed design is that the
> point where the excerpt ends is exactly where the error was detected;
> lacking an error cursor, I don't see how else to present that info.

OK.

>> ! DETAIL:  Character with value 0x0a must be escaped.
>> ! CONTEXT:  JSON data, line 1: "abc
>> ! ...
>
>> This seems an odd way to present this, especially if the goal is to
>> NOT include the character needing escaping in the log unescaped, which
>> I thought was the point of saying 0x0a.
>
> Do you think it would be better to present something that isn't what the
> user typed?  Again, I don't see an easy improvement here.  If you don't
> want newlines in the logged context, what will we do for something like
>
> {"foo": {
> "bar":44
> }
> ]
>
> There basically isn't any useful context to present unless we are
> willing to back up several lines, and I don't think it'll be more
> readable if we escape all the newlines.

Hmm.  If your plan is to trace back to the opening brace you were
expecting to match, I don't think that's going to work either.  What
if there are three pages (or 3MB) of data in between?

> The examples in the regression tests are not really designed to show
> cases where this type of error reporting is an improvement, because
> there's hardly any context around the error sites.

Yeah, true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jun 13, 2012 at 1:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> ! DETAIL:  Character with value 0x0a must be escaped.
>>> ! CONTEXT:  JSON data, line 1: "abc
>>> ! ...
>>>
>>> This seems an odd way to present this, especially if the goal is to
>>> NOT include the character needing escaping in the log unescaped, which
>>> I thought was the point of saying 0x0a.

>> Do you think it would be better to present something that isn't what the
>> user typed?  Again, I don't see an easy improvement here.  If you don't
>> want newlines in the logged context, what will we do for something like
>>
>> {"foo": {
>> "bar":44
>> }
>> ]

> Hmm.  If your plan is to trace back to the opening brace you were
> expecting to match, I don't think that's going to work either.  What
> if there are three pages (or 3MB) of data in between?

No, that's not the proposal; I only anticipate printing a few dozen
characters of context.  But that could still mean printing something
like

DETAIL: expected "," or "}", but found "]".
CONTEXT: JSON data, line 123: ..."bar":44
}
]

which I argue is much more useful than just seeing the "]".  So the
question is whether it's still as useful if we mangle the whitespace.
I'm thinking it's not.  We don't mangle whitespace when printing SQL
statements into the log, anyway.

			regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes:
>>> ! DETAIL:  Character with value 0x0a must be escaped.
>>> ! CONTEXT:  JSON data, line 1: "abc
>>> ! ...
>>>
>>> This seems an odd way to present this, especially if the goal is to
>>> NOT include the character needing escaping in the log unescaped, which
>>> I thought was the point of saying 0x0a.

I thought of a simple way to address that objection for this particular
case: we can just truncate the context display *at* the offending
character, instead of *after* it.  This is playing a bit fast and loose
with the general principle that the context should end at the point of
detection of the error; but given that the character in question is
always unprintable, I think it's probably not going to bother anyone.

I've gone ahead and committed this before branching, though I'm
certainly still willing to entertain suggestions for further
improvement.

			regards, tom lane
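[Editor's note: the difference between truncating the context *after* versus *at* the offending character can be sketched in isolation. The helper below, context_excerpt, is invented for this illustration and fixes the line number at 1; the real logic lives inside report_json_context() in json.c.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a one-line context excerpt ending either just after the character
 * where the error was detected (include_stop = 1, the earlier behavior) or
 * just at it (include_stop = 0, the committed fix for unprintable
 * characters).  Writes the result into buf and returns it.
 */
static char *
context_excerpt(const char *input, const char *stop, int include_stop,
				char *buf, size_t buflen)
{
	int			len = (int) (stop - input) + (include_stop ? 1 : 0);
	const char *end = input + len;
	/* "..." only if the input continues past the excerpt */
	const char *suffix = (*end != '\0' && *end != '\n' && *end != '\r')
		? "..." : "";

	snprintf(buf, buflen, "JSON data, line 1: %.*s%s", len, input, suffix);
	return buf;
}
```

With the unescaped-newline example from the regression tests, the JSON input `"abc<newline>def"` and the error detected at the newline, including the stop character embeds the raw newline in the CONTEXT line, while stopping at it yields the clean `JSON data, line 1: "abc`.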