Thread: array_in sub function ReadArrayDimensions error message

array_in sub function ReadArrayDimensions error message

From
jian he
Date:
while reviewing the json query doc,
I found out the following error message was not quite right.

select '[1,2]'::int[];
ERROR:  malformed array literal: "[1,2]"
LINE 1: select '[1,2]'::int[];
               ^
DETAIL:  Missing "]" after array dimensions.
should it be:
"Missing delimiter ":" while specifying array dimensions."
?
Because here, the first problem is the comma should be colon.


also
"DETAIL:  Missing "]" after array dimensions."
should be
DETAIL:  Missing "]" while specifying array dimensions.
?



Re: array_in sub function ReadArrayDimensions error message

From
Tom Lane
Date:
jian he <jian.universality@gmail.com> writes:
> while reviewing the json query doc,
> I found out the following error message was not quite right.

> select '[1,2]'::int[];
> ERROR:  malformed array literal: "[1,2]"
> LINE 1: select '[1,2]'::int[];
>                ^
> DETAIL:  Missing "]" after array dimensions.

> should it be:
> "Missing delimiter ":" while specifying array dimensions."

That's presuming quite a lot about the nature of the error.
All the code knows is that what follows the "1" should be
either ":" or "]", and when it sees "," instead it throws
this error.  I agree the existing message isn't great, but
trying to be more specific could confuse people even more
if the more-specific message doesn't apply either.

One possibility could be

         if (*p != ']')
             ereturn(escontext, false,
                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                      errmsg("malformed array literal: \"%s\"", origStr),
+                     (strchr(p, ']') != NULL) ?
+                     errdetail("Array dimensions have invalid syntax.") :
                      errdetail("Missing \"%s\" after array dimensions.",
                                "]")));

that is, only say "Missing "]"" if there's no ']' anywhere, and
otherwise just say the dimensions are wrong.  This could be fooled
by a ']' that's part of some string in the data, but even then the
errdetail isn't wrong.

Or we could just say "Array dimensions have invalid syntax."
unconditionally.

            regards, tom lane



On Mon, Jul 8, 2024 at 10:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> jian he <jian.universality@gmail.com> writes:
> > while reviewing the json query doc,
> > I found out the following error message was not quite right.
>
> > select '[1,2]'::int[];
> > ERROR:  malformed array literal: "[1,2]"
> > LINE 1: select '[1,2]'::int[];
> >                ^
> > DETAIL:  Missing "]" after array dimensions.
>
> > should it be:
> > "Missing delimiter ":" while specifying array dimensions."
>
> That's presuming quite a lot about the nature of the error.
> All the code knows is that what follows the "1" should be
> either ":" or "]", and when it sees "," instead it throws
> this error.  I agree the existing message isn't great, but
> trying to be more specific could confuse people even more
> if the more-specific message doesn't apply either.
>
> One possibility could be
>
>          if (*p != ']')
>              ereturn(escontext, false,
>                      (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
>                       errmsg("malformed array literal: \"%s\"", origStr),
> +                     (strchr(p, ']') != NULL) ?
> +                     errdetail("Array dimensions have invalid syntax.") :
>                       errdetail("Missing \"%s\" after array dimensions.",
>                                 "]")));
>
> that is, only say "Missing "]"" if there's no ']' anywhere, and
> otherwise just say the dimensions are wrong.  This could be fooled
> by a ']' that's part of some string in the data, but even then the
> errdetail isn't wrong.
>
> Or we could just say "Array dimensions have invalid syntax."
> unconditionally.
>
>                         regards, tom lane


we can
if (*p == ':')
{
....

if (*p != ']')
ereturn(escontext, false,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("malformed array literal: \"%s\"", origStr),
errdetail("Missing \"%s\" after array dimensions.",
"]")));
}
else
{
if (*p != ']')
ereturn(escontext, false,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("malformed array literal: \"%s\"", origStr),
errdetail("Missing delimiter \"%s\" while specifying array dimensions.",
":")));
}


different error message in IF ELSE blocks.
please check attached.

Attachment

Re: array_in sub function ReadArrayDimensions error message

From
Tom Lane
Date:
jian he <jian.universality@gmail.com> writes:
> On Mon, Jul 8, 2024 at 10:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> One possibility could be
>> ...
>> that is, only say "Missing "]"" if there's no ']' anywhere, and
>> otherwise just say the dimensions are wrong.  This could be fooled
>> by a ']' that's part of some string in the data, but even then the
>> errdetail isn't wrong.
>> 
>> Or we could just say "Array dimensions have invalid syntax."
>> unconditionally.

> please check attached.

Meh.  This just replaces one possibly-off-point error message
with a different one that will be off-point in a different set
of cases.  I think we should simply reduce the specificity of
the message.

BTW, I think we have pretty much the same issue with respect
to the check for "=" later on.  For instance, if someone
is confused about whether to insert commas:

=# select '[1:1],[1:2]={{3,4}}'::int[];
ERROR:  malformed array literal: "[1:1],[1:2]={{3,4}}"
LINE 1: select  '[1:1],[1:2]={{3,4}}'::int[];
                ^
DETAIL:  Missing "=" after array dimensions.

Here again, the problem is not a missing "=", it's invalid
syntax somewhere before that.

Another thing we could consider doing here (and similarly
for your original case) is

DETAIL:  Expected "=" not "," after array dimensions.

The advantage of this is that it provides a little more
clarity as to exactly where things went wrong.

            regards, tom lane



Re: array_in sub function ReadArrayDimensions error message

From
"David G. Johnston"
Date:
On Tue, Jul 9, 2024 at 8:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here again, the problem is not a missing "=", it's invalid
syntax somewhere before that.

Another thing we could consider doing here (and similarly
for your original case) is

DETAIL:  Expected "=" not "," after array dimensions.

The advantage of this is that it provides a little more
clarity as to exactly where things went wrong.


One possibility all this ignores is that what we are calling array-dimensions are in reality most likely a user using json array syntax in our SQL arrays.  That seems eminently more likely than someone mis-typing this niche incantation of building an array literal nowadays.  I'd add a hint if the first symbol is [ and we fail to get to the point of actually seeing the equal sign or the first subsequent unquoted symbol is a comma instead of a colon.

David J.

Re: array_in sub function ReadArrayDimensions error message

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> One possibility all this ignores is that what we are calling
> array-dimensions are in reality most likely a user using json array syntax
> in our SQL arrays.  That seems eminently more likely than someone
> mis-typing this niche incantation of building an array literal nowadays.

Yeah, that's a good point.

> I'd add a hint if the first symbol is [ and we fail to get to the point of
> actually seeing the equal sign or the first subsequent unquoted symbol is a
> comma instead of a colon.

That seems closely related to my suggestion of applying strchr() to
see if the character we are expecting to see actually appears
anywhere.  Or did you have something else in mind?  Please be
specific, both about the test you are thinking of and how the
message should be worded.

            regards, tom lane



Re: array_in sub function ReadArrayDimensions error message

From
"David G. Johnston"
Date:
On Tue, Jul 9, 2024 at 8:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:

> I'd add a hint if the first symbol is [ and we fail to get to the point of
> actually seeing the equal sign or the first subsequent unquoted symbol is a
> comma instead of a colon.

That seems closely related to my suggestion of applying strchr() to
see if the character we are expecting to see actually appears
anywhere.  Or did you have something else in mind?  Please be
specific, both about the test you are thinking of and how the
message should be worded.


Pseudo-code, the syntax for adding a conditional errhint I didn't try to learn along with figuring out the logic.  Also not totally sure on the ReadDimensionInt behavior interplay here.

In short, the ambiguous first dimension [n] case is cleared by seeing either a second dimension or the equal sign separator. (ToDo: see how end-of-string get resolved)

The [n:m] first dimension case clears as soon as the colon is seen.

The normal, undimensioned case clears once the first non-blank character is not a [

If we error before clearing we assume the query author provided an SQL array using json syntax and point out that the delimiters for SQL arrays are {}.
We also assume, in the single bounds case, that a too-large upper-bound value means they did intend to supply a single number in a json array format.  We would need to modify these tests so they occur after checking whether the next part of the string is [ or = but, in the [ case, before processing the next dimension.

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index d6641b570d..0ac1eabba1 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -404,7 +404,7 @@ ReadArrayDimensions(char **srcptr, int *ndim_p, int *dim, int *lBound,
 {
        char       *p = *srcptr;
        int                     ndim;
-
+    bool couldBeJson = true;
        /*
         * Dimension info takes the form of one or more [n] or [m:n] items.  This
         * loop iterates once per dimension item.
@@ -422,8 +422,19 @@ ReadArrayDimensions(char **srcptr, int *ndim_p, int *dim, int *lBound,
                 */
                while (scanner_isspace(*p))
                        p++;
+               if (*p == '=') {
+                       //if we have an equals sign we are not dealing with a json array
+                       couldBeJson = false;
+                       break; // and we are at the end of the bounds specification for the SQL array literal we do have
+               }
                if (*p != '[')
-                       break;                          /* no more dimension items */
+               {
+                       couldBeJson = false; // json arrays will start with [
+                       break;  
+                       // on subsequent passes we better have either this or an equal sign and the later is covered above
+               }                       /* no (more?) dimension items */
+               if (ndim > 0)
+                       couldBeJson = false; //multi-dimensional arrays specs are invalid json arrays
                p++;
                if (ndim >= MAXDIM)
                        ereturn(escontext, false,
@@ -438,11 +449,18 @@ ReadArrayDimensions(char **srcptr, int *ndim_p, int *dim, int *lBound,
                        ereturn(escontext, false,
                                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                         errmsg("malformed array literal: \"%s\"", origStr),
-                                        errdetail("\"[\" must introduce explicitly-specified array dimensions.")));
+                                        errdetail("\"[\" must introduce explicitly-specified array dimensions.")),
+                                        //if it isn't a digit and we might still have a json array its a good bet it is one
+                                        //with non-numeric content
+                                        couldBeJson ? errhint("SQL array vaLues are delimited by {}" : null));
 
                if (*p == ':')
                {
                        /* [m:n] format */
+                       //if we have numbers as the first entry, the presence of a colon,
+                       //which is not a valid json separator, in a number literal or an array,
+                       //means we have a bounds specification in an SQL array
+                       couldBeJson = false;
                        lBound[ndim] = i;
                        p++;
                        q = p;
@@ -454,18 +472,22 @@ ReadArrayDimensions(char **srcptr, int *ndim_p, int *dim, int *lBound,
                                                 errmsg("malformed array literal: \"%s\"", origStr),
                                                 errdetail("Missing array dimension value.")));
                }
-               else
+               else if (*p == ']')
                {
                        /* [n] format */
+                       //single digit doesn't disprove json with single number element
                        lBound[ndim] = 1;
                        ub = i;
                }
-               if (*p != ']')
+               else
+               {
                        ereturn(escontext, false,
                                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                         errmsg("malformed array literal: \"%s\"", origStr),
                                         errdetail("Missing \"%s\" after array dimensions.",
-                                                          "]")));
+                                                          "]")),
+                                       couldBeJson ? errhint("SQL array values are delimited by {}" : null));
+               }
                p++;
 
                /*
@@ -475,29 +497,37 @@ ReadArrayDimensions(char **srcptr, int *ndim_p, int *dim, int *lBound,
                 * would be equivalent.  Given the lack of field demand, there seems
                 * little point in allowing such cases.
                 */
+               //not possible in the single bound case so cannot be json
                if (ub < lBound[ndim])
                        ereturn(escontext, false,
                                        (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
                                         errmsg("upper bound cannot be less than lower bound")));
 
                /* Upper bound of INT_MAX must be disallowed, cf ArrayCheckBounds() */
+               // the singular json number may very well be larger than an integer...
                if (ub == INT_MAX)
                        ereturn(escontext, false,
                                        (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-                                        errmsg("array upper bound is too large: %d", ub)));
+                                        errmsg("array upper bound is too large: %d", ub),
+                                        couldBeJson ? errhint("SQL array values are delimited by {}" : null)));
 
                /* Compute "ub - lBound[ndim] + 1", detecting overflow */
+               //a whatever this threshold is a random number passed in a json array likely will exceed it
                if (pg_sub_s32_overflow(ub, lBound[ndim], &ub) ||
                        pg_add_s32_overflow(ub, 1, &ub))
                        ereturn(escontext, false,
                                        (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                                         errmsg("array size exceeds the maximum allowed (%d)",
-                                                       (int) MaxArraySize)));
+                                                       (int) MaxArraySize,
+                                       couldBeJson ? errhint("SQL array values are delimited by {}" : null))));
 
                dim[ndim] = ub;
                ndim++;
        }
 
+    if (couldBeJson == true)
+           assert('not reachable, need to disprove json array literal prior to returning');
+
        *srcptr = p;
        *ndim_p = ndim;
        return true;


David J.

On Tue, Jul 9, 2024 at 11:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> jian he <jian.universality@gmail.com> writes:
> > On Mon, Jul 8, 2024 at 10:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> One possibility could be
> >> ...
> >> that is, only say "Missing "]"" if there's no ']' anywhere, and
> >> otherwise just say the dimensions are wrong.  This could be fooled
> >> by a ']' that's part of some string in the data, but even then the
> >> errdetail isn't wrong.
> >>
> >> Or we could just say "Array dimensions have invalid syntax."
> >> unconditionally.
>
> > please check attached.
>
> Meh.  This just replaces one possibly-off-point error message
> with a different one that will be off-point in a different set
> of cases.  I think we should simply reduce the specificity of
> the message.

After playing around, I agree with you that reducing the specificity of
the message here is better.


> BTW, I think we have pretty much the same issue with respect
> to the check for "=" later on.  For instance, if someone
> is confused about whether to insert commas:
>
> =# select '[1:1],[1:2]={{3,4}}'::int[];
> ERROR:  malformed array literal: "[1:1],[1:2]={{3,4}}"
> LINE 1: select  '[1:1],[1:2]={{3,4}}'::int[];
>                 ^
> DETAIL:  Missing "=" after array dimensions.
>
> Here again, the problem is not a missing "=", it's invalid
> syntax somewhere before that.
>
> Another thing we could consider doing here (and similarly
> for your original case) is
>
> DETAIL:  Expected "=" not "," after array dimensions.
>
> The advantage of this is that it provides a little more
> clarity as to exactly where things went wrong.
>
this looks good to me.


I also found other issues, like.
select '[-2:2147483644]={3}'::int[];
ERROR:  malformed array literal: "[-2:2147483644]={3}"
LINE 1: select '[-2:2147483644]={3}'::int[];
               ^
DETAIL:  Specified array dimensions do not match array contents.

select '[-2:2147483645]={3}'::int[];
ERROR:  array size exceeds the maximum allowed (134217727)
LINE 1: select '[-2:2147483645]={3}'::int[];
               ^

Both cases exceed the maximum allowed (134217727), but error is different.
also
"ERROR:  array size exceeds the maximum allowed (134217727)"
is weird. we are still validating the dimension info function, we
don't even know the actual size of the array.
maybe
"ERROR:  array dimensions specified array size exceeds the maximum
allowed (134217727)"


This customized dimension info is kind of obscure, even the
documentation of it is buried
in the third paragraph of
https://www.postgresql.org/docs/current/arrays.html#ARRAYS-IO


to address David G. Johnston concern,
in ReadArrayDimensions, some error message can unconditionally mention
errhint like:
errhint("array dimension template is \"[lower_bound:upper_bound]\", we
can optional have 6 of this"



Re: array_in sub function ReadArrayDimensions error message

From
"David G. Johnston"
Date:
On Tuesday, July 9, 2024, jian he <jian.universality@gmail.com> wrote:

I also found other issues, like.
select '[-2:2147483644]={3}'::int[];
ERROR:  malformed array literal: "[-2:2147483644]={3}"
LINE 1: select '[-2:2147483644]={3}'::int[];
               ^
DETAIL:  Specified array dimensions do not match array contents.

select '[-2:2147483645]={3}'::int[];
ERROR:  array size exceeds the maximum allowed (134217727)
LINE 1: select '[-2:2147483645]={3}'::int[];
               ^

Both cases exceed the maximum allowed (134217727), but error is different.
also


They do not both exceed (i.e., strictly greater than) 134217727 - the first case equals it which is why it gets past the dimension specification parser.

"ERROR:  array size exceeds the maximum allowed (134217727)"
is weird. we are still validating the dimension info function, we
don't even know the actual size of the array. 
maybe
"ERROR:  array dimensions specified array size exceeds the maximum
allowed (134217727)"

How about: 

“array dimension %d declared size exceeds the maximum allowed %d”, ndim, (int) MaxArraySize

But while that message fits the code aren’t we supposed to be checking, after processing all dimensions, whether the combined number of cells is greater than MaxArraySize?  Obviously if any one dimension is the whole thing will be, so this specific check and error is still useful.

to address David G. Johnston concern,
in ReadArrayDimensions, some error message can unconditionally mention
errhint like:
errhint("array dimension template is \"[lower_bound:upper_bound]\", we
can optional have 6 of this"

I suppose, but if they are writing json array syntax that hint is going to mean little to them. Pointing out that their use of [] brackets is wrong and that {} should be used seems more helpful.  The extent we need to consider people writing literal dimensions to set their lower bound to something other than one seems close to none and not needing a hint, IMO.  

That said, it isn’t making it back to us if our users are actually having this confusion and would benefit meaningfully from such a hint.

David J.