Re: Cleaning up array_in() - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Cleaning up array_in()
Date
Msg-id 874153.1688495594@sss.pgh.pa.us
Whole thread Raw
In response to Re: Cleaning up array_in()  (Nikhil Benesch <nikhil.benesch@gmail.com>)
Responses Re: Cleaning up array_in()
List pgsql-hackers
Nikhil Benesch <nikhil.benesch@gmail.com> writes:
> I spotted a few opportunities for further reducing state tracked by
> `ArrayCount`. You may not find all of these suggestions to be
> worthwhile.

I found some time today to look at these points.

> 1) `in_quotes` appears to be wholly redundant with `parse_state ==
> ARRAY_QUOTED_ELEM_STARTED`.

I agree that it is redundant, but I'm disinclined to remove it because
the in_quotes logic matches that in ReadArrayStr.  I think it's better
to keep those two functions in sync.  The parse_state represents an
independent set of checks that need not be repeated by ReadArrayStr,
but both functions have to track quoting.  The same for eoArray.

> 2) The `empty_array` special case does not seem to be important to
> ArrayCount's callers, which don't even special case `ndims == 0` but
> rather `ArrayGetNItemsSafe(..) == 0`. Perhaps this is a philosophical
> question as to whether `ArrayCount('{{}, {}}')` should return
> (ndims=2, dims=[2, 0]) or (ndims=0). Obviously someone needs to do
> that normalization, but `ArrayCount` could leave that normalization to
> `ReadArrayStr`.

This idea I do like.  While looking at the callers, I also noticed
that it's impossible currently to write an empty array with explicit
specification of bounds.  It seems to me that you ought to be able
to write, say,

SELECT '[1:0]={}'::int[];

but up to now you got "upper bound cannot be less than lower bound";
and if you somehow got past that, you'd get "Specified array
dimensions do not match array contents." because of ArrayCount's
premature optimization of "one-dimensional array with length zero"
to "zero-dimensional array".  We can fix that by doing what you said
and adjusting the initial bounds restriction to be "upper bound cannot
be less than lower bound minus one".

> I also have a sense that `ndims_frozen` made the distinction between
> `ARRAY_ELEM_DELIMITED` and `ARRAY_LEVEL_DELIMITED` unnecessary, and
> the two states could be merged into a single `ARRAY_DELIMITED` state,
> but I've not pulled on this thread hard enough to say so confidently.

I looked at jian he's implementation of that and was not impressed:
I do not think the logic gets any clearer, and it seems to me that
this makes a substantial dent in ArrayCount's ability to detect syntax
errors.  The fact that one of the test case error messages got better
seems pretty accidental to me.  We can get the same result in a more
purposeful way by giving a different error message for
ARRAY_ELEM_DELIMITED.

So I end up with the attached.  I went ahead and dropped
ArrayGetOffset0() as part of 0001, and I split 0002 into two patches
where the new 0002 avoids re-indenting any existing code in order
to ease review, and then 0003 is just a mechanical application
of pgindent.

I still didn't do anything about "number of array dimensions (7)
exceeds the maximum allowed (6)".  There are quite a few instances
of that wording, not only array_in's, and I'm not sure whether to
change the rest.  In any case that looks like something that
could be addressed separately.  The other error message wording
changes here seem to me to be okay.

            regards, tom lane

From 82cac24a618db69e149c140e7064eebda9f1ddfc Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue, 4 Jul 2023 12:39:41 -0400
Subject: [PATCH v2 1/3] Simplify and speed up ReadArrayStr().

ReadArrayStr() seems to have been written on the assumption that
non-rectangular input is fine and it should pad with NULLs anywhere
that elements are missing.  We disallowed non-rectangular input
ages ago (commit 0e13d627b), but never simplified this function
as a follow-up.  In particular, the existing code recomputes each
element's linear location from scratch, which is quite unnecessary
for rectangular input: we can just assign the elements sequentially,
saving lots of arithmetic.  Add some more commentary while at it.

ArrayGetOffset0() is no longer used anywhere, so remove it.
---
 src/backend/utils/adt/arrayfuncs.c | 69 ++++++++++++++----------------
 src/backend/utils/adt/arrayutils.c | 15 -------
 src/include/utils/array.h          |  1 -
 3 files changed, 33 insertions(+), 52 deletions(-)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 9000f83a83..050568808a 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -93,7 +93,7 @@ static bool array_isspace(char ch);
 static int    ArrayCount(const char *str, int *dim, char typdelim,
                        Node *escontext);
 static bool ReadArrayStr(char *arrayStr, const char *origStr,
-                         int nitems, int ndim, int *dim,
+                         int nitems,
                          FmgrInfo *inputproc, Oid typioparam, int32 typmod,
                          char typdelim,
                          int typlen, bool typbyval, char typalign,
@@ -391,7 +391,7 @@ array_in(PG_FUNCTION_ARGS)
     dataPtr = (Datum *) palloc(nitems * sizeof(Datum));
     nullsPtr = (bool *) palloc(nitems * sizeof(bool));
     if (!ReadArrayStr(p, string,
-                      nitems, ndim, dim,
+                      nitems,
                       &my_extra->proc, typioparam, typmod,
                       typdelim,
                       typlen, typbyval, typalign,
@@ -457,7 +457,8 @@ array_isspace(char ch)

 /*
  * ArrayCount
- *     Determines the dimensions for an array string.
+ *     Determines the dimensions for an array string.  This includes
+ *     syntax-checking the array structure decoration (braces and delimiters).
  *
  * Returns number of dimensions as function result.  The axis lengths are
  * returned in dim[], which must be of size MAXDIM.
@@ -704,16 +705,14 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
 /*
  * ReadArrayStr :
  *     parses the array string pointed to by "arrayStr" and converts the values
- *     to internal format.  Unspecified elements are initialized to nulls.
- *     The array dimensions must already have been determined.
+ *     to internal format.  The array dimensions must have been determined,
+ *     and the case of an empty array must have been handled earlier.
  *
  * Inputs:
  *    arrayStr: the string to parse.
  *              CAUTION: the contents of "arrayStr" will be modified!
  *    origStr: the unmodified input string, used only in error messages.
  *    nitems: total number of array elements, as already determined.
- *    ndim: number of array dimensions
- *    dim[]: array axis lengths
  *    inputproc: type-specific input procedure for element datatype.
  *    typioparam, typmod: auxiliary values to pass to inputproc.
  *    typdelim: the value delimiter (type-specific).
@@ -738,8 +737,6 @@ static bool
 ReadArrayStr(char *arrayStr,
              const char *origStr,
              int nitems,
-             int ndim,
-             int *dim,
              FmgrInfo *inputproc,
              Oid typioparam,
              int32 typmod,
@@ -753,20 +750,13 @@ ReadArrayStr(char *arrayStr,
              int32 *nbytes,
              Node *escontext)
 {
-    int            i,
+    int            i = 0,
                 nest_level = 0;
     char       *srcptr;
     bool        in_quotes = false;
     bool        eoArray = false;
     bool        hasnull;
     int32        totbytes;
-    int            indx[MAXDIM] = {0},
-                prod[MAXDIM];
-
-    mda_get_prod(ndim, dim, prod);
-
-    /* Initialize is-null markers to true */
-    memset(nulls, true, nitems * sizeof(bool));

     /*
      * We have to remove " and \ characters to create a clean item value to
@@ -789,11 +779,20 @@ ReadArrayStr(char *arrayStr,
         bool        itemdone = false;
         bool        leadingspace = true;
         bool        hasquoting = false;
-        char       *itemstart;
-        char       *dstptr;
-        char       *dstendptr;
+        char       *itemstart;    /* start of de-escaped text */
+        char       *dstptr;        /* next output point for de-escaped text */
+        char       *dstendptr;    /* last significant output char + 1 */

-        i = -1;
+        /*
+         * Parse next array element, collecting the de-escaped text into
+         * itemstart..dstendptr-1.
+         *
+         * Notice that we do not set "itemdone" until we see a separator
+         * (typdelim character) or the array's final right brace.  Since the
+         * array is already verified to be nonempty and rectangular, there is
+         * guaranteed to be another element to be processed in the first case,
+         * while in the second case of course we'll exit the outer loop.
+         */
         itemstart = dstptr = dstendptr = srcptr;

         while (!itemdone)
@@ -840,13 +839,7 @@ ReadArrayStr(char *arrayStr,
                 case '{':
                     if (!in_quotes)
                     {
-                        if (nest_level >= ndim)
-                            ereturn(escontext, false,
-                                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-                                     errmsg("malformed array literal: \"%s\"",
-                                            origStr)));
                         nest_level++;
-                        indx[nest_level - 1] = 0;
                         srcptr++;
                     }
                     else
@@ -860,14 +853,9 @@ ReadArrayStr(char *arrayStr,
                                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                      errmsg("malformed array literal: \"%s\"",
                                             origStr)));
-                        if (i == -1)
-                            i = ArrayGetOffset0(ndim, indx, prod);
-                        indx[nest_level - 1] = 0;
                         nest_level--;
                         if (nest_level == 0)
                             eoArray = itemdone = true;
-                        else
-                            indx[nest_level - 1]++;
                         srcptr++;
                     }
                     else
@@ -878,10 +866,7 @@ ReadArrayStr(char *arrayStr,
                         *dstptr++ = *srcptr++;
                     else if (*srcptr == typdelim)
                     {
-                        if (i == -1)
-                            i = ArrayGetOffset0(ndim, indx, prod);
                         itemdone = true;
-                        indx[ndim - 1]++;
                         srcptr++;
                     }
                     else if (array_isspace(*srcptr))
@@ -905,15 +890,18 @@ ReadArrayStr(char *arrayStr,
             }
         }

+        /* Terminate de-escaped string */
         Assert(dstptr < srcptr);
         *dstendptr = '\0';

-        if (i < 0 || i >= nitems)
+        /* Safety check that we don't write past the output arrays */
+        if (i >= nitems)
             ereturn(escontext, false,
                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                      errmsg("malformed array literal: \"%s\"",
                             origStr)));

+        /* Convert the de-escaped string into the next output array entries */
         if (Array_nulls && !hasquoting &&
             pg_strcasecmp(itemstart, "NULL") == 0)
         {
@@ -934,8 +922,17 @@ ReadArrayStr(char *arrayStr,
                 return false;
             nulls[i] = false;
         }
+
+        i++;
     }

+    /* Cross-check that we filled all the output array entries */
+    if (i != nitems)
+        ereturn(escontext, false,
+                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                 errmsg("malformed array literal: \"%s\"",
+                        origStr)));
+
     /*
      * Check for nulls, compute total data space needed
      */
diff --git a/src/backend/utils/adt/arrayutils.c b/src/backend/utils/adt/arrayutils.c
index 62152f79f5..b8bccc7cca 100644
--- a/src/backend/utils/adt/arrayutils.c
+++ b/src/backend/utils/adt/arrayutils.c
@@ -43,21 +43,6 @@ ArrayGetOffset(int n, const int *dim, const int *lb, const int *indx)
     return offset;
 }

-/*
- * Same, but subscripts are assumed 0-based, and use a scale array
- * instead of raw dimension data (see mda_get_prod to create scale array)
- */
-int
-ArrayGetOffset0(int n, const int *tup, const int *scale)
-{
-    int            i,
-                lin = 0;
-
-    for (i = 0; i < n; i++)
-        lin += tup[i] * scale[i];
-    return lin;
-}
-
 /*
  * Convert array dimensions into number of elements
  *
diff --git a/src/include/utils/array.h b/src/include/utils/array.h
index b13dfb345e..d49adc38b7 100644
--- a/src/include/utils/array.h
+++ b/src/include/utils/array.h
@@ -448,7 +448,6 @@ extern void array_free_iterator(ArrayIterator iterator);
  */

 extern int    ArrayGetOffset(int n, const int *dim, const int *lb, const int *indx);
-extern int    ArrayGetOffset0(int n, const int *tup, const int *scale);
 extern int    ArrayGetNItems(int ndim, const int *dims);
 extern int    ArrayGetNItemsSafe(int ndim, const int *dims,
                                struct Node *escontext);
--
2.39.3

From 64f318bbd5a43e59749657c3972c4c6eaa58af8f Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue, 4 Jul 2023 14:03:40 -0400
Subject: [PATCH v2 2/3] Rewrite ArrayCount() to make dimensionality checks
 more complete.

ArrayCount has had a few layers of revision over the years,
reaching a point where it's really quite baroque and unintelligible.
There are actually four different arrays holding dimensions, which
can be reduced to two with little effort.  It was also close to
impossible to figure out why the dimension-counting actually worked
correctly.  Rewrite to make it perhaps a little less opaque.
In particular, a new element or subarray is now counted when we see
the start of the element or subarray, not at some later point where
we might not even be at the same nest_level anymore.

Also, install guards to catch non-rectangularity cases that were
previously missed, along the same lines as recent plperl and plpython
fixes: we were not checking that all scalar elements appear at the
same nesting depth.  Thanks to the very different logic, this seems
not to have caused any of the sort of internal errors that the PLs
suffered from, but it's still wrong.

On the other hand, the behavior of plperl and plpython suggests
that we should allow empty sub-arrays, as in '{{},{}}'; those
languages interpret equivalent constructs such as '[[],[]]' as
an empty (hence zero-dimensional) array, and it seems unclear
why array_in should not.  Remove the hack that was installed to
reject that case.

Also, fix things so that it's possible to write an empty array
with explicit bounds spec, along the lines of '[1:0]={}'.
Before, that was rejected with a complaint about upper bound
less than lower, but I don't see a good reason not to allow it.

In passing, remove ArrayParseState.ARRAY_ELEM_COMPLETED, a state
that the state machine never entered yet wasted cycles checking for.

(This patch looks a bit odd because I tried to avoid reindenting
existing code, despite having removed one level of loop.  That
seemed to me to make it easier to read the code changes.  A
follow-up pgindent run is needed.)
---
 src/backend/utils/adt/arrayfuncs.c   | 241 +++++++++++++++++----------
 src/test/regress/expected/arrays.out |  68 +++++++-
 src/test/regress/sql/arrays.sql      |  13 +-
 3 files changed, 230 insertions(+), 92 deletions(-)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 050568808a..4fb052b23e 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -57,7 +57,6 @@ typedef enum
     ARRAY_NO_LEVEL,
     ARRAY_LEVEL_STARTED,
     ARRAY_ELEM_STARTED,
-    ARRAY_ELEM_COMPLETED,
     ARRAY_QUOTED_ELEM_STARTED,
     ARRAY_QUOTED_ELEM_COMPLETED,
     ARRAY_ELEM_DELIMITED,
@@ -302,10 +301,18 @@ array_in(PG_FUNCTION_ARGS)
         *q = '\0';
         ub = atoi(p);
         p = q + 1;
-        if (ub < lBound[ndim])
+
+        /* Upper bound of INT_MAX is disallowed, cf ArrayCheckBounds() */
+        if (ub == INT_MAX)
+            ereturn(escontext, (Datum) 0,
+                    (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                     errmsg("array upper bound is too large: %d",
+                            ub)));
+        /* Now it's safe to compute ub + 1 */
+        if (ub + 1 < lBound[ndim])
             ereturn(escontext, (Datum) 0,
                     (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
-                     errmsg("upper bound cannot be less than lower bound")));
+                     errmsg("upper bound cannot be less than lower bound minus one")));

         dim[ndim] = ub - lBound[ndim] + 1;
         ndim++;
@@ -461,7 +468,9 @@ array_isspace(char ch)
  *     syntax-checking the array structure decoration (braces and delimiters).
  *
  * Returns number of dimensions as function result.  The axis lengths are
- * returned in dim[], which must be of size MAXDIM.
+ * returned in dim[], which must be of size MAXDIM.  This function does not
+ * special-case empty arrays: it will return a positive result with some
+ * zero axis length(s).
  *
  * If we detect an error, fill *escontext with error details and return -1
  * (unless escontext isn't provided, in which case errors will be thrown).
@@ -472,31 +481,22 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
     int            nest_level = 0,
                 i;
     int            ndim = 1,
-                temp[MAXDIM],
-                nelems[MAXDIM],
-                nelems_last[MAXDIM];
+                nelems[MAXDIM];
+    bool        ndim_frozen = false;
     bool        in_quotes = false;
     bool        eoArray = false;
-    bool        empty_array = true;
     const char *ptr;
     ArrayParseState parse_state = ARRAY_NO_LEVEL;

+    /* Initialize dim[] entries to -1 meaning "unknown" */
     for (i = 0; i < MAXDIM; ++i)
-    {
-        temp[i] = dim[i] = nelems_last[i] = 0;
-        nelems[i] = 1;
-    }
+        dim[i] = -1;

+    /* Scan string until we reach closing brace */
     ptr = str;
     while (!eoArray)
     {
-        bool        itemdone = false;
-
-        while (!itemdone)
-        {
-            if (parse_state == ARRAY_ELEM_STARTED ||
-                parse_state == ARRAY_QUOTED_ELEM_STARTED)
-                empty_array = false;
+        bool        new_element = false;

             switch (*ptr)
             {
@@ -513,17 +513,25 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                      * start, or after an element delimiter. In any case we
                      * now must be past an element start.
                      */
-                    if (parse_state != ARRAY_LEVEL_STARTED &&
-                        parse_state != ARRAY_ELEM_STARTED &&
-                        parse_state != ARRAY_QUOTED_ELEM_STARTED &&
-                        parse_state != ARRAY_ELEM_DELIMITED)
+                switch (parse_state)
+                {
+                    case ARRAY_LEVEL_STARTED:
+                    case ARRAY_ELEM_DELIMITED:
+                        /* start new unquoted element */
+                        parse_state = ARRAY_ELEM_STARTED;
+                        new_element = true;
+                        break;
+                    case ARRAY_ELEM_STARTED:
+                    case ARRAY_QUOTED_ELEM_STARTED:
+                        /* already in element */
+                        break;
+                    default:
                         ereturn(escontext, -1,
                                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                  errmsg("malformed array literal: \"%s\"", str),
                                  errdetail("Unexpected \"%c\" character.",
                                            '\\')));
-                    if (parse_state != ARRAY_QUOTED_ELEM_STARTED)
-                        parse_state = ARRAY_ELEM_STARTED;
+                }
                     /* skip the escaped character */
                     if (*(ptr + 1))
                         ptr++;
@@ -540,18 +548,28 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                      * element start, or after an element delimiter. In any
                      * case we now must be past an element start.
                      */
-                    if (parse_state != ARRAY_LEVEL_STARTED &&
-                        parse_state != ARRAY_QUOTED_ELEM_STARTED &&
-                        parse_state != ARRAY_ELEM_DELIMITED)
+                switch (parse_state)
+                {
+                    case ARRAY_LEVEL_STARTED:
+                    case ARRAY_ELEM_DELIMITED:
+                        /* start new quoted element */
+                        Assert(!in_quotes);
+                        in_quotes = true;
+                        parse_state = ARRAY_QUOTED_ELEM_STARTED;
+                        new_element = true;
+                        break;
+                    case ARRAY_QUOTED_ELEM_STARTED:
+                        /* already in element, end it */
+                        Assert(in_quotes);
+                        in_quotes = false;
+                        parse_state = ARRAY_QUOTED_ELEM_COMPLETED;
+                        break;
+                    default:
                         ereturn(escontext, -1,
                                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                  errmsg("malformed array literal: \"%s\"", str),
                                  errdetail("Unexpected array element.")));
-                    in_quotes = !in_quotes;
-                    if (in_quotes)
-                        parse_state = ARRAY_QUOTED_ELEM_STARTED;
-                    else
-                        parse_state = ARRAY_QUOTED_ELEM_COMPLETED;
+                }
                     break;
                 case '{':
                     if (!in_quotes)
@@ -559,27 +577,52 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                         /*
                          * A left brace can occur if no nesting has occurred
                          * yet, after a level start, or after a level
-                         * delimiter.
+                         * delimiter.  If we see one after an element
+                         * delimiter, we have something like "{1,{2}}", so
+                         * complain about non-rectangularity.
                          */
-                        if (parse_state != ARRAY_NO_LEVEL &&
-                            parse_state != ARRAY_LEVEL_STARTED &&
-                            parse_state != ARRAY_LEVEL_DELIMITED)
+                    switch (parse_state)
+                    {
+                        case ARRAY_NO_LEVEL:
+                        case ARRAY_LEVEL_STARTED:
+                        case ARRAY_LEVEL_DELIMITED:
+                            /* okay */
+                            break;
+                        case ARRAY_ELEM_DELIMITED:
+                            ereturn(escontext, -1,
+                                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                     errmsg("malformed array literal: \"%s\"", str),
+                                     errdetail("Multidimensional arrays must have sub-arrays with matching
dimensions.")));
+                        default:
                             ereturn(escontext, -1,
                                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                      errmsg("malformed array literal: \"%s\"", str),
                                      errdetail("Unexpected \"%c\" character.",
                                                '{')));
+                    }
                         parse_state = ARRAY_LEVEL_STARTED;
+                    /* Nested sub-arrays count as elements of outer level */
+                    if (nest_level > 0)
+                        nelems[nest_level - 1]++;
+                    /* Initialize element counting in the new level */
                         if (nest_level >= MAXDIM)
                             ereturn(escontext, -1,
                                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                                      errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
                                             nest_level + 1, MAXDIM)));
-                        temp[nest_level] = 0;
+                    nelems[nest_level] = 0;
                         nest_level++;
                         if (ndim < nest_level)
+                    {
+                        /* Can't increase ndim once it's frozen */
+                        if (ndim_frozen)
+                            ereturn(escontext, -1,
+                                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                     errmsg("malformed array literal: \"%s\"", str),
+                                     errdetail("Multidimensional arrays must have sub-arrays with matching
dimensions.")));
                             ndim = nest_level;
                     }
+                }
                     break;
                 case '}':
                     if (!in_quotes)
@@ -587,46 +630,56 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                         /*
                          * A right brace can occur after an element start, an
                          * element completion, a quoted element completion, or
-                         * a level completion.
+                         * a level completion.  We also allow it after a level
+                         * start, that is an empty sub-array "{}" --- but that
+                         * freezes the number of dimensions and all such
+                         * sub-arrays must be at the same level, just like
+                         * sub-arrays containing elements.
                          */
-                        if (parse_state != ARRAY_ELEM_STARTED &&
-                            parse_state != ARRAY_ELEM_COMPLETED &&
-                            parse_state != ARRAY_QUOTED_ELEM_COMPLETED &&
-                            parse_state != ARRAY_LEVEL_COMPLETED &&
-                            !(nest_level == 1 && parse_state == ARRAY_LEVEL_STARTED))
+                    switch (parse_state)
+                    {
+                        case ARRAY_ELEM_STARTED:
+                        case ARRAY_QUOTED_ELEM_COMPLETED:
+                        case ARRAY_LEVEL_COMPLETED:
+                            /* okay */
+                            break;
+                        case ARRAY_LEVEL_STARTED:
+                            /* empty sub-array: OK if at correct nest_level */
+                            ndim_frozen = true;
+                            if (nest_level != ndim)
                             ereturn(escontext, -1,
                                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                      errmsg("malformed array literal: \"%s\"", str),
-                                     errdetail("Unexpected \"%c\" character.",
-                                               '}')));
-                        parse_state = ARRAY_LEVEL_COMPLETED;
-                        if (nest_level == 0)
+                                         errdetail("Multidimensional arrays must have sub-arrays with matching
dimensions.")));
+                            break;
+                        default:
                             ereturn(escontext, -1,
                                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                      errmsg("malformed array literal: \"%s\"", str),
-                                     errdetail("Unmatched \"%c\" character.", '}')));
+                                     errdetail("Unexpected \"%c\" character.",
+                                               '}')));
+                    }
+                    parse_state = ARRAY_LEVEL_COMPLETED;
+                    /* The parse state check assured we're in a level. */
+                    Assert(nest_level > 0);
                         nest_level--;

-                        if (nelems_last[nest_level] != 0 &&
-                            nelems[nest_level] != nelems_last[nest_level])
+                    if (dim[nest_level] < 0)
+                    {
+                        /* Save length of first sub-array of this level */
+                        dim[nest_level] = nelems[nest_level];
+                    }
+                    else if (nelems[nest_level] != dim[nest_level])
+                    {
+                        /* Subsequent sub-arrays must have same length */
                             ereturn(escontext, -1,
                                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                      errmsg("malformed array literal: \"%s\"", str),
-                                     errdetail("Multidimensional arrays must have "
-                                               "sub-arrays with matching "
-                                               "dimensions.")));
-                        nelems_last[nest_level] = nelems[nest_level];
-                        nelems[nest_level] = 1;
-                        if (nest_level == 0)
-                            eoArray = itemdone = true;
-                        else
-                        {
-                            /*
-                             * We don't set itemdone here; see comments in
-                             * ReadArrayStr
-                             */
-                            temp[nest_level - 1]++;
+                                 errdetail("Multidimensional arrays must have sub-arrays with matching
dimensions.")));
                         }
+                    /* Done if this is the outermost level's '}' */
+                    if (nest_level == 0)
+                        eoArray = true;
                     }
                     break;
                 default:
@@ -635,12 +688,11 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                         if (*ptr == typdelim)
                         {
                             /*
-                             * Delimiters can occur after an element start, an
-                             * element completion, a quoted element
-                             * completion, or a level completion.
+                             * Delimiters can occur after an element start, a
+                             * quoted element completion, or a level
+                             * completion.
                              */
                             if (parse_state != ARRAY_ELEM_STARTED &&
-                                parse_state != ARRAY_ELEM_COMPLETED &&
                                 parse_state != ARRAY_QUOTED_ELEM_COMPLETED &&
                                 parse_state != ARRAY_LEVEL_COMPLETED)
                                 ereturn(escontext, -1,
@@ -652,8 +704,6 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                                 parse_state = ARRAY_LEVEL_DELIMITED;
                             else
                                 parse_state = ARRAY_ELEM_DELIMITED;
-                            itemdone = true;
-                            nelems[nest_level - 1]++;
                         }
                         else if (!array_isspace(*ptr))
                         {
@@ -662,23 +712,51 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                              * level start, after an element start, or after
                              * an element delimiter. In any case we now must
                              * be past an element start.
+                             *
+                             * If it's a space character, we can ignore it; it
+                             * might be data or not, but it doesn't change the
+                             * parsing state.
                              */
-                            if (parse_state != ARRAY_LEVEL_STARTED &&
-                                parse_state != ARRAY_ELEM_STARTED &&
-                                parse_state != ARRAY_ELEM_DELIMITED)
+                        switch (parse_state)
+                        {
+                            case ARRAY_LEVEL_STARTED:
+                            case ARRAY_ELEM_DELIMITED:
+                                /* start new unquoted element */
+                                parse_state = ARRAY_ELEM_STARTED;
+                                new_element = true;
+                                break;
+                            case ARRAY_ELEM_STARTED:
+                                /* already in element */
+                                break;
+                            default:
                                 ereturn(escontext, -1,
                                         (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                          errmsg("malformed array literal: \"%s\"", str),
                                          errdetail("Unexpected array element.")));
-                            parse_state = ARRAY_ELEM_STARTED;
+                        }
                         }
                     }
                     break;
             }
-            if (!itemdone)
-                ptr++;
+
+        /* To reduce duplication, all new-element cases go through here. */
+        if (new_element)
+        {
+            /*
+             * Once we have found an element, the number of dimensions can no
+             * longer increase, and subsequent elements must all be at the
+             * same nesting depth.
+             */
+            ndim_frozen = true;
+            if (nest_level != ndim)
+                ereturn(escontext, -1,
+                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                         errmsg("malformed array literal: \"%s\"", str),
+                         errdetail("Multidimensional arrays must have sub-arrays with matching dimensions.")));
+            /* Count the new element */
+            nelems[nest_level - 1]++;
         }
-        temp[ndim - 1]++;
+
         ptr++;
     }

@@ -692,13 +770,6 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                      errdetail("Junk after closing right brace.")));
     }

-    /* special case for an empty array */
-    if (empty_array)
-        return 0;
-
-    for (i = 0; i < ndim; ++i)
-        dim[i] = temp[i];
-
     return ndim;
 }

diff --git a/src/test/regress/expected/arrays.out b/src/test/regress/expected/arrays.out
index 7064391468..26d4281d94 100644
--- a/src/test/regress/expected/arrays.out
+++ b/src/test/regress/expected/arrays.out
@@ -1475,12 +1475,7 @@ select '{{1,{2}},{2,3}}'::text[];
 ERROR:  malformed array literal: "{{1,{2}},{2,3}}"
 LINE 1: select '{{1,{2}},{2,3}}'::text[];
                ^
-DETAIL:  Unexpected "{" character.
-select '{{},{}}'::text[];
-ERROR:  malformed array literal: "{{},{}}"
-LINE 1: select '{{},{}}'::text[];
-               ^
-DETAIL:  Unexpected "}" character.
+DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
 select E'{{1,2},\\{2,3}}'::text[];
 ERROR:  malformed array literal: "{{1,2},\{2,3}}"
 LINE 1: select E'{{1,2},\\{2,3}}'::text[];
@@ -1501,6 +1496,49 @@ ERROR:  malformed array literal: "{ }}"
 LINE 1: select '{ }}'::text[];
                ^
 DETAIL:  Junk after closing right brace.
+select '{{1},{{2}}}'::text[];
+ERROR:  malformed array literal: "{{1},{{2}}}"
+LINE 1: select '{{1},{{2}}}'::text[];
+               ^
+DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
+select '{{{1}},{2}}'::text[];
+ERROR:  malformed array literal: "{{{1}},{2}}"
+LINE 1: select '{{{1}},{2}}'::text[];
+               ^
+DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
+select '{{},{{}}}'::text[];
+ERROR:  malformed array literal: "{{},{{}}}"
+LINE 1: select '{{},{{}}}'::text[];
+               ^
+DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
+select '{{{}},{}}'::text[];
+ERROR:  malformed array literal: "{{{}},{}}"
+LINE 1: select '{{{}},{}}'::text[];
+               ^
+DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
+select '{{1},{}}'::text[];
+ERROR:  malformed array literal: "{{1},{}}"
+LINE 1: select '{{1},{}}'::text[];
+               ^
+DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
+select '{{},{1}}'::text[];
+ERROR:  malformed array literal: "{{},{1}}"
+LINE 1: select '{{},{1}}'::text[];
+               ^
+DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
+select '[2147483646:2147483647]={1,2}'::int[];
+ERROR:  array upper bound is too large: 2147483647
+LINE 1: select '[2147483646:2147483647]={1,2}'::int[];
+               ^
+select '[1:-1]={}'::int[];
+ERROR:  upper bound cannot be less than lower bound minus one
+LINE 1: select '[1:-1]={}'::int[];
+               ^
+select '[1:0]={1}'::int[];
+ERROR:  malformed array literal: "[1:0]={1}"
+LINE 1: select '[1:0]={1}'::int[];
+               ^
+DETAIL:  Specified array dimensions do not match array contents.
 select array[];
 ERROR:  cannot determine type of empty array
 LINE 1: select array[];
@@ -1514,6 +1552,12 @@ select '{}'::text[];
  {}
 (1 row)

+select '{{},{}}'::text[];
+ text
+------
+ {}
+(1 row)
+
 select '{{{1,2,3,4},{2,3,4,5}},{{3,4,5,6},{4,5,6,7}}}'::text[];
                      text
 -----------------------------------------------
@@ -1559,6 +1603,18 @@ select '[0:1]={1.1,2.2}'::float8[];
  [0:1]={1.1,2.2}
 (1 row)

+select '[2147483646:2147483646]={1}'::int[];
+            int4
+-----------------------------
+ [2147483646:2147483646]={1}
+(1 row)
+
+select '[1:0]={}'::int[];
+ int4
+------
+ {}
+(1 row)
+
 -- all of the above should be accepted
 -- tests for array aggregates
 CREATE TEMP TABLE arraggtest ( f1 INT[], f2 TEXT[][], f3 FLOAT[]);
diff --git a/src/test/regress/sql/arrays.sql b/src/test/regress/sql/arrays.sql
index f1375621e0..b7e2b180cc 100644
--- a/src/test/regress/sql/arrays.sql
+++ b/src/test/regress/sql/arrays.sql
@@ -454,16 +454,25 @@ select 'foo' ilike all (array['F%', '%O']); -- t

 -- none of the following should be accepted
 select '{{1,{2}},{2,3}}'::text[];
-select '{{},{}}'::text[];
 select E'{{1,2},\\{2,3}}'::text[];
 select '{{"1 2" x},{3}}'::text[];
 select '{}}'::text[];
 select '{ }}'::text[];
+select '{{1},{{2}}}'::text[];
+select '{{{1}},{2}}'::text[];
+select '{{},{{}}}'::text[];
+select '{{{}},{}}'::text[];
+select '{{1},{}}'::text[];
+select '{{},{1}}'::text[];
+select '[2147483646:2147483647]={1,2}'::int[];
+select '[1:-1]={}'::int[];
+select '[1:0]={1}'::int[];
 select array[];
 -- none of the above should be accepted

 -- all of the following should be accepted
 select '{}'::text[];
+select '{{},{}}'::text[];
 select '{{{1,2,3,4},{2,3,4,5}},{{3,4,5,6},{4,5,6,7}}}'::text[];
 select '{0 second  ,0 second}'::interval[];
 select '{ { "," } , { 3 } }'::text[];
@@ -474,6 +483,8 @@ select '{
          }'::interval[];
 select array[]::text[];
 select '[0:1]={1.1,2.2}'::float8[];
+select '[2147483646:2147483646]={1}'::int[];
+select '[1:0]={}'::int[];
 -- all of the above should be accepted

 -- tests for array aggregates
--
2.39.3

From 95cae8ead8264298bacf91100a1a1e02a071b7c1 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue, 4 Jul 2023 14:07:31 -0400
Subject: [PATCH v2 3/3] Re-indent ArrayCount().

This cleans up after the removal of the "while (!itemdone)" loop.
---
 src/backend/utils/adt/arrayfuncs.c | 214 ++++++++++++++---------------
 1 file changed, 106 insertions(+), 108 deletions(-)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 4fb052b23e..ca2a690768 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -498,21 +498,21 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
     {
         bool        new_element = false;

-            switch (*ptr)
-            {
-                case '\0':
-                    /* Signal a premature end of the string */
-                    ereturn(escontext, -1,
-                            (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-                             errmsg("malformed array literal: \"%s\"", str),
-                             errdetail("Unexpected end of input.")));
-                case '\\':
+        switch (*ptr)
+        {
+            case '\0':
+                /* Signal a premature end of the string */
+                ereturn(escontext, -1,
+                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                         errmsg("malformed array literal: \"%s\"", str),
+                         errdetail("Unexpected end of input.")));
+            case '\\':

-                    /*
-                     * An escape must be after a level start, after an element
-                     * start, or after an element delimiter. In any case we
-                     * now must be past an element start.
-                     */
+                /*
+                 * An escape must be after a level start, after an element
+                 * start, or after an element delimiter. In any case we now
+                 * must be past an element start.
+                 */
                 switch (parse_state)
                 {
                     case ARRAY_LEVEL_STARTED:
@@ -532,22 +532,22 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                                  errdetail("Unexpected \"%c\" character.",
                                            '\\')));
                 }
-                    /* skip the escaped character */
-                    if (*(ptr + 1))
-                        ptr++;
-                    else
-                        ereturn(escontext, -1,
-                                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-                                 errmsg("malformed array literal: \"%s\"", str),
-                                 errdetail("Unexpected end of input.")));
-                    break;
-                case '"':
+                /* skip the escaped character */
+                if (*(ptr + 1))
+                    ptr++;
+                else
+                    ereturn(escontext, -1,
+                            (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                             errmsg("malformed array literal: \"%s\"", str),
+                             errdetail("Unexpected end of input.")));
+                break;
+            case '"':

-                    /*
-                     * A quote must be after a level start, after a quoted
-                     * element start, or after an element delimiter. In any
-                     * case we now must be past an element start.
-                     */
+                /*
+                 * A quote must be after a level start, after a quoted element
+                 * start, or after an element delimiter. In any case we now
+                 * must be past an element start.
+                 */
                 switch (parse_state)
                 {
                     case ARRAY_LEVEL_STARTED:
@@ -570,17 +570,16 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                                  errmsg("malformed array literal: \"%s\"", str),
                                  errdetail("Unexpected array element.")));
                 }
-                    break;
-                case '{':
-                    if (!in_quotes)
-                    {
-                        /*
-                         * A left brace can occur if no nesting has occurred
-                         * yet, after a level start, or after a level
-                         * delimiter.  If we see one after an element
-                         * delimiter, we have something like "{1,{2}}", so
-                         * complain about non-rectangularity.
-                         */
+                break;
+            case '{':
+                if (!in_quotes)
+                {
+                    /*
+                     * A left brace can occur if no nesting has occurred yet,
+                     * after a level start, or after a level delimiter.  If we
+                     * see one after an element delimiter, we have something
+                     * like "{1,{2}}", so complain about non-rectangularity.
+                     */
                     switch (parse_state)
                     {
                         case ARRAY_NO_LEVEL:
@@ -600,19 +599,19 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                                      errdetail("Unexpected \"%c\" character.",
                                                '{')));
                     }
-                        parse_state = ARRAY_LEVEL_STARTED;
+                    parse_state = ARRAY_LEVEL_STARTED;
                     /* Nested sub-arrays count as elements of outer level */
                     if (nest_level > 0)
                         nelems[nest_level - 1]++;
                     /* Initialize element counting in the new level */
-                        if (nest_level >= MAXDIM)
-                            ereturn(escontext, -1,
-                                    (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-                                     errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
-                                            nest_level + 1, MAXDIM)));
+                    if (nest_level >= MAXDIM)
+                        ereturn(escontext, -1,
+                                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                                 errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
+                                        nest_level + 1, MAXDIM)));
                     nelems[nest_level] = 0;
-                        nest_level++;
-                        if (ndim < nest_level)
+                    nest_level++;
+                    if (ndim < nest_level)
                     {
                         /* Can't increase ndim once it's frozen */
                         if (ndim_frozen)
@@ -620,22 +619,22 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                      errmsg("malformed array literal: \"%s\"", str),
                                      errdetail("Multidimensional arrays must have sub-arrays with matching
dimensions.")));
-                            ndim = nest_level;
+                        ndim = nest_level;
                     }
                 }
-                    break;
-                case '}':
-                    if (!in_quotes)
-                    {
-                        /*
-                         * A right brace can occur after an element start, an
-                         * element completion, a quoted element completion, or
-                         * a level completion.  We also allow it after a level
-                         * start, that is an empty sub-array "{}" --- but that
-                         * freezes the number of dimensions and all such
-                         * sub-arrays must be at the same level, just like
-                         * sub-arrays containing elements.
-                         */
+                break;
+            case '}':
+                if (!in_quotes)
+                {
+                    /*
+                     * A right brace can occur after an element start, an
+                     * element completion, a quoted element completion, or a
+                     * level completion.  We also allow it after a level
+                     * start, that is an empty sub-array "{}" --- but that
+                     * freezes the number of dimensions and all such
+                     * sub-arrays must be at the same level, just like
+                     * sub-arrays containing elements.
+                     */
                     switch (parse_state)
                     {
                         case ARRAY_ELEM_STARTED:
@@ -647,9 +646,9 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                             /* empty sub-array: OK if at correct nest_level */
                             ndim_frozen = true;
                             if (nest_level != ndim)
-                            ereturn(escontext, -1,
-                                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-                                     errmsg("malformed array literal: \"%s\"", str),
+                                ereturn(escontext, -1,
+                                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                         errmsg("malformed array literal: \"%s\"", str),
                                          errdetail("Multidimensional arrays must have sub-arrays with matching
dimensions.")));
                             break;
                         default:
@@ -662,7 +661,7 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                     parse_state = ARRAY_LEVEL_COMPLETED;
                     /* The parse state check assured we're in a level. */
                     Assert(nest_level > 0);
-                        nest_level--;
+                    nest_level--;

                     if (dim[nest_level] < 0)
                     {
@@ -672,51 +671,50 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                     else if (nelems[nest_level] != dim[nest_level])
                     {
                         /* Subsequent sub-arrays must have same length */
-                            ereturn(escontext, -1,
-                                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-                                     errmsg("malformed array literal: \"%s\"", str),
+                        ereturn(escontext, -1,
+                                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                 errmsg("malformed array literal: \"%s\"", str),
                                  errdetail("Multidimensional arrays must have sub-arrays with matching
dimensions.")));
-                        }
+                    }
                     /* Done if this is the outermost level's '}' */
                     if (nest_level == 0)
                         eoArray = true;
+                }
+                break;
+            default:
+                if (!in_quotes)
+                {
+                    if (*ptr == typdelim)
+                    {
+                        /*
+                         * Delimiters can occur after an element start, a
+                         * quoted element completion, or a level completion.
+                         */
+                        if (parse_state != ARRAY_ELEM_STARTED &&
+                            parse_state != ARRAY_QUOTED_ELEM_COMPLETED &&
+                            parse_state != ARRAY_LEVEL_COMPLETED)
+                            ereturn(escontext, -1,
+                                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                     errmsg("malformed array literal: \"%s\"", str),
+                                     errdetail("Unexpected \"%c\" character.",
+                                               typdelim)));
+                        if (parse_state == ARRAY_LEVEL_COMPLETED)
+                            parse_state = ARRAY_LEVEL_DELIMITED;
+                        else
+                            parse_state = ARRAY_ELEM_DELIMITED;
                     }
-                    break;
-                default:
-                    if (!in_quotes)
+                    else if (!array_isspace(*ptr))
                     {
-                        if (*ptr == typdelim)
-                        {
-                            /*
-                             * Delimiters can occur after an element start, a
-                             * quoted element completion, or a level
-                             * completion.
-                             */
-                            if (parse_state != ARRAY_ELEM_STARTED &&
-                                parse_state != ARRAY_QUOTED_ELEM_COMPLETED &&
-                                parse_state != ARRAY_LEVEL_COMPLETED)
-                                ereturn(escontext, -1,
-                                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-                                         errmsg("malformed array literal: \"%s\"", str),
-                                         errdetail("Unexpected \"%c\" character.",
-                                                   typdelim)));
-                            if (parse_state == ARRAY_LEVEL_COMPLETED)
-                                parse_state = ARRAY_LEVEL_DELIMITED;
-                            else
-                                parse_state = ARRAY_ELEM_DELIMITED;
-                        }
-                        else if (!array_isspace(*ptr))
-                        {
-                            /*
-                             * Other non-space characters must be after a
-                             * level start, after an element start, or after
-                             * an element delimiter. In any case we now must
-                             * be past an element start.
-                             *
-                             * If it's a space character, we can ignore it; it
-                             * might be data or not, but it doesn't change the
-                             * parsing state.
-                             */
+                        /*
+                         * Other non-space characters must be after a level
+                         * start, after an element start, or after an element
+                         * delimiter. In any case we now must be past an
+                         * element start.
+                         *
+                         * If it's a space character, we can ignore it; it
+                         * might be data or not, but it doesn't change the
+                         * parsing state.
+                         */
                         switch (parse_state)
                         {
                             case ARRAY_LEVEL_STARTED:
@@ -734,10 +732,10 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
                                          errmsg("malformed array literal: \"%s\"", str),
                                          errdetail("Unexpected array element.")));
                         }
-                        }
                     }
-                    break;
-            }
+                }
+                break;
+        }

         /* To reduce duplication, all new-element cases go through here. */
         if (new_element)
--
2.39.3


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: pg_basebackup check vs Windows file path limits
Next
From: Tom Lane
Date:
Subject: Re: On /*----- comments