Thread: Spaces before newlines in view definitions in 9.3

Spaces before newlines in view definitions in 9.3

From

Joe Abbate

Date:

11 September 2013, 00:22:08

Example test code:

$ psql pyrseas_testdb
psql (9.3.0)
Type "help" for help.

pyrseas_testdb=# create table t1 (c1 int, c2 text);
CREATE TABLE
pyrseas_testdb=# create view v1 as select * from t1;
CREATE VIEW
pyrseas_testdb=# \d+ v1
                   View "public.v1"
 Column |  Type   | Modifiers | Storage  | Description
--------+---------+-----------+----------+-------------
 c1     | integer |           | plain    |
 c2     | text    |           | extended |
View definition:
 SELECT t1.c1,
    t1.c2
   FROM t1;

It may not be immediately obvious but there is a space after the
"t1.c1," and before the first newline.  In 9.2 and previous releases,
the view definition is:

 SELECT t1.c1, t1.c2
   FROM t1;

If there are more columns, there's an extra space for each except the
last one, e.g., (with _ denoting a trailing space):

 SELECT t2.c1,_
    t2.c2,_
    t2.c3,_
    t2.c4
   FROM t2;

The problem is that the string comes back, e.g., from pg_get_viewdef()
with those extra spaces before the newlines, e.g.,

" SELECT t1.c1, \n    t1.c3 * 2 AS mc3\n   FROM t1;

and YAML has a way displaying a text string nicely so that it can be
recovered when it's read back, but it *doesn't* work if there are
invisible characters such as tabs or spaces before a newline because
obviously one can't tell how many or of what kind they are.

Note: This applies to both views and materialized views.

I believe the reformatting of view text (breaking each column on a
separate line) was done to improve readability but it has the side
effect of making the text unreadable if processed via a YAML utility
such as Pyrseas dbtoyaml (since YAML then quotes the entire string and
even breaks it down further with extra backslashes).

Regards,

Joe

Re: Spaces before newlines in view definitions in 9.3

From

Tom Lane

Date:

09 November 2013, 01:25:02

Joe Abbate <jma@freedomcircle.com> writes:
> View definition:
>  SELECT t1.c1,
>     t1.c2
>    FROM t1;

> It may not be immediately obvious but there is a space after the
> "t1.c1," and before the first newline.  In 9.2 and previous releases,
> the view definition is:

>  SELECT t1.c1, t1.c2
>    FROM t1;

Yeah, this was changed by commit 2f582f76b1945929ff07116cd4639747ce9bb8a1,
which added newlines to the output without making any attempt to suppress
the spaces that had been printed before.

> The problem is that the string comes back, e.g., from pg_get_viewdef()
> with those extra spaces before the newlines, e.g.,

> " SELECT t1.c1, \n    t1.c3 * 2 AS mc3\n   FROM t1;

> and YAML has a way displaying a text string nicely so that it can be
> recovered when it's read back, but it *doesn't* work if there are
> invisible characters such as tabs or spaces before a newline because
> obviously one can't tell how many or of what kind they are.

Hm.  I am not sure whether to consider that a significant complaint or
not, because in reality the ruleutils code has been pretty sloppy about
trailing whitespace ever since it grew any "pretty printing" capability
at all.  Looking for trailing whitespace in the ruleutils regression test,
I find it in "UNION ALL" and "INSERT ... VALUES" constructs, which were
that way long before that patch, and there are probably more cases that
don't happen to get exercised by the regression tests.  That patch
certainly made things worse, but it's not like pre-9.3 output was
YAML-safe.

Having said that, this seems relatively easy to fix by adjusting
appendContextKeyword to delete any trailing spaces that are immediately
before the newline it inserts.  AFAICS that's the only place in ruleutils
that inserts newlines that might follow a space.  get_target_list also
needs to do that when transposing a newline-started field value into the
main buffer; but that complexity is offset because it no longer need
consider the possibility of spaces before said newline.  I've tested the
attached patch and it seems to do what's wanted (note: I didn't bother
including the regression test output changes in the patch, as they're
bulky and boring).

I think that this is a reasonable thing to fix, but I'm not sure
if we ought to back-patch it into 9.3 or not.  The pretty-printing
change evidently broke your use-case but I'm thinking you weren't
pushing it very hard.

Comments?

            regards, tom lane

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 915fb7a..139939d 100644
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
*************** get_target_list(List *targetList, depars
*** 4479,4520 ****
          /* Consider line-wrapping if enabled */
          if (PRETTY_INDENT(context) && context->wrapColumn >= 0)
          {
!             int            leading_nl_pos = -1;
!             char       *trailing_nl;
!             int            pos;

!             /* Does the new field start with whitespace plus a new line? */
!             for (pos = 0; pos < targetbuf.len; pos++)
              {
!                 if (targetbuf.data[pos] == '\n')
!                 {
!                     leading_nl_pos = pos;
!                     break;
!                 }
!                 if (targetbuf.data[pos] != ' ')
!                     break;
              }
-
-             /* Locate the start of the current line in the output buffer */
-             trailing_nl = strrchr(buf->data, '\n');
-             if (trailing_nl == NULL)
-                 trailing_nl = buf->data;
              else
!                 trailing_nl++;

!             /*
!              * If the field we're adding is the first in the list, or it
!              * already has a leading newline, don't add anything. Otherwise,
!              * add a newline, plus some indentation, if either the new field
!              * would cause an overflow or the last field used more than one
!              * line.
!              */
!             if (colno > 1 &&
!                 leading_nl_pos == -1 &&
!                 ((strlen(trailing_nl) + strlen(targetbuf.data) > context->wrapColumn) ||
!                  last_was_multiline))
!                 appendContextKeyword(context, "", -PRETTYINDENT_STD,
!                                      PRETTYINDENT_STD, PRETTYINDENT_VAR);

              /* Remember this field's multiline status for next iteration */
              last_was_multiline =
--- 4479,4521 ----
          /* Consider line-wrapping if enabled */
          if (PRETTY_INDENT(context) && context->wrapColumn >= 0)
          {
!             int            leading_nl_pos;

!             /* Does the new field start with a new line? */
!             if (targetbuf.len > 0 && targetbuf.data[0] == '\n')
!                 leading_nl_pos = 0;
!             else
!                 leading_nl_pos = -1;
!
!             /* If so, we shouldn't add anything */
!             if (leading_nl_pos >= 0)
              {
!                 /* instead, remove any trailing spaces currently in buf */
!                 while (buf->len > 0 && buf->data[buf->len - 1] == ' ')
!                     buf->data[--(buf->len)] = '\0';
              }
              else
!             {
!                 char       *trailing_nl;

!                 /* Locate the start of the current line in the output buffer */
!                 trailing_nl = strrchr(buf->data, '\n');
!                 if (trailing_nl == NULL)
!                     trailing_nl = buf->data;
!                 else
!                     trailing_nl++;
!
!                 /*
!                  * Add a newline, plus some indentation, if the new field is
!                  * not the first and either the new field would cause an
!                  * overflow or the last field used more than one line.
!                  */
!                 if (colno > 1 &&
!                     ((strlen(trailing_nl) + strlen(targetbuf.data) > context->wrapColumn) ||
!                      last_was_multiline))
!                     appendContextKeyword(context, "", -PRETTYINDENT_STD,
!                                          PRETTYINDENT_STD, PRETTYINDENT_VAR);
!             }

              /* Remember this field's multiline status for next iteration */
              last_was_multiline =
*************** static void
*** 6236,6256 ****
  appendContextKeyword(deparse_context *context, const char *str,
                       int indentBefore, int indentAfter, int indentPlus)
  {
      if (PRETTY_INDENT(context))
      {
          context->indentLevel += indentBefore;

!         appendStringInfoChar(context->buf, '\n');
!         appendStringInfoSpaces(context->buf,
                                 Max(context->indentLevel, 0) + indentPlus);
!         appendStringInfoString(context->buf, str);

          context->indentLevel += indentAfter;
          if (context->indentLevel < 0)
              context->indentLevel = 0;
      }
      else
!         appendStringInfoString(context->buf, str);
  }

  /*
--- 6237,6264 ----
  appendContextKeyword(deparse_context *context, const char *str,
                       int indentBefore, int indentAfter, int indentPlus)
  {
+     StringInfo    buf = context->buf;
+
      if (PRETTY_INDENT(context))
      {
          context->indentLevel += indentBefore;

!         /* remove any trailing spaces currently in the buffer ... */
!         while (buf->len > 0 && buf->data[buf->len - 1] == ' ')
!             buf->data[--(buf->len)] = '\0';
!         /* ... then add a newline and some spaces */
!         appendStringInfoChar(buf, '\n');
!         appendStringInfoSpaces(buf,
                                 Max(context->indentLevel, 0) + indentPlus);
!
!         appendStringInfoString(buf, str);

          context->indentLevel += indentAfter;
          if (context->indentLevel < 0)
              context->indentLevel = 0;
      }
      else
!         appendStringInfoString(buf, str);
  }

  /*

Re: Spaces before newlines in view definitions in 9.3

From

Joe Abbate

Date:

09 November 2013, 02:06:43

Hello Tom,

On 08/11/13 17:24, Tom Lane wrote:
> Joe Abbate <jma@freedomcircle.com> writes:
>> View definition:
>>  SELECT t1.c1,
>>     t1.c2
>>    FROM t1;
>
>> It may not be immediately obvious but there is a space after the
>> "t1.c1," and before the first newline.  In 9.2 and previous releases,
>> the view definition is:
>
>>  SELECT t1.c1, t1.c2
>>    FROM t1;
>
> Yeah, this was changed by commit 2f582f76b1945929ff07116cd4639747ce9bb8a1,
> which added newlines to the output without making any attempt to suppress
> the spaces that had been printed before.
>
>> The problem is that the string comes back, e.g., from pg_get_viewdef()
>> with those extra spaces before the newlines, e.g.,
>
>> " SELECT t1.c1, \n    t1.c3 * 2 AS mc3\n   FROM t1;
>
>> and YAML has a way displaying a text string nicely so that it can be
>> recovered when it's read back, but it *doesn't* work if there are
>> invisible characters such as tabs or spaces before a newline because
>> obviously one can't tell how many or of what kind they are.
>
> Hm.  I am not sure whether to consider that a significant complaint or
> not, because in reality the ruleutils code has been pretty sloppy about
> trailing whitespace ever since it grew any "pretty printing" capability
> at all.  Looking for trailing whitespace in the ruleutils regression test,
> I find it in "UNION ALL" and "INSERT ... VALUES" constructs, which were
> that way long before that patch, and there are probably more cases that
> don't happen to get exercised by the regression tests.  That patch
> certainly made things worse, but it's not like pre-9.3 output was
> YAML-safe.
>
> Having said that, this seems relatively easy to fix by adjusting
> appendContextKeyword to delete any trailing spaces that are immediately
> before the newline it inserts.  AFAICS that's the only place in ruleutils
> that inserts newlines that might follow a space.  get_target_list also
> needs to do that when transposing a newline-started field value into the
> main buffer; but that complexity is offset because it no longer need
> consider the possibility of spaces before said newline.  I've tested the
> attached patch and it seems to do what's wanted (note: I didn't bother
> including the regression test output changes in the patch, as they're
> bulky and boring).
>
> I think that this is a reasonable thing to fix, but I'm not sure
> if we ought to back-patch it into 9.3 or not.  The pretty-printing
> change evidently broke your use-case but I'm thinking you weren't
> pushing it very hard.

Agreed it's not a significant complaint, just an extra annoyance --made
more visible because for Pyrseas 0.7 we added nicer view text formatting
rather than outputting what pg_get_viewdef gave us, no matter how ugly
and long the resulting quoted string looked like.

Our tests generally compare SQL texts in a "forgiving" manner,
particularly when it comes to whitespace.  However, the problem was
introduced in 9.3 and broke some of our view tests (and analogous
matview tests derived from them).  It's not a big deal if you don't
backpatch to 9.3, but it would help, since some of our tests now read:

        if ("postgres version" < 90300):
            fmt = "(%s%s %s%s )"
        else:
            fmt = "( %s\n %s\n %s\n %s\n)"

That would probably require yet another branch for 9.4 and later.

Best regards,

Joe

Re: Spaces before newlines in view definitions in 9.3

From

Tom Lane

Date:

11 November 2013, 21:38:03

Joe Abbate <jma@freedomcircle.com> writes:
> On 08/11/13 17:24, Tom Lane wrote:
>> I think that this is a reasonable thing to fix, but I'm not sure
>> if we ought to back-patch it into 9.3 or not.  The pretty-printing
>> change evidently broke your use-case but I'm thinking you weren't
>> pushing it very hard.

> Agreed it's not a significant complaint, just an extra annoyance --made
> more visible because for Pyrseas 0.7 we added nicer view text formatting
> rather than outputting what pg_get_viewdef gave us, no matter how ugly
> and long the resulting quoted string looked like.

Not hearing any objections, I went ahead and pushed this fix into
9.3 as well as HEAD.

            regards, tom lane