Thread: pg_get_constraintdef() doesn't always give an equal constraint

pg_get_constraintdef() doesn't always give an equal constraint

From
Jeff Davis
Date:
create table pt(a int, b int, c float8 check (c < '10.1'));
create table ct(a int, b int, c float8);
select pg_get_constraintdef(oid) from pg_constraint where conname='pt_c_check';
         pg_get_constraintdef
--------------------------------------
 CHECK ((c < 10.1::double precision))
(1 row)

-- copy constraint to "ct" using syntax given above
alter table ct add constraint pt_c_check CHECK ((c < 10.1::double precision));
select pg_get_constraintdef(oid) from pg_constraint where conname='pt_c_check';
          pg_get_constraintdef
----------------------------------------
 CHECK ((c < 10.1::double precision))
 CHECK ((c < (10.1)::double precision))
(2 rows)

-- notice extra parenthesis above

-- now, we can't attach "ct" as an inheritance child of "pt"
alter table ct inherit pt;
ERROR:  child table "ct" has different definition for check constraint
"pt_c_check"

Also, the pg_dump output is different for pt and ct.

Strangely, the "\d" output is the same, so I tried using
pg_get_constraintdef with pretty-printing mode, which works fine. But
that's contrary to the docs, which say:

"The pretty-printed format is more readable, but the default format is
more likely to be interpreted the same way by future versions of
PostgreSQL; avoid using pretty-printed output for dump purposes."

Regards,
     Jeff Davis

Re: pg_get_constraintdef() doesn't always give an equal constraint

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> create table pt(a int, b int, c float8 check (c < '10.1'));
> create table ct(a int, b int, c float8);
> select pg_get_constraintdef(oid) from pg_constraint where conname='pt_c_check';
>          pg_get_constraintdef
> --------------------------------------
>  CHECK ((c < 10.1::double precision))
> (1 row)

> -- copy constraint to "ct" using syntax given above
> alter table ct add constraint pt_c_check CHECK ((c < 10.1::double precision));
> select pg_get_constraintdef(oid) from pg_constraint where conname='pt_c_check';
>           pg_get_constraintdef
> ----------------------------------------
>  CHECK ((c < 10.1::double precision))
>  CHECK ((c < (10.1)::double precision))
> (2 rows)

What's evidently happening here is that ruleutils.c thinks it can dump a
float8 constant using the syntax "10.1::double precision", but the parser
will interpret that as a numeric constant with a separate cast step.

I don't see any simple way around that except to dump using the syntax
    '10.1'::double precision
which is ugly as sin, but perhaps we have no choice.  A longer-term
solution might be to get the parser to interpret
    10.1::double precision
as a float8 literal to start with, but that seems like it could have
unexpected side-effects?  Not sure.

OTOH, you could argue that existing dump files already have the
unquoted-literal syntax so it behooves us to try to get the parser
to read them as they were meant.

A larger issue is that I have a nasty feeling that this isn't the
only place where ruleutils.c output might be read in a way that's
functionally equivalent to the original, but not the exact same
parsetree.

            regards, tom lane

Re: pg_get_constraintdef() doesn't always give an equal constraint

From
Jon Jensen
Date:
On Mon, 23 Mar 2015, Tom Lane wrote:

> Jeff Davis <pgsql@j-davis.com> writes:
>> create table pt(a int, b int, c float8 check (c < '10.1'));
>> create table ct(a int, b int, c float8);
>> select pg_get_constraintdef(oid) from pg_constraint where conname='pt_c_check';
>>          pg_get_constraintdef
>> --------------------------------------
>>  CHECK ((c < 10.1::double precision))
>> (1 row)
>
>> -- copy constraint to "ct" using syntax given above
>> alter table ct add constraint pt_c_check CHECK ((c < 10.1::double precision));
>> select pg_get_constraintdef(oid) from pg_constraint where conname='pt_c_check';
>>           pg_get_constraintdef
>> ----------------------------------------
>>  CHECK ((c < 10.1::double precision))
>>  CHECK ((c < (10.1)::double precision))
>> (2 rows)
>
> A larger issue is that I have a nasty feeling that this isn't the
> only place where ruleutils.c output might be read in a way that's
> functionally equivalent to the original, but not the exact same
> parsetree.

Jeff pointed this out to me and as we discussed it, it seems like a good
time to add a regression test to make us aware of the current parse +
deparse behavior, and call attention to any changes in that behavior
later.

Attached is a Perl script that generates many combinations of CHECK
constraints like Jeff's, but with various types, and the result of running
it in the regression test suite.

Would something like this be welcome in the regression test suite?

Jon


--
Jon Jensen
End Point Corporation
https://www.endpoint.com/

Re: pg_get_constraintdef() doesn't always give an equal constraint

From
Jeff Davis
Date:
[forgot to copy this to pgsql-bugs]

On Mon, Mar 23, 2015 at 7:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't see any simple way around that except to dump using the syntax
>         '10.1'::double precision

There is a similar problem related to NUMERIC:
  '1'::numeric is dumped as
  1::numeric
which introduces a cast as well. There's also a problem with negative
constants, because it introduces parenthesis instead of single quotes:
  '-1'::numeric
is dumped as
  (-1)::numeric

> which is ugly as sin, but perhaps we have no choice.  A longer-term
> solution might be to get the parser to interpret
>         10.1::double precision
> as a float8 literal to start with, but that seems like it could have
> unexpected side-effects?  Not sure.
>
> OTOH, you could argue that existing dump files already have the
> unquoted-literal syntax so it behooves us to try to get the parser
> to read them as they were meant.

Hmm. I'm not sure how you intend to parse that, but it seems like it
would be hard to differentiate between:
  10.1::double precision
and
  10.1::text
in the parser, so I think one side effect would be that the latter
expression would be a text literal with no cast, which is a little
strange (though not necessarily bad). I guess you could hard-code it
to recognize double precision (and other float types and aliases?).

This bug is pretty old and nobody has complained about it before.
Let's just figure out a good unambiguous representation of float and
numeric literals, and then backport it. Anyone who cares enough to fix
this issue can upgrade to the latest point release on the old version,
and then dump/reload.

For numeric, I think appending ".0" (if it's an integral value) is the
easiest, prettiest, and least-surprising. For floats, we can either
use the single-quotes and type annotation, or we can come up with
something new like appending an "f" to the value.

> A larger issue is that I have a nasty feeling that this isn't the
> only place where ruleutils.c output might be read in a way that's
> functionally equivalent to the original, but not the exact same
> parsetree.

I'm concerned about other cases as well, but fixing this seems like a
step in the right direction.

Regards,
     Jeff Davis

Re: pg_get_constraintdef() doesn't always give an equal constraint

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Mon, Mar 23, 2015 at 7:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't see any simple way around that except to dump using the syntax
>> '10.1'::double precision

> There is a similar problem related to NUMERIC:
>   '1'::numeric is dumped as
>   1::numeric
> which introduces a cast as well. There's also a problem with negative
> constants, because it introduces parenthesis instead of single quotes:
>   '-1'::numeric
> is dumped as
>   (-1)::numeric

Yeah.  In general, this code was trying to produce something nice-looking
and semantically equivalent, but not necessarily something that would
re-parse as the exact same Const node.

> This bug is pretty old and nobody has complained about it before.
> Let's just figure out a good unambiguous representation of float and
> numeric literals, and then backport it. Anyone who cares enough to fix
> this issue can upgrade to the latest point release on the old version,
> and then dump/reload.

> For numeric, I think appending ".0" (if it's an integral value) is the
> easiest, prettiest, and least-surprising.

... and wrong, because that would affect the dscale.  I don't think we
want this code editorializing on the printed format in any case.

> For floats, we can either
> use the single-quotes and type annotation, or we can come up with
> something new like appending an "f" to the value.

I cannot see back-porting anything as invasive as changing the lexer.

Basically, I think we have to change ruleutils so that it quotes anything
that wouldn't be seen as a simple integer or numeric constant by the
lexer+grammar.

The attached patch does this; the regression test changes illustrate
what's going to happen to the output if we do this.

Looking at the changes, I'm not 100% convinced we want to back-patch.
As you say, nobody's complained of this problem before, and I'm worried
that people will see the output changes as a bigger deal than the issue
we're trying to fix.

Thoughts?

            regards, tom lane

(PS: I've not checked contrib or pl tests, so this patch may be incomplete
as far as expected-output changes go.)

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 28e1acf..4672cf5 100644
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
***************
*** 15,20 ****
--- 15,21 ----
   */
  #include "postgres.h"

+ #include <ctype.h>
  #include <unistd.h>
  #include <fcntl.h>

*************** get_const_expr(Const *constval, deparse_
*** 8018,8025 ****
      Oid            typoutput;
      bool        typIsVarlena;
      char       *extval;
!     bool        isfloat = false;
!     bool        needlabel;

      if (constval->constisnull)
      {
--- 8019,8025 ----
      Oid            typoutput;
      bool        typIsVarlena;
      char       *extval;
!     bool        needlabel = false;

      if (constval->constisnull)
      {
*************** get_const_expr(Const *constval, deparse_
*** 8045,8084 ****

      switch (constval->consttype)
      {
-         case INT2OID:
          case INT4OID:
!         case INT8OID:
!         case OIDOID:
!         case FLOAT4OID:
!         case FLOAT8OID:
          case NUMERICOID:
              {
!                 /*
!                  * These types are printed without quotes unless they contain
!                  * values that aren't accepted by the scanner unquoted (e.g.,
!                  * 'NaN').  Note that strtod() and friends might accept NaN,
!                  * so we can't use that to test.
!                  *
!                  * In reality we only need to defend against infinity and NaN,
!                  * so we need not get too crazy about pattern matching here.
!                  *
!                  * There is a special-case gotcha: if the constant is signed,
!                  * we need to parenthesize it, else the parser might see a
!                  * leading plus/minus as binding less tightly than adjacent
!                  * operators --- particularly, the cast that we might attach
!                  * below.
!                  */
!                 if (strspn(extval, "0123456789+-eE.") == strlen(extval))
!                 {
!                     if (extval[0] == '+' || extval[0] == '-')
!                         appendStringInfo(buf, "(%s)", extval);
!                     else
!                         appendStringInfoString(buf, extval);
!                     if (strcspn(extval, "eE.") != strlen(extval))
!                         isfloat = true; /* it looks like a float */
!                 }
!                 else
!                     appendStringInfo(buf, "'%s'", extval);
              }
              break;

--- 8045,8086 ----

      switch (constval->consttype)
      {
          case INT4OID:
!
!             /*
!              * INT4 can be printed without any decoration, unless it is
!              * negative; in that case print it as '-nnn'::integer to ensure
!              * that the output will re-parse as a constant, not as a constant
!              * plus operator.  In most cases we could get away with printing
!              * (-nnn) instead, because of the way that gram.y handles negative
!              * literals; but that doesn't work for INT_MIN, and it doesn't
!              * seem that much prettier anyway.
!              */
!             if (extval[0] != '-')
!                 appendStringInfoString(buf, extval);
!             else
!             {
!                 appendStringInfo(buf, "'%s'", extval);
!                 needlabel = true;        /* we must attach a cast */
!             }
!             break;
!
          case NUMERICOID:
+
+             /*
+              * NUMERIC can be printed without quotes if it looks like a float
+              * constant (not an integer, and not Infinity or NaN) and doesn't
+              * have a leading sign (for the same reason as for INT4).
+              */
+             if (isdigit((unsigned char) extval[0]) &&
+                 strcspn(extval, "eE.") != strlen(extval))
              {
!                 appendStringInfoString(buf, extval);
!             }
!             else
!             {
!                 appendStringInfo(buf, "'%s'", extval);
!                 needlabel = true;        /* we must attach a cast */
              }
              break;

*************** get_const_expr(Const *constval, deparse_
*** 8114,8131 ****
      switch (constval->consttype)
      {
          case BOOLOID:
-         case INT4OID:
          case UNKNOWNOID:
              /* These types can be left unlabeled */
              needlabel = false;
              break;
          case NUMERICOID:

              /*
               * Float-looking constants will be typed as numeric, but if
               * there's a specific typmod we need to show it.
               */
!             needlabel = !isfloat || (constval->consttypmod >= 0);
              break;
          default:
              needlabel = true;
--- 8116,8135 ----
      switch (constval->consttype)
      {
          case BOOLOID:
          case UNKNOWNOID:
              /* These types can be left unlabeled */
              needlabel = false;
              break;
+         case INT4OID:
+             /* We determined above whether a label is needed */
+             break;
          case NUMERICOID:

              /*
               * Float-looking constants will be typed as numeric, but if
               * there's a specific typmod we need to show it.
               */
!             needlabel |= (constval->consttypmod >= 0);
              break;
          default:
              needlabel = true;
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index dfae84e..0391b8e 100644
*** a/src/test/regress/expected/equivclass.out
--- b/src/test/regress/expected/equivclass.out
*************** set enable_mergejoin = off;
*** 104,114 ****
  --
  explain (costs off)
    select * from ec0 where ff = f1 and f1 = '42'::int8;
!             QUERY PLAN
! ----------------------------------
   Index Scan using ec0_pkey on ec0
!    Index Cond: (ff = 42::bigint)
!    Filter: (f1 = 42::bigint)
  (3 rows)

  explain (costs off)
--- 104,114 ----
  --
  explain (costs off)
    select * from ec0 where ff = f1 and f1 = '42'::int8;
!             QUERY PLAN
! -----------------------------------
   Index Scan using ec0_pkey on ec0
!    Index Cond: (ff = '42'::bigint)
!    Filter: (f1 = '42'::bigint)
  (3 rows)

  explain (costs off)
*************** explain (costs off)
*** 139,150 ****

  explain (costs off)
    select * from ec1, ec2 where ff = x1 and ff = '42'::int8;
!                           QUERY PLAN
! ---------------------------------------------------------------
   Nested Loop
     Join Filter: (ec1.ff = ec2.x1)
     ->  Index Scan using ec1_pkey on ec1
!          Index Cond: ((ff = 42::bigint) AND (ff = 42::bigint))
     ->  Seq Scan on ec2
  (5 rows)

--- 139,150 ----

  explain (costs off)
    select * from ec1, ec2 where ff = x1 and ff = '42'::int8;
!                             QUERY PLAN
! -------------------------------------------------------------------
   Nested Loop
     Join Filter: (ec1.ff = ec2.x1)
     ->  Index Scan using ec1_pkey on ec1
!          Index Cond: ((ff = '42'::bigint) AND (ff = '42'::bigint))
     ->  Seq Scan on ec2
  (5 rows)

*************** explain (costs off)
*** 161,174 ****

  explain (costs off)
    select * from ec1, ec2 where ff = x1 and '42'::int8 = x1;
!                QUERY PLAN
! ----------------------------------------
   Nested Loop
     Join Filter: (ec1.ff = ec2.x1)
     ->  Index Scan using ec1_pkey on ec1
!          Index Cond: (ff = 42::bigint)
     ->  Seq Scan on ec2
!          Filter: (42::bigint = x1)
  (6 rows)

  explain (costs off)
--- 161,174 ----

  explain (costs off)
    select * from ec1, ec2 where ff = x1 and '42'::int8 = x1;
!                QUERY PLAN
! -----------------------------------------
   Nested Loop
     Join Filter: (ec1.ff = ec2.x1)
     ->  Index Scan using ec1_pkey on ec1
!          Index Cond: (ff = '42'::bigint)
     ->  Seq Scan on ec2
!          Filter: ('42'::bigint = x1)
  (6 rows)

  explain (costs off)
*************** explain (costs off)
*** 210,216 ****
  -----------------------------------------------------
   Nested Loop
     ->  Index Scan using ec1_pkey on ec1
!          Index Cond: (ff = 42::bigint)
     ->  Append
           ->  Index Scan using ec1_expr2 on ec1 ec1_1
                 Index Cond: (((ff + 2) + 1) = ec1.f1)
--- 210,216 ----
  -----------------------------------------------------
   Nested Loop
     ->  Index Scan using ec1_pkey on ec1
!          Index Cond: (ff = '42'::bigint)
     ->  Append
           ->  Index Scan using ec1_expr2 on ec1 ec1_1
                 Index Cond: (((ff + 2) + 1) = ec1.f1)
*************** explain (costs off)
*** 229,248 ****
       union all
       select ff + 4 as x from ec1) as ss1
    where ss1.x = ec1.f1 and ec1.ff = 42::int8 and ec1.ff = ec1.f1;
!                           QUERY PLAN
! ---------------------------------------------------------------
   Nested Loop
     Join Filter: ((((ec1_1.ff + 2) + 1)) = ec1.f1)
     ->  Index Scan using ec1_pkey on ec1
!          Index Cond: ((ff = 42::bigint) AND (ff = 42::bigint))
           Filter: (ff = f1)
     ->  Append
           ->  Index Scan using ec1_expr2 on ec1 ec1_1
!                Index Cond: (((ff + 2) + 1) = 42::bigint)
           ->  Index Scan using ec1_expr3 on ec1 ec1_2
!                Index Cond: (((ff + 3) + 1) = 42::bigint)
           ->  Index Scan using ec1_expr4 on ec1 ec1_3
!                Index Cond: ((ff + 4) = 42::bigint)
  (12 rows)

  explain (costs off)
--- 229,248 ----
       union all
       select ff + 4 as x from ec1) as ss1
    where ss1.x = ec1.f1 and ec1.ff = 42::int8 and ec1.ff = ec1.f1;
!                             QUERY PLAN
! -------------------------------------------------------------------
   Nested Loop
     Join Filter: ((((ec1_1.ff + 2) + 1)) = ec1.f1)
     ->  Index Scan using ec1_pkey on ec1
!          Index Cond: ((ff = '42'::bigint) AND (ff = '42'::bigint))
           Filter: (ff = f1)
     ->  Append
           ->  Index Scan using ec1_expr2 on ec1 ec1_1
!                Index Cond: (((ff + 2) + 1) = '42'::bigint)
           ->  Index Scan using ec1_expr3 on ec1 ec1_2
!                Index Cond: (((ff + 3) + 1) = '42'::bigint)
           ->  Index Scan using ec1_expr4 on ec1 ec1_3
!                Index Cond: ((ff + 4) = '42'::bigint)
  (12 rows)

  explain (costs off)
*************** explain (costs off)
*** 265,271 ****
   Nested Loop
     ->  Nested Loop
           ->  Index Scan using ec1_pkey on ec1
!                Index Cond: (ff = 42::bigint)
           ->  Append
                 ->  Index Scan using ec1_expr2 on ec1 ec1_1
                       Index Cond: (((ff + 2) + 1) = ec1.f1)
--- 265,271 ----
   Nested Loop
     ->  Nested Loop
           ->  Index Scan using ec1_pkey on ec1
!                Index Cond: (ff = '42'::bigint)
           ->  Append
                 ->  Index Scan using ec1_expr2 on ec1 ec1_1
                       Index Cond: (((ff + 2) + 1) = ec1.f1)
*************** explain (costs off)
*** 321,327 ****
                       ->  Sort
                             Sort Key: ec1.f1 USING <
                             ->  Index Scan using ec1_pkey on ec1
!                                  Index Cond: (ff = 42::bigint)
  (20 rows)

  -- check partially indexed scan
--- 321,327 ----
                       ->  Sort
                             Sort Key: ec1.f1 USING <
                             ->  Index Scan using ec1_pkey on ec1
!                                  Index Cond: (ff = '42'::bigint)
  (20 rows)

  -- check partially indexed scan
*************** explain (costs off)
*** 341,347 ****
  -----------------------------------------------------
   Nested Loop
     ->  Index Scan using ec1_pkey on ec1
!          Index Cond: (ff = 42::bigint)
     ->  Append
           ->  Index Scan using ec1_expr2 on ec1 ec1_1
                 Index Cond: (((ff + 2) + 1) = ec1.f1)
--- 341,347 ----
  -----------------------------------------------------
   Nested Loop
     ->  Index Scan using ec1_pkey on ec1
!          Index Cond: (ff = '42'::bigint)
     ->  Append
           ->  Index Scan using ec1_expr2 on ec1 ec1_1
                 Index Cond: (((ff + 2) + 1) = ec1.f1)
*************** explain (costs off)
*** 378,383 ****
           ->  Sort
                 Sort Key: ec1.f1 USING <
                 ->  Index Scan using ec1_pkey on ec1
!                      Index Cond: (ff = 42::bigint)
  (14 rows)

--- 378,383 ----
           ->  Sort
                 Sort Key: ec1.f1 USING <
                 ->  Index Scan using ec1_pkey on ec1
!                      Index Cond: (ff = '42'::bigint)
  (14 rows)

diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 57fc910..046b085 100644
*** a/src/test/regress/expected/join.out
--- b/src/test/regress/expected/join.out
*************** SELECT qq, unique1
*** 2544,2559 ****
    ( SELECT COALESCE(q2, -1) AS qq FROM int8_tbl b ) AS ss2
    USING (qq)
    INNER JOIN tenk1 c ON qq = unique2;
!                                               QUERY PLAN
! -------------------------------------------------------------------------------------------------------
   Nested Loop
     ->  Hash Full Join
!          Hash Cond: (COALESCE(a.q1, 0::bigint) = COALESCE(b.q2, (-1)::bigint))
           ->  Seq Scan on int8_tbl a
           ->  Hash
                 ->  Seq Scan on int8_tbl b
     ->  Index Scan using tenk1_unique2 on tenk1 c
!          Index Cond: (unique2 = COALESCE((COALESCE(a.q1, 0::bigint)), (COALESCE(b.q2, (-1)::bigint))))
  (8 rows)

  SELECT qq, unique1
--- 2544,2559 ----
    ( SELECT COALESCE(q2, -1) AS qq FROM int8_tbl b ) AS ss2
    USING (qq)
    INNER JOIN tenk1 c ON qq = unique2;
!                                                QUERY PLAN
! ---------------------------------------------------------------------------------------------------------
   Nested Loop
     ->  Hash Full Join
!          Hash Cond: (COALESCE(a.q1, '0'::bigint) = COALESCE(b.q2, '-1'::bigint))
           ->  Seq Scan on int8_tbl a
           ->  Hash
                 ->  Seq Scan on int8_tbl b
     ->  Index Scan using tenk1_unique2 on tenk1 c
!          Index Cond: (unique2 = COALESCE((COALESCE(a.q1, '0'::bigint)), (COALESCE(b.q2, '-1'::bigint))))
  (8 rows)

  SELECT qq, unique1
*************** select * from
*** 3003,3012 ****
  ) ss
  where fault = 122
  order by fault;
!                            QUERY PLAN
! -----------------------------------------------------------------
   Nested Loop Left Join
!    Filter: ((COALESCE(tenk1.unique1, (-1)) + int8_tbl.q1) = 122)
     ->  Seq Scan on int8_tbl
     ->  Index Scan using tenk1_unique2 on tenk1
           Index Cond: (int8_tbl.q2 = unique2)
--- 3003,3012 ----
  ) ss
  where fault = 122
  order by fault;
!                                 QUERY PLAN
! --------------------------------------------------------------------------
   Nested Loop Left Join
!    Filter: ((COALESCE(tenk1.unique1, '-1'::integer) + int8_tbl.q1) = 122)
     ->  Seq Scan on int8_tbl
     ->  Index Scan using tenk1_unique2 on tenk1
           Index Cond: (int8_tbl.q2 = unique2)
*************** explain (verbose, costs off)
*** 4012,4025 ****
  select * from
    int8_tbl a left join
    lateral (select *, coalesce(a.q2, 42) as x from int8_tbl b) ss on a.q2 = ss.q1;
!                            QUERY PLAN
! ----------------------------------------------------------------
   Nested Loop Left Join
!    Output: a.q1, a.q2, b.q1, b.q2, (COALESCE(a.q2, 42::bigint))
     ->  Seq Scan on public.int8_tbl a
           Output: a.q1, a.q2
     ->  Seq Scan on public.int8_tbl b
!          Output: b.q1, b.q2, COALESCE(a.q2, 42::bigint)
           Filter: (a.q2 = b.q1)
  (7 rows)

--- 4012,4025 ----
  select * from
    int8_tbl a left join
    lateral (select *, coalesce(a.q2, 42) as x from int8_tbl b) ss on a.q2 = ss.q1;
!                             QUERY PLAN
! ------------------------------------------------------------------
   Nested Loop Left Join
!    Output: a.q1, a.q2, b.q1, b.q2, (COALESCE(a.q2, '42'::bigint))
     ->  Seq Scan on public.int8_tbl a
           Output: a.q1, a.q2
     ->  Seq Scan on public.int8_tbl b
!          Output: b.q1, b.q2, COALESCE(a.q2, '42'::bigint)
           Filter: (a.q2 = b.q1)
  (7 rows)

*************** select * from
*** 4235,4266 ****
      lateral (select q1, coalesce(ss1.x,q2) as y from int8_tbl d) ss2
    ) on c.q2 = ss2.q1,
    lateral (select ss2.y offset 0) ss3;
!                                                                                   QUERY PLAN
                                                        
!
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Nested Loop
!    Output: c.q1, c.q2, a.q1, a.q2, b.q1, (COALESCE(b.q2, 42::bigint)), d.q1, (COALESCE((COALESCE(b.q2, 42::bigint)),
d.q2)),((COALESCE((COALESCE(b.q2, 42::bigint)), d.q2))) 
     ->  Hash Right Join
!          Output: c.q1, c.q2, a.q1, a.q2, b.q1, d.q1, (COALESCE(b.q2, 42::bigint)), (COALESCE((COALESCE(b.q2,
42::bigint)),d.q2)) 
           Hash Cond: (d.q1 = c.q2)
           ->  Nested Loop
!                Output: a.q1, a.q2, b.q1, d.q1, (COALESCE(b.q2, 42::bigint)), (COALESCE((COALESCE(b.q2, 42::bigint)),
d.q2))
                 ->  Hash Left Join
!                      Output: a.q1, a.q2, b.q1, (COALESCE(b.q2, 42::bigint))
                       Hash Cond: (a.q2 = b.q1)
                       ->  Seq Scan on public.int8_tbl a
                             Output: a.q1, a.q2
                       ->  Hash
!                            Output: b.q1, (COALESCE(b.q2, 42::bigint))
                             ->  Seq Scan on public.int8_tbl b
!                                  Output: b.q1, COALESCE(b.q2, 42::bigint)
                 ->  Seq Scan on public.int8_tbl d
!                      Output: d.q1, COALESCE((COALESCE(b.q2, 42::bigint)), d.q2)
           ->  Hash
                 Output: c.q1, c.q2
                 ->  Seq Scan on public.int8_tbl c
                       Output: c.q1, c.q2
     ->  Result
!          Output: (COALESCE((COALESCE(b.q2, 42::bigint)), d.q2))
  (24 rows)

  -- case that breaks the old ph_may_need optimization
--- 4235,4266 ----
      lateral (select q1, coalesce(ss1.x,q2) as y from int8_tbl d) ss2
    ) on c.q2 = ss2.q1,
    lateral (select ss2.y offset 0) ss3;
!                                                                                      QUERY PLAN
                                                              
!
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Nested Loop
!    Output: c.q1, c.q2, a.q1, a.q2, b.q1, (COALESCE(b.q2, '42'::bigint)), d.q1, (COALESCE((COALESCE(b.q2,
'42'::bigint)),d.q2)), ((COALESCE((COALESCE(b.q2, '42'::bigint)), d.q2))) 
     ->  Hash Right Join
!          Output: c.q1, c.q2, a.q1, a.q2, b.q1, d.q1, (COALESCE(b.q2, '42'::bigint)), (COALESCE((COALESCE(b.q2,
'42'::bigint)),d.q2)) 
           Hash Cond: (d.q1 = c.q2)
           ->  Nested Loop
!                Output: a.q1, a.q2, b.q1, d.q1, (COALESCE(b.q2, '42'::bigint)), (COALESCE((COALESCE(b.q2,
'42'::bigint)),d.q2)) 
                 ->  Hash Left Join
!                      Output: a.q1, a.q2, b.q1, (COALESCE(b.q2, '42'::bigint))
                       Hash Cond: (a.q2 = b.q1)
                       ->  Seq Scan on public.int8_tbl a
                             Output: a.q1, a.q2
                       ->  Hash
!                            Output: b.q1, (COALESCE(b.q2, '42'::bigint))
                             ->  Seq Scan on public.int8_tbl b
!                                  Output: b.q1, COALESCE(b.q2, '42'::bigint)
                 ->  Seq Scan on public.int8_tbl d
!                      Output: d.q1, COALESCE((COALESCE(b.q2, '42'::bigint)), d.q2)
           ->  Hash
                 Output: c.q1, c.q2
                 ->  Seq Scan on public.int8_tbl c
                       Output: c.q1, c.q2
     ->  Result
!          Output: (COALESCE((COALESCE(b.q2, '42'::bigint)), d.q2))
  (24 rows)

  -- case that breaks the old ph_may_need optimization
diff --git a/src/test/regress/expected/rowtypes.out b/src/test/regress/expected/rowtypes.out
index 54525de..d3a98d1 100644
*** a/src/test/regress/expected/rowtypes.out
--- b/src/test/regress/expected/rowtypes.out
*************** ERROR:  cannot compare dissimilar column
*** 284,293 ****
  explain (costs off)
  select * from int8_tbl i8
  where i8 in (row(123,456)::int8_tbl, '(4567890123456789,123)');
!                                                  QUERY PLAN
! -------------------------------------------------------------------------------------------------------------
   Seq Scan on int8_tbl i8
!    Filter: (i8.* = ANY (ARRAY[ROW(123::bigint, 456::bigint)::int8_tbl, '(4567890123456789,123)'::int8_tbl]))
  (2 rows)

  select * from int8_tbl i8
--- 284,293 ----
  explain (costs off)
  select * from int8_tbl i8
  where i8 in (row(123,456)::int8_tbl, '(4567890123456789,123)');
!                                                    QUERY PLAN
! -----------------------------------------------------------------------------------------------------------------
   Seq Scan on int8_tbl i8
!    Filter: (i8.* = ANY (ARRAY[ROW('123'::bigint, '456'::bigint)::int8_tbl, '(4567890123456789,123)'::int8_tbl]))
  (2 rows)

  select * from int8_tbl i8
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index a678e96..016571b 100644
*** a/src/test/regress/expected/union.out
--- b/src/test/regress/expected/union.out
*************** SELECT * FROM
*** 654,666 ****
     UNION
     SELECT 2 AS t, 4 AS x) ss
  WHERE x > 3;
!                                  QUERY PLAN
! ----------------------------------------------------------------------------
   Subquery Scan on ss
     Filter: (ss.x > 3)
     ->  Unique
           ->  Sort
!                Sort Key: (1), (((random() * 3::double precision))::integer)
                 ->  Append
                       ->  Result
                       ->  Result
--- 654,666 ----
     UNION
     SELECT 2 AS t, 4 AS x) ss
  WHERE x > 3;
!                                   QUERY PLAN
! ------------------------------------------------------------------------------
   Subquery Scan on ss
     Filter: (ss.x > 3)
     ->  Unique
           ->  Sort
!                Sort Key: (1), (((random() * '3'::double precision))::integer)
                 ->  Append
                       ->  Result
                       ->  Result
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 6986f47..a31ec34 100644
*** a/src/test/regress/expected/with.out
--- b/src/test/regress/expected/with.out
*************** SELECT * FROM parent;
*** 2083,2090 ****
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH wcte AS ( INSERT INTO int8_tbl VALUES ( 42, 47 ) RETURNING q2 )
  DELETE FROM a USING wcte WHERE aa = q2;
!                    QUERY PLAN
! ------------------------------------------------
   Delete on public.a
     Delete on public.a
     Delete on public.b
--- 2083,2090 ----
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH wcte AS ( INSERT INTO int8_tbl VALUES ( 42, 47 ) RETURNING q2 )
  DELETE FROM a USING wcte WHERE aa = q2;
!                      QUERY PLAN
! ----------------------------------------------------
   Delete on public.a
     Delete on public.a
     Delete on public.b
*************** DELETE FROM a USING wcte WHERE aa = q2;
*** 2094,2100 ****
       ->  Insert on public.int8_tbl
             Output: int8_tbl.q2
             ->  Result
!                  Output: 42::bigint, 47::bigint
     ->  Nested Loop
           Output: a.ctid, wcte.*
           Join Filter: (a.aa = wcte.q2)
--- 2094,2100 ----
       ->  Insert on public.int8_tbl
             Output: int8_tbl.q2
             ->  Result
!                  Output: '42'::bigint, '47'::bigint
     ->  Nested Loop
           Output: a.ctid, wcte.*
           Join Filter: (a.aa = wcte.q2)

Re: pg_get_constraintdef() doesn't always give an equal constraint

From
Tom Lane
Date:
Jon Jensen <jon@endpoint.com> writes:
> On Mon, 23 Mar 2015, Tom Lane wrote:
>> A larger issue is that I have a nasty feeling that this isn't the
>> only place where ruleutils.c output might be read in a way that's
>> functionally equivalent to the original, but not the exact same
>> parsetree.

> Jeff pointed this out to me and as we discussed it, it seems like a good
> time to add a regression test to make us aware of the current parse +
> deparse behavior, and call attention to any changes in that behavior
> later.

> Attached is a Perl script that generates many combinations of CHECK
> constraints like Jeff's, but with various types, and the result of running
> it in the regression test suite.

> Would something like this be welcome in the regression test suite?

I can't really see adding something like this to the regression tests;
the value per cycle expended, over the long term, just isn't there IMO.

We could possibly use this approach as a one-shot test for vetting a
proposed patch ... but as you've got it set up, it seems like it requires
manual inspection of each output to see if it's OK or not, which isn't
all that helpful.  I wonder whether there isn't a more direct way of
testing whether each output re-parses to the same node tree.

            regards, tom lane

Re: pg_get_constraintdef() doesn't always give an equal constraint

From
Jon Jensen
Date:
On Sat, 28 Mar 2015, Tom Lane wrote:

>> Attached is a Perl script that generates many combinations of CHECK
>> constraints like Jeff's, but with various types, and the result of
>> running it in the regression test suite.
>
>> Would something like this be welcome in the regression test suite?
>
> I can't really see adding something like this to the regression tests;
> the value per cycle expended, over the long term, just isn't there IMO.
>
> We could possibly use this approach as a one-shot test for vetting a
> proposed patch ... but as you've got it set up, it seems like it
> requires manual inspection of each output to see if it's OK or not,
> which isn't all that helpful.

Well, as part of the standard regression test suite it's comparing stored
expected output with newly-generated output, like all the other tests. I
must be misunderstanding what you mean because nothing manual about that,
is there?

> I wonder whether there isn't a more direct way of testing whether each
> output re-parses to the same node tree.

That would be nice: Perhaps a function to parse into a node tree. But I'm
not aware of anything like that existing and it seems it would require
creating a node tree datatype to put the parse result into, to then feed
to a new deparse function variant.

Jon

Re: pg_get_constraintdef() doesn't always give an equal constraint

From
Tom Lane
Date:
Jon Jensen <jon@endpoint.com> writes:
> On Sat, 28 Mar 2015, Tom Lane wrote:
>> We could possibly use this approach as a one-shot test for vetting a
>> proposed patch ... but as you've got it set up, it seems like it
>> requires manual inspection of each output to see if it's OK or not,
>> which isn't all that helpful.

> Well, as part of the standard regression test suite it's comparing stored
> expected output with newly-generated output, like all the other tests. I
> must be misunderstanding what you mean because nothing manual about that,
> is there?

My point is that the expected output has to be manually checked initially
to see if it's right or not; and since it often doesn't look exactly like
the original SQL, that checking is painful and not foolproof.

It's interesting to consider something like the COPY_PARSE_PLAN_TREES
#define, which inserts code into the backend to see whether copyfuncs.c
and equalfuncs.c are sane or not.  Running the regression tests with
that turned on provides pretty thorough coverage of those modules.

            regards, tom lane

Re: pg_get_constraintdef() doesn't always give an equal constraint

From
Jeff Davis
Date:
On Sat, 2015-03-28 at 15:08 -0400, Tom Lane wrote:
> Basically, I think we have to change ruleutils so that it quotes anything
> that wouldn't be seen as a simple integer or numeric constant by the
> lexer+grammar.
>
> The attached patch does this; the regression test changes illustrate
> what's going to happen to the output if we do this.

This fixes my problem, thank you.

There are two switch statements in that function, and they have
overlapping purposes. Can't we just always set needlabel in the first
switch statement, and remove the second switch statement?

When I refactored it to do that (attached), I noticed that if showtype
== -1, it returns before appending the collation. I don't see any reason
those should be related, so I removed the early return (I don't think
there's an actual bug there, though).

There are more diffs in the contrib tests, also included in my version.

> Looking at the changes, I'm not 100% convinced we want to back-patch.
> As you say, nobody's complained of this problem before, and I'm worried
> that people will see the output changes as a bigger deal than the issue
> we're trying to fix.

Fine with me.

Regards,
    Jeff Davis


Attachment

Re: pg_get_constraintdef() doesn't always give an equal constraint

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Jon Jensen <jon@endpoint.com> writes:
> > Well, as part of the standard regression test suite it's comparing stor=
ed=20
> > expected output with newly-generated output, like all the other tests. =
I=20
> > must be misunderstanding what you mean because nothing manual about tha=
t,=20
> > is there?
>=20
> My point is that the expected output has to be manually checked initially
> to see if it's right or not; and since it often doesn't look exactly like
> the original SQL, that checking is painful and not foolproof.
>=20
> It's interesting to consider something like the COPY_PARSE_PLAN_TREES
> #define, which inserts code into the backend to see whether copyfuncs.c
> and equalfuncs.c are sane or not.  Running the regression tests with
> that turned on provides pretty thorough coverage of those modules.

Do we have any buildfarm members which are running with that?  If not,
would anyone object to doing so?

    Thanks,

        Stephen

Re: pg_get_constraintdef() doesn't always give an equal constraint

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> It's interesting to consider something like the COPY_PARSE_PLAN_TREES
>> #define, which inserts code into the backend to see whether copyfuncs.c
>> and equalfuncs.c are sane or not.  Running the regression tests with
>> that turned on provides pretty thorough coverage of those modules.

> Do we have any buildfarm members which are running with that?  If not,
> would anyone object to doing so?

Yeah, it would be a fine idea to have a critter or two using that.

            regards, tom lane

Re: pg_get_constraintdef() doesn't always give an equal constraint

From
Jon Jensen
Date:
On Sun, 29 Mar 2015, Tom Lane wrote:

> Jon Jensen <jon@endpoint.com> writes:
>> On Sat, 28 Mar 2015, Tom Lane wrote:
>>> We could possibly use this approach as a one-shot test for vetting a
>>> proposed patch ... but as you've got it set up, it seems like it
>>> requires manual inspection of each output to see if it's OK or not,
>>> which isn't all that helpful.
>
>> Well, as part of the standard regression test suite it's comparing stored
>> expected output with newly-generated output, like all the other tests. I
>> must be misunderstanding what you mean because nothing manual about that,
>> is there?
>
> My point is that the expected output has to be manually checked initially
> to see if it's right or not; and since it often doesn't look exactly like
> the original SQL, that checking is painful and not foolproof.

Oh, I see what you mean. I'm going from the assumption that a new
regression test is useful even if it starts life documenting only the
current behavior, without any idea of whether that was or is correct by
some outside standard.

I think we should know if the parse/deparse round-trip changes behavior
due to some change, regardless of whether it's better or worse. The
regression test should call our attention to it and then we can spend the
human cycles to investigate whether it's better now, or worse. In any
case, the difference could cause trouble for people who depended on the
old behavior, even if the new is ostensibly better.

> It's interesting to consider something like the COPY_PARSE_PLAN_TREES
> #define, which inserts code into the backend to see whether copyfuncs.c
> and equalfuncs.c are sane or not.  Running the regression tests with
> that turned on provides pretty thorough coverage of those modules.

Maybe that's the way to go, then, if it avoids the necessity of creating
all those tables with CHECK constraints, since we don't have a
user-accessible parse function.

Jon

--
Jon Jensen
End Point Corporation
https://www.endpoint.com/

Re: pg_get_constraintdef() doesn't always give an equal constraint

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Sat, 2015-03-28 at 15:08 -0400, Tom Lane wrote:
>> Basically, I think we have to change ruleutils so that it quotes anything
>> that wouldn't be seen as a simple integer or numeric constant by the
>> lexer+grammar.
>>
>> The attached patch does this; the regression test changes illustrate
>> what's going to happen to the output if we do this.

> This fixes my problem, thank you.

OK.  Given the lack of other suggestions, let's press forward with
fixing it this way.

> There are two switch statements in that function, and they have
> overlapping purposes. Can't we just always set needlabel in the first
> switch statement, and remove the second switch statement?

Meh.  The two switches are considering different things and don't have
exactly the same set of switch labels, so I don't really see that as
an improvement ...

> When I refactored it to do that (attached), I noticed that if showtype
> == -1, it returns before appending the collation. I don't see any reason
> those should be related, so I removed the early return (I don't think
> there's an actual bug there, though).

... especially since this change is outright wrong.  showtype == -1
implies that the caller is going to print a cast; we cannot insert a
COLLATE clause in the middle of that without breaking things.  (It may
be that this case is unreachable because a vanilla Const would never
have nondefault collation, but that doesn't make it a good change.)

>> Looking at the changes, I'm not 100% convinced we want to back-patch.
>> As you say, nobody's complained of this problem before, and I'm worried
>> that people will see the output changes as a bigger deal than the issue
>> we're trying to fix.

> Fine with me.

Agreed, we'll fix HEAD only.

            regards, tom lane

Re: pg_get_constraintdef() doesn't always give an equal constraint

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> It's interesting to consider something like the COPY_PARSE_PLAN_TREES
> >> #define, which inserts code into the backend to see whether copyfuncs.c
> >> and equalfuncs.c are sane or not.  Running the regression tests with
> >> that turned on provides pretty thorough coverage of those modules.
>=20
> > Do we have any buildfarm members which are running with that?  If not,
> > would anyone object to doing so?
>=20
> Yeah, it would be a fine idea to have a critter or two using that.

Apologies for not having gotten to this, and thanks for setting one up.

    Stephen