Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types. - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.
Date
Msg-id 11695.1483840542@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Assignment of valid collation for SET operations onqueries with UNKNOWN types.  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses Re: [HACKERS] Assignment of valid collation for SET operations onqueries with UNKNOWN types.
Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.
List pgsql-hackers
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
> On Tue, Jan 3, 2017 at 5:57 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>> Are you suggesting extending the patch to include coercing from unknown to
>> text for all possible cases where a column of unknown type is being created?

> I guess, that's what Tom is suggesting.

Yes; I think the point is we should change the semantics we assume for
"SELECT 'foo'", not only for views but across the board.  There are plenty
of non-view-related cases where it doesn't behave well, for example

regression=# select * from
  (select 'foo' from generate_series(1,3)) ss order by 1;
ERROR:  failed to find conversion function from unknown to text

Furthermore, it seems like a seriously bad idea for "SELECT 'foo'" to mean
something different in the context of CREATE VIEW than it means elsewhere.

The trick is to not break cases where we've already hacked things to allow
external influence on the resolved types of SELECT-list items.  AFAICT,
these cases are just INSERT/SELECT and set operations (eg unions):

regression=# create table target (f1 int);
CREATE TABLE
regression=# explain verbose insert into target select '42' from generate_series(1,3);
                                       QUERY PLAN

--------------------------------------------------------------------------------
---------
 Insert on public.target  (cost=0.00..10.00 rows=1000 width=4)
   ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000
width=4)
         Output: 42
         Function Call: generate_series(1, 3)
(4 rows)

regression=# explain verbose select '42' union all select 43;
                   QUERY PLAN
------------------------------------------------
 Append  (cost=0.00..0.04 rows=2 width=4)
   ->  Result  (cost=0.00..0.01 rows=1 width=4)
         Output: 42
   ->  Result  (cost=0.00..0.01 rows=1 width=4)
         Output: 43
(5 rows)

In both of the above cases, we're getting an integer constant not a
text or "unknown" constant, because the type information gets imported
from outside the "SELECT".

I spent some time fooling with this today and came up with the attached
patch.  I think this is basically the direction we should go in, but
there are various loose ends:

1. I adjusted a couple of existing regression tests whose results are
affected, but didn't do anything towards adding new tests showing the
desirable results of this change (as per the examples in this thread).

2. I didn't do anything about docs, either, though maybe no change
is needed.  One user-visible change from this is that queries should
never return any "unknown"-type columns to clients anymore.  But I
think that is not documented now, so maybe there's nothing to change.

3. We need to look at whether pg_upgrade is affected.  I think we
might be able to let it just ignore the issue for views, since they'd
get properly updated during the dump and reload anyway.  But if someone
had an actual table (or matview) with an "unknown" column, that would
be a pg_upgrade hazard, because "unknown" doesn't store like "text".

4. If "unknown" were marked as a pseudotype not base type in pg_type
(ie, typtype "p" not "b") then the special case for it in
CheckAttributeType could go away completely.  I'm not really sure
why it's "b" today --- maybe specifically because of this point that
it's currently possible to create tables with unknown columns?  But
I've not looked at what all the implications of changing that might
be, and anyway it could be left for a followon patch.

5. I noticed that the "resolveUnknown" arguments of transformSortClause
and other functions in parse_clause.c could go away: there's really no
reason not to just treat them as "true" always.  But that could be
separate cleanup as well.

Anybody want to hack on those loose ends?

            regards, tom lane

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 72aa0dd..af6ba47 100644
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
*************** CheckAttributeType(const char *attname,
*** 490,507 ****
      char        att_typtype = get_typtype(atttypid);
      Oid            att_typelem;

!     if (atttypid == UNKNOWNOID)
!     {
!         /*
!          * Warn user, but don't fail, if column to be created has UNKNOWN type
!          * (usually as a result of a 'retrieve into' - jolly)
!          */
!         ereport(WARNING,
!                 (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
!                  errmsg("column \"%s\" has type \"unknown\"", attname),
!                  errdetail("Proceeding with relation creation anyway.")));
!     }
!     else if (att_typtype == TYPTYPE_PSEUDO)
      {
          /*
           * Refuse any attempt to create a pseudo-type column, except for a
--- 490,497 ----
      char        att_typtype = get_typtype(atttypid);
      Oid            att_typelem;

!     if (atttypid == UNKNOWNOID ||
!         att_typtype == TYPTYPE_PSEUDO)
      {
          /*
           * Refuse any attempt to create a pseudo-type column, except for a
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 5116cbb..a3be333 100644
*** a/src/backend/parser/analyze.c
--- b/src/backend/parser/analyze.c
*************** parse_analyze_varparams(Node *parseTree,
*** 155,167 ****
  Query *
  parse_sub_analyze(Node *parseTree, ParseState *parentParseState,
                    CommonTableExpr *parentCTE,
!                   bool locked_from_parent)
  {
      ParseState *pstate = make_parsestate(parentParseState);
      Query       *query;

      pstate->p_parent_cte = parentCTE;
      pstate->p_locked_from_parent = locked_from_parent;

      query = transformStmt(pstate, parseTree);

--- 155,169 ----
  Query *
  parse_sub_analyze(Node *parseTree, ParseState *parentParseState,
                    CommonTableExpr *parentCTE,
!                   bool locked_from_parent,
!                   bool resolve_unknowns)
  {
      ParseState *pstate = make_parsestate(parentParseState);
      Query       *query;

      pstate->p_parent_cte = parentCTE;
      pstate->p_locked_from_parent = locked_from_parent;
+     pstate->p_resolve_unknowns = resolve_unknowns;

      query = transformStmt(pstate, parseTree);

*************** transformInsertStmt(ParseState *pstate,
*** 552,561 ****
--- 554,570 ----
           * otherwise the behavior of SELECT within INSERT might be different
           * from a stand-alone SELECT. (Indeed, Postgres up through 6.5 had
           * bugs of just that nature...)
+          *
+          * The sole exception is that we prevent resolving unknown-type
+          * outputs as TEXT.  This does not change the semantics since if the
+          * column type matters semantically, it would have been resolved to
+          * something else anyway.  Doing this lets us resolve such outputs as
+          * the target column's type, which we handle below.
           */
          sub_pstate->p_rtable = sub_rtable;
          sub_pstate->p_joinexprs = NIL;    /* sub_rtable has no joins */
          sub_pstate->p_namespace = sub_namespace;
+         sub_pstate->p_resolve_unknowns = false;

          selectQuery = transformStmt(sub_pstate, stmt->selectStmt);

*************** transformSelectStmt(ParseState *pstate,
*** 1252,1257 ****
--- 1261,1270 ----
                                                     pstate->p_windowdefs,
                                                     &qry->targetList);

+     /* resolve any still-unresolved output columns as being type text */
+     if (pstate->p_resolve_unknowns)
+         resolveTargetListUnknowns(pstate, qry->targetList);
+
      qry->rtable = pstate->p_rtable;
      qry->jointree = makeFromExpr(pstate->p_joinlist, qual);

*************** transformSetOperationTree(ParseState *ps
*** 1826,1836 ****
          /*
           * Transform SelectStmt into a Query.
           *
           * Note: previously transformed sub-queries don't affect the parsing
           * of this sub-query, because they are not in the toplevel pstate's
           * namespace list.
           */
!         selectQuery = parse_sub_analyze((Node *) stmt, pstate, NULL, false);

          /*
           * Check for bogus references to Vars on the current query level (but
--- 1839,1857 ----
          /*
           * Transform SelectStmt into a Query.
           *
+          * This works the same as SELECT transformation normally would, except
+          * that we prevent resolving unknown-type outputs as TEXT.  This does
+          * not change the subquery's semantics since if the column type
+          * matters semantically, it would have been resolved to something else
+          * anyway.  Doing this lets us resolve such outputs using
+          * select_common_type(), below.
+          *
           * Note: previously transformed sub-queries don't affect the parsing
           * of this sub-query, because they are not in the toplevel pstate's
           * namespace list.
           */
!         selectQuery = parse_sub_analyze((Node *) stmt, pstate,
!                                         NULL, false, false);

          /*
           * Check for bogus references to Vars on the current query level (but
*************** transformReturningList(ParseState *pstat
*** 2333,2338 ****
--- 2354,2363 ----
      /* mark column origins */
      markTargetListOrigins(pstate, rlist);

+     /* resolve any still-unresolved output columns as being type text */
+     if (pstate->p_resolve_unknowns)
+         resolveTargetListUnknowns(pstate, rlist);
+
      /* restore state */
      pstate->p_next_resno = save_next_resno;

diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index d788ffd..3a01e6c 100644
*** a/src/backend/parser/parse_clause.c
--- b/src/backend/parser/parse_clause.c
*************** transformRangeSubselect(ParseState *psta
*** 471,477 ****
       * Analyze and transform the subquery.
       */
      query = parse_sub_analyze(r->subquery, pstate, NULL,
!                               isLockedRefname(pstate, r->alias->aliasname));

      /* Restore state */
      pstate->p_lateral_active = false;
--- 471,478 ----
       * Analyze and transform the subquery.
       */
      query = parse_sub_analyze(r->subquery, pstate, NULL,
!                               isLockedRefname(pstate, r->alias->aliasname),
!                               true);

      /* Restore state */
      pstate->p_lateral_active = false;
diff --git a/src/backend/parser/parse_cte.c b/src/backend/parser/parse_cte.c
index fc8c15b..dfbcaa2 100644
*** a/src/backend/parser/parse_cte.c
--- b/src/backend/parser/parse_cte.c
*************** analyzeCTE(ParseState *pstate, CommonTab
*** 241,247 ****
      /* Analysis not done already */
      Assert(!IsA(cte->ctequery, Query));

!     query = parse_sub_analyze(cte->ctequery, pstate, cte, false);
      cte->ctequery = (Node *) query;

      /*
--- 241,247 ----
      /* Analysis not done already */
      Assert(!IsA(cte->ctequery, Query));

!     query = parse_sub_analyze(cte->ctequery, pstate, cte, false, true);
      cte->ctequery = (Node *) query;

      /*
*************** analyzeCTETargetList(ParseState *pstate,
*** 393,403 ****

          /*
           * If the CTE is recursive, force the exposed column type of any
!          * "unknown" column to "text".  This corresponds to the fact that
!          * SELECT 'foo' UNION SELECT 'bar' will ultimately produce text. We
!          * might see "unknown" as a result of an untyped literal in the
!          * non-recursive term's select list, and if we don't convert to text
!          * then we'll have a mismatch against the UNION result.
           *
           * The column might contain 'foo' COLLATE "bar", so don't override
           * collation if it's already set.
--- 393,402 ----

          /*
           * If the CTE is recursive, force the exposed column type of any
!          * "unknown" column to "text".  We must deal with this here because
!          * we're called on the non-recursive term before there's been any
!          * attempt to force unknown output columns to some other type.  We
!          * have to resolve unknowns before looking at the recursive term.
           *
           * The column might contain 'foo' COLLATE "bar", so don't override
           * collation if it's already set.
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index f62e45f..db7fb1e 100644
*** a/src/backend/parser/parse_expr.c
--- b/src/backend/parser/parse_expr.c
*************** transformSubLink(ParseState *pstate, Sub
*** 1845,1851 ****
      /*
       * OK, let's transform the sub-SELECT.
       */
!     qtree = parse_sub_analyze(sublink->subselect, pstate, NULL, false);

      /*
       * Check that we got something reasonable.  Many of these conditions are
--- 1845,1851 ----
      /*
       * OK, let's transform the sub-SELECT.
       */
!     qtree = parse_sub_analyze(sublink->subselect, pstate, NULL, false, true);

      /*
       * Check that we got something reasonable.  Many of these conditions are
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 73e7d65..2a5f147 100644
*** a/src/backend/parser/parse_node.c
--- b/src/backend/parser/parse_node.c
*************** make_parsestate(ParseState *parentParseS
*** 51,56 ****
--- 51,57 ----

      /* Fill in fields that don't start at null/false/zero */
      pstate->p_next_resno = 1;
+     pstate->p_resolve_unknowns = true;

      if (parentParseState)
      {
diff --git a/src/backend/parser/parse_target.c b/src/backend/parser/parse_target.c
index 081a8dd..2576e31 100644
*** a/src/backend/parser/parse_target.c
--- b/src/backend/parser/parse_target.c
*************** transformExpressionList(ParseState *psta
*** 289,300 ****


  /*
   * markTargetListOrigins()
   *        Mark targetlist columns that are simple Vars with the source
   *        table's OID and column number.
   *
!  * Currently, this is done only for SELECT targetlists, since we only
!  * need the info if we are going to send it to the frontend.
   */
  void
  markTargetListOrigins(ParseState *pstate, List *targetlist)
--- 289,329 ----


  /*
+  * resolveTargetListUnknowns()
+  *        Convert any unknown-type targetlist entries to type TEXT.
+  *
+  * We do this after we've exhausted all other ways of identifying the output
+  * column types of a query.
+  */
+ void
+ resolveTargetListUnknowns(ParseState *pstate, List *targetlist)
+ {
+     ListCell   *l;
+
+     foreach(l, targetlist)
+     {
+         TargetEntry *tle = (TargetEntry *) lfirst(l);
+         Oid            restype = exprType((Node *) tle->expr);
+
+         if (restype == UNKNOWNOID)
+         {
+             tle->expr = (Expr *) coerce_type(pstate, (Node *) tle->expr,
+                                              restype, TEXTOID, -1,
+                                              COERCION_IMPLICIT,
+                                              COERCE_IMPLICIT_CAST,
+                                              -1);
+         }
+     }
+ }
+
+
+ /*
   * markTargetListOrigins()
   *        Mark targetlist columns that are simple Vars with the source
   *        table's OID and column number.
   *
!  * Currently, this is done only for SELECT targetlists and RETURNING lists,
!  * since we only need the info if we are going to send it to the frontend.
   */
  void
  markTargetListOrigins(ParseState *pstate, List *targetlist)
diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h
index 3bf4edd..705f833 100644
*** a/src/include/parser/analyze.h
--- b/src/include/parser/analyze.h
*************** extern Query *parse_analyze_varparams(No
*** 29,35 ****

  extern Query *parse_sub_analyze(Node *parseTree, ParseState *parentParseState,
                    CommonTableExpr *parentCTE,
!                   bool locked_from_parent);

  extern Query *transformTopLevelStmt(ParseState *pstate, Node *parseTree);
  extern Query *transformStmt(ParseState *pstate, Node *parseTree);
--- 29,36 ----

  extern Query *parse_sub_analyze(Node *parseTree, ParseState *parentParseState,
                    CommonTableExpr *parentCTE,
!                   bool locked_from_parent,
!                   bool resolve_unknowns);

  extern Query *transformTopLevelStmt(ParseState *pstate, Node *parseTree);
  extern Query *transformStmt(ParseState *pstate, Node *parseTree);
diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.h
index 7cdf142..dda108e 100644
*** a/src/include/parser/parse_node.h
--- b/src/include/parser/parse_node.h
*************** typedef Node *(*CoerceParamHook) (ParseS
*** 149,154 ****
--- 149,157 ----
   * p_locked_from_parent: true if parent query level applies FOR UPDATE/SHARE
   * to this subquery as a whole.
   *
+  * p_resolve_unknowns: resolve unknown-type SELECT output columns as type TEXT
+  * (this is true by default).
+  *
   * p_hasAggs, p_hasWindowFuncs, etc: true if we've found any of the indicated
   * constructs in the query.
   *
*************** struct ParseState
*** 181,186 ****
--- 184,191 ----
      List       *p_locking_clause;        /* raw FOR UPDATE/FOR SHARE info */
      bool        p_locked_from_parent;    /* parent has marked this subquery
                                           * with FOR UPDATE/FOR SHARE */
+     bool        p_resolve_unknowns;        /* resolve unknown-type SELECT outputs
+                                          * as type text */

      /* Flags telling about things found in the query: */
      bool        p_hasAggs;
diff --git a/src/include/parser/parse_target.h b/src/include/parser/parse_target.h
index e035aac..d06a235 100644
*** a/src/include/parser/parse_target.h
--- b/src/include/parser/parse_target.h
*************** extern List *transformTargetList(ParseSt
*** 21,26 ****
--- 21,27 ----
                      ParseExprKind exprKind);
  extern List *transformExpressionList(ParseState *pstate, List *exprlist,
                          ParseExprKind exprKind, bool allowDefault);
+ extern void resolveTargetListUnknowns(ParseState *pstate, List *targetlist);
  extern void markTargetListOrigins(ParseState *pstate, List *targetlist);
  extern TargetEntry *transformTargetEntry(ParseState *pstate,
                       Node *node, Node *expr, ParseExprKind exprKind,
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 02fa08e..3b7f689 100644
*** a/src/test/regress/expected/with.out
--- b/src/test/regress/expected/with.out
*************** SELECT * FROM t LIMIT 10;
*** 133,141 ****

  -- Test behavior with an unknown-type literal in the WITH
  WITH q AS (SELECT 'foo' AS x)
! SELECT x, x IS OF (unknown) as is_unknown FROM q;
!   x  | is_unknown
! -----+------------
   foo | t
  (1 row)

--- 133,141 ----

  -- Test behavior with an unknown-type literal in the WITH
  WITH q AS (SELECT 'foo' AS x)
! SELECT x, x IS OF (text) AS is_text FROM q;
!   x  | is_text
! -----+---------
   foo | t
  (1 row)

*************** WITH RECURSIVE t(n) AS (
*** 144,150 ****
  UNION ALL
      SELECT n || ' bar' FROM t WHERE length(n) < 20
  )
! SELECT n, n IS OF (text) as is_text FROM t;
              n            | is_text
  -------------------------+---------
   foo                     | t
--- 144,150 ----
  UNION ALL
      SELECT n || ' bar' FROM t WHERE length(n) < 20
  )
! SELECT n, n IS OF (text) AS is_text FROM t;
              n            | is_text
  -------------------------+---------
   foo                     | t
*************** SELECT n, n IS OF (text) as is_text FROM
*** 155,160 ****
--- 155,172 ----
   foo bar bar bar bar bar | t
  (6 rows)

+ -- In a perfect world, this would work and resolve the literal as int ...
+ -- but for now, we have to be content with resolving to text too soon.
+ WITH RECURSIVE t(n) AS (
+     SELECT '7'
+ UNION ALL
+     SELECT n+1 FROM t WHERE n < 10
+ )
+ SELECT n, n IS OF (int) AS is_int FROM t;
+ ERROR:  operator does not exist: text + integer
+ LINE 4:     SELECT n+1 FROM t WHERE n < 10
+                     ^
+ HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
  --
  -- Some examples with a tree
  --
diff --git a/src/test/regress/output/create_function_1.source b/src/test/regress/output/create_function_1.source
index 30c2936..957595c 100644
*** a/src/test/regress/output/create_function_1.source
--- b/src/test/regress/output/create_function_1.source
*************** CREATE FUNCTION test_atomic_ops()
*** 59,65 ****
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
      AS 'SELECT ''not an integer'';';
  ERROR:  return type mismatch in function declared to return integer
! DETAIL:  Actual return type is unknown.
  CONTEXT:  SQL function "test1"
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
      AS 'not even SQL';
--- 59,65 ----
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
      AS 'SELECT ''not an integer'';';
  ERROR:  return type mismatch in function declared to return integer
! DETAIL:  Actual return type is text.
  CONTEXT:  SQL function "test1"
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
      AS 'not even SQL';
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index 7ee32ba..08ddc8b 100644
*** a/src/test/regress/sql/with.sql
--- b/src/test/regress/sql/with.sql
*************** SELECT * FROM t LIMIT 10;
*** 69,82 ****

  -- Test behavior with an unknown-type literal in the WITH
  WITH q AS (SELECT 'foo' AS x)
! SELECT x, x IS OF (unknown) as is_unknown FROM q;

  WITH RECURSIVE t(n) AS (
      SELECT 'foo'
  UNION ALL
      SELECT n || ' bar' FROM t WHERE length(n) < 20
  )
! SELECT n, n IS OF (text) as is_text FROM t;

  --
  -- Some examples with a tree
--- 69,91 ----

  -- Test behavior with an unknown-type literal in the WITH
  WITH q AS (SELECT 'foo' AS x)
! SELECT x, x IS OF (text) AS is_text FROM q;

  WITH RECURSIVE t(n) AS (
      SELECT 'foo'
  UNION ALL
      SELECT n || ' bar' FROM t WHERE length(n) < 20
  )
! SELECT n, n IS OF (text) AS is_text FROM t;
!
! -- In a perfect world, this would work and resolve the literal as int ...
! -- but for now, we have to be content with resolving to text too soon.
! WITH RECURSIVE t(n) AS (
!     SELECT '7'
! UNION ALL
!     SELECT n+1 FROM t WHERE n < 10
! )
! SELECT n, n IS OF (int) AS is_int FROM t;

  --
  -- Some examples with a tree

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Jim Nasby
Date:
Subject: [HACKERS] Allow controlling location of tmp_install
Next
From: Jim Nasby
Date:
Subject: Re: [HACKERS] merging some features from plpgsql2 project