Re: Schema variables - new implementation for Postgres 15 - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Schema variables - new implementation for Postgres 15
Date
Msg-id CAFj8pRCk-AR50YUn9131du6G_DPcckm+uiD-MJozzGVO9au2Rw@mail.gmail.com
Whole thread Raw
In response to Re: Schema variables - new implementation for Postgres 15  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
Hi

po 17. 10. 2022 v 5:17 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
Hi,

On Thu, Oct 13, 2022 at 07:41:32AM +0200, Pavel Stehule wrote:
>
> > I fixed the commit message of 0001 patch. Fixed shadowed variables too.

Thanks!

> >
> > There is a partially open issue, where I and Julien are not sure about a
> > solution, and we would like to ask for the  community's opinion. I'll send
> > this query in separate mail.
> >
>
>  rebased with simplified code related to usage of pfree function

If anyone is curious the discussion happend at [1].

I looked at the patchset, this time focusing on the LET command.  Here at the
comments I have for now:

- gram.y

@@ -11918,6 +11920,7 @@ ExplainableStmt:
                        | CreateMatViewStmt
                        | RefreshMatViewStmt
                        | ExecuteStmt                                   /* by default all are $$=$1 */
+                       | LetStmt
                ;

(and other similar places) the comment should be kept to the last statement

fixed
 

Also, having LetStmt as an ExplainableStmt means it's allowed in a CTE:

cte_list:
                common_table_expr                                               { $$ = list_make1($1); }
                | cte_list ',' common_table_expr                { $$ = lappend($1, $3); }
                ;

common_table_expr:  name opt_name_list AS opt_materialized '(' PreparableStmt ')' opt_search_clause opt_cycle_clause

And doing so hits this assert in transformWithClause:

                if (!IsA(cte->ctequery, SelectStmt))
                {
                        /* must be a data-modifying statement */
                        Assert(IsA(cte->ctequery, InsertStmt) ||
                                   IsA(cte->ctequery, UpdateStmt) ||
                                   IsA(cte->ctequery, DeleteStmt));

                        pstate->p_hasModifyingCTE = true;
                }

and I'm assuming it would also fail on this in transformLetStmt:

+       /* There can't be any outer WITH to worry about */
+       Assert(pstate->p_ctenamespace == NIL);

I guess it makes sense to be able to explain a LetStmt (or using it in a
prepared statement), so it should be properly handled in transformSelectStmt.
Also, I don't see any test for a prepared LET statement, this should also be
covered.

The LET statement doesn't return data, so it should be disallowed similar like  MERGE statement

I enhanced the regression test about PREPARE of the LET statement. I found and fix the missing plan dependency of target variable of LET command



- transformLetStmt:

+       varid = IdentifyVariable(names, &attrname, &not_unique);

It would be nice to have a comment saying that the lock is acquired here

done
 

+       /* The grammar should have produced a SELECT */
+       if (!IsA(selectQuery, Query) ||
+               selectQuery->commandType != CMD_SELECT)
+               elog(ERROR, "unexpected non-SELECT command in LET command");

I'm wondering if this should be an Assert instead, as the grammar shouldn't
produce anything else no matter what how hard a user try.

done
 

+       /* don't allow multicolumn result */
+       if (list_length(exprList) != 1)
+               ereport(ERROR,
+                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                errmsg("expression is not scalar value"),
+                                parser_errposition(pstate,
+                                                                       exprLocation((Node *) exprList))));

This isn't covered by any regression test and it probably should.  It can be
reached with something like

LET myvar = (null::pg_class).*;

The error message could also use a bit of improvement.

done - the error message is like related plpgsql error message
 

I see that a_expr allows a select statement in parens, but this leads to a
sublink which already has all the required protection to gurantee a single
column, and a single row at most during execution.  This one returns for
non-scalar case:

subquery must return only one column

Maybe use something similar for it, like "expression must return only one
column"?  Similarly the error message in svariableStartupReceiver could be made
more consistent with the related errors:

+               if (++outcols > 1)
+                       elog(ERROR, "svariable DestReceiver can take only one attribute");

done
 

While on svariableReceiver, I see that the current code assumes that the caller
did everything right.  That's the case right now, but it should still be made
more robust in case future code (or extensions) is added.  I'm thinking:

- svariableState.rows.  Currently not really used, should check that one and
  only one row is received in svariableReceiveSlot and
  svariableShutdownReceiver (if no row is received the variable won't be reset
  which should probably always happen once you setup an svariableReceiver)

done
 
- svariableState.typid, typmod and typlen should be double checked with the
  given varid in svariableStartupReceiver.

done
 
- svariableState.varid should be initialized with InvalidOid to avoid undefined
  behavior is caller forgets to set it.

svariableState is initialized by palloc0
 

I'm also wondering if SetVariableDestReceiverParams() should have an assert
like LockHeldByMe() for the given varid,

done
 
and maybe an assert that the varid is
a session variable, to avoid running a possibly expensive execution that will

done
 
fail when receiving the slot.  I think the function would be better named
SetVariableDestReceiverVarid() or something like that.

done
 

+void
+ExecuteLetStmt(ParseState *pstate,
+                          LetStmt *stmt,
+                          ParamListInfo params,
+                          QueryEnvironment *queryEnv,
+                          QueryCompletion *qc)
+{
+ [...]
+       /* run the plan to completion */
+       ExecutorRun(queryDesc, ForwardScanDirection, 2L, true);

Why 2 rows?  I'm assuming it's an attempt to detect queries that returns more
than 1 row, but it should be documented.  Note that as mentioned above the dest
receiver currently doesn't check it, so this definitely needs to be fixed.

done + check + tests

 

- IdentifyVariable:

*attrname can be set even is no variable is identified.  I guess that's ok as
it avoids useless code, but it should probably be documented in the function
header.

This is a side effect. The attrname is used only when the returned oid is valid. I checked code, and
I extended comments on the function.

I am sending updated patch, next points I'll process tomorrow



 

Also, the API doesn't look ideal.  AFAICS the only reason this function doesn't
error out in case of ambiguous name is that transformColumnRef may check if a
given name shadows a variable when session_variables_ambiguity_warning is set.
But since IdentifyVariable returns InvalidOid if the given list of identifiers
is ambiguous, it seems that the shadow detection can fail to detect a shadowed
reference if multiple variable would shadow the name:

# CREATE TYPE ab AS (a integer, b integer);
CREATE TYPE
# CREATE VARIABLE v_ab AS ab;
CREATE VARIABLE

# CREATE TABLE v_ab (a integer, b integer);
CREATE TABLE

# SET session_variables_ambiguity_warning = 1;
SET

# sELECT v_ab.a FROM v_ab;
WARNING:  42702: session variable "v_ab.a" is shadowed
LINE 1: select v_ab.a from v_ab;
               ^
DETAIL:  Session variables can be shadowed by columns, routine's variables and routine's arguments with the same name.
 a
---
(0 rows)

# CREATE SCHEMA v_ab;
CREATE SCHEMA

# CREATE VARIABLE v_ab.a AS integer;
CREATE VARIABLE

# SELECT v_ab.a FROM v_ab;
 a
---
(0 rows)


Note that a bit later in transformColumnRef(), not_unique is checked only if
the returned varid is valid, which isn't correct as InvalidOid is currently
returned if not_unique is set.

I think that the error should be raised in IdentifyVariable rather than having
every caller check it.  I'm not sure how to perfectly handle the
session_variables_ambiguity_warning though.  Maybe make not_unique optional,
and error out if not_unique is null.  If not null, set it as necessary and
return one of the oid.  The only use would be for shadowing detection, and in
that case it won't be possible to check if a warning can be avoided as it would
be if no amgibuity is found, but that's probably ok.

Or maybe instead LookupVariable should have an extra argument to only match
variable with a composite type if caller asks to.  This would avoid scenarios
like:

CREATE VARIABLE myvar AS int;
SELECT myvar.blabla;
ERROR:  42809: type integer is not composite

Is that really ok to match a variable here rather than complaining about a
missing FROM-clause?

+       indirection_start = list_length(names) - (attrname ? 1 : 0);
+       indirection = list_copy_tail(stmt->target, indirection_start);
+ [...]
+               if (indirection != NULL)
+               {
+                       bool            targetIsArray;
+                       char       *targetName;
+
+                       targetName = get_session_variable_name(varid);
+                       targetIsArray = OidIsValid(get_element_type(typid));
+
+                       pstate->p_hasSessionVariables = true;
+
+                       coerced_expr = (Expr *)
+                               transformAssignmentIndirection(pstate,
+                                                                                          (Node *) param,
+                                                                                          targetName,
+                                                                                          targetIsArray,
+                                                                                          typid,
+                                                                                          typmod,
+                                                                                          InvalidOid,
+                                                                                          indirection,
+                                                                                          list_head(indirection),
+                                                                                          (Node *) expr,
+                                                                                          COERCION_PLPGSQL,
+                                                                                          stmt->location);
+               }

I'm not sure why you use this approach rather than just having something like
"ListCell *indirection_head", set it to a non-NULL value when needed, and use
that (with names) instead.  Note that it's also not correct to compare a List
to NULL, use NIL instead.

- expr_kind_allows_session_variables

Even if that's a bit annoying, I think it's better to explicitly put all values
there rather than having a default clause.

For instance, EXPR_KIND_CYCLE_MARK is currently allowing session variables,
which doesn't look ok.  It's probably just an error from when the patchset was
rebased, but this probably wouldn't happen if you get an error for an unmatched
value if you add a new expr kind (which doesn't happen that often).

+ fixed issue reported by Dmitry Dolgov

[1] https://www.postgresql.org/message-id/CAFj8pRB2+pVBFsidS-AzhHdZid40OTUspWfXS0vgahHmaWosZQ@mail.gmail.com
Attachment

pgsql-hackers by date:

Previous
From: "Jonathan S. Katz"
Date:
Subject: User functions for building SCRAM secrets
Next
From: Ilya Gladyshev
Date:
Subject: Re: Segfault on logical replication to partitioned table with foreign children