Re: Schema variables - new implementation for Postgres 15 - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Schema variables - new implementation for Postgres 15
Date
Msg-id 20220119080117.4c2fjzs5p7vtdxhk@jrouhaud
Whole thread Raw
In response to Re: Schema variables - new implementation for Postgres 15  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: Schema variables - new implementation for Postgres 15
Re: Schema variables - new implementation for Postgres 15
List pgsql-hackers
Hi,

On Tue, Jan 18, 2022 at 10:01:01PM +0100, Pavel Stehule wrote:
> pá 14. 1. 2022 v 3:44 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
> 
> >
> > =# let myvariable = 'AA';
> > LET
> >
> > =# select 'AA' collate "en-x-icu" < myvariable;
> >  ?column?
> > ----------
> >  f
> > (1 row)
> >
> > =# select 'AA' collate "en-x-icu" < myvariable collate mycollation;
> > ERROR:  42P21: collation mismatch between explicit collations "en-x-icu"
> > and "mycollation"
> > LINE 1: select 'AA' collate "en-x-icu" < myvariable collate mycollat...
> >
> 
> What do you expect?  I don't understand collating well, but it looks
> correct. Minimally the tables have the same behavior.

Indeed, I actually didn't know that such object's collation were implicit and
could be overloaded without a problem as long as there's no conflict between
all the explicit collations.  So I agree that the current behavior is ok,
including a correct handling for wanted conflicts:

=# create variable var1 text collate "fr-x-icu";
CREATE VARIABLE

=# create variable var2 text collate "en-x-icu";
CREATE VARIABLE

=# let var1 = 'hoho';
LET

=# let var2 = 'hoho';
LET

=# select var1 < var2;
ERROR:  42P22: could not determine which collation to use for string comparison
HINT:  Use the COLLATE clause to set the collation explicitly.

> Please, can you check the attached patches?

All the issue I mentioned are fixed, thanks!


I see a few problems with the other new features added though.  The new
session_variables_ambiguity_warning GUC is called even in contexts where it
shouldn't apply.  For instance:

=# set session_variables_ambiguity_warning = 1;
SET

=# create variable v text;
CREATE VARIABLE

=# DO $$
DECLARE v text;
BEGIN
v := 'test';
RAISE NOTICE 'v: %', v;
END;
$$ LANGUAGE plpgsql;
WARNING:  42702: session variable "v" is shadowed by column
LINE 1: v := 'test'
        ^
DETAIL:  The identifier can be column reference or session variable reference.
HINT:  The column reference is preferred against session variable reference.
QUERY:  v := 'test'

But this "v := 'test'" shouldn't be a substitute for a LET, and it indeed
doesn't work:

=# DO $$
BEGIN
v := 'test';
RAISE NOTICE 'v: %', v;
END;
$$ LANGUAGE plpgsql;
ERROR:  42601: "v" is not a known variable
LINE 3: v := 'test';

But the RAISE NOTICE does see the session variable (which should be the correct
behavior I think), so the warning should have been raised for this instruction
(and in that case the message is incorrect, as it's not shadowing a column).

Also, the pg_dump handling emits a COLLATION option for session variables even
for default collation, while it should only emit it if the collation is not the
type's default collation.  As a reference, for attributes the SQL used is:

                         "CASE WHEN a.attcollation <> t.typcollation "
                         "THEN a.attcollation ELSE 0 END AS attcollation,\n"

Also, should \dV or \dV+ show the collation?

And a few comments on the new chunks in this version of the patch (I didn't
look in detail at the whole patch yet):

+   <para>
+    The session variables can be overshadowed by columns in an query. When query
+    holds identifier or qualified identifier that can be used as session variable
+    identifier and as column identifier too, then it is used as column identifier
+    every time. This situation can be logged by enabling configuration
+    parameter <xref linkend="guc-session-variables-ambiguity-warning"/>.
+   </para>

Is "overshadowed" correct?  The rest of the patch only says "shadow(ed)".

While at it, here's some proposition to improve the phrasing:

+  The session variables can be shadowed by column references in a query. When a
+  query contains identifiers or qualified identifiers that could be used as both
+  a session variable identifiers and as column identifier, then the column
+  identifier is preferred every time. Warnings can be emitted when this situation
+  happens by enabling configuration parameter <xref
+  linkend="guc-session-variables-ambiguity-warning"/>.

Similarly, the next documentation could be rephrased to:

+ When on, a warning is raised when any identifier in a query could be used as both
+ a column identifier or a session variable identifier.
+ The default is <literal>off</literal>.


Few other nitpicking:

+            * If we really detect collision of column and variable identifier,
+            * then we prefer column, because we don't want to allow to break
+            * an existing valid queries by new variable.

s/an existing/existing

+-- it is ambigonuous, but columns are preferred

ambiguous?


@@ -369,6 +367,19 @@ VariableCreate(const char *varName,
    /* dependency on extension */
    recordDependencyOnCurrentExtension(&myself, false);

+   /*
+    * Normal dependency from a domain to its collation.  We know the default
+    * collation is pinned, so don't bother recording it.
+    */
+   if (OidIsValid(varCollation) &&
+       varCollation != DEFAULT_COLLATION_OID)

The comment mentions domains rather than session variables.

And for the initial patch, while looking around I found this comment on
fix_alternative_subplan():

@@ -1866,7 +1969,9 @@ fix_alternative_subplan(PlannerInfo *root, AlternativeSubPlan *asplan,
  * replacing Aggref nodes that should be replaced by initplan output Params,
  * choosing the best implementation for AlternativeSubPlans,
  * looking up operator opcode info for OpExpr and related nodes,
- * and adding OIDs from regclass Const nodes into root->glob->relationOids.
+ * and adding OIDs from regclass Const nodes into root->glob->relationOids,
+ * and replacing PARAM_VARIABLE paramid, that is the oid of the session variable
+ * to offset the array by query used session variables. ???

I don't really understand the comment, and the "???" looks a bit suspicious.
I'm assuming it's a reference to this new behavior in fix_param_node():

  * fix_param_node
  *     Do set_plan_references processing on a Param
+ *     Collect session variables list and replace variable oid by
+ *     index to collected list.
  *
  * If it's a PARAM_MULTIEXPR, replace it with the appropriate Param from
  * root->multiexpr_params; otherwise no change is needed.
  * Just for paranoia's sake, we make a copy of the node in either case.
+ *
+ * If it's a PARAM_VARIABLE, then we should to calculate paramid.

Some improvement on the comments would be welcome there, probably including
some mention to the "glob->sessionVariables" collected list?



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: a misbehavior of partition row movement (?)
Next
From: Greg Nancarrow
Date:
Subject: Re: row filtering for logical replication