Re: Schema variables - new implementation for Postgres 15 - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Schema variables - new implementation for Postgres 15
Date
Msg-id CAFj8pRAuxFx36TOs+k-9_rYR+gyBhvsdrGG1odahj8F1iM6R7Q@mail.gmail.com
Whole thread Raw
In response to Re: Schema variables - new implementation for Postgres 15  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Schema variables - new implementation for Postgres 15
List pgsql-hackers


čt 3. 3. 2022 v 8:06 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
Hi,

On Wed, Mar 02, 2022 at 06:03:06AM +0100, Pavel Stehule wrote:
>
> I lost commit with this change. I am sending updated patch.

Thanks a lot Pavel!

I did a more thorough review of the patch.  I'm attaching a diff (in .txt
extension) for comment improvement suggestions.  I may have misunderstood
things so feel free to discard some of it.  I will mention the comment I didn't
understand in this mail.

First, I spotted some problem in the invalidation logic.

+ * Assign sinval mark to session variable. This mark probably
+ * signalized, so the session variable was dropped. But this
+ * should be rechecked later against system catalog.
+ */
+static void
+pg_variable_cache_callback(Datum arg, int cacheid, uint32 hashvalue)

You mention that hashvalue can only be zero for commands that can't
affect session variables (like VACUUM or ANALYZE), but that's not true.  It can
also happen in case of sinval queue overflow (see InvalidateSystemCaches()).
So in that case we should trigger a full recheck, with some heuristics on how
to detect that a cached variable is still valid.  Unfortunately the oid can
wraparound so some other check is needed to make it safe.

Also, even if we get a non-zero hashvalue in the inval callback, we can't
assume that there weren't any collision in the hash.  So the additional check
should be used there too.

We had a long off-line discussion about this with Pavel yesterday on what
heuristic to use there.  Unlike other caches where discarding an entry when it
shouldn't have been is not really problematic, the cache here contains the real
variable value so we can't discard it unless the variable was really dropped.
It should be possible to make it work, so I will let Pavel comment on which
approach he wants to use and what the drawbacks are.  I guess that this will be
the most critical part of this patch to decide whether the approach is
acceptable or not.

I thought more about this issue, and I think it is solvable, although differently (little bit than we talked about). The check based on oid and xmin should not be enough for consistency check, because xmin can be quickly lost when a user executes VACUUM FREEZE or VACUUM FULL.

The consistency of a stored session variable should be checked always when the session variable is used (for reading) the first time in a transaction.  When value is created and used in the same transaction, then the consistency check is not necessary. When consistency check fails, then stored value is marked as broken and cannot be read. Can be overwritten.

We can believe that session variables based on buildin types are always consistent.

Composite types should be checked recursively from top to buildin types. It means we should hold tupledescs for all nested composites. Initially the check can be very strict.

Last case is consistency check for types owned by some extensions. For this case we can accept the version number of related extensions. Without change we can believe so the stored binary data are consistent.



The rest is only minor stylistic comments.

Using -DRAW_EXPRESSION_COVERAGE_TEST I see that T_LetStmt is missing in
raw_expression_tree_walker.

fixed
 

ALTER and DROP both suggest "IMMUTABLE VARIABLE" as valid completion, while
it should only be usable in the CREATE [ IMMUTABLE ] VARIABLE form.

fixed
 

+initVariable(Variable *var, Oid varid, bool fast_only)
+{
+   var->collation = varform->varcollation;
+   var->eoxaction = varform->vareoxaction;
+   var->is_not_null = varform->varisnotnull;
+   var->is_immutable = varform->varisimmutable;

nit: eoxaction is defined after is_not_null and is_immutable, it would be
better to keep the initialization order consistent (same in VariableCreate).

fixed
 

+   values[Anum_pg_variable_varcollation - 1] = ObjectIdGetDatum((char) varCollation);
+   values[Anum_pg_variable_vareoxaction - 1] = CharGetDatum(eoxaction);

seems like the char cast is on the wrong variable?

fixed
 

+ * [...] We have to hold two separate action lists:
+ * one for dropping the session variable from system catalog, and
+ * another one for resetting its value. Both are necessary, since
+ * dropping a session variable also needs to enforce a reset of
+ * the value.

I don't fully understand that comment.  Maybe you meant that the opposite isn't
true, ie. highlight that a reset should *not* drop the variable thus two lists?

I tried to describe the issue in the comment. When I have just one action list, then I had a problem with impossibility to extend this list about reset action enforced by drop variable when I iterated over this list in xact time. This issue was solved by using two lists - one for drop and second for reset and recheck.
 

+typedef enum SVariableXActAction
+{
+   SVAR_ON_COMMIT_DROP,        /* used for ON COMMIT DROP */
+   SVAR_ON_COMMIT_RESET,       /* used for DROP VARIABLE */
+   SVAR_RESET,                 /* used for ON TRANSACTION END RESET */
+   SVAR_RECHECK                /* verify if session variable still exists */
+} SVariableXActAction;
+
+typedef struct SVariableXActActionItem
+{
+   Oid         varid;          /* varid of session variable */
+   SVariableXActAction action; /* reset or drop */

the stored action isn't simply "reset or drop", even though the resulting
action will be a reset or a drop (or a no-op) right?  Since it's storing a enum
define just before, I'd just drop the comment on action, and maybe specify that
SVAR_RECHECK will do appropriate cleanup if the session variable doesn't exist.

done
 

+ * Release the variable defined by varid from sessionvars
+ * hashtab.
+ */
+static void
+free_session_variable(SVariable svar)

The function name is a bit confusing given the previous function.  Maybe this
one should be called forget_session_variable() instead, or something like that?

I think the function comment should also mention that caller is responsible for
making sure that the sessionvars htab exists before calling it, for extra
clarity, or just add an assert for that.

+static void
+free_session_variable_varid(Oid varid)

Similary, maybe renaming this function forget_session_variable_by_id()?

I don't like "forget" too much - maybe "remove" can be used instead - like HASH_REMOVE


+static void
+create_sessionvars_hashtable(void)
+{
+   HASHCTL     ctl;
+
+   /* set callbacks */
+   if (first_time)
+   {
+       /* Read sinval messages */
+       CacheRegisterSyscacheCallback(VARIABLEOID,
+                                     pg_variable_cache_callback,
+                                     (Datum) 0);
+
+       first_time = false;
+   }
+
+   /* needs its own long lived memory context */
+   if (SVariableMemoryContext == NULL)
+   {
+       SVariableMemoryContext =
+           AllocSetContextCreate(TopMemoryContext,
+                                 "session variables",
+                                 ALLOCSET_START_SMALL_SIZES);
+   }

As far as I can see the SVariableMemoryContext can be reset but never set to
NULL, so I think the initialization can be done in the first_time case, and
otherwise asserted that it's not NULL.

done
 

+   if (!isnull && svar->typid != typid)
+       ereport(ERROR,
+               (errcode(ERRCODE_DATATYPE_MISMATCH),
+                errmsg("type \"%s\" of assigned value is different than type \"%s\" of session variable \"%s.%

Why testing isnull?  I don't think it's ok to allow NULL::text in an int
variable for instance.  This isn't valid in other context (like inserting in a
table)

changed
 

+    * result of default expression always). Don't do this check, when variable
+    * is initialized.
+    */
+   if (!init_mode &&

I think the last part of the comment is a bit misleading.  Maybe "when variable
is being initialized" (and similary same for the function comment).

changed
 

+ * We try not to break the previous value, if something is wrong.
+ *
+ * As side efect this function acquires AccessShareLock on
+ * related session variable until commit.
+ */
+void
+SetSessionVariable(Oid varid, Datum value, bool isNull, Oid typid)

I don't understand what you mean by "We try not to break the previous value, if
something is wrong".

That means, so SetSessionVariable sets a new value or should preserve the original value.
 

+   /* Initialize svar when not initialized or when stored value is null */
+   if (!found)
+   {
+       Variable    var;
+
+       /* don't need defexpr and acl here */
+       initVariable(&var, varid, true);
+       init_session_variable(svar, &var);
+   }
+
+   set_session_variable(svar, value, isNull, typid, false);

Shouldn't the comment be on the set_session_variable() vall rather than on the
!found block?

This comment is obsolete,

removed
 

+ * Returns the value of the session variable specified by varid. Check correct
+ * result type. Optionally the result can be copied.
+ */
+Datum
+GetSessionVariable(Oid varid, bool *isNull, Oid expected_typid, bool copy)

All callers use copy == true, couldn't we get rid of it and say it returns a
copy of the value if any?

I replaced it with the new function CopySessionVariableWithTypeCheck. Probably in almost all situations, the copy will be required. And if not, we can enhance the API later.


+ * Create new ON_COMMIT_DROP xact action. We have to drop
+ * ON COMMIT DROP variable, although this variable should not
+ * be used. So we need to register this action in CREATE VARIABLE
+ * time.

I don't understand this comment.

changed
 

+AtPreEOXact_SessionVariable_on_xact_actions(bool isCommit)
+{
+   ListCell   *l;
+
+   foreach(l, xact_drop_actions)
+   {
+       SVariableXActActionItem *xact_ai =
+                           (SVariableXActActionItem *) lfirst(l);
+
+       /* Iterate only over non dropped entries */
+       if (xact_ai->deleting_subid == InvalidSubTransactionId)
+       {
+           Assert(xact_ai->action == SVAR_ON_COMMIT_DROP);

The assert sould probably be in the block above.

moved
 

+            * We want to reset session variable (release it from
+            * local memory) when RESET is required or when session
+            * variable was removed explicitly (DROP VARIABLE) or
+            * implicitly (ON COMMIT DROP). Explicit releasing should
+            * be done only if the transaction is commited.
+            */
+           if ((xact_ai->action == SVAR_RESET) ||
+               (xact_ai->action == SVAR_ON_COMMIT_RESET &&
+                xact_ai->deleting_subid == InvalidSubTransactionId &&
+                isCommit))
+               free_session_variable_varid(xact_ai->varid);

This chunk is a bit hard to follow.  Also, for SVAR_RESET wouldn't it be better
to only make the svar invalid and keep it in the htab?  If so, this could be
split in two different branches which would be easier to follow.

After some experiments, I think it is more simple to remove the svar entry in htab. It reduces the state space, and variable initialization once per transaction is not expensive. The problem is in necessary xact action registration and now I can call it simply just from init_session_variable. I updated  comments there

 

+       if (!isCommit &&
+           xact_ai->creating_subid == mySubid &&
+           xact_ai->action != SVAR_RESET &&
+           xact_ai->action != SVAR_RECHECK)
+       {
+           /* cur_item must be removed */
+           xact_reset_actions = foreach_delete_current(xact_reset_actions, cur_item);
+           pfree(xact_ai);

I think that be definition only the SVAR_ON_COMMIT_DROP (cleaning entry for a
dropped session variable) will ever need to be removed there, so we should
check for that instead of not being something else?


fixed
 

+   /*
+    * Prepare session variables, if not prepared in queryDesc
+    */
+   if (queryDesc->num_session_variables > 0)
 
I don't understand that comment.

I changed this comment

 

+static void
+svariableStartupReceiver(DestReceiver *self, int operation, TupleDesc typeinfo)
+{
+   svariableState *myState = (svariableState *) self;
+   int         natts = typeinfo->natts;
+   int         outcols = 0;
+   int         i;
+
+   for (i = 0; i < natts; i++)
+   {
+       Form_pg_attribute attr = TupleDescAttr(typeinfo, i);
+
+       if (attr->attisdropped)
+           continue;
+
+       if (++outcols > 1)
+           elog(ERROR, "svariable DestReceiver can take only one attribute");
+
+       myState->typid = attr->atttypid;
+       myState->typmod = attr->atttypmod;
+       myState->typlen = attr->attlen;
+       myState->slot_offset = i;
+   }
+
+   myState->rows = 0;
+}

Maybe add an initial Assert to make sure that caller did call
SetVariableDestReceiverParams(), and final check that one attribute was found?

done
 

@@ -1794,15 +1840,39 @@ fix_expr_common(PlannerInfo *root, Node *node)
                g->cols = cols;
        }
    }
+   else if (IsA(node, Param))
+   {
+       Param *p = (Param *) node;
+
+       if (p->paramkind == PARAM_VARIABLE)
+       {
+           PlanInvalItem *inval_item = makeNode(PlanInvalItem);
+
+           /* paramid is still session variable id */
+           inval_item->cacheId = VARIABLEOID;
+           inval_item->hashValue = GetSysCacheHashValue1(VARIABLEOID,
+                                                         ObjectIdGetDatum(p->paramvarid));
+
+           /* Append this variable to global, register dependency */
+           root->glob->invalItems = lappend(root->glob->invalItems,
+                                            inval_item);
+       }
+   }

I didn't see any test covering invalidation of cached plan using session
variables.  Could you add some?  While at it, maybe use different values on the
sesssion_variable.sql tests rather than 100 in many places, so it's easier to
identifier which case broke in case of problem.

I created new tests there

+static Node *
+makeParamSessionVariable(ParseState *pstate,
+                       Oid varid, Oid typid, int32 typmod, Oid collid,
+                       char *attrname, int location)
+{
[...]
+   /*
+    * There are two ways to access session variables - direct, used by simple
+    * plpgsql expressions, where it is not necessary to emulate stability.
+    * And Buffered access, which is used everywhere else. We should ensure
+    * stable values, and because session variables are global, then we should
+    * work with copied values instead of directly accessing variables. For
+    * direct access, the varid is best. For buffered access, we need
+    * to assign an index to the buffer - later, when we know what variables are
+    * used. Now, we just remember, so we use session variables.

I don't understand the last part, starting with "For buffered access, we
need...".  Also, the beginning of the comment seems like something more general
and may be moved somewhere, maybe at the beginning of sessionvariable.c?

moved to sessionvariable.c and modified.


+    * stmt->query is SelectStmt node. An tranformation of
+    * this node doesn't support SetToDefault node. Instead injecting
+    * of transformSelectStmt or parse state, we can directly
+    * transform target list here if holds SetToDefault node.
+    */
+   if (stmt->set_default)

I don't understand this comment.  Especially since the next
transformTargetList() will emit SetToDefault node that will be handled later in
that function and then in RewriteQuery.

This is messy, sorry. SelectStmt doesn't support SetToDefault. LetStmt supports it. I reworded.


+   /*
+    * rewrite SetToDefaults needs varid in Query structure
+    */
+   query->resultVariable = varid;

I also don't understand that comment.  Is is always set just in case there's a
SetToDefault, or something else?

This comment is not complete. This value is required by QueryRewriter (for replacement of the SetToDefault node by defexpr). It is required for acquiring locks, and for execution.

I rewrote this comment

 

+   /* translate paramvarid to session variable name */
+   if (param->paramkind == PARAM_VARIABLE)
+   {
+       appendStringInfo(context->buf, "%s",
+                        generate_session_variable_name(param->paramvarid));
+       return;
+   }

A bit more work seems to be needed for deparsing session variables:

# create variable myvar text;
CREATE VARIABLE

# create view myview as select myvar;
CREATE VIEW

# \d+ myview
                          View "public.myview"
 Column | Type | Collation | Nullable | Default | Storage  | Description
--------+------+-----------+----------+---------+----------+-------------
 myvar  | text |           |          |         | extended |
View definition:
 SELECT myvar AS myvar;

There shouldn't be an explicit alias I think.

this issue was described in other thread

I am sending rebased, updated patches. The type check is not implemented yet.

Regards

Pavel

 
Attachment

pgsql-hackers by date:

Previous
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: Replica Identity check of partition table on subscriber
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Using PQexecQuery in pipeline mode produces unexpected Close messages