Thread: Re: proposal: schema variables

Re: proposal: schema variables

From
Laurenz Albe
Date:
On Thu, 2024-08-15 at 07:55 +0200, Pavel Stehule wrote:
> út 30. 7. 2024 v 21:46 odesílatel Laurenz Albe <laurenz.albe@cybertec.at> napsal:
> > - A general reminder: single line comments should start with a lower case
> >   letter and have to period in the end:
>
> Should it be "have not to period in the end" ?

I made a typo.  I mean "have *no* period in the end".

> I fixed almost all without parts related to psql and executor - there almost all
> current comments break the mentioned rule. So I use the style used in these files.
> I can fix it (if you like it) - or it can be fixed generally in a separated patch set.

Thanks.  I also noticed that a lot of existing comments break that rule,
so I think that it is fine if you stick with the way that the surrounding
code does it.

> > - Variable as parameter:
> >
> >   CREATE VARIABLE var AS date;
> >   LET var = current_date;
> >   PREPARE stmt(date) AS SELECT $1;
> >   EXECUTE stmt(var);
> >   ERROR:  paramid of PARAM_VARIABLE param is out of range
> >
> >   Is that working as intended?  I don't understand the error message.
>
> fixed in 0002 patch (variables cannot be used as EXECUTE argument) and 0014 (enable usage variables as an argument of
EXECUTE)

Thanks.

> >   > --- a/src/backend/catalog/namespace.c
> >   > +++ b/src/backend/catalog/namespace.c
> >   > [...]
> >   > +/*
> >   > + * IdentifyVariable - try to find variable identified by list of names.
> >   > [...]
> >
> >   Perhaps part of the reason why this function is so complicated is that
> >   you support things like this:
> >
> >     CREATE SCHEMA sch;
> >     CREATE TYPE cp AS (a integer, b integer);
> >     CREATE VARIABLE sch.v AS cp;
> >     LET sch.v = (1, 2);
> >     SELECT sch.v.b;
> >      b
> >     ═══
> >      2
> >     (1 row)
> >
> >     test=# SELECT test.sch.v.b;
> >      b
> >     ═══
> >      2
> >     (1 row)
> >
> >   We don't support that for tables:
> >
> >     CREATE TABLE tab (c cp);
> >     INSERT INTO tab VALUES (ROW(42, 43));
> >     SELECT tab.c.b FROM tab;
> >     ERROR:  cross-database references are not implemented: tab.c.b
> >
> >   You have to use extra parentheses:
> >
> >     SELECT (tab.c).b FROM tab;
> >      b 
> >     ════
> >      43
> >     (1 row)
> >
> >   I'd say that this should be the same for session variables.
> >   What do you think?
>
> I prefer the current state, but I don't have a very strong opinion about it.
> It can save 115 lines of almost trivial code, but I think the user experience
> can be much worse.  Using composite types in tables is not too common a
> pattern (and when I read some recommendations for Oracle, it is an antipattern),
> but usage of composite variables is common (it can hold a row). Moreover,
> we talked long about possible identifier's collisions, and the pattern
> schema.variable is very good protection against possible collisions.
> Probably the pattern catalog.schema.variable.field is not too interesting
> but the support has 50 lines.
>
> More, the plpgsql supports pattern label.variable.field, so can be little bit
> unfriendly if session variables doesn't support "similar" pattern

I see your point, and I'm fine with leaving it as it is.

>
>
> > - src/backend/commands/session_variable.c
> >
> >   There is one comment that confuses me and that I did not edit:
> >
> >   > +Datum
> >   > +GetSessionVariable(Oid varid, bool *isNull, Oid *typid)
> >   > +{
> >   > +   SVariable   svar;
> >   > +
> >   > +   svar = get_session_variable(varid);
> >   > +
> >   > +   /*
> >   > +    * Although svar is freshly validated in this point, the svar->is_valid can
> >   > +    * be false, due possible accepting invalidation message inside domain
> >   > +    * check. Now, the validation is done after lock, that can also accept
> >   > +    * invalidation message, so validation should be trustful.
> >   > +    *
> >   > +    * For now, we don't need to repeat validation. Only svar should be valid
> >   > +    * pointer.
> >   > +    */
>
> This comment is related to assertions. Before I had there `Assert(svar->is_valid)`,
> because I expected it. But it was not always true. And although it is true,
> we don't need to validate a variable, because at this moment, the variable
> should be locked, and then we can return content safely.

I guess my main problem is the word "trustful".  I don't recognize that word.
Perhaps you can reword the comment along the lines of your above explanation.

>
> > - src/backend/executor/execMain.c
> >
> >   > @@ -196,6 +198,51 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
> >   >     Assert(queryDesc->sourceText != NULL);
> >   >     estate->es_sourceText = queryDesc->sourceText;
> >   >
> >   > +   /*
> >   > +    * The executor doesn't work with session variables directly. Values of
> >   > +    * related session variables are copied to dedicated array, and this array
> >   > +    * is passed to executor.
> >   > +    */
> >
> >   Why?  Is that a performance measure?  The comment should explain that.
>
> Session variables are volatile internally - some function that uses LET statements
> can be called more times inside a query. This behavior is not consistent with
> plpgsql's variables or external parameters. And if we want to support parallel
> queries, then volatile session variables can be pretty messy (and unusable).
> If we want the same behaviour for queries with or without parallel support,
> then the session variables should be "stable" (like stable functions).
> Simple implementation is using "snapshot" of values of used session variables
> when query is started. This "snapshot" is immutable, so we don't need to make
> a copy more times.
>
> I changed this comment

Thanks.

> > - parallel safety
> >
> >   > --- a/src/backend/optimizer/util/clauses.c
> >   > +++ b/src/backend/optimizer/util/clauses.c
> >   > @@ -935,6 +936,13 @@ max_parallel_hazard_walker(Node *node, max_parallel_hazard_context *context)
> >   >         if (param->paramkind == PARAM_EXTERN)
> >   >             return false;
> >   >
> >   > +       /* We doesn't support passing session variables to workers */
> >   > +       if (param->paramkind == PARAM_VARIABLE)
> >   > +       {
> >   > +           if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> >   > +               return true;
> >   > +       }
> >
> >   Even if a later patch alleviates that restriction, this patch should document that
> >   session variables imply "parallel restricted".  I have added that to doc/src/sgml/parallel.sgml.
> >
>
> merged (and removed in 0015)

Thanks.


I hope I can do some more review at some point in the future...

I sincerely hope that this patch set gets merged at some point.
The big obstacle is that it is so large.  That's probably because it is pretty
feature-complete (but, as we have found, not totally free of bugs).

Judging from the amount of time I put into my review so far, I guess that this
patch set would keep a committer busy for several weeks.  Perhaps the only way to
get this done would be to make you a committer...

Yours,
Laurenz Albe



Re: proposal: schema variables

From
Pavel Stehule
Date:
Hi

út 27. 8. 2024 v 8:15 odesílatel Laurenz Albe <laurenz.albe@cybertec.at> napsal:
On Thu, 2024-08-15 at 07:55 +0200, Pavel Stehule wrote:
> út 30. 7. 2024 v 21:46 odesílatel Laurenz Albe <laurenz.albe@cybertec.at> napsal:
> > - A general reminder: single line comments should start with a lower case
> >   letter and have to period in the end:
>
> Should it be "have not to period in the end" ?

I made a typo.  I mean "have *no* period in the end".

> I fixed almost all without parts related to psql and executor - there almost all
> current comments break the mentioned rule. So I use the style used in these files.
> I can fix it (if you like it) - or it can be fixed generally in a separated patch set.

Thanks.  I also noticed that a lot of existing comments break that rule,
so I think that it is fine if you stick with the way that the surrounding
code does it.

> > - Variable as parameter:
> >
> >   CREATE VARIABLE var AS date;
> >   LET var = current_date;
> >   PREPARE stmt(date) AS SELECT $1;
> >   EXECUTE stmt(var);
> >   ERROR:  paramid of PARAM_VARIABLE param is out of range
> >
> >   Is that working as intended?  I don't understand the error message.
>
> fixed in 0002 patch (variables cannot be used as EXECUTE argument) and 0014 (enable usage variables as an argument of EXECUTE)

Thanks.

> >   > --- a/src/backend/catalog/namespace.c
> >   > +++ b/src/backend/catalog/namespace.c
> >   > [...]
> >   > +/*
> >   > + * IdentifyVariable - try to find variable identified by list of names.
> >   > [...]
> >
> >   Perhaps part of the reason why this function is so complicated is that
> >   you support things like this:
> >
> >     CREATE SCHEMA sch;
> >     CREATE TYPE cp AS (a integer, b integer);
> >     CREATE VARIABLE sch.v AS cp;
> >     LET sch.v = (1, 2);
> >     SELECT sch.v.b;
> >      b
> >     ═══
> >      2
> >     (1 row)
> >
> >     test=# SELECT test.sch.v.b;
> >      b
> >     ═══
> >      2
> >     (1 row)
> >
> >   We don't support that for tables:
> >
> >     CREATE TABLE tab (c cp);
> >     INSERT INTO tab VALUES (ROW(42, 43));
> >     SELECT tab.c.b FROM tab;
> >     ERROR:  cross-database references are not implemented: tab.c.b
> >
> >   You have to use extra parentheses:
> >
> >     SELECT (tab.c).b FROM tab;
> >      b 
> >     ════
> >      43
> >     (1 row)
> >
> >   I'd say that this should be the same for session variables.
> >   What do you think?
>
> I prefer the current state, but I don't have a very strong opinion about it.
> It can save 115 lines of almost trivial code, but I think the user experience
> can be much worse.  Using composite types in tables is not too common a
> pattern (and when I read some recommendations for Oracle, it is an antipattern),
> but usage of composite variables is common (it can hold a row). Moreover,
> we talked long about possible identifier's collisions, and the pattern
> schema.variable is very good protection against possible collisions.
> Probably the pattern catalog.schema.variable.field is not too interesting
> but the support has 50 lines.
>
> More, the plpgsql supports pattern label.variable.field, so can be little bit
> unfriendly if session variables doesn't support "similar" pattern

I see your point, and I'm fine with leaving it as it is.

>
>
> > - src/backend/commands/session_variable.c
> >
> >   There is one comment that confuses me and that I did not edit:
> >
> >   > +Datum
> >   > +GetSessionVariable(Oid varid, bool *isNull, Oid *typid)
> >   > +{
> >   > +   SVariable   svar;
> >   > +
> >   > +   svar = get_session_variable(varid);
> >   > +
> >   > +   /*
> >   > +    * Although svar is freshly validated in this point, the svar->is_valid can
> >   > +    * be false, due possible accepting invalidation message inside domain
> >   > +    * check. Now, the validation is done after lock, that can also accept
> >   > +    * invalidation message, so validation should be trustful.
> >   > +    *
> >   > +    * For now, we don't need to repeat validation. Only svar should be valid
> >   > +    * pointer.
> >   > +    */
>
> This comment is related to assertions. Before I had there `Assert(svar->is_valid)`,
> because I expected it. But it was not always true. And although it is true,
> we don't need to validate a variable, because at this moment, the variable
> should be locked, and then we can return content safely.

I guess my main problem is the word "trustful".  I don't recognize that word.
Perhaps you can reword the comment along the lines of your above explanation.

I'll try to change it
 

>
> > - src/backend/executor/execMain.c
> >
> >   > @@ -196,6 +198,51 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
> >   >     Assert(queryDesc->sourceText != NULL);
> >   >     estate->es_sourceText = queryDesc->sourceText;
> >   >
> >   > +   /*
> >   > +    * The executor doesn't work with session variables directly. Values of
> >   > +    * related session variables are copied to dedicated array, and this array
> >   > +    * is passed to executor.
> >   > +    */
> >
> >   Why?  Is that a performance measure?  The comment should explain that.
>
> Session variables are volatile internally - some function that uses LET statements
> can be called more times inside a query. This behavior is not consistent with
> plpgsql's variables or external parameters. And if we want to support parallel
> queries, then volatile session variables can be pretty messy (and unusable).
> If we want the same behaviour for queries with or without parallel support,
> then the session variables should be "stable" (like stable functions).
> Simple implementation is using "snapshot" of values of used session variables
> when query is started. This "snapshot" is immutable, so we don't need to make
> a copy more times.
>
> I changed this comment

Thanks.

> > - parallel safety
> >
> >   > --- a/src/backend/optimizer/util/clauses.c
> >   > +++ b/src/backend/optimizer/util/clauses.c
> >   > @@ -935,6 +936,13 @@ max_parallel_hazard_walker(Node *node, max_parallel_hazard_context *context)
> >   >         if (param->paramkind == PARAM_EXTERN)
> >   >             return false;
> >   >
> >   > +       /* We doesn't support passing session variables to workers */
> >   > +       if (param->paramkind == PARAM_VARIABLE)
> >   > +       {
> >   > +           if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> >   > +               return true;
> >   > +       }
> >
> >   Even if a later patch alleviates that restriction, this patch should document that
> >   session variables imply "parallel restricted".  I have added that to doc/src/sgml/parallel.sgml.
> >
>
> merged (and removed in 0015)

Thanks.


I hope I can do some more review at some point in the future...

I sincerely hope that this patch set gets merged at some point.
The big obstacle is that it is so large.  That's probably because it is pretty
feature-complete (but, as we have found, not totally free of bugs).

Any feature that builds its own system catalog cannot be short. The first two patches have 300KB, like all others and just basic support for reading and writing. All other features have 300KB. On second hand almost all code is simple, and there are no changes in critical parts of Postgres. The size and complexity of JSONPath is level higher. I can throw 200KB from another 300KB patch set which can be better for review, but it can be harder to maintain this patch set. I'll try it, and I'll send a reduced version.


Judging from the amount of time I put into my review so far, I guess that this
patch set would keep a committer busy for several weeks.  Perhaps the only way to
get this done would be to make you a committer...

:-)
 
Unfortunately, I am very bad at languages, I had a lot of problems in basic school just with Czech lang, Russian and English was more terrible) - so I very appreciate your work on this patch.

Thank you very much

Pavel


Yours,
Laurenz Albe

Re: proposal: schema variables

From
Laurenz Albe
Date:
time""On Tue, 2024-08-27 at 08:52 +0200, Pavel Stehule wrote:
> I can throw 200KB from another 300KB patch set which can be better for review, but it
> can be harder to maintain this patch set. I'll try it, and I'll send a reduced version.

That was not a criticism, and I think the way you split up the patch set right now
is as good as it probably gets.  Ideally, one could say something like "we need at least
patch 1 to 4, 5 and 6 are optional, but desirable, all others can easily be deferred
to a later time".

Yours,
Laurenz Albe



Re: proposal: schema variables

From
Pavel Stehule
Date:
Hi

út 27. 8. 2024 v 16:52 odesílatel Laurenz Albe <laurenz.albe@cybertec.at> napsal:
time""On Tue, 2024-08-27 at 08:52 +0200, Pavel Stehule wrote:
> I can throw 200KB from another 300KB patch set which can be better for review, but it
> can be harder to maintain this patch set. I'll try it, and I'll send a reduced version.

That was not a criticism, and I think the way you split up the patch set right now
is as good as it probably gets.  Ideally, one could say something like "we need at least
patch 1 to 4, 5 and 6 are optional, but desirable, all others can easily be deferred
to a later time".

It was mentioned here more times (I thought).

1..4 are fundamental
5..6 optional (6 are just tests)

others can be deferred (5-6 can be deferred too). Without support of temporary session variables, it  is not too probable to repeatedly CREATE, DROP the same variable in one session, so memory usage can be similar to today's workarounds, but against workarounds, session variables use types and access rights. Note - plpgsql doesn't try to delete compiled code from cache immediately too - so the behaviour implemented in 1..4 is "similar" to plpgsql func cache

14 .. allow you to use session variables as arguments of CALL or EXECUTE statement, and variables can be used in plpgsql simple expr.
15 .. variables don't block parallelism
16 .. the result of plpgsql simple expr can be assigned to session variables
17 .. expr with session variable can be inlined in SQL

14-17 are performance related

7 - was requested by some people - some functionality can be possibly replaced by plpgsql_check.
It has only 40 rows - it just raise warning or error when some conditions are true

Regards

Pavel

 

Yours,
Laurenz Albe