Thread: Re: proposal: schema variables

Re: proposal: schema variables

From
Dmitry Dolgov
Date:
Hi folks,

Thanks for continuing this work. As a side note, I would like to mention
how strange the situation in this CF item is. If I understand correctly,
there are two hackers threads discussing the same patch, with recent
patches posted in both of them. This is obviously confusing, e.g. the
main concern from another thread, about names clashing, wasn't even
mentioned in this one. Is it possible to reconcile development in one
thread?



Re: proposal: schema variables

From
Pavel Stehule
Date:


ne 10. 11. 2024 v 16:24 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
Hi folks,

Thanks for continuing this work. As a side note, I would like to mention
how strange the situation in this CF item is. If I understand correctly,
there are two hackers threads discussing the same patch, with recent
patches posted in both of them. This is obviously confusing, e.g. the
main concern from another thread, about names clashing, wasn't even
mentioned in this one. Is it possible to reconcile development in one
thread?

This is probably my error. I don't try to organize threads, just I'll try to reply in the thread where I got a question.

Regards

Pavel
 

Re: proposal: schema variables

From
Dmitry Dolgov
Date:
> On Sun, Nov 10, 2024 at 05:19:09PM GMT, Pavel Stehule wrote:
> ne 10. 11. 2024 v 16:24 odesílatel Dmitry Dolgov <9erthalion6@gmail.com>
> napsal:
>
> > Hi folks,
> >
> > Thanks for continuing this work. As a side note, I would like to mention
> > how strange the situation in this CF item is. If I understand correctly,
> > there are two hackers threads discussing the same patch, with recent
> > patches posted in both of them. This is obviously confusing, e.g. the
> > main concern from another thread, about names clashing, wasn't even
> > mentioned in this one. Is it possible to reconcile development in one
> > thread?
> >
>
> This is probably my error. I don't try to organize threads, just I'll try
> to reply in the thread where I got a question.

It's fine. I just would appreciate some clarity about which patch to
look at. To confirm, the series in this thread contains everything from
the other one, plus some improvements, right?



Re: proposal: schema variables

From
Pavel Stehule
Date:


ne 10. 11. 2024 v 17:19 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:


ne 10. 11. 2024 v 16:24 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
Hi folks,

Thanks for continuing this work. As a side note, I would like to mention
how strange the situation in this CF item is. If I understand correctly,
there are two hackers threads discussing the same patch, with recent
patches posted in both of them. This is obviously confusing, e.g. the
main concern from another thread, about names clashing, wasn't even
mentioned in this one. Is it possible to reconcile development in one
thread?

This is probably my error. I don't try to organize threads, just I'll try to reply in the thread where I got a question.

I thought a lot of time about better solutions for identifier collisions and I really don't think so there is some consistent user friendly syntax. Personally I think there is an easy already implemented solution - convention - just use a dedicated schema for variables and this schema should not be in the search path. Or use secondary convention - like using prefix "__" for session variables. Common convention is using "_" for PLpgSQL variables. I searched how this issue is solved in other databases, or in standard, and I found nothing special. The Oracle and SQL/PSM has a concept of visibility - the variables are not visible outside packages or modules, but Postgres has nothing similar. It can be emulated by a dedicated schema without inserting a search path, but it is less strong.

I think we can introduce an alternative syntax, that will not be user friendly or readable friendly, but it can be without collisions - or can decrease possible risks.

It is nothing new - SQL does it with old, "new" syntax of inner joins, or in Postgres we can
where salary < 40000 
or

where pg_catalog.int4lt(salary, 40000);

or some like we use for operators OPERATOR(schema.operatorname)

So introducing VARIABLE(schema.variablename) syntax as an alternative syntax for accessing variables I really like. I strongly prefer to use this as only alternative (secondary) syntax, because I don't think it is friendly syntax or writing friendly, but it is safe, and I can imagine tools that can replace generic syntax to this special, or that detects generic syntax and shows some warning. Then users can choose what they prefer. Two syntaxes - generic and special can be good enough for all - and this can be perfectly consistent with current Postgres.

Regards

Pavel


Regards

Pavel
 

Re: proposal: schema variables

From
Pavel Stehule
Date:


ne 10. 11. 2024 v 18:41 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Sun, Nov 10, 2024 at 05:19:09PM GMT, Pavel Stehule wrote:
> ne 10. 11. 2024 v 16:24 odesílatel Dmitry Dolgov <9erthalion6@gmail.com>
> napsal:
>
> > Hi folks,
> >
> > Thanks for continuing this work. As a side note, I would like to mention
> > how strange the situation in this CF item is. If I understand correctly,
> > there are two hackers threads discussing the same patch, with recent
> > patches posted in both of them. This is obviously confusing, e.g. the
> > main concern from another thread, about names clashing, wasn't even
> > mentioned in this one. Is it possible to reconcile development in one
> > thread?
> >
>
> This is probably my error. I don't try to organize threads, just I'll try
> to reply in the thread where I got a question.

It's fine. I just would appreciate some clarity about which patch to
look at. To confirm, the series in this thread contains everything from
the other one, plus some improvements, right?

I don't remember any change that can be visible for users in this thread.  Laurens does very precious code review (thank you again) - there are bigger changes in comments, and less changes in code - some parts of code are moved between patches, some lines were redundant and removed, some lines were artefacts of some git work and were cleaned. Some redundant tests were removed. There is no new code.
 
Regards

Pavel

Re: proposal: schema variables

From
Pavel Stehule
Date:


ne 10. 11. 2024 v 18:51 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:


ne 10. 11. 2024 v 17:19 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:


ne 10. 11. 2024 v 16:24 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
Hi folks,

Thanks for continuing this work. As a side note, I would like to mention
how strange the situation in this CF item is. If I understand correctly,
there are two hackers threads discussing the same patch, with recent
patches posted in both of them. This is obviously confusing, e.g. the
main concern from another thread, about names clashing, wasn't even
mentioned in this one. Is it possible to reconcile development in one
thread?

This is probably my error. I don't try to organize threads, just I'll try to reply in the thread where I got a question.

I thought a lot of time about better solutions for identifier collisions and I really don't think so there is some consistent user friendly syntax. Personally I think there is an easy already implemented solution - convention - just use a dedicated schema for variables and this schema should not be in the search path. Or use secondary convention - like using prefix "__" for session variables. Common convention is using "_" for PLpgSQL variables. I searched how this issue is solved in other databases, or in standard, and I found nothing special. The Oracle and SQL/PSM has a concept of visibility - the variables are not visible outside packages or modules, but Postgres has nothing similar. It can be emulated by a dedicated schema without inserting a search path, but it is less strong.

There can be more collisions in Oracle, because the functions without arguments don't need parentheses. Postgres is safer, because this syntax is not allowed.
 

I think we can introduce an alternative syntax, that will not be user friendly or readable friendly, but it can be without collisions - or can decrease possible risks.

It is nothing new - SQL does it with old, "new" syntax of inner joins, or in Postgres we can
where salary < 40000 
or

where pg_catalog.int4lt(salary, 40000);

or some like we use for operators OPERATOR(schema.operatorname)

So introducing VARIABLE(schema.variablename) syntax as an alternative syntax for accessing variables I really like. I strongly prefer to use this as only alternative (secondary) syntax, because I don't think it is friendly syntax or writing friendly, but it is safe, and I can imagine tools that can replace generic syntax to this special, or that detects generic syntax and shows some warning. Then users can choose what they prefer. Two syntaxes - generic and special can be good enough for all - and this can be perfectly consistent with current Postgres.

Regards

Pavel


Regards

Pavel
 

Re: proposal: schema variables

From
Dmitry Dolgov
Date:
> On Sun, Nov 10, 2024 at 06:51:40PM GMT, Pavel Stehule wrote:
> ne 10. 11. 2024 v 17:19 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
> napsal:
> I thought a lot of time about better solutions for identifier collisions
> and I really don't think so there is some consistent user friendly syntax.
> Personally I think there is an easy already implemented solution -
> convention - just use a dedicated schema for variables and this schema
> should not be in the search path. Or use secondary convention - like using
> prefix "__" for session variables. Common convention is using "_" for
> PLpgSQL variables. I searched how this issue is solved in other databases,
> or in standard, and I found nothing special. The Oracle and SQL/PSM has a
> concept of visibility - the variables are not visible outside packages or
> modules, but Postgres has nothing similar. It can be emulated by a
> dedicated schema without inserting a search path, but it is less strong.
>
> I think we can introduce an alternative syntax, that will not be user
> friendly or readable friendly, but it can be without collisions - or can
> decrease possible risks.
>
> It is nothing new - SQL does it with old, "new" syntax of inner joins, or
> in Postgres we can
>
> where salary < 40000
>
> or
>
> where pg_catalog.int4lt(salary, 40000);
>
>
> or some like we use for operators OPERATOR(*schema*.*operatorname*)
>
> So introducing VARIABLE(schema.variablename) syntax as an alternative
> syntax for accessing variables I really like. I strongly prefer to use this
> as only alternative (secondary) syntax, because I don't think it is
> friendly syntax or writing friendly, but it is safe, and I can imagine
> tools that can replace generic syntax to this special, or that detects
> generic syntax and shows some warning. Then users can choose what they
> prefer. Two syntaxes - generic and special can be good enough for all - and
> this can be perfectly consistent with current Postgres.

As far as I recall, last time this topic was discussed in hackers, two
options were proposed: the one with VARIABLE(name), what you mention
here; and another one with adding variables to the FROM clause. The
VARIABLE(...) syntax didn't get much negative feedback, so I guess why
not -- if you find it fitting, it would be interesting to see the
implementation.

I'm afraid it should not be just an alternative syntax, but the only one
allowed, because otherwise I don't see how scenarious like "drop a
column with the same name" could be avoided. As in the previous thread:

    -- we've got a variable b at the same time
    SELECT a, b FROM table1;

Then dropping the column b, but everything still works beause the
variable b got silently picked up. But if it would be required to say
VARIABLE(b), then all fine.

And to make sure we're on the same page, could you post couple of
examples from curretly existing tests in the patch, how are they going
to look like with this proposal?

About adding variables to the FROM clause. Looks like this option was
quite popular, and you've mentioned some technical challenges
implementing that. If you'd like to go with another approach, it would
be great to elaborate on that -- maybe even with a PoC, to make a
convincing point here.



Re: proposal: schema variables

From
Pavel Stehule
Date:


st 13. 11. 2024 v 17:35 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Sun, Nov 10, 2024 at 06:51:40PM GMT, Pavel Stehule wrote:
> ne 10. 11. 2024 v 17:19 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
> napsal:
> I thought a lot of time about better solutions for identifier collisions
> and I really don't think so there is some consistent user friendly syntax.
> Personally I think there is an easy already implemented solution -
> convention - just use a dedicated schema for variables and this schema
> should not be in the search path. Or use secondary convention - like using
> prefix "__" for session variables. Common convention is using "_" for
> PLpgSQL variables. I searched how this issue is solved in other databases,
> or in standard, and I found nothing special. The Oracle and SQL/PSM has a
> concept of visibility - the variables are not visible outside packages or
> modules, but Postgres has nothing similar. It can be emulated by a
> dedicated schema without inserting a search path, but it is less strong.
>
> I think we can introduce an alternative syntax, that will not be user
> friendly or readable friendly, but it can be without collisions - or can
> decrease possible risks.
>
> It is nothing new - SQL does it with old, "new" syntax of inner joins, or
> in Postgres we can
>
> where salary < 40000
>
> or
>
> where pg_catalog.int4lt(salary, 40000);
>
>
> or some like we use for operators OPERATOR(*schema*.*operatorname*)
>
> So introducing VARIABLE(schema.variablename) syntax as an alternative
> syntax for accessing variables I really like. I strongly prefer to use this
> as only alternative (secondary) syntax, because I don't think it is
> friendly syntax or writing friendly, but it is safe, and I can imagine
> tools that can replace generic syntax to this special, or that detects
> generic syntax and shows some warning. Then users can choose what they
> prefer. Two syntaxes - generic and special can be good enough for all - and
> this can be perfectly consistent with current Postgres.

As far as I recall, last time this topic was discussed in hackers, two
options were proposed: the one with VARIABLE(name), what you mention
here; and another one with adding variables to the FROM clause. The
VARIABLE(...) syntax didn't get much negative feedback, so I guess why
not -- if you find it fitting, it would be interesting to see the
implementation.

I'm afraid it should not be just an alternative syntax, but the only one
allowed, because otherwise I don't see how scenarious like "drop a
column with the same name" could be avoided. As in the previous thread:

    -- we've got a variable b at the same time
    SELECT a, b FROM table1;

I am sorry, but I am in very strong opposition  against this idea. Nobody did reply to my questions, that can change my opinion.

1. This introduces possible inconsistency between LET syntax and SELECT syntax.
What will be the syntax of LET?

LET var = var FROM var

PLpgSQL does something, and it is really strange, and source of some unwanted bugs. See https://commitfest.postgresql.org/50/5044/

With current design I can support

LET var = expr with wars

or

LET var = (QUERY with vars)

It is perfectly consistent. The expressions will be expressions.

2. I don't know of any product in the world that introduced the same requirement. So this syntax will be proprietary (SQL/PSM it really doesn't require) and shock for any users that know other databases. Proprietary syntax in this area far from syntaxes of other products is hell. Try to explain to users the working with OUT variables of Postgres's procedures and functions. And there is some deeper logic.

3. There is a simple solution - convention. Use your own schema like vars, and use session variables in this schema, When this schema will not be on the search path, then there is not a collision.
Variables living in schema. Nobody without CREATE right can create it. So it is safe. Or use prefix in __ for variables in the search path.

4. this requirement introduces syntax inconsistency between plpgsql variables and session variables - which breaks one goal of the patch - introducing some global variables for plpgsql (and for all PL).

5. Using more variables and FROM clauses decrease readability of FROM clause

SELECT v1, v2, a, b, c FROM t1, v1, v2, t2, ...

6. Usually composite variables don't want to unpack. When you should use FROM clause, then composite variables will be unpacked. Then all fields can be possibly in collision with all other column name

Example

CREATE TYPE t1 AS (id int, name varchar)
CREATE TABLE tab(id int, name varchar)
CREATE VARIABLE var1 AS t1;

SELECT id, name, foo(var1) FROM tab, var1;

Now I have a collision in columns id, name, and everywhere I need to use aliases. Without necessity to use var in FROM clause, I can just write

SELECT id, name, foo(var) FROM tab

and there is not any risk of collision




Then dropping the column b, but everything still works beause the
variable b got silently picked up. But if it would be required to say
VARIABLE(b), then all fine.

but same risk you have any time in plpgsql - all time. I don't remember any bug report related to this issue. 
 

And to make sure we're on the same page, could you post couple of
examples from curretly existing tests in the patch, how are they going
to look like with this proposal?

About adding variables to the FROM clause. Looks like this option was
quite popular, and you've mentioned some technical challenges
implementing that. If you'd like to go with another approach, it would
be great to elaborate on that -- maybe even with a PoC, to make a
convincing point here.

There is not any problem with implementation. I see the main problem with usability, and I really don't want to implement some like LET var = var FROM var; I am sorry
It fixes one issue, but it increases possible collisions - so the variables will be unusable.

Regards

Pavel

Re: proposal: schema variables

From
Pavel Stehule
Date:


st 13. 11. 2024 v 17:35 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Sun, Nov 10, 2024 at 06:51:40PM GMT, Pavel Stehule wrote:
> ne 10. 11. 2024 v 17:19 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
> napsal:
> I thought a lot of time about better solutions for identifier collisions
> and I really don't think so there is some consistent user friendly syntax.
> Personally I think there is an easy already implemented solution -
> convention - just use a dedicated schema for variables and this schema
> should not be in the search path. Or use secondary convention - like using
> prefix "__" for session variables. Common convention is using "_" for
> PLpgSQL variables. I searched how this issue is solved in other databases,
> or in standard, and I found nothing special. The Oracle and SQL/PSM has a
> concept of visibility - the variables are not visible outside packages or
> modules, but Postgres has nothing similar. It can be emulated by a
> dedicated schema without inserting a search path, but it is less strong.
>
> I think we can introduce an alternative syntax, that will not be user
> friendly or readable friendly, but it can be without collisions - or can
> decrease possible risks.
>
> It is nothing new - SQL does it with old, "new" syntax of inner joins, or
> in Postgres we can
>
> where salary < 40000
>
> or
>
> where pg_catalog.int4lt(salary, 40000);
>
>
> or some like we use for operators OPERATOR(*schema*.*operatorname*)
>
> So introducing VARIABLE(schema.variablename) syntax as an alternative
> syntax for accessing variables I really like. I strongly prefer to use this
> as only alternative (secondary) syntax, because I don't think it is
> friendly syntax or writing friendly, but it is safe, and I can imagine
> tools that can replace generic syntax to this special, or that detects
> generic syntax and shows some warning. Then users can choose what they
> prefer. Two syntaxes - generic and special can be good enough for all - and
> this can be perfectly consistent with current Postgres.

As far as I recall, last time this topic was discussed in hackers, two
options were proposed: the one with VARIABLE(name), what you mention
here; and another one with adding variables to the FROM clause. The
VARIABLE(...) syntax didn't get much negative feedback, so I guess why
not -- if you find it fitting, it would be interesting to see the
implementation.

I'm afraid it should not be just an alternative syntax, but the only one
allowed, because otherwise I don't see how scenarious like "drop a
column with the same name" could be avoided. As in the previous thread:

    -- we've got a variable b at the same time
    SELECT a, b FROM table1;

Then dropping the column b, but everything still works beause the
variable b got silently picked up. But if it would be required to say
VARIABLE(b), then all fine.

In this scenario you will get  a warning related to variable shadowing (before you drop a column).

I think this issue can be partially similar to creating two equally named tables in different schemas (both schemas are in search path). When you drop one table, the query will work, but the result is different. It is the same issue. The SQL has no concept of shadowing and on the base line it is not necessary. But when you integrate SQL with some procedural code then you should solve this issue (or accept). This issue is real, and it is in every procedural enhancement of SQL that I know with the same syntax.  On the other hand I doubt this is a real issue. The changes of system catalogue are tested before production - so probably you will read a warning about a shadowed variable, and probably you will get different results, because variable b has the same value for all rows, and probably will have different value than column b. I can imagine the necessity of disabling this warning on production systems. Shadowing by self is not an issue, probably, but it is a signal of code quality problems.

But this scenario is real, and then it is a question if the warning about shadowed variables should be only optional and if it can be disabled. Maybe not. Generally the shadowing is a strange concept - it is safeguard against serious issues, but it should not be used generally and everywhere the developer should rename the conflict identifiers.

Regards

Pavel


And to make sure we're on the same page, could you post couple of
examples from curretly existing tests in the patch, how are they going
to look like with this proposal?

About adding variables to the FROM clause. Looks like this option was
quite popular, and you've mentioned some technical challenges
implementing that. If you'd like to go with another approach, it would
be great to elaborate on that -- maybe even with a PoC, to make a
convincing point here.

Re: proposal: schema variables

From
Pavel Stehule
Date:


čt 14. 11. 2024 v 8:41 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:


st 13. 11. 2024 v 17:35 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Sun, Nov 10, 2024 at 06:51:40PM GMT, Pavel Stehule wrote:
> ne 10. 11. 2024 v 17:19 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
> napsal:
> I thought a lot of time about better solutions for identifier collisions
> and I really don't think so there is some consistent user friendly syntax.
> Personally I think there is an easy already implemented solution -
> convention - just use a dedicated schema for variables and this schema
> should not be in the search path. Or use secondary convention - like using
> prefix "__" for session variables. Common convention is using "_" for
> PLpgSQL variables. I searched how this issue is solved in other databases,
> or in standard, and I found nothing special. The Oracle and SQL/PSM has a
> concept of visibility - the variables are not visible outside packages or
> modules, but Postgres has nothing similar. It can be emulated by a
> dedicated schema without inserting a search path, but it is less strong.
>
> I think we can introduce an alternative syntax, that will not be user
> friendly or readable friendly, but it can be without collisions - or can
> decrease possible risks.
>
> It is nothing new - SQL does it with old, "new" syntax of inner joins, or
> in Postgres we can
>
> where salary < 40000
>
> or
>
> where pg_catalog.int4lt(salary, 40000);
>
>
> or some like we use for operators OPERATOR(*schema*.*operatorname*)
>
> So introducing VARIABLE(schema.variablename) syntax as an alternative
> syntax for accessing variables I really like. I strongly prefer to use this
> as only alternative (secondary) syntax, because I don't think it is
> friendly syntax or writing friendly, but it is safe, and I can imagine
> tools that can replace generic syntax to this special, or that detects
> generic syntax and shows some warning. Then users can choose what they
> prefer. Two syntaxes - generic and special can be good enough for all - and
> this can be perfectly consistent with current Postgres.

As far as I recall, last time this topic was discussed in hackers, two
options were proposed: the one with VARIABLE(name), what you mention
here; and another one with adding variables to the FROM clause. The
VARIABLE(...) syntax didn't get much negative feedback, so I guess why
not -- if you find it fitting, it would be interesting to see the
implementation.

I'm afraid it should not be just an alternative syntax, but the only one
allowed, because otherwise I don't see how scenarious like "drop a
column with the same name" could be avoided. As in the previous thread:

    -- we've got a variable b at the same time
    SELECT a, b FROM table1;

Then dropping the column b, but everything still works beause the
variable b got silently picked up. But if it would be required to say
VARIABLE(b), then all fine.

In this scenario you will get  a warning related to variable shadowing (before you drop a column).

I think this issue can be partially similar to creating two equally named tables in different schemas (both schemas are in search path). When you drop one table, the query will work, but the result is different. It is the same issue. The SQL has no concept of shadowing and on the base line it is not necessary. But when you integrate SQL with some procedural code then you should solve this issue (or accept). This issue is real, and it is in every procedural enhancement of SQL that I know with the same syntax.  On the other hand I doubt this is a real issue. The changes of system catalogue are tested before production - so probably you will read a warning about a shadowed variable, and probably you will get different results, because variable b has the same value for all rows, and probably will have different value than column b. I can imagine the necessity of disabling this warning on production systems. Shadowing by self is not an issue, probably, but it is a signal of code quality problems.

But this scenario is real, and then it is a question if the warning about shadowed variables should be only optional and if it can be disabled. Maybe not. Generally the shadowing is a strange concept - it is safeguard against serious issues, but it should not be used generally and everywhere the developer should rename the conflict identifiers.

There can be another example against usage of the FROM clause for variables. Because it solves just one special case, but others are not covered.

Theoretically, variables can have the same names as tables. The table overshadows the variable, so it can work. But when somebody drops the variable, then the query still can work. So requirement of usage variable in FROM clause protects us just against drop column, but not against dropping table. In Postgres the dropping table is possibly risky due search_path (that introduces shadowing concept) without introduction variables. There is a possibility of this issue, but how common is this issue?

Regards

Pavel

 

Regards

Pavel


And to make sure we're on the same page, could you post couple of
examples from curretly existing tests in the patch, how are they going
to look like with this proposal?

About adding variables to the FROM clause. Looks like this option was
quite popular, and you've mentioned some technical challenges
implementing that. If you'd like to go with another approach, it would
be great to elaborate on that -- maybe even with a PoC, to make a
convincing point here.

Re: proposal: schema variables

From
Pavel Stehule
Date:


st 13. 11. 2024 v 17:35 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Sun, Nov 10, 2024 at 06:51:40PM GMT, Pavel Stehule wrote:
> ne 10. 11. 2024 v 17:19 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
> napsal:
> I thought a lot of time about better solutions for identifier collisions
> and I really don't think so there is some consistent user friendly syntax.
> Personally I think there is an easy already implemented solution -
> convention - just use a dedicated schema for variables and this schema
> should not be in the search path. Or use secondary convention - like using
> prefix "__" for session variables. Common convention is using "_" for
> PLpgSQL variables. I searched how this issue is solved in other databases,
> or in standard, and I found nothing special. The Oracle and SQL/PSM has a
> concept of visibility - the variables are not visible outside packages or
> modules, but Postgres has nothing similar. It can be emulated by a
> dedicated schema without inserting a search path, but it is less strong.
>
> I think we can introduce an alternative syntax, that will not be user
> friendly or readable friendly, but it can be without collisions - or can
> decrease possible risks.
>
> It is nothing new - SQL does it with old, "new" syntax of inner joins, or
> in Postgres we can
>
> where salary < 40000
>
> or
>
> where pg_catalog.int4lt(salary, 40000);
>
>
> or some like we use for operators OPERATOR(*schema*.*operatorname*)
>
> So introducing VARIABLE(schema.variablename) syntax as an alternative
> syntax for accessing variables I really like. I strongly prefer to use this
> as only alternative (secondary) syntax, because I don't think it is
> friendly syntax or writing friendly, but it is safe, and I can imagine
> tools that can replace generic syntax to this special, or that detects
> generic syntax and shows some warning. Then users can choose what they
> prefer. Two syntaxes - generic and special can be good enough for all - and
> this can be perfectly consistent with current Postgres.

As far as I recall, last time this topic was discussed in hackers, two
options were proposed: the one with VARIABLE(name), what you mention
here; and another one with adding variables to the FROM clause. The
VARIABLE(...) syntax didn't get much negative feedback, so I guess why
not -- if you find it fitting, it would be interesting to see the
implementation.

I'm afraid it should not be just an alternative syntax, but the only one
allowed, because otherwise I don't see how scenarious like "drop a
column with the same name" could be avoided. As in the previous thread:

    -- we've got a variable b at the same time
    SELECT a, b FROM table1;

Then dropping the column b, but everything still works beause the
variable b got silently picked up. But if it would be required to say
VARIABLE(b), then all fine.

And to make sure we're on the same page, could you post couple of
examples from curretly existing tests in the patch, how are they going
to look like with this proposal?

What do you think about the following design?  I can implement a warning "variable_usage_guard" when the variable is accessed without using VARIABLE() syntax. We can discuss later if this warning can be enabled by default or not. There I am open to any variant.

So for variable public.a and table public.foo(a, b)

I can write

LET a = 10; -- there is not possible collision
LET a = a + 1; -- there is not possible collision, no warning

SELECT a, b FROM foo; -- there is a collision - and warning "variable a is shadowed"
SELECT VARIABLE(a), b FROM foo; -- no collision, no warning

After ALTER TABLE foo DROP COLUMN a;

SELECT a, b FROM foo; -- possible warning "the usage in variable without safe syntax",
SELECT VARIABLE(a), b FROM foo; -- no warning

I think this design can be good for all. variable_usage_guard can be enabled by default. If somebody uses conventions for collision protection, then he can safely disable it.

Comments, notes?

Regards

Pavel


About adding variables to the FROM clause. Looks like this option was
quite popular, and you've mentioned some technical challenges
implementing that. If you'd like to go with another approach, it would
be great to elaborate on that -- maybe even with a PoC, to make a
convincing point here.

Re: proposal: schema variables

From
Dmitry Dolgov
Date:
> On Sat, Nov 16, 2024 at 07:10:31AM GMT, Pavel Stehule wrote:

Sorry, got distracted. Let me try to answer step by step.

> > As far as I recall, last time this topic was discussed in hackers, two
> > options were proposed: the one with VARIABLE(name), what you mention
> > here; and another one with adding variables to the FROM clause. The
> > VARIABLE(...) syntax didn't get much negative feedback, so I guess why
> > not -- if you find it fitting, it would be interesting to see the
> > implementation.
> >
> > I'm afraid it should not be just an alternative syntax, but the only one
> > allowed, because otherwise I don't see how scenarious like "drop a
> > column with the same name" could be avoided. As in the previous thread:
> >
> >     -- we've got a variable b at the same time
> >     SELECT a, b FROM table1;
> >
>
> I am sorry, but I am in very strong opposition  against this idea. Nobody
> did reply to my questions, that can change my opinion.

From your reply it's not quite clear, are you opposed to have a mandatory
VARIABLE syntax, or having variables in the FROM clause? My main proposal was
about the former, but the points that are following seems to talk about the
latter. I think it's fine to reject the idea about the FROM clause, as long as
you got some reasonable arguments.

> > Then dropping the column b, but everything still works beause the
> > variable b got silently picked up. But if it would be required to say
> > VARIABLE(b), then all fine.
>
> but same risk you have any time in plpgsql - all time. I don't remember any
> bug report related to this issue.

Which exactly scenario about plpgsql do you have in mind? Just have tried to
declare a variable inside a plpgsql function with the same name as a table
column, and got an error about an ambiguous reference.

> Theoretically, variables can have the same names as tables. The table
> overshadows the variable, so it can work. But when somebody drops the
> variable, then the query still can work. So requirement of usage variable
> in FROM clause protects us just against drop column, but not against
> dropping table. In Postgres the dropping table is possibly risky due
> search_path (that introduces shadowing concept) without introduction
> variables. There is a possibility of this issue, but how common is this
> issue?

This sounds to me like an argument against allowing name clashing between
variables and tables. It makes even more sense, since session variables are in
many ways similar to tables.

> I think this issue can be partially similar to creating two equally named
> tables in different schemas (both schemas are in search path). When you
> drop one table, the query will work, but the result is different. It is the
> same issue. The SQL has no concept of shadowing and on the base line it is
> not necessary.

The point is that most of users are aware about schemas and search path
dangers. But to me such a precedent is not an excuse to introduce a new feature
with similar traps, which folks would have to learn by making mistakes. Judging
from the feedback to this patch over time, I've got an impression that lots of
people are also not fans of that.

> > Then dropping the column b, but everything still works beause the
> > variable b got silently picked up. But if it would be required to say
> > VARIABLE(b), then all fine.
> >
>
> In this scenario you will get  a warning related to variable shadowing
> (before you drop a column).
>
> [...]
>
> What do you think about the following design?  I can implement a warning
> "variable_usage_guard" when the variable is accessed without using
> VARIABLE() syntax. We can discuss later if this warning can be enabled by
> default or not. There I am open to any variant.

I don't follow what are you winning by that? In the context of problem above
(i.e. dropping a column), such a warning is functionally equivalend to a
warning about shadowing.

The problem is that it doesn't sound very appealing to have a feature, which
requires some extra efforts to be used in a right way (e.g. put everything into
a special vars schema, or keep an eye on logs). Most certainly there are such
bits in PostgreSQL today, with all the best practices, crowd wisdom, etc. But
the bar for new features in this sense is much higher, you can see it from the
feedback to this patch. Thus I believe it makes sense, from purely tactical
reasons, to not try to convince half of the community to lower the bar, but
instead try to modify the feature to make it more acceptable, even if some
parts you might not like.

Btw, could you repeat, what was exactly your argument against mandatory
VARIABLE() syntax? It's somehow scattered across many replies, would be great
to summarize it in a couple of phrases.

> Shadowing by self is not an issue, probably, but it is a signal of code
> quality problems.

Agree, but I'm afraid code quality of an average application using PostgreSQL
is quite low, so here we are.

As a side note, I've recently caught myself thinking "it would be cool to have
session variables here". The use case was preparing a policy for RLS, based on
some session-level data set by an application. This session-level data is of a
composite data type, so simple current_setting is cumbersome to use, and a
temporary table will be dropped at the end, taking the policy with it due to
the recorded dependency between them. Thus a session variable of some composite
type sounds like a good fit.



Re: proposal: schema variables

From
Pavel Stehule
Date:
Hi


As a side note, I've recently caught myself thinking "it would be cool to have
session variables here". The use case was preparing a policy for RLS, based on
some session-level data set by an application. This session-level data is of a
composite data type, so simple current_setting is cumbersome to use, and a
temporary table will be dropped at the end, taking the policy with it due to
the recorded dependency between them. Thus a session variable of some composite
type sounds like a good fit.

yes, RLS support is one mentioned use case, and strong reason the access rights are implemented there.

Regards

Pavel

Re: proposal: schema variables

From
Wolfgang Walther
Date:
Dmitry Dolgov:
> This sounds to me like an argument against allowing name clashing between
> variables and tables. It makes even more sense, since session variables are in
> many ways similar to tables.

+1

My mental model of a session variable is similar to a single-row, 
optionally global temporary, table.

Is there any substantial difference that I am not aware of?

Best,

Wolfgang



Re: proposal: schema variables

From
Pavel Stehule
Date:


so 16. 11. 2024 v 15:56 odesílatel Wolfgang Walther <walther@technowledgy.de> napsal:
Dmitry Dolgov:
> This sounds to me like an argument against allowing name clashing between
> variables and tables. It makes even more sense, since session variables are in
> many ways similar to tables.

+1

It doesn't help too much, because the unique tuple (schema, name), and there is a search path.

Secondly, the pg_class is not good enough for description of scalar variables, and enhancing pg_class for scalar variables can be messy.

 

My mental model of a session variable is similar to a single-row,
optionally global temporary, table.

Is there any substantial difference that I am not aware of?

What I know, the variables are used as query parameters, not as relations - Oracle, DB2, MSSQL, MySQL, ...

semantically, yes - it is a global temporary object, but  it can be scalar or composite value - it is not row.

(global (temp)) table can hold 0, 1 or more rows (and rows are always composite of 0..n fields). The variable holds a value of some type. Proposed session variables are like plpgsql variables (only with different scope). In Postgres there is a difference between a scalar variable and composite variable with one field.

Regards

Pavel


Best,

Wolfgang

Re: proposal: schema variables

From
Wolfgang Walther
Date:
Pavel Stehule:
> (global (temp)) table can hold 0, 1 or more rows (and rows are always 
> composite of 0..n fields). The variable holds a value of some type. 
> Proposed session variables are like plpgsql variables (only with 
> different scope). In Postgres there is a difference between a scalar 
> variable and composite variable with one field. 

I can store composite values in table columns, too. A table column can 
either be scalar or composite in that sense.

So, maybe rephrase: Single-row, single-column (global (temp)) table = 
variable. One "cell" of that table.

For me, the major difference between a variable and a table is, that the 
table has 0...n rows and 0...m columns, while the variable has *exactly* 
one in both cases, not 0 either.

I must put tables into FROM, why not those nice mini-tables called 
variables as well? Because they are potentially scalar, you say!

But: I can already put functions returning scalar values into FROM:

   SELECT * FROM format('hello');

The function returns a plain string only.

I don't know. This just "fits" for me.

Or to put it differently: I don't really care whether I have to write 
"(SELECT variable FROM variable)" instead of just "variable". I don't 
want session variables for the syntax, I want session variables, because 
they are **so much better** than custom GUCs.

Best,

Wolfgang




Re: proposal: schema variables

From
Pavel Stehule
Date:


so 16. 11. 2024 v 18:13 odesílatel Wolfgang Walther <walther@technowledgy.de> napsal:
Pavel Stehule:
> (global (temp)) table can hold 0, 1 or more rows (and rows are always
> composite of 0..n fields). The variable holds a value of some type.
> Proposed session variables are like plpgsql variables (only with
> different scope). In Postgres there is a difference between a scalar
> variable and composite variable with one field.

I can store composite values in table columns, too. A table column can
either be scalar or composite in that sense.

So, maybe rephrase: Single-row, single-column (global (temp)) table =
variable. One "cell" of that table.

the tables are tables and variables are variables. For tables you have INSERT, UPDATE, DELETE commands, for variables you have a LET command.

and scalar is not a single column composite.

The session variables can in some particular use cases replace global temp tables, but this is not the goal. I would like to see global temp tables in Postgres too. Maybe session variables prepare a field for this, because some people better understand global temp objects. But again my proposal is not related to global temp tables. This is a different feature.
 

For me, the major difference between a variable and a table is, that the
table has 0...n rows and 0...m columns, while the variable has *exactly*
one in both cases, not 0 either.

I must put tables into FROM, why not those nice mini-tables called
variables as well? Because they are potentially scalar, you say!

But: I can already put functions returning scalar values into FROM:

   SELECT * FROM format('hello');

The function returns a plain string only.

I don't know. This just "fits" for me.

There are more issues - one - when you use some composite in FROM clause, then you expect an unpacked result. But there are a lot of uses, when unpackaging is not wanted. There is a syntax for this but it is really not intuitive and not well readable.
 

Or to put it differently: I don't really care whether I have to write
"(SELECT variable FROM variable)" instead of just "variable". I don't
want session variables for the syntax, I want session variables, because
they are **so much better** than custom GUCs.

session variables are better than GUC for the proposed purpose.  But it should look like variables. The software should respect standards or common typical usage when it is possible. If we introduce fully proprietary design, then it will be hell for all people who know any other databases. And I don't see a strong benefit from this syntax. It solves just one case, it doesn't solve other possible issues, and it introduces another possible risk. 

Regards

Pavel


Best,

Wolfgang

Re: proposal: schema variables

From
Pavel Stehule
Date:


so 16. 11. 2024 v 15:27 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Sat, Nov 16, 2024 at 07:10:31AM GMT, Pavel Stehule wrote:

Sorry, got distracted. Let me try to answer step by step.

> > As far as I recall, last time this topic was discussed in hackers, two
> > options were proposed: the one with VARIABLE(name), what you mention
> > here; and another one with adding variables to the FROM clause. The
> > VARIABLE(...) syntax didn't get much negative feedback, so I guess why
> > not -- if you find it fitting, it would be interesting to see the
> > implementation.
> >
> > I'm afraid it should not be just an alternative syntax, but the only one
> > allowed, because otherwise I don't see how scenarious like "drop a
> > column with the same name" could be avoided. As in the previous thread:
> >
> >     -- we've got a variable b at the same time
> >     SELECT a, b FROM table1;
> >
>
> I am sorry, but I am in very strong opposition  against this idea. Nobody
> did reply to my questions, that can change my opinion.

From your reply it's not quite clear, are you opposed to have a mandatory
VARIABLE syntax, or having variables in the FROM clause? My main proposal was
about the former, but the points that are following seems to talk about the
latter. I think it's fine to reject the idea about the FROM clause, as long as
you got some reasonable arguments.

I am against a requirement to specify a variable in the FROM clause.
 

> > Then dropping the column b, but everything still works beause the
> > variable b got silently picked up. But if it would be required to say
> > VARIABLE(b), then all fine.
>
> but same risk you have any time in plpgsql - all time. I don't remember any
> bug report related to this issue.

Which exactly scenario about plpgsql do you have in mind? Just have tried to
declare a variable inside a plpgsql function with the same name as a table
column, and got an error about an ambiguous reference.

Until you execute the query, you cannot know if there is a conflict or not. So you can change table structure and you can change the procedure's code, and there can be an invisible conflict until execution and query evaluation.  The conflict between PL/pgSQL and SQL raises an error. The conflict between session variables and SQL raises warnings. The issue is detected.

 

> Theoretically, variables can have the same names as tables. The table
> overshadows the variable, so it can work. But when somebody drops the
> variable, then the query still can work. So requirement of usage variable
> in FROM clause protects us just against drop column, but not against
> dropping table. In Postgres the dropping table is possibly risky due
> search_path (that introduces shadowing concept) without introduction
> variables. There is a possibility of this issue, but how common is this
> issue?

This sounds to me like an argument against allowing name clashing between
variables and tables. It makes even more sense, since session variables are in
many ways similar to tables.


It doesn't help too much. It can fix just one issue. But you can have tables with the same name in different schemas inside schemas from search_path. Unique table names solve nothing.
 
> I think this issue can be partially similar to creating two equally named
> tables in different schemas (both schemas are in search path). When you
> drop one table, the query will work, but the result is different. It is the
> same issue. The SQL has no concept of shadowing and on the base line it is
> not necessary.

The point is that most of users are aware about schemas and search path
dangers. But to me such a precedent is not an excuse to introduce a new feature
with similar traps, which folks would have to learn by making mistakes. Judging
from the feedback to this patch over time, I've got an impression that lots of
people are also not fans of that.

Unfortunately - I don't believe so there is some syntax without traps. You can check all implementations in other databases. These designs are very different, and all have some issues and all have some limits. It is native - you are trying to join the procedural and functional world.

I understand the risks. These risks are there. But there is no silver bullet - all proposed designs fixed just one case, and not others, and then I don't see a strong enough benefit to introduce design that is far from common usage. Maybe I have a different experience, because I am a man from the stored procedures area, and the risk of collisions is a known issue well solved by common conventions and in postgres by plpgsql.variable_conflict setting. The proposed patch set has very similar functionality. I think the introduction of VARIABLE(xx) syntax and safe syntax guard warning the usage of variables can be safe in how it is possible. But still I want to allow "short" "usual" usage to people who use a safe convention. There is no risk when you use a safe prefix or safe  schema.

 

> > Then dropping the column b, but everything still works beause the
> > variable b got silently picked up. But if it would be required to say
> > VARIABLE(b), then all fine.
> >
>
> In this scenario you will get  a warning related to variable shadowing
> (before you drop a column).
>
> [...]
>
> What do you think about the following design?  I can implement a warning
> "variable_usage_guard" when the variable is accessed without using
> VARIABLE() syntax. We can discuss later if this warning can be enabled by
> default or not. There I am open to any variant.

I don't follow what are you winning by that? In the context of problem above
(i.e. dropping a column), such a warning is functionally equivalend to a
warning about shadowing.

The problem is that it doesn't sound very appealing to have a feature, which
requires some extra efforts to be used in a right way (e.g. put everything into
a special vars schema, or keep an eye on logs). Most certainly there are such
bits in PostgreSQL today, with all the best practices, crowd wisdom, etc. But
the bar for new features in this sense is much higher, you can see it from the
feedback to this patch. Thus I believe it makes sense, from purely tactical
reasons, to not try to convince half of the community to lower the bar, but
instead try to modify the feature to make it more acceptable, even if some
parts you might not like.

Btw, could you repeat, what was exactly your argument against mandatory
VARIABLE() syntax? It's somehow scattered across many replies, would be great
to summarize it in a couple of phrases.

> Shadowing by self is not an issue, probably, but it is a signal of code
> quality problems.

Agree, but I'm afraid code quality of an average application using PostgreSQL
is quite low, so here we are.

As a side note, I've recently caught myself thinking "it would be cool to have
session variables here". The use case was preparing a policy for RLS, based on
some session-level data set by an application. This session-level data is of a
composite data type, so simple current_setting is cumbersome to use, and a
temporary table will be dropped at the end, taking the policy with it due to
the recorded dependency between them. Thus a session variable of some composite
type sounds like a good fit.

Re: proposal: schema variables

From
Pavel Stehule
Date:


so 16. 11. 2024 v 23:49 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:


so 16. 11. 2024 v 15:27 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Sat, Nov 16, 2024 at 07:10:31AM GMT, Pavel Stehule wrote:

Sorry, got distracted. Let me try to answer step by step.

> > As far as I recall, last time this topic was discussed in hackers, two
> > options were proposed: the one with VARIABLE(name), what you mention
> > here; and another one with adding variables to the FROM clause. The
> > VARIABLE(...) syntax didn't get much negative feedback, so I guess why
> > not -- if you find it fitting, it would be interesting to see the
> > implementation.
> >
> > I'm afraid it should not be just an alternative syntax, but the only one
> > allowed, because otherwise I don't see how scenarious like "drop a
> > column with the same name" could be avoided. As in the previous thread:
> >
> >     -- we've got a variable b at the same time
> >     SELECT a, b FROM table1;
> >
>
> I am sorry, but I am in very strong opposition  against this idea. Nobody
> did reply to my questions, that can change my opinion.

From your reply it's not quite clear, are you opposed to have a mandatory
VARIABLE syntax, or having variables in the FROM clause? My main proposal was
about the former, but the points that are following seems to talk about the
latter. I think it's fine to reject the idea about the FROM clause, as long as
you got some reasonable arguments.

I am against a requirement to specify a variable in the FROM clause.
 

> > Then dropping the column b, but everything still works beause the
> > variable b got silently picked up. But if it would be required to say
> > VARIABLE(b), then all fine.
>
> but same risk you have any time in plpgsql - all time. I don't remember any
> bug report related to this issue.

Which exactly scenario about plpgsql do you have in mind? Just have tried to
declare a variable inside a plpgsql function with the same name as a table
column, and got an error about an ambiguous reference.

Until you execute the query, you cannot know if there is a conflict or not. So you can change table structure and you can change the procedure's code, and there can be an invisible conflict until execution and query evaluation.  The conflict between PL/pgSQL and SQL raises an error. The conflict between session variables and SQL raises warnings. The issue is detected.

 

> Theoretically, variables can have the same names as tables. The table
> overshadows the variable, so it can work. But when somebody drops the
> variable, then the query still can work. So requirement of usage variable
> in FROM clause protects us just against drop column, but not against
> dropping table. In Postgres the dropping table is possibly risky due
> search_path (that introduces shadowing concept) without introduction
> variables. There is a possibility of this issue, but how common is this
> issue?

This sounds to me like an argument against allowing name clashing between
variables and tables. It makes even more sense, since session variables are in
many ways similar to tables.


It doesn't help too much. It can fix just one issue. But you can have tables with the same name in different schemas inside schemas from search_path. Unique table names solve nothing.

the combination of pg_class and pg_attribute cannot describe scalar variables (without hacks). Then you need to  enhance pg_class, which can be confusing. And on the second hand almost all columns in pg_class have no sense for variables. And when variables and tables are in different tables, you cannot ensure a unique name. Variables are similar to tables only in possibility to hold a value. That is all. But variables don't store data to file, don't store data in pages, don't allow usage of other storages or formats, and don't support foreign storage. The similarity between variables and tables is like the similarity between horses and cars. Both can help with moving.

 
> I think this issue can be partially similar to creating two equally named
> tables in different schemas (both schemas are in search path). When you
> drop one table, the query will work, but the result is different. It is the
> same issue. The SQL has no concept of shadowing and on the base line it is
> not necessary.

The point is that most of users are aware about schemas and search path
dangers. But to me such a precedent is not an excuse to introduce a new feature
with similar traps, which folks would have to learn by making mistakes. Judging
from the feedback to this patch over time, I've got an impression that lots of
people are also not fans of that.

Unfortunately - I don't believe so there is some syntax without traps. You can check all implementations in other databases. These designs are very different, and all have some issues and all have some limits. It is native - you are trying to join the procedural and functional world.

I understand the risks. These risks are there. But there is no silver bullet - all proposed designs fixed just one case, and not others, and then I don't see a strong enough benefit to introduce design that is far from common usage. Maybe I have a different experience, because I am a man from the stored procedures area, and the risk of collisions is a known issue well solved by common conventions and in postgres by plpgsql.variable_conflict setting. The proposed patch set has very similar functionality. I think the introduction of VARIABLE(xx) syntax and safe syntax guard warning the usage of variables can be safe in how it is possible. But still I want to allow "short" "usual" usage to people who use a safe convention. There is no risk when you use a safe prefix or safe  schema.

 

> > Then dropping the column b, but everything still works beause the
> > variable b got silently picked up. But if it would be required to say
> > VARIABLE(b), then all fine.
> >
>
> In this scenario you will get  a warning related to variable shadowing
> (before you drop a column).
>
> [...]
>
> What do you think about the following design?  I can implement a warning
> "variable_usage_guard" when the variable is accessed without using
> VARIABLE() syntax. We can discuss later if this warning can be enabled by
> default or not. There I am open to any variant.

I don't follow what are you winning by that? In the context of problem above
(i.e. dropping a column), such a warning is functionally equivalend to a
warning about shadowing.

The problem is that it doesn't sound very appealing to have a feature, which
requires some extra efforts to be used in a right way (e.g. put everything into
a special vars schema, or keep an eye on logs). Most certainly there are such
bits in PostgreSQL today, with all the best practices, crowd wisdom, etc. But
the bar for new features in this sense is much higher, you can see it from the
feedback to this patch. Thus I believe it makes sense, from purely tactical
reasons, to not try to convince half of the community to lower the bar, but
instead try to modify the feature to make it more acceptable, even if some
parts you might not like.

Btw, could you repeat, what was exactly your argument against mandatory
VARIABLE() syntax? It's somehow scattered across many replies, would be great
to summarize it in a couple of phrases.

> Shadowing by self is not an issue, probably, but it is a signal of code
> quality problems.

Agree, but I'm afraid code quality of an average application using PostgreSQL
is quite low, so here we are.

As a side note, I've recently caught myself thinking "it would be cool to have
session variables here". The use case was preparing a policy for RLS, based on
some session-level data set by an application. This session-level data is of a
composite data type, so simple current_setting is cumbersome to use, and a
temporary table will be dropped at the end, taking the policy with it due to
the recorded dependency between them. Thus a session variable of some composite
type sounds like a good fit.

Re: proposal: schema variables

From
Pavel Stehule
Date:


so 16. 11. 2024 v 23:07 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:


so 16. 11. 2024 v 18:13 odesílatel Wolfgang Walther <walther@technowledgy.de> napsal:
Pavel Stehule:
> (global (temp)) table can hold 0, 1 or more rows (and rows are always
> composite of 0..n fields). The variable holds a value of some type.
> Proposed session variables are like plpgsql variables (only with
> different scope). In Postgres there is a difference between a scalar
> variable and composite variable with one field.

I can store composite values in table columns, too. A table column can
either be scalar or composite in that sense.

So, maybe rephrase: Single-row, single-column (global (temp)) table =
variable. One "cell" of that table.

the tables are tables and variables are variables. For tables you have INSERT, UPDATE, DELETE commands, for variables you have a LET command.

and scalar is not a single column composite.

example

assignment to scalar versus single composite

LET a  = 10 
LET a.a = 10 or LET a = ROW(10)

Single column tables can be the result of some ALTERS - or sometimes you can use a table type. But for example, plpgsql, very strongly differs between scalar and composite variables. So introducing the "new" concept - single field composite is scalar introduces strong inconsistency there.

Regards

Pavel

 

The session variables can in some particular use cases replace global temp tables, but this is not the goal. I would like to see global temp tables in Postgres too. Maybe session variables prepare a field for this, because some people better understand global temp objects. But again my proposal is not related to global temp tables. This is a different feature.
 

For me, the major difference between a variable and a table is, that the
table has 0...n rows and 0...m columns, while the variable has *exactly*
one in both cases, not 0 either.

I must put tables into FROM, why not those nice mini-tables called
variables as well? Because they are potentially scalar, you say!

But: I can already put functions returning scalar values into FROM:

   SELECT * FROM format('hello');

The function returns a plain string only.

I don't know. This just "fits" for me.

There are more issues - one - when you use some composite in FROM clause, then you expect an unpacked result. But there are a lot of uses, when unpackaging is not wanted. There is a syntax for this but it is really not intuitive and not well readable.
 

Or to put it differently: I don't really care whether I have to write
"(SELECT variable FROM variable)" instead of just "variable". I don't
want session variables for the syntax, I want session variables, because
they are **so much better** than custom GUCs.

session variables are better than GUC for the proposed purpose.  But it should look like variables. The software should respect standards or common typical usage when it is possible. If we introduce fully proprietary design, then it will be hell for all people who know any other databases. And I don't see a strong benefit from this syntax. It solves just one case, it doesn't solve other possible issues, and it introduces another possible risk. 

Regards

Pavel


Best,

Wolfgang

Re: proposal: schema variables

From
Marcos Pegoraro
Date:
Em ter., 19 de nov. de 2024 às 16:15, Pavel Stehule <pavel.stehule@gmail.com> escreveu:
I wrote POC of VARIABLE(varname) syntax support

Not related with POC of VARIABLE but seeing your patches ...

Wouldn't it be better to use just one syntax and message for what to do ON COMMIT ?

When creating a new variable you use 
CREATE VARIABLE ... ON COMMIT DROP | ON TRANSACTION END RESET

On PSQL \dV+ you show 
Transactional end action

Maybe all them could be just ON COMMIT
CREATE VARIABLE ... [ON COMMIT {NO ACTION, DROP, RESET}] and \dV+ just "on commit" on title column

regards
Marcos

Re: proposal: schema variables

From
Pavel Stehule
Date:
Hi

st 20. 11. 2024 v 14:29 odesílatel Marcos Pegoraro <marcos@f10.com.br> napsal:
Em ter., 19 de nov. de 2024 às 16:15, Pavel Stehule <pavel.stehule@gmail.com> escreveu:
I wrote POC of VARIABLE(varname) syntax support

Not related with POC of VARIABLE but seeing your patches ...

Wouldn't it be better to use just one syntax and message for what to do ON COMMIT ?

When creating a new variable you use 
CREATE VARIABLE ... ON COMMIT DROP | ON TRANSACTION END RESET

On PSQL \dV+ you show 
Transactional end action

Maybe all them could be just ON COMMIT
CREATE VARIABLE ... [ON COMMIT {NO ACTION, DROP, RESET}] and \dV+ just "on commit" on title column

ON COMMIT DROP is related to temporary objects. In this case, you don't need to think about ROLLBACK, because in this case, the temp objects are removed implicitly.

ON TRANSACTION END RESET can be used for non temporary objects too. So this is a little bit of a different feature. But the reset is executed if the transaction is ended by ROLLBACK too. So using a syntax just ON COMMIT can be a little bit messy. TRANSACTION END is more intuitive, I think. If I remember there was a proposal ON COMMIT OR ROLLBACK, but I think TRANSACTION END is better and more intuitive, and better describes what is implemented. I can imagine to support clauses ON COMMIT RESET or ON ROLLBACK RESET that can be used independently, but for this time, I don't want to increase a complexity now - reset is just at transaction end without dependency if the transaction was committed or rollbacked.

Regards

Pavel


regards
Marcos

Re: proposal: schema variables

From
Marcos Pegoraro
Date:
Em qua., 20 de nov. de 2024 às 10:52, Pavel Stehule <pavel.stehule@gmail.com> escreveu:
COMMIT can be a little bit messy. TRANSACTION END is more intuitive, I think.

Exactly to be not messy I would just ON COMMIT for all, and DOCs can explain that this option is ignored for temp objects and do the same at the end of transaction, independently if commited or rolled back

Re: proposal: schema variables

From
Pavel Stehule
Date:


st 20. 11. 2024 v 15:15 odesílatel Marcos Pegoraro <marcos@f10.com.br> napsal:
Em qua., 20 de nov. de 2024 às 10:52, Pavel Stehule <pavel.stehule@gmail.com> escreveu:
COMMIT can be a little bit messy. TRANSACTION END is more intuitive, I think.

Exactly to be not messy I would just ON COMMIT for all, and DOCs can explain that this option is ignored for temp objects and do the same at the end of transaction, independently if commited or rolled back

I feel it differently - when I read ON COMMIT, then I expect really just a COMMIT event, not ROLLBACK.  Attention - the metadata about variables are transactional, but the variables are not transactional (by default).

But this feeling can be subjective. The objective argument against using ON COMMIT like ON TRANSACTION END is fact, so we lost a possibility for a more precious setting.

I can imagine scenarios with ON COMMIT RESET - and variable can hold some last value from transaction, or ON ROLLBACK RESET - and variable can hold some computed value from successfully ended transaction - last inserted id.

In this case, I don't see a problem to reopen a discussion about syntax or postpone this feature. I think the possibility to reset variables at some specified time can be an interesting feature (that can increase safety, because the application doesn't need to solve initial setup),  but from the list of implemented features for session variables, this is not too far from the end. If TRANSACTION END is not intuitive - with exactly the same functionality can be RESET AT TRANSACTION START - so the ON TRANSACTION END can be transformed to ON BEGIN RESET, and this syntax can be maybe better, because it signalize, for every transaction, the variable will be initialized to default. What do you think? Can be ON BEGIN RESET acceptable for you.

Regards

Pavel





Re: proposal: schema variables

From
Dmitry Dolgov
Date:
> On Tue, Nov 19, 2024 at 08:14:01PM +0100, Pavel Stehule wrote:
> Hi
>
> I wrote POC of VARIABLE(varname) syntax support

Thanks, the results look good. I'm still opposed the idea of having a
warning, and think it has to be an error -- but it's my subjective
opinion. Lets see if there are more votes on that topic.



Re: proposal: schema variables

From
Pavel Stehule
Date:


st 20. 11. 2024 v 21:14 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Tue, Nov 19, 2024 at 08:14:01PM +0100, Pavel Stehule wrote:
> Hi
>
> I wrote POC of VARIABLE(varname) syntax support

Thanks, the results look good. I'm still opposed the idea of having a
warning, and think it has to be an error -- but it's my subjective
opinion. Lets see if there are more votes on that topic.

The error breaks the possibility to use variables (session variables) like Oracle's package variables easily. It increases effort for transformation or porting because you should identify variables inside queries and you should wrap it to fence.  On the other hand, extensions that can read a query after transformation can easily detect unwrapped variables and they can raise an error. It can be very easy to implement this check to plpgsql_check for example or like plpgsql.extra_check.

In my ideal world, the shadowing warning should be enabled by default, and an unfencing warning disabled by default. But I have not a problem with activating both warnings by default. I think warnings  are enough, because if there is some risk then a shadowing warning is activated. And my experience is more positive about warnings, people read it.

Regards

Pavel


Re: proposal: schema variables

From
jian he
Date:
On Thu, Dec 5, 2024 at 2:52 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
> only rebase

hi.
disclaimer, i *only* applied
v20241205-0001-Enhancing-catalog-for-support-session-variables-and-.patch.

create variable v2 as text;
alter variable v2 rename to v2;
ERROR:  session variable "v2" already exists in schema "public"

the above is  coverage tests for report_namespace_conflict:
        case VariableRelationId:
            Assert(OidIsValid(nspOid));
            msgfmt = gettext_noop("session variable \"%s\" already
exists in schema \"%s\"");
            break;

create type t1 as (a int, b int);
CREATE VARIABLE var1 AS t1;
alter type t1 drop attribute a;
should "alter type t1 drop attribute a;" not allowed?


GetCommandLogLevel also needs to deal with  case T_CreateSessionVarStmt?


there are no regress tests for the change we made in
find_composite_type_dependencies?
It looks like it will be reachable for sure.


CreateVariable, print out error position:
    if (get_typtype(typid) == TYPTYPE_PSEUDO)
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("session variable cannot be pseudo-type %s",
                        format_type_be(typid)),
                 parser_errposition(pstate, stmt->typeName->location)));

Alvaro Herrera told me actually, you don't need the extra parentheses
around errcode.
so you can:
    if (get_typtype(typid) == TYPTYPE_PSEUDO)
        ereport(ERROR,
                errcode(ERRCODE_WRONG_OBJECT_TYPE),
                errmsg("session variable cannot be pseudo-type %s",
                        format_type_be(typid)),
                parser_errposition(pstate, stmt->typeName->location))


pg_variable_is_visible seems to have done twice the system cache call.
maybe follow through with the pg_table_is_visible, pg_type_is_visible
code pattern.


IN src/bin/psql/describe.c
+ appendPQExpBufferStr(&buf, "\nWHERE true\n");
this is not needed?
------------------------------------------------
some of the `foreach` can change to foreach_oid, foreach_node
see: https://git.postgresql.org/cgit/postgresql.git/commit/?id=14dd0f27d7cd56ffae9ecdbe324965073d01a9ff
------------------------------------------------
doc/src/sgml/ref/create_variable.sgml
<programlisting>
CREATE VARIABLE var1 AS date;
LET var1 = current_date;
SELECT var1;
</programlisting>

v20241205-0001-Enhancing-catalog-for-support-session-variables-and-.patch
alone cannot do `LET var1 = current_date;`, `SELECT var1;`
maybe the following patch can do it. but that makes
we cannot commit
v20241205-0001-Enhancing-catalog-for-support-session-variables-and-.patch
alone.
------------------------------------------------
since CREATE VARIABLE is an extension to standard, so in create_schema.sgml
Compatibility section,
do we need to mention CREATE SCHEMA CREATE VARIABLE is an extension
from standard
?
-----------------------------------------------

/*
 * Drop variable by OID, and register the needed session variable
 * cleanup.
 */
void
DropVariableById(Oid varid)
i am not sure of the meaning of "register the needed session variable cleanup".



Re: proposal: schema variables

From
jian he
Date:
On Mon, Dec 9, 2024 at 2:33 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
again. only applied
v20241208-0001-Enhancing-catalog-for-support-session-variables-and-.patch.

/* we want SessionVariableCreatePostprocess to see the catalog changes. */
0001 doesn't have SessionVariableCreatePostprocess,
so this comment is wrong in the context of 0001.

typo:
As above, but if the variable isn't found and is_mussing is not NULL
is_mussing should be is_missing.

----------------------------------------------
issue with  grant.sgml and revoke.sgml.

* there are no regress tests for WITH GRANT OPTION but it seems to work;
there are no REVOKE CASCADE tests. the following tests show
REVOKE CASCADE works.

create variable v1 as int;
GRANT select on VARIABLE v1 to alice with grant option;
set session authorization alice;
GRANT select on VARIABLE v1 to bob with grant option;
reset session authorization;

select varacl from pg_variable where varname  = 'v1';
--output
{jian=rw/jian,alice=r*/jian,bob=r*/alice}
revoke all privileges on variable v1 from alice cascade;
select varacl from pg_variable where varname  = 'v1';
--output
 {jian=rw/jian}

revoke GRANT OPTION FOR all privileges on variable v1 from alice cascade;
also works.

* in revoke.sgml and grant.sgml.
if you look closely, " | ALL VARIABLES IN SCHEMA schema_name [, ...] }" is wrong
because there is no left-curly-bracket, but there is a right-curly-bracket.

* in revoke.sgml.
<phrase>where <replaceable
class="parameter">role_specification</replaceable> can be:</phrase>
    [ GROUP ] <replaceable class="parameter">role_name</replaceable>
  | PUBLIC
  | CURRENT_ROLE
  | CURRENT_USER
  | SESSION_USER
should be at the end of the synopsis section?
otherwise it looks weird, maybe we can put the REVOKE VARIABLE code upper.
grant.sgml changes position looks fine to me.


*  <para>
   The <command>GRANT</command> command has two basic variants: one
   that grants privileges on a database object (table, column, view,
   foreign table, sequence, database, foreign-data wrapper, foreign server,
   function, procedure, procedural language, large object, configuration
   parameter, schema, tablespace, or type), and one that grants
   membership in a role.  These variants are similar in many ways, but
   they are different enough to be described separately.
  </para>
this <para> in grant.sgml needs to also mention "variable"?

*   <para>
    Privileges on databases, tablespaces, schemas, languages, and
    configuration parameters are
    <productname>PostgreSQL</productname> extensions.
   </para>
this <para> in grant.sgml needs to also mention "variable"?

----------------------------------------------
*
+   <para>
+    Inside a query or an expression, a session variable can be
+    <quote>shadowed</quote> by a column with the same name.  Similarly, the
+    name of a function or procedure argument or a PL/pgSQL variable (see
PL/pgSQL should decorated as <application>PL/pgSQL</application>

* we already support \dV and \dV+ in
v20241208-0001-Enhancing-catalog-for-support-session-variables-and-.patch
so we should update doc/src/sgml/ref/psql-ref.sgml also.
I briefly searched \dV in
v20241208-0002-Storage-for-session-variables-and-SQL-interface.patch,
no entry.


* in doc/src/sgml/ddl.sgml
<table id="privilege-abbrevs-table"> and <table
id="privileges-summary-table"> need to change for variable?
<varlistentry id="ddl-priv-select">, <varlistentry
id="ddl-priv-update"> need to change for variable?

it's kind of tricky. if we only apply
v20241208-0001-Enhancing-catalog-for-support-session-variables-and-.patch
we can not  SELECT or  UPDATE variable.
so how are we going to mention these privileges for variable?


 * we can add some tests for EVENT TRIGGER test,
since event trigger support CREATE|DROP variable. atually, I think
there is a bug.

create variable v1 as int;
CREATE OR REPLACE FUNCTION event_trigger_report_dropped()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
DECLARE r record;
BEGIN
FOR r IN SELECT * from pg_event_trigger_dropped_objects()
LOOP
IF NOT r.normal AND NOT r.original THEN
    CONTINUE;
END IF;
RAISE NOTICE 'NORMAL: orig=% normal=% istemp=% type=% identity=% name=% args=%',
    r.original, r.normal, r.is_temporary, r.object_type,
    r.object_identity, r.address_names, r.address_args;
END LOOP;
END; $$;

CREATE EVENT TRIGGER regress_event_trigger_report_dropped ON sql_drop
WHEN TAG IN ('DROP VARIABLE')
EXECUTE PROCEDURE event_trigger_report_dropped();

--output:
src9=# drop variable v1;
NOTICE:  test_event_trigger: ddl_command_start DROP VARIABLE
NOTICE:  NORMAL: orig=t normal=f istemp=f type=session variable
identity=public.v1 name={public,$} args={}
DROP VARIABLE

should i expect
NOTICE:  NORMAL: orig=t normal=f istemp=f type=session variable
identity=public.v1 name={public,$} args={}
to be
NOTICE:  NORMAL: orig=t normal=f istemp=f type=session variable
identity=public.v1 name={public,v1} args={}
?



Re: proposal: schema variables

From
Pavel Stehule
Date:
Hi

st 20. 11. 2024 v 21:14 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Tue, Nov 19, 2024 at 08:14:01PM +0100, Pavel Stehule wrote:
> Hi
>
> I wrote POC of VARIABLE(varname) syntax support

Thanks, the results look good. I'm still opposed the idea of having a
warning, and think it has to be an error -- but it's my subjective
opinion. Lets see if there are more votes on that topic.

Maybe the warning of usage of unfenced variables can be changed (enhanced) to some different mechanism that can be more restrictive (and safer), but I think it can still be user friendly.

My idea is based on assumption so users with knowledge of stored procedures know  variables and related risks (and know tools how to minimize risks), and for other people the risk is higher and we should force usage of variable fences. I think we can force usage of variable fences at query runtime, when query is not executed from the SPI environment. This behaviour can be enabled by default, but can be optionally disabled. 

CREATE VARIABLE s.x AS int; -- allowed when user has create right on schema s
CREATE VIEW v1 AS SELECT x; -- no problem, the check is dynamic (execution), not static
CREATE VIEW v2 AS SELECT VARIABLE(x); -- no problem

SELECT x; -- fails on access to unfenced variable
SELECT * FROM v1; -- fails on access to unfenced variable
SELECT * FROM v2; -- ok

but inside pl, this check will not be active, and then with default setting I can write an code like

LET var = 10; -- fencing is not allowed there, and there is not any collision
DO $$
BEGIN
  RAISE NOTICE 'var=%', var;
  RAISE NOTICE 'var=%', (SELECT * FROM v1); --is ok here too
END;
$$;

Outside PL the fencing can be required, inside PL the fencing can be optional. With this proposal we can limit the possible risk usage of unfenced variables only in PL context, and the behaviour can be very similar to PL/SQL or SQL/PSM. This check is only a runtime check, so it has no impact on any DDL statement. It doesn't change the syntax or behavior, so it can be implemented subsequently - it is just a safeguard against unwanted usage of variables in an environment, where users have no possibility to use variables already. I can imagine that this check "allow_unfenced_variables" can have three values (everywhere, pl, nowhere) and the default can be pl. The results of queries don't depend on the current setting of this check. For all values for all possible queries and situations, the result is the same (when it is finished). But sometimes, the check can break the execution - in similar meaning like access rights. All previous proposed warnings can be unchanged.

Comments, notes?

Regards

Pavel



 

Re: proposal: schema variables

From
jian he
Date:
hi.

GRANT|REVOKE  ALL VARIABLES IN SCHEMA schema_name [, ...] }
seems to work.
might be better to add tests.

also src/bin/psql/tab-complete.in.c
COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_grantables,
we can also add "ALL VARIABLES IN SCHEMA "

also need change this <para> in grant.sgml:
  <para>
   There is also an option to grant privileges on all objects of the same
   type within one or more schemas.  This functionality is currently supported
   only for tables, sequences, functions, and procedures.  <literal>ALL
   TABLES</literal> also affects views and foreign tables, just like the
   specific-object <command>GRANT</command> command.  <literal>ALL
   FUNCTIONS</literal> also affects aggregate and window functions, but not
   procedures, again just like the specific-object <command>GRANT</command>
   command.  Use <literal>ALL ROUTINES</literal> to include procedures.
  </para>

revoke.sgml, we should use <replaceable
class="parameter">role_specification</replaceable>?
so it will become like:

REVOKE [ GRANT OPTION FOR ]
    { { SELECT | UPDATE } [, ...] | ALL [ PRIVILEGES ] }
    ON { VARIABLE variable_name [, ...]
       | ALL VARIABLES IN SCHEMA schema_name [, ...] }
    FROM role_specification [, ...]

maybe also add
[ GRANTED BY role_specification ]
but I didn't test "REVOKE [ GRANTED BY role_specification ]".

Speaking of acl tests,
similar to has_table_privilege I am afraid we need to have a function
like has_variable_privilege for acl tests.
has_table_privilege has 6 function signatures. so there will be more code.
------------------------------------------------------
doc/src/sgml/ref/create_variable.sgml
<synopsis> section:
CREATE VARIABLE [ IF NOT EXISTS ] name [ AS ] data_type ] [ COLLATE collation ]

redundant right square bracket after "data_type".



Re: proposal: schema variables

From
jian he
Date:
hi.

/*
 * has_session_variable_privilege variants
 *        These are all named "has_session_variable_privilege" at the SQL level.
 *        They take various combinations of variable name, variable OID,
 *        user name, user OID, or implicit user = current_user.
 *
 *        The result is a boolean value: true if user has the indicated
 *        privilege, false if not.  The variants that take a relation OID
 *        return NULL if the OID doesn't exist.
 */
/*
 * has_session_variable_privilege_name_name
 *        Check user privileges on a session variable given
 *        name username, text sessin variable name, and text priv name.
 */
"The variants that take a relation OID return NULL if the OID doesn't exist."
should it be
"The variants that take an OID type return NULL if the OID doesn't exist."
?

typo, "sessin" should be "session".
----------------<<<>>>>-------------------
  <sect1 id="ddl-session-variables">
   <title>Session Variables</title>
only mentioned that "Session variables themselves are persistent, but their
values are neither persistent nor shared (like the content of temporary tables).
"
I feel like this sentence is not that explicit. we actually want to say
"Once a session exits, the variable value is reset to NULL, one
session cannot see another session variable value."

+    <para>
+     A persistent database object that holds a value in session memory.  This
+     value is private to each session and is released when the session ends.
+     Read or write access to session variables is controlled by privileges,
+     similar to other database objects.
+    </para>
i do like this description in glossary.sgml.
maybe we can copy it and put it to ddl.sgml "<sect1 id="ddl-session-variables">
----------------<<<>>>>-------------------
REVOKE [ GRANT OPTION FOR ]
    { { SELECT | UPDATE } [, ...] | ALL [ PRIVILEGES ] }
    ON { VARIABLE <replaceable>variable_name</replaceable> [, ...]
       | ALL VARIABLES IN SCHEMA <replaceable
class="parameter">schema_name</replaceable> [, ...] }
    FROM { [ GROUP ] <replaceable
class="parameter">role_specification</replaceable> | PUBLIC } [, ...]
    [ GRANTED BY <replaceable
class="parameter">role_specification</replaceable> ]
    [ CASCADE | RESTRICT ]
revoke, seems still not right.
since with this, we can say:
REVOKE ALL PRIVILEGES ON VARIABLE v1 FROM group group alice CASCADE;

i think the correct one should be:
REVOKE [ GRANT OPTION FOR ]
    { { SELECT | UPDATE } [, ...] | ALL [ PRIVILEGES ] }
    ON { VARIABLE <replaceable>variable_name</replaceable> [, ...]
       | ALL VARIABLES IN SCHEMA <replaceable
class="parameter">schema_name</replaceable> [, ...] }
    FROM <replaceable class="parameter">role_specification</replaceable> [, ...]
    [ GRANTED BY <replaceable
class="parameter">role_specification</replaceable> ]
    [ CASCADE | RESTRICT ]

----------------<<<>>>>-------------------
<programlisting>
CREATE VARIABLE public.current_user_id AS integer;
GRANT READ ON VARIABLE public.current_user_id TO PUBLIC;
LET current_user_id = (SELECT id FROM users WHERE usename = session_user);
SELECT current_user_id;
</programlisting>
"GRANT READ" should be "GRANT SELECT".
----------------<<<>>>>-------------------
doc/src/sgml/ref/alter_default_privileges.sgml
GRANT { SELECT | UPDATE | ALL [ PRIVILEGES ] }
    ON VARIABLES
    TO { [ GROUP ] <replaceable
class="parameter">role_name</replaceable> | PUBLIC } [, ...] [ WITH
GRANT OPTION ]
the above part is wrong?
should be:
GRANT { { SELECT | UPDATE } [,...]
    | ALL [ PRIVILEGES ] }
    ON VARIABLES
    TO { [ GROUP ] <replaceable
class="parameter">role_name</replaceable> | PUBLIC } [, ...] [ WITH
GRANT OPTION ]

since we can:
ALTER DEFAULT PRIVILEGES
FOR ROLE alice
IN SCHEMA svartest
GRANT SELECT, UPDATE ON VARIABLES TO bob;
----------------<<<>>>>-----------------------------
CREATE VARIABLE IF NOT EXISTS v2 AS comp;
grant update on variable v2 to alice;
set role alice;
LET v2.a  = 12; --acl permission error
LET v2.b = 12; --acl permission error
LET v2 = (11,12); --ok.

not sure this is the desired behavior, for composite type variables, you are
allowed to change all the values, but you are not allowed to update the field
value of the composite.  The following are normal table test update cases.

create type comp as (a int, b  int);
create table t2(a comp);
insert into t2 select '(11,12)';
grant update (a ) on t2 to alice;
set role alice;
update t2 set a.a = 13; --ok
update t2 set a.b = 13; --ok
update t2 set a = '(11,13)'; --ok
----------------<<<>>>>-----------------------------
domain seems to have an issue.

CREATE domain d1 AS int;
CREATE VARIABLE var1 AS d1;
let var1 = 3;
--this should fail?.
alter domain d1 add check (value <> 3);
select var1;
ERROR:  value for domain d1 violates check constraint "d1_check"
----------------<<<>>>>-----------------------------
doc/src/sgml/ref/alter_variable.sgml
<title>Parameters</title> section, the order should
be: name, new_owner, new_name, new_schema?

I am beginning to look around 0002.



Re: Re: proposal: schema variables

From
jian he
Date:
hi.
review is based on
v20241219-0002-Storage-for-session-variables-and-SQL-interface.patch
and
v20241219-0001-Enhancing-catalog-for-support-session-variables-and-.patch.

in doc/src/sgml/catalogs.sgml

<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>defaclobjtype</structfield> <type>char</type>
</para>
<para>
Type of object this entry is for:
<literal>r</literal> = relation (table, view),
<literal>S</literal> = sequence,
<literal>f</literal> = function,
<literal>T</literal> = type,
<literal>n</literal> = schema
</para></entry>
</row>
this need updated for session variable?

psql meta command \ddp
src/bin/psql/describe.c listDefaultACLs
also need to change.
----------------<<>>-------------------------
+-- show variable
+SELECT public.svar;
+SELECT public.svar.c;
+SELECT (public.svar).*;
+
+-- the variable is shadowed, raise error
+SELECT public.svar.c FROM public.svar;
+
+-- can be fixed by alias
+SELECT public.svar.c FROM public.svar x

The above tests are duplicated in session_variables.sql.
----------------<<>>-------------------------
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -49,7 +49,7 @@ typedef struct PlannedStmt

     NodeTag        type;

-    CmdType        commandType;    /*
select|insert|update|delete|merge|utility */
+    CmdType        commandType;    /*
select|let|insert|update|delete|merge|utility */

since we don't have CMD_LET CmdType.
so this comment change is not necessary?
----------------<<>>-------------------------
the following are simple tests that triggers makeParamSessionVariable error
messages. we don't have related tests, we can add it on session_variables.sql.

create type t1 as (a int, b int);
CREATE VARIABLE v1 text;
CREATE VARIABLE v2 as t1;
select v1.a;
select v2.c;

also
select v2.c;
ERROR:  could not identify column "c" in variable
LINE 1: select v2.c;
               ^
the error message is no good. i think we want:
ERROR:  could not identify column "c" in  variable "public.v1"
----------------<<>>-------------------------
+    /*
+     * Check for duplicates. Note that this does not really prevent
+     * duplicates, it's here just to provide nicer error message in common
+     * case. The real protection is the unique key on the catalog.
+     */
+    if (SearchSysCacheExists2(VARIABLENAMENSP,
+                              PointerGetDatum(varName),
+                              ObjectIdGetDatum(varNamespace)))
+    {
+    }
I am confused by these comments. i think the SearchSysCacheExists2
do actually prevent duplicates.
I noticed that publication_add_relation also has similar comments.
----------------<<>>-------------------------
typedef struct LetStmt
{
    NodeTag        type;
    List       *target;            /* target variable */
    Node       *query;            /* source expression */
    int            location;
} LetStmt;
here, location should be a type of ParseLoc.
----------------<<>>-------------------------
in 0001, 0002, function SetSessionVariableWithSecurityCheck
never being used.
----------------<<>>-------------------------
+/*
+ * transformLetStmt -
+ *      transform an Let Statement
+ */
+static Query *
+transformLetStmt(ParseState *pstate, LetStmt *stmt)
...
+    /*
+     * Save target session variable ID.  This value is used multiple times: by
+     * the query rewriter (for getting related defexpr), by planner (for
+     * acquiring an AccessShareLock on target variable), and by the executor
+     * (we need to know the target variable ID).
+     */
+    query->resultVariable = varid;

"defexpr", do you mean "default expression"?
Generally letsmt is like: "let variable = (SelectStmt)"
is there nothing related to the DEFAULT expression?
"(we need to know the target variable ID)." in ExecuteLetStmt that is true.
but I commented out the following lines, the regress test still
succeeded.

extract_query_dependencies_walker
/* record dependency on the target variable of a LET command */
// if (OidIsValid(query->resultVariable))
//     record_plan_variable_dependency(context, query->resultVariable);

ScanQueryForLocks
// /* process session variables */
// if (OidIsValid(parsetree->resultVariable))
// {
//     if (acquire)
//         LockDatabaseObject(VariableRelationId, parsetree->resultVariable,
//                            0, AccessShareLock);
//     else
//         UnlockDatabaseObject(VariableRelationId, parsetree->resultVariable,
//                              0, AccessShareLock);
// }
----------------<<>>-------------------------
in transformLetStmt, we already did ACL privilege check,
we don't need do it again at ExecuteLetStmt?