Thread: [HACKERS] merging some features from plpgsql2 project
Some points are well and can be benefit for PlpgSQL.
First I describe my initial position. I am strongly against introduction "new" language - plpgsql2 or new plpgsql, or any else. The trust of developers to us is important and introduction of any not compatible or different feature has to have really big reason. PostgreSQL is conservative environment, and PLpgSQL should not be a exception. More - I have not any information from my customers, colleagues about missing features in this language. If there is some gaps, then it is in outer environment - IDE, deployment, testing,* SELECT .. INTO vs. TOO_MANY_ROWS - can be implemented as extra check
* SELECT .. INTO and the number of result columns - good extra check too
* EXECUTE and FOUND - this is incompatible change, extra check can be used (test on unset variable). I see solution in leaving FOUND variable and introduction of some new without this issue - ROW_COUNT maybe (this is another possible incompatible change, but with higher benefit - maybe we can introduce some aliasing, PRAGMA clause, default PRAGMAs, ..).
* SELECT .. INTO and := - looks bizarre, but I see clean benefit and I can accept it
* The OUT namespace and OUT parameter visibility - I don't like it - not in this form - can we introduce some form of namespace aliasing? The arguments are in function name named namespace already.
2016-12-27 8:54 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Some points are well and can be benefit for PlpgSQL.First I describe my initial position. I am strongly against introduction "new" language - plpgsql2 or new plpgsql, or any else. The trust of developers to us is important and introduction of any not compatible or different feature has to have really big reason. PostgreSQL is conservative environment, and PLpgSQL should not be a exception. More - I have not any information from my customers, colleagues about missing features in this language. If there is some gaps, then it is in outer environment - IDE, deployment, testing,
I forgot - the big plpgsql issue are too weak expressions on left part of assignment statements.
Pavel
On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Hi > > I reread ideas described on page https://github.com/trustly/plpgsql2 > > Some points are well and can be benefit for PlpgSQL. > > First I describe my initial position. I am strongly against introduction > "new" language - plpgsql2 or new plpgsql, or any else. The trust of > developers to us is important and introduction of any not compatible or > different feature has to have really big reason. PostgreSQL is conservative > environment, and PLpgSQL should not be a exception. More - I have not any > information from my customers, colleagues about missing features in this > language. If there is some gaps, then it is in outer environment - IDE, > deployment, testing, Breaking language compatibility is a really big deal. There has to be a lot of benefits to the effort and you have to make translation from plpgsql1 to plpgsql2 really simple. You have made some good points on the rationale but not nearly enough to justify implementation fork. So basically I agree. Having said that, If you don't mind I'd like to run with the topic (which I'm loosely interpreting as, "Things I'd like to do in SQL/PLPGSQL and can't"). #1 problem with plpgsql in my point of view is that the language and grammar are not supersets of sql. A lot of PLPGSQL keywords (EXECUTE, BEGIN, INTO, END) have incompatible meanings with our SQL implementation. IMNSHO, SQL ought to give the same behavior inside or outside of plpgsql. It doesn't, and this is one of the reasons why plpgsql may not be a good candidate for stored procedure implementation. #2 problem with plpgsql is after function entry it's too late to do things like set transaction isolation level and change certain kinds of variables (like statement_timeout). This is very obnoxious, I can't wrap the database in an API 100%; the application has to manage things that really should be controlled in SQL. #3 problem with plpgsql is complete lack of inlining. inlining function calls in postgres is a black art even for very trivial cases. This makes it hard for us to write quick things and in the worst case causes endless duplications of simple expressions. In short I guess the issue is that we don't have stored procedures and I don't see an easy path to getting there with the current language. There are a lot of other little annoyances but most of them can be solved without a compatibility break. It would be pretty neat if postgres SQL implementation could directly incorporate limited flow control and command execution. For example, CREATE my_proc(Done OUT BOOL) RETURNS BOOL AS $$ BEGIN; SET transaction_isolation = 'serializable'; SELECT some_plpgsql_func_returning_bool(); COMMIT; $$; CALL my_proc() UNTIL Done; Key points here are: *) my_proc is in native SQL (not plpgsql), and run outside of snapshot *) CALL is invocation into stored procedure. I extended it in similar fashion as pl/sql CALL (https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_4008.htm) but anything will do for syntaxs as long as you get arbitrary control of procedure lifetime external to snapshot and transaction *) simple addition of UNTIL gets us out of the debate for best 'stored procedure language'. Keeping things to pure SQL really simplifies things since we already have statement parsing at tcop level. We just need some special handling for CALL. *) In my usage of plpgsql maybe 80% of database cases are covered purely in language but maybe 20% of cases need support from application typically where threading and transaction management is involved. With the above it would be more like 95% would be covered and if you extended CALL to something like: CALL my_proc() IN BACKGROUND UNTIL Done; ..where "IN BACKGOUND" moved execution to a background worker one could do just about everything in SQL in tasks that do nothing but read and write to the database that today need significant support from outside language (primarily bash for me). With respect to stuff you mentioned, like smarter handling of INTO, are you really sure you need to break compatibility for that? merlin
2016-12-27 23:56 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>:
On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
> I reread ideas described on page https://github.com/trustly/plpgsql2
>
> Some points are well and can be benefit for PlpgSQL.
>
> First I describe my initial position. I am strongly against introduction
> "new" language - plpgsql2 or new plpgsql, or any else. The trust of
> developers to us is important and introduction of any not compatible or
> different feature has to have really big reason. PostgreSQL is conservative
> environment, and PLpgSQL should not be a exception. More - I have not any
> information from my customers, colleagues about missing features in this
> language. If there is some gaps, then it is in outer environment - IDE,
> deployment, testing,
Breaking language compatibility is a really big deal. There has to be
a lot of benefits to the effort and you have to make translation from
plpgsql1 to plpgsql2 really simple. You have made some good points on
the rationale but not nearly enough to justify implementation fork. So
basically I agree. Having said that, If you don't mind I'd like to
run with the topic (which I'm loosely interpreting as, "Things I'd
like to do in SQL/PLPGSQL and can't").
#1 problem with plpgsql in my point of view is that the language and
grammar are not supersets of sql. A lot of PLPGSQL keywords (EXECUTE,
BEGIN, INTO, END) have incompatible meanings with our SQL
implementation. IMNSHO, SQL ought to give the same behavior inside or
outside of plpgsql. It doesn't, and this is one of the reasons why
plpgsql may not be a good candidate for stored procedure
implementation.
There is little bit cleaner language for this purpose - SQL/PSM. But it is hard to switch main language without big lost of reputation. I am not sure about benefit.
#2 problem with plpgsql is after function entry it's too late to do
things like set transaction isolation level and change certain kinds
of variables (like statement_timeout). This is very obnoxious, I
can't wrap the database in an API 100%; the application has to manage
things that really should be controlled in SQL.
It is long story about implementation procedures - it is not related to PLpgSQL - the language is not a issue.
#3 problem with plpgsql is complete lack of inlining. inlining
function calls in postgres is a black art even for very trivial cases.
This makes it hard for us to write quick things and in the worst case
causes endless duplications of simple expressions.
In short I guess the issue is that we don't have stored procedures and
I don't see an easy path to getting there with the current language.
There are a lot of other little annoyances but most of them can be
solved without a compatibility break.
I don't think so implementation of procedures will be simple, but I don't see any issue in PLpgSQL.
It would be pretty neat if postgres SQL implementation could directly
incorporate limited flow control and command execution. For example,
CREATE my_proc(Done OUT BOOL) RETURNS BOOL AS
$$
BEGIN;
SET transaction_isolation = 'serializable';
SELECT some_plpgsql_func_returning_bool();
COMMIT;
$$;
CALL my_proc() UNTIL Done;
Key points here are:
*) my_proc is in native SQL (not plpgsql), and run outside of snapshot
*) CALL is invocation into stored procedure. I extended it in similar
fashion as pl/sql CALL
(https://docs.oracle.com/cd/B19306_01/server.102/b14200/ statements_4008.htm)
but anything will do for syntaxs as long as you get arbitrary control
of procedure lifetime external to snapshot and transaction
*) simple addition of UNTIL gets us out of the debate for best 'stored
procedure language'. Keeping things to pure SQL really simplifies
things since we already have statement parsing at tcop level. We just
need some special handling for CALL.
*) In my usage of plpgsql maybe 80% of database cases are covered
purely in language but maybe 20% of cases need support from
application typically where threading and transaction management is
involved. With the above it would be more like 95% would be covered
and if you extended CALL to something like:
It is similar to my older proposals of stored procedures.
CALL my_proc() IN BACKGROUND UNTIL Done;
..where "IN BACKGOUND" moved execution to a background worker one
could do just about everything in SQL in tasks that do nothing but
read and write to the database that today need significant support
from outside language (primarily bash for me).
With respect to stuff you mentioned, like smarter handling of INTO,
are you really sure you need to break compatibility for that?
I didn't propose any compatibility break.
Can we talk about another proposals separately, please. Stored procedures, batch processing, different language are different topic.
Regards
Pavel
merlin
On 12/27/16 4:56 PM, Merlin Moncure wrote: > On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> First I describe my initial position. I am strongly against introduction >> "new" language - plpgsql2 or new plpgsql, or any else. The trust of >> developers to us is important and introduction of any not compatible or >> different feature has to have really big reason. PostgreSQL is conservative >> environment, and PLpgSQL should not be a exception. More - I have not any Which is why this is an external fork of plpgsql. ** The real problem is that we have no mechanism for allowing a PL's language/syntax/API to move forward without massive backwards compatibility problems. ** This is NOT unique to plpgsql. plpython (for one) definitely has some stupidity that will require an API break to fix. A secondary issue is the lack of a blessed collection of extensions. If we had that we could maintain some of this stuff outside of the core release schedule, as well as provide more room for people to run experimental versions of extensions if they desired. If we had this then perhaps plpgsql_check would become a viable answer to some of this (though IMHO plpgsql_check is just a work-around for our lack of dealing with API compatibility). >> information from my customers, colleagues about missing features in this >> language. If there is some gaps, then it is in outer environment - IDE, >> deployment, testing, I'm honestly surprised (even shocked) that you've never run into any of the problems plpgsql2 is trying to solve. I've hit all those problems except for OUT parameters. I'd say the order they're listed in actually corresponds to how often I hit the problems. > Breaking language compatibility is a really big deal. There has to be > a lot of benefits to the effort and you have to make translation from > plpgsql1 to plpgsql2 really simple. You have made some good points on I think trying to move the ball forward in a meaningful way without breaking compatibility is a lost cause. Some of these issues could be addressed by adding more syntax, but even that has limits (do we really want another variation of STRICT that allows only 0 or 1 rows?). And there's no way to fix your #1 item below without breaking compatibility. There *are* other ways this could be done, besides creating a different PL. One immediate possibility is custom GUCs; there may be other options. > #1 problem with plpgsql in my point of view is that the language and > grammar are not supersets of sql. A lot of PLPGSQL keywords (EXECUTE, > BEGIN, INTO, END) have incompatible meanings with our SQL > implementation. IMNSHO, SQL ought to give the same behavior inside or > outside of plpgsql. It doesn't, and this is one of the reasons why > plpgsql may not be a good candidate for stored procedure > implementation. While this doesn't bug me, it's got to be confusing as hell for newbies. > #2 problem with plpgsql is after function entry it's too late to do > things like set transaction isolation level and change certain kinds > of variables (like statement_timeout). This is very obnoxious, I > can't wrap the database in an API 100%; the application has to manage > things that really should be controlled in SQL. +1 > #3 problem with plpgsql is complete lack of inlining. inlining > function calls in postgres is a black art even for very trivial cases. > This makes it hard for us to write quick things and in the worst case > causes endless duplications of simple expressions. Instead of banging our heads against the fmgr API to try and solve this, I suspect it would be much simpler (and easier to understand) if we had the equivalent to a #define for queries. The fmgr API just isn't amenable to trying to inline stuff. This would allow you to define things like views that accept arguments, so you can shove the argument way down in the guts of the query without getting tripped up by fences. Here's some other plpgsql pain-points (though, not all of these require an API break): #4: it's impossible to operate on a Datum in-place. Though, maybe the work Tom did with ExpandedObjects eliminates some of this problem, but if it does it's hidden behind the existing syntax and you have no way to know it (and AFAICT the only thing using that infrastructure right now is arrays). Aside from the performance aspects, it'd be damn nice to be able to do things like ++, +=, etc. #5: handling of parameter name collisions still sucks. One way to improve this would be to put parameters inside the outer-most statement block, so you could use a block name that was different from the function name. Something else that might help is the ability to assign a namespace for query identifiers, so you don't have to alias every individual relation in a query. #6: The variations of syntax between the FOR variants is annoying (specifically, FOREACH necessitating the ARRAY keyword). #7: = vs := vs INTO. = can do everything the others can do except for STRICT, and when it comes to STRICT I actually wish we had language support for whether 0 or >1 rows are allowed. I've wanted that in the past for views, and if we had that then you'd be able to use it in SQL functions as well. If that's not possible then we should fid some other way to handle this in plpgsql, because STRICT is often too broad. #8: EVERYTHING command option should accept a variable. In particular, RAISE should accept a variable for level, but there's other cases of this I've run into. I'd also be nice if you could plop variables into SQL commands where you'd have an identifier, though presumably that would require some kind of explicit variable identifier. #9: You should be able to modify an exception before re-raising it. #10: Exception information should be passed around as a composite. #11: Composite support is very lacking. There's no easy way to get a list of what fields exist in a composite, let alone do something generic to some set of them. There are ways to work around this, but they're very tedious and ugly. #12: It'd be nice if any was allowed, as there are operations that can apply to more than one class of data type. #13: cstring support would allow a lot more people to experiment with things like custom types. Yes, plpgsql might be slow as hell for this, but sometimes that doesn't matter. Even if it does, it can be a lot easier to prototype in something other than C. (Granted, I think there's some non-plpgsql stuff that would need to happen to allow this.) > In short I guess the issue is that we don't have stored procedures and > I don't see an easy path to getting there with the current language. > There are a lot of other little annoyances but most of them can be > solved without a compatibility break. Well, actual stored *procedures* is an entirely different problem, which (again) fmgr is absolutely not designed to handle. All the PL handlers that I've looked at have completely in-grained the notion that they're running inside a transaction, so it would be a lot of work to try and change that. While there may be some plpgsql-specific problems with it supporting stored procs, there are much bigger questions to answer before worrying about that. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
2016-12-28 5:09 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
+1
On 12/27/16 4:56 PM, Merlin Moncure wrote:On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:First I describe my initial position. I am strongly against introduction
"new" language - plpgsql2 or new plpgsql, or any else. The trust of
developers to us is important and introduction of any not compatible or
different feature has to have really big reason. PostgreSQL is conservative
environment, and PLpgSQL should not be a exception. More - I have not any
Which is why this is an external fork of plpgsql.
ok. Just I would not to repeat Perl6 or Python3 story - it is big adventure, but big fail too
** The real problem is that we have no mechanism for allowing a PL's language/syntax/API to move forward without massive backwards compatibility problems. **
We have not, but there are few possibilities:
1. enhance #option command
2. we can introduce PRAGMA command https://en.wikipedia.org/wiki/Ada_(programming_language)#Pragmas
This is NOT unique to plpgsql. plpython (for one) definitely has some stupidity that will require an API break to fix.
A secondary issue is the lack of a blessed collection of extensions. If we had that we could maintain some of this stuff outside of the core release schedule, as well as provide more room for people to run experimental versions of extensions if they desired. If we had this then perhaps plpgsql_check would become a viable answer to some of this (though IMHO plpgsql_check is just a work-around for our lack of dealing with API compatibility).
plpgsql_check can do some test, that are impossible in plpgsql - from performance view, from features. But some "blessed collections of extension" can be nice. More if will be joined with some automatic test and build tools. Although lot of extensions are really mature, the knowleadge about these extensions are minimal - and building extensions on windows is hard work still (for Linux developer).
information from my customers, colleagues about missing features in this
language. If there is some gaps, then it is in outer environment - IDE,
deployment, testing,
I'm honestly surprised (even shocked) that you've never run into any of the problems plpgsql2 is trying to solve. I've hit all those problems except for OUT parameters. I'd say the order they're listed in actually corresponds to how often I hit the problems.
I hit lot of older harder (now solved) issues - now, with more experience I am able to see these issues. And I wrote plpgsql_check, partially for self too. Years ago I prefer safe expressions.
Breaking language compatibility is a really big deal. There has to be
a lot of benefits to the effort and you have to make translation from
plpgsql1 to plpgsql2 really simple. You have made some good points on
I think trying to move the ball forward in a meaningful way without breaking compatibility is a lost cause. Some of these issues could be addressed by adding more syntax, but even that has limits (do we really want another variation of STRICT that allows only 0 or 1 rows?). And there's no way to fix your #1 item below without breaking compatibility.
I think so there is way with extra check, or with persistent plpgsql options - just use it, please. Some checks are clear, some other not.
There *are* other ways this could be done, besides creating a different PL. One immediate possibility is custom GUCs; there may be other options.#1 problem with plpgsql in my point of view is that the language and
grammar are not supersets of sql. A lot of PLPGSQL keywords (EXECUTE,
BEGIN, INTO, END) have incompatible meanings with our SQL
implementation. IMNSHO, SQL ought to give the same behavior inside or
outside of plpgsql. It doesn't, and this is one of the reasons why
plpgsql may not be a good candidate for stored procedure
implementation.
While this doesn't bug me, it's got to be confusing as hell for newbies.
If you know ALGOL family languages, then it is not problem. What is a harder problem for people is different implementation of mix SQL and PL - different than Oracle, or MSSQL. Our model is better, simpler but different. It is difficult for people without knowleadge of differences between functions and procedures. Partially we badly speaking so our void functions are procedures.
#2 problem with plpgsql is after function entry it's too late to do
things like set transaction isolation level and change certain kinds
of variables (like statement_timeout). This is very obnoxious, I
can't wrap the database in an API 100%; the application has to manage
things that really should be controlled in SQL.
+1#3 problem with plpgsql is complete lack of inlining. inlining
function calls in postgres is a black art even for very trivial cases.
This makes it hard for us to write quick things and in the worst case
causes endless duplications of simple expressions.
Instead of banging our heads against the fmgr API to try and solve this, I suspect it would be much simpler (and easier to understand) if we had the equivalent to a #define for queries. The fmgr API just isn't amenable to trying to inline stuff. This would allow you to define things like views that accept arguments, so you can shove the argument way down in the guts of the query without getting tripped up by fences.
Here's some other plpgsql pain-points (though, not all of these require an API break):
#4: it's impossible to operate on a Datum in-place. Though, maybe the work Tom did with ExpandedObjects eliminates some of this problem, but if it does it's hidden behind the existing syntax and you have no way to know it (and AFAICT the only thing using that infrastructure right now is arrays). Aside from the performance aspects, it'd be damn nice to be able to do things like ++, +=, etc.
#5: handling of parameter name collisions still sucks. One way to improve this would be to put parameters inside the outer-most statement block, so you could use a block name that was different from the function name. Something else that might help is the ability to assign a namespace for query identifiers, so you don't have to alias every individual relation in a query.
#6: The variations of syntax between the FOR variants is annoying (specifically, FOREACH necessitating the ARRAY keyword).
this is design - FOR is old PL/SQL syntax. FOREACH is prepared for extending
#7: = vs := vs INTO. = can do everything the others can do except for STRICT, and when it comes to STRICT I actually wish we had language support for whether 0 or >1 rows are allowed. I've wanted that in the past for views, and if we had that then you'd be able to use it in SQL functions as well. If that's not possible then we should fid some other way to handle this in plpgsql, because STRICT is often too broad.
#8: EVERYTHING command option should accept a variable. In particular, RAISE should accept a variable for level, but there's other cases of this I've run into. I'd also be nice if you could plop variables into SQL commands where you'd have an identifier, though presumably that would require some kind of explicit variable identifier.
It is hiding dynamic SQL - I am strongly against it - minimally due performance issues. Important functionality should not be hidden.
#9: You should be able to modify an exception before re-raising it.
#10: Exception information should be passed around as a composite
I have not any problem with last two points
.
#11: Composite support is very lacking. There's no easy way to get a list of what fields exist in a composite, let alone do something generic to some set of them. There are ways to work around this, but they're very tedious and ugly.
+1
#12: It'd be nice if any was allowed, as there are operations that can apply to more than one class of data type.
#13: cstring support would allow a lot more people to experiment with things like custom types. Yes, plpgsql might be slow as hell for this, but sometimes that doesn't matter. Even if it does, it can be a lot easier to prototype in something other than C. (Granted, I think there's some non-plpgsql stuff that would need to happen to allow this.)
Not sure about it (I have really realy wrong experience with some developers about performance) - but PLPython, PLPerl can do it well, and I miss some possibility - We can use transformations more time - SQL/MM is based on new datatypes and transformations.
In short I guess the issue is that we don't have stored procedures and
I don't see an easy path to getting there with the current language.
There are a lot of other little annoyances but most of them can be
solved without a compatibility break.
Well, actual stored *procedures* is an entirely different problem, which (again) fmgr is absolutely not designed to handle. All the PL handlers that I've looked at have completely in-grained the notion that they're running inside a transaction, so it would be a lot of work to try and change that. While there may be some plpgsql-specific problems with it supporting stored procs, there are much bigger questions to answer before worrying about that.
yes. The design of transaction controlling inside stored procedures is hard work not related to any PL. Some can be partially solved by functions executed in autonomous transactions. With background workers we can implement asynchronous autonomous transactions - what can coverage lot of use cases where transaction controlling should be used in other databases.
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
On 12/28/16 7:16 AM, Pavel Stehule wrote: > ** The real problem is that we have no mechanism for allowing a PL's > language/syntax/API to move forward without massive backwards > compatibility problems. ** > > > We have not, but there are few possibilities: > > 1. enhance #option command > 2. we can introduce PRAGMA command > https://en.wikipedia.org/wiki/Ada_(programming_language)#Pragmas > <https://en.wikipedia.org/wiki/Ada_%28programming_language%29#Pragmas> I wanted to break this out separately, because IMO it's the real heart of the matter. I think it would be silly not to allow a global setting of compatibility. You certainly don't want to force people to stick magic keywords in their code forevermore. To that end, would GUCs be a workable answer here? That should give you the ability to control incompatibilities at a function, user, database and global level. It would also allow you to chose between raising a WARNING vs a FATAL. I realize we've had some bad experiences with compatibility GUCs in the past, but I'd argue we've also had some good experiences. I see that add_missing_from is now completely gone, for example, presumably with no complaints. There's probably several other compatibility GUCs we could remove now. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On 12/28/16 7:16 AM, Pavel Stehule wrote: > > > 2016-12-28 5:09 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com > <mailto:Jim.Nasby@bluetreble.com>>: > > On 12/27/16 4:56 PM, Merlin Moncure wrote: > > On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule > <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>> wrote: > Which is why this is an external fork of plpgsql. > > > ok. Just I would not to repeat Perl6 or Python3 story - it is big > adventure, but big fail too Yeah, creating an entirely "new" PL to deal with compatibility doesn't seem like a good idea to me. > ** The real problem is that we have no mechanism for allowing a PL's > language/syntax/API to move forward without massive backwards > compatibility problems. ** > > > We have not, but there are few possibilities: > > 1. enhance #option command > 2. we can introduce PRAGMA command > https://en.wikipedia.org/wiki/Ada_(programming_language)#Pragmas See separate reply. <discussion about "blessed extensions"> > I'm honestly surprised (even shocked) that you've never run into any > of the problems plpgsql2 is trying to solve. I've hit all those > problems except for OUT parameters. I'd say the order they're listed > in actually corresponds to how often I hit the problems. > > > I hit lot of older harder (now solved) issues - now, with more > experience I am able to see these issues. And I wrote plpgsql_check, > partially for self too. Years ago I prefer safe expressions. Recognizing a problem ahead of time (or having plpgsql_check do it for you) still means you have to find a way to work around it. In some cases (ie: STRICT), that workaround can be a serious PITA. Better to just eliminate the problem itself. > I think trying to move the ball forward in a meaningful way without > breaking compatibility is a lost cause. Some of these issues could > be addressed by adding more syntax, but even that has limits (do we > really want another variation of STRICT that allows only 0 or 1 > rows?). And there's no way to fix your #1 item below without > breaking compatibility. > > > I think so there is way with extra check, or with persistent plpgsql > options - just use it, please. Some checks are clear, some other not. I will assert that there will ALWAYS be problems that you can't plaster over with some kind of extra checking (like plpgsql_check). At some point, in order to fix those, you have to somehow break compatibility. Look at libpq as an example. There's a reason we're on protocol V3. > If you know ALGOL family languages, then it is not problem. What is a Lets be realistic... what % of our users have even heard of ALGOL, let alone used it? :) > harder problem for people is different implementation of mix SQL and PL > - different than Oracle, or MSSQL. Our model is better, simpler but > different. It is difficult for people without knowleadge of differences > between functions and procedures. Partially we badly speaking so our > void functions are procedures. I suspect that's only confusing for people coming from Oracle (which of course is a non-trivial number of people). > #6: The variations of syntax between the FOR variants is annoying > (specifically, FOREACH necessitating the ARRAY keyword). > > > this is design - FOR is old PL/SQL syntax. FOREACH is prepared for > extending Understood. It still sucks though. :) > #8: EVERYTHING command option should accept a variable. In > particular, RAISE should accept a variable for level, but there's > other cases of this I've run into. I'd also be nice if you could > plop variables into SQL commands where you'd have an identifier, > though presumably that would require some kind of explicit variable > identifier. > > > It is hiding dynamic SQL - I am strongly against it - minimally due > performance issues. Important functionality should not be hidden. There's definitely ways around the performance issue. I do agree that it needs to be clear when you're doing something dynamic so it's not accidental. One way to do that would be to add support for variable decorators and mandate the use of decorators when using a variable for an identifier. That said, *every* option to RAISE can be a variable except the level. That's just plain silly and should be fixed. > #13: cstring support would allow a lot more people to experiment > with things like custom types. Yes, plpgsql might be slow as hell > for this, but sometimes that doesn't matter. Even if it does, it can > be a lot easier to prototype in something other than C. (Granted, I > think there's some non-plpgsql stuff that would need to happen to > allow this.) > > > Not sure about it (I have really realy wrong experience with some > developers about performance) - but PLPython, PLPerl can do it well, and > I miss some possibility - We can use transformations more time - SQL/MM > is based on new datatypes and transformations. Well, there's probably some other things that could be done to make plpgsql perform better in this regard. One thing I've wondered about is allowing array-like access to a plain string (or in this case, cstring). That would allow you to write code that would translate much better into fast C code. IE: if you needed to scan through an entire string you could do something like for (i=0; i<strlen(); i++). > yes. The design of transaction controlling inside stored procedures is > hard work not related to any PL. Some can be partially solved by > functions executed in autonomous transactions. With background workers > we can implement asynchronous autonomous transactions - what can > coverage lot of use cases where transaction controlling should be used > in other databases. Well, those are all really hacks around a fundamental problem of allowing user-defined, pre-stored code to execute outside of a transaction. I don't think solving that is necessarily *that* hard (though certainly non-trivial), but the fmgr interface is certainly NOT the right way to go about that. In any case, that's a completely different discussion. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
2016-12-28 18:54 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 12/28/16 7:16 AM, Pavel Stehule wrote:** The real problem is that we have no mechanism for allowing a PL's
language/syntax/API to move forward without massive backwards
compatibility problems. **
We have not, but there are few possibilities:
1. enhance #option command
2. we can introduce PRAGMA command
https://en.wikipedia.org/wiki/Ada_(programming_language)#Pra gmas
<https://en.wikipedia.org/wiki/Ada_%28programming_language% 29#Pragmas>
I wanted to break this out separately, because IMO it's the real heart of the matter.
I think it would be silly not to allow a global setting of compatibility. You certainly don't want to force people to stick magic keywords in their code forevermore.
To that end, would GUCs be a workable answer here? That should give you the ability to control incompatibilities at a function, user, database and global level. It would also allow you to chose between raising a WARNING vs a FATAL.
GUC are fragile - the source code and settings can be separated.
Our #option is more robust, because source code holds all flags required for execution. So I would to see a mechanism, that will be strongly joined with code.
Using function assigned GUC is similar, but it is looking less robust - and some editors can forgot this information.
Lot of issues we can solved by plpgsq.extra_error, extra_warnings - but probably not all - for example issue of FOUND variable or introducing new auto variable ROW_COUNT. PLpgSQL - PL/SQL is safe - it propose the statement GET DIAGNOSTICS, but I understand so isn't funny to write more and more GET DIAGNOSTICS rc = ROW_COUNT; So some shortcuts can be nice, but there is risk, so this shortcut breaks existing code, and the costs/benefits are individual. There cannot be 100% agreement ever. So some customisation should be good.
I realize we've had some bad experiences with compatibility GUCs in the past, but I'd argue we've also had some good experiences. I see that add_missing_from is now completely gone, for example, presumably with no complaints. There's probably several other compatibility GUCs we could remove now.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
On 12/28/16 12:15 PM, Pavel Stehule wrote: > GUC are fragile - the source code and settings can be separated. *Can* be, but they don't *have* to be. That's a huge feature, not a bug. > Our #option is more robust, because source code holds all flags required > for execution. So I would to see a mechanism, that will be strongly > joined with code. That means you must ALWAYS specify, which is an enormous pain. It basically guarantees that users will NEVER switch to the new syntax. > Using function assigned GUC is similar, but it is looking less robust - > and some editors can forgot this information. If you forget then you get an error. Then you remember. > Lot of issues we can solved by plpgsq.extra_error, extra_warnings - but > probably not all - for example issue of FOUND variable or introducing > new auto variable ROW_COUNT. PLpgSQL - PL/SQL is safe - it propose the > statement GET DIAGNOSTICS, but I understand so isn't funny to write more > and more GET DIAGNOSTICS rc = ROW_COUNT; So some shortcuts can be nice, > but there is risk, so this shortcut breaks existing code, and the > costs/benefits are individual. There cannot be 100% agreement ever. So > some customisation should be good. That's the whole point of having settings to deal with incompatibilities: so we can actually fix these warts without breaking everyone's code, yet also make it clear to users that they should stop using the warts and instead use the new and improved syntax. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
2016-12-28 19:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 12/28/16 12:15 PM, Pavel Stehule wrote:GUC are fragile - the source code and settings can be separated.
*Can* be, but they don't *have* to be. That's a huge feature, not a bug.Our #option is more robust, because source code holds all flags required
for execution. So I would to see a mechanism, that will be strongly
joined with code.
That means you must ALWAYS specify, which is an enormous pain. It basically guarantees that users will NEVER switch to the new syntax.Using function assigned GUC is similar, but it is looking less robust -
and some editors can forgot this information.
If you forget then you get an error. Then you remember.Lot of issues we can solved by plpgsq.extra_error, extra_warnings - but
probably not all - for example issue of FOUND variable or introducing
new auto variable ROW_COUNT. PLpgSQL - PL/SQL is safe - it propose the
statement GET DIAGNOSTICS, but I understand so isn't funny to write more
and more GET DIAGNOSTICS rc = ROW_COUNT; So some shortcuts can be nice,
but there is risk, so this shortcut breaks existing code, and the
costs/benefits are individual. There cannot be 100% agreement ever. So
some customisation should be good.
That's the whole point of having settings to deal with incompatibilities: so we can actually fix these warts without breaking everyone's code, yet also make it clear to users that they should stop using the warts and instead use the new and improved syntax.
Now, the incompatibility can be hard issue - it is big question if we lock some users on old versions because some users can save to lines of code. Introduction of ROW_COUNT is lowly incompatibility - it can be simply detected - but for example change of behave of FOUND variable is terrible, because the code will be quietly calculate differently. sometimes we can break code - probably people will not be happy, but sometimes we can change the results - it can be big fail. So on one side is big costs. On second side is few lines less code.
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
2016-12-28 18:54 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 12/28/16 7:16 AM, Pavel Stehule wrote:
2016-12-28 5:09 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com
<mailto:Jim.Nasby@bluetreble.com>>:
On 12/27/16 4:56 PM, Merlin Moncure wrote:
On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule
<pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>> wrote:
Which is why this is an external fork of plpgsql.
ok. Just I would not to repeat Perl6 or Python3 story - it is big
adventure, but big fail too
Yeah, creating an entirely "new" PL to deal with compatibility doesn't seem like a good idea to me.** The real problem is that we have no mechanism for allowing a PL's
language/syntax/API to move forward without massive backwards
compatibility problems. **
We have not, but there are few possibilities:
1. enhance #option command
2. we can introduce PRAGMA command
https://en.wikipedia.org/wiki/Ada_(programming_language)#Pra gmas
See separate reply.
<discussion about "blessed extensions">I'm honestly surprised (even shocked) that you've never run into any
of the problems plpgsql2 is trying to solve. I've hit all those
problems except for OUT parameters. I'd say the order they're listed
in actually corresponds to how often I hit the problems.
I hit lot of older harder (now solved) issues - now, with more
experience I am able to see these issues. And I wrote plpgsql_check,
partially for self too. Years ago I prefer safe expressions.
Recognizing a problem ahead of time (or having plpgsql_check do it for you) still means you have to find a way to work around it. In some cases (ie: STRICT), that workaround can be a serious PITA. Better to just eliminate the problem itself.I think trying to move the ball forward in a meaningful way without
breaking compatibility is a lost cause. Some of these issues could
be addressed by adding more syntax, but even that has limits (do we
really want another variation of STRICT that allows only 0 or 1
rows?). And there's no way to fix your #1 item below without
breaking compatibility.
I think so there is way with extra check, or with persistent plpgsql
options - just use it, please. Some checks are clear, some other not.
I will assert that there will ALWAYS be problems that you can't plaster over with some kind of extra checking (like plpgsql_check). At some point, in order to fix those, you have to somehow break compatibility.
Look at libpq as an example. There's a reason we're on protocol V3.If you know ALGOL family languages, then it is not problem. What is a
Lets be realistic... what % of our users have even heard of ALGOL, let alone used it? :)
not too much - but the problem is not in BEGIN, END. I wrote PL/PSM where BEGIN END doesn't exists. The functionality was same as PLpgSQL - and there was not anybody who use it.
harder problem for people is different implementation of mix SQL and PL
- different than Oracle, or MSSQL. Our model is better, simpler but
different. It is difficult for people without knowleadge of differences
between functions and procedures. Partially we badly speaking so our
void functions are procedures.
I suspect that's only confusing for people coming from Oracle (which of course is a non-trivial number of people).#6: The variations of syntax between the FOR variants is annoying
(specifically, FOREACH necessitating the ARRAY keyword).
this is design - FOR is old PL/SQL syntax. FOREACH is prepared for
extending
Understood. It still sucks though. :)#8: EVERYTHING command option should accept a variable. In
particular, RAISE should accept a variable for level, but there's
other cases of this I've run into. I'd also be nice if you could
plop variables into SQL commands where you'd have an identifier,
though presumably that would require some kind of explicit variable
identifier.
It is hiding dynamic SQL - I am strongly against it - minimally due
performance issues. Important functionality should not be hidden.
There's definitely ways around the performance issue. I do agree that it needs to be clear when you're doing something dynamic so it's not accidental. One way to do that would be to add support for variable decorators and mandate the use of decorators when using a variable for an identifier.
That said, *every* option to RAISE can be a variable except the level. That's just plain silly and should be fixed.
I am sorry - I read it wrong - If there is not a parser issue, then it can be fixed simply.
#13: cstring support would allow a lot more people to experiment
with things like custom types. Yes, plpgsql might be slow as hell
for this, but sometimes that doesn't matter. Even if it does, it can
be a lot easier to prototype in something other than C. (Granted, I
think there's some non-plpgsql stuff that would need to happen to
allow this.)
Not sure about it (I have really realy wrong experience with some
developers about performance) - but PLPython, PLPerl can do it well, and
I miss some possibility - We can use transformations more time - SQL/MM
is based on new datatypes and transformations.
Well, there's probably some other things that could be done to make plpgsql perform better in this regard. One thing I've wondered about is allowing array-like access to a plain string (or in this case, cstring). That would allow you to write code that would translate much better into fast C code. IE: if you needed to scan through an entire string you could do something like for (i=0; i<strlen(); i++).
It is fixable - and faster string operations can be nice feature - Just I am not volunteer for this task :)
yes. The design of transaction controlling inside stored procedures is
hard work not related to any PL. Some can be partially solved by
functions executed in autonomous transactions. With background workers
we can implement asynchronous autonomous transactions - what can
coverage lot of use cases where transaction controlling should be used
in other databases.
Well, those are all really hacks around a fundamental problem of allowing user-defined, pre-stored code to execute outside of a transaction. I don't think solving that is necessarily *that* hard (though certainly non-trivial), but the fmgr interface is certainly NOT the right way to go about that. In any case, that's a completely different discussion.
This points needs real use cases. Our current design is limited, but it isn't bad - It is much better than Oracle or MSSQL design - we can talk what use cases we can solve. I am sure, so we can find good mapping for PLpgSQL for any design.
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
On 12/28/16 12:51 PM, Pavel Stehule wrote: > Now, the incompatibility can be hard issue - it is big question if we > lock some users on old versions because some users can save to lines of > code. Introduction of ROW_COUNT is lowly incompatibility - it can be > simply detected - but for example change of behave of FOUND variable is > terrible, because the code will be quietly calculate differently. > sometimes we can break code - probably people will not be happy, but > sometimes we can change the results - it can be big fail. So on one side > is big costs. On second side is few lines less code. That's my whole point of why this needs to be settable at a global level: so that people with a lot of legacy code can set the OLD behavior at a global level, and deal with the old code over time. If there's no global setting then there are only two choices: we default to new behavior and force everyone to add a bunch of stuff to *every* function they have (loads of complaints), or we default to old behavior and no one bothers to even adopt the new usage because they have to add extra stuff to every function. Either way is a failure. This is why I think there MUST be some way to control this at a higher level than per function. Certainly GUCs aren't the only option, we could invent something else. One feature I could see being useful is being able to set a default on a schema level, which isn't currently possible with a GUC. But I can certainly see database and global settings being useful, and perhaps per-user as well. GUCs already have those. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
2016-12-28 20:25 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 12/28/16 12:51 PM, Pavel Stehule wrote:Now, the incompatibility can be hard issue - it is big question if we
lock some users on old versions because some users can save to lines of
code. Introduction of ROW_COUNT is lowly incompatibility - it can be
simply detected - but for example change of behave of FOUND variable is
terrible, because the code will be quietly calculate differently.
sometimes we can break code - probably people will not be happy, but
sometimes we can change the results - it can be big fail. So on one side
is big costs. On second side is few lines less code.
That's my whole point of why this needs to be settable at a global level: so that people with a lot of legacy code can set the OLD behavior at a global level, and deal with the old code over time.
If there's no global setting then there are only two choices: we default to new behavior and force everyone to add a bunch of stuff to *every* function they have (loads of complaints), or we default to old behavior and no one bothers to even adopt the new usage because they have to add extra stuff to every function. Either way is a failure. This is why I think there MUST be some way to control this at a higher level than per function.
we can have both - plpgsql.variable_conflict can be precedent.
Certainly GUCs aren't the only option, we could invent something else. One feature I could see being useful is being able to set a default on a schema level, which isn't currently possible with a GUC. But I can certainly see database and global settings being useful, and perhaps per-user as well. GUCs already have those.
yes, without GUC you cannot set the behave of plpgsql globally.
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
Hi
I wrote some initial patch
Do you think so has sense to continue in this topic?
Regards
Pavel
Attachment
On 1/1/17 12:17 PM, Pavel Stehule wrote: > I wrote some initial patch > > Do you think so has sense to continue in this topic? Perhaps I'm not understanding what plpgsql_extra_errors does, but I don't think either of these should depend on that being true. IMO these two checks should be default to throwing an exception. I think instead of tying these to extra_*, each GUC should accept a LOG level. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
2017-01-02 18:36 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/1/17 12:17 PM, Pavel Stehule wrote:I wrote some initial patch
Do you think so has sense to continue in this topic?
Perhaps I'm not understanding what plpgsql_extra_errors does, but I don't think either of these should depend on that being true. IMO these two checks should be default to throwing an exception.
There are use cases where these patters should be used and has sense like
SELECT (polymorphiccomposite).* INTO c1, c2; -- take first two columns
SELECT xx FROM tab ORDER BY yy INTO target -- more rows not a issue
I understand plpgsql_extra_errors as feature that can be enabled on developer, test, or preprod environments and can help to identify some strange places.
I think instead of tying these to extra_*, each GUC should accept a LOG level.
Why? Why the none, warning, error are not enough? Why are you think so separate GUC can be better than plpgsql_extra_* ?
The fast setting plpgsql.extra_errors = 'all' can switch to some "safe" configuration.
The fast setting plpgsql.extra_warnings = 'all' can helps with identification, but doesn't break production (or doesn't breaks other tests)
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
On 1/2/17 12:06 PM, Pavel Stehule wrote: > SELECT (polymorphiccomposite).* INTO c1, c2; -- take first two columns > > SELECT xx FROM tab ORDER BY yy INTO target -- more rows not a issue > > I understand plpgsql_extra_errors as feature that can be enabled on > developer, test, or preprod environments and can help to identify some > strange places. Yes, but the two cases you mentioned above are the "strange" cases, and you should have to do something extra to allow those, not the other way around. > I think instead of tying these to extra_*, each GUC should accept a > LOG level. > > > Why? Why the none, warning, error are not enough? Why are you think so > separate GUC can be better than plpgsql_extra_* ? > > The fast setting plpgsql.extra_errors = 'all' can switch to some "safe" > configuration. > The fast setting plpgsql.extra_warnings = 'all' can helps with > identification, but doesn't break production (or doesn't breaks other tests) I see two problems with those settings: 1) Neither is enabled by default, so 90% of users have no idea they exist. Obviously that's an easy enough fix, but... 2) There's no way to incrementally change those values for a single function. If you've set extra_errors = 'all' globally, a single function can't say "turn off the too many rows setting for this function". BTW, while I can see value in being able to change these settings for an entire function, I think the recommended use should be to only change them for a specific statement. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
2017-01-02 20:16 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/2/17 12:06 PM, Pavel Stehule wrote:SELECT (polymorphiccomposite).* INTO c1, c2; -- take first two columns
SELECT xx FROM tab ORDER BY yy INTO target -- more rows not a issue
I understand plpgsql_extra_errors as feature that can be enabled on
developer, test, or preprod environments and can help to identify some
strange places.
Yes, but the two cases you mentioned above are the "strange" cases, and you should have to do something extra to allow those, not the other way around.
The second example is really strange. But the first example is used in composite types conversion - when you convert from base to extend type. This routine is used in plpgsql when you use a assignment statement
composite_var := another_composite_var
I think instead of tying these to extra_*, each GUC should accept a
LOG level.
Why? Why the none, warning, error are not enough? Why are you think so
separate GUC can be better than plpgsql_extra_* ?
The fast setting plpgsql.extra_errors = 'all' can switch to some "safe"
configuration.
The fast setting plpgsql.extra_warnings = 'all' can helps with
identification, but doesn't break production (or doesn't breaks other tests)
I see two problems with those settings:
1) Neither is enabled by default, so 90% of users have no idea they exist. Obviously that's an easy enough fix, but...
We can strongly talk about it - there can be a chapter in plpgsql doc. Now, the patterns and antipatterns are not officially documented.
2) There's no way to incrementally change those values for a single function. If you've set extra_errors = 'all' globally, a single function can't say "turn off the too many rows setting for this function".
We can enhance the GUC syntax like "all -too_many_rows,-xxx"
BTW, while I can see value in being able to change these settings for an entire function, I think the recommended use should be to only change them for a specific statement.
What you can do in plain assign statement
target := expression ?
My border is any compatibility break - and I would not to across it. First issue is probably harder
related to typo "select 1 x into c1,c2" and it can be detected by plpgsql_check.
Second issue is not a performance issue today (we read only 2 rows everytime) and it is hard how often it returns wrong result. This issue cannot be detected by plpgsql_check now.
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
On Tue, Dec 27, 2016 at 10:09 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > On 12/27/16 4:56 PM, Merlin Moncure wrote: >> >> On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule <pavel.stehule@gmail.com> >> wrote: >>> >>> First I describe my initial position. I am strongly against introduction >>> "new" language - plpgsql2 or new plpgsql, or any else. The trust of >>> developers to us is important and introduction of any not compatible or >>> different feature has to have really big reason. PostgreSQL is >>> conservative >>> environment, and PLpgSQL should not be a exception. More - I have not any > > > Which is why this is an external fork of plpgsql. > > ** The real problem is that we have no mechanism for allowing a PL's > language/syntax/API to move forward without massive backwards compatibility > problems. ** Just got back from break :-). Have some thoughts on this. Backwards compatibility is really a fundamental problem. There's really no solution to it other than to try and avoid using syntax to solve problems. It should be obvious to everyone that plgsql cannot withstand a compatibility break. Another language could be offered as an alternative in core (say, pl/psm or pl/v8), but pl/pgsql has to support old code. Some really out there features could maybe be redacted (in particular, using = for assignment), but not not much. But I guess we're stuck with the status quo. I think we ought to avoid language features that influence the behavior (performance is ok) of the code (and that includes throwing errors). That's a freight train headed towards javscript's 'strict' mode, which is thinly disguised language fork. #option and pragma type syntaxes are trying to cheat the language -- hardly anyone uses them and it's a tricky way to try and make the language into something other than it is. C does it right -- dubious code is raised as warnings and it's up to the end user to determine which warnings are interesting and likely to be an error. So, rather than hacking the language to control throwing and errors and such there should be some ability validate the function heavily and verify suspicious use of INTO or other dubious things (unused variables, masked assignments, etc). The validation output could even be a set returning function. So -1 to strict mode, unless we can make a case why this can't be done as part of checking/validation. Other random points: *) Another major pain point is swapping in the input variables for debugging purposes. Something that emits a script based on a set of arguments would be wonderful. *) Would also like to have a FINALLY block *) A mechanic to manually print out a stack trace for debugging purposes would be helpful. *) COPY not being able to accept arguments as variables (in particular the filename) is a major headache *) Upon error, we ought to print a few lines of context on either side of the error. Not very fun to try and figure out where the errors is when you are working off of 'syntax error near "FROM"' etc. This is a major problem throughout the platform IMO. *) Some user visible mechanic other than forcing SQL through EXECUTE to be able to control plan caching would be useful. merlin
2017-01-03 16:23 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>:
On Tue, Dec 27, 2016 at 10:09 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 12/27/16 4:56 PM, Merlin Moncure wrote:
>>
>> On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule <pavel.stehule@gmail.com>
>> wrote:
>>>
>>> First I describe my initial position. I am strongly against introduction
>>> "new" language - plpgsql2 or new plpgsql, or any else. The trust of
>>> developers to us is important and introduction of any not compatible or
>>> different feature has to have really big reason. PostgreSQL is
>>> conservative
>>> environment, and PLpgSQL should not be a exception. More - I have not any
>
>
> Which is why this is an external fork of plpgsql.
>
> ** The real problem is that we have no mechanism for allowing a PL's
> language/syntax/API to move forward without massive backwards compatibility
> problems. **
Just got back from break :-). Have some thoughts on this. Backwards
compatibility is really a fundamental problem. There's really no
solution to it other than to try and avoid using syntax to solve
problems. It should be obvious to everyone that plgsql cannot
withstand a compatibility break. Another language could be offered as
an alternative in core (say, pl/psm or pl/v8), but pl/pgsql has to
support old code. Some really out there features could maybe be
redacted (in particular, using = for assignment), but not not much.
But I guess we're stuck with the status quo.
I think we ought to avoid language features that influence the
behavior (performance is ok) of the code (and that includes throwing
errors). That's a freight train headed towards javscript's 'strict'
mode, which is thinly disguised language fork. #option and pragma
type syntaxes are trying to cheat the language -- hardly anyone uses
them and it's a tricky way to try and make the language into something
other than it is.
C does it right -- dubious code is raised as warnings and it's up to
the end user to determine which warnings are interesting and likely to
be an error. So, rather than hacking the language to control throwing
and errors and such there should be some ability validate the function
heavily and verify suspicious use of INTO or other dubious things
(unused variables, masked assignments, etc). The validation output
could even be a set returning function.
So -1 to strict mode, unless we can make a case why this can't be done
as part of checking/validation.
Can be plpgsq.extra_errors and plpgsql.extra_warnings solution?
I am thinking so there is a space for improvement (in extra_* usage)
Do you know plpgsql_check https://github.com/okbob/plpgsql_check ?
Other random points:
*) Another major pain point is swapping in the input variables for
debugging purposes. Something that emits a script based on a set of
arguments would be wonderful.
???
*) Would also like to have a FINALLY block
What you can do there?
*) A mechanic to manually print out a stack trace for debugging
purposes would be helpful.
I had plan to develop a extension for this purpose - easy printing stack, function parameters, and local variables. But I had a motivation to start it. It can be usable for any PL
*) COPY not being able to accept arguments as variables (in particular
the filename) is a major headache
There is a patch "COPY as function"
*) Upon error, we ought to print a few lines of context on either side
of the error. Not very fun to try and figure out where the errors is
when you are working off of 'syntax error near "FROM"' etc. This is a
major problem throughout the platform IMO.
Have not idea how to fix it
*) Some user visible mechanic other than forcing SQL through EXECUTE
to be able to control plan caching would be useful.
fully agree.
Have you some ideas?
What about plpgsql option (function scope) -- WITHOUT-PLAN-CACHE - any non trivial plans will not be cached - and evaluated as parametrized query only.
Regards
Pavel
merlin
On 1/2/17 1:51 PM, Pavel Stehule wrote: > 1) Neither is enabled by default, so 90% of users have no idea they > exist. Obviously that's an easy enough fix, but... > > We can strongly talk about it - there can be a chapter in plpgsql doc. > Now, the patterns and antipatterns are not officially documented. Or just fix the issue, provide the backwards compatability GUCs and move on. > 2) There's no way to incrementally change those values for a single > function. If you've set extra_errors = 'all' globally, a single > function can't say "turn off the too many rows setting for this > function". > > > We can enhance the GUC syntax like "all -too_many_rows,-xxx" Why create all that framework when we could just have multiple plpgsql.blah GUCs? plpgsql.multirow_assign_level=FATAL solves that problem. We just need a plpgsql GUC for each backwards compatibility break. > BTW, while I can see value in being able to change these settings > for an entire function, I think the recommended use should be to > only change them for a specific statement. > > > What you can do in plain assign statement > > target := expression ? The point I was trying to make there is if you do have some cases where you need to silently ignore extra rows (for example) it's probably only one statement and not an entire function. That said, if we just make these options GUCs then you can just do SET and RESET. > My border is any compatibility break - and I would not to across it. > First issue is probably harder If we never broke compatibility we'd still be allowing SELECT without FROM, NULL = NULL being TRUE, and a whole bunch of other problems. We'd also be stuck on protocol v1 (and of course not talking about what we want in v4). We've successfully made incompatible changes that were *far worse* than this (ie: renaming pg_stat_activity.procpid). Obviously we shouldn't be breaking things willy-nilly, but these are long-standing warts (dare I say BUGS?) that should be fixed. They're ugly enough that someone took the time to break plpgsql out of the core code and fork it. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
2017-01-03 17:57 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/2/17 1:51 PM, Pavel Stehule wrote:1) Neither is enabled by default, so 90% of users have no idea they
exist. Obviously that's an easy enough fix, but...
We can strongly talk about it - there can be a chapter in plpgsql doc.
Now, the patterns and antipatterns are not officially documented.
Or just fix the issue, provide the backwards compatability GUCs and move on.
It is still compatibility break.
2) There's no way to incrementally change those values for a single
function. If you've set extra_errors = 'all' globally, a single
function can't say "turn off the too many rows setting for this
function".
We can enhance the GUC syntax like "all -too_many_rows,-xxx"
Why create all that framework when we could just have multiple plpgsql.blah GUCs? plpgsql.multirow_assign_level=FATAL solves that problem. We just need a plpgsql GUC for each backwards compatibility break.
We have this framework already, so why don't use it.
BTW, while I can see value in being able to change these settings
for an entire function, I think the recommended use should be to
only change them for a specific statement.
What you can do in plain assign statement
target := expression ?
The point I was trying to make there is if you do have some cases where you need to silently ignore extra rows (for example) it's probably only one statement and not an entire function. That said, if we just make these options GUCs then you can just do SET and RESET.My border is any compatibility break - and I would not to across it.
First issue is probably harder
If we never broke compatibility we'd still be allowing SELECT without FROM, NULL = NULL being TRUE, and a whole bunch of other problems. We'd also be stuck on protocol v1 (and of course not talking about what we want in v4).
This was in dark age - how much users of plpgsql was in 2000? Hard to speak about Postgres as mature software in this era.
We've successfully made incompatible changes that were *far worse* than this (ie: renaming pg_stat_activity.procpid). Obviously we shouldn't be breaking things willy-nilly, but these are long-standing warts (dare I say BUGS?) that should be fixed. They're ugly enough that someone took the time to break plpgsql out of the core code and fork it.
We are not talk about features that can be simply marked as bugs, so there is not too much what we should to fix it. We should to help to users to identify some possible risk places.
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
On 1/3/17 9:58 AM, Pavel Stehule wrote: > > ** The real problem is that we have no mechanism for allowing a PL's > > language/syntax/API to move forward without massive backwards compatibility > > problems. ** > > Just got back from break :-). Have some thoughts on this. Backwards > compatibility is really a fundamental problem. There's really no > solution to it other than to try and avoid using syntax to solve > problems. It should be obvious to everyone that plgsql cannot > withstand a compatibility break. Another language could be offered as I don't think that's obvious at all. We've introduced incompatibility in the main grammar without problem. You just need a way for people to get the old behavior back if they need it. Eventually people will stop relying on the old, broken behavior. > an alternative in core (say, pl/psm or pl/v8), but pl/pgsql has to > support old code. Some really out there features could maybe be > redacted (in particular, using = for assignment), but not not much. > But I guess we're stuck with the status quo. > > I think we ought to avoid language features that influence the > behavior (performance is ok) of the code (and that includes throwing > errors). That's a freight train headed towards javscript's 'strict' > mode, which is thinly disguised language fork. #option and pragma > type syntaxes are trying to cheat the language -- hardly anyone uses > them and it's a tricky way to try and make the language into something > other than it is. Yeah, trying to bulk all these changes into one "magic setting" is not a way to move forward. I think we're actually really well off in that regard, because unlike most languages we have a very robust settings system that allows controlling this behavior even at the statement level. > C does it right -- dubious code is raised as warnings and it's up to > the end user to determine which warnings are interesting and likely to > be an error. So, rather than hacking the language to control throwing > and errors and such there should be some ability validate the function > heavily and verify suspicious use of INTO or other dubious things > (unused variables, masked assignments, etc). The validation output > could even be a set returning function. While static analysis can do some good (and I think we should actually be enabling more of that by default), it won't realistically solve everything. Multi-row assignment is a good example: NO ONE is going to be OK with tons of warnings for every little := or SELECT INTO (without strict), but the reality is that most code actually won't work correctly if you have multiple rows coming back, so there's nothing technically wrong with `var = field FROM table WHERE table_id = plpgsql_variable` if table_id is the PK: you'll always get 0 or 1 rows back. > So -1 to strict mode, unless we can make a case why this can't be done > as part of checking/validation. > > > Can be plpgsq.extra_errors and plpgsql.extra_warnings solution? > > I am thinking so there is a space for improvement (in extra_* usage) > > Do you know plpgsql_check https://github.com/okbob/plpgsql_check ? I think we should look at what parts of that we should pull into core (as well as enabling more by default). Stuff that can be done at compile/load time is certainly better than runtime checks. > Other random points: > *) Another major pain point is swapping in the input variables for > debugging purposes. Something that emits a script based on a set of > arguments would be wonderful. > > ??? Yeah, could you elaborate here? > *) Would also like to have a FINALLY block > > What you can do there? It's a block that ALWAYS executes, even if an exception occurs. Python has this[1]. That (along with an ELSE clause for if there is no exception) would mean you could catch an exception for a single command instead of a bunch of commands. Somewhat related to that, I wish you could make GUC changes that were local only to a specific BEGIN block. AFAIK the GUC infrastructure fully supports that, it would just need to be exposed in plpgsql. > *) A mechanic to manually print out a stack trace for debugging > purposes would be helpful. > > > I had plan to develop a extension for this purpose - easy printing > stack, function parameters, and local variables. But I had a motivation > to start it. It can be usable for any PL I assume you're thinking an SRF that spits out PG_CONTEXT? It'd be really nice if you could also get things like function names and line numbers broken out separately. I've thought of building this myself. BTW, the biggest case I can think of using this for is a userspace method of doing "private" functions, where the function throws an exception unless it was called directly by a set of allowed functions (or views). > *) COPY not being able to accept arguments as variables (in particular > the filename) is a major headache > > > There is a patch "COPY as function" Instead of just COPY, I'd like an easier way to pass identifiers into SQL commands. format() certainly makes this easier than it was, but %3$s gets old pretty quick. (Of course, if we had the concept of a dictionary you could actually name the parameters and it wouldn't be quite so bad...) > *) Upon error, we ought to print a few lines of context on either side > of the error. Not very fun to try and figure out where the errors is > when you are working off of 'syntax error near "FROM"' etc. This is a > major problem throughout the platform IMO. > > > Have not idea how to fix it To do this I think you'd need to keep an array of preceding line positions instead of just one, which I don't think would be that hard. The bigger problem would be only spitting out the extra info on the initial error, and not for subsequent context calls up the stack. I don't think there's currently a way to tell if you're the level that the ereport originated at (at least, not in an error). Improving that would definitely be useful across the board, because right now DEBUG becomes completely useless once you have more than 1 or 2 levels of nested calls. > *) Some user visible mechanic other than forcing SQL through EXECUTE > to be able to control plan caching would be useful. > > > fully agree. > > Have you some ideas? > > What about plpgsql option (function scope) -- WITHOUT-PLAN-CACHE - any > non trivial plans will not be cached - and evaluated as parametrized > query only. I'd also like the ability to do a "localized" PREPARE; similar to a SQL level PREPARE statement, but ensuring that the statement got deallocated when the block ended. 1: https://docs.python.org/3/tutorial/errors.html#defining-clean-up-actions -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On 1/3/17 11:19 AM, Pavel Stehule wrote: > 2) There's no way to incrementally change those values for a > single > function. If you've set extra_errors = 'all' globally, a single > function can't say "turn off the too many rows setting for this > function". > > > We can enhance the GUC syntax like "all -too_many_rows,-xxx" > > > Why create all that framework when we could just have multiple > plpgsql.blah GUCs? plpgsql.multirow_assign_level=FATAL solves that > problem. We just need a plpgsql GUC for each backwards compatibility > break. > > We have this framework already, so why don't use it. We *don't* have a framework that works for this, because you can't incrementally modify extra_errors. Maybe extra_errors is an OK API for static checking, but it's definitely a BAD API for something you'd need to control at a function (or even statement) level. > If we never broke compatibility we'd still be allowing SELECT > without FROM, NULL = NULL being TRUE, and a whole bunch of other > problems. We'd also be stuck on protocol v1 (and of course not > talking about what we want in v4). > > > This was in dark age - how much users of plpgsql was in 2000? Hard to > speak about Postgres as mature software in this era. I don't know about you' but I've considered Postgres to be mature since at least 8.0, if not earlier. Actually, in many ways it was far more mature than other databases I was using in 2000 (let alone 2007). > We've successfully made incompatible changes that were *far worse* > than this (ie: renaming pg_stat_activity.procpid). Obviously we > shouldn't be breaking things willy-nilly, but these are > long-standing warts (dare I say BUGS?) that should be fixed. They're > ugly enough that someone took the time to break plpgsql out of the > core code and fork it. > > We are not talk about features that can be simply marked as bugs, so > there is not too much what we should to fix it. We should to help to > users to identify some possible risk places. You keep claiming that these aren't serious bugs, yet someone felt so strongly that they ARE serious bugs that they forked the entire PL. If you're not willing to even consider a compatibility break (with a means to get the old behavior back) then I don't think there's any point in continuing this thread, because some of these issues can NOT be reasonably solved by a checker. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
2017-01-03 18:41 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/3/17 11:19 AM, Pavel Stehule wrote:2) There's no way to incrementally change those values for a
single
function. If you've set extra_errors = 'all' globally, a single
function can't say "turn off the too many rows setting for this
function".
We can enhance the GUC syntax like "all -too_many_rows,-xxx"
Why create all that framework when we could just have multiple
plpgsql.blah GUCs? plpgsql.multirow_assign_level=FATAL solves that
problem. We just need a plpgsql GUC for each backwards compatibility
break.
We have this framework already, so why don't use it.
We *don't* have a framework that works for this, because you can't incrementally modify extra_errors. Maybe extra_errors is an OK API for static checking, but it's definitely a BAD API for something you'd need to control at a function (or even statement) level.
I have different opinion then you - sure - it should not to change behave, it should to help with identification. And it is enough for this purpose.
If we never broke compatibility we'd still be allowing SELECT
without FROM, NULL = NULL being TRUE, and a whole bunch of other
problems. We'd also be stuck on protocol v1 (and of course not
talking about what we want in v4).
This was in dark age - how much users of plpgsql was in 2000? Hard to
speak about Postgres as mature software in this era.
I don't know about you' but I've considered Postgres to be mature since at least 8.0, if not earlier. Actually, in many ways it was far more mature than other databases I was using in 2000 (let alone 2007).We've successfully made incompatible changes that were *far worse*
than this (ie: renaming pg_stat_activity.procpid). Obviously we
shouldn't be breaking things willy-nilly, but these are
long-standing warts (dare I say BUGS?) that should be fixed. They're
ugly enough that someone took the time to break plpgsql out of the
core code and fork it.
We are not talk about features that can be simply marked as bugs, so
there is not too much what we should to fix it. We should to help to
users to identify some possible risk places.
You keep claiming that these aren't serious bugs, yet someone felt so strongly that they ARE serious bugs that they forked the entire PL.
Sorry, but it it is subjective - and there can be different opinions - some body would to prefer more rigidity, some other less rigidity.
If you're not willing to even consider a compatibility break (with a means to get the old behavior back) then I don't think there's any point in continuing this thread, because some of these issues can NOT be reasonably solved by a checker.
yes, I don't would to consider about a compatibility break. I accept so you have different opinion.
I'll send this patch + doc to next commitfest - and depends on commiters if the patch will be rejected or not. I know so it should not be fully fixed, but it is step forward from my perspective.
Thank you for discussion
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
On Tue, Jan 3, 2017 at 9:58 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2017-01-03 16:23 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>: >> So -1 to strict mode, unless we can make a case why this can't be done >> as part of checking/validation. > > Can be plpgsq.extra_errors and plpgsql.extra_warnings solution? > > I am thinking so there is a space for improvement (in extra_* usage) extra_warnings seems ok at the GUC level. However it's bad to have a body of code fail to compile based on GUC. check_function_bodies for example is a complete hack and should be avoided if at all possible IMO. There is very good informal rule that GUC should not impact behavior (minus some special cases like timeouts). Good examples of failure to follow this rule are mysql and php. Maybe settings at level of extension could be ok, but I'm skeptical. Good languages are clear without needing extra context. > Do you know plpgsql_check https://github.com/okbob/plpgsql_check ? Yes. This is good design and should be model for core-work (if any).In my ideal world, this could would be part of pgxnand to have pgxn client be installed in core. For plpgsql to enter modern era we need standardized packaging and deployment like cran, npm, etc. >> Other random points: >> *) Another major pain point is swapping in the input variables for >> debugging purposes. Something that emits a script based on a set of >> arguments would be wonderful. > > ??? Often for debugging of complicated cases I'm starting from errors in database log with function name and argument values. Sometimes I find myself pasting pl/pgsql function into text editor and replacing input variables with known values. >> >> *) Would also like to have a FINALLY block > > What you can do there? This is syntax sugar so you don't need second begin/end/exception block or duplicated code. It separates error handling from cleanup. BEGIN PERFORM dblink_connect(... <risky_stuff> EXCEPTION WHEN OTHERS THEN <log/handle error> FINALLY PERFORM dblink_disconnect(... END; >> *) Some user visible mechanic other than forcing SQL through EXECUTE >> to be able to control plan caching would be useful. > > fully agree. > > Have you some ideas? > > What about plpgsql option (function scope) -- WITHOUT-PLAN-CACHE - any non > trivial plans will not be cached - and evaluated as parametrized query only. I have slight preference for syntax marker for each query, similar to INTO. Maybe 'UNCACHED'? On Tue, Jan 3, 2017 at 10:57 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > Or just fix the issue, provide the backwards compatability GUCs and move on. I really don't think this will fly. I'm not buying your argument (at all) that compatibility breaks have have been cleanly done in the past, at least not in the modern era. In any event, marginal language improvements are not a good justification to do it. And yes, the continual monkey around with column names in pg_stat_activity are a major hassle. For heaven's sake, can we just add new columns and/or create a new view? merlin
2017-01-03 20:54 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>:
On Tue, Jan 3, 2017 at 9:58 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2017-01-03 16:23 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>:
>> So -1 to strict mode, unless we can make a case why this can't be done
>> as part of checking/validation.
>
> Can be plpgsq.extra_errors and plpgsql.extra_warnings solution?
>
> I am thinking so there is a space for improvement (in extra_* usage)
extra_warnings seems ok at the GUC level. However it's bad to have a
body of code fail to compile based on GUC. check_function_bodies for
example is a complete hack and should be avoided if at all possible
IMO. There is very good informal rule that GUC should not impact
behavior (minus some special cases like timeouts). Good examples of
failure to follow this rule are mysql and php.
Maybe settings at level of extension could be ok, but I'm skeptical.
Good languages are clear without needing extra context.
> Do you know plpgsql_check https://github.com/okbob/plpgsql_check ?
Yes. This is good design and should be model for core-work (if any).
In my ideal world, this could would be part of pgxn and to have pgxn
client be installed in core. For plpgsql to enter modern era we need
standardized packaging and deployment like cran, npm, etc.
>> Other random points:
>> *) Another major pain point is swapping in the input variables for
>> debugging purposes. Something that emits a script based on a set of
>> arguments would be wonderful.
>
> ???
Often for debugging of complicated cases I'm starting from errors in
database log with function name and argument values. Sometimes I find
myself pasting pl/pgsql function into text editor and replacing input
variables with known values.
is it related to plpgsql debugger? Have not idea how it can be better on language level.
>>
>> *) Would also like to have a FINALLY block
>
> What you can do there?
This is syntax sugar so you don't need second begin/end/exception
block or duplicated code. It separates error handling from cleanup.
BEGIN
PERFORM dblink_connect(...
<risky_stuff>
EXCEPTION WHEN OTHERS THEN
<log/handle error>
FINALLY
PERFORM dblink_disconnect(...
END;
Does know somebody this pattern from Ada or PL/SQL?
>> *) Some user visible mechanic other than forcing SQL through EXECUTE
>> to be able to control plan caching would be useful.
>
> fully agree.
>
> Have you some ideas?
>
> What about plpgsql option (function scope) -- WITHOUT-PLAN-CACHE - any non
> trivial plans will not be cached - and evaluated as parametrized query only.
I have slight preference for syntax marker for each query, similar to
INTO. Maybe 'UNCACHED'?
I am not clean opinion - the statement level is nice, but what readability?
SELECT UNCACHED t.a, t.b FROM INTO a,b;
Regards
Pavel
On Tue, Jan 3, 2017 at 10:57 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> Or just fix the issue, provide the backwards compatability GUCs and move on.
I really don't think this will fly. I'm not buying your argument (at
all) that compatibility breaks have have been cleanly done in the
past, at least not in the modern era. In any event, marginal language
improvements are not a good justification to do it. And yes, the
continual monkey around with column names in pg_stat_activity are a
major hassle. For heaven's sake, can we just add new columns and/or
create a new view?
merlin
On Tue, Jan 3, 2017 at 2:15 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > > 2017-01-03 20:54 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>: >> >> On Tue, Jan 3, 2017 at 9:58 AM, Pavel Stehule <pavel.stehule@gmail.com> >> wrote: >> > 2017-01-03 16:23 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>: >> >> *) Would also like to have a FINALLY block >> > >> > What you can do there? >> >> This is syntax sugar so you don't need second begin/end/exception >> block or duplicated code. It separates error handling from cleanup. >> >> BEGIN >> PERFORM dblink_connect(... >> <risky_stuff> >> EXCEPTION WHEN OTHERS THEN >> <log/handle error> >> FINALLY >> PERFORM dblink_disconnect(... >> END; > > > Does know somebody this pattern from Ada or PL/SQL? I guess probably not. It's a standard pattern in modern EH languages (for example, https://msdn.microsoft.com/en-us/library/dszsf989.aspx). >> >> >> *) Some user visible mechanic other than forcing SQL through EXECUTE >> >> to be able to control plan caching would be useful. >> > >> > fully agree. >> > >> > Have you some ideas? >> > >> > What about plpgsql option (function scope) -- WITHOUT-PLAN-CACHE - any >> > non >> > trivial plans will not be cached - and evaluated as parametrized query >> > only. >> >> I have slight preference for syntax marker for each query, similar to >> INTO. Maybe 'UNCACHED'? > > > I am not clean opinion - the statement level is nice, but what readability? > > SELECT UNCACHED t.a, t.b FROM INTO a,b; Yeah -- this is pretty ugly admittedly. Maybe control directive is ok, as long as you can set it mid function? merlin
2017-01-04 16:49 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>:
On Tue, Jan 3, 2017 at 2:15 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
> 2017-01-03 20:54 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>:
>>
>> On Tue, Jan 3, 2017 at 9:58 AM, Pavel Stehule <pavel.stehule@gmail.com>
>> wrote:
>> > 2017-01-03 16:23 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>:
>> >> *) Would also like to have a FINALLY block
>> >
>> > What you can do there?
>>
>> This is syntax sugar so you don't need second begin/end/exception
>> block or duplicated code. It separates error handling from cleanup.
>>
>> BEGIN
>> PERFORM dblink_connect(...
>> <risky_stuff>
>> EXCEPTION WHEN OTHERS THEN
>> <log/handle error>
>> FINALLY
>> PERFORM dblink_disconnect(...
>> END;
>
>
> Does know somebody this pattern from Ada or PL/SQL?
I guess probably not. It's a standard pattern in modern EH languages
(for example, https://msdn.microsoft.com/en-us/library/dszsf989.aspx).
>>
>> >> *) Some user visible mechanic other than forcing SQL through EXECUTE
>> >> to be able to control plan caching would be useful.
>> >
>> > fully agree.
>> >
>> > Have you some ideas?
>> >
>> > What about plpgsql option (function scope) -- WITHOUT-PLAN-CACHE - any
>> > non
>> > trivial plans will not be cached - and evaluated as parametrized query
>> > only.
>>
>> I have slight preference for syntax marker for each query, similar to
>> INTO. Maybe 'UNCACHED'?
>
>
> I am not clean opinion - the statement level is nice, but what readability?
>
> SELECT UNCACHED t.a, t.b FROM INTO a,b;
Yeah -- this is pretty ugly admittedly. Maybe control directive is
ok, as long as you can set it mid function?
ADA uses for this purpose PRAGMA keyword - it is used for everything in ADA - cycle iteration optimization, ...the scope can be statement, block, procedure.
so something like
BEGIN
PRAGMA uncached_plans;
SELECT ...
..
END;
But it should be verified by some PL/SQL or Ada experts
Regards
merlin
> SELECT UNCACHED t.a, t.b FROM INTO a,b;
Yeah -- this is pretty ugly admittedly. Maybe control directive is
ok, as long as you can set it mid function?ADA uses for this purpose PRAGMA keyword - it is used for everything in ADA - cycle iteration optimization, ...the scope can be statement, block, procedure.
Pragma is used for changing (enforcing) behave. There are pragmas ada_05, ada_2012, ..
so something likeBEGINPRAGMA uncached_plans;SELECT .....END;But it should be verified by some PL/SQL or Ada expertsRegards
merlin
>>
>> >> *) Some user visible mechanic other than forcing SQL through EXECUTE
>> >> to be able to control plan caching would be useful.
>> >
>> > fully agree.
>> >
>> > Have you some ideas?
>> >
>> > What about plpgsql option (function scope) -- WITHOUT-PLAN-CACHE - any
>> > non
>> > trivial plans will not be cached - and evaluated as parametrized query
>> > only.
>>
>> I have slight preference for syntax marker for each query, similar to
>> INTO. Maybe 'UNCACHED'?
>
>
> I am not clean opinion - the statement level is nice, but what readability?
>
> SELECT UNCACHED t.a, t.b FROM INTO a,b;
Yeah -- this is pretty ugly admittedly. Maybe control directive is
ok, as long as you can set it mid function?ADA uses for this purpose PRAGMA keyword - it is used for everything in ADA - cycle iteration optimization, ...the scope can be statement, block, procedure.so something likeBEGINPRAGMA uncached_plans;SELECT .....END;But it should be verified by some PL/SQL or Ada experts
Little bit better - if PRAGMA is used in DECLARE part, then it has block scope
so some possible design can be:
DECLARE
PRAGMA UNCACHED_PLANS;
BEGIN
SELECT ... INTO ;
SELECT ... INTO ;
END;
This respects Ada and PL/SQL style - probably easy implementation
Regards
Pavel
Regards
merlin
so some possible design can be:DECLAREPRAGMA UNCACHED_PLANS;BEGINSELECT ... INTO ;SELECT ... INTO ;END;This respects Ada and PL/SQL style - probably easy implementationRegardsPavel
some examples based on Ada doc
FUNCTION xxx RETURN int AS
PRAGMA yyy -- pragma has function scope
BEGIN
FUNCTION xxx RETURN int AS
BEGIN
DECLARE
PRAGMA yyy -- pragma has block scope
Regards
Pavel
Regards
merlin
On Wed, Jan 4, 2017 at 1:49 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > >> >> so some possible design can be: >> >> DECLARE >> PRAGMA UNCACHED_PLANS; >> BEGIN >> SELECT ... INTO ; >> SELECT ... INTO ; >> END; >> >> This respects Ada and PL/SQL style - probably easy implementation >> >> Regards >> >> Pavel > > > some examples based on Ada doc > > FUNCTION xxx RETURN int AS > PRAGMA yyy -- pragma has function scope > BEGIN > > FUNCTION xxx RETURN int AS > BEGIN > DECLARE > PRAGMA yyy -- pragma has block scope ok, sub-block makes sense over statement level IMO. merlin
On Wed, Dec 28, 2016 at 2:25 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > That's my whole point of why this needs to be settable at a global level: so > that people with a lot of legacy code can set the OLD behavior at a global > level, and deal with the old code over time. This has the same problem being discussed nearby on the case-folding thread, though: any extension or third-party tool has to either work with every possible value, or else it has to require one particular value and therefore not be usable if you need another value for some other reason. Now, that's not to say we should never break backward compatibility. Sometimes we should. I think the problem with PL/pgsql is that many of the compatibility breaks that people want are likely to lead to subtle misbehavior rather than outright failure, or are not easy to spot via a cursory look ("hmm, could that SELECT query ever return more than one row?"). Also, while everybody agrees that a bunch of things should be changed and improved, not everybody agrees about which ones, and sometimes person A desperately wants X changed while person B desperately wants it changed in the other direction or left alone. If there were a set of changes that we could make all at once, call the result plpgsql2 or nplpgsql or whatever, and make everybody happy, that'd be fabulous, but we don't. So we're left with doing nothing, or having 2^n language variants controlled by GUCs or pragmas, neither of which is appealing. I think it would be a good idea to lock all the people who really care about PL/pgsql in a room until they agree on what changes should be made for the next version of the language. If they don't agree quickly enough, we can resort to the techniques described in https://en.wikipedia.org/wiki/Papal_election,_1268%E2%80%9371 -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jan 5, 2017 at 11:03 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Now, that's not to say we should never break backward compatibility. > Sometimes we should. I think the problem with PL/pgsql is that many > of the compatibility breaks that people want are likely to lead to > subtle misbehavior rather than outright failure, or are not easy to > spot via a cursory look ("hmm, could that SELECT query ever return > more than one row?"). The core issue is that developers tend to be very poor at estimating the impacts of changes; they look at things the the lens of the "new". Professional software development is quite expensive and framework- (I'll lump the database and it's various built-in features under that term) level changes are essentially throwing out some portion of our user's investments. Even fairly innocent compatibility breaks can have major downstream impacts on our users and it's always much worse than expected. For example, nobody thought that changing the bytea text encoding format to hex would have corrupted our user's data, but it did. TBH, the discussion should shift away from specific issues on compatibility and towards a specific set of standards and policies around how to do it and what kinds of technical justifications need to be made in advance. Security problems for example could be argued as a valid reason to break user code, or poor adherence to the the SQL standard which are in turn blocking other content. Minus those kinds of considerations it's really just not worth doing, and there's no tricky strategy like playing with version numbers that can game that rule. A formal deprecation policy might be a good start. The C language really should be considered the gold standard here. Changes did have to be made, like getting rid of the notoriously broken and insecure gets(), but they were made very, very slowly and unobtrusively. (I do think lpad should except "any" FWIW) :-D merlin
2017-01-05 18:36 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>:
On Thu, Jan 5, 2017 at 11:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Now, that's not to say we should never break backward compatibility.
> Sometimes we should. I think the problem with PL/pgsql is that many
> of the compatibility breaks that people want are likely to lead to
> subtle misbehavior rather than outright failure, or are not easy to
> spot via a cursory look ("hmm, could that SELECT query ever return
> more than one row?").
The core issue is that developers tend to be very poor at estimating
the impacts of changes; they look at things the the lens of the "new".
Professional software development is quite expensive and framework-
(I'll lump the database and it's various built-in features under that
term) level changes are essentially throwing out some portion of our
user's investments. Even fairly innocent compatibility breaks can
have major downstream impacts on our users and it's always much worse
than expected. For example, nobody thought that changing the bytea
text encoding format to hex would have corrupted our user's data, but
it did.
TBH, the discussion should shift away from specific issues on
compatibility and towards a specific set of standards and policies
around how to do it and what kinds of technical justifications need to
be made in advance. Security problems for example could be argued as
a valid reason to break user code, or poor adherence to the the SQL
standard which are in turn blocking other content. Minus those kinds
of considerations it's really just not worth doing, and there's no
tricky strategy like playing with version numbers that can game that
rule. A formal deprecation policy might be a good start.
The C language really should be considered the gold standard here.
Changes did have to be made, like getting rid of the notoriously
broken and insecure gets(), but they were made very, very slowly and
unobtrusively.
(I do think lpad should except "any" FWIW) :-D
I fully agree - sometimes there is fuzzy border in understanding what is bug and what unhappy designed feature - probably lot of useful features can be taken and used wrong.
Regards
Pavel
merlin
Hi
>
> some examples based on Ada doc
>
> FUNCTION xxx RETURN int AS
> PRAGMA yyy -- pragma has function scope
> BEGIN
>
> FUNCTION xxx RETURN int AS
> BEGIN
> DECLARE
> PRAGMA yyy -- pragma has block scope
ok, sub-block makes sense over statement level IMO.
I am sending proof concept (parser only implementation) - it allows to control query plan usage on function and on block level
Examples
CREATE OR REPLACE FUNCTION fx()
RETURNS int AS $$
PRAGMA use_query_plan_cache(off); -- disable query plan cache on function level
DECLARE r record;
BEGIN
FOR r IN SELECT ... -- some complex query, where we prefer on one shot plan
LOOP
DECLARE
PRAGMA use_query_plan_cache(on); -- enable query plan cache for block
BEGIN
... statements inside cycle reuses query plan
END;
END LOOP;
END;
$$ LANGUAGE plpgsql;
or
BEGIN
...
DECLARE
PRAGMA use_query_plan_cache(off);
BEGIN
-- these queries has fresh plan only
SELECT ...
SELECT ...
END; -- end of PRAGMA scope
...
-- usual behave
END;
The behave is static - controlled on compile time only - the controlled feature can be enabled/disabled. The impact on runtime is zero
* the syntax is verbose - readable - I prefer strong clean signal for readers so something internals is different
* consistent with Ada, PL/SQL
* remove one reason for dynamic SQL
* allows to mix queries with without query plan cache - interesting for patter FOR IN slow query LOOP fast query; END LOOP;
* there is small risk of compatibility break - if somebody use variables named PRAGMA, because new reserved keyword is necessary - fails on syntax error - so it is easy identified.
* this syntax can be reused - autonomous_transaction like PL/SQL. I read a manual of Gnu Ada - and this is used often for implementation legacy (obsolete) behave, functionality.
Notes, comments?
Regards
Pavel
Attachment
On Thu, Jan 5, 2017 at 7:03 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > I think it would be a good idea to lock all the people who really care > about PL/pgsql in a room until they agree on what changes should be > made for the next version of the language. If they don't agree > quickly enough, we can resort to the techniques described in > https://en.wikipedia.org/wiki/Papal_election,_1268%E2%80%9371 I think that's a very good idea, and I'm happy to be locked into such a room. I think such a discussion will be very fruitful, given the others in the room have also already decided they want a new language and are there to discuss "the next version of the language", instead of debating why they don't think we need a new language. It would also be good if those people could bring laptops with all their plpgsql code bases, to check if any of the proposed possibly non-backwards compatible syntax proposals would break nothing, just a few functions, or a lot of functions in their code bases.
2017-01-03 17:57 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/2/17 1:51 PM, Pavel Stehule wrote:1) Neither is enabled by default, so 90% of users have no idea they
exist. Obviously that's an easy enough fix, but...
We can strongly talk about it - there can be a chapter in plpgsql doc.
Now, the patterns and antipatterns are not officially documented.
Or just fix the issue, provide the backwards compatability GUCs and move on.2) There's no way to incrementally change those values for a single
function. If you've set extra_errors = 'all' globally, a single
function can't say "turn off the too many rows setting for this
function".
We can enhance the GUC syntax like "all -too_many_rows,-xxx"
Why create all that framework when we could just have multiple plpgsql.blah GUCs? plpgsql.multirow_assign_level=FATAL solves that problem. We just need a plpgsql GUC for each backwards compatibility break. BTW, while I can see value in being able to change these settings
for an entire function, I think the recommended use should be to
only change them for a specific statement.
What you can do in plain assign statement
target := expression ?
The point I was trying to make there is if you do have some cases where you need to silently ignore extra rows (for example) it's probably only one statement and not an entire function. That said, if we just make these options GUCs then you can just do SET and RESET.My border is any compatibility break - and I would not to across it.
First issue is probably harder
If we never broke compatibility we'd still be allowing SELECT without FROM, NULL = NULL being TRUE, and a whole bunch of other problems. We'd also be stuck on protocol v1 (and of course not talking about what we want in v4).
We've successfully made incompatible changes that were *far worse* than this (ie: renaming pg_stat_activity.procpid). Obviously we shouldn't be breaking things willy-nilly, but these are long-standing warts (dare I say BUGS?) that should be fixed. They're ugly enough that someone took the time to break plpgsql out of the core code and fork it.
The discussion about changing behave of current features has not a solution. I don't believe so there is possible to find a win/win solution - it is not possible. I respect the opinion all people here, but somebody more afraid of language fragmentation, somebody else more from some possible production issues. All arguments are valid, all arguments has same value, all arguments are sent by people with lot of experience (but with different experience - anybody works with different domains, uses different patters, hits different kind of errors).
This discussion was +/- about behave of INTO clause - and if STRICT clause should be default and if STRICT clause should be more strict.
if we introduce new pattern (new syntax), that can be strict enough then we can go forward. New syntax will not have a impact on current customers code base. If somebody prefer very strict behave, then he can use new syntax quickly. Issues in old code can be detected by other tools - plpgsql_check, and extra_warnings, extra_errors.
Current state
==========
INTO
-------
* is not strict in any directions - columns or rows
* not equal target and source column number cannot be detected in plpgsql (plpgsql_check does it)
* missing rows - uses FOUND (PL/SQL), too much rows - uses GET DIAGNOSTICS x = ROW_COUNT (ANSI/SQL pattern)
* doesn't need outer subtransaction (handled exception) for handling specific situations
select a into t from test where a = i;
get diagnostics r = row_count;
if r > 1 then raise exception 'too much rows'; end if;
INTO STRICT
------------------
* is strict in rows - require exactly one row result
* not equal target and source column number cannot be detected in plpgsql (plpgsql_check does it)
* raises a exceptions no_data_found, too_much_rows
* require outer subtransaction (handled exception) for handling no_data_found (10% slowdown in worst case)
begin
select a into t from test where a = i;
exception
when no_data_found then
t = null;
end;
There is safe pattern (workaround) for missing columns in source - using RECORD type. Access to missing field raises runtime error. Another solution - plpgsql_check.
subselect assignment
-----------------------------
* is column strict - doesn't allow more columns now
* don't allow too_much_rows
* the FOUND is always true - has not change to check it
* although it is ANSI/SQL pattern it is undocumented in PostgreSQL (only INTO clause is documented)
x := (select a from test where a = i)
officially unsupported implementation side effect assignment FROM
------------------------------------------------------------------------------------------
* is undocumented
* allows only one column
* behave is similar like INTO
x := a from test where a = i
Possible risk
============
* typo SELECT a b INTO a,b or SELECT a,b INTO b,a - currently I use a RECORD type to be safe
* undetermined result in more rows
* unknown result in no_data_found
The questions - what we expect from new behave?
=======================================
* safety against typo - 100% yes
* safety against undetermined result - 100% yes
* what behave should be default in no_data_found case ? Exception or NULL? - both has advantages for who require designed behave and disadvantage for require second behave (a) handling the exception, b) IF NOT FOUND test)
Possible solution (without compatibility break)
===================================
introduction ANSI/SQL assingnment, multiassignment SET
------------------------------------------------------------------------------
SET varname = (SELECT ...)
SET (t1, t2, t3, ...) = (SELECT c1, c2, c3 ..)
- possible identifier collision with GUC (not in multiassignment variant)
- is not too robust against typo SET (t1, t2,..) = (SELECT t2,t1, ...)
- it is ANSI/SQL design/behave
introduction new syntax proposed by plpgsql2 project
-----------------------------------------------------------------------
SELECT t1 := c1, t2 := c2, ...
- it can be PostgreSQL specific syntax - full control over design
- maximally robust against typo
- long syntax, but for short syntax can be used SELECT c1,c2,c3, .. INTO STRICT recvar
- what should be no_data_found behave?
I have nothing about a cost of "new syntax" implementation - but for me - it looks like good solution for us - it can be win/win solution. It breaks nothing - it introduce nice to have typo robust syntax.
Comments, notes?
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
* EXECUTE and FOUND - this is incompatible change, extra check can be used (test on unset variable). I see solution in leaving FOUND variable and introduction of some new without this issue - ROW_COUNT maybe (this is another possible incompatible change, but with higher benefit - maybe we can introduce some aliasing, PRAGMA clause, default PRAGMAs, ..).
I checked current implementation of FOUND variable. If we introduce new auto variable ROW_COUNT - exactly like FOUND, then it doesn't introduce any compatibility break.
ROW_COUNT .. shortcut for GET DIAGNOSTICS row_count = ROW_COUNT.
Comments, notes?
Regards
Pavel
On 1/5/17 11:36 AM, Merlin Moncure wrote: > The C language really should be considered the gold standard here. > Changes did have to be made, like getting rid of the notoriously > broken and insecure gets(), but they were made very, very slowly and > unobtrusively. For those not familiar... how did they accomplish that? I'm certainly fine with changes being made very slowly. We carried the missing_From GUC around for like a decade before ditching it. We have other GUCs that have defaulted to not allowing silly behavior for a long time as well. We might well need to leave the default for compatibility GUCs set to current behavior for several more releases, to allow people to start specifying which behavior they want. I agree with Robert that there needs to be consensus that a change needs to be made, but frankly I think 50% of this thread was people disagreeing with *ANY* change that would be incompatible. IMHO that's a ridiculous position that does not match expectations outside of plpgsql. That kind of expectation means we have absolutely no way of fixing past mistakes. Certainly, there also needs to be agreement on what the new behavior should be, but again, what I observed was an adamant insistence that absolutely no break would be permitted. As for using GUCs for these changes and that impact on extensions, I don't see why that won't work for what we're discussing here. In a worst-case scenario, extension authors would need to specify what behavior they wanted in their extensions instead of blindly accepting the default, by making sure those options were set for each function they defined. While it would certainly be nice to avoid that extra work, all the necessary infrastructure to handle that is already in place. And if we wanted to avoid that hassle, we could allow custom GUC settings on extensions, like we currently do for roles and databases. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On 1/7/17 5:39 AM, Pavel Stehule wrote: > > I checked current implementation of FOUND variable. If we introduce new > auto variable ROW_COUNT - exactly like FOUND, then it doesn't introduce > any compatibility break. Except it would break every piece of code that had a row_count variable, though I guess you could see which scoping level the variable had been defined in. I think the right solution in this case is to replace GET DIAGNOSTICs with something easier to use, but I'm not sure what that would be. I think this is another example of where not using some kind of character to distinguish variables screws us. :/ -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On 1/7/17 2:06 AM, Pavel Stehule wrote: > > SELECT t1 := c1, t2 := c2, ... > > - it can be PostgreSQL specific syntax - full control over design > - maximally robust against typo > - long syntax, but for short syntax can be used SELECT c1,c2,c3, .. INTO > STRICT recvar I don't think overloading a SQL command is a good idea. We'd be in trouble if ANSI ever introduced :=. I think that could also conflict with existing operators. > - what should be no_data_found behave? Considering where we're at today, I don't think there should be a default behavior; make the user specify somehow whether missing data is allowed or not. > I have nothing about a cost of "new syntax" implementation - but for me > - it looks like good solution for us - it can be win/win solution. It > breaks nothing - it introduce nice to have typo robust syntax. Related to that, I suspect we could add better support to existing commands for at least some of these things. For example, SELECT ... INTO NOMULTI (instead of STRICT) to indicate that multiple rows are an error but missing data is OK. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
Jim Nasby <Jim.Nasby@BlueTreble.com> writes: > On 1/7/17 5:39 AM, Pavel Stehule wrote: >> I checked current implementation of FOUND variable. If we introduce new >> auto variable ROW_COUNT - exactly like FOUND, then it doesn't introduce >> any compatibility break. > Except it would break every piece of code that had a row_count variable, > though I guess you could see which scoping level the variable had been > defined in. If FOUND were declared at an outer scoping level such that any user-created declaration overrode the name, then we could do likewise for other auto variables and not fear compatibility breaks. Currently, though, we don't seem to be quite there: it looks like FOUND is an outer variable with respect to DECLARE blocks, but it's more closely nested than parameter names. Compare: regression=# create function foo1(bool) returns bool as 'declare found bool := $1; begin return found; end' language plpgsql; CREATE FUNCTION regression=# select foo1(true);foo1 ------t (1 row) regression=# create function foo2(found bool) returns bool as regression-# 'begin return found; end' language plpgsql; CREATE FUNCTION regression=# select foo2(true);foo2 ------f (1 row) Not sure if changing this would be a good thing or not --- was there reasoning behind this behavior, or was it just accidental? regards, tom lane
On 1/7/17 8:53 PM, Tom Lane wrote: > If FOUND were declared at an outer scoping level such that any > user-created declaration overrode the name, then we could do likewise > for other auto variables and not fear compatibility breaks. > > Currently, though, we don't seem to be quite there: it looks like > FOUND is an outer variable with respect to DECLARE blocks, but it's > more closely nested than parameter names. Sorry, I'm not following... you can override a parameter name the same way and get the same behavior, no? BTW, I do wish you could change the label of the scope that arguments went into, so that you could use that label to refer to function parameters. If we allowed that it'd perhaps be the best of both worlds: you'd be guaranteed access to all auto variables and parameters, and that access wouldn't need to be tied to the function name (which can be both painful and error prone). -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
2017-01-08 3:53 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 1/7/17 5:39 AM, Pavel Stehule wrote:
>> I checked current implementation of FOUND variable. If we introduce new
>> auto variable ROW_COUNT - exactly like FOUND, then it doesn't introduce
>> any compatibility break.
> Except it would break every piece of code that had a row_count variable,
> though I guess you could see which scoping level the variable had been
> defined in.
If FOUND were declared at an outer scoping level such that any
user-created declaration overrode the name, then we could do likewise
for other auto variables and not fear compatibility breaks.
Currently, though, we don't seem to be quite there: it looks like
FOUND is an outer variable with respect to DECLARE blocks, but it's
more closely nested than parameter names. Compare:
regression=# create function foo1(bool) returns bool as
'declare found bool := $1; begin return found; end' language plpgsql;
CREATE FUNCTION
regression=# select foo1(true);
foo1
------
t
(1 row)
regression=# create function foo2(found bool) returns bool as
regression-# 'begin return found; end' language plpgsql;
CREATE FUNCTION
regression=# select foo2(true);
foo2
------
f
(1 row)
Not sure if changing this would be a good thing or not --- was
there reasoning behind this behavior, or was it just accidental?
There are two related features in plpgsql2 project:
1. dynamic SQL sets FOUND variable
2. direct access to processed rows info via variable ROW_COUNT
@1 is incompatible change, @2 is good enough - so we should not to change FOUND, but we can propagate ROW_COUNT instead.
Regards
Pavel
regards, tom lane
2017-01-08 4:11 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/7/17 8:53 PM, Tom Lane wrote:If FOUND were declared at an outer scoping level such that any
user-created declaration overrode the name, then we could do likewise
for other auto variables and not fear compatibility breaks.
Currently, though, we don't seem to be quite there: it looks like
FOUND is an outer variable with respect to DECLARE blocks, but it's
more closely nested than parameter names.
Sorry, I'm not following... you can override a parameter name the same way and get the same behavior, no?
It is declared before any custom identifier. If you override it, then you are working with own variable - not with auto variable.
BTW, I do wish you could change the label of the scope that arguments went into, so that you could use that label to refer to function parameters. If we allowed that it'd perhaps be the best of both worlds: you'd be guaranteed access to all auto variables and parameters, and that access wouldn't need to be tied to the function name (which can be both painful and error prone).
We can talk about compiler directive.
PRAGMA auto_variables_label(xxxx) -- require function scope only
BEGIN
IF xxxx.FOUND THEN
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
2017-01-08 3:39 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/7/17 2:06 AM, Pavel Stehule wrote:
SELECT t1 := c1, t2 := c2, ...
- it can be PostgreSQL specific syntax - full control over design
- maximally robust against typo
- long syntax, but for short syntax can be used SELECT c1,c2,c3, .. INTO
STRICT recvar
I don't think overloading a SQL command is a good idea. We'd be in trouble if ANSI ever introduced :=. I think that could also conflict with existing operators.
The ":=" operator is used ANSI/SQL already for named arguments. Isn't probable so ANSI uses it in different context.
This is not overloading of SQL command - it is like annotations. It is smart idea, so I was not surprised if ANSI/SQL reuses it. There is not any possible construct, that is robust against typo - because assignment is very verbose and natural.
ANSI - SQL/PSM uses two methods
1. multiassignment
SET (a,b,c) = (SELECT a, b, c ...)
2. auto variables in dedicated new scope
FOR scope_label IN SELECT a, b, c
DO
-- you can use variables a, b, c
-- you can use qualified variables scope_label.a, scope_label.b, ..
END FOR
This method is not possible in PL/pgSQL - but a work with record type is similar
- what should be no_data_found behave?
Considering where we're at today, I don't think there should be a default behavior; make the user specify somehow whether missing data is allowed or not.I have nothing about a cost of "new syntax" implementation - but for me
- it looks like good solution for us - it can be win/win solution. It
breaks nothing - it introduce nice to have typo robust syntax.
Related to that, I suspect we could add better support to existing commands for at least some of these things. For example, SELECT ... INTO NOMULTI (instead of STRICT) to indicate that multiple rows are an error but missing data is
Another flag into NOMULTI can be solution too.
The new syntax ":=" has some advantages:
1. it robust against type - it is harder to do unwanted swap of variables, and this mistake is very clear
2. the syntax ensure equality of target variables and source expressions.
I see valuable benefit of this syntax
Regards
Pavel
OK.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
2017-01-08 3:31 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/7/17 5:39 AM, Pavel Stehule wrote:
I checked current implementation of FOUND variable. If we introduce new
auto variable ROW_COUNT - exactly like FOUND, then it doesn't introduce
any compatibility break.
Except it would break every piece of code that had a row_count variable, though I guess you could see which scoping level the variable had been defined in.
I think the right solution in this case is to replace GET DIAGNOSTICs with something easier to use, but I'm not sure what that would be.
I invite any ideas?
Regards
Pavel
I think this is another example of where not using some kind of character to distinguish variables screws us. :/
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
BTW, I do wish you could change the label of the scope that arguments went into, so that you could use that label to refer to function parameters. If we allowed that it'd perhaps be the best of both worlds: you'd be guaranteed access to all auto variables and parameters, and that access wouldn't need to be tied to the function name (which can be both painful and error prone).We can talk about compiler directive.PRAGMA auto_variables_label(xxxx) -- require function scope only
If we know a list of all auto variables, then it can be on function or block level - it can create aliases.
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
2017-01-08 3:27 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/5/17 11:36 AM, Merlin Moncure wrote:The C language really should be considered the gold standard here.
Changes did have to be made, like getting rid of the notoriously
broken and insecure gets(), but they were made very, very slowly and
unobtrusively.
For those not familiar... how did they accomplish that?
I'm certainly fine with changes being made very slowly. We carried the missing_From GUC around for like a decade before ditching it. We have other GUCs that have defaulted to not allowing silly behavior for a long time as well. We might well need to leave the default for compatibility GUCs set to current behavior for several more releases, to allow people to start specifying which behavior they want.
I agree with Robert that there needs to be consensus that a change needs to be made, but frankly I think 50% of this thread was people disagreeing with *ANY* change that would be incompatible. IMHO that's a ridiculous position that does not match expectations outside of plpgsql. That kind of expectation means we have absolutely no way of fixing past mistakes.
Certainly, there also needs to be agreement on what the new behavior should be, but again, what I observed was an adamant insistence that absolutely no break would be permitted.
As for using GUCs for these changes and that impact on extensions, I don't see why that won't work for what we're discussing here. In a worst-case scenario, extension authors would need to specify what behavior they wanted in their extensions instead of blindly accepting the default, by making sure those options were set for each function they defined. While it would certainly be nice to avoid that extra work, all the necessary infrastructure to handle that is already in place. And if we wanted to avoid that hassle, we could allow custom GUC settings on extensions, like we currently do for roles and databases.
The discussion related to plpgsql2, future development of plpgsql has more levels, topics
1. incompatible changes - INTO, INTO STRICT, FOUND - there is a agreement so current behave is not ideal for all cases, but there is not a agreement so it broken and should be "fixed" - GUC doesn't helps here.
2. new features - the question was "how much we would to move PL/pgSQL from verbose Ada language to different place - convention against configuration principle", and what (if) conventions should be default. GUC can partially helps.
I still hope so there is some space for finding a partial agreement - and we can do some evolution steps forward.
I would not to use GUC like "We cannot to find a agreement, so we use GUC and somebody will use this feature, some one not" - it is not way how to do things better long time.
Jim, Marko, Joel - is there a place, features where we can find a partial agreement? If it is, then we can move our view there.
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
Related to that, I suspect we could add better support to existing commands for at least some of these things. For example, SELECT ... INTO NOMULTI (instead of STRICT) to indicate that multiple rows are an error but missing data isAnother flag into NOMULTI can be solution too.The new syntax ":=" has some advantages:1. it robust against type - it is harder to do unwanted swap of variables, and this mistake is very clear
should be "against typo", sorry
Pavel
2. the syntax ensure equality of target variables and source expressions.
On Sat, Jan 7, 2017 at 8:56 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > Jim, Marko, Joel - is there a place, features where we can find a partial agreement? If it is, then we can move our viewthere. I have decided I definitively want a new language, and I'm willing to pay for it. Hopefully the community will join forces and contribute with ideas and code, but with or without you or the rest of the community, plpgsql2 is going to happen. Call it pltrustly or plpgsql2, I don't care. I just care about ending my suffering from being forced writing plpgsql every day. It sucks, and I'm going to end it. I'm just too fed up with the annoyances of plpgsql. I cannot care less about _hypothetical_ incompatibility problems, I think your arguments "this is like Perl6 or Python3" are delusional. You can easily intermix plpgsql and plpgsql2 in the same "application", something you cannot do with Perl6 or Python3. So please stop using that as an argument. If anyone has an application where the hypothetical incompatibility problems would be a problem, then just continue to use plpgsql. And please kill all these GUCs ideas. The best thing with PostgreSQL is the natural expected behaviour of the default configuration. Contrary to MySQL where you have to enable lots and lots of configuration options just to get a behaviour you expect as a novice user. It's much better to just come together and agree on whatever we have learned during the last 15 years of PL/pgSQL1, and sample all ideas during a year maybe, and decide what to put into PL/pgSQL2. To make it useful, we should aim to not break compatibility for _most_ code, but accept some necessary rewrites of functions with deprecated anti-patterns. I'm even willing to suggest it might be a good idea to first try out PL/pgSQL2 at Trustly, and after a year of usage, report back to the community of our findings on how well it worked out for us, to allow all others to learn from our mistakes during our first year of using the new language. That way less people and companies will have to suffer when we discover what we got wrong in what we thought would work out well for us. During the same trial period maybe your company Pavel and others can try out their ideas of a PL/pgSQL2 and implement it, see how it works out for you, and then report back to the community on your findings from production environments. That way we can avoid all these hypothetical discussions on what will be good or bad without having any empirical evidence at hand.
On 1/8/17 2:52 AM, Joel Jacobson wrote: > And please kill all these GUCs ideas. The best thing with PostgreSQL > is the natural expected behaviour of the default configuration. > Contrary to MySQL where you have to enable lots and lots of > configuration options just to get a behaviour you expect as a novice > user. The only reason to use GUCs or some other kind of backwards compatibility setting would be to allow the current plpgsql itself to move forwards. If you think that's a dead end (which I can certainly understand) then they make no sense at all. > It's much better to just come together and agree on whatever we have > learned during the last 15 years of PL/pgSQL1, and sample all ideas > during a year maybe, and decide what to put into PL/pgSQL2. To make it > useful, we should aim to not break compatibility for _most_ code, but > accept some necessary rewrites of functions with deprecated > anti-patterns. If we're going to create a brand new language then I think it would be extremely foolish to keep *any* of the current pain points around. Off the top of my head: - variables must have an identifier (what $ in most languages does). The steps you have to go through to avoid simple naming collisions are insane. - Support for composite types needs to be stronger. Off the top of my head, you need to be able to reference an element name via a variable. OR, maybe it'd be better to just provide a plpgsql equivalent to a dict. - GET DIAGNOSTICS and their ilk need to die. There needs to be an easier way to get that kind of info back (perhaps via an automatic composite/record/dict). - There needs to be real support for dealing with exceptions. IE: get a composite of all exception deatils, modify parts of it, then re-raise with the new info. - Real support for using variables as identifiers / nothing restricted to only accepting a Const. - Support for the notion of a variable being unset (which is NOT the same thing as NULL). That said, I'll bet we still get some of that wrong, so there better be some way of fixing those issues down the road... -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On 1/7/17 11:44 PM, Pavel Stehule wrote: > This is not overloading of SQL command - it is like annotations. It is > smart idea, so I was not surprised if ANSI/SQL reuses it. SHas ANSI declared that they will NEVER support := in a SELECT that's not running in a stored function? Because if they haven't done that, there's nothing preventing them from doing just that. If that happens we're going to have some very difficult choices to make. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On 1/8/17 12:03 AM, Pavel Stehule wrote: > BTW, I do wish you could change the label of the scope that > arguments went into, so that you could use that label to refer > to function parameters. If we allowed that it'd perhaps be the > best of both worlds: you'd be guaranteed access to all auto > variables and parameters, and that access wouldn't need to be > tied to the function name (which can be both painful and error > prone). > > > We can talk about compiler directive. > > PRAGMA auto_variables_label(xxxx) -- require function scope only > > > If we know a list of all auto variables, then it can be on function or > block level - it can create aliases. Oh, the problem is that if you have an argument with the same name as an auto variable you're in trouble. Probably the easiest thing is to have a scope that sits above the scope containing the arguments, and then allow the user to rename both scopes if so desired. So in effect you'd end up with <<plpgsql>> -- new scope DECLARE FOUND; etc BEGIN <<function_name>> DECLARE argument_1; argument_2; BEGIN -- User supplied block goes here, with optional label END; END; Alternatively, we could do... <<function_name>> DECLARE FOUND; etc BEGIN DECLARE-- User's DECLARE argument_1; argomuent_2; -- User supplied declare code BEGIN -- User's BEGIN .... END That removes one level of nesting. It's probably better to go with the first option though, since it's simpler. In both cases, I'd really like the ability to rename those blocks. #pragma would be fine for that. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
2017-01-09 0:37 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/8/17 2:52 AM, Joel Jacobson wrote:And please kill all these GUCs ideas. The best thing with PostgreSQL
is the natural expected behaviour of the default configuration.
Contrary to MySQL where you have to enable lots and lots of
configuration options just to get a behaviour you expect as a novice
user.
The only reason to use GUCs or some other kind of backwards compatibility setting would be to allow the current plpgsql itself to move forwards. If you think that's a dead end (which I can certainly understand) then they make no sense at all.It's much better to just come together and agree on whatever we have
learned during the last 15 years of PL/pgSQL1, and sample all ideas
during a year maybe, and decide what to put into PL/pgSQL2. To make it
useful, we should aim to not break compatibility for _most_ code, but
accept some necessary rewrites of functions with deprecated
anti-patterns.
If we're going to create a brand new language then I think it would be extremely foolish to keep *any* of the current pain points around. Off the top of my head:
- variables must have an identifier (what $ in most languages does). The steps you have to go through to avoid simple naming collisions are insane.
just note - from 9.0 the collisions are not a issue
- Support for composite types needs to be stronger. Off the top of my head, you need to be able to reference an element name via a variable. OR, maybe it'd be better to just provide a plpgsql equivalent to a dict.
This point self needs significant code refactoring - maybe total rewriting PL executor - it allows to change expression result data type in cycle. It doesn't mean so I fully disagree with this point, but it is not easy to implement it in type strict environment - C, C++, Pascal, Ada - hasn't any similar - maybe it is possible with some libraries.
- GET DIAGNOSTICS and their ilk need to die. There needs to be an easier way to get that kind of info back (perhaps via an automatic composite/record/dict).
It is about performance - probably you wouldn't to fill all dict fields after any statement.
- There needs to be real support for dealing with exceptions. IE: get a composite of all exception deatils, modify parts of it, then re-raise with the new info.
- Real support for using variables as identifiers / nothing restricted to only accepting a Const.
second point that enforces new PL environment - writing from scratch - and it hides the cost of dynamic SQL
- Support for the notion of a variable being unset (which is NOT the same thing as NULL).
That said, I'll bet we still get some of that wrong, so there better be some way of fixing those issues down the road...
With these requests you have to specify first, how much close will be your ideal language with PostgreSQL. Currently PL/pgSQL is pretty close - with some impacts. Your mentioned features can requires more independent environment from Postgres.
What is really weak in plpgsql is a left side of assign statement and missing some global/module/extensions variables.
Maybe if we integrate more PLLua or PLPython, PLPerl it can be better for these requests.
I am not sure about benefit new only PostgreSQL specific language. What do you think about Lua - it is light, pretty fast, dynamic, fast dictionary API that can mask lot of internals.
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
2017-01-09 1:10 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/8/17 12:03 AM, Pavel Stehule wrote:BTW, I do wish you could change the label of the scope that
arguments went into, so that you could use that label to refer
to function parameters. If we allowed that it'd perhaps be the
best of both worlds: you'd be guaranteed access to all auto
variables and parameters, and that access wouldn't need to be
tied to the function name (which can be both painful and error
prone).
We can talk about compiler directive.
PRAGMA auto_variables_label(xxxx) -- require function scope only
If we know a list of all auto variables, then it can be on function or
block level - it can create aliases.
Oh, the problem is that if you have an argument with the same name as an auto variable you're in trouble.
I didn't well explained my idea
It is similar to your plpgsql scope. You are introducing the convention. I proposed a explicit specification. The result is similar.
Probably the easiest thing is to have a scope that sits above the scope containing the arguments, and then allow the user to rename both scopes if so desired. So in effect you'd end up with
<<plpgsql>> -- new scope
DECLARE
FOUND;
etc
BEGIN
<<function_name>>
DECLARE
argument_1;
argument_2;
BEGIN
-- User supplied block goes here, with optional label
END;
END;
It is similar to
PRAGMA auto_variables_namespace(plpgsql);
BEGIN
...
END;
Using PRAGMA is more verbose - it is useful for code audit, review - it is speaking "I will overwrite some auto variables here, and I need special namespace"
plpgsql_check, maybe plpgsql self can raise warning if these variables are shadowed and some option/pragma is not used. Maybe current extra check does it already.
Alternatively, we could do...
<<function_name>>
DECLARE
FOUND;
etc
BEGIN
DECLARE-- User's DECLARE
argument_1;
argomuent_2;
-- User supplied declare code
BEGIN -- User's BEGIN
....
END
That removes one level of nesting. It's probably better to go with the first option though, since it's simpler.
You are forgot on function paramaters - somebody can use a function argument like FOUND, .. So auto variables should to be declared in most top namespace.
Usually it is invisible for users - one, two more namespaces has zero cost for compilation and absolute zero impact for evaluation.
In both cases, I'd really like the ability to rename those blocks. #pragma would be fine for that.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
2017-01-09 0:39 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/7/17 11:44 PM, Pavel Stehule wrote:This is not overloading of SQL command - it is like annotations. It is
smart idea, so I was not surprised if ANSI/SQL reuses it.
SHas ANSI declared that they will NEVER support := in a SELECT that's not running in a stored function? Because if they haven't done that, there's nothing preventing them from doing just that. If that happens we're going to have some very difficult choices to make.
No, there is nothing declared in ANSI. But currently in ANSI is not using one operator for two different thing.
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
Hi
Real support for using variables as identifiers / nothing restricted to only accepting a Const.
This point is problematic not only from performance perspective.
if you don't use some special syntax and you allow variables as identifier, then you will got a ambiguous situation quickly - although variables can have special symbol prefix
SELECT * FROM tab WHERE $var1 = $var3
What is $var1, what is $var2? identifier or value?
Regards
Pavel
On Sun, Jan 8, 2017 at 2:52 AM, Joel Jacobson <joel@trustly.com> wrote: > On Sat, Jan 7, 2017 at 8:56 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> >> Jim, Marko, Joel - is there a place, features where we can find a partial agreement? If it is, then we can move our viewthere. > > I have decided I definitively want a new language, and I'm willing to > pay for it. well, as they say, "money talks" :-D. > Hopefully the community will join forces and contribute with ideas and > code, but with or without you or the rest of the community, plpgsql2 > is going to happen. > Call it pltrustly or plpgsql2, I don't care. I just care about ending > my suffering from being forced writing plpgsql every day. It sucks, > and I'm going to end it. Curious, are you mainly troubled by the 'INTO STRICT' family of problems? Or something else? Pavel has scored some points with PRAGMA syntax and ISTM that does not require compatibility break. > And please kill all these GUCs ideas. The best thing with PostgreSQL > is the natural expected behaviour of the default configuration. > Contrary to MySQL where you have to enable lots and lots of > configuration options just to get a behaviour you expect as a novice > user. I think there is a lot of support for this point of view. Jim is notable outlier here, but for the most part we don't do language behavior changes with GUC. > It's much better to just come together and agree on whatever we have > learned during the last 15 years of PL/pgSQL1, and sample all ideas > during a year maybe, and decide what to put into PL/pgSQL2. To make it > useful, we should aim to not break compatibility for _most_ code, but > accept some necessary rewrites of functions with deprecated > anti-patterns. Agreed: If you want to break compatibility, pushing a new language is the better way than GUC. If you got consensus on this, having both languages side by side supported for a while (maybe 4-5 releases) is they way to go, and finally the only language is frozen and moved to extension. But this is a lot of work and aggravation, are you *sure* you can only get what you want with a full compatibility break? With respect to your company developers specifically? I'm genuinely curious if you've taken a good look at pl/v8 and why you've determined it's not suitable to move forward with. It's got a different set of headaches, but is really fast, and sometimes wonder if with some alternative preprocessing (like coffeescript but geared towards SQL) could have some long term promise. merlin
On Mon, Jan 9, 2017 at 12:37 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
If we're going to create a brand new language then I think it would be extremely foolish to keep *any* of the current pain points around. Off the top of my head:
- variables must have an identifier (what $ in most languages does). The steps you have to go through to avoid simple naming collisions are insane.
This is exactly what we did not want to do with this project. The idea is to create a language which is really close to PL/PgSQL, but removes some of the brain diarrhoea currently present.
Now, this *is* a problem, and the solution we had (well I, mostly, at this point) in mind is to use the underscore prefix for all input variables and make OUT parameters invisible to queries inside function bodies unless explicitly prefixed with OUT. As far as I can tell this eliminates most if not all collisions while staying almost completely compatible with arguably well-written PL/PgSQL 1.
- Support for the notion of a variable being unset (which is NOT the same thing as NULL).
My idea was that the currently unsupported combination of NOT NULL and no DEFAULT would mean "has to be assigned to a non-NULL value before it can be read from, or an exception is thrown". Solves the most common use case and is backwards compatible.
.m
On 01/09/2017 06:12 PM, Merlin Moncure wrote: > With respect to your company developers specifically? I'm genuinely > curious if you've taken a good look at pl/v8 and why you've determined > it's not suitable to move forward with. It's got a different set of > headaches, but is really fast, and sometimes wonder if with some > alternative preprocessing (like coffeescript but geared towards SQL) > could have some long term promise. > Yeah, especially if built against a modern V8, with all or most of the ES6 stuff. Without template strings and lexically scoped variables it's very unpleasant for large functions, but with them it's usable. It's also something a very large number of people are familiar with. As you say it's damn fast. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/9/17 5:30 PM, Marko Tiikkaja wrote: > > This is exactly what we did not want to do with this project. The idea > is to create a language which is really close to PL/PgSQL, but removes > some of the brain diarrhoea currently present. As a general comment, ISTM it'd be much better to do as much as we can in the current language then. It's going to take a LOT to get people to switch to a different language, so there needs to be a LOT of added value. > Now, this *is* a problem, and the solution we had (well I, mostly, at > this point) in mind is to use the underscore prefix for all input > variables and make OUT parameters invisible to queries inside function > bodies unless explicitly prefixed with OUT. As far as I can tell this > eliminates most if not all collisions while staying almost completely > compatible with arguably well-written PL/PgSQL 1. That might be workable... it's still rather ugly though. I don't see prefixing everything with _ as being useful though; people can already do that if they want to uglify the function's argument names. I do think there's stuff that could be done along these lines with namespaces though. Allowing users to rename the namespace that arguments went into would be a huge step forward. I think having a separate namespace for all the automatic variables would be a big help too. Amusingly, that would allow users to set the namespace to '$', which would (almost) give you $variable. > - Support for the notion of a variable being unset (which is NOT the > same thing as NULL). > > > My idea was that the currently unsupported combination of NOT NULL and > no DEFAULT would mean "has to be assigned to a non-NULL value before it > can be read from, or an exception is thrown". Solves the most common > use case and is backwards compatible. That won't allow you to use a variable in multiple places though... is there a reason we couldn't support something like IS DEFINED and UNSET? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On Tue, Jan 10, 2017 at 12:47 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 1/9/17 5:30 PM, Marko Tiikkaja wrote:
My idea was that the currently unsupported combination of NOT NULL and
no DEFAULT would mean "has to be assigned to a non-NULL value before it
can be read from, or an exception is thrown". Solves the most common
use case and is backwards compatible.
That won't allow you to use a variable in multiple places though... is there a reason we couldn't support something like IS DEFINED and UNSET?
I don't understand what your use case is. Could you demonstrate that with some code you'd write if these features were in?
.m
On 1/9/17 5:12 PM, Merlin Moncure wrote: > Agreed: If you want to break compatibility, pushing a new language is > the better way than GUC. If you got consensus on this, having both > languages side by side supported for a while (maybe 4-5 releases) is > they way to go, and finally the only language is frozen and moved to > extension. But this is a lot of work and aggravation, are you *sure* > you can only get what you want with a full compatibility break? FWIW, that work and aggravation part is what I hoped to avoid with GUCs. I do think that whichever route we go, we're going to be stuck supporting the old version for a LONG time. A big part of why standard_conforming_strings was so ugly is users didn't have enough time to adjust. If we'd had that enabled by default for 4-5 releases it wouldn't have been nearly as much of an issue. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On 1/9/17 3:01 AM, Pavel Stehule wrote: > You are forgot on function paramaters - somebody can use a function > argument like FOUND, .. So auto variables should to be declared in most > top namespace. Right, that's why I said it was an alternative. I agree it would be better to just have 2 explicit namespaces: the top one being auto variables and the one below that being function arguments. The namespace below that would be the top-most *user* block. Both of the pre-defined namespaces need the ability to change their name; I don't see any issue with using PRAGMA for that. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On 1/9/17 5:53 PM, Marko Tiikkaja wrote: > My idea was that the currently unsupported combination of NOT > NULL and > no DEFAULT would mean "has to be assigned to a non-NULL value > before it > can be read from, or an exception is thrown". Solves the most > common > use case and is backwards compatible. > > > That won't allow you to use a variable in multiple places though... > is there a reason we couldn't support something like IS DEFINED and > UNSET? > > > I don't understand what your use case is. Could you demonstrate that > with some code you'd write if these features were in? One use case is NEW and OLD in triggers. Checking to see if one or the other is set is easier than checking TG_OP. It's also going to be faster (probably MUCH faster; IIRC the comparison currently happens via SPI). Another case is selecting into a record: EXECUTE ... INTO rec; IF rec IS DEFINED THEN ELSE EXECUTE <something else> INTO rec; IF rec IS DEFINED THEN ... Perhaps DEFINED is not the best keyword. Ultimately I want to know if a variable has been assigned a value, as well as being able to mark a variable as unassigned (though arguably you might not need to be able to un-assign...). -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On Tue, Jan 10, 2017 at 1:03 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 1/9/17 5:53 PM, Marko Tiikkaja wrote:My idea was that the currently unsupported combination of NOT
NULL and
no DEFAULT would mean "has to be assigned to a non-NULL value
before it
can be read from, or an exception is thrown". Solves the most
common
use case and is backwards compatible.
That won't allow you to use a variable in multiple places though...
is there a reason we couldn't support something like IS DEFINED and
UNSET?
I don't understand what your use case is. Could you demonstrate that
with some code you'd write if these features were in?
One use case is NEW and OLD in triggers. Checking to see if one or the other is set is easier than checking TG_OP. It's also going to be faster (probably MUCH faster; IIRC the comparison currently happens via SPI).
This sounds useless.
Another case is selecting into a record:
EXECUTE ... INTO rec;
IF rec IS DEFINED THEN
ELSE
EXECUTE <something else> INTO rec;
IF rec IS DEFINED THEN
And this a workaround for non-functional FOUND.
I can't get excited about this idea based on these examples.
.m
On 1/9/17 6:07 PM, Marko Tiikkaja wrote: > One use case is NEW and OLD in triggers. Checking to see if one or > the other is set is easier than checking TG_OP. It's also going to > be faster (probably MUCH faster; IIRC the comparison currently > happens via SPI). > > > This sounds useless. I guess you've not written much non-trivial trigger code then... the amount of code duplication you end up with is quite ridiculous. It's also a good example of why treating this as an exception and trapping isn't a good solution either: you can already do that with triggers today. Being able to check the existence of a variable is a very common idiom in other languages, so I'm don't see why plpgsql shouldn't have it. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
2017-01-10 2:02 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/9/17 6:07 PM, Marko Tiikkaja wrote:One use case is NEW and OLD in triggers. Checking to see if one or
the other is set is easier than checking TG_OP. It's also going to
be faster (probably MUCH faster; IIRC the comparison currently
happens via SPI).
This sounds useless.
I guess you've not written much non-trivial trigger code then... the amount of code duplication you end up with is quite ridiculous. It's also a good example of why treating this as an exception and trapping isn't a good solution either: you can already do that with triggers today.
Being able to check the existence of a variable is a very common idiom in other languages, so I'm don't see why plpgsql shouldn't have it.
In strongly typed language like PLpgSQL is DEFINE little bit strange. On second hand there are some elements of dynamic languages - record types and polymorphics parameters.
Some languages has reflection API - some other like Oberon some special statements - variable guards that allows safe casting and safe usage - it is not far what we do with Node API.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
On 1/7/17 6:39 AM, Pavel Stehule wrote: > ROW_COUNT .. shortcut for GET DIAGNOSTICS row_count = ROW_COUNT. I don't see the point. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-01-10 5:59 GMT+01:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:
On 1/7/17 6:39 AM, Pavel Stehule wrote:
> ROW_COUNT .. shortcut for GET DIAGNOSTICS row_count = ROW_COUNT.
I don't see the point.
A check how much rows was impacted by query is relative often task. So we can do this task more user friendly.
Second motivation - ROW_COUNT is working for static and for dynamic SQL - it can be partial replace of FOUND variable.
But now, when I am thinking about it - it can be strange for some users too. Pretty often we use implicit LIMIT for query execution. So ROW_COUNT can be probably different than users expecting.
Regards
Pavel
--
Peter Eisentraut http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/10/17 12:06 AM, Pavel Stehule wrote: > A check how much rows was impacted by query is relative often task. So > we can do this task more user friendly. > > Second motivation - ROW_COUNT is working for static and for dynamic SQL > - it can be partial replace of FOUND variable. What is stopping anyone from claiming that their favorite diagnostic item is also a relatively often task and request it to become an automatic variable? Where does it stop? It's not like PL/pgSQL is the king of brevity. Creating inconsistent and arbitrary warts to save a few characters does not appear appealing. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-01-10 14:26 GMT+01:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:
On 1/10/17 12:06 AM, Pavel Stehule wrote:
> A check how much rows was impacted by query is relative often task. So
> we can do this task more user friendly.
>
> Second motivation - ROW_COUNT is working for static and for dynamic SQL
> - it can be partial replace of FOUND variable.
What is stopping anyone from claiming that their favorite diagnostic
item is also a relatively often task and request it to become an
automatic variable? Where does it stop?
There is only two possible fields - ROW_COUNT and RESULT_OID. Result Oid is not almost unused today. So stop is ROW_COUNT
It's not like PL/pgSQL is the king of brevity. Creating inconsistent
and arbitrary warts to save a few characters does not appear appealing.
yes
Regards
Pavel
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jan 10, 2017 at 2:26 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
It's not like PL/pgSQL is the king of brevity.
This is essentially saying "PL/PgSQL isn't perfect, so we shouldn't try and make it better". I hear this argument a lot, and as long as people keep rejecting improvements for this reason they can keep saying it. It's a self-fulfilling prophecy.
.m
On Tue, Jan 10, 2017 at 7:44 AM, Marko Tiikkaja <marko@joh.to> wrote: > On Tue, Jan 10, 2017 at 2:26 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> >> It's not like PL/pgSQL is the king of brevity. > > > This is essentially saying "PL/PgSQL isn't perfect, so we shouldn't try and > make it better". I hear this argument a lot, and as long as people keep > rejecting improvements for this reason they can keep saying it. It's a > self-fulfilling prophecy. Agreed. But adding language features, especially syntactical ones, demands prudence; there is good reason to limit keywords like that. What about: pgsql.rows pgsql.found pgsql.sqlerrm etc as automatic variables (I think this was suggested upthread). Conflicts with existing structures is of course an issue but I bet it could be worked out. I also kinda disagree on the brevity point, or at least would like to add some color. SQL is verbose in the sense of "let's make everything an english language sentence" but incredibly terse relative to other language implementations of the same task. Embedded SQL tends to be uniformly clumsy due to all of the extra handling of errrors, parameterization, etc. This is why we write plpgsql naturally. merlin
2017-01-11 15:37 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>:
On Tue, Jan 10, 2017 at 7:44 AM, Marko Tiikkaja <marko@joh.to> wrote:
> On Tue, Jan 10, 2017 at 2:26 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>>
>> It's not like PL/pgSQL is the king of brevity.
>
>
> This is essentially saying "PL/PgSQL isn't perfect, so we shouldn't try and
> make it better". I hear this argument a lot, and as long as people keep
> rejecting improvements for this reason they can keep saying it. It's a
> self-fulfilling prophecy.
Agreed. But adding language features, especially syntactical ones,
demands prudence; there is good reason to limit keywords like that.
What about:
pgsql.rows
pgsql.found
pgsql.sqlerrm
etc
as automatic variables (I think this was suggested upthread).
Conflicts with existing structures is of course an issue but I bet it
could be worked out.
Any implicit namespace can be problem. But we can continue in default unlabeled namespace for auto variables with possibility to specify this namespace explicitly.
Regards
Pavel
On 1/10/17 8:44 AM, Marko Tiikkaja wrote: > On Tue, Jan 10, 2017 at 2:26 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com > <mailto:peter.eisentraut@2ndquadrant.com>> wrote: > > It's not like PL/pgSQL is the king of brevity. > > > This is essentially saying "PL/PgSQL isn't perfect, so we shouldn't try > and make it better". I hear this argument a lot, and as long as people > keep rejecting improvements for this reason they can keep saying it. > It's a self-fulfilling prophecy. I'm not making that argument. But if the plan here is that PL/pgSQL is too verbose, let's make it less verbose, then maybe, but let's see a more complete plan for that. The current syntax was chosen because it is SQL-compatible. Adding redundant syntax to save a few characters without any new functionality (performance, resource usage, safety, etc.) is a weak argument in the overall scheme of things. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jan 11, 2017 at 11:11 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > The current syntax was chosen because it is SQL-compatible. Adding > redundant syntax to save a few characters without any new functionality > (performance, resource usage, safety, etc.) is a weak argument in the > overall scheme of things. Yeah -- exactly. The few minor things that are not 100% SQL compatible I find to be major headaches. Incompatible usage of INTO for example. This thread has been going on for quite some time now and is starting to become somewhat circular. Perhaps we ought to organize the various ideas and pain points presented in a wiki along with conclusions, and in some cases if there is no solution that is compatible with the current syntax. merlin
2017-01-11 20:53 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>:
On Wed, Jan 11, 2017 at 11:11 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> The current syntax was chosen because it is SQL-compatible. Adding
> redundant syntax to save a few characters without any new functionality
> (performance, resource usage, safety, etc.) is a weak argument in the
> overall scheme of things.
Yeah -- exactly. The few minor things that are not 100% SQL
compatible I find to be major headaches. Incompatible usage of INTO
for example.
We not designed INTO usage in plpgsql - it is PL/SQL heritage.
PL/SQL = ADA + Oracle SQL; -- but sometimes the result is not perfect - Ada was not designed be integrated with SQL
This thread has been going on for quite some time now and is starting
to become somewhat circular. Perhaps we ought to organize the
various ideas and pain points presented in a wiki along with
conclusions, and in some cases if there is no solution that is
compatible with the current syntax.
There is a language that is much better integrated with SQL - SQL/PSM
http://postgres.cz/wiki/SQL/PSM_Manual
http://postgres.cz/wiki/SQL/PSM_Manual
It is less verbose, but still verbose language. It is static typed language - so it can be bad for some people.
But due design based on SQL integration from base, there is less conflicts between SQL and PL.
Regards
Pavel
merlin
On Mon, Jan 9, 2017 at 6:53 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > I do think that whichever route we go, we're going to be stuck supporting > the old version for a LONG time. A big part of why > standard_conforming_strings was so ugly is users didn't have enough time to > adjust. If we'd had that enabled by default for 4-5 releases it wouldn't > have been nearly as much of an issue. /me boggles. I think you are confused about the history here. standard_conforming_strings had a generously long phase-in period. - The E'' syntax and the standard_conforming_strings GUC were added in PostgreSQL 8.0. The only legal value of standard_conforming_strings was "false". - In PostgreSQL 8.1, it became possible to set standard_conforming_strings to "true", but the default was still "false". - In PostgreSQL 9.1, the default was changed to "true". So there 6 major release from the time the GUC was added and 5 from the time it became mutable before the default was flipped. We've now had 5 more since the default was changed to "true". (No, it's not time to remove the GUC yet. At least not in my opinion.) One thing that made changing standard_conforming_strings particularly painful was that it had knock-on effects on many language-specific drivers not maintained by the core project (or just plain not maintained). I don't think the language changes being proposed here for PL/pgsql would have the same kind of impact, but some of them would make it significantly harder to migrate to PostgreSQL from Oracle, which some people might see as an anti-goal (as per other nearby threads on making that easier). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jan 11, 2017 at 2:57 PM, Robert Haas <robertmhaas@gmail.com> wrote: > - The E'' syntax and the standard_conforming_strings GUC were added in > PostgreSQL 8.0. The only legal value of standard_conforming_strings > was "false". > > - In PostgreSQL 8.1, it became possible to set > standard_conforming_strings to "true", but the default was still > "false". > > - In PostgreSQL 9.1, the default was changed to "true". > > So there 6 major release from the time the GUC was added and 5 from > the time it became mutable before the default was flipped. We've now > had 5 more since the default was changed to "true". (No, it's not > time to remove the GUC yet. At least not in my opinion.) > > One thing that made changing standard_conforming_strings particularly > painful was that it had knock-on effects on many language-specific > drivers not maintained by the core project (or just plain not > maintained). I don't think the language changes being proposed here > for PL/pgsql would have the same kind of impact, but some of them > would make it significantly harder to migrate to PostgreSQL from > Oracle, which some people might see as an anti-goal (as per other > nearby threads on making that easier). I don't think it's a simple matter of waiting N or N+M releases (although I certainly did appreciate that we did it regardless). It comes down to this: there's just no way to release changes that break a lot of code without breaking a lot of code. It's all about acknowledging that and judging it acceptable against the benefits you get. For posterity, with respect to conforming strings, SQL injection is an absolute scourge of the computing world so on balance we did the right thing. Having said that, It's always good to do the math and the calculation is primarily an economic one, I think, merlin
On 1/11/17 12:07 PM, Pavel Stehule wrote: > PL/SQL = ADA + Oracle SQL; -- but sometimes the result is not perfect - > Ada was not designed be integrated with SQL ... > There is a language that is much better integrated with SQL - SQL/PSM I think it is worth considering ways to increase compatibility with plsql, as well as pulling PSM into core. The former would be to help migrating from Oracle; the latter would be to provide everyone a cleaner built-in PL. (IMHO a PLSQL equivalent could certainly be an external extension). -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)