Thread: plpgsql.consistent_into
Greetings fellow elephants, I would humbly like to submit for your consideration my proposal for alleviating pain caused by one of the most annoying footguns in PL/PgSQL: the behaviour of SELECT .. INTO when the query returns more than one row. Some of you might know that no exception is raised in this case (as opposed to INSERT/UPDATE/DELETE .. INTO, all of them yielding TOO_MANY_ROWS), which can hide subtle bugs in queries if during testing the query always returns only one row or the "correct" one happens to be picked up every time. Additionally, the row_count() after execution is always going to be either 0 or 1, so even if you want to explicitly guard against potentially broken queries, you can't do so! So I added the following compile-time option: set plpgsql.consistent_into to true; create or replace function footest() returns void as $$ declare x int; begin -- too many rows select 1 from foo into x; end$$ language plpgsql; select footest(); ERROR: query returned more than one row It defaults to false to preserve full backwards compatibility. Also turning it on makes the executor try and find two rows, so it might have an effect on performance as well. The patch, as currently written, also changes the behaviour of EXECUTE .. INTO, but I don't feel strongly about whether that should be affected as well or not. Regards, Marko Tiikkaja
Attachment
Hello
2014/1/12 Marko Tiikkaja <marko@joh.to>
Greetings fellow elephants,
I would humbly like to submit for your consideration my proposal for alleviating pain caused by one of the most annoying footguns in PL/PgSQL: the behaviour of SELECT .. INTO when the query returns more than one row. Some of you might know that no exception is raised in this case (as opposed to INSERT/UPDATE/DELETE .. INTO, all of them yielding TOO_MANY_ROWS), which can hide subtle bugs in queries if during testing the query always returns only one row or the "correct" one happens to be picked up every time. Additionally, the row_count() after execution is always going to be either 0 or 1, so even if you want to explicitly guard against potentially broken queries, you can't do so!
It is not bad and, sure, - it is very useful and important
but - it is a redundant to INTO STRICT clause. When you use it, then you change a INTO behaviour. Is not better to ensure STRICT option than hidden redefining INTO?
Option INTO (without STRICT clause) is not safe and we should to disallow. I see a three states (not only two)
a) disallow INTO without STRICT (as preferred for new code)
b) implicit check after every INTO without STRICT
c) without check
these modes should be: "strict_required", "strict_default", "strict_legacy"
So I added the following compile-time option:
set plpgsql.consistent_into to true;
This name is not best (there is not clean with it a into should be consistent)
Is question, if this functionality should be enabled by GUC to be used for legacy code (as protection against some kind of hidden bugs)
This topic is interesting idea for me - some checks can be pushed to plpgsql_check (as errors or warnings) too.
Generally I like proposed functionality, just I am not sure, so hidden redefining INTO clause (to INTO STRICT) is what we want. We can do it (but explicitly). I don't know any situation where INTO without STRICT is valid. Introduction of STRICT option was wrong idea - and now is not way to back.
Regards
Pavel
create or replace function footest() returns void as $$
declare
x int;
begin
-- too many rows
select 1 from foo into x;
end$$ language plpgsql;
select footest();
ERROR: query returned more than one row
It defaults to false to preserve full backwards compatibility. Also turning it on makes the executor try and find two rows, so it might have an effect on performance as well. The patch, as currently written, also changes the behaviour of EXECUTE .. INTO, but I don't feel strongly about whether that should be affected as well or not.
Regards,
Marko Tiikkaja
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 1/12/14, 7:47 AM, Pavel Stehule wrote: > 2014/1/12 Marko Tiikkaja <marko@joh.to> > >> Greetings fellow elephants, >> >> I would humbly like to submit for your consideration my proposal for >> alleviating pain caused by one of the most annoying footguns in PL/PgSQL: >> the behaviour of SELECT .. INTO when the query returns more than one row. >> Some of you might know that no exception is raised in this case (as >> opposed to INSERT/UPDATE/DELETE .. INTO, all of them yielding >> TOO_MANY_ROWS), which can hide subtle bugs in queries if during testing the >> query always returns only one row or the "correct" one happens to be picked >> up every time. Additionally, the row_count() after execution is always >> going to be either 0 or 1, so even if you want to explicitly guard against >> potentially broken queries, you can't do so! >> > > It is not bad and, sure, - it is very useful and important > > but - it is a redundant to INTO STRICT clause. When you use it, then you > change a INTO behaviour. Is not better to ensure STRICT option than hidden > redefining INTO? That only works if the query should never return 0 rows either. If you want to allow for missing rows, STRICT is out of the question. > Option INTO (without STRICT clause) is not safe and we should to disallow. > I see a three states (not only two) > > a) disallow INTO without STRICT (as preferred for new code) > b) implicit check after every INTO without STRICT > c) without check > > these modes should be: "strict_required", "strict_default", "strict_legacy" I can't get excited about this. Mostly because it doesn't solve the problem I'm having. It is important to be able to execute queries with INTO which might not return a row. That's what FOUND is for. >> So I added the following compile-time option: >> >> >> set plpgsql.consistent_into to true; >> > > This name is not best (there is not clean with it a into should be > consistent) I agree, but I had to pick something. One of the three hard problems in CS.. > Is question, if this functionality should be enabled by GUC to be used for > legacy code (as protection against some kind of hidden bugs) > > This topic is interesting idea for me - some checks can be pushed to > plpgsql_check (as errors or warnings) too. > > Generally I like proposed functionality, just I am not sure, so hidden > redefining INTO clause (to INTO STRICT) is what we want. We can do it (but > explicitly). I don't know any situation where INTO without STRICT is valid. > Introduction of STRICT option was wrong idea - and now is not way to back. Note that this is different from implicitly STRICTifying every INTO, like I said above. Regards, Marko Tiikkaja
2014/1/12 Marko Tiikkaja <marko@joh.to>
On 1/12/14, 7:47 AM, Pavel Stehule wrote:That only works if the query should never return 0 rows either. If you want to allow for missing rows, STRICT is out of the question.2014/1/12 Marko Tiikkaja <marko@joh.to>Greetings fellow elephants,
I would humbly like to submit for your consideration my proposal for
alleviating pain caused by one of the most annoying footguns in PL/PgSQL:
the behaviour of SELECT .. INTO when the query returns more than one row.
Some of you might know that no exception is raised in this case (as
opposed to INSERT/UPDATE/DELETE .. INTO, all of them yielding
TOO_MANY_ROWS), which can hide subtle bugs in queries if during testing the
query always returns only one row or the "correct" one happens to be picked
up every time. Additionally, the row_count() after execution is always
going to be either 0 or 1, so even if you want to explicitly guard against
potentially broken queries, you can't do so!
It is not bad and, sure, - it is very useful and important
but - it is a redundant to INTO STRICT clause. When you use it, then you
change a INTO behaviour. Is not better to ensure STRICT option than hidden
redefining INTO?
hmm - you have true.
try to find better name.
Other questions is using a GUC for legacy code. I am for this checked mode be default (or can be simply activated for new code)
Regards
Pavel
I can't get excited about this. Mostly because it doesn't solve the problem I'm having.Option INTO (without STRICT clause) is not safe and we should to disallow.
I see a three states (not only two)
a) disallow INTO without STRICT (as preferred for new code)
b) implicit check after every INTO without STRICT
c) without check
these modes should be: "strict_required", "strict_default", "strict_legacy"
It is important to be able to execute queries with INTO which might not return a row. That's what FOUND is for.I agree, but I had to pick something. One of the three hard problems in CS..So I added the following compile-time option:
set plpgsql.consistent_into to true;
This name is not best (there is not clean with it a into should be
consistent)Note that this is different from implicitly STRICTifying every INTO, like I said above.Is question, if this functionality should be enabled by GUC to be used for
legacy code (as protection against some kind of hidden bugs)
This topic is interesting idea for me - some checks can be pushed to
plpgsql_check (as errors or warnings) too.
Generally I like proposed functionality, just I am not sure, so hidden
redefining INTO clause (to INTO STRICT) is what we want. We can do it (but
explicitly). I don't know any situation where INTO without STRICT is valid.
Introduction of STRICT option was wrong idea - and now is not way to back.
Regards,
Marko Tiikkaja
On Jan12, 2014, at 06:51 , Marko Tiikkaja <marko@joh.to> wrote: > I would humbly like to submit for your consideration my proposal for alleviating pain caused by one of the most annoyingfootguns in PL/PgSQL: the behaviour of SELECT .. INTO when the query returns more than one row. Some of you mightknow that no exception is raised in this case (as opposed to INSERT/UPDATE/DELETE .. INTO, all of them yielding TOO_MANY_ROWS),which can hide subtle bugs in queries if during testing the query always returns only one row or the "correct"one happens to be picked up every time. Additionally, the row_count() after execution is always going to be either0 or 1, so even if you want to explicitly guard against potentially broken queries, you can't do so! > > So I added the following compile-time option: > > set plpgsql.consistent_into to true; I don't think a GUC is the best way to handle this. Handling this via a per-function setting similar to #variable_conflict would IMHO be better.So a function containing #into_surplus_rows error would complain whereas #into_surplus_rows ignore_for_select would leave the behaviour unchanged. best regards, Florian Pflug
On 1/12/14, 10:19 PM, Florian Pflug wrote: > On Jan12, 2014, at 06:51 , Marko Tiikkaja <marko@joh.to> wrote: >> >> set plpgsql.consistent_into to true; > > I don't think a GUC is the best way to handle this. Handling > this via a per-function setting similar to #variable_conflict would > IMHO be better. This is exactly what the patch does. A global default in the form of a GUC, and then a per-function option for overriding that default. Regards, Marko Tiikkaja
2014/1/12 Florian Pflug <fgp@phlo.org>
There is GUC for variable_conflict already too. In this case I would to enable this functionality everywhere (it is tool how to simply eliminate some kind of strange bugs) so it needs a GUC.
On Jan12, 2014, at 06:51 , Marko Tiikkaja <marko@joh.to> wrote:I don't think a GUC is the best way to handle this. Handling
> I would humbly like to submit for your consideration my proposal for alleviating pain caused by one of the most annoying footguns in PL/PgSQL: the behaviour of SELECT .. INTO when the query returns more than one row. Some of you might know that no exception is raised in this case (as opposed to INSERT/UPDATE/DELETE .. INTO, all of them yielding TOO_MANY_ROWS), which can hide subtle bugs in queries if during testing the query always returns only one row or the "correct" one happens to be picked up every time. Additionally, the row_count() after execution is always going to be either 0 or 1, so even if you want to explicitly guard against potentially broken queries, you can't do so!
>
> So I added the following compile-time option:
>
> set plpgsql.consistent_into to true;
this via a per-function setting similar to #variable_conflict would
IMHO be better.So a function containing
#into_surplus_rows error
would complain whereas
#into_surplus_rows ignore_for_select
would leave the behaviour unchanged.
There is GUC for variable_conflict already too. In this case I would to enable this functionality everywhere (it is tool how to simply eliminate some kind of strange bugs) so it needs a GUC.
We have GUC for plpgsql.variable_conflict three years and I don't know about any problem.
Regards
Pavel
best regards,
Florian Pflug
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Jan12, 2014, at 22:37 , Pavel Stehule <pavel.stehule@gmail.com> wrote: > There is GUC for variable_conflict already too. In this case I would to > enable this functionality everywhere (it is tool how to simply eliminate > some kind of strange bugs) so it needs a GUC. > > We have GUC for plpgsql.variable_conflict three years and I don't know > about any problem. I must say I hate behaviour-changing GUCs with quite some passion. IMHO they tend to cause bugs, not avoid them, in the long run. The pattern usually is 1) Code gets written, depends on some particular set of settings to work correctly 2) Code gets reused, with little further testing since it's supposed to be battle-proven anyway. Settings get dropped. 3) Code blows up for those corner-cases where the setting actually matter. Debugging is hell, because you effectivelyhave to go over the code line-by-line and check if it might be affected by some GUC or another. Only a few days ago I spent more than an hour tracking down a bug which, as it turned out, was caused by a regex which subtly changed its meaning depending on whether standard_conforming_strings is on or off. Some GUCs are unavoidable - standard_conforming_strings, for example probably still was a good idea, since the alternative would have been to stick with the historical, non-standard behaviour forever. But in this case, my feeling is that the trouble such a GUC may cause out-weights the potential benefits. I'm all for having a directive like #consistent_into (though I feel that the name could convey the meaning better). If we *really* think that this ought to be the default from 9.4 onward, then we should *) Change it to always complain, except if the function explictly specifies "#consistent_into on" or whatever. *) Have pg_dump add that to all plpgsql functions if the server version is < 9.4 or whatever major release this endsup in That's all just my opinion of course. best regards, Florian Pflug
On 13/01/14 11:44, Florian Pflug wrote: > On Jan12, 2014, at 22:37 , Pavel Stehule <pavel.stehule@gmail.com> wrote: >> There is GUC for variable_conflict already too. In this case I would to >> enable this functionality everywhere (it is tool how to simply eliminate >> some kind of strange bugs) so it needs a GUC. >> >> We have GUC for plpgsql.variable_conflict three years and I don't know >> about any problem. > I must say I hate behaviour-changing GUCs with quite some passion. IMHO > they tend to cause bugs, not avoid them, in the long run. The pattern > usually is > > 1) Code gets written, depends on some particular set of settings > to work correctly > > 2) Code gets reused, with little further testing since it's supposed > to be battle-proven anyway. Settings get dropped. > > 3) Code blows up for those corner-cases where the setting actually > matter. Debugging is hell, because you effectively have to go > over the code line-by-line and check if it might be affected by > some GUC or another. > > Only a few days ago I spent more than an hour tracking down a bug > which, as it turned out, was caused by a regex which subtly changed its > meaning depending on whether standard_conforming_strings is on or off. > > Some GUCs are unavoidable - standard_conforming_strings, for example > probably still was a good idea, since the alternative would have been > to stick with the historical, non-standard behaviour forever. > > But in this case, my feeling is that the trouble such a GUC may cause > out-weights the potential benefits. I'm all for having a directive like > #consistent_into (though I feel that the name could convey the > meaning better). If we *really* think that this ought to be the default > from 9.4 onward, then we should > > *) Change it to always complain, except if the function explictly > specifies "#consistent_into on" or whatever. > > *) Have pg_dump add that to all plpgsql functions if the server > version is < 9.4 or whatever major release this ends up in > > That's all just my opinion of course. > > best regards, > Florian Pflug > > > Possibly there should be a warning put out, whenever someone uses some behaviour that requires a GUC set to a non-default value? Cheers, Gavin
2014/1/12 Florian Pflug <fgp@phlo.org>
On Jan12, 2014, at 22:37 , Pavel Stehule <pavel.stehule@gmail.com> wrote:I must say I hate behaviour-changing GUCs with quite some passion. IMHO
> There is GUC for variable_conflict already too. In this case I would to
> enable this functionality everywhere (it is tool how to simply eliminate
> some kind of strange bugs) so it needs a GUC.
>
> We have GUC for plpgsql.variable_conflict three years and I don't know
> about any problem.
they tend to cause bugs, not avoid them, in the long run. The pattern
usually is
1) Code gets written, depends on some particular set of settings
to work correctly
2) Code gets reused, with little further testing since it's supposed
to be battle-proven anyway. Settings get dropped.
3) Code blows up for those corner-cases where the setting actually
matter. Debugging is hell, because you effectively have to go
over the code line-by-line and check if it might be affected by
some GUC or another.
Only a few days ago I spent more than an hour tracking down a bug
which, as it turned out, was caused by a regex which subtly changed its
meaning depending on whether standard_conforming_strings is on or off.
Some GUCs are unavoidable - standard_conforming_strings, for example
probably still was a good idea, since the alternative would have been
to stick with the historical, non-standard behaviour forever.
But in this case, my feeling is that the trouble such a GUC may cause
out-weights the potential benefits. I'm all for having a directive like
#consistent_into (though I feel that the name could convey the
meaning better). If we *really* think that this ought to be the default
from 9.4 onward, then we should
*) Change it to always complain, except if the function explictly
specifies "#consistent_into on" or whatever.
*) Have pg_dump add that to all plpgsql functions if the server
version is < 9.4 or whatever major release this ends up in
I disagree - automatic code injection can is strange. Is not problem inject code from simple DO statement, but I dislike append lines to source code without any specific user request.
Maybe this problem with GUC can be solved in 9.4. Some specific GUC can be ported with database.
Pavel
That's all just my opinion of course.
best regards,
Florian Pflug
2014/1/13 Gavin Flower <GavinFlower@archidevsys.co.nz>
Possibly there should be a warning put out, whenever someone uses some behaviour that requires a GUC set to a non-default value?On 13/01/14 11:44, Florian Pflug wrote:On Jan12, 2014, at 22:37 , Pavel Stehule <pavel.stehule@gmail.com> wrote:There is GUC for variable_conflict already too. In this case I would toI must say I hate behaviour-changing GUCs with quite some passion. IMHO
enable this functionality everywhere (it is tool how to simply eliminate
some kind of strange bugs) so it needs a GUC.
We have GUC for plpgsql.variable_conflict three years and I don't know
about any problem.
they tend to cause bugs, not avoid them, in the long run. The pattern
usually is
1) Code gets written, depends on some particular set of settings
to work correctly
2) Code gets reused, with little further testing since it's supposed
to be battle-proven anyway. Settings get dropped.
3) Code blows up for those corner-cases where the setting actually
matter. Debugging is hell, because you effectively have to go
over the code line-by-line and check if it might be affected by
some GUC or another.
Only a few days ago I spent more than an hour tracking down a bug
which, as it turned out, was caused by a regex which subtly changed its
meaning depending on whether standard_conforming_strings is on or off.
Some GUCs are unavoidable - standard_conforming_strings, for example
probably still was a good idea, since the alternative would have been
to stick with the historical, non-standard behaviour forever.
But in this case, my feeling is that the trouble such a GUC may cause
out-weights the potential benefits. I'm all for having a directive like
#consistent_into (though I feel that the name could convey the
meaning better). If we *really* think that this ought to be the default
from 9.4 onward, then we should
*) Change it to always complain, except if the function explictly
specifies "#consistent_into on" or whatever.
*) Have pg_dump add that to all plpgsql functions if the server
version is < 9.4 or whatever major release this ends up in
That's all just my opinion of course.
best regards,
Florian Pflug
It is a good idea. I though about it. A worry about GUC are legitimate, but we are most static and sometimes we try to design bigger creatures, than we try to solve.
I am thinking so dump can contain a serialized GUC values, and can raises warnings when GUC are different (not only different from default).
Similar problems are with different FROM_COLAPS_LIMIT, JOIN_COLAPS_LIMIT, WORK_MEM, ...
Regards
Pavel
Pavel
Cheers,
Gavin
2014/1/12 Florian Pflug <fgp@phlo.org>
On Jan12, 2014, at 22:37 , Pavel Stehule <pavel.stehule@gmail.com> wrote:I must say I hate behaviour-changing GUCs with quite some passion. IMHO
> There is GUC for variable_conflict already too. In this case I would to
> enable this functionality everywhere (it is tool how to simply eliminate
> some kind of strange bugs) so it needs a GUC.
>
> We have GUC for plpgsql.variable_conflict three years and I don't know
> about any problem.
they tend to cause bugs, not avoid them, in the long run. The pattern
usually is
1) Code gets written, depends on some particular set of settings
to work correctly
2) Code gets reused, with little further testing since it's supposed
to be battle-proven anyway. Settings get dropped.
3) Code blows up for those corner-cases where the setting actually
matter. Debugging is hell, because you effectively have to go
over the code line-by-line and check if it might be affected by
some GUC or another.
Only a few days ago I spent more than an hour tracking down a bug
which, as it turned out, was caused by a regex which subtly changed its
meaning depending on whether standard_conforming_strings is on or off.
Some GUCs are unavoidable - standard_conforming_strings, for example
probably still was a good idea, since the alternative would have been
to stick with the historical, non-standard behaviour forever.
But in this case, my feeling is that the trouble such a GUC may cause
out-weights the potential benefits. I'm all for having a directive like
#consistent_into (though I feel that the name could convey the
meaning better). If we *really* think that this ought to be the default
from 9.4 onward, then we should
*) Change it to always complain, except if the function explictly
specifies "#consistent_into on" or whatever.
*) Have pg_dump add that to all plpgsql functions if the server
version is < 9.4 or whatever major release this ends up in
That's all just my opinion of course.
I am thinking so GUC and plpgsql option can live together. If you like to accent a some behave, then you can use a plpgsql option. On second hand, I would to use a some functionality, that is safe, but I don't would to dirty source code by using repeated options. But I have to check (and calculate with risk) a GUC settings.
One idea: required GUC? Can be nice a possibility to ensure some GUC setting, and restore ensure these values or raises warning.
Back to main topic. Required and described feature doesn't change a behave of INTO clause. I can enable or disable this functionality and well written code should to work without change (and problems). When check is disabled, then execution is just less safe. So in this case, a impact of GUC is significantly less than by you described issues. Does know anybody a use case where this check should be disabled?
Probably we have a different experience about GUC. I had a problem with standard_conforming_strings and bytea format some years ago. Now I prepare document about required setting. But I can see (from my experience from Czech area) more often problems related to effective_cache_size or from_collapse_limit and similar GUC. These parameters are behind knowledge (and visibility) typical user.
Best regards
Pavel
best regards,
Florian Pflug
On 1/13/14, 1:44 AM, Pavel Stehule wrote: > > > > 2014/1/12 Florian Pflug <fgp@phlo.org <mailto:fgp@phlo.org>> > > On Jan12, 2014, at 22:37 , Pavel Stehule <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>> wrote: > > There is GUC for variable_conflict already too. In this case I would to > > enable this functionality everywhere (it is tool how to simply eliminate > > some kind of strange bugs) so it needs a GUC. > > > > We have GUC for plpgsql.variable_conflict three years and I don't know > > about any problem. > > I must say I hate behaviour-changing GUCs with quite some passion. IMHO > they tend to cause bugs, not avoid them, in the long run. The pattern > usually is > > 1) Code gets written, depends on some particular set of settings > to work correctly > > 2) Code gets reused, with little further testing since it's supposed > to be battle-proven anyway. Settings get dropped. > > 3) Code blows up for those corner-cases where the setting actually > matter. Debugging is hell, because you effectively have to go > over the code line-by-line and check if it might be affected by > some GUC or another. > > Only a few days ago I spent more than an hour tracking down a bug > which, as it turned out, was caused by a regex which subtly changed its > meaning depending on whether standard_conforming_strings is on or off. > > Some GUCs are unavoidable - standard_conforming_strings, for example > probably still was a good idea, since the alternative would have been > to stick with the historical, non-standard behaviour forever. > > But in this case, my feeling is that the trouble such a GUC may cause > out-weights the potential benefits. I'm all for having a directive like > #consistent_into (though I feel that the name could convey the > meaning better). If we *really* think that this ought to be the default > from 9.4 onward, then we should > > *) Change it to always complain, except if the function explictly > specifies "#consistent_into on" or whatever. > > *) Have pg_dump add that to all plpgsql functions if the server > version is < 9.4 or whatever major release this ends up in > > That's all just my opinion of course. > > > I am thinking so GUC and plpgsql option can live together. If you like to accent a some behave, then you can use a plpgsqloption. On second hand, I would to use a some functionality, that is safe, but I don't would to dirty source codeby using repeated options. But I have to check (and calculate with risk) a GUC settings. > > One idea: required GUC? Can be nice a possibility to ensure some GUC setting, and restore ensure these values or raiseswarning. > > Back to main topic. Required and described feature doesn't change a behave of INTO clause. I can enable or disable thisfunctionality and well written code should to work without change (and problems). When check is disabled, then executionis just less safe. So in this case, a impact of GUC is significantly less than by you described issues. Does knowanybody a use case where this check should be disabled? > > Probably we have a different experience about GUC. I had a problem with standard_conforming_strings and bytea format someyears ago. Now I prepare document about required setting. But I can see (from my experience from Czech area) more often problems related to effective_cache_size or from_collapse_limit and similar GUC. These parameters are behind knowledge(and visibility) typical user. ISTM that in this case, it should be safe to make the new default behavior STRICT; if you forget to set the GUC to disablethan you'll get an error that points directly at the problem, at which point you'll go "Oh, yeah... I forgot to setX..." Outside of the GUC, I believe the default should definitely be STRICT. If your app is relying on non-strict then you needto be made aware of that. We should be able to provide a DO block that will change this setting for every function you'vegot if someone isn't happy with STRICT mode. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Jan13, 2014, at 22:49 , Jim Nasby <jim@nasby.net> wrote: > ISTM that in this case, it should be safe to make the new default behavior STRICT; > if you forget to set the GUC to disable than you'll get an error that points directly > at the problem, at which point you'll go "Oh, yeah... I forgot to set X..." What do you mean by STRICT? STRICT (which we already support) complains if the query doesn't return *exactly* one row. What Marko wants is to raise an error for a plain SELECT ... INTO if more than one row is returned, but to still convert zero rows to NULL. > Outside of the GUC, I believe the default should definitely be STRICT. If your app is > relying on non-strict then you need to be made aware of that. We should be able to > provide a DO block that will change this setting for every function you've got if > someone isn't happy with STRICT mode. If you mean that we should change SELECT ... INTO to always behave as if STRICT had been specified - why on earth would we want to do that? That would break perfectly fine code for no good reason whatsoever. In fact, after reading the documentation on SELECT ... INTO, I'm convinced the the whole consistent_into thing is a bad idea. The documentation states clearly that For INSERT/UPDATE/DELETE with RETURNING, PL/pgSQL reports an error for more than one returned row, even when STRICT is notspecified. This is because there is no option such as ORDER BY with which to determine which affected row should be returned. It therefor isn't an oversight that SELECT ... INTO allows multiple result rows but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and for a reason. We shouldn't be second-guessing ourselves by changing that later - not, at least, unless we have a *very* good reason for it. Which, AFAICS, we don't. (And yeah, personally I'd prefer if we'd complain about multiple rows. But it's IMHO just too late for that) best regards, Florian Pflug
On 1/14/14, 12:41 AM, Florian Pflug wrote: > In fact, after reading the documentation on SELECT ... INTO, I'm convinced the > the whole consistent_into thing is a bad idea. The documentation states clearly > that > > For INSERT/UPDATE/DELETE with RETURNING, PL/pgSQL reports an error for more than > one returned row, even when STRICT is not specified. This is because there is no > option such as ORDER BY with which to determine which affected row should be > returned. > > It therefor isn't an oversight that SELECT ... INTO allows multiple result rows > but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and > for a reason. Yeah, it does state that. But it's a BS reason. In addition to ORDER BY, SELECT also has a LIMIT which you can use to get the "first row" behaviour. There's no way to go to the more sane behaviour from what we have right now. When I've worked with PL/PgSQL, this has been a source of a few bugs that would have been noticed during testing if the behaviour of INTO wasn't as dangerous as it is right now. Yes, it breaks backwards compatibility, but that's why there's a nice GUC. If we're not going to scrap PL/PgSQL and start over again, we are going to have to do changes like this to make the language better. Also I think that out of all the things we could do to break backwards compatibility, this is closer to "harmless" than "a pain in the butt". Regards, Marko Tiikkaja
On 01/13/2014 03:41 PM, Florian Pflug wrote: > It therefor isn't an oversight that SELECT ... INTO allows multiple result rows > but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and > for a reason. We shouldn't be second-guessing ourselves by changing that later - > not, at least, unless we have a *very* good reason for it. Which, AFAICS, we don't. > > (And yeah, personally I'd prefer if we'd complain about multiple rows. But it's > IMHO just too late for that) I *really* don't want to go through all my old code to find places where I used SELECT ... INTO just to pop off the first row, and ignored the rest. I doubt anyone else does, either. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Jan14, 2014, at 00:52 , Marko Tiikkaja <marko@joh.to> wrote: > When I've worked with PL/PgSQL, this has been a source of a few bugs that > would have been noticed during testing if the behaviour of INTO wasn't as > dangerous as it is right now. The question is, how many bugs stemmed from wrong SQL queries, and what percentage of those would have been caught by this? The way I see it, there are thousands of ways to screw up a query, and having it return multiple rows instead of one is just one of them. > Yes, it breaks backwards compatibility, but that's why there's a nice GUC. Which doesn't help, because the GUC isn't tied to the code. This *adds* an error case, not remove one - now, instead of getting your code correct, you *also* have to get the GUC correct. If you even *know* that such a GUC exists. > If we're not going to scrap PL/PgSQL and > start over again, we are going to have to do changes like this to make the > language better. Also I think that out of all the things we could do to > break backwards compatibility, this is closer to "harmless" than "a pain > in the butt". I very strongly believe that languages don't get better by adding a thousand little knobs which subtly change semantics. Look at the mess that is PHP - we absolutely, certainly don't want to go there. The most important rule in language design is in my opinion "stick with your choices". C, C++ and JAVA all seem to follow this, and it's one of the reasons these languages are popular for big projects, I think. The way I see it, the only OK way to change existing behaviour is to have the concept of a "language version", and force code to indicate the language version it expects. The important thing is that the language version is an attribute of code, not some global setting that you can change without ever looking at the code it'd affect. So if we really want to change this, I think we need to have a LANGUAGE_VERSION attribute on functions. Each time a major postgres release changes the behaviour of one of the procedural languages, we'd increment that language's version, and enable the old behaviour for all functions tagged with an older one. best regards, Florian Pflug
On 1/13/14, 5:57 PM, Josh Berkus wrote: > On 01/13/2014 03:41 PM, Florian Pflug wrote: >> It therefor isn't an oversight that SELECT ... INTO allows multiple result rows >> but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and >> for a reason. We shouldn't be second-guessing ourselves by changing that later - >> not, at least, unless we have a *very* good reason for it. Which, AFAICS, we don't. >> >> (And yeah, personally I'd prefer if we'd complain about multiple rows. But it's >> IMHO just too late for that) > > I *really* don't want to go through all my old code to find places where > I used SELECT ... INTO just to pop off the first row, and ignored the > rest. I doubt anyone else does, either. Do you regularly have use cases where you actually want just one RANDOM row? I suspect the far more likely scenario is thatpeople write code assuming they'll get only one row and they'll end up with extremely hard to trace bugs if that assumptionis ever wrong. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On 1/13/14, 6:16 PM, Florian Pflug wrote: > On Jan14, 2014, at 00:52 , Marko Tiikkaja <marko@joh.to> wrote: >> When I've worked with PL/PgSQL, this has been a source of a few bugs that >> would have been noticed during testing if the behaviour of INTO wasn't as >> dangerous as it is right now. > > The question is, how many bugs stemmed from wrong SQL queries, and what > percentage of those would have been caught by this? The way I see it, there > are thousands of ways to screw up a query, and having it return multiple > rows instead of one is just one of them. A query that's simply wrong is more likely to fail consistently. Non-strict use of INTO is going to fail in very subtle ways(unless you actually DO want just the first row, in which case you should explicitly use LIMIT 1). >> If we're not going to scrap PL/PgSQL and >> start over again, we are going to have to do changes like this to make the >> language better. Also I think that out of all the things we could do to >> break backwards compatibility, this is closer to "harmless" than "a pain >> in the butt". > > I very strongly believe that languages don't get better by adding a thousand > little knobs which subtly change semantics. Look at the mess that is PHP - > we absolutely, certainly don't want to go there. The most important rule in > language design is in my opinion "stick with your choices". C, C++ and JAVA > all seem to follow this, and it's one of the reasons these languages are > popular for big projects, I think. > > The way I see it, the only OK way to change existing behaviour is to have > the concept of a "language version", and force code to indicate the language > version it expects. The important thing is that the language version is an > attribute of code, not some global setting that you can change without ever > looking at the code it'd affect. > > So if we really want to change this, I think we need to have a > LANGUAGE_VERSION attribute on functions. Each time a major postgres release > changes the behaviour of one of the procedural languages, we'd increment > that language's version, and enable the old behaviour for all functions > tagged with an older one. I like that idea. It allows us to fix past decisions that were ill considered without hosing all existing code. BTW, have we always had support for STRICT, or was it added at some point? It's in 8.4, but I don't know how far back itgoes. And if we've always had it, why on earth didn't we make STRICT the default behavior? -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
(Responding to both of your mails here) On Jan14, 2014, at 01:20 , Jim Nasby <jim@nasby.net> wrote: > On 1/13/14, 5:57 PM, Josh Berkus wrote: >> On 01/13/2014 03:41 PM, Florian Pflug wrote: >>> It therefor isn't an oversight that SELECT ... INTO allows multiple result rows >>> but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and >>> for a reason. We shouldn't be second-guessing ourselves by changing that later - >>> not, at least, unless we have a *very* good reason for it. Which, AFAICS, we don't. >>> >>> (And yeah, personally I'd prefer if we'd complain about multiple rows. But it's >>> IMHO just too late for that) >> >> I *really* don't want to go through all my old code to find places where >> I used SELECT ... INTO just to pop off the first row, and ignored the >> rest. I doubt anyone else does, either. > > Do you regularly have use cases where you actually want just one RANDOM row? > I suspect the far more likely scenario is that people write code assuming they'll > get only one row and they'll end up with extremely hard to trace bugs if that > assumption is ever wrong. One case that immediatly comes to mind is a JOIN which sometimes returns multiple rows, and a projection clause that only uses one of the tables involved in the join. Another are queries including an ORDER BY - I don't think the patch makes an exception for those, and even if it did, it probably wouldn't catch all cases, like e.g. an SRF which returns the rows in a deterministic order. Or maybe you're picking a row to process next, and don't really care about the order in which you work through them. >> The question is, how many bugs stemmed from wrong SQL queries, and what >> percentage of those would have been caught by this? The way I see it, there >> are thousands of ways to screw up a query, and having it return multiple >> rows instead of one is just one of them. > > A query that's simply wrong is more likely to fail consistently. Non-strict > use of INTO is going to fail in very subtle ways (unless you actually DO want > just the first row, in which case you should explicitly use LIMIT 1). How so? Say you expect "SELECT * FROM view WHERE c=<n>" to only ever return one row. Then "SELECT sum(f) FROM table JOIN view ON table.c = view.c" is just as subtly wrong as the first query is. > And if we've always had it, why on earth didn't we make STRICT the default > behavior? Dunno, but AFAIK pl/pgsql mimics Oracle's PL/SQL, at least in some aspects, so maybe this is one of the areas where we just do what oracle does. best regards, Florian Pflug
Florian Pflug <fgp@phlo.org> writes: > On Jan14, 2014, at 01:20 , Jim Nasby <jim@nasby.net> wrote: >> And if we've always had it, why on earth didn't we make STRICT the default >> behavior? > Dunno, but AFAIK pl/pgsql mimics Oracle's PL/SQL, at least in some aspects, > so maybe this is one of the areas where we just do what oracle does. STRICT is a relatively recent addition (8.2, looks like). One of the reasons it got in was that the author didn't have any grandiose ideas about making it the new default. I think this patch would have a lot better chance if it was just adding another keyword as an alternative to STRICT, and not hoping to impose the author's opinions on the rest of the world. Whatever your opinion of the default behavior, the fact that it's been that way for upwards of fifteen years without any mass protests should give you pause about changing it. regards, tom lane
On 01/13/2014 04:20 PM, Jim Nasby wrote: > On 1/13/14, 5:57 PM, Josh Berkus wrote: >> I *really* don't want to go through all my old code to find places where >> I used SELECT ... INTO just to pop off the first row, and ignored the >> rest. I doubt anyone else does, either. > > Do you regularly have use cases where you actually want just one RANDOM > row? I suspect the far more likely scenario is that people write code > assuming they'll get only one row and they'll end up with extremely hard > to trace bugs if that assumption is ever wrong. Regularly? No. But I've seen it, especially as part of a "does this query return any rows?" test. That's not the best way to test that, but that doesn't stop a lot of people doing it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 1/13/14, 6:36 PM, Florian Pflug wrote: > On Jan14, 2014, at 01:20 , Jim Nasby<jim@nasby.net> wrote: >> >On 1/13/14, 5:57 PM, Josh Berkus wrote: >>> >>On 01/13/2014 03:41 PM, Florian Pflug wrote: >>>> >>>It therefor isn't an oversight that SELECT ... INTO allows multiple result rows >>>> >>>but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and >>>> >>>for a reason. We shouldn't be second-guessing ourselves by changing that later - >>>> >>>not, at least, unless we have a*very* good reason for it. Which, AFAICS, we don't. >>>> >>> >>>> >>>(And yeah, personally I'd prefer if we'd complain about multiple rows. But it's >>>> >>>IMHO just too late for that) >>> >> >>> >>I*really* don't want to go through all my old code to find places where >>> >>I used SELECT ... INTO just to pop off the first row, and ignored the >>> >>rest. I doubt anyone else does, either. >> > >> >Do you regularly have use cases where you actually want just one RANDOM row? >> >I suspect the far more likely scenario is that people write code assuming they'll >> >get only one row and they'll end up with extremely hard to trace bugs if that >> >assumption is ever wrong. > One case that immediatly comes to mind is a JOIN which sometimes returns > multiple rows, and a projection clause that only uses one of the tables > involved in the join. Which is just bad coding IMHO. > Another are queries including an ORDER BY - I don't think the patch makes an > exception for those, and even if it did, it probably wouldn't catch all > cases, like e.g. an SRF which returns the rows in a deterministic order. Sure, but if you've gone to the trouble of putting the ORDER BY in, how hard is it to add LIMIT 1? > Or maybe you're picking a row to process next, and don't really care about > the order in which you work through them. Again, how hard is LIMIT 1? BTW, my issue here isn't with typing out " STRICT". I'd be fine if the way things worked was you had to specify INTO STRICTor INTO LOOSE (or whatever you want to call the opposite of strict). My problem is that the default here is plain and simple a bad choice for default behavior. In light of Tom's research showingthis was added in 8.2 I understand why we went that route, but I'd really like a way to change the default. Or evendisallow it (IE: force the user to state which route they're going here). -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On 1/14/14, 1:57 AM, Tom Lane wrote: > Whatever your opinion of the default behavior, the > fact that it's been that way for upwards of fifteen years without any > mass protests should give you pause about changing it. For what it's worth, my patch does not change the default behaviour. I don't think I've ever publicly said that it should. Regards, Marko Tiikkaja
On 1/13/14, 7:06 PM, Josh Berkus wrote: > On 01/13/2014 04:20 PM, Jim Nasby wrote: >> On 1/13/14, 5:57 PM, Josh Berkus wrote: >>> I *really* don't want to go through all my old code to find places where >>> I used SELECT ... INTO just to pop off the first row, and ignored the >>> rest. I doubt anyone else does, either. >> >> Do you regularly have use cases where you actually want just one RANDOM >> row? I suspect the far more likely scenario is that people write code >> assuming they'll get only one row and they'll end up with extremely hard >> to trace bugs if that assumption is ever wrong. > > Regularly? No. But I've seen it, especially as part of a "does this > query return any rows?" test. That's not the best way to test that, but > that doesn't stop a lot of people doing it. Right, and I certainly don't want to force anyone to rewrite all their code. But I'd certainly like a safer default so peopledon't mistakenly go the "multiple rows is OK" route without doing so very intentionally. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On 01/13/2014 05:10 PM, Jim Nasby wrote: > On 1/13/14, 7:06 PM, Josh Berkus wrote: >> Regularly? No. But I've seen it, especially as part of a "does this >> query return any rows?" test. That's not the best way to test that, but >> that doesn't stop a lot of people doing it. > > Right, and I certainly don't want to force anyone to rewrite all their > code. But I'd certainly like a safer default so people don't mistakenly > go the "multiple rows is OK" route without doing so very intentionally. The problem is that if you change the default, you're creating an unexpected barrier to upgrading. I just don't think that it's worth doing so in order to meet some standard of code neatness, especially in plpgsql, the unwanted bastard child of SQL and ADA. For people who want to enable this in order to prevent stupid query bugs from creeping into their plpgsql, that's great, let's have an easy option to turn on. But it's hard enough to get people to upgrade as it is. If we're going to add an upgrade landmine, it better be for something really important. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Sun, Jan 12, 2014 at 7:51 AM, Marko Tiikkaja <marko@joh.to> wrote: > the behaviour of SELECT .. INTO when the query returns more than one row. > Some of you might know that no exception is raised in this case Agreed. But I also agree with the rest of the thread about changing current INTO behavior and introducing new GUC variables. But PL/pgSQL already has an assignment syntax with the behavior you want: DECLARE foo int; BEGIN foo = generate_series(1,1); -- this is OK foo = generate_series(1,2); -- fails foo = 10 WHERE FALSE; -- sets foo toNULL -- And you can actually do: foo = some_col FROM some_table WHERE bar=10; END; So if we extend this syntax to support multiple columns, it should satisfy the use cases you care about. foo, bar = col1, col2 FROM some_table WHERE bar=10; It's ugly without the explicit SELECT though. Perhaps make the SELECT optional: foo, bar = SELECT col1, col2 FROM some_table WHERE bar=10; I think that's more aesthetically pleasing than INTO and also looks more familiar to other languages. Plus, now you can copy-paste the query straight to an SQL shell without another footgun involving creating new tables in your database. Regards, Marti
On 2014-01-14 02:54, Marti Raudsepp wrote: > On Sun, Jan 12, 2014 at 7:51 AM, Marko Tiikkaja <marko@joh.to> wrote: >> the behaviour of SELECT .. INTO when the query returns more than one row. >> Some of you might know that no exception is raised in this case > > Agreed. But I also agree with the rest of the thread about changing > current INTO behavior and introducing new GUC variables. > > But PL/pgSQL already has an assignment syntax with the behavior you want: According to the docs, that doesn't set FOUND which would make this a pain to deal with.. Regards, Marko Tiikkaja
I am thinking so GUC and plpgsql option can live together. If you like to accent a some behave, then you can use a plpgsql option. On second hand, I would to use a some functionality, that is safe, but I don't would to dirty source code by using repeated options. But I have to check (and calculate with risk) a GUC settings.
One idea: required GUC? Can be nice a possibility to ensure some GUC setting, and restore ensure these values or raises warning.
Back to main topic. Required and described feature doesn't change a behave of INTO clause. I can enable or disable this functionality and well written code should to work without change (and problems). When check is disabled, then execution is just less safe. So in this case, a impact of GUC is significantly less than by you described issues. Does know anybody a use case where this check should be disabled?
Probably we have a different experience about GUC. I had a problem with standard_conforming_strings and bytea format some years ago. Now I prepare document about required setting. But I can see (from my experience from Czech area) more often problems related to effective_cache_size or from_collapse_limit and similar GUC. These parameters are behind knowledge (and visibility) typical user.
ISTM that in this case, it should be safe to make the new default behavior STRICT; if you forget to set the GUC to disable than you'll get an error that points directly at the problem, at which point you'll go "Oh, yeah... I forgot to set X..."
I disagree - STRICT can be too restrictive - and a combination SELECT INTO and IF FOUND can be significantly faster - our exception handling is not cheap.
Regards
Pavel
Pavel
Outside of the GUC, I believe the default should definitely be STRICT. If your app is relying on non-strict then you need to be made aware of that. We should be able to provide a DO block that will change this setting for every function you've got if someone isn't happy with STRICT mode.
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
2014/1/14 Florian Pflug <fgp@phlo.org>
On Jan14, 2014, at 00:52 , Marko Tiikkaja <marko@joh.to> wrote:The question is, how many bugs stemmed from wrong SQL queries, and what
> When I've worked with PL/PgSQL, this has been a source of a few bugs that
> would have been noticed during testing if the behaviour of INTO wasn't as
> dangerous as it is right now.
percentage of those would have been caught by this? The way I see it, there
are thousands of ways to screw up a query, and having it return multiple
rows instead of one is just one of them.Which doesn't help, because the GUC isn't tied to the code. This *adds*
> Yes, it breaks backwards compatibility, but that's why there's a nice GUC.
an error case, not remove one - now, instead of getting your code correct,
you *also* have to get the GUC correct. If you even *know* that such a GUC
exists.I very strongly believe that languages don't get better by adding a thousand
> If we're not going to scrap PL/PgSQL and
> start over again, we are going to have to do changes like this to make the
> language better. Also I think that out of all the things we could do to
> break backwards compatibility, this is closer to "harmless" than "a pain
> in the butt".
little knobs which subtly change semantics. Look at the mess that is PHP -
we absolutely, certainly don't want to go there. The most important rule in
language design is in my opinion "stick with your choices". C, C++ and JAVA
all seem to follow this, and it's one of the reasons these languages are
popular for big projects, I think.
The way I see it, the only OK way to change existing behaviour is to have
the concept of a "language version", and force code to indicate the language
version it expects. The important thing is that the language version is an
attribute of code, not some global setting that you can change without ever
looking at the code it'd affect.
So if we really want to change this, I think we need to have a
LANGUAGE_VERSION attribute on functions. Each time a major postgres release
changes the behaviour of one of the procedural languages, we'd increment
that language's version, and enable the old behaviour for all functions
tagged with an older one.
I dislike this proposal
too enterprise, too complex, too bad - after ten years we can have a ten language versions and it helps nothing.
return back to topica) there is agreement so we like this functionality as plpgsql option
b) there is no agreement so we would to see this functionality as default (or in near future)
@b is a topic, that we should not to solve and it is back again and again. Sometimes we have success, sometimes not. Temporal GUC is not enough solution for some people.
So my result - @a can be implement, @b not
and we can continue in other thread about SELECT INTO redesign and about possibilities that we have or have not. It is about personal perspective - someone can thinking about missing check after INTO as about feature, other one can see it as bug. My opinion - it is a bug with possible ambiguous result and possible negative performance impacts. Probably we can precise a doc about wrong and good usage SELECT INTO clause.
and we can continue in other thread about SELECT INTO redesign and about possibilities that we have or have not. It is about personal perspective - someone can thinking about missing check after INTO as about feature, other one can see it as bug. My opinion - it is a bug with possible ambiguous result and possible negative performance impacts. Probably we can precise a doc about wrong and good usage SELECT INTO clause.
I am working on plpgsql_debug extension and I am thinking so I am able (with small change in plpgsql executor) implement this check as extension. So it can be available for advanced users (that will has a knowledge about additional plpgsql extensions).
Regards
Pavel
regards
Pavel
Pavel
best regards,
Florian Pflug
On 1/14/14 10:16 AM, Pavel Stehule wrote: > 2014/1/14 Florian Pflug <fgp@phlo.org> > >> So if we really want to change this, I think we need to have a >> LANGUAGE_VERSION attribute on functions. Each time a major postgres release >> changes the behaviour of one of the procedural languages, we'd increment >> that language's version, and enable the old behaviour for all functions >> tagged with an older one. >> > > I dislike this proposal > > too enterprise, too complex, too bad - after ten years we can have a ten > language versions and it helps nothing. At this point I'm almost tempted to agree with Florian -- I'm really hoping we could change PL/PgSQL for the better over the next 10 years, but given the backwards compatibility requirements we have, this seems like an absolute nightmare. Not saying we need a version number we can keep bumping every release, but something similar. > return back to topic > > a) there is agreement so we like this functionality as plpgsql option > > b) there is no agreement so we would to see this functionality as default > (or in near future) > > @b is a topic, that we should not to solve and it is back again and again. > Sometimes we have success, sometimes not. Temporal GUC is not enough > solution for some people. > > So my result - @a can be implement, @b not FWIW, I would like to have this behaviour even if it's not the default (that was my original proposal anyway). It could help catch a lot of bugs in testing, and for important code it could be even turned on in production (perhaps on a per-function basis). Maybe even globally, idk. > I am working on plpgsql_debug extension and I am thinking so I am able > (with small change in plpgsql executor) implement this check as extension. > So it can be available for advanced users (that will has a knowledge about > additional plpgsql extensions). While this sounds cool, running a modified postgres locally seems a lot easier. I already have to do that for PL/PgSQL development because of the reasons I outlined in the thread "PL/PgSQL, RAISE and error context". Regards, Marko Tiikkaja
I've always hated INTO in procedures since it makes the code harder to follow and has very different behavior on the SQL level, in addition to the multi-row problem you bring up. If we can make assignment syntax more versatile and eventually replace INTO, then that solves multiple problems in the language without breaking backwards compatibility. On Tue, Jan 14, 2014 at 4:30 AM, Marko Tiikkaja <marko@joh.to> wrote: > On 2014-01-14 02:54, Marti Raudsepp wrote: >> But PL/pgSQL already has an assignment syntax with the behavior you want: > > According to the docs, that doesn't set FOUND which would make this a pain > to deal with.. Right you are. If we can extend the syntax then we could make it such that "= SELECT" sets FOUND and other diagnostics, and a simple assignment doesn't. Which makes sense IMO: a = 10; -- simple assignments really shouldn't affect FOUND With explicit SELECT, clearly the intent is to perform a query: a = SELECT foo FROM table; And this could also work: a = INSERT INTO table (foo) VALUES (10) RETURNING foo_id; AFAICT the fact that this works is more of an accident and should be discouraged. We can leave it as is for compatibility's sake: a = foo FROM table; Now, another question is whether it's possible to make the syntax work. Is this an assignment from the result of a subquery, or is it a query by itself? a = (SELECT foo FROM table); Does anyone consider this proposal workable? Regards, Marti
On 1/14/14 12:28 PM, Marti Raudsepp wrote: > I've always hated INTO in procedures since it makes the code harder to > follow and has very different behavior on the SQL level, in addition > to the multi-row problem you bring up. If we can make assignment > syntax more versatile and eventually replace INTO, then that solves > multiple problems in the language without breaking backwards > compatibility. I don't personally have a problem with INTO other than the behaviour that started this thread. But I'm willing to consider other options. > On Tue, Jan 14, 2014 at 4:30 AM, Marko Tiikkaja <marko@joh.to> wrote: >> On 2014-01-14 02:54, Marti Raudsepp wrote: >>> But PL/pgSQL already has an assignment syntax with the behavior you want: >> >> According to the docs, that doesn't set FOUND which would make this a pain >> to deal with.. > > Right you are. If we can extend the syntax then we could make it such > that "= SELECT" sets FOUND and other diagnostics, and a simple > assignment doesn't. Which makes sense IMO: > > a = 10; -- simple assignments really shouldn't affect FOUND With you so far. > With explicit SELECT, clearly the intent is to perform a query: > a = SELECT foo FROM table; > And this could also work: > a = INSERT INTO table (foo) VALUES (10) RETURNING foo_id; I'm not sure that would work with the grammar. Basically what PL/PgSQL does right now is for a statement like: a = 1; It parses the "a =" part itself, and then just reads until the next unquoted semicolon without actually looking at it, and slams a "SELECT " in front of it. With this approach we'd have to look into the query and try and guess what it does. That might be possible, but I don't like the idea. > AFAICT the fact that this works is more of an accident and should be > discouraged. We can leave it as is for compatibility's sake: > a = foo FROM table; I've always considered that ugly (IIRC it's still undocumented as well), and would encourage people not to do that. > Now, another question is whether it's possible to make the syntax > work. Is this an assignment from the result of a subquery, or is it a > query by itself? > a = (SELECT foo FROM table); That looks like a scalar subquery, which is wrong because they can't return more than one column (nor can they be INSERT etc., obviously). How about: (a) = SELECT 1; (a, b) = SELECT 1, 2; (a, b) = INSERT INTO foo RETURNING col1, col2; Same semantics: TOO_MANY_ROWS on rows > 1, sets FOUND and row_count. AFAICT this can be parsed unambiguously, too, and we don't need to look at the query string because this is new syntax. Regards, Marko Tiikkaja
2014/1/14 Marko Tiikkaja <marko@joh.to>
On 1/14/14 12:28 PM, Marti Raudsepp wrote:I don't personally have a problem with INTO other than the behaviour that started this thread. But I'm willing to consider other options.I've always hated INTO in procedures since it makes the code harder to
follow and has very different behavior on the SQL level, in addition
to the multi-row problem you bring up. If we can make assignment
syntax more versatile and eventually replace INTO, then that solves
multiple problems in the language without breaking backwards
compatibility.With you so far.On Tue, Jan 14, 2014 at 4:30 AM, Marko Tiikkaja <marko@joh.to> wrote:On 2014-01-14 02:54, Marti Raudsepp wrote:But PL/pgSQL already has an assignment syntax with the behavior you want:
According to the docs, that doesn't set FOUND which would make this a pain
to deal with..
Right you are. If we can extend the syntax then we could make it such
that "= SELECT" sets FOUND and other diagnostics, and a simple
assignment doesn't. Which makes sense IMO:
a = 10; -- simple assignments really shouldn't affect FOUNDI'm not sure that would work with the grammar. Basically what PL/PgSQL does right now is for a statement like:With explicit SELECT, clearly the intent is to perform a query:
a = SELECT foo FROM table;
And this could also work:
a = INSERT INTO table (foo) VALUES (10) RETURNING foo_id;
a = 1;
It parses the "a =" part itself, and then just reads until the next unquoted semicolon without actually looking at it, and slams a "SELECT " in front of it. With this approach we'd have to look into the query and try and guess what it does. That might be possible, but I don't like the idea.I've always considered that ugly (IIRC it's still undocumented as well), and would encourage people not to do that.AFAICT the fact that this works is more of an accident and should be
discouraged. We can leave it as is for compatibility's sake:
a = foo FROM table;Now, another question is whether it's possible to make the syntax
work. Is this an assignment from the result of a subquery, or is it a
query by itself?
a = (SELECT foo FROM table);
only this form is allowed in SQL/PSM - and it has some logic - you can assign result of subquery (should be one row only) to variable.
That looks like a scalar subquery, which is wrong because they can't return more than one column (nor can they be INSERT etc., obviously).
How about:
(a) = SELECT 1;
(a, b) = SELECT 1, 2;
(a, b) = INSERT INTO foo RETURNING col1, col2;
I prefer subquery only syntax - a := (some) or (a,b,c) = (some a,b,c) with possible enhancing for statements with RETURNING
a advance is compatibility with DB2 (SQL/PSM) syntax - and this code is written now - it is done in my sql/psm implementation
Regards
Pavel
Same semantics: TOO_MANY_ROWS on rows > 1, sets FOUND and row_count. AFAICT this can be parsed unambiguously, too, and we don't need to look at the query string because this is new syntax.
Regards,
Marko Tiikkaja
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 1/14/14 1:28 PM, Pavel Stehule wrote: > I prefer subquery only syntax - a := (some) or (a,b,c) = (some a,b,c) with > possible enhancing for statements with RETURNING > > a advance is compatibility with DB2 (SQL/PSM) syntax - and this code is > written now - it is done in my sql/psm implementation Are you saying that DB2 supports the multi-variable assignment? I couldn't find any reference suggesting that, can you point me to one? Regards, Marko Tiikkaja
Hello
2014/1/14 Marko Tiikkaja <marko@joh.to>
On 1/14/14 1:28 PM, Pavel Stehule wrote:Are you saying that DB2 supports the multi-variable assignment? I couldn't find any reference suggesting that, can you point me to one?I prefer subquery only syntax - a := (some) or (a,b,c) = (some a,b,c) with
possible enhancing for statements with RETURNING
a advance is compatibility with DB2 (SQL/PSM) syntax - and this code is
written now - it is done in my sql/psm implementation
I was wrong - it is different syntax - it is SQL/PSM form - db2 supports SET var=val, var=val, ...
http://postgres.cz/wiki/Basic_statements_SQL/PSM_samples
http://postgres.cz/wiki/Basic_statements_SQL/PSM_samples
Regards
Pavel
Pavel
Regards,
Marko Tiikkaja
Marko Tiikkaja <marko@joh.to> writes: > On 1/14/14 12:28 PM, Marti Raudsepp wrote: >> Now, another question is whether it's possible to make the syntax >> work. Is this an assignment from the result of a subquery, or is it a >> query by itself? >> a = (SELECT foo FROM table); > That looks like a scalar subquery, which is wrong because they can't > return more than one column (nor can they be INSERT etc., obviously). Yeah, it's a scalar subquery, which means that plpgsql already assigns a non-error meaning to this syntax. > How about: > (a) = SELECT 1; > (a, b) = SELECT 1, 2; > (a, b) = INSERT INTO foo RETURNING col1, col2; > Same semantics: TOO_MANY_ROWS on rows > 1, sets FOUND and row_count. > AFAICT this can be parsed unambiguously, too, and we don't need to look > at the query string because this is new syntax. The idea of inventing new syntax along this line seems like a positive direction to pursue. Since assignment already rejects multiple rows from the source expression, this wouldn't be weirdly inconsistent. It might be worth thinking about the <multiple column assignment> UPDATE syntax that's in recent versions of the SQL standard: UPDATE targettab SET (a, b, c) = row-valued-expression [ , ... ] [ WHERE ... ] We don't actually implement this in PG yet, except for trivial cases, but it will certainly happen eventually. I think your sketch above deviates unnecessarily from what the standard says for UPDATE. In particular I think it'd be better to write things like (a, b) = ROW(1, 2);(a, b, c) = (SELECT x, y, z FROM foo WHERE id = 42); which would exactly match what you'd write in a multiple-assignment UPDATE, and it has the same rejects-multiple-rows semantics too. Also note that the trivial cases we do already implement in UPDATE look like UPDATE targettab SET (a, b, c) = (1, 2, 3) [ WHERE ... ] that is, we allow a row constructor where the optional keyword ROW has been omitted. I think people would expect to be able to write this in plpgsql: (a, b) = (1, 2); Now, this doesn't provide any guidance for INSERT/UPDATE/DELETE RETURNING, but frankly I don't feel any need to invent new syntax for those, since RETURNING INTO already works the way you want. I'm not too sure what it'd take to make this work. Right now, SELECT (SELECT x, y FROM foo WHERE id = 42); would generate "ERROR: subquery must return only one column", but I think it's mostly a historical artifact that it does that rather than returning a composite value (of an anonymous record type). If we were willing to make that change then it seems like it'd be pretty straightforward to teach plpgsql to handle (a, b, ...) = row-valued-expression where there wouldn't actually be any need to parse the RHS any differently from the way plpgsql parses an assignment RHS right now. Which would be a good thing IMO. If we don't generalize the behavior of scalar subqueries then plpgsql would have to jump through a lot of hoops to support the subselect case. regards, tom lane
On 1/14/14, 6:15 PM, Tom Lane wrote: > Marko Tiikkaja <marko@joh.to> writes: >> How about: > >> (a) = SELECT 1; >> (a, b) = SELECT 1, 2; >> (a, b) = INSERT INTO foo RETURNING col1, col2; > >> Same semantics: TOO_MANY_ROWS on rows > 1, sets FOUND and row_count. >> AFAICT this can be parsed unambiguously, too, and we don't need to look >> at the query string because this is new syntax. > > The idea of inventing new syntax along this line seems like a positive > direction to pursue. Since assignment already rejects multiple rows > from the source expression, this wouldn't be weirdly inconsistent. > > It might be worth thinking about the <multiple column assignment> UPDATE > syntax that's in recent versions of the SQL standard: > > UPDATE targettab SET (a, b, c) = row-valued-expression [ , ... ] [ WHERE ... ] > > We don't actually implement this in PG yet, except for trivial cases, but > it will certainly happen eventually. I think your sketch above deviates > unnecessarily from what the standard says for UPDATE. In particular > I think it'd be better to write things like > > (a, b) = ROW(1, 2); > (a, b, c) = (SELECT x, y, z FROM foo WHERE id = 42); > > which would exactly match what you'd write in a multiple-assignment > UPDATE, and it has the same rejects-multiple-rows semantics too. Hmm. That's a fair point I did not consider. > Also note that the trivial cases we do already implement in UPDATE > look like > > UPDATE targettab SET (a, b, c) = (1, 2, 3) [ WHERE ... ] > > that is, we allow a row constructor where the optional keyword ROW has > been omitted. I think people would expect to be able to write this in > plpgsql: > > (a, b) = (1, 2); > > Now, this doesn't provide any guidance for INSERT/UPDATE/DELETE RETURNING, > but frankly I don't feel any need to invent new syntax for those, since > RETURNING INTO already works the way you want. Yeah, I don't feel strongly about having to support them with this syntax. The inconsistency is a bit ugly, but it's not the end of the world. > I'm not too sure what it'd take to make this work. Right now, > > SELECT (SELECT x, y FROM foo WHERE id = 42); > > would generate "ERROR: subquery must return only one column", but > I think it's mostly a historical artifact that it does that rather than > returning a composite value (of an anonymous record type). If we were > willing to make that change then it seems like it'd be pretty > straightforward to teach plpgsql to handle > > (a, b, ...) = row-valued-expression > > where there wouldn't actually be any need to parse the RHS any differently > from the way plpgsql parses an assignment RHS right now. Which would be > a good thing IMO. If we don't generalize the behavior of scalar > subqueries then plpgsql would have to jump through a lot of hoops to > support the subselect case. You can already do the equivalent of (a,b,c) = (1,2,3) with SELECT .. INTO. Would you oppose to starting the work on this by only supporting the subquery syntax, with the implementation being similar to how we currently handle SELECT .. INTO? If we could also not flat out reject including that into 9.4, I would be quite happy. Regards, Marko Tiikkaja
Marko Tiikkaja <marko@joh.to> writes: > On 1/14/14, 6:15 PM, Tom Lane wrote: >> I'm not too sure what it'd take to make this work. Right now, >> >> SELECT (SELECT x, y FROM foo WHERE id = 42); >> >> would generate "ERROR: subquery must return only one column", but >> I think it's mostly a historical artifact that it does that rather than >> returning a composite value (of an anonymous record type). If we were >> willing to make that change then it seems like it'd be pretty >> straightforward to teach plpgsql to handle >> >> (a, b, ...) = row-valued-expression >> >> where there wouldn't actually be any need to parse the RHS any differently >> from the way plpgsql parses an assignment RHS right now. Which would be >> a good thing IMO. If we don't generalize the behavior of scalar >> subqueries then plpgsql would have to jump through a lot of hoops to >> support the subselect case. > You can already do the equivalent of (a,b,c) = (1,2,3) with SELECT .. > INTO. Would you oppose to starting the work on this by only supporting > the subquery syntax, with the implementation being similar to how we > currently handle SELECT .. INTO? You can try if you want, but I suspect it will result in writing a lot of basically throwaway code, ie, not something that would be a precursor of code that could support the generic-row-valued-expression case. regards, tom lane
On 1/14/14, 11:15 AM, Tom Lane wrote: >> How about: >> > (a) = SELECT 1; >> > (a, b) = SELECT 1, 2; >> > (a, b) = INSERT INTO foo RETURNING col1, col2; >> >Same semantics: TOO_MANY_ROWS on rows > 1, sets FOUND and row_count. >> >AFAICT this can be parsed unambiguously, too, and we don't need to look >> >at the query string because this is new syntax. > The idea of inventing new syntax along this line seems like a positive > direction to pursue. Since assignment already rejects multiple rows > from the source expression, this wouldn't be weirdly inconsistent. Do we actually support = right now? We already support v_field := field FROM table ... ; and I think it's a bad idea to have different meaning for = and :=. > I'm not too sure what it'd take to make this work. Right now, > > SELECT (SELECT x, y FROM foo WHERE id = 42); > > would generate "ERROR: subquery must return only one column", but > I think it's mostly a historical artifact that it does that rather than > returning a composite value (of an anonymous record type). If we were > willing to make that change then it seems like it'd be pretty > straightforward to teach plpgsql to handle > > (a, b, ...) = row-valued-expression > > where there wouldn't actually be any need to parse the RHS any differently > from the way plpgsql parses an assignment RHS right now. Which would be > a good thing IMO. If we don't generalize the behavior of scalar > subqueries then plpgsql would have to jump through a lot of hoops to > support the subselect case. I have no idea if this is related or not, but I would REALLY like for this to work (doesn't in 8.4, AFAIK not in 9.1 either...) CREATE FUNCTION f(int) RETURNS text STABLE LANGUAGE sql AS ( SELECT field FROM table WHERE table_id = $1 ); SELECT f(blah_id) FROM ... to be equivalent to SELECT ( SELECT field FROM table WHERE table_id = blah_id ) FROM ... That would make it very easy to do a lot of code simplification with no performance loss. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Jim Nasby <jim@nasby.net> writes: > Do we actually support = right now? We already support > v_field := field FROM table ... ; > and I think it's a bad idea to have different meaning for = and :=. That ship sailed a *very* long time ago. See other thread about documenting rather than ignoring this more-or-less-aboriginal behavior of plpgsql. > I have no idea if this is related or not, but I would REALLY like for this to work (doesn't in 8.4, AFAIK not in 9.1 either...) Hm ... too tired to be sure, but I think the issue about inlining a function of this kind has to do with whether you get the same answers in corner cases such as subselect fetching no rows. regards, tom lane
2014/1/15 Jim Nasby <jim@nasby.net>
On 1/14/14, 11:15 AM, Tom Lane wrote:Do we actually support = right now? We already supportHow about:The idea of inventing new syntax along this line seems like a positive
> (a) = SELECT 1;
> (a, b) = SELECT 1, 2;
> (a, b) = INSERT INTO foo RETURNING col1, col2;
>Same semantics: TOO_MANY_ROWS on rows > 1, sets FOUND and row_count.
>AFAICT this can be parsed unambiguously, too, and we don't need to look
>at the query string because this is new syntax.
direction to pursue. Since assignment already rejects multiple rows
from the source expression, this wouldn't be weirdly inconsistent.
v_field := field FROM table ... ;
and I think it's a bad idea to have different meaning for = and :=.
It is probably second the most ugly thing on plpgsql. I don't know about other crippled embedded SQL statement in any stored procedure language.
Regards
Pavel
Pavel
I have no idea if this is related or not, but I would REALLY like for this to work (doesn't in 8.4, AFAIK not in 9.1 either...)I'm not too sure what it'd take to make this work. Right now,
SELECT (SELECT x, y FROM foo WHERE id = 42);
would generate "ERROR: subquery must return only one column", but
I think it's mostly a historical artifact that it does that rather than
returning a composite value (of an anonymous record type). If we were
willing to make that change then it seems like it'd be pretty
straightforward to teach plpgsql to handle
(a, b, ...) = row-valued-expression
where there wouldn't actually be any need to parse the RHS any differently
from the way plpgsql parses an assignment RHS right now. Which would be
a good thing IMO. If we don't generalize the behavior of scalar
subqueries then plpgsql would have to jump through a lot of hoops to
support the subselect case.
CREATE FUNCTION f(int) RETURNS text STABLE LANGUAGE sql AS ( SELECT field FROM table WHERE table_id = $1 );
SELECT f(blah_id) FROM ...
to be equivalent to
SELECT ( SELECT field FROM table WHERE table_id = blah_id ) FROM ...
That would make it very easy to do a lot of code simplification with no performance loss.--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Jan 15, 2014 at 8:23 AM, Jim Nasby <jim@nasby.net> wrote: > Do we actually support = right now? We already support > > v_field := field FROM table ... ; > > and I think it's a bad idea to have different meaning for = and :=. That was already discussed before. Yes, we support both = and := and they have exactly the same behavior; I agree, we should keep them equivalent. Regards, Marti
On 1/15/14, 12:35 AM, Tom Lane wrote: > Jim Nasby <jim@nasby.net> writes: >> Do we actually support = right now? We already support >> v_field := field FROM table ... ; >> and I think it's a bad idea to have different meaning for = and :=. > > That ship sailed a *very* long time ago. See other thread about > documenting rather than ignoring this more-or-less-aboriginal > behavior of plpgsql. Yeah, I had no idea that was even supported... >> I have no idea if this is related or not, but I would REALLY like for this to work (doesn't in 8.4, AFAIK not in 9.1 either...) > > Hm ... too tired to be sure, but I think the issue about inlining a > function of this kind has to do with whether you get the same answers > in corner cases such as subselect fetching no rows. There was some discussion about this a few years ago and I think that was essentially the issue. What I think would work is essentially a macro that would dump the function definition right into the query and then letthe planner deal with it. So SELECT blah, ( SELECT status_code FROM status_code WHERE status_code_id = blah_status_code_id ) FROM blah; can become simply SELECT blah, status_code__get_text( blah_status_code_id ) FROM blah; but have it translate to the same raw SQL, same as views. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On 1/14/14, 6:15 PM, Tom Lane wrote: > We don't actually implement this in PG yet, except for trivial cases, but > it will certainly happen eventually. I think your sketch above deviates > unnecessarily from what the standard says for UPDATE. In particular > I think it'd be better to write things like > > (a, b) = ROW(1, 2); > (a, b, c) = (SELECT x, y, z FROM foo WHERE id = 42); > > which would exactly match what you'd write in a multiple-assignment > UPDATE, and it has the same rejects-multiple-rows semantics too. Just in case someone's interested: I won't be working on this for 9.5. If someone feels like picking this patch up, be my guest. .marko