Thread: MySQL search query is not executing in Postgres DB
In MySQL the below query is executing properly. SELECT * FROM <Table-name> WHERE (Table.ID LIKE '1%') But when i try to execute the above query in Postgres, i get the following Exception "org.postgresql.util.PSQLException: ERROR: operator does not exist: integer ~~ unknown Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts". If i convert the same query " SELECT * FROM <Table-name> WHERE CAST(Table.ID as TEXT) LIKE '1%' ". This gets executed directly in Postgres DB. But i need some query which implicitly type cast in DB, which allows me to execute the MySQL query without any Exception. Because i remember there is a way for integer to boolean implicit type cast. Please refer the following link. http://archives.postgresql.org/pgsql-general/2011-01/msg00866.php Thanks in advance. -- View this message in context: http://postgresql.1045698.n5.nabble.com/MySQL-search-query-is-not-executing-in-Postgres-DB-tp5491531p5491531.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On 17.02.2012 07:33, premanand wrote: > In MySQL the below query is executing properly. > > SELECT * FROM<Table-name> WHERE (Table.ID LIKE '1%') > > But when i try to execute the above query in Postgres, i get the following > Exception "org.postgresql.util.PSQLException: ERROR: operator does not > exist: integer ~~ unknown Hint: No operator matches the given name and > argument type(s). You might need to add explicit type casts". > > If i convert the same query " SELECT * FROM<Table-name> WHERE CAST(Table.ID > as TEXT) LIKE '1%' ". This gets executed directly in Postgres DB. But i need > some query which implicitly type cast in DB, which allows me to execute the > MySQL query without any Exception. Because i remember there is a way for > integer to boolean implicit type cast. Please refer the following link. > http://archives.postgresql.org/pgsql-general/2011-01/msg00866.php You can use CREATE CAST (http://www.postgresql.org/docs/current/static/sql-createcast.html). Or you can create the operator "integer ~~ text" with CREATE FUNCTION + CREATE OPERATOR. The latter would match fewer cases, which would reduce the chances of introducing subtle bugs elsewhere in your application. Of course, the best fix would be to change your queries. It's quite sloppy to rely on "integer LIKE text" without an explicit cast in the query. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi, Thanks for the reply. It is not possible to change the query in our environment. So we need to do casting. I'm a new bee, so it will be more helpful if you give me some simple examples. -- View this message in context: http://postgresql.1045698.n5.nabble.com/MySQL-search-query-is-not-executing-in-Postgres-DB-tp5491531p5491947.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On 02/17/2012 04:22 AM, premanand wrote: > Hi, > > Thanks for the reply. It is not possible to change the query in our > environment. So we need to do casting. I'm a new bee, so it will be more > helpful if you give me some simple examples. > That's not what this list is for, You should be asking in pgsql-general, not here, (so don't just reply to this email). But in any case, you should take Heikki's advice seriously. What you are doing is broken. You should fix it and not just say "Oh, we can't". cheers andrew
On Fri, Feb 17, 2012 at 2:45 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Of course, the best fix would be to change your queries. It's quite sloppy > to rely on "integer LIKE text" without an explicit cast in the query. Why? I understand why it's a bad idea to rely on integer = text doing anything sane - is that text equality or numeric equality? And in theory the same issue could exist here if there were another meaning for LIKE. But there isn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I understand why it's a bad idea to rely on integer = text doing > anything sane - is that text equality or numeric equality? > And in theory the same issue could exist here if there were another > meaning for LIKE. But there isn't. Really? LIKE is just a different spelling for operator ~~, and there is nothing stopping people from defining more operators named that, not to mention that there are already four of them in core PG. In particular the bytea ~~ bytea version is explicitly intended to provide a LIKE implementation for non-text types. I see some operators named ~~ in contrib as well. regards, tom lane
On Fri, Feb 17, 2012 at 10:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I understand why it's a bad idea to rely on integer = text doing >> anything sane - is that text equality or numeric equality? > >> And in theory the same issue could exist here if there were another >> meaning for LIKE. But there isn't. > > Really? LIKE is just a different spelling for operator ~~, and there is > nothing stopping people from defining more operators named that, not to > mention that there are already four of them in core PG. In particular > the bytea ~~ bytea version is explicitly intended to provide a LIKE > implementation for non-text types. I see some operators named ~~ in > contrib as well. As far as I know, the fact that LIKE is another spelling for ~~ is a PostgreSQL implementation detail with which users ought not to be burdened. But even given that, there are many situations where we currently complain about ambiguity even though the ambiguity is entirely hypothetical: there COULD be a ~~(int,text) operator, but there actually isn't. Now, I'll admit that this is not an easy problem to solve without giving up something somewhere, since it's clearly undesirable for the meaning of something that worked before to silently change when and if someone defines a new operator. But on the other hand, I think that labeling the user's coding practices as sloppy is a cop-out. There are many, many people running on not only MySQL but also on Oracle who have written large amounts of code that requires fewer type casts on those systems than it does on PostgreSQL. That fact presents serious migration challenges for such users when they move over to PostgreSQL. Labeling the code as the problem excuses us from the need to think about how to make our type system work any better than it does today. Boo, hiss. If we're not doing anything about this because we have carefully examined the subject and decided that this is a trade-off we must make because MySQL or Oracle doesn't support XYZ and we do, then let's give that explanation to the user instead of telling them the problem is that their code stinks. Otherwise, we have some soul-searching to do, as time permits. I remember there was a time when you couldn't say "SELECT a x FROM foo" in PostgreSQL. We told people that it was because our syntax was more flexible - we have postfix operators, or something. I no longer remember the details of where the grammar conflict came from. But somebody (probably you or Hiroshi Saito, judging by the commit log) figured out a way to get around the problem, and now that syntax works fine in 99% of the cases people care about. That is a huge usability improvement for people coming from other database systems where AS was never required. I don't know whether a similar improvement is possible in this area, but we're certainly not going to get there by labeling the user's expectations as unreasonable. I don't think they are, and the people who wrote MySQL and Oracle evidently agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I remember there was a time when you couldn't say "SELECT a x FROM > foo" in PostgreSQL. We told people that it was because our syntax was > more flexible - we have postfix operators, or something. Which it was, and yes that was the reason. We eventually thought of a kluge solution that lets you omit "AS" 90% of the time, which is better than nothing; but I doubt it would ever have been accepted if it weren't a matter of improving standards compliance. I am pretty sure that the SQL spec doesn't say that you should be able to apply LIKE directly to an integer, so that issue isn't comparable to this one. > I don't know whether a similar improvement is > possible in this area, but we're certainly not going to get there by > labeling the user's expectations as unreasonable. I don't think they > are, and the people who wrote MySQL and Oracle evidently agree. The people who wrote MySQL had very poor taste in a lot of areas, and we are not going to blindly follow their lead. Oracle is not a terribly presentable system either. Having said that, I don't object to any clean improvements we can think of in this area --- but "make it work more like MySQL" had better not be the only argument for it. regards, tom lane
Robert Haas <robertmhaas@gmail.com> wrote: > As far as I know, the fact that LIKE is another spelling for ~~ is a > PostgreSQL implementation detail with which users ought not to be > burdened. +1 LIKE is well defined by the standard, and the ~~ operator is not mentioned there anywhere. On the other hand, LIKE is defined to either work on character strings or binary strings -- there is nothing in the standard about using it with other data types or automatic casting to support that. Any such support would be a non-standard PostgreSQL extension. As such, anyone wanting to write portable code should avoid that by explicit casting (which should be portable). > I remember there was a time when you couldn't say "SELECT a x FROM > foo" in PostgreSQL. That was in violation of the SQL standard, which makes AS an allowed but optional noise word. In spite of all that, perhaps we should have a compatibility extension which provides more casts, in an attempt to ease the transition from other databases? Personally, I like having the default behavior this strict -- I think it reduces the chance of errors, reduces the chances of accidentally having type mismatches which defeat optimizations, and improves portability. But I have nothing against allowing someone to give all that up to ease transition from another product. My biggest concern is whether we might "paint ourselves into a corner" by including such an extension. It might shut off avenues for other cool features because anyone using the extension would have conflicts. Perhaps such a thing would be more appropriate on PGXN with admonitions that it was only intended to ease conversion and that users were encouraged to migrate to standard syntax as soon as possible. -Kevin
2012/2/17 Kevin Grittner <Kevin.Grittner@wicourts.gov>: > Robert Haas <robertmhaas@gmail.com> wrote: > >> As far as I know, the fact that LIKE is another spelling for ~~ is > a >> PostgreSQL implementation detail with which users ought not to be >> burdened. > > +1 > > LIKE is well defined by the standard, and the ~~ operator is not > mentioned there anywhere. > > On the other hand, LIKE is defined to either work on character > strings or binary strings -- there is nothing in the standard about > using it with other data types or automatic casting to support that. > Any such support would be a non-standard PostgreSQL extension. As > such, anyone wanting to write portable code should avoid that by > explicit casting (which should be portable). > >> I remember there was a time when you couldn't say "SELECT a x FROM >> foo" in PostgreSQL. > > That was in violation of the SQL standard, which makes AS an allowed > but optional noise word. > > In spite of all that, perhaps we should have a compatibility > extension which provides more casts, in an attempt to ease the > transition from other databases? Personally, I like having the > default behavior this strict -- I think it reduces the chance of > errors, reduces the chances of accidentally having type mismatches > which defeat optimizations, and improves portability. But I have > nothing against allowing someone to give all that up to ease > transition from another product. My biggest concern is whether we > might "paint ourselves into a corner" by including such an > extension. It might shut off avenues for other cool features > because anyone using the extension would have conflicts. Perhaps > such a thing would be more appropriate on PGXN with admonitions that > it was only intended to ease conversion and that users were > encouraged to migrate to standard syntax as soon as possible. > +1 Pavel > -Kevin > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Feb 17, 2012 at 12:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I don't know whether a similar improvement is >> possible in this area, but we're certainly not going to get there by >> labeling the user's expectations as unreasonable. I don't think they >> are, and the people who wrote MySQL and Oracle evidently agree. > > The people who wrote MySQL had very poor taste in a lot of areas, and > we are not going to blindly follow their lead. Oracle is not a terribly > presentable system either. Having said that, I don't object to any > clean improvements we can think of in this area --- but "make it work > more like MySQL" had better not be the only argument for it. Hey, if I preferred MySQL to PostgreSQL, I wouldn't be here. That doesn't mean that there are exactly 0 things that they do better than we do. What I'm unhappy about isn't that we're not bug-compatible with MySQL, but rather that, in this case, I like MySQL's behavior better, and the fact that they've made it work means it's not theoretically impossible. It just involves some trade-off that I don't believe we've thought about hard enough. Standards compliance is a means to an end. The purpose of having standards is to allow for interoperable implementations of the same underlying functionality. That doesn't mean we should copy non-standard warts, of course, but it isn't obvious to me that this is a wart. No one has suggested that the user's actual query has more than one reasonable interpretation, so complaining that it's ambiguous doesn't impress me very much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 02/17/2012 12:59 PM, Robert Haas wrote: > On Fri, Feb 17, 2012 at 12:14 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> I don't know whether a similar improvement is >>> possible in this area, but we're certainly not going to get there by >>> labeling the user's expectations as unreasonable. I don't think they >>> are, and the people who wrote MySQL and Oracle evidently agree. >> The people who wrote MySQL had very poor taste in a lot of areas, and >> we are not going to blindly follow their lead. Oracle is not a terribly >> presentable system either. Having said that, I don't object to any >> clean improvements we can think of in this area --- but "make it work >> more like MySQL" had better not be the only argument for it. > Hey, if I preferred MySQL to PostgreSQL, I wouldn't be here. That > doesn't mean that there are exactly 0 things that they do better than > we do. What I'm unhappy about isn't that we're not bug-compatible > with MySQL, but rather that, in this case, I like MySQL's behavior > better, and the fact that they've made it work means it's not > theoretically impossible. It just involves some trade-off that I > don't believe we've thought about hard enough. > > Standards compliance is a means to an end. The purpose of having > standards is to allow for interoperable implementations of the same > underlying functionality. That doesn't mean we should copy > non-standard warts, of course, but it isn't obvious to me that this is > a wart. No one has suggested that the user's actual query has more > than one reasonable interpretation, so complaining that it's ambiguous > doesn't impress me very much. Assuming we had the cast, What would "intval like '1%'" mean? You're going to match 1, 10..19, 100..199, 1000..1999 ... Now maybe there's a good use for such a test, but I'm have a VERY hard time imagining what it might be. cheers andrew
Greetings, > > My biggest concern is whether we > > might "paint ourselves into a corner" by including such an > > extension. It might shut off avenues for other cool features > > because anyone using the extension would have conflicts. Perhaps > > such a thing would be more appropriate on PGXN with admonitions that > > it was only intended to ease conversion and that users were > > encouraged to migrate to standard syntax as soon as possible. > IMHO if you give someone syntax surgar like this and tell them to "fix it ASAP" it will never get fixed properly. I'm all for getting new users to pgsql, but this is not the way to do it. Regards, J
On Fri, Feb 17, 2012 at 1:21 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > Assuming we had the cast, What would "intval like '1%'" mean? You're going > to match 1, 10..19, 100..199, 1000..1999 ... > > Now maybe there's a good use for such a test, but I'm have a VERY hard time > imagining what it might be. Well, I can readily see someone encoding parts of their application into this sort of encoding, so that, for instance, all customer numbers beginning with "1" are deemed to be "internal accounts." It's a pretty terrible approach to encoding data for an application; it leads to stuff like the "oops, once they have generated 30,000 invoices, the system reaches doomsday and can't work anymore." http://thedailywtf.com/Articles/Ive-Got-Your-Number.aspx But nothing prevents users from designing their applications to encode information in their ID prefixes. And I have *zero* confidence that for PostgreSQL to rule out "LIKE '1%'" is preventing those designs from getting built... When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
On Feb 17, 2012, at 11:28 AM, Christopher Browne wrote: > And I have *zero* confidence that for PostgreSQL to rule out "LIKE > '1%'" is preventing those designs from getting built... > > When confronted by a difficult problem, solve it by reducing it to the > question, "How would the Lone Ranger handle this?" He would turn to Tonto, who undoubtably would advise: If you want to treat an integer like a string, figure out how to convert your integer into a string … The original query strikes me as being similar to expecting the Lone Ranger's six-shooter to be capable of shooting Tonto'sarrows. ---- Don Baccus http://donb.photo.net http://birdnotes.net http://openacs.org
On 02/17/2012 02:28 PM, Christopher Browne wrote: > On Fri, Feb 17, 2012 at 1:21 PM, Andrew Dunstan<andrew@dunslane.net> wrote: >> Assuming we had the cast, What would "intval like '1%'" mean? You're going >> to match 1, 10..19, 100..199, 1000..1999 ... >> >> Now maybe there's a good use for such a test, but I'm have a VERY hard time >> imagining what it might be. > Well, I can readily see someone encoding parts of their application > into this sort of encoding, so that, for instance, all customer > numbers beginning with "1" are deemed to be "internal accounts." > > It's a pretty terrible approach to encoding data for an application; > it leads to stuff like the "oops, once they have generated 30,000 > invoices, the system reaches doomsday and can't work anymore." > http://thedailywtf.com/Articles/Ive-Got-Your-Number.aspx > > But nothing prevents users from designing their applications to encode > information in their ID prefixes. > > And I have *zero* confidence that for PostgreSQL to rule out "LIKE > '1%'" is preventing those designs from getting built... > > When confronted by a difficult problem, solve it by reducing it to the > question, "How would the Lone Ranger handle this?" > Strings of digits used that way should not be stored in numeric fields at all, IMNSHO, just as zip codes and phone numbers should not be. They should be text in the first place, and if the OP had done that he wouldn't have had any difficulty about this. I hope that's what the Lone Ranger would do ... cheers andrew
On Fri, Feb 17, 2012 at 1:21 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > Assuming we had the cast, What would "intval like '1%'" mean? You're going > to match 1, 10..19, 100..199, 1000..1999 ... Yep. > Now maybe there's a good use for such a test, but I'm have a VERY hard time > imagining what it might be. Dunno. Presumably the test is meaningful for the OP's IDs, or he wouldn't have written the query that way. The time I got bitten by this was actually with LPAD(), rather than LIKE. I had a serial column which I wanted to use to generate record identifiers off of a sequence: B00001, B00002, B00003, B00004, etc. So I wrote 'B' || lpad(id, 5, '0'). When the implicit casting changes came along, I had to go back and change that to id::text. Fortunately that wasn't a lot of work, especially since by that time I was following pgsql-hackers enough to understand immediately why it broke, but it did and does seem unnecessary, because there is no real ambiguity there. Yeah, there could be ambiguity, if someone created another LPAD() function... but no one did. Here's yet another case where the current rules are thoroughly disagreeable. rhaas=# create or replace function z(smallint) returns smallint as $$select $1+1$$ language sql; ERROR: return type mismatch in function declared to return smallint DETAIL: Actual return type is integer. CONTEXT: SQL function "z" So cast the result from an integer to a smallint. What's the big deal? But, OK, I'll do it your way: rhaas=# create or replace function z(smallint) returns smallint as $$select $1+1::smallint$$ language sql; CREATE FUNCTION rhaas=# select z(1); ERROR: function z(integer) does not exist LINE 1: select z(1); ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. Come on, really? Note that the above example works without casts if you use int *or* bigint *or* numeric, but not smallint. That could be fixed by causing sufficiently-small integers to lex as smallints, but if you think implicit datatype coercions are evil, you ought to be outraged by the fact that we are already going out of our way to blur the line between int, bigint, and numeric. We let people write 2.0 + 3 and get 5.0 - surely it's only a short step from there to human sacrifice, cats and dogs living together... mass hysteria! I mean, the whole reason for rejecting integer = text is that we aren't sure whether to coerce the text to an integer or the integer to a string, and it's better to throw an error than to guess. But in the case of 2.0 + 3, we feel 100% confident in predicting that the user will be happy to convert the integer to a numeric rather than the numeric to an integer, so no error. We do that because we know that the domain of numeric is a superset of the domain of integer, or in other words, we are using context clues to deduce what the user probably meant rather than forcing them to be explicit about it. And yet in other cases, such as LIKE or LPAD with an integer rather than a string, or just about anything involving smallint, the user is required to be explicit, even though in most cases there is only one reasonable implementation of the query. What is the value in erroring out on a query that's not actually ambiguous? Numerous people here are defending that behavior as if it were somehow morally superior, but I think it's merely an accident of how the post-8.3 type system happens to work. On pre-8.3 systems this all works differently, and some of those old behaviors are worse than what we have now, while others are better. If we really believed that implicit casts any form were evil, we would have removed them entirely instead of trimming them back. I don't see why it's heretical to suggest that the 8.3 casting changes brought us to exactly that point in the universe where everything is perfect and nothing can be further improved; does anyone seriously believe that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Andrew Dunstan <andrew@dunslane.net> writes: > Assuming we had the cast, What would "intval like '1%'" mean? You're > going to match 1, 10..19, 100..199, 1000..1999 ... > Now maybe there's a good use for such a test, but I'm have a VERY hard > time imagining what it might be. Yeah, that's another point worth asking. Coercing an integer to text and then doing LIKE on it is an extremely inefficient way to do what's probably the wrong thing anyhow. I would be interested to know exactly why the OP wants to do this, and whether it couldn't be done better with some arithmetical test. In this connection it's worth remembering that when we removed some implicit casts in 8.3, we heard lots of yelling, and we also heard from lots of people who found bugs in their SQL code that the implicit casts had masked. Allowing LIKE-on-anything could be a rerun of that mess. regards, tom lane
On Fri, Feb 17, 2012 at 2:44 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > Strings of digits used that way should not be stored in numeric fields at > all, IMNSHO, just as zip codes and phone numbers should not be. They should > be text in the first place, and if the OP had done that he wouldn't have had > any difficulty about this. I hope that's what the Lone Ranger would do ... The argument isn't about whether the user made the right design choices; it's about whether he should be forced to insert an explicit type cast to get the query to do what it is unambiguously intended to do. I don't believe it's entirely self-evident that it's always better to store strings of integers in a text column rather than as an integer or bigint - integers are pretty fast and compact. Even granting that a text field would have been better, nobody's arguing that you can't do LIKE against an integer column; we're just discussing what syntax is required to make the user's intent acceptably clear. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 02/17/2012 02:52 PM, Robert Haas wrote: > If we really believed that implicit casts any form were evil, we would > have removed them entirely instead of trimming them back. I don't see > why it's heretical to suggest that the 8.3 casting changes brought us > to exactly that point in the universe where everything is perfect and > nothing can be further improved; does anyone seriously believe that? I don't believe we are necessarily at a perfect place, nor have I said it, nor has anyone else that I'm aware of. Neither am I opposed to implementing MySQL features (or doing them better) when appropriate. But I do believe that a test for "intval like '1%'" is very likely to come from a broken design. cheers andrew
On 2/17/12 12:04 PM, Robert Haas wrote: > The argument isn't about whether the user made the right design > choices; it's about whether he should be forced to insert an explicit > type cast to get the query to do what it is unambiguously intended to > do. I don't find INTEGER LIKE '1%' to be unambiguous. Prior to this discussion, if I had run across such a piece of code, I couldn't have told you what it would do in MySQL without testing. What *does* it do in MySQL? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Feb 17, 2012, at 4:12 PM, Josh Berkus wrote: > On 2/17/12 12:04 PM, Robert Haas wrote: >> The argument isn't about whether the user made the right design >> choices; it's about whether he should be forced to insert an explicit >> type cast to get the query to do what it is unambiguously intended to >> do. > > I don't find INTEGER LIKE '1%' to be unambiguous. > > Prior to this discussion, if I had run across such a piece of code, I > couldn't have told you what it would do in MySQL without testing. If someone showed it to me without mention MySQL I'd say: "oh, it's an error". > > What *does* it do in MySQL? And knowing it's MySQL … "oh, probably *not* an error", but like you … I'd be mystified. Should 01 like '0%' match? ---- Don Baccus http://donb.photo.net http://birdnotes.net http://openacs.org
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 > The time I got bitten by this was actually with LPAD(), rather than LIKE. +1. This is one of the functions that gave some of our clients real trouble when 8.3 came out. > If we really believed that implicit casts any form were evil, we > would have removed them entirely instead of trimming them back. > I don't see why it's heretical to suggest that the 8.3 casting > changes brought us to exactly that point in the universe where > everything is perfect and nothing can be further improved; does > anyone seriously believe that? Agreed (although the last bit is a bit of a straw man). The idea in this thread of putting some implicit casts into an extension or other external package is not a very good one, either. Let's apply some common sense instead, and stick to our guns on the ones where we feel there could honestly be serious app consequences and thus we encourage^H^Hforce people to change their code (or write all sorts of custom casts and functions). I think the actual number of such app circumstances is rather small, but my clients are not your* clients, so who knows? In other words, I'll concede int==text, but really need a strong argument for conceding things like LPAD. * Your = everyone else, not just M. Haas. - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201202181145 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAk8/1usACgkQvJuQZxSWSsjE6ACdHy31jpHUsXo5juvXcCkzKpGH RQAAoM/uTbM/JBkDiDjrsI1Blyg3DsWf =7CA4 -----END PGP SIGNATURE-----
On Fri, Feb 17, 2012 at 4:12 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 2/17/12 12:04 PM, Robert Haas wrote: >> The argument isn't about whether the user made the right design >> choices; it's about whether he should be forced to insert an explicit >> type cast to get the query to do what it is unambiguously intended to >> do. > > I don't find INTEGER LIKE '1%' to be unambiguous. > > Prior to this discussion, if I had run across such a piece of code, I > couldn't have told you what it would do in MySQL without testing. > > What *does* it do in MySQL? > IIRC it casts each INTEGER (without any left padding) to text and then does the comparison as per normal. Comparison of dissimilar types are a recipe for full table scans and unexpected results. A really good example is select * from employees where first_name=5; vs select * from employees where first_name='5'; Where first_name is string the queries above have very different behaviour in MySQL. The first does a full table scan and coerces first_name to an integer (so '5adfs' -> 5) while the second can use an index as it is normal string comparison. I have seen this sort of things cause significant production issues several times.* I have seen several companies use comparisons of dissimilar data types as part of their stump the prospective DBA test and they stump lots of folks. -- Rob Wultsch wultsch@gmail.com
On Feb 18, 2012, at 12:57 PM, Rob Wultsch wrote: > > Where first_name is string the queries above have very different > behaviour in MySQL. The first does a full table scan and coerces > first_name to an integer (so '5adfs' -> 5) Oh my, I can't wait to see someone rise to the defense of *this* behavior! ---- Don Baccus http://donb.photo.net http://birdnotes.net http://openacs.org
On Sat, Feb 18, 2012 at 4:12 PM, Don Baccus <dhogaza@pacifier.com> wrote: > > On Feb 18, 2012, at 12:57 PM, Rob Wultsch wrote: >> >> Where first_name is string the queries above have very different >> behaviour in MySQL. The first does a full table scan and coerces >> first_name to an integer (so '5adfs' -> 5) > > Oh my, I can't wait to see someone rise to the defense of *this* behavior! I can see a use, albeit a clumsy one, to the notion of looking for values WHERE integer_id_column like '1%' It's entirely common for companies to organize general ledger account numbers by having numeric prefixes that are somewhat meaningful. A hierarchy like the following is perfectly logical:- 0000 to 0999 :: Cash accounts [1]- 1000 to 1999 :: Short Term Assets-2000 to 2999 :: Long Term Assets- 3000 to 3999 :: Incomes- 4000 to 4999 :: Costs of Goods Sold- 5000 to 5999 :: OtherExpenses- 6000 to 6999 :: Share Capital- 7000 to 7999 :: Retained Earnings and such And back in the pre-computer days, accountants got very comfortable with the shorthands that, for instance, "Income is in the 3000 series." We are much smarter today (well, not necessarily!) and can use other ways to indicate hierarchy, so that there's no reason to *care* what that account number is. But if old-school accountants that think "3000 series" *demand* that, and as they're likely senior enough to assert their way, they're likely to succeed in that demand, then it's pretty easy to this to lead to somewhat clumsy "account_id like '3%'" as a search for income. If I put my purist hat on, then the *right* answer is a range query, thus WHERE account_id between 3000 and 3999 The new RANGE stuff that Jeff Davis has been adding into 9.2 should, in principle, be the even better way to represent this kind of thing. I'd think it nearly insane if someone was expecting '3%' to match not only the '3000 thru 3999' series, but also '300 to 399' and "30 to 39" and "3". A situation where that is the right set of results requires a mighty strangely designed numbering system. I imagine a designer would want to rule out the range 0-999, in such a design. Nonetheless, the need for "where account_id like '1%'" comes from a system designed with the above kind of thinking about account numbers, and that approach fits mighty well with the way people thought back when a "computer" was a person whose job it was to work out sums. Notes: [1] A careful observer will notice that the prefix notion doesn't work for the first range without forcing leading zeroes onto numbers... -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
On Feb 18, 2012, at 1:43 PM, Christopher Browne wrote: > On Sat, Feb 18, 2012 at 4:12 PM, Don Baccus <dhogaza@pacifier.com> wrote: >> >> On Feb 18, 2012, at 12:57 PM, Rob Wultsch wrote: >>> >>> Where first_name is string the queries above have very different >>> behaviour in MySQL. The first does a full table scan and coerces >>> first_name to an integer (so '5adfs' -> 5) >> >> Oh my, I can't wait to see someone rise to the defense of *this* behavior! > > I can see a use, albeit a clumsy one, to the notion of looking for values > WHERE integer_id_column like '1%' > > It's entirely common for companies to organize general ledger account > numbers by having numeric prefixes that are somewhat meaningful. > > A hierarchy like the following is perfectly logical: > - 0000 to 0999 :: Cash accounts [1] I asked earlier if anyone would expect 01 like '0%' to match … Apparently so! Your example is actually a good argument for storing account ids as text, because '0000' like '0%' *will* match. "I'd think it nearly insane if someone was expecting '3%' to match not only the '3000 thru 3999' series, but also '300 to 399' and "30 to 39" and "3"." How is PG supposed to know that integers compared to strings are always to be padded out to precisely 4 digits? ---- Don Baccus http://donb.photo.net http://birdnotes.net http://openacs.org
Don Baccus <dhogaza@pacifier.com> writes: >> A hierarchy like the following is perfectly logical: >> - 0000 to 0999 :: Cash accounts [1] > > Your example is actually a good argument for storing account ids as > text, because '0000' like '0%' *will* match. FWIW, I too think that if you want to process your integers as text for some operations (LIKE) and as integer for some others, you'd better do the casting explicitly. In the worked-out example Christopher has been proposing, just alter the column type to text and be done, I can't see summing up or whatever int arithmetic usage being done on those general ledger account numbers. Use a domain (well a CHECK constraint really) to tight things down. As for lpad(), that's a function working on text that returns text, so having a variant that accepts integers would not be confusing. Then again, why aren't you using to_char() if processing integers? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support PS: having worked on telephone number prefix indexing and processing them as text, I might have a biased opinion. Youdon't add up phone numbers, though, do you?
On 02/18/2012 05:34 PM, Don Baccus wrote: > On Feb 18, 2012, at 1:43 PM, Christopher Browne wrote: > >> On Sat, Feb 18, 2012 at 4:12 PM, Don Baccus<dhogaza@pacifier.com> wrote: >>> On Feb 18, 2012, at 12:57 PM, Rob Wultsch wrote: >>>> Where first_name is string the queries above have very different >>>> behaviour in MySQL. The first does a full table scan and coerces >>>> first_name to an integer (so '5adfs' -> 5) >>> Oh my, I can't wait to see someone rise to the defense of *this* behavior! >> I can see a use, albeit a clumsy one, to the notion of looking for values >> WHERE integer_id_column like '1%' >> >> It's entirely common for companies to organize general ledger account >> numbers by having numeric prefixes that are somewhat meaningful. >> >> A hierarchy like the following is perfectly logical: >> - 0000 to 0999 :: Cash accounts [1] > I asked earlier if anyone would expect 01 like '0%' to match … > > Apparently so! > > Your example is actually a good argument for storing account ids as text, because '0000' like '0%' *will* match. > > "I'd think it nearly insane if someone was expecting '3%' to match not > only the '3000 thru 3999' series, but also '300 to 399' and "30 to 39" > and "3"." > > How is PG supposed to know that integers compared to strings are always to be padded out to precisely 4 digits? > By this point the Lone Ranger has committed suicide. cheers andrew
On Sat, Feb 18, 2012 at 5:34 PM, Don Baccus <dhogaza@pacifier.com> wrote: > > On Feb 18, 2012, at 1:43 PM, Christopher Browne wrote: >> A hierarchy like the following is perfectly logical: >> - 0000 to 0999 :: Cash accounts [1] > > I asked earlier if anyone would expect 01 like '0%' to match … > > Apparently so! Yes, and I was intentionally treating this as an oddity. > Your example is actually a good argument for storing account ids as text, because '0000' like '0%' *will* match. Absolutely. The trouble is that if you use the term "account NUMBER" enough times, some portion of people will think that it's a number in the sense that it should be meaningful to add and subtract against them. > "I'd think it nearly insane if someone was expecting '3%' to match not > only the '3000 thru 3999' series, but also '300 to 399' and "30 to 39" > and "3"." > > How is PG supposed to know that integers compared to strings are always to be padded out to precisely 4 digits? I think it's not quite right to treat it as "how is PG supposed to know." The problem is a bit more abstract; it occurs without having a database involved. The notion that the ranges (3), (30-39), (300-399), and (3000-3999) ought to be considered connected together in the account number classification is what seems crazy to me. But that's what "account number starts with a 3" could be expected to imply. At any rate, yes, this is liable to point the Lone Ranger towards solutions that involve him not riding off into the sunset! -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
On Sat, Feb 18, 2012 at 4:12 PM, Don Baccus <dhogaza@pacifier.com> wrote: > On Feb 18, 2012, at 12:57 PM, Rob Wultsch wrote: >> >> Where first_name is string the queries above have very different >> behaviour in MySQL. The first does a full table scan and coerces >> first_name to an integer (so '5adfs' -> 5) > > Oh my, I can't wait to see someone rise to the defense of *this* behavior! Well, this gets to my point. The behavior Rob is mentioning here is the one that caused us to make the implicit casting changes in the first place. And, in this situation, I agree that throwing an error is much better than silently doing something that may be quite different from what the user expects. However, the fact that the implicit casting changes are an improvement in this case does not mean that they are an improvement in every case.All I am asking for here is that we examine the variouscases on their merits rather than assuming that our way must be better than MySQL's way, or visca versa. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Feb 17, 2012 at 7:12 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 2/17/12 12:04 PM, Robert Haas wrote: >> The argument isn't about whether the user made the right design >> choices; it's about whether he should be forced to insert an explicit >> type cast to get the query to do what it is unambiguously intended to >> do. > > I don't find INTEGER LIKE '1%' to be unambiguous. Please propose two reasonable interpretations. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Feb 17, 2012 at 02:52:20PM -0500, Robert Haas wrote: > Here's yet another case where the current rules are thoroughly disagreeable. > > rhaas=# create or replace function z(smallint) returns smallint as > $$select $1+1$$ language sql; > ERROR: return type mismatch in function declared to return smallint > DETAIL: Actual return type is integer. > CONTEXT: SQL function "z" > > So cast the result from an integer to a smallint. What's the big deal? > > But, OK, I'll do it your way: > > rhaas=# create or replace function z(smallint) returns smallint as > $$select $1+1::smallint$$ language sql; > CREATE FUNCTION > rhaas=# select z(1); > ERROR: function z(integer) does not exist > LINE 1: select z(1); > ^ > HINT: No function matches the given name and argument types. You > might need to add explicit type casts. > > Come on, really? Note that the above example works without casts if > you use int *or* bigint *or* numeric, but not smallint. That could be > fixed by causing sufficiently-small integers to lex as smallints, but > if you think implicit datatype coercions are evil, you ought to be > outraged by the fact that we are already going out of our way to blur > the line between int, bigint, and numeric. We let people write 2.0 + > 3 and get 5.0 - surely it's only a short step from there to human > sacrifice, cats and dogs living together... mass hysteria! I mean, > the whole reason for rejecting integer = text is that we aren't sure > whether to coerce the text to an integer or the integer to a string, > and it's better to throw an error than to guess. But in the case of > 2.0 + 3, we feel 100% confident in predicting that the user will be > happy to convert the integer to a numeric rather than the numeric to > an integer, so no error. We do that because we know that the domain > of numeric is a superset of the domain of integer, or in other words, > we are using context clues to deduce what the user probably meant > rather than forcing them to be explicit about it. Is there any general interest in adjusting smallint casting? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes: > On Fri, Feb 17, 2012 at 02:52:20PM -0500, Robert Haas wrote: >> Come on, really? Note that the above example works without casts if >> you use int *or* bigint *or* numeric, but not smallint. That could be >> fixed by causing sufficiently-small integers to lex as smallints, > Is there any general interest in adjusting smallint casting? We tried that once, years ago, and it was a miserable failure: it opened up far too many ambiguities, eg should "int4col + 1" invoke int4pl or int42pl? (That particular case works, because there's an exact match to int42pl, but we found an awful lot of cases where the parser couldn't resolve a best choice. IIRC there were dozens of failures in the regression tests then, and there would be more now.) There's also the problem that if "2 + 2" starts getting parsed as smallint int2pl smallint, cases like "20000 + 20000" will overflow when they didn't before. IMO smallint is a bit too narrow to be a useful general-purpose integer type, so we'd end up wanting int2pl to yield int4 to avoid unexpected overflows --- and that opens up more cans of worms, like which version of f() gets called for f(2+2). It's conceivable that a change in the lexer behavior combined with a massive reorganization of the integer-related operators would bring us to a nicer place than where we are now. But it'd be a lot of work for dubious reward, and it would almost certainly generate a pile of application compatibility problems. Some history: http://archives.postgresql.org/pgsql-hackers/2002-11/msg00468.php http://archives.postgresql.org/pgsql-hackers/2010-09/msg00223.php (A lot of the specific details in the 2002 thread are obsolete now, but the general point remains, I fear.) regards, tom lane
On Mon, Aug 27, 2012 at 04:03:05PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > On Fri, Feb 17, 2012 at 02:52:20PM -0500, Robert Haas wrote: > >> Come on, really? Note that the above example works without casts if > >> you use int *or* bigint *or* numeric, but not smallint. That could be > >> fixed by causing sufficiently-small integers to lex as smallints, > > > Is there any general interest in adjusting smallint casting? > > We tried that once, years ago, and it was a miserable failure: it opened > up far too many ambiguities, eg should "int4col + 1" invoke int4pl or > int42pl? (That particular case works, because there's an exact match > to int42pl, but we found an awful lot of cases where the parser couldn't > resolve a best choice. IIRC there were dozens of failures in the > regression tests then, and there would be more now.) > > There's also the problem that if "2 + 2" starts getting parsed as > smallint int2pl smallint, cases like "20000 + 20000" will overflow when > they didn't before. IMO smallint is a bit too narrow to be a useful > general-purpose integer type, so we'd end up wanting int2pl to yield > int4 to avoid unexpected overflows --- and that opens up more cans of > worms, like which version of f() gets called for f(2+2). > > It's conceivable that a change in the lexer behavior combined with a > massive reorganization of the integer-related operators would bring us > to a nicer place than where we are now. But it'd be a lot of work for > dubious reward, and it would almost certainly generate a pile of > application compatibility problems. > > Some history: > http://archives.postgresql.org/pgsql-hackers/2002-11/msg00468.php > http://archives.postgresql.org/pgsql-hackers/2010-09/msg00223.php > (A lot of the specific details in the 2002 thread are obsolete now, > but the general point remains, I fear.) Thanks, just asking. Odd int2 is so much harder than int8/numberic casts. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Mon, Aug 27, 2012 at 4:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Bruce Momjian <bruce@momjian.us> writes: >> On Fri, Feb 17, 2012 at 02:52:20PM -0500, Robert Haas wrote: >>> Come on, really? Note that the above example works without casts if >>> you use int *or* bigint *or* numeric, but not smallint. That could be >>> fixed by causing sufficiently-small integers to lex as smallints, > >> Is there any general interest in adjusting smallint casting? > > We tried that once, years ago, and it was a miserable failure: it opened > up far too many ambiguities, eg should "int4col + 1" invoke int4pl or > int42pl? (That particular case works, because there's an exact match > to int42pl, but we found an awful lot of cases where the parser couldn't > resolve a best choice. IIRC there were dozens of failures in the > regression tests then, and there would be more now.) > > There's also the problem that if "2 + 2" starts getting parsed as > smallint int2pl smallint, cases like "20000 + 20000" will overflow when > they didn't before. IMO smallint is a bit too narrow to be a useful > general-purpose integer type, so we'd end up wanting int2pl to yield > int4 to avoid unexpected overflows --- and that opens up more cans of > worms, like which version of f() gets called for f(2+2). I agree that redefining the lexer behavior is a can of worms. What I don't understand is why f(2+2) can't call f(smallint) when that's the only extant f. It seems to me that we could do that without breaking anything that works today: if you look for candidates and don't find any, try again, allowing assignment casts the second time. We really ought to put some effort into solving this problem. I've seen a few Oracle-migration talks at conferences, and *every one* of them has mentioned the smallint problem. It hits our customers, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 Tom Lane replied: >>> Come on, really? Note that the above example works without casts if >>> you use int *or* bigint *or* numeric, but not smallint. That could be >>> fixed by causing sufficiently-small integers to lex as smallints, >> Is there any general interest in adjusting smallint casting? ... > It's conceivable that a change in the lexer behavior combined with a > massive reorganization of the integer-related operators would bring us > to a nicer place than where we are now. But it'd be a lot of work for > dubious reward, and it would almost certainly generate a pile of > application compatibility problems. Okay, but what about a more targeted solution to the original poster's problem? That seems doable without causing major breakage elsewhere.... - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201208271818 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAlA78m0ACgkQvJuQZxSWSshW2gCg1Xcx5zLORMIDQo2yE6QTLVuD P88AniE9rh4Dojg0o416cWK7cYHWaq0b =NOAR -----END PGP SIGNATURE-----
Robert Haas <robertmhaas@gmail.com> writes: > I agree that redefining the lexer behavior is a can of worms. What I > don't understand is why f(2+2) can't call f(smallint) when that's the > only extant f. It seems to me that we could do that without breaking > anything that works today: if you look for candidates and don't find > any, try again, allowing assignment casts the second time. Yeah, possibly. Where would you fit that in the existing sequence of tests? http://www.postgresql.org/docs/devel/static/typeconv-func.html regards, tom lane
On Mon, Aug 27, 2012 at 7:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I agree that redefining the lexer behavior is a can of worms. What I >> don't understand is why f(2+2) can't call f(smallint) when that's the >> only extant f. It seems to me that we could do that without breaking >> anything that works today: if you look for candidates and don't find >> any, try again, allowing assignment casts the second time. > > Yeah, possibly. Where would you fit that in the existing sequence of > tests? > http://www.postgresql.org/docs/devel/static/typeconv-func.html I think: If step 4a would result in discarding all candidates, then instead discard candidate functions for which the input types do not match and cannot be converted -- using an ASSIGNMENT conversion -- to match. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Aug 27, 2012 at 10:03 PM, Robert Haas <robertmhaas@gmail.com> wrote: > We really ought to put some effort into solving this problem. I've > seen a few Oracle-migration talks at conferences, and *every one* of > them has mentioned the smallint problem. It hits our customers, too. I'm kind of puzzled how Oracle-migration talks talk about a smallint problem given that Oracle only has NUMBER which is a variable-sized data type. Why would Oracle people end up with an f(smallint) at all? Perhaps just a warning on CREATE FUNCTION when one of the arguments doesn't have an implicit cast from the canonical data type of that hierarchy saying perhaps you should consider using that data type and let Postgres convert instead of the more specific data type? -- greg
Greg Stark <stark@mit.edu> writes: > Perhaps just a warning on CREATE FUNCTION when one of the arguments > doesn't have an implicit cast from the canonical data type of that > hierarchy saying perhaps you should consider using that data type and > let Postgres convert instead of the more specific data type? This might be a good idea if we could write such a test in a principled way, but I'm not seeing how. We don't really have a concept of "canonical data types". Also, right at the moment it's not clear to me whether there are any other cases besides integer literal vs smallint argument. I think that's the only particularly surprising case within the numeric hierarchy --- and for non-numeric types, the literal is generally going to start out "unknown" so the whole problem doesn't arise. I feel uncomfortable trying to invent general-purpose solutions to problems we have only one instance of ... regards, tom lane
On Tue, Aug 28, 2012 at 1:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Also, right at the moment it's not clear to me whether there are any > other cases besides integer literal vs smallint argument. I think > that's the only particularly surprising case within the numeric > hierarchy --- and for non-numeric types, the literal is generally going > to start out "unknown" so the whole problem doesn't arise. I feel > uncomfortable trying to invent general-purpose solutions to problems > we have only one instance of ... The other case that comes up regularly is someone trying to pass some kind of number to a function such as LPAD(). There is only one LPAD() so no ambiguity exists, but PostgreSQL doesn't even see that there's a candidate. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Aug 28, 2012 at 1:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Also, right at the moment it's not clear to me whether there are any >> other cases besides integer literal vs smallint argument. I think >> that's the only particularly surprising case within the numeric >> hierarchy --- and for non-numeric types, the literal is generally going >> to start out "unknown" so the whole problem doesn't arise. I feel >> uncomfortable trying to invent general-purpose solutions to problems >> we have only one instance of ... > The other case that comes up regularly is someone trying to pass some > kind of number to a function such as LPAD(). There is only one LPAD() > so no ambiguity exists, but PostgreSQL doesn't even see that there's a > candidate. There still won't be a candidate for that one, unless you're proposing to allow explicit-only coercions to be applied implicitly. regards, tom lane
On 8/27/12 5:19 PM, Greg Sabino Mullane wrote: > Tom Lane replied: >>>> >>>Come on, really? Note that the above example works without casts if >>>> >>>you use int*or* bigint*or* numeric, but not smallint. That could be >>>> >>>fixed by causing sufficiently-small integers to lex as smallints, >>> >>Is there any general interest in adjusting smallint casting? > ... >> >It's conceivable that a change in the lexer behavior combined with a >> >massive reorganization of the integer-related operators would bring us >> >to a nicer place than where we are now. But it'd be a lot of work for >> >dubious reward, and it would almost certainly generate a pile of >> >application compatibility problems. > Okay, but what about a more targeted solution to the original > poster's problem? That seems doable without causing major > breakage elsewhere.... FWIW, this causes problems for me at work as well. For the case of const op const instead of trying to small-cast the 2 constants, would it be possible to large-cast them, perform the operation, and thenre-cast the results of the operation? ISTM that would solve the operator issue (ie, the case of 20000+20000). -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Tue, Aug 28, 2012 at 1:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The other case that comes up regularly is someone trying to pass some >> kind of number to a function such as LPAD(). There is only one LPAD() >> so no ambiguity exists, but PostgreSQL doesn't even see that there's a >> candidate. > > There still won't be a candidate for that one, unless you're proposing > to allow explicit-only coercions to be applied implicitly. OK, I'm confused. rhaas=# create table foo (a text); CREATE TABLE rhaas=# insert into foo values (12345); INSERT 0 1 There are no pg_cast entries for integer -> text, but this still gets treated as an assignment cast because of some special-case logic in find_coercion_pathway(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Aug 28, 2012 at 1:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> There still won't be a candidate for that one, unless you're proposing >> to allow explicit-only coercions to be applied implicitly. > [ not so, see kluge in find_coercion_pathway() ] Oh, I'd forgotten that worked that way. Frankly, that makes me quite a bit more concerned about this proposal than I was before. I do *not* want to re-introduce silent cross-category casts to text, not even if there's no other way to match the function/operator. I think that hack was/is tolerable for actual assignment to a table column, because there is very little chance that the semantics of such an assignment will come out differently than the user expected. This is not the case when you're matching to potentially overloaded functions or operators, though. If we go down this route we're going to find ourselves back in the badlands of timestamps sometimes being compared as though they were strings, and all the other sillinesses that we got rid of in 8.3. I got beat up enough already for taking those toys away from people; I'm not looking forward to having to have another round of it in the future. I could see doing what you suggest as long as we exclude the automatic-coerce-via-IO case. regards, tom lane
On Tue, Aug 28, 2012 at 2:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Oh, I'd forgotten that worked that way. Frankly, that makes me quite a > bit more concerned about this proposal than I was before. I do *not* > want to re-introduce silent cross-category casts to text, not even if > there's no other way to match the function/operator. I think that hack > was/is tolerable for actual assignment to a table column, because there > is very little chance that the semantics of such an assignment will come > out differently than the user expected. Well, I think that when there is only one LPAD function, there is also very little chance that the results will come out differently than the user expected. I'm having a hard time seeing a bright line between those two cases. Remember, I'm not proposing that we try to guess between more alternatives than we're already trying to guess between - only that we do something other than fail outright in situations where we currently do. The changes we made in 8.3 broke a bunch of cases that were actually ambiguous. That was painful, but probably for the best. What wasn't, in my opinion, for the best was that we also broke a lot of cases - including this one - that were by no means ambiguous. In fact, I believe that every place that I had to fix my application code actually fell into the latter category: there was no actual ambiguity, but I had to go back and insert a cast anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Aug 28, 2012 at 2:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Oh, I'd forgotten that worked that way. Frankly, that makes me quite a >> bit more concerned about this proposal than I was before. I do *not* >> want to re-introduce silent cross-category casts to text, not even if >> there's no other way to match the function/operator. I think that hack >> was/is tolerable for actual assignment to a table column, because there >> is very little chance that the semantics of such an assignment will come >> out differently than the user expected. > Well, I think that when there is only one LPAD function, there is also > very little chance that the results will come out differently than the > user expected. [ shrug... ] I'm having a hard time resisting the temptation to point out that there are two. The real point here though is that the proposed behavior change will affect all functions, not only the cases where you think there is only one sane behavior. And features such as search paths and default parameters frequently mean that there are more potential matches than the user thought of while writing the query. In the end, SQL is a fairly strongly typed language, especially in our manifestation of it. I don't think we should give that up, especially not for benefits as dubious as not having to write a cast to make it clear that yes you really do want a timestamp to be treated as text. IMO, saving people from the errors that inevitably arise from that sort of sloppy thinking is a benefit, not a cost, of having a typed language. regards, tom lane
On 08/29/2012 01:32 AM, Robert Haas wrote: > On Tue, Aug 28, 2012 at 1:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Also, right at the moment it's not clear to me whether there are any >> other cases besides integer literal vs smallint argument. I think >> that's the only particularly surprising case within the numeric >> hierarchy --- and for non-numeric types, the literal is generally going >> to start out "unknown" so the whole problem doesn't arise. I feel >> uncomfortable trying to invent general-purpose solutions to problems >> we have only one instance of ... > > The other case that comes up regularly is someone trying to pass some > kind of number to a function such as LPAD(). There is only one LPAD() > so no ambiguity exists, but PostgreSQL doesn't even see that there's a > candidate. Allowing Pg to assign parameters or fields by using the normally-only-explicit casts where no ambiguity exists would be *really* helpful in other areas, too. In particular, this applies with assignment of fields from `text' input, too. PostgreSQL can be incredibly frustrating to work with from Java/JDBC where everything goes through protocol-level parameterised statements, because you can't use Java `String' types via PreparedStatement.setString() to assign to, say, an `xml' or `json' field, you have to use `setObject()'. That's OK (ish) when working with PgJDBC directly, but it breaks code that expects this to work like it does in other databases where setString(...) can be used to assign to anything that's castable from varchar. Pg doesn't allow `unknown' to be passed as the type of a parameterised statement, so the JDBC driver can't work around this by passing such entries as fields of "unknown" type and letting the server work it out. It'd instead have to ask the server "what are the permissible types for the placeholder $1 in this query" ... which AFAIK isn't possible, and would require extra round trips too. I currently work around this by creating additional implicit casts where I need them, eg text->xml, text->json. It'd be lovely not to have to do that, though. -- Craig Ringer
On Tue, Aug 28, 2012 at 4:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Well, I think that when there is only one LPAD function, there is also >> very little chance that the results will come out differently than the >> user expected. > > [ shrug... ] I'm having a hard time resisting the temptation to point > out that there are two. Fine, but as they have different numbers of arguments it has no bearing on the point at hand, which is that right now it is very easy to write a call that matches unexpectedly fails to match either one. > The real point here though is that the proposed > behavior change will affect all functions, not only the cases where you > think there is only one sane behavior. And features such as search paths > and default parameters frequently mean that there are more potential > matches than the user thought of while writing the query. I'm totally unpersuaded by this argument. I have yet to run into a customer who defined multiple functions with the same name and then complained because we called the wrong one, or even because we threw an error instead of just picking one. I have run into MANY customers who have been forced to insert typecasts into applications to work around our willingness to consider calling the only plausible candidate function or operator. Now some of this is no doubt because we have very few customers running on pre-8.3 releases (woohoo!), but that's exactly the point: the bad old days when you could break your application by accidentally invoking the wrong function are gone. That problem is dead. What we ought to be focusing on now is fixing the collateral damage. > In the end, SQL is a fairly strongly typed language, especially in our > manifestation of it. I don't think we should give that up, especially > not for benefits as dubious as not having to write a cast to make it > clear that yes you really do want a timestamp to be treated as text. > IMO, saving people from the errors that inevitably arise from that sort > of sloppy thinking is a benefit, not a cost, of having a typed language. The benefit is that it allows us to be compatible with other SQL systems. If PostgreSQL were the only database in the world, I might agree with you, but it isn't: it's just the only one that requires you to insert this many casts. It's hard to accept the argument that there's no sensible alternative when other people have clearly found something that works for them and their users. We can dig in our heels and insist we know better, but what does that do other than drive away users? For most people, the database is just a tool, and they want it to work with a minimum of fuss, not force them to jump through unexpected and unwelcome hoops. Again, if there's real ambiguity then that is one thing, but what I'm proposing does not change the behavior in any case we currently consider ambiguous. I don't know of any other programming language or system where it is considered a virtue to force the user to inject unnecessary decorations into their code. Indeed, many systems go to quite some lengths to minimize the amount of such decoration that is required. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Aug 28, 2012 at 4:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The real point here though is that the proposed >> behavior change will affect all functions, not only the cases where you >> think there is only one sane behavior. And features such as search paths >> and default parameters frequently mean that there are more potential >> matches than the user thought of while writing the query. > I'm totally unpersuaded by this argument. I have yet to run into a > customer who defined multiple functions with the same name and then > complained because we called the wrong one, or even because we threw > an error instead of just picking one. That argument would hold water if we got rid of every single usage of overloading in the system-defined operators/functions, which as you well know is not an attractive idea. Since that's not going to happen, arguing for this on the basis that your customers don't overload function names is missing the point. Any loosening of the rules is going to create issues for system-function resolution ... unless you're going to propose that we somehow do this differently for user and system defined functions. > I have run into MANY customers > who have been forced to insert typecasts into applications to work > around our willingness to consider calling the only plausible > candidate function or operator. Now some of this is no doubt because > we have very few customers running on pre-8.3 releases (woohoo!), but > that's exactly the point: the bad old days when you could break your > application by accidentally invoking the wrong function are gone. > That problem is dead. The reason it's dead is that we killed it in 8.3. I don't want it coming back to life, but I think that that will be exactly the outcome if we let any implicit casts to text get back into the rules for operator/function overloading resolution. An example of the sort of problem that I don't want to hear about ever again is somebody trying to use max() on a "point" column. We don't have linear sort ordering for points, so this is nonsensical and should draw an error. Which it does, today. With your proposal, the system would silently use max(pointcol::text), producing results that might even look plausible if the user wasn't paying too much attention. If that's the behavior the user actually wants, fine: let him say so with an explicit cast to text. But I don't want the system trapping users into such hard-to-find errors because we are so focused on mysql compatibility that we let people omit conceptually-critical casts in the name of ease of use. > For most people, the database is just a tool, and > they want it to work with a minimum of fuss, not force them to jump > through unexpected and unwelcome hoops. Again, if there's real > ambiguity then that is one thing, but what I'm proposing does not > change the behavior in any case we currently consider ambiguous. I > don't know of any other programming language or system where it is > considered a virtue to force the user to inject unnecessary > decorations into their code. Really? You've not had experience with very many programming languages, then. Just about every one I've ever dealt with that's at a higher conceptual level than C or BASIC *is* sticky about this sort of thing. regards, tom lane
I wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> That problem is dead. > The reason it's dead is that we killed it in 8.3. I don't want it > coming back to life, but I think that that will be exactly the outcome > if we let any implicit casts to text get back into the rules for > operator/function overloading resolution. To put some concreteness into what so far has been a pretty hand-wavy discussion, I experimented with the attached patch. I'm not sure that it exactly corresponds to what you proposed, but I think this is the only place the consideration could be injected without a substantial amount of code rearrangement. This results in half a dozen regression test failures (see second attachment), which mostly consist of "function/operator does not exist" errors changing to "function/operator is not unique". I've not looked into exactly why each one happens --- possibly the code is now finding multiple string-category matches where before it found none. But it definitely illustrates my point that this would not be without surprises. Oh, one more thing: regression=# select lpad(42,8); ERROR: failed to find conversion function from integer to text so this doesn't actually solve the problem you want to solve. I'm not sure why that's happening, either, but evidently some additional coercion laxity would required. regards, tom lane diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c index 2b1a13a..feac9f9 100644 *** a/src/backend/parser/parse_func.c --- b/src/backend/parser/parse_func.c *************** func_match_argtypes(int nargs, *** 555,560 **** --- 555,578 ---- } } + if (ncandidates == 0) + { + /* try again with assignment rules */ + for (current_candidate = raw_candidates; + current_candidate != NULL; + current_candidate = next_candidate) + { + next_candidate = current_candidate->next; + if (can_coerce_type(nargs, input_typeids, current_candidate->args, + COERCION_ASSIGNMENT)) + { + current_candidate->next = *candidates; + *candidates = current_candidate; + ncandidates++; + } + } + } + return ncandidates; } /* func_match_argtypes() */ *** /home/tgl/pgsql/src/test/regress/expected/text.out Tue Jul 12 18:56:58 2011 --- /home/tgl/pgsql/src/test/regress/results/text.out Wed Aug 29 00:08:45 2012 *************** *** 26,35 **** -- As of 8.3 we have removed most implicit casts to text, so that for example -- this no longer works: select length(42); ! ERROR: function length(integer) does not exist LINE 1: select length(42); ^ ! HINT: No function matches the given name and argument types. You might need to add explicit type casts. -- But as a special exception for usability's sake, we still allow implicit -- casting to text in concatenations, so long as the other input is text or -- an unknown literal. So these work: --- 26,35 ---- -- As of 8.3 we have removed most implicit casts to text, so that for example -- this no longer works: select length(42); ! ERROR: function length(integer) is not unique LINE 1: select length(42); ^ ! HINT: Could not choose a best candidate function. You might need to add explicit type casts. -- But as a special exception for usability's sake, we still allow implicit -- casting to text in concatenations, so long as the other input is text or -- an unknown literal. So these work: *************** *** 47,56 **** -- but not this: select 3 || 4.0; ! ERROR: operator does not exist: integer || numeric LINE 1: select 3 || 4.0; ^ ! HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts. /* * various string functions */ --- 47,56 ---- -- but not this: select 3 || 4.0; ! ERROR: operator is not unique: integer || numeric LINE 1: select 3 || 4.0; ^ ! HINT: Could not choose a best candidate operator. You might need to add explicit type casts. /* * various string functions */ ====================================================================== *** /home/tgl/pgsql/src/test/regress/expected/errors.out Thu Jan 26 17:29:22 2012 --- /home/tgl/pgsql/src/test/regress/results/errors.out Wed Aug 29 00:08:52 2012 *************** *** 126,132 **** stype = int4, finalfunc = int2um, initcond = '0'); ! ERROR: function int2um(integer) does not exist -- left out basetype create aggregate newcnt1 (sfunc = int4inc, stype = int4, --- 126,132 ---- stype = int4, finalfunc = int2um, initcond = '0'); ! ERROR: function int2um(smallint) requires run-time type coercion -- left out basetype create aggregate newcnt1 (sfunc = int4inc, stype = int4, ====================================================================== *** /home/tgl/pgsql/src/test/regress/expected/alter_table.out Fri Jul 27 17:27:42 2012 --- /home/tgl/pgsql/src/test/regress/results/alter_table.out Wed Aug 29 00:09:02 2012 *************** *** 1705,1712 **** alter table anothertab alter column atcol1 drop default; alter table anothertab alter column atcol1 type boolean using case when atcol1 % 2 = 0 then true else false end; -- fails ! ERROR: operator does not exist: boolean <= integer ! HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts. alter table anothertab drop constraint anothertab_chk; alter table anothertab drop constraint anothertab_chk; -- fails ERROR: constraint "anothertab_chk" of relation "anothertab" does not exist --- 1705,1712 ---- alter table anothertab alter column atcol1 drop default; alter table anothertab alter column atcol1 type boolean using case when atcol1 % 2 = 0 then true else false end; -- fails ! ERROR: operator is not unique: boolean <= integer ! HINT: Could not choose a best candidate operator. You might need to add explicit type casts. alter table anothertab drop constraint anothertab_chk; alter table anothertab drop constraint anothertab_chk; -- fails ERROR: constraint "anothertab_chk" of relation "anothertab" does not exist ====================================================================== *** /home/tgl/pgsql/src/test/regress/expected/rowtypes.out Mon Jul 9 10:27:28 2012 --- /home/tgl/pgsql/src/test/regress/results/rowtypes.out Wed Aug 29 00:09:03 2012 *************** *** 348,357 **** (0 rows) select text(fullname) from fullname; -- error ! ERROR: function text(fullname) does not exist LINE 1: select text(fullname) from fullname; ^ ! HINT: No function matches the given name and argument types. You might need to add explicit type casts. select fullname.text from fullname; -- error ERROR: column fullname.text does not exist LINE 1: select fullname.text from fullname; --- 348,357 ---- (0 rows) select text(fullname) from fullname; -- error ! ERROR: function text(fullname) is not unique LINE 1: select text(fullname) from fullname; ^ ! HINT: Could not choose a best candidate function. You might need to add explicit type casts. select fullname.text from fullname; -- error ERROR: column fullname.text does not exist LINE 1: select fullname.text from fullname; *************** *** 370,379 **** (1 row) select text(row('Jim', 'Beam')); -- error ! ERROR: function text(record) does not exist LINE 1: select text(row('Jim', 'Beam')); ^ ! HINT: No function matches the given name and argument types. You might need to add explicit type casts. select (row('Jim', 'Beam')).text; -- error ERROR: could not identify column "text" in record data type LINE 1: select (row('Jim', 'Beam')).text; --- 370,379 ---- (1 row) select text(row('Jim', 'Beam')); -- error ! ERROR: function text(record) is not unique LINE 1: select text(row('Jim', 'Beam')); ^ ! HINT: Could not choose a best candidate function. You might need to add explicit type casts. select (row('Jim', 'Beam')).text; -- error ERROR: could not identify column "text" in record data type LINE 1: select (row('Jim', 'Beam')).text; ======================================================================
On Tue, Aug 28, 2012 at 9:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:[ shrug... ] I'm having a hard time resisting the temptation to point
> On Tue, Aug 28, 2012 at 2:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Oh, I'd forgotten that worked that way. Frankly, that makes me quite a
>> bit more concerned about this proposal than I was before. I do *not*
>> want to re-introduce silent cross-category casts to text, not even if
>> there's no other way to match the function/operator. I think that hack
>> was/is tolerable for actual assignment to a table column, because there
>> is very little chance that the semantics of such an assignment will come
>> out differently than the user expected.
> Well, I think that when there is only one LPAD function, there is also
> very little chance that the results will come out differently than the
> user expected.
out that there are two. The real point here though is that the proposed
behavior change will affect all functions, not only the cases where you
think there is only one sane behavior. And features such as search paths
and default parameters frequently mean that there are more potential
matches than the user thought of while writing the query.
In the end, SQL is a fairly strongly typed language, especially in our
manifestation of it. I don't think we should give that up, especially
not for benefits as dubious as not having to write a cast to make it
clear that yes you really do want a timestamp to be treated as text.
IMO, saving people from the errors that inevitably arise from that sort
of sloppy thinking is a benefit, not a cost, of having a typed language.
regards, tom lane
+a very big number
I remember the pain we had when we upgraded from 8.1 to 8.4, but I also distinctly remember that after the upgrade I was a little bit more confident that our SQL code does the right thing. But we are a OCaml shop if there is one thing we believe in with ferocity it is that a STRICT type checker is a good thing (TM). You pay a little verbosity tax but in return all the "stupid" little obvious bugs get caught and maybe even more importantly when you later change your types the system are forced to reconsider all cases where you used the value of (now different) type (and that is A VERY GOOD THING in a big code base). Admittedly we are not there yet in Postgres as functions are only (re)checked upon execution.
My 2cents,
Bene
On Tue, Aug 28, 2012 at 11:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > That argument would hold water if we got rid of every single usage of > overloading in the system-defined operators/functions, which as you well > know is not an attractive idea. Since that's not going to happen, > arguing for this on the basis that your customers don't overload > function names is missing the point. Any loosening of the rules is > going to create issues for system-function resolution ... unless you're > going to propose that we somehow do this differently for user and system > defined functions. Obviously not. > An example of the sort of problem that I don't want to hear about > ever again is somebody trying to use max() on a "point" column. > We don't have linear sort ordering for points, so this is nonsensical > and should draw an error. Which it does, today. Much as I hate to say it, I have to admit I find this to be a compelling argument. > Really? You've not had experience with very many programming languages, > then. Just about every one I've ever dealt with that's at a higher > conceptual level than C or BASIC *is* sticky about this sort of thing. In terms of type-strictness, it runs the gamut. You have things like Perl where datatypes barely exist at all and silent (sometimes confusing) conversions are performed nary a second thought, and at the other end of the spectrum you have things like ML which are incredibly fanatic about type-checking. But both Perl and ML and, as far as I know, most of what's in between make a virtue of terseness. The exceptions are things like Ada and Cobol, which are not IMHO the sorts of thing we ought to be trying to emulate. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Aug 29, 2012 at 12:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > To put some concreteness into what so far has been a pretty hand-wavy > discussion, I experimented with the attached patch. I'm not sure that > it exactly corresponds to what you proposed, but I think this is the > only place the consideration could be injected without a substantial > amount of code rearrangement. Yeah, this is what I was thinking of. > This results in half a dozen regression > test failures (see second attachment), which mostly consist of > "function/operator does not exist" errors changing to "function/operator > is not unique". I've not looked into exactly why each one happens --- > possibly the code is now finding multiple string-category matches where > before it found none. But it definitely illustrates my point that this > would not be without surprises. Well, the good news is that nothing fails that would have succeeded before, or for that matter visca versa. But after playing around with it a little, I agree that there's danger lurking. The fact that length(42) fails due to the ambiguity between length(text) and length(bpchar) is mostly happy coincidence. It's hard to get excited about the possibility of that managing to return "2". The situation with || is even worse. If I remove textanycat and anytextcat on the theory that textcat itself ought to be enough under the new rules, then a whole bunch of regression test failures occur because we end up bogusly matching the array concatenation operator somehow, and fail to interpret an unknown literal as an array (ouch!). The upshot here appears to be that we're kind of schizophrenic about what we want. With things like text || anyelement, anyelement || text, and concat(variadic "any") we are basically asserting that we want to treat anything that we don't recognize as a string. But then we have other functions (like max and length) where we don't want that behavior. I suppose that more than anything this is based on a perception that || won't be ambiguous (though whether that perception is entirely correct is debatable, given the array-related meanings of that operator) but there might be more than one possible sense for length() or max(). Is there any principled way of distinguishing these cases, or even a rule for what we ought to do by hand in future cases of this type, or is it totally arbitrary? > regression=# select lpad(42,8); > ERROR: failed to find conversion function from integer to text > > so this doesn't actually solve the problem you want to solve. > I'm not sure why that's happening, either, but evidently some > additional coercion laxity would required. This, however, is a trivial problem; make_fn_arguments just didn't get the memo that it might now need to look for assignment casts. See attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
<div class="moz-cite-prefix">On 29/08/12 23:34, Robert Haas wrote:<br /></div><blockquote cite="mid:CA+TgmoaujOxBeKOz16SEKXT5vjc7A=N-F6BfT-pa+P55aoVrFQ@mail.gmail.com"type="cite"><pre wrap="">On Wed, Aug 29, 2012at 12:27 AM, Tom Lane <a class="moz-txt-link-rfc2396E" href="mailto:tgl@sss.pgh.pa.us"><tgl@sss.pgh.pa.us></a>wrote: </pre><blockquote type="cite"><pre wrap="">To put some concreteness into what so far has been a pretty hand-wavy discussion, I experimented with the attached patch. I'm not sure that it exactly corresponds to what you proposed, but I think this is the only place the consideration could be injected without a substantial amount of code rearrangement. </pre></blockquote><pre wrap=""> Yeah, this is what I was thinking of. </pre><blockquote type="cite"><pre wrap="">This results in half a dozen regression test failures (see second attachment), which mostly consist of "function/operator does not exist" errors changing to "function/operator is not unique". I've not looked into exactly why each one happens --- possibly the code is now finding multiple string-category matches where before it found none. But it definitely illustrates my point that this would not be without surprises. </pre></blockquote><pre wrap=""> Well, the good news is that nothing fails that would have succeeded before, or for that matter visca versa. But after playing around with it a little, I agree that there's danger lurking. The fact that length(42) fails due to the ambiguity between length(text) and length(bpchar) is mostly happy coincidence. It's hard to get excited about the possibility of that managing to return "2". The situation with || is even worse. If I remove textanycat and anytextcat on the theory that textcat itself ought to be enough under the new rules, then a whole bunch of regression test failures occur because we end up bogusly matching the array concatenation operator somehow, and fail to interpret an unknown literal as an array (ouch!). The upshot here appears to be that we're kind of schizophrenic about what we want. With things like text || anyelement, anyelement || text, and concat(variadic "any") we are basically asserting that we want to treat anything that we don't recognize as a string. But then we have other functions (like max and length) where we don't want that behavior. I suppose that more than anything this is based on a perception that || won't be ambiguous (though whether that perception is entirely correct is debatable, given the array-related meanings of that operator) but there might be more than one possible sense for length() or max(). Is there any principled way of distinguishing these cases, or even a rule for what we ought to do by hand in future cases of this type, or is it totally arbitrary? </pre><blockquote type="cite"><pre wrap="">regression=# select lpad(42,8); ERROR: failed to find conversion function from integer to text so this doesn't actually solve the problem you want to solve. I'm not sure why that's happening, either, but evidently some additional coercion laxity would required. </pre></blockquote><pre wrap=""> This, however, is a trivial problem; make_fn_arguments just didn't get the memo that it might now need to look for assignment casts. See attached. </pre><br /><fieldset class="mimeAttachmentHeader"></fieldset><br /><pre wrap=""> </pre></blockquote> You realize of course, that '42' is the answer to Life, the Universe, and Everything? :-)<br /><br /><br/> Cheers,<br /> Gavin<br />
Robert Haas <robertmhaas@gmail.com> writes: > The upshot here appears to be that we're kind of schizophrenic about > what we want. With things like text || anyelement, anyelement || > text, and concat(variadic "any") we are basically asserting that we > want to treat anything that we don't recognize as a string. But then > we have other functions (like max and length) where we don't want that > behavior. I suppose that more than anything this is based on a > perception that || won't be ambiguous (though whether that perception > is entirely correct is debatable, given the array-related meanings of > that operator) but there might be more than one possible sense for > length() or max(). Is there any principled way of distinguishing > these cases, or even a rule for what we ought to do by hand in future > cases of this type, or is it totally arbitrary? I would not claim that the situation around || is principled in any fashion. Rather, || was identified as being a key pain point for the removal of implicit-casts-to-text during 8.3 development, and we agreed we would reduce the pain by adding operators that effectively reintroduced the implicit casts *for that one operator only*. I felt that was still a big step forward compared to implicit casts everywhere. But if we'd been doing this in a green field, I doubt that you'd see text || anyelement or anyelement || text in there. I would vote against introducing more such things in future, on the grounds that there would be no backwards-compatibility argument for it. As for the concat() function, IMO it's an ugly kluge. But again, it's not introducing any global behavior that might have side-effects on the resolution of other function or operator names. >> regression=# select lpad(42,8); >> ERROR: failed to find conversion function from integer to text >> I'm not sure why that's happening, either, but evidently some >> additional coercion laxity would required. > This, however, is a trivial problem; make_fn_arguments just didn't get > the memo that it might now need to look for assignment casts. See > attached. Meh. I'm a bit worried about whether that might have unexpected consequences, too. regards, tom lane
On 8/29/12 11:40 AM, Tom Lane wrote: >>> regression=# select lpad(42,8); >>> >>ERROR: failed to find conversion function from integer to text >>> >>I'm not sure why that's happening, either, but evidently some >>> >>additional coercion laxity would required. >> >This, however, is a trivial problem; make_fn_arguments just didn't get >> >the memo that it might now need to look for assignment casts. See >> >attached. > Meh. I'm a bit worried about whether that might have unexpected > consequences, too. We are not seriously intending to make the above query work, are we?
Peter Eisentraut <peter_e@gmx.net> writes: > On 8/29/12 11:40 AM, Tom Lane wrote: >> regression=# select lpad(42,8); > We are not seriously intending to make the above query work, are we? Well, *I* don't want to, but apparently Robert does. I don't really want to go much further than finding a way to handle the "integer constant passed to smallint function argument" case. Maybe we should consider a narrow fix for that rather than opening up the general assignment-cast scenario. regards, tom lane
On Wed, Aug 29, 2012 at 3:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> On 8/29/12 11:40 AM, Tom Lane wrote: >>> regression=# select lpad(42,8); > >> We are not seriously intending to make the above query work, are we? > > Well, *I* don't want to, but apparently Robert does. > > I don't really want to go much further than finding a way to handle the > "integer constant passed to smallint function argument" case. Maybe we > should consider a narrow fix for that rather than opening up the general > assignment-cast scenario. If we could just do that, it would be a huge improvement. I'm not very sanguine about the possibility of a clean fix in the lexer, but maybe there is some other tweak to the system that would make it work. On the more general issue, I continue to see minimal risk of harm in allowing things like LPAD() to implicitly cast the first argument to text. I wrote code that did that for years (to pad numeric fields out to a specific width by adding leading zeros) and until I upgraded to 8.3 it caused me no problems. I knew what I meant, and so did the database, and we were both happy. The argument seems to be that we shouldn't have been happy, but we were. Some of the other examples Tom mentions are, indeed, icky, and I don't know what to do about that, but LPAD() does indeed seem pretty harmless to me. And, on a more pragmatic level, as long as I work for a company that helps people migrate from other database systems, I'm not going to be able to stop caring about this issue even in cases where I don't personally think implicit casting is a good idea, because other people who are not me have tens of thousands of lines of procedural code written for those other systems and if you tell them they've got to go through and add hundreds or thousands of casts before they can migrate, it tends to turn them off. Maybe there's no perfect solution to that problem, but the status quo is definitely not perfect either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On the more general issue, I continue to see minimal risk of harm in > allowing things like LPAD() to implicitly cast the first argument to > text. Well, I see your point about LPAD(), but the problem is how to tell the difference between a harmless cast omission and an actual mistake that the user will be very grateful if we point out. If we allow implicit casts to text in the general case in function/operator calls, we are definitely going to re-introduce a lot of room for mistakes. Upthread you were complaining about how we'd reject calls even when there was only one possible interpretation. I wonder whether there'd be any value in taking that literally: that is, allow use of assignment rules when there is, in fact, exactly one function with the right number of parameters visible in the search path. This would solve the LPAD() problem (at least as stated), and probably many other practical cases too, since I admit your point that an awful lot of users do not use function overloading. The max() example I mentioned earlier would not get broken since there's more than one max(), and in general it seems likely that cases where there's a real risk would involve overloaded names. The main downside I can see is that code that used to work is likely to stop working as soon as someone creates a potential overloading situation. Worse, the error message could be pretty confusing, since if you had been successfully calling f(smallint) with f(42), you'd get "f(integer) does not exist", not something like "f() is ambiguous", after adding f(float8) to the mix. This seems related to the confusing changes in regression test cases that I got in my experiments yesterday. This may be sufficient reason to reject the idea, since the very last thing we need in this area is any degradation in the relevance of the error messages. > ... as long as I work for a company that helps > people migrate from other database systems, I'm not going to be able > to stop caring about this issue even in cases where I don't personally > think implicit casting is a good idea, because other people who are > not me have tens of thousands of lines of procedural code written for > those other systems and if you tell them they've got to go through and > add hundreds or thousands of casts before they can migrate, it tends > to turn them off. Maybe there's no perfect solution to that problem, > but the status quo is definitely not perfect either. Meh. I tend to think that a better solution to those folks' problem is a package of add-on casts that they could install for use with their legacy code; not dumbing down the system's error detection capability for everyone. Peter's original try at re-adding implicit text casts in that way didn't work very well IIRC, but maybe we could try harder. regards, tom lane
On Wed, Aug 29, 2012 at 06:39:37PM -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On the more general issue, I continue to see minimal risk of harm > > in allowing things like LPAD() to implicitly cast the first > > argument to text. > > Well, I see your point about LPAD(), but the problem is how to tell > the difference between a harmless cast omission and an actual > mistake that the user will be very grateful if we point out. If we > allow implicit casts to text in the general case in > function/operator calls, we are definitely going to re-introduce a > lot of room for mistakes. > > Upthread you were complaining about how we'd reject calls even when > there was only one possible interpretation. I wonder whether > there'd be any value in taking that literally: that is, allow use of > assignment rules when there is, in fact, exactly one function with > the right number of parameters visible in the search path. +1 for this. > The main downside I can see is that code that used to work is likely > to stop working as soon as someone creates a potential overloading > situation. Worse, the error message could be pretty confusing, > since if you had been successfully calling f(smallint) with f(42), > you'd get "f(integer) does not exist", not something like "f() is > ambiguous", after adding f(float8) to the mix. This seems related > to the confusing changes in regression test cases that I got in my > experiments yesterday. This may be sufficient reason to reject the > idea, since the very last thing we need in this area is any > degradation in the relevance of the error messages. With the ANY* parameters introduced in the past few versions, there's a lot less incentive to create this problem. The trick here is documenting the ANY* parameters in enough places to make sure that incentive is reduced. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, Aug 29, 2012 at 6:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Well, I see your point about LPAD(), but the problem is how to tell > the difference between a harmless cast omission and an actual mistake > that the user will be very grateful if we point out. If we allow > implicit casts to text in the general case in function/operator calls, > we are definitely going to re-introduce a lot of room for mistakes. I concede that point. :-) > Upthread you were complaining about how we'd reject calls even when > there was only one possible interpretation. I wonder whether there'd be > any value in taking that literally: that is, allow use of assignment > rules when there is, in fact, exactly one function with the right number > of parameters visible in the search path. This would solve the LPAD() > problem (at least as stated), and probably many other practical cases > too, since I admit your point that an awful lot of users do not use > function overloading. The max() example I mentioned earlier would not > get broken since there's more than one max(), and in general it seems > likely that cases where there's a real risk would involve overloaded > names. That's an interesting idea. I like it. > The main downside I can see is that code that used to work is likely > to stop working as soon as someone creates a potential overloading > situation. Worse, the error message could be pretty confusing, since > if you had been successfully calling f(smallint) with f(42), you'd get > "f(integer) does not exist", not something like "f() is ambiguous", > after adding f(float8) to the mix. This seems related to the confusing > changes in regression test cases that I got in my experiments yesterday. One thought I had when looking at those messages was that, in some ways, the new messages were actually less confusing than the old messages. I mean, if you try to call f(42) and you get f(integer) does not exist, ok, you'll probably figure out that the issue is with the argument type, since you most likely know that an f of some type does in fact exist. But it would be even more clear if the error message said, ok, so there is an f, but I'm not going to call it because the argument types don't match closely enough. The distinction would be even more useful if the function happens to be called snuffleupagus rather than f, because then when you call snufleupagus(42.0), it'll tell you "i know nothing about a function by that name" whereas when you call snuffleupagus(42) it'll tell you "i know about a function by that name, but not with those argument types". I've certainly encountered this confusion before whilst debugging my own and other people's databases: is it giving me that error because the function doesn't exist, or because of an argument type mismatch? > This may be sufficient reason to reject the idea, since the very last > thing we need in this area is any degradation in the relevance of the > error messages. > >> ... as long as I work for a company that helps >> people migrate from other database systems, I'm not going to be able >> to stop caring about this issue even in cases where I don't personally >> think implicit casting is a good idea, because other people who are >> not me have tens of thousands of lines of procedural code written for >> those other systems and if you tell them they've got to go through and >> add hundreds or thousands of casts before they can migrate, it tends >> to turn them off. Maybe there's no perfect solution to that problem, >> but the status quo is definitely not perfect either. > > Meh. I tend to think that a better solution to those folks' problem is > a package of add-on casts that they could install for use with their > legacy code; not dumbing down the system's error detection capability > for everyone. Peter's original try at re-adding implicit text casts > in that way didn't work very well IIRC, but maybe we could try harder. Well, the big problem that you run into is that when you add casts, you tend to create situations that the type system thinks are ambiguous. A particular example of this is textanycat, anytextcat, and plain old textcat. If you start adding casts, the system can get confused about which one it's supposed to call in which situation. The frustrating thing is that we don't really care. The only reason why there are three different operators in the first place is because we want to make sure that everything someone does will match one of them. But then if something matches two of them, we error out unnecessarily. It would be nice to have a way to say "among this group of functions, we don't care" or perhaps "among this group of functions, here is a preference ordering; in case of doubt, pick the one with the highest preference". But in some sense I feel that that isn't really solving the problem, because the only reason those extra functions exist in the first place is to work around the fact that sometimes the system doesn't perform typecasts in situations where we wish it did. It's almost like we should have a way to flag argument positions and say "for this function, in this argument position, feel free to implicitly cast to text". Then, for example, you'd only need one quote_literal() rather than two. This is possibly unsatisfying as well because there likely will be disagreement about which functions ought to have that implicit-casting-ok behavior and which ones should not. Maybe that'd be OK anyway; people could always hack their local catalogs if need be. But, all things being equal, it would be nice to design this in a way where you could package up a bunch of behavior in the form of an extension that users could either install or not install according to the semantics that they wish to have. Your proposal of adding casts is appealing from that perspective, but it'd work a lot better in practice if we could eliminate the practice of having multiple versions of functions with the same semantics just to elicit the desired casting behavior. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Aug 30, 2012 at 9:13 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> Upthread you were complaining about how we'd reject calls even when >> there was only one possible interpretation. I wonder whether there'd be >> any value in taking that literally: that is, allow use of assignment >> rules when there is, in fact, exactly one function with the right number >> of parameters visible in the search path. This would solve the LPAD() >> problem (at least as stated), and probably many other practical cases >> too, since I admit your point that an awful lot of users do not use >> function overloading. The max() example I mentioned earlier would not >> get broken since there's more than one max(), and in general it seems >> likely that cases where there's a real risk would involve overloaded >> names. > > That's an interesting idea. I like it. I did some experimentation with this. It seems that what Tom proposed here is a lot cleaner than what I proposed previously, while still increasing usability in many real-world cases. For example, in unpatched master: rhaas=# create function xyz(smallint) returns smallint as $$select $1$$ language sql; CREATE FUNCTION rhaas=# select xyz(5); ERROR: function xyz(integer) does not exist LINE 1: select xyz(5); ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. rhaas=# create table abc (a int); CREATE TABLE rhaas=# select lpad(a, 5, '0') from abc; ERROR: function lpad(integer, integer, unknown) does not exist LINE 1: select lpad(a, 5, '0') from abc; ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. But, with the attached patch: rhaas=# create function xyz(smallint) returns smallint as $$select $1$$ language sql; CREATE FUNCTION rhaas=# select xyz(5); xyz ----- 5 (1 row) rhaas=# create table abc (a int); CREATE TABLE rhaas=# select lpad(a, 5, '0') from abc; lpad ------ (0 rows) There is only one regression test output change: -ERROR: function int2um(integer) does not exist +ERROR: function int2um(smallint) requires run-time type coercion The replacement error message is coming from lookup_agg_function(), which calls func_get_detail() and then imposes stricter checks on the result. In the old coding func_get_detail() didn't even identify a candidate, whereas now it does but lookup_agg_function() decides that it isn't usable. This seems OK to me, and the error message doesn't seem any worse either. So that's the good news. The not-so-good news is that to make it work, I had to modify make_fn_arguments() to pass COERCION_ASSIGNMENT rather than COERCION_IMPLICIT to coerce_type(). Otherwise, parsing succeeds, but then things fall over later when we try to identify the coercion function to be used. The reason I'm nervous about is because the code now looks like this: node = coerce_type(pstate, node, actual_arg_types[i], declared_arg_types[i], -1, COERCION_ASSIGNMENT, COERCE_IMPLICIT_CAST, -1); It seems wrong to pass COERCE_IMPLICIT_CAST along with COERCION_ASSIGNMENT, because COERCE_IMPLICIT_CAST controls the way that the cast is *displayed*, and COERCE_IMPLICIT_CAST means don't display it at all. That seems like it could create a problem if we used this new type of argument matching (because there was only one function with a given name) and then later someone added a second one. I thought, for example, that there might be a problem with the way views are reverse-parsed, but it actually seems to work OK, at least in the case I can think of to test: rhaas=# create table look_ma (a int, b text); CREATE TABLE rhaas=# create view look_ma_view (a, b) as select lpad(a, 5), lpad(b, 5) from look CREATE VIEW rhaas=# \d+ look_ma_view View "public.look_ma_view" Column | Type | Modifiers | Storage | Description --------+------+-----------+----------+------------- a | text | | extended | b | text | | extended | View definition: SELECT lpad(look_ma.a::text, 5) AS a, lpad(look_ma.b, 5) AS b FROM look_ma; Note that where the assignment cast was used to find the function to call, we get a cast in the deparsed query, but in the case where we used an implicit cast, we don't. This is exactly as I would have hoped. I fear there might be a subtler case where there is an issue, but so far I haven't been able to find it. If there in fact is an issue, I think we can fix it by pushing the logic up from func_match_argtypes where it is now into func_get_detail; func_get_detail can then return some indication to the caller indicating which make_fn_arguments behavior is required. However, I don't want to add that complexity unless we actually need it for something. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Tue, 2012-11-06 at 10:57 -0500, Robert Haas wrote: > But, with the attached patch: > > rhaas=# create function xyz(smallint) returns smallint as $$select > $1$$ language sql; > CREATE FUNCTION > rhaas=# select xyz(5); > xyz > ----- > 5 > (1 row) > > rhaas=# create table abc (a int); > CREATE TABLE > rhaas=# select lpad(a, 5, '0') from abc; > lpad > ------ > (0 rows) I continue to be of the opinion that allowing this second case to work is not desirable.
On Mon, Nov 19, 2012 at 6:19 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > On Tue, 2012-11-06 at 10:57 -0500, Robert Haas wrote: >> But, with the attached patch: >> >> rhaas=# create function xyz(smallint) returns smallint as $$select >> $1$$ language sql; >> CREATE FUNCTION >> rhaas=# select xyz(5); >> xyz >> ----- >> 5 >> (1 row) >> >> rhaas=# create table abc (a int); >> CREATE TABLE >> rhaas=# select lpad(a, 5, '0') from abc; >> lpad >> ------ >> (0 rows) > > I continue to be of the opinion that allowing this second case to work > is not desirable. 1. Why? 2. What's your counter-proposal? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 29 August 2012 23:39, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The main downside I can see is that code that used to work is likely > to stop working as soon as someone creates a potential overloading > situation. Worse, the error message could be pretty confusing, since > if you had been successfully calling f(smallint) with f(42), you'd get > "f(integer) does not exist", not something like "f() is ambiguous", > after adding f(float8) to the mix. This seems related to the confusing > changes in regression test cases that I got in my experiments yesterday. > This may be sufficient reason to reject the idea, since the very last > thing we need in this area is any degradation in the relevance of the > error messages. It would be useful if we issued a NOTICE when an ambiguity is introduced, rather than when using it. Like Bison's reporting of reduce conflicts. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > It would be useful if we issued a NOTICE when an ambiguity is > introduced, rather than when using it. I think that's pie in the sky, since whether there is an ambiguity will depend not only on what set of functions exists, but what the caller's search_path is. regards, tom lane
On 11/21/12 9:42 AM, Robert Haas wrote: > On Mon, Nov 19, 2012 at 6:19 PM, Peter Eisentraut <peter_e@gmx.net> wrote: >> On Tue, 2012-11-06 at 10:57 -0500, Robert Haas wrote: >>> But, with the attached patch: >>> >>> rhaas=# create function xyz(smallint) returns smallint as $$select >>> $1$$ language sql; >>> CREATE FUNCTION >>> rhaas=# select xyz(5); >>> xyz >>> ----- >>> 5 >>> (1 row) >>> >>> rhaas=# create table abc (a int); >>> CREATE TABLE >>> rhaas=# select lpad(a, 5, '0') from abc; >>> lpad >>> ------ >>> (0 rows) >> >> I continue to be of the opinion that allowing this second case to work >> is not desirable. > > 1. Why? Because a strongly-typed system should not cast numbers to strings implicitly. Does the equivalent of the lpad case work in any other strongly-typed programming language? > 2. What's your counter-proposal? Leave things as they are.
Peter Eisentraut <peter_e@gmx.net> writes: > On 11/21/12 9:42 AM, Robert Haas wrote: >> On Mon, Nov 19, 2012 at 6:19 PM, Peter Eisentraut <peter_e@gmx.net> wrote: >>> I continue to be of the opinion that allowing this second case to work >>> is not desirable. >> 1. Why? > Because a strongly-typed system should not cast numbers to strings > implicitly. Does the equivalent of the lpad case work in any other > strongly-typed programming language? The argument here is basically between ease of use and ability to detect common programming mistakes. It's not clear to me that there is any principled way to make such a tradeoff, because different people can reasonably put different weights on those two goals. >> 2. What's your counter-proposal? > Leave things as they are. FWIW, I agree with Peter. It's been like this for a long time and whether the system would be easier to use or not, it would definitely be uglier and harder to explain. ("Assignment casts are used only for assignments ... except when they aren't.") I notice that the proposed patch is devoid of documentation. Perhaps after Robert is done writing the necessary changes to the SGML docs about type conversions and casts, he'll agree this is pretty ugly. regards, tom lane
On Wed, Nov 21, 2012 at 5:10 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > Because a strongly-typed system should not cast numbers to strings > implicitly. Does the equivalent of the lpad case work in any other > strongly-typed programming language? Does any other strongly-typed programming language distinguish between explicit, assignment, and implicit casting the way that PostgreSQL does? In order for the equivalent of the lpad case to exist in some other programming language, I think they'd need to make that distinction, and AFAICT no one does that. The only other programming language I know of in which you can define what it means to cast between two data types is C++, and it's not generally considered one of that languages better features. AFAICT, they have implicit casts and explicit casts, but nothing intermediate. There are dynamic_cast, static_cast, and reinterpret_cast as well, but those trade-off efficiency for the possibility of a segmentation fault, and have nothing to do with the context in which the cast can be applied automatically. So I think the answer to your question is probably no, purely on the grounds that we have set a new world record for byzantine casting systems. A more fair comparison might be to look at what other SQL systems allow. Oracle, MySQL, and SQL Server all permit implicit casting between integer and text, and a call to LPAD with an integer first argument works just fine in both Oracle and MySQL. It doesn't work in SQL server, but that's only because SQL server doesn't have it as a built-in function. FWICT, there's no general problem with passing an integer to a function that expects varchar in any mainstream RDBMS other than PostgreSQL. >> 2. What's your counter-proposal? > > Leave things as they are. I'd be fine with that if we had a system that allows users to pick the behavior that they want in their particular environment, but in fact our existing system is extremely inflexible. If you start adding additional implicit casts to the system, you get failures trying to invoke commonly-used system functions, because we've got overloaded versions of them precisely to work around the fact that our casting rules are more restrictive than real-world users want them to be. If that's not prima facie evidence that the system doesn't work well in the real world, I'm not sure what would qualify. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Nov 22, 2012 at 10:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The argument here is basically between ease of use and ability to detect > common programming mistakes. It's not clear to me that there is any > principled way to make such a tradeoff, because different people can > reasonably put different weights on those two goals. I think that is true. But for whatever it's worth, and at the risk of beating a horse that seems not to be dead yet in spite of the fact that I feel I've already administered one hell of a beating, the LPAD case is unambiguous, and therefore it is hard to see what sort of programming mistake we are protecting users against. If there's only one function called bob, and the user says bob(x), it is hard to see what behavior, other than calling bob with x as an argument, would be even mildly sensible. (Yes, OK, there are two lpad functions, but as you pointed out previously, they take different numbers of arguments, so the point still stands.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> I'd be fine with that if we had a system that allows users to pick the > behavior that they want in their particular environment, but in fact > our existing system is extremely inflexible. If you start adding > additional implicit casts to the system, you get failures trying to > invoke commonly-used system functions, because we've got overloaded > versions of them precisely to work around the fact that our casting > rules are more restrictive than real-world users want them to be. If > that's not prima facie evidence that the system doesn't work well in > the real world, I'm not sure what would qualify. They don't even work particularly well for db hackers. I went down the rabbit hole of trying to make TEXT = CITEXT be a default cast to CITEXT, and after several days of struggling with breaking system functions, I gave up. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Nov 22, 2012 at 10:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The argument here is basically between ease of use and ability to detect >> common programming mistakes. It's not clear to me that there is any >> principled way to make such a tradeoff, because different people can >> reasonably put different weights on those two goals. > I think that is true. But for whatever it's worth, and at the risk of > beating a horse that seems not to be dead yet in spite of the fact > that I feel I've already administered one hell of a beating, the LPAD > case is unambiguous, and therefore it is hard to see what sort of > programming mistake we are protecting users against. I think we're talking past each other here. It is unarguable that (as long as there's only one LPAD function) there is only one possible non-error interpretation. However, you are ignoring the real possibility that perhaps the situation *is* an error: maybe the user typed the wrong function name, or the wrong field name, or simply misunderstands what the function is meant to do. If it is a typo then complaining about the datatype mismatch is a good thing to do. If it is intentional, then requiring an explicit cast makes it clear to all concerned that what's wanted is to convert the non-string value to a string and then perform a string-ish operation on it. regards, tom lane
On 25 November 2012 23:31, Robert Haas <robertmhaas@gmail.com> wrote: > The only other programming > language I know of in which you can define what it means to cast > between two data types is C++, and it's not generally considered one > of that languages better features. AFAICT, they have implicit casts > and explicit casts, but nothing intermediate. Well, you can make your class copy-constructable by providing a constructor (and a copy-assignment operator) whose only argument is, say, an int. In additional to that, you could potentially define a conversion operator, which will make the class implicitly cast back into an int. That is kind of a big distinction, because it doesn't have to go both ways, and in fact it usually doesn't - plenty of working C++ programmers don't know what a conversion operator is, but they could all tell you how to get this behaviour: MyClass foo = 5; // actually calls copy constructor - equivalent to MyClass foo(5); foo = 4; // This calls copy assignment operator -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Sun, Nov 25, 2012 at 6:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think we're talking past each other here. It is unarguable that > (as long as there's only one LPAD function) there is only one possible > non-error interpretation. However, you are ignoring the real > possibility that perhaps the situation *is* an error: maybe the user > typed the wrong function name, or the wrong field name, or simply > misunderstands what the function is meant to do. If it is a typo then > complaining about the datatype mismatch is a good thing to do. If it > is intentional, then requiring an explicit cast makes it clear to all > concerned that what's wanted is to convert the non-string value to a > string and then perform a string-ish operation on it. Sure, in theory that is true, but no other RDBMS that I know about feels a need to error out in that situation. I'm skeptical of the contention that we're smarter than everyone else. Moreover, if implicit casts to string are a categorically bad idea, why do we allow them (via various evil hacks) for quote_literal(), concat(), and ||? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Nov 25, 2012 at 7:05 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > Well, you can make your class copy-constructable by providing a > constructor (and a copy-assignment operator) whose only argument is, > say, an int. In additional to that, you could potentially define a > conversion operator, which will make the class implicitly cast back > into an int. That is kind of a big distinction, because it doesn't > have to go both ways, and in fact it usually doesn't - plenty of > working C++ programmers don't know what a conversion operator is, but > they could all tell you how to get this behaviour: > > MyClass foo = 5; // actually calls copy constructor - equivalent to > MyClass foo(5); > foo = 4; // This calls copy assignment operator I remember this sort of thing un-fondly from my C++ days, but it doesn't make me like our current behavior any better. As in C++, we seem to have created a system where the only way to get even locally sensible behavior is to throw large piles of hackery at the problem. Getting the behavior you want globally cannot be obtained at any price. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 26 November 2012 00:24, Robert Haas <robertmhaas@gmail.com> wrote: > I remember this sort of thing un-fondly from my C++ days, but it > doesn't make me like our current behavior any better. You can also make a constructor with a single argument "explicit", and thereby prevent implicit conversions. So yes, C++ distinguishes between explicit, assignment, and implicit casting in a way that is, in broad strokes, at least as sophisticated as PostgreSQL. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Robert Haas <robertmhaas@gmail.com> writes: > So I think the answer to your question is probably no, purely on the > grounds that we have set a new world record for byzantine casting > systems. A more fair comparison might be to look at what other SQL > systems allow. Oracle, MySQL, and SQL Server all permit implicit > casting between integer and text, and a call to LPAD with an integer > first argument works just fine in both Oracle and MySQL. It doesn't > work in SQL server, but that's only because SQL server doesn't have it > as a built-in function. FWICT, there's no general problem with > passing an integer to a function that expects varchar in any > mainstream RDBMS other than PostgreSQL. I think this is ignoring the fact that we have an extensible type system, and thus a lot more room for problems if we allow too many implicit casts. It might also be worth noting that some of this complexity comes from the SQL standard. It is at least arguable that the three-way coercion distinction exists in the standard: they have got different rules for what happens in an explicit CAST, in an assignment context, and in plain expressions. So it's not that relevant whether other mainstream programming languages have comparable constructs. Having said that, though, I think you could make an argument that there's some support for this idea in the SQL standard. In SQL99 10.4 <routine invocation>, it appears that once you've identified a target routine to be called, you're supposed to use the "store assignment" rules to decide how to convert the supplied expression(s) to the parameter data type(s). However, it's not clear to me that that should be taken as conclusive, because the $64 question here is exactly how sure you are in your identification of the target routine. SQL99 doesn't seem to allow anywhere near as much function overloading as we do --- and of course they have no notion of overloaded or user-defined operators at all. As far as I can tell from 10.4, you are supposed to be able to identify the target routine without any consideration of the actual parameters' types. regards, tom lane
On Sun, Nov 25, 2012 at 7:36 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 26 November 2012 00:24, Robert Haas <robertmhaas@gmail.com> wrote: >> I remember this sort of thing un-fondly from my C++ days, but it >> doesn't make me like our current behavior any better. > > You can also make a constructor with a single argument "explicit", and > thereby prevent implicit conversions. So yes, C++ distinguishes > between > explicit, assignment, and implicit casting in a way that is, in broad > strokes, at least as sophisticated as PostgreSQL. OK, I stand corrected. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Nov 25, 2012 at 6:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > I think we're talking past each other here. It is unarguable that > (as long as there's only one LPAD function) there is only one possible > non-error interpretation. However, you are ignoring the real > possibility that perhaps the situation *is* an error: maybe the user > typed the wrong function name, or the wrong field name, or simply > misunderstands what the function is meant to do. Yeah, but of course all of that could also be true even if the argument types match exactly, too. I mean, I won't deny that the presence of an exact argument-type match lends a little bit of additional confidence that the call is the one the user intended, but a user is more likely to confuse LPAD with RPAD than they are to confuse either of them with a function that does something completely different but is spelled almost the same (LDAP?). I think it's also worth reiterating that, whatever you may think of the LPAD case, there is a significant win here in allowing things like foo(3) to match foo(smallint) in the absence of any other foo-of-one-argument. Nobody's even attempted to argue that the current behavior in that situation is desirable, and the fact that this would fix it in nearly all of the cases that anyone cares about strikes me as a distinct point in its favor. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Nov 25, 2012 at 7:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think this is ignoring the fact that we have an extensible type > system, and thus a lot more room for problems if we allow too many > implicit casts. I don't deny that. > It might also be worth noting that some of this complexity comes from > the SQL standard. It is at least arguable that the three-way coercion > distinction exists in the standard: they have got different rules for > what happens in an explicit CAST, in an assignment context, and in plain > expressions. So it's not that relevant whether other mainstream > programming languages have comparable constructs. > > Having said that, though, I think you could make an argument that > there's some support for this idea in the SQL standard. In SQL99 > 10.4 <routine invocation>, it appears that once you've identified > a target routine to be called, you're supposed to use the "store > assignment" rules to decide how to convert the supplied expression(s) > to the parameter data type(s). However, it's not clear to me that > that should be taken as conclusive, because the $64 question here > is exactly how sure you are in your identification of the target > routine. SQL99 doesn't seem to allow anywhere near as much function > overloading as we do --- and of course they have no notion of > overloaded or user-defined operators at all. As far as I can tell > from 10.4, you are supposed to be able to identify the target routine > without any consideration of the actual parameters' types. FWIW, neither MySQL nor Oracle supports function overloading for plain functions, so the question doesn't arise for them in the context of something like LPAD(). Oracle does support overloading for package functions, and I'm not sure exactly how they identify candidate functions in that context, but they do complain about ambiguous calls in some circumstances. Personally, I'm not sure that anyone has come up with an altogether satisfactory solution to the function overloading problem. If you have an exact type match in every argument position for one of the possible candidate functions, then surely any system that permits overloading at all is going to pick that candidate. Conversely, if you have one or many candidates all of which are completely incompatible with the actual argument types, then any system is going to fail. The tension is all around what to do when you have several candidates which are about equally good. You can either reject the call as ambiguous (which will sometimes annoy users who don't feel that a cast should be needed) or you can use some sort of tiebreak system to pick a candidate (which risks picking a different function than the user expected). I tend to think it's better to err on the side of the former, and I think we do, but there might nonetheless be some for improvement in that area, with due regard for the possibility of breaking currently-working applications. That, however, is a separate question from what's under discussion here, because the case at issue for the proposed patch is the one in which only one possible candidate exists, and the question is whether we ought to allow the use of assignment casts to allow the call to work rather than fail, NOT which of several overloaded functions we ought to pick. In any situation in which overloading is in use, the patch as proposed changes nothing. I'm not generally very good at interpreting the SQL standard text, but if it says that you ought to use assignment casts to match actual argument types to the chosen candidate function, then that seems like it's advocating for essentially the same position that you arrived at independently and that the patch also takes, which furthermore happens to be compatible with what other RDBMS systems do, at least in the no-overloading case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, 2012-11-25 at 21:08 -0500, Robert Haas wrote: > That, however, is a separate question from what's under discussion > here, because the case at issue for the proposed patch is the one in > which only one possible candidate exists, and the question is whether > we ought to allow the use of assignment casts to allow the call to > work rather than fail, NOT which of several overloaded functions we > ought to pick. In any situation in which overloading is in use, the > patch as proposed changes nothing. I'm not generally very good at > interpreting the SQL standard text, but if it says that you ought to > use assignment casts to match actual argument types to the chosen > candidate function, then that seems like it's advocating for > essentially the same position that you arrived at independently and > that the patch also takes, which furthermore happens to be compatible > with what other RDBMS systems do, at least in the no-overloading case. Let's say you have only one function "foo". All your queries are coded into your application, and everything works fine, using assignment casts where necessary. Then the user is foolish enough to CREATE FUNCTION foo... and now their queries start failing left and right. In other words, "only one possible candidate exists" should be followed by "right now" to be more precise. That's a major violation of the principle of least astonishment, that CREATE FUNCTION could cause such a disaster. I know that it can now, but what you're proposing will come into play much more frequently because most people start off with just one function by a particular name and define more as needed. If we do something like this, I think we should explicitly opt out of the overloading feature at DDL time (somewhat like what Simon suggested in another reply). E.g. CREATE {UNIQUE|OVERLOADED} FUNCTION ... I'm not proposing that; in general I am very wary of changes to the type system. I'm just saying that, if we do have special rules, we should have a way to make sure that users know when the rules are changing. Regards,Jeff Davis
On Wed, 2012-11-21 at 15:27 +0000, Simon Riggs wrote: > It would be useful if we issued a NOTICE when an ambiguity is > introduced, rather than when using it. > > Like Bison's reporting of reduce conflicts. This brings up a very important point, which is that a lot of the code is frozen in applications yet invisible at DDL time. So we have to be careful that DDL changes have a reasonable impact on the ability to continue to compile and execute the previously-working SQL received from the applications. In other words, as I said in another reply, we want to avoid cases where something seemingly innocuous (like creating a function) causes previously-working SQL to fail due to ambiguity. As Tom said, detecting the ambiguity at DDL time is not easy, so I'm not suggesting that. And I know that creating a function can already cause previously-working SQL to fail. I'm just saying we should be careful of these situations and not make them more likely than necessary. Regards,Jeff Davis
Hello all 2012/11/27 Jeff Davis <pgsql@j-davis.com>: > On Wed, 2012-11-21 at 15:27 +0000, Simon Riggs wrote: >> It would be useful if we issued a NOTICE when an ambiguity is >> introduced, rather than when using it. >> >> Like Bison's reporting of reduce conflicts. > > This brings up a very important point, which is that a lot of the code > is frozen in applications yet invisible at DDL time. So we have to be > careful that DDL changes have a reasonable impact on the ability to > continue to compile and execute the previously-working SQL received from > the applications. > > In other words, as I said in another reply, we want to avoid cases where > something seemingly innocuous (like creating a function) causes > previously-working SQL to fail due to ambiguity. > > As Tom said, detecting the ambiguity at DDL time is not easy, so I'm not > suggesting that. And I know that creating a function can already cause > previously-working SQL to fail. I'm just saying we should be careful of > these situations and not make them more likely than necessary. > from my view - a current design works well, but for someone who see pg first time, there can be lot of surprises. a) PostgreSQL reports missing functions -- but there are issue in parameters b) PostgreSQL requests explicit typing string literals to text -- and again it reports not informative message so minimally we can enhance a error messages Regards Pavel > Regards, > Jeff Davis > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Nov 27, 2012 at 01:59:04AM -0800, Jeff Davis wrote: > On Wed, 2012-11-21 at 15:27 +0000, Simon Riggs wrote: > > It would be useful if we issued a NOTICE when an ambiguity is > > introduced, rather than when using it. > > > > Like Bison's reporting of reduce conflicts. > > This brings up a very important point, which is that a lot of the code > is frozen in applications yet invisible at DDL time. So we have to be > careful that DDL changes have a reasonable impact on the ability to > continue to compile and execute the previously-working SQL received from > the applications. > > In other words, as I said in another reply, we want to avoid cases where > something seemingly innocuous (like creating a function) causes > previously-working SQL to fail due to ambiguity. > > As Tom said, detecting the ambiguity at DDL time is not easy, so I'm not > suggesting that. And I know that creating a function can already cause > previously-working SQL to fail. I'm just saying we should be careful of > these situations and not make them more likely than necessary. For me this highlights why looking at how application languages handle overloading might not be as relevant --- most language don't have possible-conflicting functions being created at run-time like a database does. The parallels in how other databases treat overloading is relevant. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
2012/11/27 Bruce Momjian <bruce@momjian.us>: > On Tue, Nov 27, 2012 at 01:59:04AM -0800, Jeff Davis wrote: >> On Wed, 2012-11-21 at 15:27 +0000, Simon Riggs wrote: >> > It would be useful if we issued a NOTICE when an ambiguity is >> > introduced, rather than when using it. >> > >> > Like Bison's reporting of reduce conflicts. >> >> This brings up a very important point, which is that a lot of the code >> is frozen in applications yet invisible at DDL time. So we have to be >> careful that DDL changes have a reasonable impact on the ability to >> continue to compile and execute the previously-working SQL received from >> the applications. >> >> In other words, as I said in another reply, we want to avoid cases where >> something seemingly innocuous (like creating a function) causes >> previously-working SQL to fail due to ambiguity. >> >> As Tom said, detecting the ambiguity at DDL time is not easy, so I'm not >> suggesting that. And I know that creating a function can already cause >> previously-working SQL to fail. I'm just saying we should be careful of >> these situations and not make them more likely than necessary. > > For me this highlights why looking at how application languages handle > overloading might not be as relevant --- most language don't have > possible-conflicting functions being created at run-time like a database > does. The parallels in how other databases treat overloading is > relevant. it is a basic problem - PostgreSQL has unique possibilities - polymorphic parameters and almost all databases doesn't support overloading probably our system is very similar to Haskell > > -- > Bruce Momjian <bruce@momjian.us> http://momjian.us > EnterpriseDB http://enterprisedb.com > > + It's impossible for everything to be true. + > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Nov 27, 2012 at 10:52 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > it is a basic problem - PostgreSQL has unique possibilities - > polymorphic parameters and almost all databases doesn't support > overloading Speaking of polymorphism, why not just implement lpad()'s first argument as 'anyelement'? ISTM this comes up in mostly in porting code from other database that is utilizing standard sql functions. This should be appropriate when the function's basic functionality and argument signature is not dependent on input type (constrast: to_timestamp) and there is a good portability case to be made. Essentially, this applies to a handful of string processing routines AFAICT. merlin
On Tue, Nov 27, 2012 at 4:46 AM, Jeff Davis <pgsql@j-davis.com> wrote: > Let's say you have only one function "foo". All your queries are coded > into your application, and everything works fine, using assignment casts > where necessary. > > Then the user is foolish enough to CREATE FUNCTION foo... and now their > queries start failing left and right. > > In other words, "only one possible candidate exists" should be followed > by "right now" to be more precise. > > That's a major violation of the principle of least astonishment, that > CREATE FUNCTION could cause such a disaster. I know that it can now, but > what you're proposing will come into play much more frequently because > most people start off with just one function by a particular name and > define more as needed. I admit that there are cases where this could happen, and that it will happen a little more than it does now. But, as you say, this can happen now, and yet we get very few if any complaints about it, whereas we get regular complaints about the need to insert casts that other database systems do not require. The fact is that most functions are not overloaded, so the esoterica of overloading affect only a tiny number of relatively sophisticated users. The need for extra casts cuts a much broader swath through our user base. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I admit that there are cases where this could happen, and that it will > happen a little more than it does now. But, as you say, this can > happen now, and yet we get very few if any complaints about it, > whereas we get regular complaints about the need to insert casts that > other database systems do not require. The fact is that most > functions are not overloaded, so the esoterica of overloading affect > only a tiny number of relatively sophisticated users. The need for > extra casts cuts a much broader swath through our user base. I find this argument a bit specious. It probably is true that most *user defined* functions aren't overloaded --- but that's not so true for system-defined functions, and even less true for operators. So the parser's behavior with overloaded calls affects all users, whether they know it or not. It also affects developers, in that adding a new overloaded version of a system function (that previously wasn't overloaded) could actually reduce the number of cases for which the function works without an explicit cast. We have got to be really careful with changing the parser's behavior here, or we're going to break cases that work today. regards, tom lane
On Tue, 2012-11-27 at 12:19 -0500, Robert Haas wrote: > I admit that there are cases where this could happen, and that it will > happen a little more than it does now. But, as you say, this can > happen now, and yet we get very few if any complaints about it, > whereas we get regular complaints about the need to insert casts that > other database systems do not require. The fact is that most > functions are not overloaded, so the esoterica of overloading affect > only a tiny number of relatively sophisticated users. The need for > extra casts cuts a much broader swath through our user base. Well, I did offer a suggestion that would make your idea safer, which is to explicitly opt out of the overloading feature at the time the function is created, rather than making it implicit based on how many functions happen to have the same name. The fact that it can only hurt sophisticated users is not convincing to me. For one thing, our users are programmers, so they should all feel comfortable defining their own functions, and I don't want to make them any less so. Next, sophisticated users also make mistakes. I could also make a security argument. Even today, any user who can create a function in your search path can make your queries start failing. If we locked down most of the system-defined functions as non-overloadable, and allowed users to do the same for their functions (maybe even the default one day?), then that would greatly reduce the exposure. The current strictness of the overloaded functions tends to make users more explicit about argument types, which reduces the chance of problems at the expense of usability and compatibility. Not ideal, but if we make it more permissive then we are permanently stuck with less information about what types the user intended and which function they intended to call. In such an extensible system, that worries me on several fronts. That being said, I'm not outright in opposition to the idea of making improvements like this, I just think we should do so cautiously. Regards,Jeff Davis
On Tue, Nov 27, 2012 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I admit that there are cases where this could happen, and that it will >> happen a little more than it does now. But, as you say, this can >> happen now, and yet we get very few if any complaints about it, >> whereas we get regular complaints about the need to insert casts that >> other database systems do not require. The fact is that most >> functions are not overloaded, so the esoterica of overloading affect >> only a tiny number of relatively sophisticated users. The need for >> extra casts cuts a much broader swath through our user base. > > I find this argument a bit specious. It probably is true that most > *user defined* functions aren't overloaded --- but that's not so true > for system-defined functions, and even less true for operators. So > the parser's behavior with overloaded calls affects all users, whether > they know it or not. It also affects developers, in that adding a > new overloaded version of a system function (that previously wasn't > overloaded) could actually reduce the number of cases for which the > function works without an explicit cast. > > We have got to be really careful with changing the parser's behavior > here, or we're going to break cases that work today. Well, the whole point of writing the patch the way I did was that it *doesn't* break any cases that work today. But as to your point about the system catalogs, it is true that adding an additional function could reduce the number of cases where things work today. But I think in many cases it would eliminate the need for overloading that we already have, and simplify things for future developers. Right now, quote_literal() allows implicit casts to text by having a second version that takes any anyelement argument; on the other hand, concat() allows implicit casts to text by accepting "any" rather than text as an argument; and || allows implicit casts to text by defining operators for anynonarray || text, text || anynonarray, and text || text. So we've got three quite different methods to create implicit-cast-to-text behavior in particular cases. That's got developer complexity too, and while this proposal wouldn't do anything about the third case since || actually sometimes has a different meaning, namely array concatenation, the first two wouldn't need overloading any more. They'd just work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Nov 27, 2012 at 1:45 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Tue, 2012-11-27 at 12:19 -0500, Robert Haas wrote: >> I admit that there are cases where this could happen, and that it will >> happen a little more than it does now. But, as you say, this can >> happen now, and yet we get very few if any complaints about it, >> whereas we get regular complaints about the need to insert casts that >> other database systems do not require. The fact is that most >> functions are not overloaded, so the esoterica of overloading affect >> only a tiny number of relatively sophisticated users. The need for >> extra casts cuts a much broader swath through our user base. > > Well, I did offer a suggestion that would make your idea safer, which is > to explicitly opt out of the overloading feature at the time the > function is created, rather than making it implicit based on how many > functions happen to have the same name. > > The fact that it can only hurt sophisticated users is not convincing to > me. For one thing, our users are programmers, so they should all feel > comfortable defining their own functions, and I don't want to make them > any less so. Next, sophisticated users also make mistakes. > > I could also make a security argument. Even today, any user who can > create a function in your search path can make your queries start > failing. If we locked down most of the system-defined functions as > non-overloadable, and allowed users to do the same for their functions > (maybe even the default one day?), then that would greatly reduce the > exposure. > > The current strictness of the overloaded functions tends to make users > more explicit about argument types, which reduces the chance of problems > at the expense of usability and compatibility. Not ideal, but if we make > it more permissive then we are permanently stuck with less information > about what types the user intended and which function they intended to > call. In such an extensible system, that worries me on several fronts. > > That being said, I'm not outright in opposition to the idea of making > improvements like this, I just think we should do so cautiously. Fair enough. I certainly admit that I wouldn't like to release with this code in place and then find out that it's got some critical flaw, security or otherwise. A couple of embarrassing bugs have been found recently in patches I wrote and committed, and I'm not looking to up that number. That having been said, I remain unconvinced that any of the things proposed so far are compelling reasons not to do this. That doesn't mean there aren't any such reasons, but I am personally unconvinced that we've found them yet. Most of the arguments so far advanced seem to involve overloading (where this proposal doesn't change anything vs. today); I think you're the only one who has proposed a situation where it causes a problem (namely, a function that is overloaded later) but in my personal opinion that's not going to happen often enough to justify the amount of user pain the current system imposes. Of course that's a judgement call. I do think that applying some kind of explicit flag to the function indicating whether it should allow implicit assignment casting/implicit casting to text/overloading/whatever is a possibly interesting alternative. It seems clear from our system catalogs that implicit casting to text is sometimes a desirable behavior and sometimes not, so it's reasonable to think that perhaps we should put that under user control. What I like about "my" proposal (really Tom's idea) is that it seems like it solves a pretty high percentage of the problem cases without requiring any explicit user action. I actually suspect we could get the "right" behavior even more often by attaching flags to the function or argument position, but that would also put more of the onus on the user to get the flags right -- and we might not even agree amongst ourselves on how the flags should be set.The fact that quote_literal() allows (by the expedientof overloading) implicit casts to text and that lpad() does not seems fairly random to me in hindsight; is there a general principle there that we'd all sign on to? The nice thing about this proposal is that it doesn't require any explicit user action. Of course that's no help if it does the wrong thing, but since it only fixes cases that are unambiguous and which currently fail, it's hard for me to see how that's a real danger. That doesn't mean there ISN'T a real danger, but I want to make sure that if we don't do this we have a clear and understandable reason, and not just bad memories of the last time we made a change in this area. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > But as to your point about the system catalogs, it is true that adding > an additional function could reduce the number of cases where things > work today. But I think in many cases it would eliminate the need for > overloading that we already have, and simplify things for future > developers. Right now, quote_literal() allows implicit casts to text > by having a second version that takes any anyelement argument; on the > other hand, concat() allows implicit casts to text by accepting "any" > rather than text as an argument; and || allows implicit casts to text > by defining operators for anynonarray || text, text || anynonarray, > and text || text. So we've got three quite different methods to > create implicit-cast-to-text behavior in particular cases. That's got > developer complexity too, and while this proposal wouldn't do anything > about the third case since || actually sometimes has a different > meaning, namely array concatenation, the first two wouldn't need > overloading any more. They'd just work. Uh, no, not really, and I think that assertion just goes to show that this area is more subtle than you think. quote_literal() for instance presently works for any datatype that has an explicit cast to text. After making the change you propose above, it would only work for types for which the cast was assignment-grade or less. concat() is even looser: as now implemented, it works for *anything at all*, because it relies on datatype output functions not casts to text. I'm dubious that that inconsistency is a good thing, actually, but that's how the committed code is written. Now, some of us might think that backing these conversions down to only allowing assignment-grade casts would be an improvement, in the sense that it would actually make the type system tighter not looser than it is today for these particular functions. But I suspect you wouldn't see it as an improvement, given the position you're arguing from. In fact, I'm afraid that making this change would result in requests to downgrade existing explicit casts to be assignment-only, so that people could be even lazier about not casting function arguments; and that is something up with which I will not put. regards, tom lane
On Tue, 2012-11-27 at 14:13 -0500, Robert Haas wrote: > I do think that applying some kind of explicit flag to the function > indicating whether it should allow implicit assignment > casting/implicit casting to text/overloading/whatever is a possibly > interesting alternative. It seems clear from our system catalogs that > implicit casting to text is sometimes a desirable behavior and > sometimes not, so it's reasonable to think that perhaps we should put > that under user control. What I like about "my" proposal (really > Tom's idea) is that it seems like it solves a pretty high percentage > of the problem cases without requiring any explicit user action. What user action are you concerned about? If we (eventually) made the non-overloaded case the default, would that resolve your concerns? Regards,Jeff Davis
Robert Haas <robertmhaas@gmail.com> writes: > I do think that applying some kind of explicit flag to the function > indicating whether it should allow implicit assignment > casting/implicit casting to text/overloading/whatever is a possibly > interesting alternative. That idea seems possibly worth pursuing. The thing that I find scary about the current proposal is that it applies to all functions (and operators) willy-nilly, which seems to raise the risk of unexpected side effects pretty high. If we could confine the behavioral change to a relatively small number of functions for which there was consensus that they should accept most anything, I'd feel better about it. (Of course, we might then conclude that something close to the quote_literal solution would work as well as a new function property. But it's worth thinking about.) > The fact that quote_literal() allows (by the expedient of > overloading) implicit casts to text and that lpad() does not seems > fairly random to me in hindsight; is there a general principle there > that we'd all sign on to? I don't find that random in the slightest. The entire purpose of quote_literal is "manufacture a SQL-literal string representation of this value", and that clearly might apply to data of any type. lpad() is, first last and only, a textual operation. Somebody who thinks it should apply directly to an integer is guilty of sloppy thinking at best, or not even understanding what a data type is at worst. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > It also affects developers, in that adding a > new overloaded version of a system function (that previously wasn't > overloaded) could actually reduce the number of cases for which the > function works without an explicit cast. > > We have got to be really careful with changing the parser's behavior > here, or we're going to break cases that work today. For my 2c- we have to be really careful making changes to the system functions as well as the parser's behavior. If we're worried about users creating overloaded versions of system functions, well, I'd probably suggest a "don't do that then" kind of approach.. Thanks, Stephen
On Tue, Nov 27, 2012 at 2:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Uh, no, not really, and I think that assertion just goes to show that > this area is more subtle than you think. quote_literal() for instance > presently works for any datatype that has an explicit cast to text. That doesn't appear to be the behavior I'm seeing: rhaas=# select quote_literal(17);quote_literal ---------------'17' (1 row) rhaas=# select * from pg_cast where castsource = 'int4'::regtype and casttarget = 'text'::regtype;castsource | casttarget | castfunc | castcontext | castmethod ------------+------------+----------+-------------+------------ (0 rows) > After making the change you propose above, it would only work for types > for which the cast was assignment-grade or less. ...but that's everything, because there's a hardcoded exception in the code that dictates that even if there is no entry in pg_cast, an assignment cast to text exists for every data type. > concat() is even > looser: as now implemented, it works for *anything at all*, because it > relies on datatype output functions not casts to text. I'm dubious that > that inconsistency is a good thing, actually, but that's how the > committed code is written. I argued at the time that CONCAT should take variadic text rather than variadic any and was roundly ignored on the grounds that the implicit casting to text behavior was what everyone wanted in that particular case. My feeling is that we need to adopt a solution to this problem partly so that people don't keep inventing (even in core code!) one-off, hackish solutions that make certain cases behave completely differently from the general rules. > Now, some of us might think that backing these conversions down to only > allowing assignment-grade casts would be an improvement, in the sense > that it would actually make the type system tighter not looser than it > is today for these particular functions. But I suspect you wouldn't see > it as an improvement, given the position you're arguing from. Actually, I think it wouldn't matter a bit, because of the exception that says there's an assignment cast to text for everything. > In fact, I'm afraid that making this change would result in requests to > downgrade existing explicit casts to be assignment-only, so that people > could be even lazier about not casting function arguments; and that is > something up with which I will not put. While I'm personally not excited about it, it is certainly imaginable that someone might prefer something like text -> xml to be an assignment casts rather than an explicit cast. But we've got an easy response to that, which is fine, change it for your database, but we're not changing it in the upstream copy. As a compatibility issue with other databases, it's not really an issue; I can't remember a single complaint about needing an explicit cast from text to xml or integer to boolean or any of the other things that appear in pg_cast with castcontext = 'e'. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Nov 27, 2012 at 2:59 PM, Jeff Davis <pgsql@j-davis.com> wrote: > What user action are you concerned about? If we (eventually) made the > non-overloaded case the default, would that resolve your concerns? I can't quite see how a non-overloaded flag would work, unless we get rid of schemas. But I think there are a variety of other kinds of labeling that I think would work. I'm still not sure that's as good as a general solution, because if nothing else it relies on us to make the right decision as to which type to use in each case, and considering that neither Tom nor I are particularly sold on what we did with CONCAT(), nor am I sure that we even agree with each other on what the right thing to do would have been there, I'm a bit skeptical about our ability to get these decisions right. But it might still be an improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Nov 27, 2012 at 2:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> After making the change you propose above, it would only work for types >> for which the cast was assignment-grade or less. > ...but that's everything, because there's a hardcoded exception in the > code that dictates that even if there is no entry in pg_cast, an > assignment cast to text exists for every data type. Ugh. I had been thinking that automatic CoerceViaIO casting only happened for explicit casts. If that can be invoked via assignment casts, then what you're proposing really destroys the type system entirely, at least for functions taking text: there is absolutely no argument such a function won't accept. I cannot support this. regards, tom lane
On Tue, Nov 27, 2012 at 3:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I do think that applying some kind of explicit flag to the function >> indicating whether it should allow implicit assignment >> casting/implicit casting to text/overloading/whatever is a possibly >> interesting alternative. > > That idea seems possibly worth pursuing. The thing that I find scary > about the current proposal is that it applies to all functions (and > operators) willy-nilly, which seems to raise the risk of unexpected > side effects pretty high. If we could confine the behavioral change > to a relatively small number of functions for which there was consensus > that they should accept most anything, I'd feel better about it. > > (Of course, we might then conclude that something close to the > quote_literal solution would work as well as a new function property. > But it's worth thinking about.) >> The fact that quote_literal() allows (by the expedient of >> overloading) implicit casts to text and that lpad() does not seems >> fairly random to me in hindsight; is there a general principle there >> that we'd all sign on to? > > I don't find that random in the slightest. The entire purpose of > quote_literal is "manufacture a SQL-literal string representation of > this value", and that clearly might apply to data of any type. lpad() > is, first last and only, a textual operation. Somebody who thinks it > should apply directly to an integer is guilty of sloppy thinking at > best, or not even understanding what a data type is at worst. Well, considering I made that "mistake" while working with PostgreSQL 8.2, and considering further that other databases allow it, I'm a little reluctant to accept this theory. I'm willing to bet that the fine folks in Redwood understand what a data type is just fine, and I'm pretty sure that I do, too. Sloppy thinking? Perhaps. But I think you could make a perfectly fine argument that the function of lpad() is to concatenate something onto the string representation of a value, or conversely that the function of quote_literal() is to escape a string. You might not agree with either of those arguments but I don't care to label someone who does as an idiot. The problem I have with the explicit labeling approach is that it seems to depend heavily on how you conceptualize what the function is trying to do, and not everyone is going to conceptualize that the same way. Clearly there are a lot of people who expect at least some string operators to work on numbers, including the OP, and are confused when they don't. We can call those people nasty names but that's not going to win us many friends. Anyway, I'm not averse to thinking about some kind of labeling solution but I'm not exactly sure what would work well - and I'd still like to see some hard evidence that the collateral damage from my er your proposal is unacceptably high. The most plausible scenario for how this could break things that has been presented thus far is that someone might create a function, use it with a data type that requires assignment-casting, and then create another function, and have things break. But as Jeff pointed out, that can happen already: in fact, it did, in core, with pg_size_pretty(), and while you had doubts about that change at the time, none of us realized exactly what the failure scenario was until it was too late to change it. Would that kind of thing happen more often under this proposal? Kind of hard to say, but if it made us think twice before overloading system catalog functions, it might even work out to a net positive. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/25/12 6:36 PM, Robert Haas wrote: > I think that is true. But for whatever it's worth, and at the risk of > beating a horse that seems not to be dead yet in spite of the fact > that I feel I've already administered one hell of a beating, the LPAD > case is unambiguous, and therefore it is hard to see what sort of > programming mistake we are protecting users against. Upstream applications passing wrong data down to the database.
On 11/25/12 7:21 PM, Robert Haas wrote: > Sure, in theory that is true, but no other RDBMS that I know about > feels a need to error out in that situation. I'm skeptical of the > contention that we're smarter than everyone else. Well, I think in most programming languages that have typed function prototypes, define foo(string) call foo(55) is an error. I could be convinced otherwise, but that's my current impression. So the principle of rejecting this is not new. If, on the other hand, we want to align more with other RDBMS that apparently allow this, we should look closer into by what rules they do this. If they use assignment casts (that is, the same rules that apply when running INSERT, for example), we could look into using that as well, but then we should do that all the time, and not only as a fallback of some sort. Because that's (a) arbitrary, and (b) causes failures when overloaded functions are added, which shouldn't happen (the existing cases where adding overloaded functions cause an existing function to fail are surely warts that should be removed, not new ones designed in). I wonder what implicit casts would be good for if assignment casts applied for function and operator calls.
On 11/27/12 12:07 PM, Merlin Moncure wrote: > Speaking of polymorphism, why not just implement lpad()'s first > argument as 'anyelement'? One of the arguments made here was that lpad(not-text) *should* fail.
On Tue, Nov 27, 2012 at 4:09 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > On 11/27/12 12:07 PM, Merlin Moncure wrote: >> Speaking of polymorphism, why not just implement lpad()'s first >> argument as 'anyelement'? > > One of the arguments made here was that lpad(not-text) *should* fail. Well, sure. My point is: why do we need to break the casting machinery when we can simply tweak a few of the standard functions on portability grounds? Robert's case on lpad() has merit in the sense it has unambiguous meaning regardless of input type; polymorphic input types were designed to solve *exactly* that problem. SQL portability is a secondary but also important argument. That said, md5() surely needs some type of cast or interpretation of non-text types. ditto to_timestamp(), etc. So messing around with the casting rules is surely the wrong answer. I think if you relaxed the function sigs of a few functions on this page (http://www.postgresql.org/docs/9.2/interactive/functions-string.html), most reported problems would go away. One thing that worries me is introducing ambiguous cases where previously there weren't any though. merlin
On Tue, 2012-11-27 at 15:41 -0500, Robert Haas wrote: > I can't quite see how a non-overloaded flag would work, unless we get > rid of schemas. It may work to pick the first schema in the search path that has any functions by that name, and then choose the overloaded (or not) candidate from among those functions in that one schema. Then, non-overloaded function names would be unique within a schema. If there are multiple functions of the same name in multiple schemas in the search path, it does not make sense to me to lump them all together and choose an overloaded candidate from all of them (although I think that's what we do now). That sounds like a mistake, to me. Do you know of any useful examples of doing that? > But I think there are a variety of other kinds of > labeling that I think would work. Worth exploring. > I'm still not sure that's as good > as a general solution, because if nothing else it relies on us to make > the right decision as to which type to use in each case, and > considering that neither Tom nor I are particularly sold on what we > did with CONCAT(), nor am I sure that we even agree with each other on > what the right thing to do would have been there, I'm a bit skeptical > about our ability to get these decisions right. But it might still be > an improvement. I'm not entirely clear on the benefits of a general solution, nor why your solution is more general. You are still categorizing functions into "overloaded" and "non-overloaded", but you are doing so at runtime based on the current contents of the catalog. Regards,Jeff Davis
Merlin Moncure <mmoncure@gmail.com> writes: > ... I think if you relaxed > the function sigs of a few functions on this page > (http://www.postgresql.org/docs/9.2/interactive/functions-string.html), > most reported problems would go away. That's an interesting way of approaching it. Do we have any data on exactly which functions people do complain about? > One thing that worries me is introducing ambiguous cases where > previously there weren't any though. Right, but at least we'd be confining the ambiguity to a small number of function names. Tweaking the casting rules could have a lot of unforeseen consequences. regards, tom lane
On Tue, Nov 27, 2012 at 4:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> ... I think if you relaxed >> the function sigs of a few functions on this page >> (http://www.postgresql.org/docs/9.2/interactive/functions-string.html), >> most reported problems would go away. > > That's an interesting way of approaching it. Do we have any data on > exactly which functions people do complain about? After a few minutes of google-fu, lpad seems to top the list (if you don't count operator related issues which I think are mostly coding bugs). see: http://drupal.org/node/1338188. also upper: http://sourceforge.net/tracker/?func=detail&aid=3447417&group_id=171772&atid=859223.. also substring: http://archives.postgresql.org/pgsql-bugs/2008-01/msg00001.php note in two of the above cases the bugs were raised through 3rd party issue trackers :/. Interestingly, if you look at popular postgresql-only functions, such as regexp_replace, google seems to indicate there's not much problem there. This, IMNSHO, reinforces Robert's point -- but it seems to mostly bite people porting code, running cross database code bases, or having a strong background in other systems. I found a few non-string cases, especially round(). merlin
On 11/27/12 2:41 PM, Tom Lane wrote: > Tweaking the casting rules could have a lot of > unforeseen consequences. understatement of the year. IMHO. $0.02 worth etc. -- john r pierce N 37, W 122 santa cruz ca mid-left coast
2012/11/28 Merlin Moncure <mmoncure@gmail.com>: > On Tue, Nov 27, 2012 at 4:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Merlin Moncure <mmoncure@gmail.com> writes: >>> ... I think if you relaxed >>> the function sigs of a few functions on this page >>> (http://www.postgresql.org/docs/9.2/interactive/functions-string.html), >>> most reported problems would go away. >> >> That's an interesting way of approaching it. Do we have any data on >> exactly which functions people do complain about? > > After a few minutes of google-fu, lpad seems to top the list (if you > don't count operator related issues which I think are mostly coding > bugs). > > see: http://drupal.org/node/1338188. > also upper: http://sourceforge.net/tracker/?func=detail&aid=3447417&group_id=171772&atid=859223.. > also substring: http://archives.postgresql.org/pgsql-bugs/2008-01/msg00001.php some of these issues are buggy and I am happy, so it doesn't working now. http://archives.postgresql.org/pgsql-bugs/2008-01/msg00001.php and these issue can be simply solved by overloading. Pavel > > note in two of the above cases the bugs were raised through 3rd party > issue trackers :/. Interestingly, if you look at popular > postgresql-only functions, such as regexp_replace, google seems to > indicate there's not much problem there. This, IMNSHO, reinforces > Robert's point -- but it seems to mostly bite people porting code, > running cross database code bases, or having a strong background in > other systems. > > I found a few non-string cases, especially round(). > > merlin
I am aware that in the case at hand, the call to make_fn_arguments() is with the only possible candidate function, so changing COERCE_IMPLICIT to COERCE_ASSIGNMENT inside of make_fn_arguments() is correct. But I wonder if this may have any unwanted side effects for other code paths to make_fn_arguments(), like from optimizer/util/clauses.c or from parser/parse_oper.c. I'm not saying it does, just asking if that could be the case. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
On Tue, Nov 27, 2012 at 4:36 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > On 11/25/12 6:36 PM, Robert Haas wrote: >> I think that is true. But for whatever it's worth, and at the risk of >> beating a horse that seems not to be dead yet in spite of the fact >> that I feel I've already administered one hell of a beating, the LPAD >> case is unambiguous, and therefore it is hard to see what sort of >> programming mistake we are protecting users against. > > Upstream applications passing wrong data down to the database. The circumstantial evidence suggests that many users don't want to be protected against that in the way that we are currently protecting them - or at least not all of them do (see Merlin's email elsewhere on this thread). What's frustrating about the status quo is that not only do you need lots of extra casts, but there's no real way to improve the situation. If you add an implicit cast from integer to text, for example, then 4 || 'foo' breaks. Now, you may think that adding an implicit cast from integer to text is a dumb idea, and maybe it is, but don't get too hung up on that example. The point is that if you're unhappy with the quote_literal() case (because it accepts too much), or the lpad() case (because it doesn't accept enough), or the foo(smallint) case (because it can't be happy with foo(42)), you don't really have a lot of options for adjusting the behavior as things stand today. I accept that some people think that decorating their code with lots of extra casts helps to avoid errors, and maybe it does, but there is plenty of evidence that a lot of users don't want to. And we not only don't give them the behavior they want; we don't even have a meaningful way to give the option of opting into that behavior at initdb or create-database time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello 2012/12/10 Robert Haas <robertmhaas@gmail.com>: > On Tue, Nov 27, 2012 at 4:36 PM, Peter Eisentraut <peter_e@gmx.net> wrote: >> On 11/25/12 6:36 PM, Robert Haas wrote: >>> I think that is true. But for whatever it's worth, and at the risk of >>> beating a horse that seems not to be dead yet in spite of the fact >>> that I feel I've already administered one hell of a beating, the LPAD >>> case is unambiguous, and therefore it is hard to see what sort of >>> programming mistake we are protecting users against. >> >> Upstream applications passing wrong data down to the database. > > The circumstantial evidence suggests that many users don't want to be > protected against that in the way that we are currently protecting > them - or at least not all of them do (see Merlin's email elsewhere on > this thread). What's frustrating about the status quo is that not > only do you need lots of extra casts, but there's no real way to > improve the situation. If you add an implicit cast from integer to > text, for example, then 4 || 'foo' breaks. Now, you may think that > adding an implicit cast from integer to text is a dumb idea, and maybe > it is, but don't get too hung up on that example. The point is that > if you're unhappy with the quote_literal() case (because it accepts > too much), or the lpad() case (because it doesn't accept enough), or > the foo(smallint) case (because it can't be happy with foo(42)), you > don't really have a lot of options for adjusting the behavior as > things stand today. I accept that some people think that decorating > their code with lots of extra casts helps to avoid errors, and maybe > it does, but there is plenty of evidence that a lot of users don't > want to. And we not only don't give them the behavior they want; we > don't even have a meaningful way to give the option of opting into > that behavior at initdb or create-database time. > it is looking so our design missing some feature, flag, that can signalize safety of implicit cast - or allow more exactly to specify casting rules for related functionality. For some functions we do this magic inside parser and rewriter, but we don't allow this for custom functions. Few years ago I proposed a parser hooks, where this task can be solved. The parser hook is probably too generic, but probably we don't design good solution just by simple change of some parameter of current design, and we should to enhance current design - maybe some new parameter modifiers? Regards Pavel > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, 2012-12-10 at 14:07 -0500, Robert Haas wrote: > And we not only don't give them the behavior they want; we > don't even have a meaningful way to give the option of opting into > that behavior at initdb or create-database time. I strongly object to offering options that change the language in such a substantial way. initdb-time options still mean that we are essentially dividing our language, and therefore the applications that support postgres, in half (or worse). One of the things I really like about postgres is that we haven't forked the language with a million options like mysql has. I don't even like the fact that we have a GUC to control the output format of a BYTEA. For every developer who says "wow, that mysql query just worked without modification" there is another one who says "oh, I forgot to test with option XYZ... postgres is too complex to support, I'm going to drop it from the list of supported databases". I still don't see a compelling reason why opting out of overloading on a per-function basis won't work. Your objections seemed fairly minor in comparison to how strongly you are advocating this use case. In particular, I didn't get a response to: http://archives.postgresql.org/message-id/1354055056.1766.50.camel@sussancws0025 For what it's worth, I'm glad that people like you are pushing on these usability issues, because it can be hard for insiders to see them sometimes. Regards,Jeff Davis
I agree with Jeff. Options that change the language at initdb or create-database time just fragment the language. It is best to just have 1 language where options are providable either dynamically per connection or otherwise lexically, so that then they are really just shorthands for the current local usage, and the language as a whole is the same. That also means you can have example code out there and know it will work on any Postgres install, invariant of static global options. If language modifiers are local or lexical, then any example code presumably would include the switches to turn them on for that example. That all being said, I also think it is best to explicitly overload operators with extra parameter types, such as defining another operator with the signature of (Nunber,String) with the same base name as string catenation, rather than making numbers implicitly stringify. But I can also accept implicit stringification / language behavior changes if it is a lexical/temporary effect that the same user is still explicitly turning on. -- Darren Duncan Jeff Davis wrote: > On Mon, 2012-12-10 at 14:07 -0500, Robert Haas wrote: >> And we not only don't give them the behavior they want; we >> don't even have a meaningful way to give the option of opting into >> that behavior at initdb or create-database time. > > I strongly object to offering options that change the language in such a > substantial way. initdb-time options still mean that we are essentially > dividing our language, and therefore the applications that support > postgres, in half (or worse). One of the things I really like about > postgres is that we haven't forked the language with a million options > like mysql has. I don't even like the fact that we have a GUC to control > the output format of a BYTEA. > > For every developer who says "wow, that mysql query just worked without > modification" there is another one who says "oh, I forgot to test with > option XYZ... postgres is too complex to support, I'm going to drop it > from the list of supported databases". > > I still don't see a compelling reason why opting out of overloading on a > per-function basis won't work. Your objections seemed fairly minor in > comparison to how strongly you are advocating this use case. > > In particular, I didn't get a response to: > > http://archives.postgresql.org/message-id/1354055056.1766.50.camel@sussancws0025 > > For what it's worth, I'm glad that people like you are pushing on these > usability issues, because it can be hard for insiders to see them > sometimes. > > Regards, > Jeff Davis
On Tue, 2012-11-06 at 10:57 -0500, Robert Haas wrote: > I did some experimentation with this. It seems that what Tom proposed > here is a lot cleaner than what I proposed previously, while still > increasing usability in many real-world cases. For example, in > unpatched master: It looks like we still haven't reached consensus on the design here. Are we still discussing, or should this be moved to the next CF? Regards,Jeff Davis
On Tue, 2012-11-27 at 14:24 -0800, Jeff Davis wrote: > On Tue, 2012-11-27 at 15:41 -0500, Robert Haas wrote: > > I can't quite see how a non-overloaded flag would work, unless we get > > rid of schemas. > > It may work to pick the first schema in the search path that has any > functions by that name, and then choose the overloaded (or not) > candidate from among those functions in that one schema. Then, > non-overloaded function names would be unique within a schema. > > If there are multiple functions of the same name in multiple schemas in > the search path, it does not make sense to me to lump them all together > and choose an overloaded candidate from all of them (although I think > that's what we do now). That sounds like a mistake, to me. Do you know > of any useful examples of doing that? On second thought, this won't work very well, particularly for operators. Users may want to overload a built-in operator, like "+", yet still have it work on all the built-in types. So, if we did try to declare a function non-overloaded at DDL time, the name would have to be globally unique (not just to a schema). Regards,Jeff Davis
On Tue, Dec 11, 2012 at 12:59 AM, Jeff Davis <pgsql@j-davis.com> wrote: > For every developer who says "wow, that mysql query just worked without > modification" there is another one who says "oh, I forgot to test with > option XYZ... postgres is too complex to support, I'm going to drop it > from the list of supported databases". Perhaps so. That's why my first choice is still to just fix this problem across the board. I think there is probably more than one way of doing that that is technically safe, and I currently believe that my patch is one of those. However, it seems that more people than not find the extra casts that PostgreSQL forces programmers to use to be a feature, not a bug. According to Tom, to allow people to call a non-overloaded function without casts will "completely destroy the type system"; Peter Eisentraut was aghast at the idea of allowing someone to pass a non-text first argument to lpad without an explicit cast. I recognize that not everyone's going to agree on these things but I find those attitudes shockingly arrogant. We have regular evidence that users are coming to PostgreSQL and then abandoning it because these kinds of things don't work, and we know that numerous other popular and well-respected systems allow these sorts of things to Just Work. It is one thing to insist on casts when there is an ambiguity about which of several overloaded functions a user intended to call - but when there's only one, it's just masterminding. In more than ten years of working with PostgreSQL, I've never encountered where the restriction at issue here prevented a bug. It's only annoyed me and broken my application code (when moving from PostgreSQL 8.2 to PostgreSQL 8.3, never mind any other database!). There is ample evidence that I'm not the only one, but I think we have a clear consensus to continue ignoring the problem, or at least the solutions. I don't think there's much point in carrying this patch over to the next CommitFest; I'll mark it as Rejected. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > ... In more > than ten years of working with PostgreSQL, I've never encountered > where the restriction at issue here prevented a bug. It's only > annoyed me and broken my application code (when moving from PostgreSQL > 8.2 to PostgreSQL 8.3, never mind any other database!). There are quite a few examples in our archives of application bugs that would have been prevented, or later were prevented, by the 8.3 changes that reduced the system's willingness to apply implicit casts to text. I recall for instance cases where people got wrong/unexpected answers because the system was sorting what-they-thought-were-timestamp values textually. So I find such sweeping claims to be demonstrably false, and I'm suspicious of behavioral changes that are proposed with such arguments backing them. > There is ample evidence that I'm not the only one, but I think we have > a clear consensus to continue ignoring the problem, or at least the > solutions. Oh, I don't think we're ignoring the problem; people beat us up about it too often for that. But we need to pay attention to error detection not only ease-of-use, so it's hard to be sure what's a net improvement. regards, tom lane
On Fri, Dec 14, 2012 at 2:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> ... In more >> than ten years of working with PostgreSQL, I've never encountered >> where the restriction at issue here prevented a bug. It's only >> annoyed me and broken my application code (when moving from PostgreSQL >> 8.2 to PostgreSQL 8.3, never mind any other database!). > > There are quite a few examples in our archives of application bugs that > would have been prevented, or later were prevented, by the 8.3 changes > that reduced the system's willingness to apply implicit casts to text. > I recall for instance cases where people got wrong/unexpected answers > because the system was sorting what-they-thought-were-timestamp values > textually. > > So I find such sweeping claims to be demonstrably false, and I'm > suspicious of behavioral changes that are proposed with such arguments > backing them. I think you're mixing apples and oranges. The whole point of the patch on the table - which, if you recall, was designed originally by you and merely implemented by me - was to change the behavior only in the cases where there's only one function with the appropriate name and argument count. The ambiguous cases that 8.3+ helpfully prevent are those where overloading is in use and the choice of which function to call is somewhat arbitrary and perhaps incorrectly-foreseen by the user. Those changes also have the side-effect of preventing a straightforward function call from working without casts even in cases where no overloading is in use - and making that case work is completely different from making the ambiguous case arbitrarily pick one of the available answers. > Oh, I don't think we're ignoring the problem; people beat us up about it > too often for that. But we need to pay attention to error detection not > only ease-of-use, so it's hard to be sure what's a net improvement. Well, that's not how the dynamic of this thread reads to me. There seems to be massive opposition - including from you - to allowing unambiguous function calls to resolve without casts, at least as a categorical matter, and maybe even in the specific cases that users most frequently care about. I simply disagree with the contention that there's a value in making people cast to text when the target function is not overloaded. Maybe there's some world where it's uncommon to want to pass the text representation of a non-text value to a non-overloaded function that accepts text, and therefore forcing a cast upon the user to warn them "here be dragons" is warranted, but I don't live in it. When the function IS overloaded, well, that's a completely different situation. I've written enough C++ over the years to understand what happens when you try too hard to be clever with tiebreak rules. But that's not got much to do with the question of whether the only candidate to put in an appearance can be declared the winner. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/14/2012 3:20 PM, Robert Haas wrote: > On Fri, Dec 14, 2012 at 2:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> ... In more >>> than ten years of working with PostgreSQL, I've never encountered >>> where the restriction at issue here prevented a bug. It's only >>> annoyed me and broken my application code (when moving from PostgreSQL >>> 8.2 to PostgreSQL 8.3, never mind any other database!). >> >> There are quite a few examples in our archives of application bugs that >> would have been prevented, or later were prevented, by the 8.3 changes >> that reduced the system's willingness to apply implicit casts to text. >> I recall for instance cases where people got wrong/unexpected answers >> because the system was sorting what-they-thought-were-timestamp values >> textually. >> >> So I find such sweeping claims to be demonstrably false, and I'm >> suspicious of behavioral changes that are proposed with such arguments >> backing them. > > I think you're mixing apples and oranges. The whole point of the > patch on the table - which, if you recall, was designed originally by > you and merely implemented by me - was to change the behavior only in > the cases where there's only one function with the appropriate name > and argument count. The ambiguous cases that 8.3+ helpfully prevent > are those where overloading is in use and the choice of which function > to call is somewhat arbitrary and perhaps incorrectly-foreseen by the > user. Those changes also have the side-effect of preventing a > straightforward function call from working without casts even in cases > where no overloading is in use - and making that case work is > completely different from making the ambiguous case arbitrarily pick > one of the available answers. FWIW I for one thought that casting more liberal in the case at hand, where there is only one function with that name and number of arguments, would be a good thing. My only concern with the patch presented was that changing make_fn_assignment() in that way may have some unwanted side effects because it is called from different locations and the usage of COERCION_IMPLICIT may actually guard against something, that we don't want to allow. I don't have any evidence that it does, just the concern that it may. Perhaps make_fn_arguments() needs to receive that coercion context as an argument and the caller hands in COERCION_ASSIGNMENT only in the case at hand? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
On 27 November 2012 22:41, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> ... I think if you relaxed >> the function sigs of a few functions on this page >> (http://www.postgresql.org/docs/9.2/interactive/functions-string.html), >> most reported problems would go away. > > That's an interesting way of approaching it. Do we have any data on > exactly which functions people do complain about? > >> One thing that worries me is introducing ambiguous cases where >> previously there weren't any though. > > Right, but at least we'd be confining the ambiguity to a small number > of function names. Tweaking the casting rules could have a lot of > unforeseen consequences. There have been many good points made on this thread. Being sloppy in all cases is a bad thing we all agree or reluctantly admit; what is needed is the ability for a user to be able to more closely define what they mean by such conversions, so that application SQL can be made to work. It certainly isn't easy to say that COLUMN LIKE '1%' would work in all cases, since the preferred format of that data might be (xxx) xxx-xxxx, or $xxxxx or <EURO symbol>xxxx,xx (i.e. with a comma as the decimal separator). The format comes from the meaning of the data, which we cannot know. What would be useful is to be able to define default format models for each column. If not defined, there is no implicit cast. If FORMAT is defined then we know to apply it in the absence of a global cast. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services