Thread: citext operator precedence fix
All, I just tripped over this: select 'josh'::varchar(32) = 'Josh'::citext;?column? ----------f While it's clear why it's that way, it's not how people would expect citext to behave. Users expect case-insensitive text to be case-insensitive for *all* comparisons, not just for comparisons with the same data type. I would also think that folks migrating from SQL Server and MySQL would get bitten by this. I'd like to patch the citext contrib module for 9.2 to fix this by creating four new = operators to establish the comparison function for text and varchar. Before I clean up my ad-hoc fix code for submission, are there strong objections to this idea? Or are there other data types I'd need to cover? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > I'd like to patch the citext contrib module for 9.2 to fix this by > creating four new = operators to establish the comparison function for > text and varchar. I think you'll find that's easier said than done (problem 1 is going to be ambiguity, and problem 2 is going to be that comparisons involving these operators won't get indexed). regards, tom lane
> I think you'll find that's easier said than done (problem 1 is going to > be ambiguity, Ambiguity? > and problem 2 is going to be that comparisons involving > these operators won't get indexed). Yeah, that's acceptable, since it's not any worse than the behavior of the comparisons now. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: >> I think you'll find that's easier said than done (problem 1 is going to >> be ambiguity, > Ambiguity? Yeah, I'm worried about the possibility of parser failing to resolve which operator is meant. >> and problem 2 is going to be that comparisons involving >> these operators won't get indexed). > Yeah, that's acceptable, since it's not any worse than the behavior of > the comparisons now. No, I don't think so. For people for whom the right thing is happening, you'll risk making it (a) wrong and (b) lots slower. For people for whom the wrong thing is happening, maybe you'll fix it so it's semantically right, but if indexes don't work they still won't be happy. regards, tom lane
>> Ambiguity? > > Yeah, I'm worried about the possibility of parser failing to resolve > which operator is meant. Any idea of how to test for that? What kind of situations would make things difficult for the parser? Also, how is this any different for any optional data type which overrides = ? > No, I don't think so. For people for whom the right thing is happening, > you'll risk making it (a) wrong and (b) lots slower. Well, I'm dubious that current behavior is the right thing for anybody.The best I could do to answer that would be an informalcommunity survey. > For people for > whom the wrong thing is happening, maybe you'll fix it so it's > semantically right, but if indexes don't work they still won't be happy. So I'd need to add operator classes and indexing support functions as well, then, presumably. Annoying, but not impossible. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: >> Yeah, I'm worried about the possibility of parser failing to resolve >> which operator is meant. > Any idea of how to test for that? What kind of situations would make > things difficult for the parser? Not sure, but it would be underspecified cases such as 'x' = 'x' that might have trouble. I may be worrying over nothing --- I'm just trying to point out some trouble spots you'd better watch for while messing with this. regards, tom lane
On Sep 21, 2011, at 6:12 PM, Tom Lane wrote: >> Any idea of how to test for that? What kind of situations would make >> things difficult for the parser? > > Not sure, but it would be underspecified cases such as 'x' = 'x' that > might have trouble. I may be worrying over nothing --- I'm just trying > to point out some trouble spots you'd better watch for while messing > with this. Right, we'd need to make sure that 'x' = 'x' resolves to 'x'::text = 'x'::text. Josh, I think the issue Tom is raising issomething I've seen when there are multiple functions that could be correct for a given call. For example: select citext_eq('x', 'x'::citext); If you have these functions: 1. citext_eq(citext,citext) 2. citext_eq(text,citext) 3. citext_eq(citext,text) Then the question is: does it find only #2 via polymorphic lookup, or does it think that either #1 or #2 could work (becausetext supports an implicit cast to citext, IIRC). If it's more than one it's an error. Not sure if the same issueexists for operators. This is easy to test for, however, so we'll soon know if there's an issue. Best, David
> 1. citext_eq(citext,citext) > 2. citext_eq(text,citext) > 3. citext_eq(citext,text) > > Then the question is: does it find only #2 via polymorphic lookup, or does it think that either #1 or #2 could work (becausetext supports an implicit cast to citext, IIRC). If it's more than one it's an error. Not sure if the same issueexists for operators. Well, I just ran through the 7 potential combinations, and didn't get any errors. Hard to tell which function is being used, of course. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Sep 22, 2011, at 9:53 AM, Josh Berkus wrote: >> >> Then the question is: does it find only #2 via polymorphic lookup, or does it think that either #1 or #2 could work (becausetext supports an implicit cast to citext, IIRC). If it's more than one it's an error. Not sure if the same issueexists for operators. > > Well, I just ran through the 7 potential combinations, and didn't get > any errors. Hard to tell which function is being used, of course. That's what tests are for. David
>> Well, I just ran through the 7 potential combinations, and didn't get >> any errors. Hard to tell which function is being used, of course. > > That's what tests are for. So, tell me how to write a test to check which function is being used. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Sep 22, 2011, at 10:11 AM, Josh Berkus wrote: >> That's what tests are for. > > So, tell me how to write a test to check which function is being used. Just write some comparisons like upthread, and see if the output is f or t. Put them into sql/citext.sql. Best, David
On 9/22/11 10:26 AM, David E. Wheeler wrote: > On Sep 22, 2011, at 10:11 AM, Josh Berkus wrote: > >>> That's what tests are for. >> >> So, tell me how to write a test to check which function is being used. > > Just write some comparisons like upthread, and see if the output is f or t. Put them into sql/citext.sql. Oh, ok. I thought you meant checking the actual function call. Tests go in the main SQL file? Shouldn't they have their own file? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Sep 22, 2011, at 10:50 AM, Josh Berkus wrote: >> Just write some comparisons like upthread, and see if the output is f or t. Put them into sql/citext.sql. > > Oh, ok. I thought you meant checking the actual function call. > > Tests go in the main SQL file? Shouldn't they have their own file? That is the test file. The main SQL file is citext--1.0.sql. You'll actually need to bump the version number to 1.1, renamethat file to citext--1.1.sql, and also add them to a citext--1.0--1.1.sql. There probably also needs to be a citext--unpackaged--1.1.sqlfile. Best, David
Excerpts from David E. Wheeler's message of jue sep 22 14:51:59 -0300 2011: > On Sep 22, 2011, at 10:50 AM, Josh Berkus wrote: > > >> Just write some comparisons like upthread, and see if the output is f or t. Put them into sql/citext.sql. > > > > Oh, ok. I thought you meant checking the actual function call. > > > > Tests go in the main SQL file? Shouldn't they have their own file? > > That is the test file. The main SQL file is citext--1.0.sql. You'll actually need to bump the version number to 1.1, renamethat file to citext--1.1.sql, and also add them to a citext--1.0--1.1.sql. There probably also needs to be a citext--unpackaged--1.1.sqlfile. Hmm, if there's a citext--unpackaged--1.0.sql and also citext--1.0--1.1.sql, is it really necessary to have citext--unpackaged--1.1.sql? Shouldn't the upgrade facility be able to just run both scripts? -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Sep 22, 2011, at 11:07 AM, Alvaro Herrera wrote: >> That is the test file. The main SQL file is citext--1.0.sql. You'll actually need to bump the version number to 1.1, renamethat file to citext--1.1.sql, and also add them to a citext--1.0--1.1.sql. There probably also needs to be a citext--unpackaged--1.1.sqlfile. > > Hmm, if there's a citext--unpackaged--1.0.sql and also > citext--1.0--1.1.sql, is it really necessary to have > citext--unpackaged--1.1.sql? Shouldn't the upgrade facility be able to > just run both scripts? No, because if 1.1 was installed on 8.4, you'd need the commands to move all its functions into the extension, not re-createthem. However, since this ships with core, it's probably not necessary, because theoretically no one will use it in 8.4, so thefunctions will never be unpackaged. Best, David
"David E. Wheeler" <david@kineticode.com> wrote: > On Sep 22, 2011, at 11:07 AM, Alvaro Herrera wrote: >> Hmm, if there's a citext--unpackaged--1.0.sql and also >> citext--1.0--1.1.sql, is it really necessary to have >> citext--unpackaged--1.1.sql? Shouldn't the upgrade facility be >> able to just run both scripts? > > No, because if 1.1 was installed on 8.4, you'd need the commands > to move all its functions into the extension, not re-create them. Shouldn't a version installed on 8.4 be installed as "unpackaged"? Doesn't citext--unpackaged--1.0.sql contain the commands to move all its functions into the extension? -Kevin
On Sep 22, 2011, at 11:14 AM, Kevin Grittner wrote: >> No, because if 1.1 was installed on 8.4, you'd need the commands >> to move all its functions into the extension, not re-create them. > > Shouldn't a version installed on 8.4 be installed as "unpackaged"? > Doesn't citext--unpackaged--1.0.sql contain the commands to move all > its functions into the extension? It contains everything need to move 1.0 functions into the extension. If Josh adds new functions they obviously would notbe moved. So a new script would need to move them. And unpackaged--1.1 does not first run unpackaged--1.0. Best, David
On Thu, Sep 22, 2011 at 2:16 PM, David E. Wheeler <david@kineticode.com> wrote: > On Sep 22, 2011, at 11:14 AM, Kevin Grittner wrote: > >>> No, because if 1.1 was installed on 8.4, you'd need the commands >>> to move all its functions into the extension, not re-create them. >> >> Shouldn't a version installed on 8.4 be installed as "unpackaged"? >> Doesn't citext--unpackaged--1.0.sql contain the commands to move all >> its functions into the extension? > > It contains everything need to move 1.0 functions into the extension. If Josh adds new functions they obviously would notbe moved. So a new script would need to move them. And unpackaged--1.1 does not first run unpackaged--1.0. I believe the point David is trying to make is that someone might take an 9.2 version of a contrib module and manually install it on an 8.4 server by executing the install script, perhaps with some amount of hackery. But I don't think we're required to support that case. If the user does a non-standard install, it's their job to deal with the fallout. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sep 22, 2011, at 11:23 AM, Robert Haas wrote: > I believe the point David is trying to make is that someone might take > an 9.2 version of a contrib module and manually install it on an 8.4 > server by executing the install script, perhaps with some amount of > hackery. Right. > But I don't think we're required to support that case. If the user > does a non-standard install, it's their job to deal with the fallout. Agreed; I was thinking of how one would handle this for non-core distributed extensions. Probably not necessary for contrib. David
> But I don't think we're required to support that case. If the user > does a non-standard install, it's their job to deal with the fallout. Well, I'll write the script anyway, since *I* need it. I'm installing this on a 9.0 database which will be later upgraded to 9.1. However, before I write all this, I'd like to settle the question of acceptability. What do I need to do to make it OK to break backwards compatibility for this? I feel strongly that I'm correcting it to the behavior users expect, but that's not statistically backed. I don't want to spend several hours writing scripts so that it can be rejected *for that reason*. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Thu, Sep 22, 2011 at 2:36 PM, Josh Berkus <josh@agliodbs.com> wrote: >> But I don't think we're required to support that case. If the user >> does a non-standard install, it's their job to deal with the fallout. > > Well, I'll write the script anyway, since *I* need it. I'm installing > this on a 9.0 database which will be later upgraded to 9.1. > > However, before I write all this, I'd like to settle the question of > acceptability. What do I need to do to make it OK to break backwards > compatibility for this? I feel strongly that I'm correcting it to the > behavior users expect, but that's not statistically backed. > > I don't want to spend several hours writing scripts so that it can be > rejected *for that reason*. I'm OK with the proposed behavior change and I agree that it's probably what people want, but I am awfully suspicious that those extra casts are going to break something you haven't thought about. It might be worth posting a rough version first just to see if I (or someone else) can break it before you spend a lot of time on it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> I'm OK with the proposed behavior change and I agree that it's > probably what people want, but I am awfully suspicious that those > extra casts are going to break something you haven't thought about. > It might be worth posting a rough version first just to see if I (or > someone else) can break it before you spend a lot of time on it. Additional breakage confirmed (hash functions, etc.) Looks like I need to add a lot more support functions and test. This is still worth doing, but don't expect it for the next commitfest. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Fri, Sep 23, 2011 at 12:37 PM, Josh Berkus <josh@agliodbs.com> wrote: >> I'm OK with the proposed behavior change and I agree that it's >> probably what people want, but I am awfully suspicious that those >> extra casts are going to break something you haven't thought about. >> It might be worth posting a rough version first just to see if I (or >> someone else) can break it before you spend a lot of time on it. > > Additional breakage confirmed (hash functions, etc.) Looks like I need > to add a lot more support functions and test. This is still worth > doing, but don't expect it for the next commitfest. I would also be looking carefully at whether you can construct a scenario where the operator resolution code can't decide between =(text,citext) or =(text,text) - you probably need a third type like varchar or bpchar in the mix to trigger a failure, if there's one to be found. Or you might have a problem with citext = bpchar being ambiguous between =(text,citext) and =(varchar,text), or some such thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company