Thread: citext operator precedence fix

citext operator precedence fix

From
Josh Berkus
Date:
All,

I just tripped over this:

select 'josh'::varchar(32) = 'Josh'::citext;?column?
----------f

While it's clear why it's that way, it's not how people would expect
citext to behave.  Users expect case-insensitive text to be
case-insensitive for *all* comparisons, not just for comparisons with
the same data type.  I would also think that folks migrating from SQL
Server and MySQL would get bitten by this.

I'd like to patch the citext contrib module for 9.2 to fix this by
creating four new = operators to establish the comparison function for
text and varchar.

Before I clean up my ad-hoc fix code for submission, are there strong
objections to this idea?  Or are there other data types I'd need to cover?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: citext operator precedence fix

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> I'd like to patch the citext contrib module for 9.2 to fix this by
> creating four new = operators to establish the comparison function for
> text and varchar.

I think you'll find that's easier said than done (problem 1 is going to
be ambiguity, and problem 2 is going to be that comparisons involving
these operators won't get indexed).
        regards, tom lane


Re: citext operator precedence fix

From
Josh Berkus
Date:
> I think you'll find that's easier said than done (problem 1 is going to
> be ambiguity, 

Ambiguity?

> and problem 2 is going to be that comparisons involving
> these operators won't get indexed).

Yeah, that's acceptable, since it's not any worse than the behavior of
the comparisons now.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: citext operator precedence fix

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> I think you'll find that's easier said than done (problem 1 is going to
>> be ambiguity, 

> Ambiguity?

Yeah, I'm worried about the possibility of parser failing to resolve
which operator is meant.

>> and problem 2 is going to be that comparisons involving
>> these operators won't get indexed).

> Yeah, that's acceptable, since it's not any worse than the behavior of
> the comparisons now.

No, I don't think so.  For people for whom the right thing is happening,
you'll risk making it (a) wrong and (b) lots slower.  For people for
whom the wrong thing is happening, maybe you'll fix it so it's
semantically right, but if indexes don't work they still won't be happy.
        regards, tom lane


Re: citext operator precedence fix

From
Josh Berkus
Date:
>> Ambiguity?
> 
> Yeah, I'm worried about the possibility of parser failing to resolve
> which operator is meant.

Any idea of how to test for that?  What kind of situations would make
things difficult for the parser?

Also, how is this any different for any optional data type which
overrides = ?

> No, I don't think so.  For people for whom the right thing is happening,
> you'll risk making it (a) wrong and (b) lots slower.  

Well, I'm dubious that current behavior is the right thing for anybody.The best I could do to answer that would be an
informalcommunity survey.
 

> For people for
> whom the wrong thing is happening, maybe you'll fix it so it's
> semantically right, but if indexes don't work they still won't be happy.

So I'd need to add operator classes and indexing support functions as
well, then, presumably.  Annoying, but not impossible.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: citext operator precedence fix

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> Yeah, I'm worried about the possibility of parser failing to resolve
>> which operator is meant.

> Any idea of how to test for that?  What kind of situations would make
> things difficult for the parser?

Not sure, but it would be underspecified cases such as 'x' = 'x' that
might have trouble.  I may be worrying over nothing --- I'm just trying
to point out some trouble spots you'd better watch for while messing
with this.
        regards, tom lane


Re: citext operator precedence fix

From
"David E. Wheeler"
Date:
On Sep 21, 2011, at 6:12 PM, Tom Lane wrote:

>> Any idea of how to test for that?  What kind of situations would make
>> things difficult for the parser?
>
> Not sure, but it would be underspecified cases such as 'x' = 'x' that
> might have trouble.  I may be worrying over nothing --- I'm just trying
> to point out some trouble spots you'd better watch for while messing
> with this.

Right, we'd need to make sure that 'x' = 'x' resolves to 'x'::text = 'x'::text. Josh, I think the issue Tom is raising
issomething I've seen when there are multiple functions that could be correct for a given call. For example: 
   select citext_eq('x', 'x'::citext);

If you have these functions:

1. citext_eq(citext,citext)
2. citext_eq(text,citext)
3. citext_eq(citext,text)

Then the question is: does it find only #2 via polymorphic lookup, or does it think that either #1 or #2 could work
(becausetext supports an implicit cast to citext, IIRC). If it's more than one it's an error. Not sure if the same
issueexists for operators. 

This is easy to test for, however, so we'll soon know if there's an issue.

Best,

David

Re: citext operator precedence fix

From
Josh Berkus
Date:
> 1. citext_eq(citext,citext)
> 2. citext_eq(text,citext)
> 3. citext_eq(citext,text)
> 
> Then the question is: does it find only #2 via polymorphic lookup, or does it think that either #1 or #2 could work
(becausetext supports an implicit cast to citext, IIRC). If it's more than one it's an error. Not sure if the same
issueexists for operators.
 

Well, I just ran through the 7 potential combinations, and didn't get
any errors.  Hard to tell which function is being used, of course.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: citext operator precedence fix

From
"David E. Wheeler"
Date:
On Sep 22, 2011, at 9:53 AM, Josh Berkus wrote:

>>
>> Then the question is: does it find only #2 via polymorphic lookup, or does it think that either #1 or #2 could work
(becausetext supports an implicit cast to citext, IIRC). If it's more than one it's an error. Not sure if the same
issueexists for operators. 
>
> Well, I just ran through the 7 potential combinations, and didn't get
> any errors.  Hard to tell which function is being used, of course.

That's what tests are for.

David



Re: citext operator precedence fix

From
Josh Berkus
Date:
>> Well, I just ran through the 7 potential combinations, and didn't get
>> any errors.  Hard to tell which function is being used, of course.
> 
> That's what tests are for.

So, tell me how to write a test to check which function is being used.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: citext operator precedence fix

From
"David E. Wheeler"
Date:
On Sep 22, 2011, at 10:11 AM, Josh Berkus wrote:

>> That's what tests are for.
>
> So, tell me how to write a test to check which function is being used.

Just write some comparisons like upthread, and see if the output is f or t. Put them into sql/citext.sql.

Best,

David



Re: citext operator precedence fix

From
Josh Berkus
Date:
On 9/22/11 10:26 AM, David E. Wheeler wrote:
> On Sep 22, 2011, at 10:11 AM, Josh Berkus wrote:
> 
>>> That's what tests are for.
>>
>> So, tell me how to write a test to check which function is being used.
> 
> Just write some comparisons like upthread, and see if the output is f or t. Put them into sql/citext.sql.

Oh, ok.  I thought you meant checking the actual function call.

Tests go in the main SQL file?  Shouldn't they have their own file?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: citext operator precedence fix

From
"David E. Wheeler"
Date:
On Sep 22, 2011, at 10:50 AM, Josh Berkus wrote:

>> Just write some comparisons like upthread, and see if the output is f or t. Put them into sql/citext.sql.
>
> Oh, ok.  I thought you meant checking the actual function call.
>
> Tests go in the main SQL file?  Shouldn't they have their own file?

That is the test file. The main SQL file is citext--1.0.sql. You'll actually need to bump the version number to 1.1,
renamethat file to citext--1.1.sql, and also add them to a citext--1.0--1.1.sql. There probably also needs to be a
citext--unpackaged--1.1.sqlfile. 

Best,

David



Re: citext operator precedence fix

From
Alvaro Herrera
Date:
Excerpts from David E. Wheeler's message of jue sep 22 14:51:59 -0300 2011:
> On Sep 22, 2011, at 10:50 AM, Josh Berkus wrote:
> 
> >> Just write some comparisons like upthread, and see if the output is f or t. Put them into sql/citext.sql.
> > 
> > Oh, ok.  I thought you meant checking the actual function call.
> > 
> > Tests go in the main SQL file?  Shouldn't they have their own file?
> 
> That is the test file. The main SQL file is citext--1.0.sql. You'll actually need to bump the version number to 1.1,
renamethat file to citext--1.1.sql, and also add them to a citext--1.0--1.1.sql. There probably also needs to be a
citext--unpackaged--1.1.sqlfile.
 

Hmm, if there's a citext--unpackaged--1.0.sql and also
citext--1.0--1.1.sql, is it really necessary to have
citext--unpackaged--1.1.sql?  Shouldn't the upgrade facility be able to
just run both scripts?

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: citext operator precedence fix

From
"David E. Wheeler"
Date:
On Sep 22, 2011, at 11:07 AM, Alvaro Herrera wrote:

>> That is the test file. The main SQL file is citext--1.0.sql. You'll actually need to bump the version number to 1.1,
renamethat file to citext--1.1.sql, and also add them to a citext--1.0--1.1.sql. There probably also needs to be a
citext--unpackaged--1.1.sqlfile. 
>
> Hmm, if there's a citext--unpackaged--1.0.sql and also
> citext--1.0--1.1.sql, is it really necessary to have
> citext--unpackaged--1.1.sql?  Shouldn't the upgrade facility be able to
> just run both scripts?

No, because if 1.1 was installed on 8.4, you'd need the commands to move all its functions into the extension, not
re-createthem. 

However, since this ships with core, it's probably not necessary, because theoretically no one will use it in 8.4, so
thefunctions will never be unpackaged. 

Best,

David



Re: citext operator precedence fix

From
"Kevin Grittner"
Date:
"David E. Wheeler" <david@kineticode.com> wrote:
> On Sep 22, 2011, at 11:07 AM, Alvaro Herrera wrote:
>> Hmm, if there's a citext--unpackaged--1.0.sql and also
>> citext--1.0--1.1.sql, is it really necessary to have
>> citext--unpackaged--1.1.sql?  Shouldn't the upgrade facility be
>> able to just run both scripts?
> 
> No, because if 1.1 was installed on 8.4, you'd need the commands
> to move all its functions into the extension, not re-create them.
Shouldn't a version installed on 8.4 be installed as "unpackaged"?
Doesn't citext--unpackaged--1.0.sql contain the commands to move all
its functions into the extension?
-Kevin


Re: citext operator precedence fix

From
"David E. Wheeler"
Date:
On Sep 22, 2011, at 11:14 AM, Kevin Grittner wrote:

>> No, because if 1.1 was installed on 8.4, you'd need the commands
>> to move all its functions into the extension, not re-create them.
>
> Shouldn't a version installed on 8.4 be installed as "unpackaged"?
> Doesn't citext--unpackaged--1.0.sql contain the commands to move all
> its functions into the extension?

It contains everything need to move 1.0 functions into the extension. If Josh adds new functions they obviously would
notbe moved. So a new script would need to move them. And unpackaged--1.1 does not first run unpackaged--1.0. 

Best,

David



Re: citext operator precedence fix

From
Robert Haas
Date:
On Thu, Sep 22, 2011 at 2:16 PM, David E. Wheeler <david@kineticode.com> wrote:
> On Sep 22, 2011, at 11:14 AM, Kevin Grittner wrote:
>
>>> No, because if 1.1 was installed on 8.4, you'd need the commands
>>> to move all its functions into the extension, not re-create them.
>>
>> Shouldn't a version installed on 8.4 be installed as "unpackaged"?
>> Doesn't citext--unpackaged--1.0.sql contain the commands to move all
>> its functions into the extension?
>
> It contains everything need to move 1.0 functions into the extension. If Josh adds new functions they obviously would
notbe moved. So a new script would need to move them. And unpackaged--1.1 does not first run unpackaged--1.0.
 

I believe the point David is trying to make is that someone might take
an 9.2 version of a contrib module and manually install it on an 8.4
server by executing the install script, perhaps with some amount of
hackery.

But I don't think we're required to support that case.  If the user
does a non-standard install, it's their job to deal with the fallout.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: citext operator precedence fix

From
"David E. Wheeler"
Date:
On Sep 22, 2011, at 11:23 AM, Robert Haas wrote:

> I believe the point David is trying to make is that someone might take
> an 9.2 version of a contrib module and manually install it on an 8.4
> server by executing the install script, perhaps with some amount of
> hackery.

Right.

> But I don't think we're required to support that case.  If the user
> does a non-standard install, it's their job to deal with the fallout.

Agreed; I was thinking of how one would handle this for non-core distributed extensions. Probably not necessary for
contrib.

David



Re: citext operator precedence fix

From
Josh Berkus
Date:
> But I don't think we're required to support that case.  If the user
> does a non-standard install, it's their job to deal with the fallout.

Well, I'll write the script anyway, since *I* need it.  I'm installing
this on a 9.0 database which will be later upgraded to 9.1.

However, before I write all this, I'd like to settle the question of
acceptability.  What do I need to do to make it OK to break backwards
compatibility for this?  I feel strongly that I'm correcting it to the
behavior users expect, but that's not statistically backed.

I don't want to spend several hours writing scripts so that it can be
rejected *for that reason*.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: citext operator precedence fix

From
Robert Haas
Date:
On Thu, Sep 22, 2011 at 2:36 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> But I don't think we're required to support that case.  If the user
>> does a non-standard install, it's their job to deal with the fallout.
>
> Well, I'll write the script anyway, since *I* need it.  I'm installing
> this on a 9.0 database which will be later upgraded to 9.1.
>
> However, before I write all this, I'd like to settle the question of
> acceptability.  What do I need to do to make it OK to break backwards
> compatibility for this?  I feel strongly that I'm correcting it to the
> behavior users expect, but that's not statistically backed.
>
> I don't want to spend several hours writing scripts so that it can be
> rejected *for that reason*.

I'm OK with the proposed behavior change and I agree that it's
probably what people want, but I am awfully suspicious that those
extra casts are going to break something you haven't thought about.
It might be worth posting a rough version first just to see if I (or
someone else) can break it before you spend a lot of time on it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: citext operator precedence fix

From
Josh Berkus
Date:
> I'm OK with the proposed behavior change and I agree that it's
> probably what people want, but I am awfully suspicious that those
> extra casts are going to break something you haven't thought about.
> It might be worth posting a rough version first just to see if I (or
> someone else) can break it before you spend a lot of time on it.

Additional breakage confirmed (hash functions, etc.)  Looks like I need
to add a lot more support functions and test.   This is still worth
doing, but don't expect it for the next commitfest.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: citext operator precedence fix

From
Robert Haas
Date:
On Fri, Sep 23, 2011 at 12:37 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> I'm OK with the proposed behavior change and I agree that it's
>> probably what people want, but I am awfully suspicious that those
>> extra casts are going to break something you haven't thought about.
>> It might be worth posting a rough version first just to see if I (or
>> someone else) can break it before you spend a lot of time on it.
>
> Additional breakage confirmed (hash functions, etc.)  Looks like I need
> to add a lot more support functions and test.   This is still worth
> doing, but don't expect it for the next commitfest.

I would also be looking carefully at whether you can construct a
scenario where the operator resolution code can't decide between
=(text,citext) or =(text,text) - you probably need a third type like
varchar or bpchar in the mix to trigger a failure, if there's one to
be found.  Or you might have a problem with citext = bpchar being
ambiguous between =(text,citext) and =(varchar,text), or some such
thing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company