Thread: Bug in ILIKE?

Bug in ILIKE?

From
Tom Lane
Date:
Good:

regression=# select 'T' ilike 't';?column? 
----------t
(1 row)

Not so good:

regression=# select 'T' ilike E'\\t';?column? 
----------f
(1 row)

ISTM backslash is only supposed to turn off the pattern-language
specialness of characters, not render them case sensitive.  The reason
this happens is that the backslash case in MatchText() checks for exact
equality.  I think it should be checking for TCHAR() equality, same as
when it is just checking two ordinary characters.  Comments?  Is this a
backpatchable bug fix, or should we only change the behavior for 8.4 and
beyond?
        regards, tom lane


Re: Bug in ILIKE?

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Good:
>
> regression=# select 'T' ilike 't';
>  ?column? 
> ----------
>  t
> (1 row)
>
> Not so good:
>
> regression=# select 'T' ilike E'\\t';
>  ?column? 
> ----------
>  f
> (1 row)
>
> ISTM backslash is only supposed to turn off the pattern-language
> specialness of characters, not render them case sensitive.  The reason
> this happens is that the backslash case in MatchText() checks for exact
> equality.  I think it should be checking for TCHAR() equality, same as
> when it is just checking two ordinary characters.  Comments?  Is this a
> backpatchable bug fix, or should we only change the behavior for 8.4 and
> beyond?
>
>   


Hmm. A quick test shows this working the other way in 8.2, so I assume 
this is a byproduct of the work we did to improve the efficiency of 
LIKE/ILIKE in 8.3 ;-(

The docs actually don't state what are the semantics of escape followed 
by something that is not escape or a metachar. Does the spec say 
anything about that?

cheers

andrew



Re: Bug in ILIKE?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> The docs actually don't state what are the semantics of escape followed 
> by something that is not escape or a metachar. Does the spec say 
> anything about that?

The spec says it's an error, per the SQL92 excerpt I quoted in the
previous thread.  (SQL99 says about the same with more notation;
I didn't bother looking in the later specs.)

I find that position too restrictive, mainly because of this
consideration: suppose some future version of the spec invents
additional metacharacters.  To be concrete, suppose ? means
something special in SQL2010.  Now how do you make a pattern that
works in both older and newer servers?  \? means literal ? to the
newer server, but if it throws an error on the older, you're stuck.

So I'm for the definition that escape-anything means exactly anything,
without any special treatment that it would otherwise have.  And in
the case of ILIKE it seems like "no special treatment" should mean
"case insensitive match".
        regards, tom lane


Re: Bug in ILIKE?

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> The docs actually don't state what are the semantics of escape followed 
>> by something that is not escape or a metachar. Does the spec say 
>> anything about that?
>>     
>
> The spec says it's an error, per the SQL92 excerpt I quoted in the
> previous thread.  (SQL99 says about the same with more notation;
> I didn't bother looking in the later specs.)
>
> I find that position too restrictive, mainly because of this
> consideration: suppose some future version of the spec invents
> additional metacharacters.  To be concrete, suppose ? means
> something special in SQL2010.  Now how do you make a pattern that
> works in both older and newer servers?  \? means literal ? to the
> newer server, but if it throws an error on the older, you're stuck.
>
> So I'm for the definition that escape-anything means exactly anything,
> without any special treatment that it would otherwise have.  And in
> the case of ILIKE it seems like "no special treatment" should mean
> "case insensitive match".
>   

Well, it looks to me to be pretty easily fixable. Given the above it 
seems to me marginal to call it a bug, though. I don't have very strong 
feelings, but I am generally opposed to changing the visible behaviour 
in stable releases unless something is clearly a bug.

cheers

andrew
>             regards, tom lane
>
>   


Re: Bug in ILIKE?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> So I'm for the definition that escape-anything means exactly anything,
>> without any special treatment that it would otherwise have.  And in
>> the case of ILIKE it seems like "no special treatment" should mean
>> "case insensitive match".

> Well, it looks to me to be pretty easily fixable. Given the above it 
> seems to me marginal to call it a bug, though. I don't have very strong 
> feelings, but I am generally opposed to changing the visible behaviour 
> in stable releases unless something is clearly a bug.

Well, there are two possible interpretations: (1) it's a bug, or (2)
it's an intentional, about-to-be-documented feature that backslashing a
letter makes it case-sensitive in ILIKE.

I do not care for interpretation #2 ... especially in view of your
observation (confirmed here) that pre-8.3 releases didn't do this.
I think it's just a bug in 8.3.
        regards, tom lane


Re: Bug in ILIKE?

From
"David E. Wheeler"
Date:
On Sep 25, 2008, at 20:45, Tom Lane wrote:

> Well, there are two possible interpretations: (1) it's a bug, or (2)
> it's an intentional, about-to-be-documented feature that
> backslashing a
> letter makes it case-sensitive in ILIKE.
>
> I do not care for interpretation #2 ... especially in view of your
> observation (confirmed here) that pre-8.3 releases didn't do this.
> I think it's just a bug in 8.3.

+1 I think it would be very difficult to come up with a justification
for backslashes making a comparison case-sensitive. It's just…weird.

Best,

David



Re: Bug in ILIKE?

From
Andrew Dunstan
Date:

Tom Lane wrote:
> I think it's just a bug in 8.3.
>
>
>

Well, here's a patch that I think fixes it. If you're happy I'll apply
it to HEAD and 8.3.

cheers

andrew
? .deps
Index: like_match.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/like_match.c,v
retrieving revision 1.21
diff -c -r1.21 like_match.c
*** like_match.c    1 Mar 2008 03:26:34 -0000    1.21
--- like_match.c    26 Sep 2008 22:38:26 -0000
***************
*** 96,105 ****
      {
          if (*p == '\\')
          {
!             /* Next byte must match literally, whatever it is */
              NextByte(p, plen);
!             if ((plen <= 0) || *p != *t)
                  return LIKE_FALSE;
          }
          else if (*p == '%')
          {
--- 96,108 ----
      {
          if (*p == '\\')
          {
!             /* Next char must match literally, whatever it is */
              NextByte(p, plen);
!             if ((plen <= 0) || TCHAR(*p) != TCHAR(*t))
                  return LIKE_FALSE;
+             NextChar(t,tlen);
+             NextChar(p,plen);
+             continue;
          }
          else if (*p == '%')
          {

Re: Bug in ILIKE?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> I think it's just a bug in 8.3.

> Well, here's a patch that I think fixes it. If you're happy I'll apply 
> it to HEAD and 8.3.

That patch isn't gonna apply to HEAD ;-).  Also the introduction of
NextChar is simply broken, as it will skip additional bytes of a
multibyte char without having compared 'em.  All you need AFAICS is
to put TCHAR()s into the *p != *t comparison, so that it matches the
case for ordinary characters.
        regards, tom lane


Re: Bug in ILIKE?

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> Tom Lane wrote:
>>     
>>> I think it's just a bug in 8.3.
>>>       
>
>   
>> Well, here's a patch that I think fixes it. If you're happy I'll apply 
>> it to HEAD and 8.3.
>>     
>
> That patch isn't gonna apply to HEAD ;-).  Also the introduction of
> NextChar is simply broken, as it will skip additional bytes of a
> multibyte char without having compared 'em.  All you need AFAICS is
> to put TCHAR()s into the *p != *t comparison, so that it matches the
> case for ordinary characters.
>
>             
>   

I'll have another look. What happens in that case though if you have 
escape+x where x is a multibyte char?

cheers

andrew


Re: Bug in ILIKE?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I'll have another look. What happens in that case though if you have 
> escape+x where x is a multibyte char?

TCHAR()'s just a no-op in multibyte encodings --- we handle ILIKE
using a preliminary downcasing pass in that case.
        regards, tom lane


Re: Bug in ILIKE?

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> I'll have another look. What happens in that case though if you have 
>> escape+x where x is a multibyte char?
>>     
>
> TCHAR()'s just a no-op in multibyte encodings --- we handle ILIKE
> using a preliminary downcasing pass in that case.
>
>             
>   

Ah. Of course.

Will fix.

cheers

andrew