Re: String comparison and the SQL standard - Mailing list pgsql-general

From Albe Laurenz
Subject Re: String comparison and the SQL standard
Date
Msg-id A737B7A37273E048B164557ADEF4A58B0579B0AD@ntex2010a.host.magwien.gv.at
Whole thread Raw
In response to Re: String comparison and the SQL standard  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: String comparison and the SQL standard  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Tom Lane wrote:
>>    b) If the length in characters of X is not equal to the length
>>       in characters of Y, then the shorter string is effectively
>>       replaced, for the purposes of comparison, with a copy of itself
>>       that has been extended to the length of the longer string by
>>       concatenation on the right of one or more pad characters,
>>       where the pad character is chosen based on CS.
>>       If CS has the NO PAD characteristic, then the pad character is
>>       an implementation-dependent character different from
>>       any character in the character set of X and Y that collates
>>       less than any string under CS.
>>       Otherwise, the pad character is a <space>.
>
> The PAD case is specifying the way that CHAR(n) comparison should work.
> (We don't expose the PAD/NO PAD distinction in any other way than
> CHAR vs VARCHAR/TEXT types.)
>
> AFAICS, the NO PAD case is ignorable BS: [...]

> In any case, the most significant word in that whole paragraph is
> "effectively", which means you can do it however you want as long
> as you get an equivalent comparison result.
>
>> That would effectively mean that 'a'='a   ' is TRUE for
>> all character string types.
>
> In the PAD case, yes.  Else no.

Thanks for the clarification.

>> Of the DBMS I tested, Microsoft SQL Server and MySQL gave me
>> that very result, while PostgreSQL and Oracle gave me FALSE.
>
> This probably has more to do with what these systems think the
> data type of an undecorated literal is, than with whether they do
> trailing-space-insensitive comparison all the time.

I tested not only with string literals, but also comparing
table columns of the respective types.

I came up with the following table of semantics used for
comparisons:

           | CHAR(n)=CHAR(n) | VARCHAR(n)=VARCHAR(n) | CHAR(n)=VARCHAR(n) |
-----------+-----------------+-----------------------+--------------------+
Oracle     |    PAD SPACE    |        NO PAD         |      NO PAD        |
-----------+-----------------+-----------------------+--------------------+
PostgreSQL |    PAD SPACE    |        NO PAD         |     PAD SPACE      |
-----------+-----------------+-----------------------+--------------------+
MySQL      |    PAD SPACE    |       PAD SPACE       |     PAD SPACE      |
-----------+-----------------+-----------------------+--------------------+
SQL Server |    PAD SPACE    |       PAD SPACE       |     PAD SPACE      |
-----------+-----------------+-----------------------+--------------------+

Yours,
Laurenz Albe


pgsql-general by date:

Previous
From: Jasen Betts
Date:
Subject: Re: Case insensitive collation
Next
From: salah jubeh
Date:
Subject: Re: SELECT DISTINCT