Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase) - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)
Date
Msg-id D425483C2C5C9F49B5B7A41F8944154757D206@postal.corporate.connx.com
Whole thread Raw
Responses Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)  (Terry Fielder <terry@ashtonwoodshomes.com>)
List pgsql-hackers
Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:

connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' as
varchar(30));
 ?column?
----------
(0 rows)

I see how you can interpret the SQL Standard to make the above response
a correct one.  But is it the response that you would like?

Suppose, for instance, that you have an employee table in your database.
You have another table called benefits.  Perhaps it is even in another
database.  Conceivably even in a database other than PostgreSQL.

Anyway, you want to match information between the two systems so you
join on some redundant columns like a.First_Name = b.fname AND
a.Last_name = b.lname AND a.Middle_Initial = b.mi AND a.City = b.city
AND a.Street_name = b.street

If the columns are not the same length (or one is fixed length and the
other variable), then 'Danniel' won't match 'Danniel' unless you trim
it.  If you trim it, then the indexes go out the window.  If the indexes
go out the window, then we table scan.

I don't like that behavior.  Perhaps others who know more than me can
say why not blank padding comparisons is a good idea.

Clearly, one could argue that having redundant data is bad and that
every attribute in a database intended to match should be exactly the
same type.  But there are lots of database systems badly designed.  And
of well designed systems, it is not uncommon to have more than one
database in your organization, and a need to perform federated joins as
well because of it.

> -----Original Message-----
> From: Tino Wildenhain [mailto:tino@wildenhain.de]
> Sent: Wednesday, October 19, 2005 1:05 PM
> To: Marc G. Fournier
> Cc: Dann Corbit; Richard_D_Levine@raytheon.com; pgsql-
> hackers@postgresql.org; pgsql-general@postgresql.org
> Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
> buysInnobase)
>
> Am Mittwoch, den 19.10.2005, 16:29 -0300 schrieb Marc G. Fournier:
> > I'm CC'ng this over to -hackers ... Tom?  Comments?
> >
> > On Wed, 19 Oct 2005, Dann Corbit wrote:
> >
> > > Yes, clearly that is the wrong result according to the SQL
standard.
> > >
> > > Here is a SQL*Server query:
> > > select 1 where 'a' = 'a ' AND 'a' = 'a  ' AND 'a ' = 'a         '
> > >
> > > It returns (correctly): 1
> > >
> > >> -----Original Message-----
> > >> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> > >> owner@postgresql.org] On Behalf Of Marc G. Fournier
> > >> Sent: Wednesday, October 19, 2005 11:41 AM
> > >> To: Richard_D_Levine@raytheon.com
> > >> Cc: pgsql-general@postgresql.org
> > >> Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase
> > >>
> > >> On Wed, 19 Oct 2005, Richard_D_Levine@raytheon.com wrote:
> > >>
> > >>> I was referring to trailing blanks, but did not explicitly say
it,
> > >>> though showed it in the examples.  I am pretty sure that the SQL
> > >>> standard says that trailing whitespace is insignificant in
string
> > >>> comparison.
> > >>
> > >> Then we are broken too :)
> > >>
> > >> # select 'a ' = 'a  ';
> > >>   ?column?
> > >> ----------
> > >>   f
> > >> (1 row)
>
>
> experiment=# SELECT 'a '::char = 'a  '::char;
>  ?column?
> ----------
>  t
>


pgsql-hackers by date:

Previous
From: Tino Wildenhain
Date:
Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buys
Next
From: Greg Stark
Date:
Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buys