Re: Case Insensitive Data Type - Mailing list pgsql-general
From | Peter Darley |
---|---|
Subject | Re: Case Insensitive Data Type |
Date | |
Msg-id | NNEAICKPNOGDBHNCEDCPIEBCCJAA.pdarley@kinesis-cem.com Whole thread Raw |
In response to | Re: Case Insensitive Data Type (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Case Insensitive Data Type
|
List | pgsql-general |
Friends, I don't want to imply something about things I don't know anything about, but I'm interested in why nobody has suggested creating a new data type using CREATE TYPE, setting up it's various functions and creating new operators for it. It looks (reading the docs) like it would do what Russel wants without too much work. Is there some reason that folks avoid this approach? Thanks, Peter Darley -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Tom Lane Sent: Friday, May 24, 2002 11:48 AM To: Russell Black Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Case Insensitive Data Type "Russell Black" <russell.black@iarchives.com> writes: > I'm using email addresses as a primary key in one of my tables. Currently,= > I have to ensure that the email addresses are converted to lower case befo= > re they get put into the table, and that all lookups on that field are conv= > erted to lower case before the select statement, in order to ensure that Jo= > e@Somewhere.com is the same as joe@somewhere.com. A partial answer to this is CREATE UNIQUE INDEX foo_key on foo (lower(email)); which will prevent duplicate entries with the same-up-to-case-folding address. However you'd still have to query with queries like SELECT * FROM foo WHERE lower(email) = lower('probe value'); if the probe values you are handed can't be relied on to be downcased already. (I tried to think of a way to insert the lower() automatically using a rule, but couldn't come up with anything --- ideas anyone?) > Does anyone know of a case-insensitive data type? I do not think you should go away feeling that a case-insensitive data type would magically solve your problems. Today you might think that case-folding is what you want, but by no later than next week you would figure out that there are other sorts of normalizations you'll also need to do on provided addresses --- one obvious example is stripping leading and trailing blanks and reducing consecutive blanks to a single blank. In fact if you really want to look like you know what you're doing, you'll have to strip out the comment portions of an address entirely. For example, these are all equivalent forms per RFC specs: joe@blow.com Joe Blow <joe@blow.com> "Joe Blow" <joe@blow.com> joe@blow.com (Joe Blow) and should be recognized as such by anything that pretends to know what email addresses are. (For that matter, you do know that the specs say only the hostname part should be assumed case-insensitive, don't you?) So the real bottom line here is that you'd better figure a way to pass the input strings through an address-normalization function. "lower()" might do as a first-order approximation but you won't stay with it forever. > Can I create a custom data type to do this? You probably could, but it'd likely be more work than you want. A minimal implementation would require an input function (which would apply the normalization rules you want), an output function (for which you could probably just commandeer textout), and a no-op text() coercion function (assuming you'd like to do anything besides display the value). Then you'd have to create a set of index operators and access method entries, if you intended to make this indexable --- these could all piggyback on text operators, but you'd still need to do the work of making the catalog entries for them. Assuming you did have all that, I think a query like select * from foo where email = 'Joe Blow <joe@blow.com>' would indeed work the way you want --- the initially untyped string literal would eventually get coerced to your datatype, and then your input conversion routine could do the right things to it. But it'll likely be easier to make it happen on the client side ;-) regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
pgsql-general by date: