Thread: Case Insensitive Data Type
I'm using email addresses as a primary key in one of my tables. Currently, I have to ensure that the email addresses are converted to lower case before they get put into the table, and that all lookups on that field are converted to lower case before the select statement, in order to ensure that Joe@Somewhere.com is the same as joe@somewhere.com.
Does anyone know of a case-insensitive data type? I'd want the following behavior:
/* Make the primary key be a case-insensitive data type */
CREATE TABLE foo (email CASE_INSENSITIVE_VARCHAR(50) PRIMARY KEY, name VARCHAR(50));
/* Insert a row with a case insensitive key */
INSERT INTO foo VALUES ('joe@somewhere.com', 'Joe');
INSERT 24751 1
/* A different case of an existing primary key should fail */
INSERT INTO foo VALUES ('Joe@SOMEWHERE.com', 'Joe');
ERROR: Cannot insert a duplicate key into unique index foo_pkey
/* A lookup on a different case of an existing key should be successful: */
SELECT * FROM foo WHERE email = 'Joe@SOMEWHERE.com';
Anyone know how I can accomplish this? Can I create a custom data type to do this?
Thanks, Russell
Three ways: One: write trigger on the table. CREATE OR REPLACE FUNCTION foo_trigger_func() RETURNS OPAQUE AS ' BEGIN NEW.email := LOWER(NEW.email); RETURN NEW; END;' LANGUAGE 'plpgsql'; CREATE TRIGGER foo_trigger BEFORE INSERT OR UPDATE ON foo FOR EACH ROW EXECUTE PROCEDURE foo_trigger_func(); Two: use the lower function in your update and insert queries INSERT INTO foo VALUES(LOWER(email),'Name'); Three: Postgresql has an ILIKE operator You could play with that but not the best solution HTH Darren Ferguson On Fri, 24 May 2002, Russell Black wrote: > I'm using email addresses as a primary key in one of my tables. Currently, I have to ensure that the email addresses areconverted to lower case before they get put into the table, and that all lookups on that field are converted to lowercase before the select statement, in order to ensure that Joe@Somewhere.com is the same as joe@somewhere.com. > > Does anyone know of a case-insensitive data type? I'd want the following behavior: > > /* Make the primary key be a case-insensitive data type */ > CREATE TABLE foo (email CASE_INSENSITIVE_VARCHAR(50) PRIMARY KEY, name VARCHAR(50)); > > /* Insert a row with a case insensitive key */ > INSERT INTO foo VALUES ('joe@somewhere.com', 'Joe'); > INSERT 24751 1 > > /* A different case of an existing primary key should fail */ > INSERT INTO foo VALUES ('Joe@SOMEWHERE.com', 'Joe'); > ERROR: Cannot insert a duplicate key into unique index foo_pkey > > /* A lookup on a different case of an existing key should be successful: */ > SELECT * FROM foo WHERE email = 'Joe@SOMEWHERE.com'; > email | name > -------------------+------ > joe@somewhere.com | Joe > (1 row) > > Anyone know how I can accomplish this? Can I create a custom data type to do this? > > Thanks, Russell > > >
I found this method in the archives:
CREATE TABLE t1 (f1 varchar(40));
CREATE INDEX t1_f1_idx ON t1 (lower(f1));
CREATE INDEX t1_f1_idx ON t1 (lower(f1));
SELECT * FROM t1 WHERE lower( f1 ) = 'whatever';
This will unfortunately not work, since I'm using CMP EJB's, and I'm not writing the SQL queries myself. The EJB container does it for me, and it wants to do a
This will unfortunately not work, since I'm using CMP EJB's, and I'm not writing the SQL queries myself. The EJB container does it for me, and it wants to do a
SELECT * FROM t1 WHERE f1 = 'whatever';
Instead of
Instead of
SELECT * FROM t1 WHERE lower( f1 ) = 'whatever';
This is why I'm hoping to have the database backend take care of this with perhaps a new datatype, so that the '=' operator is case insensitive.
Any ideas?
Russell
----- Original Message -----From: Russell BlackSent: Friday, May 24, 2002 10:46 AMSubject: [GENERAL] Case Insensitive Data TypeI'm using email addresses as a primary key in one of my tables. Currently, I have to ensure that the email addresses are converted to lower case before they get put into the table, and that all lookups on that field are converted to lower case before the select statement, in order to ensure that Joe@Somewhere.com is the same as joe@somewhere.com.Does anyone know of a case-insensitive data type? I'd want the following behavior:/* Make the primary key be a case-insensitive data type */CREATE TABLE foo (email CASE_INSENSITIVE_VARCHAR(50) PRIMARY KEY, name VARCHAR(50));/* Insert a row with a case insensitive key */INSERT INTO foo VALUES ('joe@somewhere.com', 'Joe');INSERT 24751 1/* A different case of an existing primary key should fail */INSERT INTO foo VALUES ('Joe@SOMEWHERE.com', 'Joe');ERROR: Cannot insert a duplicate key into unique index foo_pkey/* A lookup on a different case of an existing key should be successful: */SELECT * FROM foo WHERE email = 'Joe@SOMEWHERE.com';Anyone know how I can accomplish this? Can I create a custom data type to do this?Thanks, Russell
> > Does anyone know of a case-insensitive data type? I'd want the > following behavior: sounds like you want some combination of the built-in lower() function and a trigger or rule to ensure lower on insert. -Fran
Thanks for your quick reply! The first option you gave looks promising. Unfortunately, I have no control over the insert or select statements, since I'm using EJBs with Container Managed Persistence. The EJB container does the INSERT and UPDATE SQL. This means options two and three in your suggestions wouldn't work. But it looks like your first option would ensure that the data is lower case in the table, which gets me half way there. Now I just need a way to make the = operator for the email column case-insensitive. So that I could do INSERT INTO foo VALUES ('abcde'); and selecting on an upper case version would work: SELECT * FROM foo WHERE (email = 'ABCDE'); Any suggestions? Thanks ----- Original Message ----- > Three ways: > > One: write trigger on the table. > > CREATE OR REPLACE FUNCTION foo_trigger_func() RETURNS OPAQUE AS ' > BEGIN > NEW.email := LOWER(NEW.email); > RETURN NEW; > END;' LANGUAGE 'plpgsql'; > > CREATE TRIGGER foo_trigger BEFORE INSERT OR UPDATE ON foo > FOR EACH ROW EXECUTE PROCEDURE foo_trigger_func(); > > Two: use the lower function in your update and insert queries > > INSERT INTO foo VALUES(LOWER(email),'Name'); > > Three: Postgresql has an ILIKE operator > > You could play with that but not the best solution > > HTH > > Darren Ferguson > > On Fri, 24 May 2002, Russell Black wrote: > > > I'm using email addresses as a primary key in one of my tables. Currently, I have to ensure that the email addresses are converted to lower case before they get put into the table, and that all lookups on that field are converted to lower case before the select statement, in order to ensure that Joe@Somewhere.com is the same as joe@somewhere.com. > > > > Does anyone know of a case-insensitive data type? I'd want the following behavior: > > > > /* Make the primary key be a case-insensitive data type */ > > CREATE TABLE foo (email CASE_INSENSITIVE_VARCHAR(50) PRIMARY KEY, name VARCHAR(50)); > > > > /* Insert a row with a case insensitive key */ > > INSERT INTO foo VALUES ('joe@somewhere.com', 'Joe'); > > INSERT 24751 1 > > > > /* A different case of an existing primary key should fail */ > > INSERT INTO foo VALUES ('Joe@SOMEWHERE.com', 'Joe'); > > ERROR: Cannot insert a duplicate key into unique index foo_pkey > > > > /* A lookup on a different case of an existing key should be successful: */ > > SELECT * FROM foo WHERE email = 'Joe@SOMEWHERE.com'; > > email | name > > -------------------+------ > > joe@somewhere.com | Joe > > (1 row) > > > > Anyone know how I can accomplish this? Can I create a custom data type to do this? > > > > Thanks, Russell > > > > > > >
"Russell Black" <russell.black@iarchives.com> writes: > I'm using email addresses as a primary key in one of my tables. Currently,= > I have to ensure that the email addresses are converted to lower case befo= > re they get put into the table, and that all lookups on that field are conv= > erted to lower case before the select statement, in order to ensure that Jo= > e@Somewhere.com is the same as joe@somewhere.com. A partial answer to this is CREATE UNIQUE INDEX foo_key on foo (lower(email)); which will prevent duplicate entries with the same-up-to-case-folding address. However you'd still have to query with queries like SELECT * FROM foo WHERE lower(email) = lower('probe value'); if the probe values you are handed can't be relied on to be downcased already. (I tried to think of a way to insert the lower() automatically using a rule, but couldn't come up with anything --- ideas anyone?) > Does anyone know of a case-insensitive data type? I do not think you should go away feeling that a case-insensitive data type would magically solve your problems. Today you might think that case-folding is what you want, but by no later than next week you would figure out that there are other sorts of normalizations you'll also need to do on provided addresses --- one obvious example is stripping leading and trailing blanks and reducing consecutive blanks to a single blank. In fact if you really want to look like you know what you're doing, you'll have to strip out the comment portions of an address entirely. For example, these are all equivalent forms per RFC specs: joe@blow.com Joe Blow <joe@blow.com> "Joe Blow" <joe@blow.com> joe@blow.com (Joe Blow) and should be recognized as such by anything that pretends to know what email addresses are. (For that matter, you do know that the specs say only the hostname part should be assumed case-insensitive, don't you?) So the real bottom line here is that you'd better figure a way to pass the input strings through an address-normalization function. "lower()" might do as a first-order approximation but you won't stay with it forever. > Can I create a custom data type to do this? You probably could, but it'd likely be more work than you want. A minimal implementation would require an input function (which would apply the normalization rules you want), an output function (for which you could probably just commandeer textout), and a no-op text() coercion function (assuming you'd like to do anything besides display the value). Then you'd have to create a set of index operators and access method entries, if you intended to make this indexable --- these could all piggyback on text operators, but you'd still need to do the work of making the catalog entries for them. Assuming you did have all that, I think a query like select * from foo where email = 'Joe Blow <joe@blow.com>' would indeed work the way you want --- the initially untyped string literal would eventually get coerced to your datatype, and then your input conversion routine could do the right things to it. But it'll likely be easier to make it happen on the client side ;-) regards, tom lane
On Fri, 24 May 2002 11:08:34 -0600 "Russell Black" <russell.black@iarchives.com> wrote: > The EJB container does it for me, and it wants to do a > SELECT * FROM t1 WHERE f1 = 'whatever'; > Instead of > SELECT * FROM t1 WHERE lower( f1 ) = 'whatever'; You could have it run the SELECT on a view, where the relevant column is defined using lower() on the appropriate column of the real table. Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
Tom Just out of interest and i am not sure if this is possible or practical but: Could / Would it be very difficult to add ON SELECT triggers to Postgres? This could help a little. It would be as a last resort. SELECT email FROM foo WHERE email = 'ABCDE'; TRIGGER could have a part where you could then lower the ABCDE before the query was run. Just a thought Darren Ferguson On Fri, 24 May 2002, Tom Lane wrote: > "Russell Black" <russell.black@iarchives.com> writes: > > I'm using email addresses as a primary key in one of my tables. Currently,= > > I have to ensure that the email addresses are converted to lower case befo= > > re they get put into the table, and that all lookups on that field are conv= > > erted to lower case before the select statement, in order to ensure that Jo= > > e@Somewhere.com is the same as joe@somewhere.com. > > A partial answer to this is > > CREATE UNIQUE INDEX foo_key on foo (lower(email)); > > which will prevent duplicate entries with the same-up-to-case-folding > address. However you'd still have to query with queries like > > SELECT * FROM foo WHERE lower(email) = lower('probe value'); > > if the probe values you are handed can't be relied on to be downcased > already. (I tried to think of a way to insert the lower() automatically > using a rule, but couldn't come up with anything --- ideas anyone?) > > > > Does anyone know of a case-insensitive data type? > > I do not think you should go away feeling that a case-insensitive data > type would magically solve your problems. Today you might think that > case-folding is what you want, but by no later than next week you would > figure out that there are other sorts of normalizations you'll also need > to do on provided addresses --- one obvious example is stripping leading > and trailing blanks and reducing consecutive blanks to a single blank. > In fact if you really want to look like you know what you're doing, > you'll have to strip out the comment portions of an address entirely. > For example, these are all equivalent forms per RFC specs: > joe@blow.com > Joe Blow <joe@blow.com> > "Joe Blow" <joe@blow.com> > joe@blow.com (Joe Blow) > and should be recognized as such by anything that pretends to know > what email addresses are. (For that matter, you do know that the > specs say only the hostname part should be assumed case-insensitive, > don't you?) > > So the real bottom line here is that you'd better figure a way to pass > the input strings through an address-normalization function. "lower()" > might do as a first-order approximation but you won't stay with it > forever. > > > Can I create a custom data type to do this? > > You probably could, but it'd likely be more work than you want. A > minimal implementation would require an input function (which would > apply the normalization rules you want), an output function (for which > you could probably just commandeer textout), and a no-op text() coercion > function (assuming you'd like to do anything besides display the value). > Then you'd have to create a set of index operators and access method > entries, if you intended to make this indexable --- these could all > piggyback on text operators, but you'd still need to do the work of > making the catalog entries for them. > > Assuming you did have all that, I think a query like > > select * from foo where email = 'Joe Blow <joe@blow.com>' > > would indeed work the way you want --- the initially untyped string > literal would eventually get coerced to your datatype, and then your > input conversion routine could do the right things to it. But it'll > likely be easier to make it happen on the client side ;-) > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >
>Assuming you did have all that, I think a query like > > select * from foo where email = 'Joe Blow <joe@blow.com>' > >would indeed work the way you want --- the initially untyped string >literal would eventually get coerced to your datatype, and then your >input conversion routine could do the right things to it. But it'll >likely be easier to make it happen on the client side ;-) Since he is using EJB, he is probably stripping out everything already except the core SMTP e-mail address (probably using javax.mail.internet.InternetAddress's parser). Not that this is in the least bit relevant to PG so I'll just shut up now. :) Cheers, Doug
Doug Fields <dfields-pg-general@pexicom.com> writes: > Since he is using EJB, he is probably stripping out everything already > except the core SMTP e-mail address (probably using > javax.mail.internet.InternetAddress's parser). Hm. I'd sort of expect a facility like that to have done all the appropriate normalization already --- in which case this discussion becomes moot ... regards, tom lane
Darren Ferguson <darren@crystalballinc.com> writes: > Could / Would it be very difficult to add ON SELECT triggers to Postgres? Don't see that it would help you. An ON SELECT trigger would presumably fire when a row is selected --- but you are trying to alter the selection conditions, and cause rows to be selected that would *not* be normally. There is no place for a trigger to fire to make that happen. Note that it'd be easy enough for a rule to add lower() to the reference to email --- really all you're doing there is making a view that replaces the email column with a lowercased version of it. But I don't see a way for a rule to rewrite the query to lower-case the literal constant being compared to email. Can anyone see a way to do it? regards, tom lane
Neil Conway <nconway@klamath.dyndns.org> writes: > You could have it run the SELECT on a view, where the relevant > column is defined using lower() on the appropriate column > of the real table. But that only gets him halfway there. He really wants the system to do a lower() on the provided comparison constant, too. I haven't thought of a way to get the rule system to insert that. regards, tom lane
Would it be possible to make a view that simply returns a tolower() of the field, then have an instead of trigger on the update of the view to update the real table? ______________________________________________________________________________ Your mouse has moved. You must restart Windows for your changes to take effect. #!/usr/bin/perl print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);
On Fri, May 24, 2002 at 03:41:39PM -0400, Tom Lane wrote: > Doug Fields <dfields-pg-general@pexicom.com> writes: > > Since he is using EJB, he is probably stripping out everything already > > except the core SMTP e-mail address (probably using > Hm. I'd sort of expect a facility like that to have done all the > appropriate normalization already --- in which case this discussion Except maybe EJB accepts the actual spec, and so allows mixed case? -- ---- Andrew Sullivan 87 Mowat Avenue Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M6K 3E3 +1 416 646 3304 x110
Friends, I don't want to imply something about things I don't know anything about, but I'm interested in why nobody has suggested creating a new data type using CREATE TYPE, setting up it's various functions and creating new operators for it. It looks (reading the docs) like it would do what Russel wants without too much work. Is there some reason that folks avoid this approach? Thanks, Peter Darley -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Tom Lane Sent: Friday, May 24, 2002 11:48 AM To: Russell Black Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Case Insensitive Data Type "Russell Black" <russell.black@iarchives.com> writes: > I'm using email addresses as a primary key in one of my tables. Currently,= > I have to ensure that the email addresses are converted to lower case befo= > re they get put into the table, and that all lookups on that field are conv= > erted to lower case before the select statement, in order to ensure that Jo= > e@Somewhere.com is the same as joe@somewhere.com. A partial answer to this is CREATE UNIQUE INDEX foo_key on foo (lower(email)); which will prevent duplicate entries with the same-up-to-case-folding address. However you'd still have to query with queries like SELECT * FROM foo WHERE lower(email) = lower('probe value'); if the probe values you are handed can't be relied on to be downcased already. (I tried to think of a way to insert the lower() automatically using a rule, but couldn't come up with anything --- ideas anyone?) > Does anyone know of a case-insensitive data type? I do not think you should go away feeling that a case-insensitive data type would magically solve your problems. Today you might think that case-folding is what you want, but by no later than next week you would figure out that there are other sorts of normalizations you'll also need to do on provided addresses --- one obvious example is stripping leading and trailing blanks and reducing consecutive blanks to a single blank. In fact if you really want to look like you know what you're doing, you'll have to strip out the comment portions of an address entirely. For example, these are all equivalent forms per RFC specs: joe@blow.com Joe Blow <joe@blow.com> "Joe Blow" <joe@blow.com> joe@blow.com (Joe Blow) and should be recognized as such by anything that pretends to know what email addresses are. (For that matter, you do know that the specs say only the hostname part should be assumed case-insensitive, don't you?) So the real bottom line here is that you'd better figure a way to pass the input strings through an address-normalization function. "lower()" might do as a first-order approximation but you won't stay with it forever. > Can I create a custom data type to do this? You probably could, but it'd likely be more work than you want. A minimal implementation would require an input function (which would apply the normalization rules you want), an output function (for which you could probably just commandeer textout), and a no-op text() coercion function (assuming you'd like to do anything besides display the value). Then you'd have to create a set of index operators and access method entries, if you intended to make this indexable --- these could all piggyback on text operators, but you'd still need to do the work of making the catalog entries for them. Assuming you did have all that, I think a query like select * from foo where email = 'Joe Blow <joe@blow.com>' would indeed work the way you want --- the initially untyped string literal would eventually get coerced to your datatype, and then your input conversion routine could do the right things to it. But it'll likely be easier to make it happen on the client side ;-) regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
"Peter Darley" <pdarley@kinesis-cem.com> writes: > I don't want to imply something about things I don't know anything about, > but I'm interested in why nobody has suggested creating a new data type > using CREATE TYPE, setting up it's various functions and creating new > operators for it. It looks (reading the docs) like it would do what Russel > wants without too much work. Is there some reason that folks avoid this > approach? That's what I was referring to at the end of the message you quoted. The question is what you define as "too much work" ;-). Given a C implementation of the required normalization function (I'm assuming plain lower() won't really cut it), you'd need probably O(100) lines of additional C code and a few dozen catalog entries to make it happen. A couple hours work for someone who'd done it before, rather longer for someone who hadn't ... regards, tom lane
I'd be happy with something which was just plain case-insensitive. I don't really think anyone is going to enter "Their Email <address@like.this>" Trimming whitepsace would be nice, but I could probably live without it. I'm also comfortable with ignoring the fact that the spec says Joe@somewhere.com is really a different address than joe@somewhere.com. Given that, I would imagine it would be much less than 100 lines of C code, and could probably be done in plpgsql. Maybe it's still a big deal to define all the operators for the new datatype, but I wouldn't mind giving it a try if someone could tell me how to piggy-back on the regular string operators, and string datatype (probably VARCHAR). ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "Peter Darley" <pdarley@kinesis-cem.com> Cc: "Russell Black" <russell.black@iarchives.com>; <pgsql-general@postgresql.org> Sent: Friday, May 24, 2002 4:37 PM Subject: Re: [GENERAL] Case Insensitive Data Type > "Peter Darley" <pdarley@kinesis-cem.com> writes: > > I don't want to imply something about things I don't know anything about, > > but I'm interested in why nobody has suggested creating a new data type > > using CREATE TYPE, setting up it's various functions and creating new > > operators for it. It looks (reading the docs) like it would do what Russel > > wants without too much work. Is there some reason that folks avoid this > > approach? > > That's what I was referring to at the end of the message you quoted. > The question is what you define as "too much work" ;-). > > Given a C implementation of the required normalization function (I'm > assuming plain lower() won't really cut it), you'd need probably O(100) > lines of additional C code and a few dozen catalog entries to make it > happen. A couple hours work for someone who'd done it before, rather > longer for someone who hadn't ... > > regards, tom lane
"Russell Black" <russell.black@iarchives.com> writes: > I'd be happy with something which was just plain case-insensitive. > Given that, I would imagine it would be much less than 100 lines of C code, > and could probably be done in plpgsql. You'll still have to get your hands dirty with C. AFAIK there isn't any way to define I/O functions for new datatypes in any of the PL languages we have. regards, tom lane
I found this method in the archives:
CREATE TABLE t1 (f1 varchar(40));
SELECT f1 FROM t1 WHERE lower( f1 ) = 'whatever';
CREATE INDEX t1_f1_idx ON t1 (lower(f1));
SELECT f1 FROM t1 WHERE lower( f1 ) = 'whatever';
CREATE INDEX t1_f1_idx ON t1 (lower(f1));
This will unfortunately not work, since I'm using CMP EJB's, and I'm not writing the SQL queries myself. The EJB container does it for me, and it wants to do a
SELECT f1 FROM t1 WHERE f1 = 'whatever';
Instead of
Instead of
SELECT f1 FROM t1 WHERE lower( f1 ) = 'whatever';
This is why I'm hoping to have the database backend take care of this with perhaps a new datatype, so that the '=' operator is case insensitive.
Russell
----- Original Message -----From: Russell BlackSent: Friday, May 24, 2002 10:46 AMSubject: [GENERAL] Case Insensitive Data TypeI'm using email addresses as a primary key in one of my tables. Currently, I have to ensure that the email addresses are converted to lower case before they get put into the table, and that all lookups on that field are converted to lower case before the select statement, in order to ensure that Joe@Somewhere.com is the same as joe@somewhere.com.Does anyone know of a case-insensitive data type? I'd want the following behavior:/* Make the primary key be a case-insensitive data type */CREATE TABLE foo (email CASE_INSENSITIVE_VARCHAR(50) PRIMARY KEY, name VARCHAR(50));/* Insert a row with a case insensitive key */INSERT INTO foo VALUES ('joe@somewhere.com', 'Joe');INSERT 24751 1/* A different case of an existing primary key should fail */INSERT INTO foo VALUES ('Joe@SOMEWHERE.com', 'Joe');ERROR: Cannot insert a duplicate key into unique index foo_pkey/* A lookup on a different case of an existing key should be successful: */SELECT * FROM foo WHERE email = 'Joe@SOMEWHERE.com';Anyone know how I can accomplish this? Can I create a custom data type to do this?Thanks, Russell