Thread: Best way to store case-insensitive data?

Best way to store case-insensitive data?

From
Mike Christensen
Date:
I have a column called "email" that users login with, thus I need to
be able to lookup email very quickly.  The problem is, emails are
case-insensitive.  I want foo@bar.com to be able to login with
FOO@Bar.com as well.  There's two ways of doing this, that I can see:

1) Every time I lookup an email in the database, do a case-insensitive
ilike, or cast both sides with LOWER().  I think both are slow,
correct?
2) Every time the user updates or saves their email, store it in
lowercase, and every time I lookup an email, pass in a lowercase
email.  This is somewhat of a bug farm because one might miss some
little spot in a piece of code where an email is compared or updated.

Is there any way to tell postgres to always store data in lowercase
form, or just have a flat out case-insensitive column?  Thanks!

Mike

Re: Best way to store case-insensitive data?

From
Andy Colson
Date:
On 6/10/2010 3:50 PM, Mike Christensen wrote:
> I have a column called "email" that users login with, thus I need to
> be able to lookup email very quickly.  The problem is, emails are
> case-insensitive.  I want foo@bar.com to be able to login with
> FOO@Bar.com as well.  There's two ways of doing this, that I can see:
>
> 1) Every time I lookup an email in the database, do a case-insensitive
> ilike, or cast both sides with LOWER().  I think both are slow,
> correct?
> 2) Every time the user updates or saves their email, store it in
> lowercase, and every time I lookup an email, pass in a lowercase
> email.  This is somewhat of a bug farm because one might miss some
> little spot in a piece of code where an email is compared or updated.
>
> Is there any way to tell postgres to always store data in lowercase
> form, or just have a flat out case-insensitive column?  Thanks!
>
> Mike
>

There is citext in contrib, it makes case insensitive text columns.

-Andy

Re: Best way to store case-insensitive data?

From
Scott Marlowe
Date:
On Thu, Jun 10, 2010 at 2:50 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> I have a column called "email" that users login with, thus I need to
> be able to lookup email very quickly.  The problem is, emails are
> case-insensitive.  I want foo@bar.com to be able to login with
> FOO@Bar.com as well.  There's two ways of doing this, that I can see:
>
> 1) Every time I lookup an email in the database, do a case-insensitive
> ilike, or cast both sides with LOWER().  I think both are slow,
> correct?
> 2) Every time the user updates or saves their email, store it in
> lowercase, and every time I lookup an email, pass in a lowercase
> email.  This is somewhat of a bug farm because one might miss some
> little spot in a piece of code where an email is compared or updated.
>
> Is there any way to tell postgres to always store data in lowercase
> form, or just have a flat out case-insensitive column?  Thanks!

The contrib module citext provides a case insensitive text type.

Re: Best way to store case-insensitive data?

From
Mike Christensen
Date:
Where do I get info on installing this?

On Thu, Jun 10, 2010 at 2:15 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Jun 10, 2010 at 2:50 PM, Mike Christensen <mike@kitchenpc.com> wrote:
>> I have a column called "email" that users login with, thus I need to
>> be able to lookup email very quickly.  The problem is, emails are
>> case-insensitive.  I want foo@bar.com to be able to login with
>> FOO@Bar.com as well.  There's two ways of doing this, that I can see:
>>
>> 1) Every time I lookup an email in the database, do a case-insensitive
>> ilike, or cast both sides with LOWER().  I think both are slow,
>> correct?
>> 2) Every time the user updates or saves their email, store it in
>> lowercase, and every time I lookup an email, pass in a lowercase
>> email.  This is somewhat of a bug farm because one might miss some
>> little spot in a piece of code where an email is compared or updated.
>>
>> Is there any way to tell postgres to always store data in lowercase
>> form, or just have a flat out case-insensitive column?  Thanks!
>
> The contrib module citext provides a case insensitive text type.
>

Re: Best way to store case-insensitive data?

From
Scott Marlowe
Date:
On Thu, Jun 10, 2010 at 2:50 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> I have a column called "email" that users login with, thus I need to
> be able to lookup email very quickly.  The problem is, emails are
> case-insensitive.  I want foo@bar.com to be able to login with
> FOO@Bar.com as well.  There's two ways of doing this, that I can see:
>
> 1) Every time I lookup an email in the database, do a case-insensitive
> ilike, or cast both sides with LOWER().  I think both are slow,
> correct?
> 2) Every time the user updates or saves their email, store it in
> lowercase, and every time I lookup an email, pass in a lowercase
> email.  This is somewhat of a bug farm because one might miss some
> little spot in a piece of code where an email is compared or updated.
>
> Is there any way to tell postgres to always store data in lowercase
> form, or just have a flat out case-insensitive column?  Thanks!

Note the other option is to store an index on lower(column)

create index mycaseinsensitiveindex on table ((lower(column));

Re: Best way to store case-insensitive data?

From
Scott Marlowe
Date:
On Thu, Jun 10, 2010 at 3:16 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> Where do I get info on installing this?

Very much depends on OS and how you installed pgsql

Re: Best way to store case-insensitive data?

From
Steve Atkins
Date:

On 6/10/2010 3:50 PM, Mike Christensen wrote:
> I have a column called "email" that users login with, thus I need to
> be able to lookup email very quickly.  The problem is, emails are
> case-insensitive.  I want foo@bar.com to be able to login with
> FOO@Bar.com as well.  There's two ways of doing this, that I can see:
>
> 1) Every time I lookup an email in the database, do a case-insensitive
> ilike, or cast both sides with LOWER().  I think both are slow,
> correct?


Use a functional index and they won't be.

create index email_lower_idx on foo (lower(email));

select * from foo where lower(email) = lower('foo@bar');

> 2) Every time the user updates or saves their email, store it in
> lowercase, and every time I lookup an email, pass in a lowercase
> email.  This is somewhat of a bug farm because one might miss some
> little spot in a piece of code where an email is compared or updated.
>
> Is there any way to tell postgres to always store data in lowercase
> form, or just have a flat out case-insensitive column?  Thanks!
>
> Mike
>

Cheers,
  Steve


Re: Best way to store case-insensitive data?

From
Mike Christensen
Date:
Right now, I'm running 8.3.4 on Windows 2003, and I just used the MSI
installer to install it.

On Thu, Jun 10, 2010 at 2:18 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Jun 10, 2010 at 3:16 PM, Mike Christensen <mike@kitchenpc.com> wrote:
>> Where do I get info on installing this?
>
> Very much depends on OS and how you installed pgsql
>

Re: Best way to store case-insensitive data?

From
Mike Christensen
Date:
From this site:

http://developer.postgresql.org/pgdocs/postgres/citext.html

I couldn't tell if you still had to create an index on the lower case
value.  It seems that it basically mimics the WHERE LOWER(email) =
LOWER(?) method.  Since this part is incredibly performance critical,
maybe I'm better off storing my data all in lowercase and keeping the
DB case sensitive.

On Thu, Jun 10, 2010 at 2:29 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> Right now, I'm running 8.3.4 on Windows 2003, and I just used the MSI
> installer to install it.
>
> On Thu, Jun 10, 2010 at 2:18 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Thu, Jun 10, 2010 at 3:16 PM, Mike Christensen <mike@kitchenpc.com> wrote:
>>> Where do I get info on installing this?
>>
>> Very much depends on OS and how you installed pgsql
>>
>

Re: Best way to store case-insensitive data?

From
Scott Marlowe
Date:
On Thu, Jun 10, 2010 at 3:34 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> From this site:
>
> http://developer.postgresql.org/pgdocs/postgres/citext.html
>
> I couldn't tell if you still had to create an index on the lower case
> value.  It seems that it basically mimics the WHERE LOWER(email) =
> LOWER(?) method.  Since this part is incredibly performance critical,
> maybe I'm better off storing my data all in lowercase and keeping the
> DB case sensitive.

of course you'd still need an index.  whether you store it lower case
in regular text or mixed case in a citext, the db would need an index
for good performance.  But you wouldn't have to store a lower() index
for citext, just an index.

BTW, citext it new for 8.4, so it's probably not an option for you if
you're on 8.3

Re: Best way to store case-insensitive data?

From
Mike Christensen
Date:
Ah, I should probably upgrade to 8.4.  However, I'll probably just
wait for 9.0 to come out.  So it seems like citext will be about the
same as casting both sides to LOWER(), plus putting an index on the
lowercase version of the text.  I'd probably use that if it were out
of the box, but I'm trying to stay away from adding too many
dependencies..  I think I'll stick with my original approach of only
storing lowercase data in the DB, and perhaps put a CHECK constraint
on there to ensure no upper case letters sneak in.

Mike

On Thu, Jun 10, 2010 at 2:42 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Jun 10, 2010 at 3:34 PM, Mike Christensen <mike@kitchenpc.com> wrote:
>> From this site:
>>
>> http://developer.postgresql.org/pgdocs/postgres/citext.html
>>
>> I couldn't tell if you still had to create an index on the lower case
>> value.  It seems that it basically mimics the WHERE LOWER(email) =
>> LOWER(?) method.  Since this part is incredibly performance critical,
>> maybe I'm better off storing my data all in lowercase and keeping the
>> DB case sensitive.
>
> of course you'd still need an index.  whether you store it lower case
> in regular text or mixed case in a citext, the db would need an index
> for good performance.  But you wouldn't have to store a lower() index
> for citext, just an index.
>
> BTW, citext it new for 8.4, so it's probably not an option for you if
> you're on 8.3
>

Re: Best way to store case-insensitive data?

From
Adrian von Bidder
Date:
Heyho!

On Thursday 10 June 2010 22.50:23 Mike Christensen wrote:
> 2) Every time the user updates or saves their email, store it in
> lowercase, and every time I lookup an email, pass in a lowercase
> email.

I'd do it this way.  Plus either a CHECK condition on the table (email =
lowercase(email)) (this will reliably catch all cases, but you will
experience failures until you have found all cases)

Or a BEFORE trigger that converts email to lowercase.  (This is mostly
transparent for storing, but I usually try to avoid triggers that modify
data like this.  But that's probably just me.)

In either case, obviously you'll still need to change the code that is used
for retrieving and comparing email addresses.

cheers
-- vbi

--
featured link: http://www.pool.ntp.org

Attachment

Re: Best way to store case-insensitive data?

From
Mike Christensen
Date:
Yup, I actually ended up doing this with this constraint:

ALTER TABLE Users ADD CONSTRAINT check_email CHECK (email ~ E'^[^A-Z]+$');

However, I like your version better so I'll use that instead :)

Mike

On Thu, Jun 10, 2010 at 11:48 PM, Adrian von Bidder
<avbidder@fortytwo.ch> wrote:
> Heyho!
>
> On Thursday 10 June 2010 22.50:23 Mike Christensen wrote:
>> 2) Every time the user updates or saves their email, store it in
>> lowercase, and every time I lookup an email, pass in a lowercase
>> email.
>
> I'd do it this way.  Plus either a CHECK condition on the table (email =
> lowercase(email)) (this will reliably catch all cases, but you will
> experience failures until you have found all cases)
>
> Or a BEFORE trigger that converts email to lowercase.  (This is mostly
> transparent for storing, but I usually try to avoid triggers that modify
> data like this.  But that's probably just me.)
>
> In either case, obviously you'll still need to change the code that is used
> for retrieving and comparing email addresses.
>
> cheers
> -- vbi
>
> --
> featured link: http://www.pool.ntp.org
>

Re: Best way to store case-insensitive data?

From
Michal Politowski
Date:
On Thu, 10 Jun 2010 13:50:23 -0700, Mike Christensen wrote:
> I have a column called "email" that users login with, thus I need to
> be able to lookup email very quickly.  The problem is, emails are
> case-insensitive.  I want foo@bar.com to be able to login with
> FOO@Bar.com as well.  There's two ways of doing this, that I can see:

NB: technically the local part in an email address can be case sensitive.
As RFC 5321 says:
   The local-part of a mailbox MUST BE treated as case sensitive.
   Therefore, SMTP implementations MUST take care to preserve the case
   of mailbox local-parts.  In particular, for some hosts, the user
   "smith" is different from the user "Smith".  However, exploiting the
   case sensitivity of mailbox local-parts impedes interoperability and
   is discouraged.  Mailbox domains follow normal DNS rules and are
   hence not case sensitive.

In practice I've yet to see a system having both smith and Smith
and them being different, but still it is theoretically posible.

--
Michał Politowski
Talking has been known to lead to communication if practiced carelessly.

Re: Best way to store case-insensitive data?

From
Mike Christensen
Date:
Yea this is a valid point.  It's very possible my design won't work
for the long term, and at some point I'll have to store the email name
exactly as it was entered, and allow the lookup logic to be case
insensitive with a lowercase index.  However, I think the way I have
it now should not break any known email server heh.

Mike

2010/6/11 Michal Politowski <mpol@charybda.icm.edu.pl>:
> On Thu, 10 Jun 2010 13:50:23 -0700, Mike Christensen wrote:
>> I have a column called "email" that users login with, thus I need to
>> be able to lookup email very quickly.  The problem is, emails are
>> case-insensitive.  I want foo@bar.com to be able to login with
>> FOO@Bar.com as well.  There's two ways of doing this, that I can see:
>
> NB: technically the local part in an email address can be case sensitive.
> As RFC 5321 says:
>   The local-part of a mailbox MUST BE treated as case sensitive.
>   Therefore, SMTP implementations MUST take care to preserve the case
>   of mailbox local-parts.  In particular, for some hosts, the user
>   "smith" is different from the user "Smith".  However, exploiting the
>   case sensitivity of mailbox local-parts impedes interoperability and
>   is discouraged.  Mailbox domains follow normal DNS rules and are
>   hence not case sensitive.
>
> In practice I've yet to see a system having both smith and Smith
> and them being different, but still it is theoretically posible.
>
> --
> Michał Politowski
> Talking has been known to lead to communication if practiced carelessly.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: Best way to store case-insensitive data?

From
Adrian von Bidder
Date:
On Friday 11 June 2010 09.27:15 Michal Politowski wrote:

[email address local part is case sensitive]

> In practice I've yet to see a system having both smith and Smith
> and them being different, but still it is theoretically posible.

I routinely modify email addresses I store to my addressbook to all
lowercase.  I have yet to have a single case where this gets me into
problems; I think it's probably quite exotic to find a system that actually
is case sensitive.

cheers
-- vbi

--
featured product: ClamAV Antivirus - http://www.clamav.net/

Attachment

Re: Best way to store case-insensitive data?

From
Scott Marlowe
Date:
2010/6/11 Mike Christensen <mike@kitchenpc.com>:
> Yea this is a valid point.  It's very possible my design won't work
> for the long term, and at some point I'll have to store the email name
> exactly as it was entered, and allow the lookup logic to be case
> insensitive with a lowercase index.  However, I think the way I have
> it now should not break any known email server heh.

Instead of mangling data when you store it, mangle it later when you
retrieve it.  with a functional index on the column, you get the
comparison data stored in an index, ready to go.

Performance test the index:

create test_index on table (lower(fieldname));

versus storing the emails in lower case.

Re: Best way to store case-insensitive data?

From
Scott Marlowe
Date:
n Sat, Jun 12, 2010 at 3:21 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Performance test the index:
>
> create test_index on table (lower(fieldname));
>
> versus storing the emails in lower case.

Some quick testing on that tells me that storing in lower case will be
about twice as fast at retrieval.  But we're talking things like 1ms
versus 2ms.

Re: Best way to store case-insensitive data?

From
Lew
Date:
Please don't top-post.

Mike Christensen wrote:
> Ah, I should probably upgrade to 8.4.  However, I'll probably just
> wait for 9.0 to come out.  So it seems like citext will be about the
> same as casting both sides to LOWER(), plus putting an index on the
> lowercase version of the text.  I'd probably use that if it were out
> of the box, but I'm trying to stay away from adding too many
> dependencies..  I think I'll stick with my original approach of only
> storing lowercase data in the DB, and perhaps put a CHECK constraint
> on there to ensure no upper case letters sneak in.

If your db contains international text there are some corner cases where
lower( upper( val )) != val or upper( lower( val )) != val.  Or there should
be, because that's what happens in certain languages.

For example, upper-case 'ß' should be 'SS' in German.  Lower-case 'SS' is 'ss'.

--
Lew