Thread: email data type first release

email data type first release

From
Gaetano Mendola
Date:
I already post this message using pgsql-hackers@postgresql.org
with attached a binary file but I guess the newsgroup not
accept binary file.

The following was almost the message, now the file is on pg_foundry (
now it's working ).

=====================================================================
Hi all, this is the alpha version for the new email data type,
is not written as built in type but as plug in, Tome Lane and others
drove me in this direction.

The type is indexable and provide also conversion methods:
text <--> email

It also defined the operator >>, is possible use it in select like:

select * from my_user where email >> 'hotmail.com';

this select will extract all records with an email inside
the domain 'hotmail.com'.

The validation routine is very simple, I focused my attention to made all
this working, right now test only the presence of character '@', it's an
alpha version after all), I'd like to put it on pgfoundry but apparently
my DNS are unable to resolve www.pgfoundry.org. I'll put this version as
soon the address become available.

This is the first time that I wrote code for postgresql so please, if
you can, do a sort of code revision on it.

Comments are welcomed.
=====================================================================



Regards
Gaetano Mendola.


Re: email data type first release

From
Greg Stark
Date:
Gaetano Mendola <mendola@bigfoot.com> writes:

> Comments are welcomed.

Well as long as you're asking...

Email domains are case insensitive, but the left hand side is case sensitive.
That's the only part that's hard to handle using a text data type, it would be
kind of neat if the email operators got it right.

Another thing is that it might make more sense to sort email addresses by
domain first (case insensitively of course), then by left hand side (case
sensitively). Since the domain is really the "most significant bit". This is
also convenient for many systems like email since they perform better when
they can handle data in that order.

Note that this would make the type sort differently from its text
representation. This shouldn't really be a problem but occasionally you see
poorly written queries that introduce extra type conversions that the user
doesn't expect. But then if it behaves just like the text datatype then there
wouldn't be much point in using it.


-- 
greg



Re: email data type first release

From
Tommi Maekitalo
Date:
...
>
> Another thing is that it might make more sense to sort email addresses by
> domain first (case insensitively of course), then by left hand side (case
> sensitively). Since the domain is really the "most significant bit". This
> is also convenient for many systems like email since they perform better
> when they can handle data in that order.
>
> Note that this would make the type sort differently from its text
> representation. This shouldn't really be a problem but occasionally you see
> poorly written queries that introduce extra type conversions that the user
> doesn't expect. But then if it behaves just like the text datatype then
> there wouldn't be much point in using it.

Sorting should then be done by top-level-domain first. Then 2nd, 3rd... and 
last by user.

b@a.domain.com < a@b.domain.com
and
b@b.a.com < b@a.b.com

we get then the order:
a@a.a.com < a@b.a.com < a@a.b.com < a@b.b.com

rather than (in normal text-order):
a@a.a.com < a@a.b.com < a@b.a.com < a@a.b.com

Tommi


Re: email data type first release

From
Gaetano Mendola
Date:
Greg Stark wrote:

> Gaetano Mendola <mendola@bigfoot.com> writes:
> 
> 
>>Comments are welcomed.
> 
> 
> Well as long as you're asking...
> 
> Email domains are case insensitive, but the left hand side is case sensitive.
> That's the only part that's hard to handle using a text data type, it would be
> kind of neat if the email operators got it right.
> 
> Another thing is that it might make more sense to sort email addresses by
> domain first (case insensitively of course), then by left hand side (case
> sensitively). Since the domain is really the "most significant bit". This is
> also convenient for many systems like email since they perform better when
> they can handle data in that order.
> 
> Note that this would make the type sort differently from its text
> representation. This shouldn't really be a problem but occasionally you see
> poorly written queries that introduce extra type conversions that the user
> doesn't expect. But then if it behaves just like the text datatype then there
> wouldn't be much point in using it.

That's true, I will order as Tommi Maekitalo suggest.



Regards
Gaetano Mendola




Re: email data type first release

From
Bruno Wolff III
Date:
On Mon, May 17, 2004 at 17:11:42 +0200, Gaetano Mendola <mendola@bigfoot.com> wrote:
> 
> That's true, I will order as Tommi Maekitalo suggest.

And how do domain literals fit into this?

bruno@[66.93.249.74] is a valid email address for me. (At least as
long as my server is at that IP address.)


Re: email data type first release

From
Greg Stark
Date:
Tommi Maekitalo <t.maekitalo@epgmbh.de> writes:

> Sorting should then be done by top-level-domain first. Then 2nd, 3rd... and 
> last by user.

I thought of that but decided not to suggest it:

a) as far as email goes there's no relationship between xxx@foo.com and  xxx@bar.com. The ".com" doesn't mean the
emailsare any more related than  xxx@foo.com and xxx@foo.org are. In fact in practice the latter two are  more likely
tobe related.
 

b) it's a lot of extra work, whereas sorting by domain first is just as easy  as sorting by lhs first.


-- 
greg