Thread: email data type first release
I already post this message using pgsql-hackers@postgresql.org with attached a binary file but I guess the newsgroup not accept binary file. The following was almost the message, now the file is on pg_foundry ( now it's working ). ===================================================================== Hi all, this is the alpha version for the new email data type, is not written as built in type but as plug in, Tome Lane and others drove me in this direction. The type is indexable and provide also conversion methods: text <--> email It also defined the operator >>, is possible use it in select like: select * from my_user where email >> 'hotmail.com'; this select will extract all records with an email inside the domain 'hotmail.com'. The validation routine is very simple, I focused my attention to made all this working, right now test only the presence of character '@', it's an alpha version after all), I'd like to put it on pgfoundry but apparently my DNS are unable to resolve www.pgfoundry.org. I'll put this version as soon the address become available. This is the first time that I wrote code for postgresql so please, if you can, do a sort of code revision on it. Comments are welcomed. ===================================================================== Regards Gaetano Mendola.
Gaetano Mendola <mendola@bigfoot.com> writes: > Comments are welcomed. Well as long as you're asking... Email domains are case insensitive, but the left hand side is case sensitive. That's the only part that's hard to handle using a text data type, it would be kind of neat if the email operators got it right. Another thing is that it might make more sense to sort email addresses by domain first (case insensitively of course), then by left hand side (case sensitively). Since the domain is really the "most significant bit". This is also convenient for many systems like email since they perform better when they can handle data in that order. Note that this would make the type sort differently from its text representation. This shouldn't really be a problem but occasionally you see poorly written queries that introduce extra type conversions that the user doesn't expect. But then if it behaves just like the text datatype then there wouldn't be much point in using it. -- greg
... > > Another thing is that it might make more sense to sort email addresses by > domain first (case insensitively of course), then by left hand side (case > sensitively). Since the domain is really the "most significant bit". This > is also convenient for many systems like email since they perform better > when they can handle data in that order. > > Note that this would make the type sort differently from its text > representation. This shouldn't really be a problem but occasionally you see > poorly written queries that introduce extra type conversions that the user > doesn't expect. But then if it behaves just like the text datatype then > there wouldn't be much point in using it. Sorting should then be done by top-level-domain first. Then 2nd, 3rd... and last by user. b@a.domain.com < a@b.domain.com and b@b.a.com < b@a.b.com we get then the order: a@a.a.com < a@b.a.com < a@a.b.com < a@b.b.com rather than (in normal text-order): a@a.a.com < a@a.b.com < a@b.a.com < a@a.b.com Tommi
Greg Stark wrote: > Gaetano Mendola <mendola@bigfoot.com> writes: > > >>Comments are welcomed. > > > Well as long as you're asking... > > Email domains are case insensitive, but the left hand side is case sensitive. > That's the only part that's hard to handle using a text data type, it would be > kind of neat if the email operators got it right. > > Another thing is that it might make more sense to sort email addresses by > domain first (case insensitively of course), then by left hand side (case > sensitively). Since the domain is really the "most significant bit". This is > also convenient for many systems like email since they perform better when > they can handle data in that order. > > Note that this would make the type sort differently from its text > representation. This shouldn't really be a problem but occasionally you see > poorly written queries that introduce extra type conversions that the user > doesn't expect. But then if it behaves just like the text datatype then there > wouldn't be much point in using it. That's true, I will order as Tommi Maekitalo suggest. Regards Gaetano Mendola
On Mon, May 17, 2004 at 17:11:42 +0200, Gaetano Mendola <mendola@bigfoot.com> wrote: > > That's true, I will order as Tommi Maekitalo suggest. And how do domain literals fit into this? bruno@[66.93.249.74] is a valid email address for me. (At least as long as my server is at that IP address.)
Tommi Maekitalo <t.maekitalo@epgmbh.de> writes: > Sorting should then be done by top-level-domain first. Then 2nd, 3rd... and > last by user. I thought of that but decided not to suggest it: a) as far as email goes there's no relationship between xxx@foo.com and xxx@bar.com. The ".com" doesn't mean the emailsare any more related than xxx@foo.com and xxx@foo.org are. In fact in practice the latter two are more likely tobe related. b) it's a lot of extra work, whereas sorting by domain first is just as easy as sorting by lhs first. -- greg