Thread: extracting the domain from an email address
Good morning,
--
I am searching for a better more efficient way to extract the domain portion of an email address. The two I have been using are very expensive.
The one extract I mostly use: substring(email from '@(.*)$')
also has an index on the email column which is the full email address:
btree (email DESC) WHERE email::text = "substring"(email::text, '@(.*)$'::text) AND length(email::text) > 0,
The other extract:
substr(e.email,(strpos(e.email, '@') + 1))
Currently has no index.
The referenced table has 72 million rows. The email column can be empty (ergo the length = 0 check. I am running postgresql 9.4.
Any insights/observations welcome.
Mark Steben
Database Administrator
@utoRevenue | Autobase
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567
Database Administrator
@utoRevenue | Autobase
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567
www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com
On 16/1/19 3:20 μ.μ., Mark Steben wrote:
Why don't you create an index on the extracted part including the where clause?Good morning,I am searching for a better more efficient way to extract the domain portion of an email address. The two I have been using are very expensive.The one extract I mostly use: substring(email from '@(.*)$')also has an index on the email column which is the full email address:btree (email DESC) WHERE email::text = "substring"(email::text, '@(.*)$'::text) AND length(email::text) > 0,The other extract:substr(e.email,(strpos(e.email, '@') + 1))Currently has no index.
create index your_table_email_domain ON your_table(substring(email from '@(.*)$')) WHERE length(email::text) > 0 ;
The referenced table has 72 million rows. The email column can be empty (ergo the length = 0 check. I am running postgresql 9.4.Any insights/observations welcome.--Mark Steben
Database Administrator
@utoRevenue | Autobase
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
On Jan 16, 2019, at 8:20 AM, Mark Steben <mark.steben@drivedominion.com> wrote:Good morning,I am searching for a better more efficient way to extract the domain portion of an email address. The two I have been using are very expensive.The one extract I mostly use: substring(email from '@(.*)$')also has an index on the email column which is the full email address:btree (email DESC) WHERE email::text = "substring"(email::text, '@(.*)$'::text) AND length(email::text) > 0,The other extract:substr(e.email,(strpos(e.email, '@') + 1))Currently has no index.The referenced table has 72 million rows. The email column can be empty (ergo the length = 0 check. I am running postgresql 9.4.Any insights/observations welcome.
Not sure of the exact problem you are trying to solve; is it just querying the table via domain name? Does the index that you created work?
A few things; the regex given does not get domain name for some valid emails addresses;
i.e. foo@bar@nowhere.com is a valid email; however, the regex expression would return bar@nowhere.com instead of nowhere.com.
create or replace function get_domainname(_value text)
returns text
as $$
begin
_value := reverse(_value);
return nullif(reverse(substring(_value, 0, strpos(_value, '@'))), '');
end;
$$ language plpgsql
immutable returns null on null input
;
create index table_name_idx1 on table_name (get_domainname(email));
Then in the where clause:
where get_domainname(email) = 'nowhere.com'
p.s.
If you want to validate that the email address conforms to the RFC 2822 you can use this function which makes use of Perl’s Email::Address module. You would need to install plperl and Email::Address module.
create or replace function is_valid_email_address(eaddr text)
returns boolean
as
$body$
use Email::Address;
return Email::Address->parse($_[0]);
$body$
language 'plperlu' immutable
;