Thread: Best method to compare subdomains

Best method to compare subdomains

From
Robert James
Date:
Is there a recommended, high performance method to check for subdomains?

Something like:
- www.google.com is subdomain of google.com
- ilikegoogle.com is not subdomain of google.com

There are many ways to do this (lowercase and reverse the string,
append a '.' if not there, append a '%', and do a LIKE).  But I'm
looking for one that will perform well when the master domain list is
an indexed field in a table, and when the possible subdomain is either
an individual value, or a field in a table for a join (potentially
indexed).


Re: Best method to compare subdomains

From
Andrew Sullivan
Date:
On Wed, Jan 16, 2013 at 03:23:30PM -0500, Robert James wrote:
> Is there a recommended, high performance method to check for subdomains?
>
> Something like:
> - www.google.com is subdomain of google.com
> - ilikegoogle.com is not subdomain of google.com
>
> There are many ways to do this (lowercase and reverse the string,
> append a '.' if not there, append a '%', and do a LIKE).  But I'm
> looking for one that will perform well when the master domain list is
> an indexed field in a table, and when the possible subdomain is either
> an individual value, or a field in a table for a join (potentially
> indexed).

Well, the _best_ thing to do would be to convert all the labels to
wire format and compare those, because that way you know you're
matching label by label the way the DNS does.  That sounds like a lot
of work, however, and you probably need to do it in C.

You could find all the label boundaries (in the presentation format,
the dots) and then split out the labels.  I suppose you could put them
into an array and then count backwards in the array to compare the
different labels.

Reversing the string might not actually work, because it's possible that
the labels are just octets and unless you're careful about your locale
you could end up messing that reverse operation up -- oughta be safe in
"C", though.  (Contrary to popular opinion, domain name labels are
not necessarily made of ASCII.)  You can, of course, also force the
labels to be only LDH-labels.

Best,

A

--
Andrew Sullivan
ajs@crankycanuck.ca


Re: Best method to compare subdomains

From
Steve Atkins
Date:
On Jan 16, 2013, at 12:23 PM, Robert James <srobertjames@gmail.com> wrote:

> Is there a recommended, high performance method to check for subdomains?
>
> Something like:
> - www.google.com is subdomain of google.com
> - ilikegoogle.com is not subdomain of google.com
>
> There are many ways to do this (lowercase and reverse the string,
> append a '.' if not there, append a '%', and do a LIKE).  But I'm
> looking for one that will perform well when the master domain list is
> an indexed field in a table, and when the possible subdomain is either
> an individual value, or a field in a table for a join (potentially
> indexed).

If you've already dealt with any punycode encoding then the lowercased,
reversed string works pretty well, either as a (probably trigger-managed)
field or as a functional index.

If you need to get fancier, you might want to take a look at the approach
https://github.com/dimitri/prefix takes.

Cheers,
  Steve


Re: Best method to compare subdomains

From
Sumit Raja
Date:

On 16 January 2013 20:23, Robert James <srobertjames@gmail.com> wrote:
Is there a recommended, high performance method to check for subdomains?

Something like:
- www.google.com is subdomain of google.com
- ilikegoogle.com is not subdomain of google.com

There are many ways to do this (lowercase and reverse the string,
append a '.' if not there, append a '%', and do a LIKE).  But I'm
looking for one that will perform well when the master domain list is
an indexed field in a table, and when the possible subdomain is either
an individual value, or a field in a table for a join (potentially
indexed).

Maybe the ltree extension will suit what you are trying to do? http://www.postgresql.org/docs/9.2/static/ltree.html

Re: Best method to compare subdomains

From
John R Pierce
Date:
On 1/17/2013 1:12 AM, Sumit Raja wrote:
On 16 January 2013 20:23, Robert James <srobertjames@gmail.com> wrote:
Is there a recommended, high performance method to check for subdomains?

Something like:
- www.google.com is subdomain of google.com
- ilikegoogle.com is not subdomain of google.com
...
Maybe the ltree extension will suit what you are trying to do? http://www.postgresql.org/docs/9.2/static/ltree.html

that would work great, if you reversed the word order ...   so (one of my domains),  www.astronomy.santa-cruz.ca.us. becomes .us.ca.santa-cruz.astronomy.www

I half expected to a see a function that would reverse the order of a ltree's elements, ah well.


Re: Best method to compare subdomains

From
Jasen Betts
Date:
On 2013-01-16, Robert James <srobertjames@gmail.com> wrote:
> Is there a recommended, high performance method to check for subdomains?
>
> Something like:
> - www.google.com is subdomain of google.com
> - ilikegoogle.com is not subdomain of google.com
>
> There are many ways to do this (lowercase and reverse the string,
> append a '.' if not there, append a '%', and do a LIKE).  But I'm
> looking for one that will perform well when the master domain list is
> an indexed field in a table, and when the possible subdomain is either
> an individual value, or a field in a table for a join (potentially
> indexed).

write a function that does the string mangling you describe above and
then index on the result.

then you can use like and the index will help.


--
⚂⚃ 100% natural