On Wed, Jan 16, 2013 at 03:23:30PM -0500, Robert James wrote:
> Is there a recommended, high performance method to check for subdomains?
>
> Something like:
> - www.google.com is subdomain of google.com
> - ilikegoogle.com is not subdomain of google.com
>
> There are many ways to do this (lowercase and reverse the string,
> append a '.' if not there, append a '%', and do a LIKE). But I'm
> looking for one that will perform well when the master domain list is
> an indexed field in a table, and when the possible subdomain is either
> an individual value, or a field in a table for a join (potentially
> indexed).

Well, the _best_ thing to do would be to convert both names to wire
format and compare those label by label, because that way you know
you're matching the way the DNS does (label boundaries respected, ASCII
letters compared case-insensitively). That sounds like a lot of work,
however, and you'd probably need to do it in C.
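
As a rough sketch of the wire-format idea (in Python rather than C,
purely for illustration): each label becomes a length octet followed by
the label's octets, and the name ends with a zero-length root label. The
function names are made up for this example, and it assumes ASCII-only
input with no backslash escapes in the presentation format. It also
treats "subdomain" as strict, so a name is not a subdomain of itself.

```python
def to_wire(name: str) -> bytes:
    """Encode a presentation-format name as uncompressed DNS wire format.

    Assumption: ASCII-only labels and no backslash escapes.
    """
    if name.endswith("."):
        name = name[:-1]          # drop the trailing root dot, if any
    if name == "":
        return b"\x00"            # the root name is just the zero octet
    out = bytearray()
    for label in name.split("."):
        raw = label.encode("ascii").lower()   # fold ASCII case only
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"bad label length: {label!r}")
        out.append(len(raw))      # length octet
        out += raw                # label octets
    out.append(0)                 # root label terminates the name
    if len(out) > 255:
        raise ValueError("name too long")
    return bytes(out)


def wire_labels(wire: bytes) -> list[bytes]:
    """Split an uncompressed wire-format name back into its labels."""
    labels, i = [], 0
    while wire[i] != 0:
        n = wire[i]
        labels.append(wire[i + 1 : i + 1 + n])
        i += 1 + n
    return labels


def is_subdomain_wire(child: bytes, parent: bytes) -> bool:
    """True if child sits strictly below parent, compared label by label."""
    c, p = wire_labels(child), wire_labels(parent)
    return len(c) > len(p) and c[len(c) - len(p):] == p
```

Comparing parsed labels rather than doing a raw byte-suffix match keeps
the check aligned on label boundaries, since a length octet could
otherwise be mistaken for label content.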

Alternatively, you could find all the label boundaries (in the
presentation format, the dots) and split out the labels. I suppose you
could put them into an array and then work backwards from the last
element, comparing the labels one at a time.
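
The split-and-count-backwards approach might look something like the
following. This is only a sketch: the function name is invented here, it
assumes ASCII names without escaped dots, and it treats a name as not
being a subdomain of itself.

```python
def is_subdomain_labels(candidate: str, domain: str) -> bool:
    """Compare two names label by label, starting from the rightmost label."""
    # Assumption: ASCII-only names, so str.lower() matches DNS case folding.
    cand = candidate.lower().rstrip(".").split(".")
    dom = domain.lower().rstrip(".").split(".")

    # A strict subdomain needs at least one extra label on the left.
    if len(cand) <= len(dom):
        return False

    # Walk backwards from the last label (the TLD) towards the left.
    for i in range(1, len(dom) + 1):
        if cand[-i] != dom[-i]:
            return False
    return True


print(is_subdomain_labels("www.google.com", "google.com"))
print(is_subdomain_labels("ilikegoogle.com", "google.com"))
```

Because whole labels are compared, "ilikegoogle" never matches "google",
which is exactly the case a plain string-suffix test gets wrong.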

Reversing the string might not actually work. Labels can be arbitrary
octets, so unless you're careful about your locale you could end up
messing that reverse operation up. It oughta be safe in the "C" locale,
though. (Contrary to popular opinion, domain name labels are not
necessarily made of ASCII.) You can, of course, also force the labels
to be LDH-labels only (letters, digits and hyphens).
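
If you do restrict yourself to LDH-labels, the reverse-and-prefix trick
from the original question becomes safe. A minimal sketch, with
hypothetical function names and assuming names arrive as plain strings
(internationalized names would have to be in their "xn--" form already):

```python
import re

# One LDH label: letters, digits, hyphens; 1 to 63 characters;
# no hyphen at either end.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_name(name: str) -> bool:
    """True if every label of the name is an LDH label."""
    labels = name.rstrip(".").split(".")
    return all(_LDH_LABEL.fullmatch(label) for label in labels)


def reversed_key(name: str) -> str:
    """Lowercase and reverse an LDH-only name, e.g. for an indexed column."""
    if not is_ldh_name(name):
        raise ValueError(f"not an LDH name: {name!r}")
    return name.rstrip(".").lower()[::-1]


def is_subdomain_reversed(candidate: str, domain: str) -> bool:
    """Prefix test on reversed names; the added '.' pins the label boundary."""
    return reversed_key(candidate).startswith(reversed_key(domain) + ".")
```

The prefix test here plays the role of the LIKE with a trailing '%' on
the reversed, dot-terminated value. Rejecting non-LDH input up front
means the reversal only ever sees single-byte ASCII characters.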

Best,
A
--
Andrew Sullivan
ajs@crankycanuck.ca