Thread: Best method to compare subdomains
Is there a recommended, high performance method to check for subdomains? Something like: - www.google.com is subdomain of google.com - ilikegoogle.com is not subdomain of google.com There are many ways to do this (lowercase and reverse the string, append a '.' if not there, append a '%', and do a LIKE). But I'm looking for one that will perform well when the master domain list is an indexed field in a table, and when the possible subdomain is either an individual value, or a field in a table for a join (potentially indexed).
On Wed, Jan 16, 2013 at 03:23:30PM -0500, Robert James wrote: > Is there a recommended, high performance method to check for subdomains? > > Something like: > - www.google.com is subdomain of google.com > - ilikegoogle.com is not subdomain of google.com > > There are many ways to do this (lowercase and reverse the string, > append a '.' if not there, append a '%', and do a LIKE). But I'm > looking for one that will perform well when the master domain list is > an indexed field in a table, and when the possible subdomain is either > an individual value, or a field in a table for a join (potentially > indexed). Well, the _best_ thing to do would be to convert all the labels to wire format and compare those, because that way you know you're matching label by label the way the DNS does. That sounds like a lot of work, however, and you probably need to do it in C. You could find all the label boundaries (in the presentation format, the dots) and then split out the labels. I suppose you could put them into an array and then count backwards in the array to compare the different labels. Reversing the string might not actually work, because it's possible that the labels are just octets and unless you're careful about your locale you could end up messing that reverse operation up -- oughta be safe in "C", though. (Contrary to popular opinion, domain name labels are not necessarily made of ASCII.) You can, of course, also force the labels to be only LDH-labels. Best, A -- Andrew Sullivan ajs@crankycanuck.ca
On Jan 16, 2013, at 12:23 PM, Robert James <srobertjames@gmail.com> wrote: > Is there a recommended, high performance method to check for subdomains? > > Something like: > - www.google.com is subdomain of google.com > - ilikegoogle.com is not subdomain of google.com > > There are many ways to do this (lowercase and reverse the string, > append a '.' if not there, append a '%', and do a LIKE). But I'm > looking for one that will perform well when the master domain list is > an indexed field in a table, and when the possible subdomain is either > an individual value, or a field in a table for a join (potentially > indexed). If you've already dealt with any punycode encoding then the lowercased, reversed string works pretty well, either as a (probably trigger-managed) field or as a functional index. If you need to get fancier, you might want to take a look at the approach https://github.com/dimitri/prefix takes. Cheers, Steve
On 16 January 2013 20:23, Robert James <srobertjames@gmail.com> wrote:
Is there a recommended, high performance method to check for subdomains?
Something like:
- www.google.com is subdomain of google.com
- ilikegoogle.com is not subdomain of google.com
There are many ways to do this (lowercase and reverse the string,
append a '.' if not there, append a '%', and do a LIKE). But I'm
looking for one that will perform well when the master domain list is
an indexed field in a table, and when the possible subdomain is either
an individual value, or a field in a table for a join (potentially
indexed).
Maybe the ltree extension will suit what you are trying to do? http://www.postgresql.org/docs/9.2/static/ltree.html
On 1/17/2013 1:12 AM, Sumit Raja wrote:
On 16 January 2013 20:23, Robert James <srobertjames@gmail.com> wrote:Is there a recommended, high performance method to check for subdomains?
Something like:
- www.google.com is subdomain of google.com
- ilikegoogle.com is not subdomain of google.com
...
Maybe the ltree extension will suit what you are trying to do? http://www.postgresql.org/docs/9.2/static/ltree.html
that would work great, if you reversed the word order ... so (one of my domains), www.astronomy.santa-cruz.ca.us. becomes .us.ca.santa-cruz.astronomy.www
I half expected to a see a function that would reverse the order of a ltree's elements, ah well.
On 2013-01-16, Robert James <srobertjames@gmail.com> wrote: > Is there a recommended, high performance method to check for subdomains? > > Something like: > - www.google.com is subdomain of google.com > - ilikegoogle.com is not subdomain of google.com > > There are many ways to do this (lowercase and reverse the string, > append a '.' if not there, append a '%', and do a LIKE). But I'm > looking for one that will perform well when the master domain list is > an indexed field in a table, and when the possible subdomain is either > an individual value, or a field in a table for a join (potentially > indexed). write a function that does the string mangling you describe above and then index on the result. then you can use like and the index will help. -- ⚂⚃ 100% natural