Home > mailing lists

Re: Extracting hostname from URI column - Mailing list pgsql-sql

From	John Summerfield
Subject	Re: Extracting hostname from URI column
Date	September 19, 2007 02:32:14
Msg-id	46F0B452.1050104@herakles.homelinux.org Whole thread Raw
In response to	Re: Extracting hostname from URI column (Paul Lambert <paul.lambert@autoledgers.com.au>)
List	pgsql-sql

Tree view

Paul Lambert wrote:
> Paul Lambert wrote:
>> chester c young wrote:
>>>> I'm trying to use substr() and position() functions to extract the
>>>> full host name (and later a domain) from a column that holds URLs.
>>>
>>> substring( href from '.*://\([^/]*)' );
>>>
>>
>> Ok, your solution looks better than mine... but I have no idea how to 
>> interpret that, time to consult some manuals.
>>
> 
> OK - following on from this, I'm doing a similar thing to the OP to 
> analyze my proxy's traffic (never occured to me to do it in a db until 
> that post)
> 
> I've done the above regex to pull out the domain part of the URL and am 
> left with results such as:
> "acvs.mediaonenetwork.net"
> "profile.ak.facebook.com"
> "www.bankwest.com.au"
> 
> What I want to do next is pull out the major domain part of the URL
> I.e. for the above three records I should end up with
> "mediaonenetwork.net"
> "facebook.com"
> "bankwest.com.au"
> 
> What would be the best way to do something like that? I assume it won't 
> be a simple regex like the above due to the country codes on the end of 
> some domains. My thought is look at the last portion of the domain, if 
> it's 2 characters long then assume it's a country code and grab the last 
> three sections, if it's not three characters long then assume it's an 
> international domain and grab the last two... but that sounds a bit dodgy.

and csiro.au?
There may be a few others in .au too, legacies from before we went 
commercial.

pgsql-sql by date:

From: Tom Lane
Date: 18 September 2007, 12:13:28
Subject: Re: error dropping operator

From: Nis Jørgensen
Date: 19 September 2007, 05:13:54
Subject: Re: Table transform query

Re: Extracting hostname from URI column - Mailing list pgsql-sql

Previous

Next