Re: extracting location info from string - Mailing list pgsql-sql

From Andrej
Subject Re: extracting location info from string
Date
Msg-id BANLkTimBKGkhdzYCZqBR=pxa9bNhUBXWLg@mail.gmail.com
Whole thread Raw
In response to Re: extracting location info from string  (Tarlika Elisabeth Schmitz <postgresql3@numerixtechnology.de>)
Responses Re: extracting location info from string  (Tarlika Elisabeth Schmitz <postgresql3@numerixtechnology.de>)
List pgsql-sql
On 26 May 2011 09:13, Tarlika Elisabeth Schmitz
<postgresql3@numerixtechnology.de> wrote:
> On Wed, 25 May 2011 09:25:48 -0600
> Rob Sargent <robjsargent@gmail.com> wrote:
>
>>
>>
>>On 05/24/2011 10:57 AM, Lew wrote:
>>> Tarlika Elisabeth Schmitz wrote:
>>>
>>>> CREATE TABLE person
>>>> (
>>>> id integer NOT NULL,
>>>> "name" character varying(256) NOT NULL,
>>>> "location" character varying(256),
>>>> CONSTRAINT person_pkey PRIMARY KEY (id)
>>>> );
>>>>
>>>> this was just a TEMPORARY table I created for quick analysis
>>>> of my CSV data (now renamed to temp_person).
>
> CREATE TABLE country
> (
>  id character varying(3) NOT NULL, -- alpha-3 code
>  "name" character varying(50) NOT NULL,
>  CONSTRAINT country_pkey PRIMARY KEY (id)
> );
>
>
>>To minimize the ultimately quite necessary human adjudication, one
>>might make good use of what is often termed "crowd sourcing":  Keep
>>all the distinct "hand entered" values and a map to the final human
>>assessment.
>
> I was wondering how to do just that. I don't think it would be a good
> idea to hard code this into the clean-up script. Take, for instance,
> variations of COUNTRY.NAME spelling. Where would I store these?
>
> I could do with a concept for this problem, which applies to a lot of
> string-type info.

I'd start w/ downloading a list as mentioned here:
http://answers.google.com/answers/threadview?id=596822

And run it through a wee perl script using
http://search.cpan.org/~maurice/Text-DoubleMetaphone-0.07/DoubleMetaphone.pm
to make phonetic matches ...

Then I'd run your own data through DoubleMetaphone, and clean up
matches if not too many false positives show up.



Cheers,
Andrej


pgsql-sql by date:

Previous
From: Charlie
Date:
Subject: Re: [SQL] extracting location info from string
Next
From: Peter Koczan
Date:
Subject: Re: enum data type vs table