Home > mailing lists

Re: extracting location info from string - Mailing list pgsql-sql

From	Lew
Subject	Re: extracting location info from string
Date	May 25, 2011 09:09:12
Msg-id	irgo1v$dbv$1@news.albasani.net Whole thread Raw
In response to	Re: extracting location info from string (Tarlika Elisabeth Schmitz <postgresql3@numerixtechnology.de>)
Responses	Re: extracting location info from string (Rob Sargent <robjsargent@gmail.com>) Re: extracting location info from string (Tarlika Elisabeth Schmitz <postgresql3@numerixtechnology.de>)
List	pgsql-sql

Tree view

Tarlika Elisabeth Schmitz wrote:
> Lew wrote:
>> That isn't a table structure, that's a freeform text structure.  You
>> didn't state your question, Tarlika, but your database structure is
>> terrible.  For example, "region" and "country" should be different
>> columns.

> I presume you are referring to my original post:
> CREATE TABLE person
> (
>    id integer NOT NULL,
>    "name" character varying(256) NOT NULL,
>    "location" character varying(256),
>    CONSTRAINT person_pkey PRIMARY KEY (id)
> );
>
> Sorry, this was just a TEMPORARY table I created for quick analysis of
> my CSV data (now renamed to temp_person).
>
>
>
> The target table is:
> CREATE TABLE person
> (
>    id integer NOT NULL,
>    "name" character varying(100) NOT NULL,
>    country character varying(3),
>    county character varying(3),
>    town character varying(50),
>    CONSTRAINT trainer_pkey PRIMARY KEY (id),
>    CONSTRAINT country_person_fk FOREIGN KEY (country)
>        REFERENCES country (id) MATCH SIMPLE
>        ON UPDATE CASCADE ON DELETE RESTRICT,
>    CONSTRAINT county_person_fk FOREIGN KEY (country, county)
>        REFERENCES county (country, code) MATCH SIMPLE
>        ON UPDATE NO ACTION ON DELETE NO ACTION,
> );

Ah, yes, that makes much more sense.  Temporary tables such as you describe 
can be very convenient and effective.  Thanks for the clarification.

I think this problem is very widespread, namely how to get structured 
information out of freeform data.  I've encountered it many times over the 
years, as have so many I know.  I believe that human intervention will always 
be needed for this type of work, e.g., distinguishing place names that seem 
the same or correlating ones that seem distinct.  I also don't know of any 
perfect approach.  Perhaps the best one can find is a probabilistic promise 
that error will be less than some epsilon.

That said, if you have a robust process to correct errors as the user 
population discovers them, then you can approach perfection asymptotically. 
Sometimes the best solution to a technical problem is a good human process. From an engineering standpoint, user
feedbackis a vital element of 

homeostatic control.

Edward W. Rouse's suggestion of a reference table to resolve different forms 
of address or region identification would fit well with such a process.

-- 
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg

pgsql-sql by date:

From: Craig Ringer
Date: 25 May 2011, 02:07:33
Subject: Re: Problems Pgdump

From: Robert Haas
Date: 25 May 2011, 09:09:15
Subject: Re: Performance of NOT IN and <> with PG 9.0.4

Re: extracting location info from string - Mailing list pgsql-sql

Previous

Next