Re: General guidance if there is an in dadabase solution or should stay as excel vba solution. - Mailing list pgsql-general

From David Johnston
Subject Re: General guidance if there is an in dadabase solution or should stay as excel vba solution.
Date
Msg-id 007501cc76d6$d6e98990$84bc9cb0$@yahoo.com
Whole thread Raw
In response to General guidance if there is an in dadabase solution or should stay as excel vba solution.  (Henry Drexler <alonup8tb@gmail.com>)
Responses Re: General guidance if there is an in dadabase solution or should stay as excel vba solution.
List pgsql-general

Look at this module for the actual comparison algorithms (found in Appendix F)

 

“fuzzystrmatch”

 

Performance would be my only concern but you have that issue either way.  With “plpgsql” you can do most things in the database you could do in VBA.  Whether you want to bog the DB down with a processor intensive process like this is another question to consider.

 

I am hoping you are putting in limits such as requiring that the first character (or even first partial word) are equal before even checking for an off-by-one error.  With the “Levenshtein” algorithm you’d be looking for a value of “1” to match your current behavior.

 

In short, what you are doing (given your specification below) in VBA is also doable in PostgreSQL.

 

David J.

 

 

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Henry Drexler
Sent: Monday, September 19, 2011 9:10 AM
To: pgsql-general
Subject: [GENERAL] General guidance if there is an in dadabase solution or should stay as excel vba solution.

 

I have no problem doing this in excel vba, though as the list grows larger obviously excel has row limits.

 

 

What is being done:

            There is a column of data imported into the db - they are just text strings, there are about 80,000 rows of them.  The goal is to do a single character elimination to find matches.

 

so for instance the data is a bunch of rows of this:

 

hello there

what is your name

happy birthday

we are winner

we are winners

we like the sky

task to do

tasks to do

 

so for the above in excel I created a macro that will remove one character and compare and do this for each character of each text string.

 

The final product:

 

hello there

what is your name

happy birthday

we are winner              we are winners

we are winners                        we are winner

we like the sky

task to do                            tasks to do

tasks to do                   task to do

 

 

so you can see that it found the matches with being one character off.

 

 

Is this something best done outside of the db and in excel as I am doing or is it possible to do it in db?

 

Note I am not looking for someone to give a whole solution - just if they know it can be done let me know the direction so I can research it and figure it out.

 

Any advice is welcome.

 

pgsql-general by date:

Previous
From: Henry Drexler
Date:
Subject: General guidance if there is an in dadabase solution or should stay as excel vba solution.
Next
From: Tatsuo Ishii
Date:
Subject: Re: postgis and pgpool