Thread: Re: Matching and Scoring with multiple fields

Re: Matching and Scoring with multiple fields

From
"Tim Johnson"
Date:
I have a problem. Ok I'll rephrase that, a challenge.

I have a table like this:

a,b,c,d,e,f,g,h
---------------
2,5,3,4,4,5,2,2
1,1,1,1,1,1,1,1
5,5,5,5,5,5,5,5
3,3,2,4,5,1,1,3
1,1,5,5,5,5,1,4
1,5,5,5,4,4,2,1
5,5,5,5,1,1,1,1
1,1,1,1,5,5,5,5
(rows 8)

a to h are of type int.


I want to take input values which relate to this table say:
how do you feel about a:
how do you feel about b:
how do you feel about c:
...

and the answers will be 1 to 5.

Now I want to take those answers for my incoming a to h and scan down the
table pulling out the closest matches from best to worst. There will be
about 2000 rows in the final table and I will LIMIT the rows in blocks of 10
or so.

I can do the limiting stuff, but not the matching. My first thought was to
sum each row and match by that until I came out of my mental coma and
noticed that the last two lines have the same sum and are complete
opposites.

So, where to from here? I thought I could go through line by line selecting
with a tolerance on each value say +-1 to begin with, then again with +-2
but that will take hours and I'm not entirely sure it'll work or how I'll do
it.

I know general netequitte says that I shouldn't just dump my problem here,
but I am truly stumped by this one - if anybody can give me a pointer in the
right direction I'd greatly appreciate it.


Thanks,
Tim Johnson
---
http://www.theinkfactory.co.uk



Re: Re: Matching and Scoring with multiple fields

From
Oliver Mueschke
Date:
I'm not sure, but it seems you could calculate a column like:   SELECT a,b,c,...,
abs(<feel_a>-a)+abs(<feel_b>-b)+abs(<feel_c>-c)+...AS weight   FROM t   ORDER BY weight
 

This way the closest matches would come first.

On Mon, Jul 10, 2000 at 07:56:08PM +0100, Tim Johnson wrote:
> I have a problem. Ok I'll rephrase that, a challenge.
> 
> I have a table like this:
> 
> a,b,c,d,e,f,g,h
> ---------------
> 2,5,3,4,4,5,2,2
> 1,1,1,1,1,1,1,1
> 5,5,5,5,5,5,5,5
> 3,3,2,4,5,1,1,3
> 1,1,5,5,5,5,1,4
> 1,5,5,5,4,4,2,1
> 5,5,5,5,1,1,1,1
> 1,1,1,1,5,5,5,5
> (rows 8)
> 
> a to h are of type int.
> 
> 
> I want to take input values which relate to this table say:
> how do you feel about a:
> how do you feel about b:
> how do you feel about c:
> ...
> 
> and the answers will be 1 to 5.
> 
> Now I want to take those answers for my incoming a to h and scan down the
> table pulling out the closest matches from best to worst. There will be
> about 2000 rows in the final table and I will LIMIT the rows in blocks of 10
> or so.


Re: Re: Matching and Scoring with multiple fields

From
Ed Loehr
Date:
Tim Johnson wrote:
> 
> I have a table like this:
> 
> a,b,c,d,e,f,g,h
> ---------------
> 2,5,3,4,4,5,2,2
> 1,1,1,1,1,1,1,1
> 
> a to h are of type int.
> 
> I want to take input values which relate to this table say:
> how do you feel about a:
> how do you feel about b:
> how do you feel about c:
> ...
> 
> and the answers will be 1 to 5.
> 
> Now I want to take those answers for my incoming a to h and scan down the
> table pulling out the closest matches from best to worst. 

I wonder if you don't really just want to find the vector(s) closest in
N-space to the input vector.  You might dig up an old 3-variable calculus
book, find the formula, and write a PL/pgSQL function to compute the
distance between two N-dimensional vectors...

Regards,
Ed Loehr


RE: Re: Matching and Scoring with multiple fields

From
"Tim Johnson"
Date:
Thanks to all of you that replied. I think Oliver's idea (which is pretty
close to Stephan's) will probably do the trick I think.

I will maybe look in the future to add the ability to allow users to weight
fields with more priority. So customers could number the top five most
important fields and then pick how they feel. I still worry about the
results being skewed by extreme data in certain fields but I guess there's
no way around that.

Thanks again.

Tim Johnson,
-- http://www.theinkfactory.co.uk

-----Original Message-----
From: Oliver Mueschke [mailto:o@mueschke.de]
Sent: 10 July 2000 21:15
To: pgsql-sql@postgresql.org
Cc: tim@lincolnshirenow.com
Subject: Re: [SQL] Re: Matching and Scoring with multiple fields


I'm not sure, but it seems you could calculate a column like:   SELECT a,b,c,...,
abs(<feel_a>-a)+abs(<feel_b>-b)+abs(<feel_c>-c)+...AS weight   FROM t   ORDER BY weight
 

This way the closest matches would come first.

On Mon, Jul 10, 2000 at 07:56:08PM +0100, Tim Johnson wrote:
> I have a problem. Ok I'll rephrase that, a challenge.
>
> I have a table like this:
>
> a,b,c,d,e,f,g,h
> ---------------
> 2,5,3,4,4,5,2,2
> 1,1,1,1,1,1,1,1
> 5,5,5,5,5,5,5,5
> 3,3,2,4,5,1,1,3
> 1,1,5,5,5,5,1,4
> 1,5,5,5,4,4,2,1
> 5,5,5,5,1,1,1,1
> 1,1,1,1,5,5,5,5
> (rows 8)
>
> a to h are of type int.
>
>
> I want to take input values which relate to this table say:
> how do you feel about a:
> how do you feel about b:
> how do you feel about c:
> ...
>
> and the answers will be 1 to 5.
>
> Now I want to take those answers for my incoming a to h and scan down the
> table pulling out the closest matches from best to worst. There will be
> about 2000 rows in the final table and I will LIMIT the rows in blocks of
10
> or so.