Thread: Re: Matching and Scoring with multiple fields
I have a problem. Ok I'll rephrase that, a challenge. I have a table like this: a,b,c,d,e,f,g,h --------------- 2,5,3,4,4,5,2,2 1,1,1,1,1,1,1,1 5,5,5,5,5,5,5,5 3,3,2,4,5,1,1,3 1,1,5,5,5,5,1,4 1,5,5,5,4,4,2,1 5,5,5,5,1,1,1,1 1,1,1,1,5,5,5,5 (rows 8) a to h are of type int. I want to take input values which relate to this table say: how do you feel about a: how do you feel about b: how do you feel about c: ... and the answers will be 1 to 5. Now I want to take those answers for my incoming a to h and scan down the table pulling out the closest matches from best to worst. There will be about 2000 rows in the final table and I will LIMIT the rows in blocks of 10 or so. I can do the limiting stuff, but not the matching. My first thought was to sum each row and match by that until I came out of my mental coma and noticed that the last two lines have the same sum and are complete opposites. So, where to from here? I thought I could go through line by line selecting with a tolerance on each value say +-1 to begin with, then again with +-2 but that will take hours and I'm not entirely sure it'll work or how I'll do it. I know general netequitte says that I shouldn't just dump my problem here, but I am truly stumped by this one - if anybody can give me a pointer in the right direction I'd greatly appreciate it. Thanks, Tim Johnson --- http://www.theinkfactory.co.uk
I'm not sure, but it seems you could calculate a column like: SELECT a,b,c,..., abs(<feel_a>-a)+abs(<feel_b>-b)+abs(<feel_c>-c)+...AS weight FROM t ORDER BY weight This way the closest matches would come first. On Mon, Jul 10, 2000 at 07:56:08PM +0100, Tim Johnson wrote: > I have a problem. Ok I'll rephrase that, a challenge. > > I have a table like this: > > a,b,c,d,e,f,g,h > --------------- > 2,5,3,4,4,5,2,2 > 1,1,1,1,1,1,1,1 > 5,5,5,5,5,5,5,5 > 3,3,2,4,5,1,1,3 > 1,1,5,5,5,5,1,4 > 1,5,5,5,4,4,2,1 > 5,5,5,5,1,1,1,1 > 1,1,1,1,5,5,5,5 > (rows 8) > > a to h are of type int. > > > I want to take input values which relate to this table say: > how do you feel about a: > how do you feel about b: > how do you feel about c: > ... > > and the answers will be 1 to 5. > > Now I want to take those answers for my incoming a to h and scan down the > table pulling out the closest matches from best to worst. There will be > about 2000 rows in the final table and I will LIMIT the rows in blocks of 10 > or so.
Tim Johnson wrote: > > I have a table like this: > > a,b,c,d,e,f,g,h > --------------- > 2,5,3,4,4,5,2,2 > 1,1,1,1,1,1,1,1 > > a to h are of type int. > > I want to take input values which relate to this table say: > how do you feel about a: > how do you feel about b: > how do you feel about c: > ... > > and the answers will be 1 to 5. > > Now I want to take those answers for my incoming a to h and scan down the > table pulling out the closest matches from best to worst. I wonder if you don't really just want to find the vector(s) closest in N-space to the input vector. You might dig up an old 3-variable calculus book, find the formula, and write a PL/pgSQL function to compute the distance between two N-dimensional vectors... Regards, Ed Loehr
Thanks to all of you that replied. I think Oliver's idea (which is pretty close to Stephan's) will probably do the trick I think. I will maybe look in the future to add the ability to allow users to weight fields with more priority. So customers could number the top five most important fields and then pick how they feel. I still worry about the results being skewed by extreme data in certain fields but I guess there's no way around that. Thanks again. Tim Johnson, -- http://www.theinkfactory.co.uk -----Original Message----- From: Oliver Mueschke [mailto:o@mueschke.de] Sent: 10 July 2000 21:15 To: pgsql-sql@postgresql.org Cc: tim@lincolnshirenow.com Subject: Re: [SQL] Re: Matching and Scoring with multiple fields I'm not sure, but it seems you could calculate a column like: SELECT a,b,c,..., abs(<feel_a>-a)+abs(<feel_b>-b)+abs(<feel_c>-c)+...AS weight FROM t ORDER BY weight This way the closest matches would come first. On Mon, Jul 10, 2000 at 07:56:08PM +0100, Tim Johnson wrote: > I have a problem. Ok I'll rephrase that, a challenge. > > I have a table like this: > > a,b,c,d,e,f,g,h > --------------- > 2,5,3,4,4,5,2,2 > 1,1,1,1,1,1,1,1 > 5,5,5,5,5,5,5,5 > 3,3,2,4,5,1,1,3 > 1,1,5,5,5,5,1,4 > 1,5,5,5,4,4,2,1 > 5,5,5,5,1,1,1,1 > 1,1,1,1,5,5,5,5 > (rows 8) > > a to h are of type int. > > > I want to take input values which relate to this table say: > how do you feel about a: > how do you feel about b: > how do you feel about c: > ... > > and the answers will be 1 to 5. > > Now I want to take those answers for my incoming a to h and scan down the > table pulling out the closest matches from best to worst. There will be > about 2000 rows in the final table and I will LIMIT the rows in blocks of 10 > or so.