Thread: Links between rows in a table

Links between rows in a table

From
Stefan Weiss
Date:
Hi.

We are currently designing a web-based application in which users can
add other users as "friends". These links are bi-directional, meaning
that when A adds B to his friends, he is automatically one of B's
friends. Eventually we will have to add a feature that shows how A is
is related to some other user E (via B->C->D->...) - similar to the way
Friendster, Orkut and others work, but on a much smaller scale (some
5000 users).

Probably the most annoying part is that it has to work with different
database vendors, including MySQL4 (default install, MyISAM tables, no
foreign keys, no stored procedures, no triggers, no views etc). Most of
the logic will have to live in the application, and I won't be able to
use anything beyond plain SQL.

I can see several ways how such links could be modeled in a relational
database, but I was wondering if there was some tried-and-true recipe
that would spare me from reinventing the wheel. Putting aside for the
moment everything but the links, the simplest way of connecting users
would be a "friends" table (user_id int, friend_id int). We could get a
user's friends with a simple query like this:
       SELECT friend_id FROM friends WHERE user_id   = X UNION SELECT user_id   FROM friends WHERE friend_id = X;

Is there a better way, or any reason why we should not go that way,
especially considering other likely queries such as "friends of friends"
or the connection chain mentioned above?

We are also thinking of precalculating possible connection chains, or
trees, at night (to a certain depth) in order to avoid performance
problems in the peak hours. Any ideas on how such precalculated results
could be stored and queried efficiently?


Thanks in advance,
Stefan Weiss



Re: Links between rows in a table

From
Bruno Wolff III
Date:
On Sun, Mar 06, 2005 at 05:42:14 +0100, Stefan Weiss <spaceman@foo.at> wrote:
> 
> We are currently designing a web-based application in which users can
> add other users as "friends". These links are bi-directional, meaning
> that when A adds B to his friends, he is automatically one of B's
> friends. Eventually we will have to add a feature that shows how A is

This doesn't seem like a good idea unless the person getting linked to
gets to confirm he wants the link creator as a friend.

> I can see several ways how such links could be modeled in a relational
> database, but I was wondering if there was some tried-and-true recipe
> that would spare me from reinventing the wheel. Putting aside for the
> moment everything but the links, the simplest way of connecting users
> would be a "friends" table (user_id int, friend_id int). We could get a
> user's friends with a simple query like this:
> 
>         SELECT friend_id FROM friends WHERE user_id   = X
>   UNION SELECT user_id   FROM friends WHERE friend_id = X;

It would probably be better to always have either both or neither of
the symmetric relationships in the table. You could make a set of triggers
to enforce this.


Re: Links between rows in a table

From
Stefan Weiss
Date:
On 2005-03-06 18:42, Bruno Wolff III wrote:
>> We are currently designing a web-based application in which users can
>> add other users as "friends". These links are bi-directional, meaning
>> that when A adds B to his friends, he is automatically one of B's
>> friends. Eventually we will have to add a feature that shows how A is
> 
> This doesn't seem like a good idea unless the person getting linked to
> gets to confirm he wants the link creator as a friend.

Yes, we have an invitation/pending/confirm process, and users are also
able to block other users. I haven't mentioned this because I did not
think it relevant to the storage question. There is a different system
for unilateral friendships ("favorites/fans").

>>         SELECT friend_id FROM friends WHERE user_id   = X
>>   UNION SELECT user_id   FROM friends WHERE friend_id = X;
> 
> It would probably be better to always have either both or neither of
> the symmetric relationships in the table. You could make a set of triggers
> to enforce this.

We have also considered this, but since "friendship" in this application
is mutual by definition, wouldn't that just lead to data duplication? We
might still insert two rows instead of one, if we find that the union
slows things down more than the larger table, or if the "connection
finder" feature will be easier to implement that way.

By the way, according to the MySQL documentation, "Rudimentary support
for triggers is included beginning with MySQL 5.0.2". The MySQL
compatibility requirement is none of my doing, I have given up trying to
educate my customers about the benefits of a real database...


regards,
stefan weiss


Re: Links between rows in a table

From
PFC
Date:
>> It would probably be better to always have either both or neither of
>> the symmetric relationships in the table. You could make a set of  
>> triggers
>> to enforce this.
Because your relation is symmetric, you should not name them "user" and  
"friend".The duplication is useless if you add a constraint : see this

create table friendship (user_id_1 integer    references ... on delete cascade,user_id_2 integer references ... on
deletecascade,
 
CHECK( user_id_1 < user_id_2 )
);
user_id_1 < user_id_2 means :- a user can't be his own friend- only one row per friend- when you want to know if A is
friendof B, no need to make two selects,  
 
just select where user_id_1 = min(user_id_A, user_id_B) AND user_id_2 =  
max(user_id_A, user_id_B)
To get the list of friends for a user, you still need the union, but that  
is no real problem. Making two queries will be marginally slower than one  
query on a bigger table, but youu save precious cache space, so in the end  
it could be faster.


Re: Links between rows in a table

From
Stefan Weiss
Date:
On 2005-03-06 20:26, PFC wrote:
>     Because your relation is symmetric, you should not name them "user" and  
> "friend".

A good point, thank you.

>     user_id_1 < user_id_2 means :
>     - a user can't be his own friend
>     - only one row per friend
>     - when you want to know if A is friend of B, no need to make two selects,  
> just select where user_id_1 = min(user_id_A, user_id_B) AND user_id_2 =  
> max(user_id_A, user_id_B)

This is what we were planning to do on the application side, but a CHECK
constraint is even better. It will be used and enforced by those DB
engines that understand it, and ignored by the one engine that doesn't.

>     To get the list of friends for a user, you still need the union, but that  
> is no real problem. Making two queries will be marginally slower than one  
> query on a bigger table, but youu save precious cache space, so in the end  
> it could be faster.

Thank you for your insight. We will rename the columns, add the CHECK
and go ahead with this setup.


regards,
stefan weiss


Re: Links between rows in a table

From
Bruno Wolff III
Date:
On Sun, Mar 06, 2005 at 20:26:50 +0100, PFC <lists@boutiquenumerique.com> wrote:
> >>It would probably be better to always have either both or neither of
> >>the symmetric relationships in the table. You could make a set of  
> >>triggers
> >>to enforce this.
> 
>     Because your relation is symmetric, you should not name them "user" 
>     and  "friend".
>     The duplication is useless if you add a constraint : see this
> 
> create table friendship (
>     user_id_1 integer    references ... on delete cascade,
>     user_id_2 integer references ... on delete cascade,
> 
>     CHECK( user_id_1 < user_id_2 )
> );

The trouble with this approach is that for some ways of using this data
you will need to worry about the ordering of of the values. The advantage
of this method is that the space needed to store the data is half of
what is needed to store both pairs for each friendship.

>     user_id_1 < user_id_2 means :
>     - a user can't be his own friend
>     - only one row per friend
>     - when you want to know if A is friend of B, no need to make two 
>     selects,  just select where user_id_1 = min(user_id_A, user_id_B) AND 
> user_id_2 =  max(user_id_A, user_id_B)

Note that you can't literally use 'min' and 'max' as above, as those functions
don't do that. You could use 'case' to do that.


Re: Links between rows in a table

From
PFC
Date:
> The trouble with this approach is that for some ways of using this data
> you will need to worry about the ordering of of the values.
Tradeoffs, always tradeoffs...It depends on the application. Note also that it eliminates duplicates ;  
moreover without such a condition, any relation A-B could have the rows  
[(A,B)] or [(B,A)] or [(A,B),(B,A)] which promises to cause headaches if  
you need to get rid of the duplicates...I used this scheme for an "also purchased products" thingy on a website,  
it works well. In this case the row must be unique because we have  
(A,B,count) which is the number of times products A and B have been  
purchased together, in this case having rows (B,A) and (A,B) separated  
wouldn't help in sorting by this count, which is in my case very fast  
thanks to a multicolumn index.

> Note that you can't literally use 'min' and 'max' as above, as those  
> functions
> don't do that. You could use 'case' to do that.
... yes, it was just a way of saying it. You can define functions that  
take integers as arguments (I wish these basic bricks were defined by  
default)...


>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>