Thread: Links between rows in a table
Hi. We are currently designing a web-based application in which users can add other users as "friends". These links are bi-directional, meaning that when A adds B to his friends, he is automatically one of B's friends. Eventually we will have to add a feature that shows how A is is related to some other user E (via B->C->D->...) - similar to the way Friendster, Orkut and others work, but on a much smaller scale (some 5000 users). Probably the most annoying part is that it has to work with different database vendors, including MySQL4 (default install, MyISAM tables, no foreign keys, no stored procedures, no triggers, no views etc). Most of the logic will have to live in the application, and I won't be able to use anything beyond plain SQL. I can see several ways how such links could be modeled in a relational database, but I was wondering if there was some tried-and-true recipe that would spare me from reinventing the wheel. Putting aside for the moment everything but the links, the simplest way of connecting users would be a "friends" table (user_id int, friend_id int). We could get a user's friends with a simple query like this: SELECT friend_id FROM friends WHERE user_id = X UNION SELECT user_id FROM friends WHERE friend_id = X; Is there a better way, or any reason why we should not go that way, especially considering other likely queries such as "friends of friends" or the connection chain mentioned above? We are also thinking of precalculating possible connection chains, or trees, at night (to a certain depth) in order to avoid performance problems in the peak hours. Any ideas on how such precalculated results could be stored and queried efficiently? Thanks in advance, Stefan Weiss
On Sun, Mar 06, 2005 at 05:42:14 +0100, Stefan Weiss <spaceman@foo.at> wrote: > > We are currently designing a web-based application in which users can > add other users as "friends". These links are bi-directional, meaning > that when A adds B to his friends, he is automatically one of B's > friends. Eventually we will have to add a feature that shows how A is This doesn't seem like a good idea unless the person getting linked to gets to confirm he wants the link creator as a friend. > I can see several ways how such links could be modeled in a relational > database, but I was wondering if there was some tried-and-true recipe > that would spare me from reinventing the wheel. Putting aside for the > moment everything but the links, the simplest way of connecting users > would be a "friends" table (user_id int, friend_id int). We could get a > user's friends with a simple query like this: > > SELECT friend_id FROM friends WHERE user_id = X > UNION SELECT user_id FROM friends WHERE friend_id = X; It would probably be better to always have either both or neither of the symmetric relationships in the table. You could make a set of triggers to enforce this.
On 2005-03-06 18:42, Bruno Wolff III wrote: >> We are currently designing a web-based application in which users can >> add other users as "friends". These links are bi-directional, meaning >> that when A adds B to his friends, he is automatically one of B's >> friends. Eventually we will have to add a feature that shows how A is > > This doesn't seem like a good idea unless the person getting linked to > gets to confirm he wants the link creator as a friend. Yes, we have an invitation/pending/confirm process, and users are also able to block other users. I haven't mentioned this because I did not think it relevant to the storage question. There is a different system for unilateral friendships ("favorites/fans"). >> SELECT friend_id FROM friends WHERE user_id = X >> UNION SELECT user_id FROM friends WHERE friend_id = X; > > It would probably be better to always have either both or neither of > the symmetric relationships in the table. You could make a set of triggers > to enforce this. We have also considered this, but since "friendship" in this application is mutual by definition, wouldn't that just lead to data duplication? We might still insert two rows instead of one, if we find that the union slows things down more than the larger table, or if the "connection finder" feature will be easier to implement that way. By the way, according to the MySQL documentation, "Rudimentary support for triggers is included beginning with MySQL 5.0.2". The MySQL compatibility requirement is none of my doing, I have given up trying to educate my customers about the benefits of a real database... regards, stefan weiss
>> It would probably be better to always have either both or neither of >> the symmetric relationships in the table. You could make a set of >> triggers >> to enforce this. Because your relation is symmetric, you should not name them "user" and "friend".The duplication is useless if you add a constraint : see this create table friendship (user_id_1 integer references ... on delete cascade,user_id_2 integer references ... on deletecascade, CHECK( user_id_1 < user_id_2 ) ); user_id_1 < user_id_2 means :- a user can't be his own friend- only one row per friend- when you want to know if A is friendof B, no need to make two selects, just select where user_id_1 = min(user_id_A, user_id_B) AND user_id_2 = max(user_id_A, user_id_B) To get the list of friends for a user, you still need the union, but that is no real problem. Making two queries will be marginally slower than one query on a bigger table, but youu save precious cache space, so in the end it could be faster.
On 2005-03-06 20:26, PFC wrote: > Because your relation is symmetric, you should not name them "user" and > "friend". A good point, thank you. > user_id_1 < user_id_2 means : > - a user can't be his own friend > - only one row per friend > - when you want to know if A is friend of B, no need to make two selects, > just select where user_id_1 = min(user_id_A, user_id_B) AND user_id_2 = > max(user_id_A, user_id_B) This is what we were planning to do on the application side, but a CHECK constraint is even better. It will be used and enforced by those DB engines that understand it, and ignored by the one engine that doesn't. > To get the list of friends for a user, you still need the union, but that > is no real problem. Making two queries will be marginally slower than one > query on a bigger table, but youu save precious cache space, so in the end > it could be faster. Thank you for your insight. We will rename the columns, add the CHECK and go ahead with this setup. regards, stefan weiss
On Sun, Mar 06, 2005 at 20:26:50 +0100, PFC <lists@boutiquenumerique.com> wrote: > >>It would probably be better to always have either both or neither of > >>the symmetric relationships in the table. You could make a set of > >>triggers > >>to enforce this. > > Because your relation is symmetric, you should not name them "user" > and "friend". > The duplication is useless if you add a constraint : see this > > create table friendship ( > user_id_1 integer references ... on delete cascade, > user_id_2 integer references ... on delete cascade, > > CHECK( user_id_1 < user_id_2 ) > ); The trouble with this approach is that for some ways of using this data you will need to worry about the ordering of of the values. The advantage of this method is that the space needed to store the data is half of what is needed to store both pairs for each friendship. > user_id_1 < user_id_2 means : > - a user can't be his own friend > - only one row per friend > - when you want to know if A is friend of B, no need to make two > selects, just select where user_id_1 = min(user_id_A, user_id_B) AND > user_id_2 = max(user_id_A, user_id_B) Note that you can't literally use 'min' and 'max' as above, as those functions don't do that. You could use 'case' to do that.
> The trouble with this approach is that for some ways of using this data > you will need to worry about the ordering of of the values. Tradeoffs, always tradeoffs...It depends on the application. Note also that it eliminates duplicates ; moreover without such a condition, any relation A-B could have the rows [(A,B)] or [(B,A)] or [(A,B),(B,A)] which promises to cause headaches if you need to get rid of the duplicates...I used this scheme for an "also purchased products" thingy on a website, it works well. In this case the row must be unique because we have (A,B,count) which is the number of times products A and B have been purchased together, in this case having rows (B,A) and (A,B) separated wouldn't help in sorting by this count, which is in my case very fast thanks to a multicolumn index. > Note that you can't literally use 'min' and 'max' as above, as those > functions > don't do that. You could use 'case' to do that. ... yes, it was just a way of saying it. You can define functions that take integers as arguments (I wish these basic bricks were defined by default)... > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >