Thread: Modeling Friendship Relationships

Modeling Friendship Relationships

From

Robert DiFalco

Date:

11 November 2014, 22:38:22

I have a question about modeling a mutual relationship. It seems basic but I can't decide, maybe it is 6 of one a half dozen of the other.

In my system any user might be friends with another user, that means they have a reciprocal friend relationship.

It seems I have two choices for modeling it.

1. I have a table with two columns userOne and userTwo. If John is friends with Jane there will be one row for both of them.

2. I have a table with two columns owner and friend. If John is friends with Jane there will be two rows, one that is {John, Jane} and another {Jane, John}.

The first option has the advantage of saving table size. But queries are more complex because to get John's friends I have to JOIN friends f ON f.userA = "John" OR f.userB = "John" (not the real query, these would be id's but you get the idea).

In the second option the table rows would be 2x but the queries would be simpler -- JOIN friends f ON f.owner = "John".

There could be >1M users. Each user would have <200 friends.

Thoughts? Do I just choose one or is there a clear winner? TIA!

Re: Modeling Friendship Relationships

From

Rob Sargent

Date:

11 November 2014, 23:28:18

On 11/11/2014 03:38 PM, Robert DiFalco wrote:

I have a question about modeling a mutual relationship. It seems basic but I can't decide, maybe it is 6 of one a half dozen of the other.

In my system any user might be friends with another user, that means they have a reciprocal friend relationship.

It seems I have two choices for modeling it.

1. I have a table with two columns userOne and userTwo. If John is friends with Jane there will be one row for both of them.
2. I have a table with two columns owner and friend. If John is friends with Jane there will be two rows, one that is {John, Jane} and another {Jane, John}.

The first option has the advantage of saving table size. But queries are more complex because to get John's friends I have to JOIN friends f ON f.userA = "John" OR f.userB = "John" (not the real query, these would be id's but you get the idea).

In the second option the table rows would be 2x but the queries would be simpler -- JOIN friends f ON f.owner = "John".

There could be >1M users. Each user would have <200 friends.

Thoughts? Do I just choose one or is there a clear winner? TIA!

did you consider a table with id-of-friendship/friend, unique on the pair, id-of-friendship lists all in the friendship. (Easy aggregate to get one-line per friendship).

Re: Modeling Friendship Relationships

From

Steve Crawford

Date:

11 November 2014, 23:39:57

On 11/11/2014 02:38 PM, Robert DiFalco wrote:
> I have a question about modeling a mutual relationship. It seems basic
> but I can't decide, maybe it is 6 of one a half dozen of the other.
>
> In my system any user might be friends with another user, that means
> they have a reciprocal friend relationship.
>
> It seems I have two choices for modeling it.
>
> 1. I have a table with two columns userOne and userTwo. If John is
> friends with Jane there will be one row for both of them.
> 2. I have a table with two columns owner and friend. If John is
> friends with Jane there will be two rows, one that is {John, Jane} and
> another {Jane, John}.
>
> The first option has the advantage of saving table size. But queries
> are more complex because to get John's friends I have to JOIN friends
> f ON  f.userA = "John" OR f.userB = "John" (not the real query, these
> would be id's but you get the idea).
>
> In the second option the table rows would be 2x but the queries would
> be simpler -- JOIN friends f ON f.owner = "John".
>
> There could be >1M users. Each user would have <200 friends.
>
> Thoughts? Do I just choose one or is there a clear winner? TIA!

What you are describing is basically an adjacency-list without any
hierarchy information, i.e. there isn't a John reports to Dick reports
to Jane type of tree.

One-million-users at 200 friends each would (order-of-magnitudeish) be
200-million rows which tends to argue for saving space. It also reduces
the number of rows impacted by deletes and avoids the risk of ending up
with John,Jane without a corresponding Jane,John.

Getting John's friends isn't too complicated but I suspect the form of
the query you gave won't lead to optimal query plans. For a two-column
format I would imagine that ...userB as friend where userA='John' union
userA as friend where userB='John'... would yield a more optimal plan
assuming an index on each column.

I'm guessing that you will also want to study common-table-expressions
and recursive queries (description and examples are available in the
PostgreSQL docs) so you can start to write single queries to answer
things like "list everyone who is a friend of a friend of John."

Cheers,
Steve

Re: Modeling Friendship Relationships

From

Bill Moran

Date:

12 November 2014, 01:15:24

On Tue, 11 Nov 2014 14:38:16 -0800
Robert DiFalco <robert.difalco@gmail.com> wrote:

> I have a question about modeling a mutual relationship. It seems basic but
> I can't decide, maybe it is 6 of one a half dozen of the other.
>
> In my system any user might be friends with another user, that means they
> have a reciprocal friend relationship.
>
> It seems I have two choices for modeling it.
>
> 1. I have a table with two columns userOne and userTwo. If John is friends
> with Jane there will be one row for both of them.
> 2. I have a table with two columns owner and friend. If John is friends
> with Jane there will be two rows, one that is {John, Jane} and another
> {Jane, John}.
>
> The first option has the advantage of saving table size. But queries are
> more complex because to get John's friends I have to JOIN friends f ON
>  f.userA = "John" OR f.userB = "John" (not the real query, these would be
> id's but you get the idea).
>
> In the second option the table rows would be 2x but the queries would be
> simpler -- JOIN friends f ON f.owner = "John".
>
> There could be >1M users. Each user would have <200 friends.
>
> Thoughts? Do I just choose one or is there a clear winner? TIA!

I recommend a single row per relationship, because your estimates suggest that
the size might be big enough to be worth optimizing on the basis of size.

As far as optimizing queries and what not, I did this recently, and here's what
worked well for me.

Take this example table definition:

CREATE TABLE friendship (
 person1 INT NOT NULL,
 person2 INT NOT NULL,
 PRIMARY KEY (person1, person2),
 CHECK (person1 < person2)
);
CREATE INDEX friendship_person2 ON friendship(person2);

The check constraint guarantees the data will always be stored in a certain
order, which allows you to optimize many queries (i.e., when figuring out
wheter person 57 and person 86 are friends, the where clause is simplified
becuase you know that person1 can't be 86).

If you'll need to do queries of the "list all person 57's friends" variety,
then the queries are still pretty simple, but you could create a stored
procedure to make them even easier on the part of application developers.
It's basically "WHERE person1 = 57 or person2 = 57" which will be able to
use the indexes to provide quick results.  Or something more like:

SELECT person1 AS friend FROM friendship WHERE person2 = 57
UNION
SELECT person2 AS friend FROM friendship WHERE person1 = 57;

A view should work very well:
CREATE VIEW friendship_view AS
SELECT person1 AS person, person2 AS friend FROM friendship
UNION
SELECT person2 AS person, person1 AS friend FORM friendship;

That should be a very performant view when a WHERE clause on person is
specified.

Those types of queries weren't a requirement in the implementation I did,
as the code only ever asked "is person x a friend of person y" and never
wanted the entire list.

--
Bill Moran
I need your help to succeed:
http://gamesbybill.com

Re: Modeling Friendship Relationships

From

John R Pierce

Date:

12 November 2014, 01:20:52

a difficulty of the single entry for joe<->bob  is that its hard to have
a unique constraint across two fields.     you'd want to ensure that
joe+bob is unique and that there's no bob+joe



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Modeling Friendship Relationships

From

David G Johnston

Date:

12 November 2014, 01:32:23

John R Pierce wrote
> a difficulty of the single entry for joe<->bob  is that its hard to have
> a unique constraint across two fields.     you'd want to ensure that
> joe+bob is unique and that there's no bob+joe

Bill's solution:

 PRIMARY KEY (person1, person2),
 CHECK (person1 < person2)

seems to make this constraint fairly simple...am I missing something?

Usage need permitting how difficult would setting up materialized views to
maintain arrays of "friend_of" and "friends_are" and simply unnest those
arrays if record-oriented access to the contained ids is required?  More
basically how performant (generally) are ~200 element arrays?

David J.




--
View this message in context: http://postgresql.nabble.com/Modeling-Friendship-Relationships-tp5826592p5826608.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: Modeling Friendship Relationships

From

John R Pierce

Date:

12 November 2014, 01:37:45

On 11/11/2014 5:32 PM, David G Johnston wrote:
> Bill's solution:
>
>   PRIMARY KEY (person1, person2),
>   CHECK (person1 < person2)
>
> seems to make this constraint fairly simple...am I missing something?

oh, I guess I missed that part.   of course, you'll have to make sure
you swap any relation into lesser,greater before inserting into this
table, but such is life.    I suppose a insert trigger could force this.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Modeling Friendship Relationships

From

Jonathan Vanasco

Date:

13 November 2014, 16:10:53

On Nov 11, 2014, at 5:38 PM, Robert DiFalco wrote:

> Thoughts? Do I just choose one or is there a clear winner? TIA!


I prefer this model

    user_id__a INT NOT NULL REFERENCES user(id),
    user_id__b INT NOT NULL REFERENCES user(id),
    is_reciprocal BOOLEAN
    primary key (user_id__a, user_id__b)

if a relationship is confirmed (or dropped) I toggle is_reciprocal.  having that value saves a lot of work doing joins
oranalyzing friendship sets 

if you have multiple relationship types, then things get tricky.

you can either
    - treat the row as a triplet ( user_id__a, user_id__b, relationship_type_id)   [i still recommend the reciprocal
bool]
    - if you have a finite set of relationship types, you could just use each one as a bool column within the a2b row

I've tried doing the "one row per relationship" approach, and didn't like it.   the time savings on simple searches
weremarginally faster, but the sql was increasingly more complex and slower to execute as we leveraged the table into
otherqueries.

Re: Modeling Friendship Relationships

From

Robert DiFalco

Date:

20 November 2014, 15:06:16

Thanks Jonathan. So in your use case would you put non-approved friend requests in this table as non-reciprocal? If so, did the person requesting friendship get the row in there or the person receiving the friend request? Also, if A and B are friends, and B decided to remove A as a friend, are you saying that you would not remove both rows?

Thanks!

On Thu, Nov 13, 2014 at 8:10 AM, Jonathan Vanasco <postgres@2xlp.com> wrote:

On Nov 11, 2014, at 5:38 PM, Robert DiFalco wrote:

> Thoughts? Do I just choose one or is there a clear winner? TIA!

I prefer this model

user_id__a INT NOT NULL REFERENCES user(id),
user_id__b INT NOT NULL REFERENCES user(id),
is_reciprocal BOOLEAN
primary key (user_id__a, user_id__b)

if a relationship is confirmed (or dropped) I toggle is_reciprocal. having that value saves a lot of work doing joins or analyzing friendship sets

if you have multiple relationship types, then things get tricky.

you can either
- treat the row as a triplet ( user_id__a, user_id__b, relationship_type_id) [i still recommend the reciprocal bool]
- if you have a finite set of relationship types, you could just use each one as a bool column within the a2b row

I've tried doing the "one row per relationship" approach, and didn't like it. the time savings on simple searches were marginally faster, but the sql was increasingly more complex and slower to execute as we leveraged the table into other queries.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Modeling Friendship Relationships

From

Alvaro Herrera

Date:

20 November 2014, 15:25:20

Robert DiFalco wrote:
> I have a question about modeling a mutual relationship. It seems basic but
> I can't decide, maybe it is 6 of one a half dozen of the other.
>
> In my system any user might be friends with another user, that means they
> have a reciprocal friend relationship.
>
> It seems I have two choices for modeling it.

Have you considered having an array with all friends of each person?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services