SQL schema and query optimisation for fast cross-table query execution - Mailing list pgsql-sql

From Brice André
Subject SQL schema and query optimisation for fast cross-table query execution
Date
Msg-id CAOBG12=LHSmZGkOqzRDgomjXQ73u-OazaqbD-RpuDkB-TkyN8A@mail.gmail.com
Whole thread Raw
Responses RE: SQL schema and query optimisation for fast cross-table query execution  ("Mike Sofen" <msofen@runbox.com>)
List pgsql-sql
Dear community,

We are developing a social web site for athletes, and I am facing a technical question on how to implement a complex query on an efficient way.

The current DB schema can be adapted, if it may help...

The concept is the following : we have different users registered on the web-site, and we have several challenges ongoing on the site. Each user can be registered to one or several challenges. Each user performs sport activities. Those activities are not linked to a challenge : they count to all challenges to which the user is registered.

The problem is the following : we would want to add on the website a news feed panel where the users connected on the website can see the most recent activities of the other users. But the constraint is the following : a user can only see the activities of another user if they are both registered to at least one common challenge.

On a performance point of view, this query should be very fast : we plan to refresh automatically the news feed panel of each registered user very often..

The current DB schema is as follows:

user table : this table records all users registered on the web-site
   - user_id
   - ...

challenge table: this table records all ongoing challenges
   - challenge_id
   - ...
   
user_challenge_association table : this table records which user is registered to which challenge
   - user_id
   - challenge_id
   
activity table : this table records all activities of the users
   - user_id
   - date
   - ...
   
a naive implementation of my query could be something like this:

SELECT * FROM activity WHERE user_id IN (SELECT user_id FROM user_challenge_association WHERE challenge_id IN (SELECT challenge_id FROM user_challenge_association WHERE user_id = ||current_user||)) ORDER BY date

But I really fear whatever index I put on the tables, the two nested IN operators will have a huge performance impact on the query...

As there will be fare more queries of this type than queries that add or modify activities, I was guessing if there was not a possibility to pre-compute something at activity insertion so that we can rewrite the query on a more efficient way...

Or maybe a special index that would be cross the activity and the user_challenge_association tables ?

Any help would be very welcome,

Many thanks in advance,

Brice

pgsql-sql by date:

Previous
From: Erik Brandsberg
Date:
Subject: Re: For each statement trigger and update table
Next
From: "Mike Sofen"
Date:
Subject: RE: SQL schema and query optimisation for fast cross-table query execution