Thread: SQL schema and query optimisation for fast cross-table query execution

SQL schema and query optimisation for fast cross-table query execution

From
Brice André
Date:
Dear community,

We are developing a social web site for athletes, and I am facing a technical question on how to implement a complex query on an efficient way.

The current DB schema can be adapted, if it may help...

The concept is the following : we have different users registered on the web-site, and we have several challenges ongoing on the site. Each user can be registered to one or several challenges. Each user performs sport activities. Those activities are not linked to a challenge : they count to all challenges to which the user is registered.

The problem is the following : we would want to add on the website a news feed panel where the users connected on the website can see the most recent activities of the other users. But the constraint is the following : a user can only see the activities of another user if they are both registered to at least one common challenge.

On a performance point of view, this query should be very fast : we plan to refresh automatically the news feed panel of each registered user very often..

The current DB schema is as follows:

user table : this table records all users registered on the web-site
   - user_id
   - ...

challenge table: this table records all ongoing challenges
   - challenge_id
   - ...
   
user_challenge_association table : this table records which user is registered to which challenge
   - user_id
   - challenge_id
   
activity table : this table records all activities of the users
   - user_id
   - date
   - ...
   
a naive implementation of my query could be something like this:

SELECT * FROM activity WHERE user_id IN (SELECT user_id FROM user_challenge_association WHERE challenge_id IN (SELECT challenge_id FROM user_challenge_association WHERE user_id = ||current_user||)) ORDER BY date

But I really fear whatever index I put on the tables, the two nested IN operators will have a huge performance impact on the query...

As there will be fare more queries of this type than queries that add or modify activities, I was guessing if there was not a possibility to pre-compute something at activity insertion so that we can rewrite the query on a more efficient way...

Or maybe a special index that would be cross the activity and the user_challenge_association tables ?

Any help would be very welcome,

Many thanks in advance,

Brice

RE: SQL schema and query optimisation for fast cross-table query execution

From
"Mike Sofen"
Date:

From: Brice André <brice@famille-andre.be>  Sent: Saturday, January 25, 2020 12:49 AM
wrote:

 

The current DB schema is as follows:
user table : this table records all users registered on the web-site
   - user_id
   - ...
challenge table: this table records all ongoing challenges
   - challenge_id
   - ...
user_challenge_association table : this table records which user is registered to which challenge
   - user_id
   - challenge_id
activity table : this table records all activities of the users
   - user_id
   - date
   - ...
   
a naive implementation of my query could be something like this:

SELECT * FROM activity WHERE user_id IN (SELECT user_id FROM user_challenge_association WHERE challenge_id IN (SELECT challenge_id FROM user_challenge_association WHERE user_id = ||current_user||)) ORDER BY date

Brice

---------------------------------------------------------------

 

Brice, you are correct, “IN” clauses are horrific performers because they turn every member of the IN into an OR, so the larger your IN list, the slower your query runs. 

 

But every IN can be replaced with a join, and that leverages the power of the relational engine.  With standard indexes on the primary and foreign keys, this will be extremely fast and scalable to many hundreds of millions of rows.

 

Taking your query and rewriting it with joins:

SELECT u.user_name, c.challenge_name, a.activity_name, a.activity_date

FROM activity a

Join  user_challenge_association uca on (user_id)

Join challenge c on (challenge_id)

Join user u on (user_id)

Where u.user_id = ||current_user||)) ORDER BY date

 

Mike Sofen