Exactly. I'm really just trying to understand whether there's some functional limitation that prevents it from doing that, given how it executes these types of queries, or whether it's just an optimization that hasn't been built into the query planner yet.
I know I can get it to do precisely this if I use a CROSS JOIN LATERAL:
SELECT o.* FROM company_users cu
CROSS JOIN LATERAL ( SELECT * FROM orders WHERE orders.user_id = cu.user_id ORDER BY created_at DESC LIMIT 50 ) o
WHERE cu.company_id = ?
ORDER BY o.created_at DESC LIMIT 50
That makes sense to me: it forces a nested loop and runs the inner query once for each user. But a nested select like the query below doesn't use the index or limit the results to 50 per user, even though it does a nested loop just like the lateral join does:
SELECT "orders".* FROM "orders"
WHERE user_id IN (SELECT user_id FROM company_users WHERE company_id = ?)
ORDER BY "orders"."created_at" LIMIT 50
Joins will generally query the whole tables, leading to long run times. Have you tried to preselect the rows of interest with a "WITH ... SELECT ..." query to reduce the amount of data processed?
Who knows which users are going to be in that list???
It doesn't matter. In the worst case there is only one user in the result, so all 50 rows are that user's earliest 50 rows. The system will thus never need more than the earliest 50 rows per user to answer this question.
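To sanity-check that claim outside the database, here is a small pure-Python sketch. It is not PostgreSQL and says nothing about what the planner actually does; the function names, the (user_id, created_at) tuple layout, and the synthetic data are all made up for illustration. It compares "sort everything, take 50" against "take each user's earliest 50, then merge and take 50".

```python
import heapq
import random

LIMIT = 50  # mirrors the LIMIT 50 in the queries above


def global_top(orders, user_ids, limit=LIMIT):
    """Filter to the users of interest, sort all their rows, keep the first `limit`."""
    rows = [o for o in orders if o[0] in user_ids]
    return sorted(rows, key=lambda o: o[1])[:limit]


def per_user_top(orders, user_ids, limit=LIMIT):
    """Keep only each user's earliest `limit` rows, then merge and keep `limit` overall."""
    by_user = {}
    for o in orders:
        if o[0] in user_ids:
            by_user.setdefault(o[0], []).append(o)
    candidates = []
    for rows in by_user.values():
        # At most `limit` rows survive per user, like the LIMIT inside the lateral subquery.
        candidates.extend(heapq.nsmallest(limit, rows, key=lambda o: o[1]))
    return sorted(candidates, key=lambda o: o[1])[:limit]


# Hypothetical data: 5000 orders spread over 20 users, with unique timestamps
# so the ordering is unambiguous.
rng = random.Random(0)
timestamps = rng.sample(range(1_000_000), 5000)
orders = [(rng.randrange(20), ts) for ts in timestamps]

# Both strategies should agree, whether the company has several users or just one.
assert per_user_top(orders, {1, 4, 7}) == global_top(orders, {1, 4, 7})
assert per_user_top(orders, {3}) == global_top(orders, {3})
```

The reason the two agree: a row that is not among its own user's earliest 50 already has 50 earlier rows ahead of it from that same user, so it can never make the overall first 50. Whether the planner is able to exploit this for the IN (...) form is a separate question.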