Thank you for looking into this.
Now we can execute a (still narrow) family of queries!
Maybe it helps to see this as a social network feed. Imagine a social network: you have a few friends, or follow a few people, and you want to see their updates ordered by date. Each user has a different combination of authors to display, but even when following hundreds of users we may only show the first 10 posts.
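To make the shape of the query concrete, here is a minimal sketch with a hypothetical schema (the table, column, and index names are mine, not from the patch):

    CREATE TABLE posts (
        author_id  int         NOT NULL,
        created_at timestamptz NOT NULL,
        body       text
    );
    CREATE INDEX posts_author_created_idx ON posts (author_id, created_at DESC);

    -- "Show the 10 newest posts from the people I follow."
    SELECT author_id, created_at, body
    FROM posts
    WHERE author_id IN (42, 99, 123)   -- the followed users
    ORDER BY created_at DESC
    LIMIT 10;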
There is low-hanging fruit in the skip scan: if we only need M rows in total, we can stop reading a group as soon as it has produced M rows.
Let Nx be the number of friends, M the number of posts to show (the LIMIT), and N the number of posts per friend.
With this early stop we access at most Nx * M rows and sort Nx * M rows, instead of accessing Nx * N rows and sorting Nx * N rows.
With M = 10 and N = 1000 this is a significant improvement: for Nx = 100 friends, roughly 1,000 rows instead of 100,000.
But if M ~ N, the merge scan, which runs with about M + Nx row accesses and M + Nx heap operations, wins.
If everything is on the same page, the skip scan would still win.
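For intuition, the Nx * M bound is roughly what the manual per-group LIMIT rewrite below achieves (same hypothetical schema as above): each branch stops after M = 10 rows, so only Nx * 10 rows reach the final sort. This is only a sketch of the intended behaviour, not a claim about which plan the optimizer actually picks for it:

    SELECT p.author_id, p.created_at, p.body
    FROM unnest(ARRAY[42, 99, 123]) AS f(author_id)   -- one branch per friend
    CROSS JOIN LATERAL (
        SELECT posts.author_id, posts.created_at, posts.body
        FROM posts
        WHERE posts.author_id = f.author_id
        ORDER BY posts.created_at DESC
        LIMIT 10                                      -- per-group early stop
    ) AS p
    ORDER BY p.created_at DESC
    LIMIT 10;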
The cost estimation is probably far off.
I am also not considering the filters applied after this operator, and I don't know whether the planner infrastructure can adjust for them by itself.
This is where I would like reviewers' feedback. I think the planner costs will have to be determined experimentally.
Next I will make it slightly more general, handling the following cases (illustrative queries after the list):
* More index columns: Index (a, b, s...) could support WHERE a IN (...) ORDER BY b LIMIT N (ignoring s...)
* Multi-column prefix: WHERE (a, b) IN (...) ORDER BY c
* Non-leading prefix: WHERE b IN (...) AND a = const ORDER BY c on index (a, b, c)
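Concretely, these are the query shapes I have in mind, shown against a hypothetical table t and indexes (the names are mine, just to make the patterns runnable):

    CREATE TABLE t (a int, b int, c int, s int);

    -- More index columns: index (a, b, s) still serves the (a, b) part.
    CREATE INDEX t_a_b_s_idx ON t (a, b, s);
    SELECT * FROM t WHERE a IN (1, 2, 3) ORDER BY b LIMIT 10;

    -- Multi-column prefix.
    CREATE INDEX t_a_b_c_idx ON t (a, b, c);
    SELECT * FROM t WHERE (a, b) IN ((1, 10), (2, 20)) ORDER BY c LIMIT 10;

    -- Non-leading prefix: the IN column is not the first column of (a, b, c).
    SELECT * FROM t WHERE b IN (1, 2, 3) AND a = 42 ORDER BY c LIMIT 10;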
---
Kind Regards,
Alexandre