On 2/7/25 23:43, James Hunter wrote:
> On Fri, Feb 7, 2025 at 12:09 PM Tomas Vondra <tomas@vondra.me> wrote:
>> ...
>> Yes, I think that's pretty much the idea. Except that I don't think we
>> need to look at the |F| at all - it will have more impact for small |F|,
>> of course, but it doesn't hurt for large |F|.
>>
>> I think it'll probably need to consider which joins increase/decrease
>> the cardinality, and "inject" the dimension joins in between those.
>
> YMMV, but I suspect you may find it much easier to look at |F|, |F
> JOIN D1|, |(F JOIN D1) JOIN D2|, etc., than to consider |F JOIN D1| /
> |F|, etc. (In other words, I suspect that considering absolute
> cardinalities will end up easier/cleaner than considering ratios of
> increases/decreases in cardinalities.) But I have not thought about
> this much, so I am not putting too much weight on my suspicions.
>
That's not what I meant when I mentioned joins that increase/decrease
cardinality. I wasn't referring to the "dimension" joins, which we
expect to be on a FK and thus should not affect the cardinality at all.
Instead, I was thinking about the "other" joins (if there are any), which
may add or remove rows. AFAIK we want to join the dimensions at the
point with the lowest cardinality - the discussion so far mostly assumed
the other joins would only reduce the cardinality, in which case we'd
just leave the dimensions until the very end.

But ISTM that may not necessarily be true. Let's say there's a join that
"multiplies" each row, i.e. produces several output rows per input row.
It'll probably be done at the end, and the dimension joins should then
probably happen right before it ... not sure.
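
To make that a bit more concrete, here's a toy sketch in Python (not
planner code). It assumes we already have a row-count factor for each of
the "other" joins, in the order they'd be executed, and it just picks the
point with the lowest cardinality. The function name and all the numbers
are made up for illustration.

```python
def best_dimension_position(fact_rows, other_join_factors):
    """Find the lowest-cardinality point in a chain of "other" joins.

    fact_rows          -- estimated rows in the fact table (made-up input)
    other_join_factors -- per-join row multipliers, in join order
                          (< 1 removes rows, > 1 "multiplies" rows)

    Returns (position, cardinality), where position k means "after the
    first k other joins" and position 0 is directly on the fact table.
    Dimension joins are assumed to keep the cardinality unchanged (FK),
    so they are not part of the input at all.
    """
    cards = [fact_rows]
    for factor in other_join_factors:
        cards.append(cards[-1] * factor)

    # min() returns the earliest position in case of a tie
    pos = min(range(len(cards)), key=lambda i: cards[i])
    return pos, cards[pos]


# Only reducing joins: the dimensions end up at the very end (position 2).
print(best_dimension_position(1_000_000, [0.5, 0.1]))

# A row-multiplying join at the end: the dimensions go right before it
# (still position 2, i.e. after the two reducing joins).
print(best_dimension_position(1_000_000, [0.5, 0.1, 20.0]))
```

This ignores join costs and assumes the order of the "other" joins is
already fixed, which the real planner of course can't assume.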
cheers
--
Tomas Vondra