On 2/4/25 20:43, Jeff Davis wrote:
> On Tue, 2025-02-04 at 15:00 +0100, Tomas Vondra wrote:
>> This is a surprisingly common query pattern in OLTP applications,
>> thanks
>> to normalization.
>
> +1. Creating a small lookup table should be encouraged rather than
> penalized.
>
> Your test data includes a fact table with 10k rows and no index on the
> filter condition. In OLTP applications the fact table might often fit
> in memory, but I'd still expect it to have an index on the filter
> condition. That might not change your overall point, but I'm curious
> why you constructed the test that way?
>
No particular reason. I think I intended to make it a lookup by PK
(which would match the use case examples), and I forgot about that. But
yeah, I would expect an index too.
>
>> There's a lot of stuff that could / should be improved on the current
>> patch. For (1) we might add support for more complex cases with
>> snowflake schemas [3] or with multiple fact tables. At the same time
>> (1)
>> needs to be very cheap, so that it does not regress every non-
>> starjoin
>> query.
>
> The patch only considers the largest table as the fact table, which is
> a good heuristic of course. However, I'm curious if other approaches
> might work. For instance, could we consider the table involved in the
> most join conditions to be the fact table?
>
> If you base it on the join conditions rather than the size of the
> table, then detection of the star join would be based purely on the
> query structure (not stats), which would be nice for predictability.
>
Right, there may be other (possibly better) ways to detect the star join
shape. I was thinking about also requiring for foreign keys on the join
clauses - in DWH systems FKeys are sometimes omitted, which would break
the heuristics, but in OLTP it's common to still have them.
I think the cost of the heuristic will be an important metric - I don't
know if the number of join conditions is more expensive to determine
than what the patch does now, though.
>> But the bigger question is whether it makes sense to have such fast-
>> path
>> modes for certain query shapes.
>
> We should explore what kinds of surprising cases it might create, or
> what maintenance headaches might come up with future planner changes.
> But the performance numbers you posted suggest that we should do
> something here.
>
Yes, it seems like an interesting opportunity for starjoin queries. It's
a pretty common query pattern, but it also happens to be very expensive
to plan because the dimensions can be reordered almost arbitrarily.
regards
--
Tomas Vondra