Re: should we have a fast-path planning for OLTP starjoins? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: should we have a fast-path planning for OLTP starjoins?
Date
Msg-id 9edd7fa2-843d-4381-9836-5771c2c72b86@vondra.me
Whole thread Raw
In response to Re: should we have a fast-path planning for OLTP starjoins?  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: should we have a fast-path planning for OLTP starjoins?
List pgsql-hackers
On 2/4/25 20:43, Jeff Davis wrote:
> On Tue, 2025-02-04 at 15:00 +0100, Tomas Vondra wrote:
>> This is a surprisingly common query pattern in OLTP applications,
>> thanks
>> to normalization.
> 
> +1. Creating a small lookup table should be encouraged rather than
> penalized.
> 
> Your test data includes a fact table with 10k rows and no index on the
> filter condition. In OLTP applications the fact table might often fit
> in memory, but I'd still expect it to have an index on the filter
> condition. That might not change your overall point, but I'm curious
> why you constructed the test that way?
> 

No particular reason. I think I intended to make it a lookup by PK
(which would match the use case examples), and I forgot about that. But
yeah, I would expect an index too.

> 
>> There's a lot of stuff that could / should be improved on the current
>> patch. For (1) we might add support for more complex cases with
>> snowflake schemas [3] or with multiple fact tables. At the same time
>> (1)
>> needs to be very cheap, so that it does not regress every non-
>> starjoin
>> query.
> 
> The patch only considers the largest table as the fact table, which is
> a good heuristic of course. However, I'm curious if other approaches
> might work. For instance, could we consider the table involved in the
> most join conditions to be the fact table?
> 
> If you base it on the join conditions rather than the size of the
> table, then detection of the star join would be based purely on the
> query structure (not stats), which would be nice for predictability.
> 

Right, there may be other (possibly better) ways to detect the star join
shape. I was thinking about also requiring for foreign keys on the join
clauses - in DWH systems FKeys are sometimes omitted, which would break
the heuristics, but in OLTP it's common to still have them.

I think the cost of the heuristic will be an important metric - I don't
know if the number of join conditions is more expensive to determine
than what the patch does now, though.

>> But the bigger question is whether it makes sense to have such fast-
>> path
>> modes for certain query shapes.
> 
> We should explore what kinds of surprising cases it might create, or
> what maintenance headaches might come up with future planner changes.
> But the performance numbers you posted suggest that we should do
> something here.
> 

Yes, it seems like an interesting opportunity for starjoin queries. It's
a pretty common query pattern, but it also happens to be very expensive
to plan because the dimensions can be reordered almost arbitrarily.

regards

-- 
Tomas Vondra




pgsql-hackers by date:

Previous
From: Robert Treat
Date:
Subject: Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Next
From: Jeff Davis
Date:
Subject: Re: new commitfest transition guidance