Re: star schema and the optimizer - Mailing list pgsql-hackers

From Marc Cousin
Subject Re: star schema and the optimizer
Date
Msg-id 54F091B9.9060602@gmail.com
Whole thread Raw
In response to Re: star schema and the optimizer  (Marc Cousin <cousinmarc@gmail.com>)
List pgsql-hackers
On 27/02/2015 15:27, Marc Cousin wrote:
> On 27/02/2015 15:08, Tom Lane wrote:
>> Marc Cousin <cousinmarc@gmail.com> writes:
>>> So I gave a look at the optimizer's code to try to understand why I got this problem. If I understand correctly,
theoptimizer won't do cross joins, except if it has no choice.
 
>>
>> That's right, and as you say, the planning-speed consequences of doing
>> otherwise would be disastrous.  However, all you need to help it find the
>> right plan is some dummy join condition between the dimension tables,
>> which will allow the join path you want to be considered.  Perhaps you
>> could do something like
>>
>> SELECT * FROM dim1,dim2,facts WHERE facts.dim1=dim1.a and facts.dim2=dim2.a and dim1.b=12 AND dim2.b=17 and
(dim1.a+dim2.a)is not null;
 
> 
> No I can't. I cannot rewrite the query at all, in my context.
> 
> 
> What do you mean by disastrous ?
> 
> I've given it a few tries here, and with 8 joins (same model, 7
> dimensions), planning time is around 100ms. At least in my context, it's
> well worth the planning time, to save minutes of execution.
> 
> I perfectly understand that it's not something that should be "by
> default", that would be crazy. But in a datawarehouse, it seems to me
> that accepting one, or even a few seconds of planning time to save
> minutes of execution is perfectly legetimate.
> 

I have given it a bit more thought. Could it be possible, to mitigate
this, to permit only a few (few being to define) cross joins ? Still
optional, of course, it still has an important cost. Only allowing cross
joins for the first 3 levels, and keeping this to left-right sided
joins, I can plan up to 11 joins on my small test machine in 500ms
(instead of 150ms with the unpatched one), and get a "good plan", good
meaning 100 times faster.

Regards



pgsql-hackers by date:

Previous
From: Jan Urbański
Date:
Subject: Re: GSoC idea - Simulated annealing to search for query plans
Next
From: Tom Lane
Date:
Subject: Re: star schema and the optimizer