Re: star schema and the optimizer - Mailing list pgsql-hackers
From | Marc Cousin |
---|---|
Subject | Re: star schema and the optimizer |
Date | |
Msg-id | 54F17668.902@gmail.com Whole thread Raw |
In response to | Re: star schema and the optimizer (Marc Cousin <cousinmarc@gmail.com>) |
Responses |
Re: star schema and the optimizer
|
List | pgsql-hackers |
On 27/02/2015 20:01, Marc Cousin wrote: > On 27/02/2015 19:45, Tom Lane wrote: >>> I wrote: >>>> I had actually thought that we'd fixed this type of problem in recent >>>> versions, and that you should be able to get a plan that would look >>>> like >> >>>> Nestloop >>>> -> scan dim1 >>>> -> Nestloop >>>> -> scan dim2 >>>> -> indexscan fact table using dim1.a and dim2.b >> >> After closer study, I think this is an oversight in commit >> e2fa76d80ba571d4de8992de6386536867250474, which quoth >> >> +It can be useful for the parameter value to be passed down through >> +intermediate layers of joins, for example: >> + >> + NestLoop >> + -> Seq Scan on A >> + Hash Join >> + Join Condition: B.Y = C.W >> + -> Seq Scan on B >> + -> Index Scan using C_Z_IDX on C >> + Index Condition: C.Z = A.X >> + >> +If all joins are plain inner joins then this is unnecessary, because >> +it's always possible to reorder the joins so that a parameter is used >> +immediately below the nestloop node that provides it. But in the >> +presence of outer joins, join reordering may not be possible, and then >> +this option can be critical. Before version 9.2, Postgres used ad-hoc >> >> This reasoning overlooked the fact that if we need parameters from >> more than one relation, and there's no way to join those relations >> to each other directly, then we have to allow passing the dim1 parameter >> down through the join to dim2. >> >> The attached patch seems to fix it (modulo the need for some updates >> in the README, and maybe a regression test). Could you see if this >> produces satisfactory plans for you? > > > From what I see, it's just perfect. I'll give it a more thorough look a > bit later, but it seems to be exactly what I was waiting for. > > Thanks a lot. > > Regards I gave it another look this morning. It works very well with the initial test schema. The situation is much improved forme. I still have one issue: I've extended the test to more than 2 dimensions. marc=# explain analyze SELECT * FROM dim1,dim2,dim3,dim4,facts WHERE facts.dim1=dim1.a and facts.dim2=dim2.a and facts.dim3=dim3.aand facts.dim4=dim4.a and dim1.b between 10 and 60 AND dim2.b between 30 and 50 and dim3.b=8526 and dim4.bbetween 70 and 90; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------Nested Loop (cost=15.00..957710.99 rows=1 width=200) (actual time=793.247..793.247 rows=0 loops=1) -> Nested Loop (cost=14.71..957704.75rows=1 width=192) (actual time=793.245..793.245 rows=0 loops=1) Join Filter: (facts.dim3 = dim3.a) Rows Removed by Join Filter: 942 -> Index Scan using idxdim32 on dim3 (cost=0.29..3300.29 rows=10width=8) (actual time=49.498..67.923 rows=1 loops=1) Filter: (b = 8526) Rows Removed byFilter: 99999 -> Materialize (cost=14.41..954262.56 rows=962 width=184) (actual time=0.552..724.988 rows=942 loops=1) -> Nested Loop (cost=14.41..954257.75 rows=962 width=184) (actual time=0.542..723.380 rows=942 loops=1) -> Index Scan using idxdim22 on dim2 (cost=0.29..3648.29 rows=192 width=16) (actual time=0.074..47.445rows=229 loops=1) Filter: ((b >= 30) AND (b <= 50)) RowsRemoved by Filter: 99771 -> Nested Loop (cost=14.12..4946.08 rows=501 width=168) (actual time=2.575..2.946rows=4 loops=229) -> Bitmap Heap Scan on dim1 (cost=13.43..573.60 rows=501 width=16)(actual time=0.170..0.647 rows=509 loops=229) Recheck Cond: ((b >= 10) AND (b <=60)) Heap Blocks: exact=78547 -> Bitmap Index Scan on idxdim11 (cost=0.00..13.30 rows=501 width=0) (actual time=0.112..0.112 rows=509 loops=229) Index Cond: ((b >= 10) AND (b <= 60)) -> Index Scan using idxfacts on facts (cost=0.69..8.72rows=1 width=152) (actual time=0.004..0.004 rows=0 loops=116561) Index Cond:((dim1 = dim1.a) AND (dim2 = dim2.a)) -> Index Scan using idxdim42 on dim4 (cost=0.29..6.24 rows=1 width=8) (neverexecuted) Index Cond: (a = facts.dim4) Filter: ((b >= 70) AND (b <= 90))Planning time: 8.092 msExecutiontime: 793.621 ms (25 lignes) marc=# \d facts Table « public.facts »Colonne | Type | Modificateurs ---------+--------+---------------dim1 | bigint | dim2 | bigint | dim3 | bigint | dim4 | bigint | dim5 | bigint| dim6 | bigint | dim7 | bigint | dim8 | bigint | dim9 | bigint | dim10 | bigint | data1 | text |data2 | text | data3 | text | data4 | text | data5 | text | data6 | text | Index : "idxfacts" btree (dim1, dim2, dim3, dim4, dim5, dim6, dim7, dim8, dim9, dim10) I hoped that the dim3=dim3.a condition could be used in the index scan on facts (I have chosen the 8526 value because itonly has one matching row). Maybe that's not possible, or not efficient, but I thought I'd rather ask. Regards.
pgsql-hackers by date: