Thread: inefficient query plan with left joined view
What I'm doing on V7.3.1: select t1.keycol, t2.keycol from tab1 t1 LEFT join myview t2 on t1.keycol=t2.keycol where t1.keycol=1000001 t1 has 100 rows, t2 has 4000, both with keycol as PK. the view is created as CREATE myview AS SELECT keycol, 22::integer as calc_col FROM tab2 The query plan will show an ugly subquery scan on all tab2 rows. If the view is created without calculated columns, the query plan looks as expected showing and index scan on tab2 with the correct condition, inner join will always be ok. In real life, the view consists of a lot more tables, and the tables may contain >1,000,000 rows so you may imagine the performance...
Andreas Pflug <Andreas.Pflug@web.de> writes: > What I'm doing on V7.3.1: > select t1.keycol, t2.keycol > from tab1 t1 > LEFT join myview t2 on t1.keycol=t2.keycol > where t1.keycol=1000001 > the view is created as > CREATE myview AS SELECT keycol, 22::integer as calc_col FROM tab2 The subquery isn't pulled up because it doesn't pass the has_nullable_targetlist test in src/backend/optimizer/plan/planner.c. If we did flatten it, then references to calc_col wouldn't correctly go to NULL when the LEFT JOIN should make them do so --- they'd be 22 all the time. As the notes in that routine say, it could be made smarter: strict functions of nullable variables could be allowed. So if your real concern is not '22' but something like 'othercol + 22' then this is fixable. regards, tom lane
Tom Lane wrote: >The subquery isn't pulled up because it doesn't pass the >has_nullable_targetlist test in src/backend/optimizer/plan/planner.c. >If we did flatten it, then references to calc_col wouldn't correctly >go to NULL when the LEFT JOIN should make them do so --- they'd be >22 all the time. > >As the notes in that routine say, it could be made smarter: strict >functions of nullable variables could be allowed. So if your real >concern is not '22' but something like 'othercol + 22' then this is >fixable. > > regards, tom lane > > > Tom, actually my views do use calculated columns (mostly concated strings, e.g. full name from title/1st/last name). As the example shows even columns that are never used will be taken into account when checking has_nullable_targetlist. Unfortunately I have a lot of views containing views which containing.... delivering a lot more columns than needed. But they are checked anyway... I'd expect the parser to look at the join construction only to find out about available data. Why should the selected (and even unselected) columns be evaluated if the join delivers no result? Maybe this can be achieved by checking only JOIN ON/WHERE columns with has_nullable_targetlist? Regards, Andreas
Andreas Pflug <Andreas.Pflug@web.de> writes: > As the example shows even > columns that are never used will be taken into account when checking > has_nullable_targetlist. It's not really practical to do otherwise, as the code that needs to check this doesn't have access to a list of the columns actually used. Even if we kluged things up enough to make it possible to find that out, that would merely mean that *some* of your queries wouldn't have a problem. What about improving the intelligence of the nullability check --- or do you have non-strict expressions in there? regards, tom lane