Thread: inefficient query plan with left joined view

inefficient query plan with left joined view

From
Andreas Pflug
Date:
What I'm doing on V7.3.1:

select t1.keycol, t2.keycol
  from  tab1 t1
  LEFT join myview t2 on t1.keycol=t2.keycol
where t1.keycol=1000001

t1 has 100 rows, t2 has 4000, both with keycol as PK.

the view is created as
CREATE myview AS SELECT keycol, 22::integer as calc_col FROM tab2

The query plan will show an ugly subquery scan on all tab2 rows. If the
view is created without calculated columns, the query plan looks as
expected showing and index scan on tab2 with the correct condition,
inner join will always be ok.

In real life, the view consists of a lot more tables, and the tables may
contain >1,000,000 rows so you may imagine the performance...



Re: inefficient query plan with left joined view

From
Tom Lane
Date:
Andreas Pflug <Andreas.Pflug@web.de> writes:
> What I'm doing on V7.3.1:
> select t1.keycol, t2.keycol
>   from  tab1 t1
>   LEFT join myview t2 on t1.keycol=t2.keycol
> where t1.keycol=1000001

> the view is created as
> CREATE myview AS SELECT keycol, 22::integer as calc_col FROM tab2

The subquery isn't pulled up because it doesn't pass the
has_nullable_targetlist test in src/backend/optimizer/plan/planner.c.
If we did flatten it, then references to calc_col wouldn't correctly
go to NULL when the LEFT JOIN should make them do so --- they'd be
22 all the time.

As the notes in that routine say, it could be made smarter: strict
functions of nullable variables could be allowed.  So if your real
concern is not '22' but something like 'othercol + 22' then this is
fixable.

            regards, tom lane

Re: inefficient query plan with left joined view

From
Andreas Pflug
Date:
Tom Lane wrote:

>The subquery isn't pulled up because it doesn't pass the
>has_nullable_targetlist test in src/backend/optimizer/plan/planner.c.
>If we did flatten it, then references to calc_col wouldn't correctly
>go to NULL when the LEFT JOIN should make them do so --- they'd be
>22 all the time.
>
>As the notes in that routine say, it could be made smarter: strict
>functions of nullable variables could be allowed.  So if your real
>concern is not '22' but something like 'othercol + 22' then this is
>fixable.
>
>            regards, tom lane
>
>
>
Tom,

actually my views do use calculated columns (mostly concated strings,
e.g. full name from title/1st/last name). As the example shows even
columns that are never used will be taken into account when checking
has_nullable_targetlist. Unfortunately I have a lot of views containing
views which containing.... delivering a lot more columns than needed.
But they are checked anyway...

I'd expect the parser to look at the join construction only to find out
about available data. Why should the selected (and even unselected)
columns be evaluated if the join delivers no result? Maybe this can be
achieved by checking only JOIN ON/WHERE columns with
has_nullable_targetlist?



Regards,
Andreas


Re: inefficient query plan with left joined view

From
Tom Lane
Date:
Andreas Pflug <Andreas.Pflug@web.de> writes:
> As the example shows even
> columns that are never used will be taken into account when checking
> has_nullable_targetlist.

It's not really practical to do otherwise, as the code that needs to
check this doesn't have access to a list of the columns actually used.
Even if we kluged things up enough to make it possible to find that out,
that would merely mean that *some* of your queries wouldn't have a
problem.

What about improving the intelligence of the nullability check --- or do
you have non-strict expressions in there?

            regards, tom lane