Thread: Trouble with LEFT JOIN using VIEWS.

Trouble with LEFT JOIN using VIEWS.

From
Piotr Gasidło
Date:
Hello,

I've just hit problem, that is unusual for me.

quaker=> \d sites
                                 Table "public.sites"
   Column   |       Type        |                     Modifiers

-----------+-------------------+----------------------------------------------------
  id        | integer           | not null default
nextval('sites_id_seq'::regclass)
  site_name | character varying | not null
  user_id   | integer           | not null
  extra     | integer           |
Indexes:
     "sites_pkey" PRIMARY KEY, btree (id)
     "sites_site_name_key_unique" UNIQUE, btree (site_name text_pattern_ops)
     "sites_user_id_key" btree (user_id)

quaker=> \d users
                                 Table "public.users"
   Column   |       Type        |                     Modifiers

-----------+-------------------+----------------------------------------------------
  id        | integer           | not null default
nextval('users_id_seq'::regclass)
  user_name | character varying | not null
  extra     | integer           |
Indexes:
     "users_pkey" PRIMARY KEY, btree (id)

Both tables filled with 100k records of random data. In users id is in
range from 1..100k, same in sites. In sites user_id is random, range
from 1..150k.

I've created views:

quaker=> \d users_secure
         View "public.users_secure"
   Column   |       Type        | Modifiers
-----------+-------------------+-----------
  id        | integer           |
  user_name | character varying |
View definition:
  SELECT users.id, users.user_name
    FROM users;

quaker=> \d users_secure_with_has_extra
  View "public.users_secure_with_has_extra"
   Column   |       Type        | Modifiers
-----------+-------------------+-----------
  id        | integer           |
  user_name | character varying |
  has_extra | boolean           |
View definition:
  SELECT users.id, users.user_name, users.extra IS NOT NULL AS has_extra
    FROM users;

Now, when I do simple query to find all data for sites matching
site_name like 'H3bh%' (there are at least one record in sites matching
this condition).

quaker=> explain analyze select s.site_name,u.user_name from
sites_secure s left join users_secure_with_has_extra u on u.id =
s.user_id where site_name like 'H3bh%' order by site_name limit 10;

QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=3897.02..3897.03 rows=2 width=44) (actual
time=430.326..430.331 rows=1 loops=1)
    ->  Sort  (cost=3897.02..3897.03 rows=2 width=44) (actual
time=430.321..430.323 rows=1 loops=1)
          Sort Key: sites.site_name
          ->  Nested Loop Left Join  (cost=0.00..3897.01 rows=2
width=44) (actual time=290.103..430.301 rows=1 loops=1)
                Join Filter: ("inner".id = "outer".user_id)
                ->  Index Scan using sites_site_name_key_unique on sites
  (cost=0.00..6.01 rows=1 width=16) (actual time=0.044..0.054 rows=1
loops=1)
                      Index Cond: (((site_name)::text ~>=~ 'H3bh'::text)
AND ((site_name)::text ~<~ 'H3bi'::text))
                      Filter: ((site_name)::text ~~ 'H3bh%'::text)
                ->  Seq Scan on users  (cost=0.00..1641.00 rows=100000
width=20) (actual time=0.007..245.406 rows=100000 loops=1)
  Total runtime: 430.432 ms
(10 rows)

When I resign from LEFT JOIN users_secure_with_has_extra, and put JOIN
instead I've got:

quaker=> explain analyze select s.site_name,u.user_name from
sites_secure s join users_secure_with_has_extra u on u.id = s.user_id
where site_name like 'H3bh%' order by site_name limit 10;

QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=9.05..9.06 rows=1 width=24) (actual time=0.112..0.118
rows=1 loops=1)
    ->  Sort  (cost=9.05..9.06 rows=1 width=24) (actual
time=0.106..0.108 rows=1 loops=1)
          Sort Key: sites.site_name
          ->  Nested Loop  (cost=0.00..9.04 rows=1 width=24) (actual
time=0.073..0.088 rows=1 loops=1)
                ->  Index Scan using sites_site_name_key_unique on sites
  (cost=0.00..6.01 rows=1 width=16) (actual time=0.044..0.050 rows=1
loops=1)
                      Index Cond: (((site_name)::text ~>=~ 'H3bh'::text)
AND ((site_name)::text ~<~ 'H3bi'::text))
                      Filter: ((site_name)::text ~~ 'H3bh%'::text)
                ->  Index Scan using users_pkey on users
(cost=0.00..3.02 rows=1 width=16) (actual time=0.019..0.023 rows=1 loops=1)
                      Index Cond: (users.id = "outer".user_id)
  Total runtime: 0.216 ms
(10 rows)

As explain shows PostgreSQL is using index scan on users, instead of seq
scan like in example above.

Now. When I use view with no has_extra field (important: field is a
simple function on extra field) I get expectable results. Both using
indexes.

quaker=> explain analyze select s.site_name,u.user_name from
sites_secure s left join users_secure u on u.id = s.user_id where
site_name like 'H3bh%' order by site_name limit 10;

QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=9.05..9.06 rows=1 width=24) (actual time=0.111..0.117
rows=1 loops=1)
    ->  Sort  (cost=9.05..9.06 rows=1 width=24) (actual
time=0.105..0.107 rows=1 loops=1)
          Sort Key: sites.site_name
          ->  Nested Loop Left Join  (cost=0.00..9.04 rows=1 width=24)
(actual time=0.072..0.087 rows=1 loops=1)
                ->  Index Scan using sites_site_name_key_unique on sites
  (cost=0.00..6.01 rows=1 width=16) (actual time=0.043..0.049 rows=1
loops=1)
                      Index Cond: (((site_name)::text ~>=~ 'H3bh'::text)
AND ((site_name)::text ~<~ 'H3bi'::text))
                      Filter: ((site_name)::text ~~ 'H3bh%'::text)
                ->  Index Scan using users_pkey on users
(cost=0.00..3.02 rows=1 width=16) (actual time=0.019..0.022 rows=1 loops=1)
                      Index Cond: (users.id = "outer".user_id)
  Total runtime: 0.216 ms
(10 rows)

quaker=> explain analyze select s.site_name,u.user_name from
sites_secure s join users_secure u on u.id = s.user_id where site_name
like 'H3bh%' order by site_name limit 10;

QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=9.05..9.06 rows=1 width=24) (actual time=0.109..0.115
rows=1 loops=1)
    ->  Sort  (cost=9.05..9.06 rows=1 width=24) (actual
time=0.104..0.106 rows=1 loops=1)
          Sort Key: sites.site_name
          ->  Nested Loop  (cost=0.00..9.04 rows=1 width=24) (actual
time=0.071..0.086 rows=1 loops=1)
                ->  Index Scan using sites_site_name_key_unique on sites
  (cost=0.00..6.01 rows=1 width=16) (actual time=0.042..0.048 rows=1
loops=1)
                      Index Cond: (((site_name)::text ~>=~ 'H3bh'::text)
AND ((site_name)::text ~<~ 'H3bi'::text))
                      Filter: ((site_name)::text ~~ 'H3bh%'::text)
                ->  Index Scan using users_pkey on users
(cost=0.00..3.02 rows=1 width=16) (actual time=0.018..0.021 rows=1 loops=1)
                      Index Cond: (users.id = "outer".user_id)
  Total runtime: 0.214 ms
(10 rows)

Why?

quaker=> select version();
                                                         version

-----------------------------------------------------------------------------------------------------------------------
  PostgreSQL 8.1.10 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.3
20070831 (prerelease) (Ubuntu 4.1.2-16ubuntu1)
(1 row)



Re: Trouble with LEFT JOIN using VIEWS.

From
Tom Lane
Date:
=?ISO-8859-2?Q?Piotr_Gasid=B3o?= <quaker@barbara.eu.org> writes:
> I've just hit problem, that is unusual for me.

> View definition:
>   SELECT users.id, users.user_name, users.extra IS NOT NULL AS has_extra
>     FROM users;

What you've got here is a non-nullable target list, which creates an
optimization fence when used in the nullable side of an outer join.
The problem is that has_extra should read as NULL in the query output
for a sites_secure row that has no match in users_secure_with_has_extra,
but making users.extra go to null will not make that happen, so there's
a constraint on where the expression can be evaluated.  The current
planner has no way to deal with that except by restricting the plan
structure.

We have some ideas about how to fix this, but don't hold your breath
... it's going to take major surgery on the planner, AFAICS.

            regards, tom lane