Thread: subselect instead of a view...

subselect instead of a view...

From
"Dan Langille"
Date:
Create view WLE_pathnames as 
SELECT E.name, EP.pathname  FROM element             E,       element_pathnames   EP,       watch_list_element  WLE
WHEREWLE.watch_list_id = 3724   and WLE.element_id    = E.id   and E.id              = EP.id;   name     |
pathname
-------------+-----------------------------euchre      | /ports/games/euchrereallyslick | /ports/graphics/reallyslick

The above query is pretty well optimized:

Nested Loop  (cost=0.00..647.08 rows=62 width=61) (actual 
time=0.99..1.19 rows=2 loops=1) ->  Nested Loop  (cost=0.00..437.06 rows=62 width=20) (actual 
time=0.66..0.78 rows=2 loops=1)       ->  Index Scan using watch_list_element_pkey on 
watch_list_element wle  (cost=0.00..229.64 rows=62 width=4) (actual 
time=0.34..0.36 rows=2 loops=1)       ->  Index Scan using element_pkey on element e  
(cost=0.00..3.34 rows=1 width=16) (actual time=0.16..0.17 rows=1 
loops=2) ->  Index Scan using element_pathnames_pkey on element_pathnames ep (cost=0.00..3.38 rows=1 width=41) (actual
time=0.16..0.17rows=1 
 
loops=2)
Total runtime: 1.44 msec

Now I want to get all the stuff from element_pathnames like
pathname || '/%'.  Does that make sense?

Essentially, I want this using the above view:

explain analyzeSELECT E.id, CLE.commit_log_id, E.name, EP.pathname   FROM element             E,
element_pathnames  EP,        commit_log_elements CLE,        WLE_pathnames       WLEP  WHERE E.id              = EP.id
  AND EP.pathname       like WLEP.pathname || '/%'    AND CLE.element_id    = E.id
 
ORDER BY EP.pathname;

I know this can be done better, I just can't figure out how. I keep
thinking of a subselect but I'm totally blocked.  It must be bed 
time.

Sort  (cost=285579.85..285579.85 rows=67012 width=114) (actual 
time=9463.95..9464.01 rows=11 loops=1) ->  Hash Join  (cost=264060.42..272748.13 rows=67012 width=114) 
(actual time=9154.69..9463.55 rows=11 loops=1)       ->  Seq Scan on commit_log_elements cle  (cost=0.00..3936.75 
rows=216575 width=8) (actual time=0.18..1762.38 rows=216575 loops=1)       ->  Hash  (cost=263370.92..263370.92
rows=36997width=106) 
 
(actual time=5716.62..5716.62 rows=0 loops=1)             ->  Hash Join  (cost=258032.99..263370.92 rows=36997 
width=106) (actual time=5524.78..5695.47 rows=10 loops=1)                   ->  Seq Scan on element e
(cost=0.00..2286.70
 
rows=119570 width=16) (actual time=0.15..892.40 rows=119570 loops=1)                   ->  Hash
(cost=257416.50..257416.50rows=36997 
 
width=90) (actual time=3481.05..3481.05 rows=0 loops=1)                         ->  Nested Loop  (cost=0.00..257416.50

rows=36997 width=90) (actual time=1847.01..3465.54 rows=10 loops=1)                               ->  Nested Loop
(cost=0.00..647.08
 
rows=62 width=49) (actual time=0.96..1.50 rows=2 loops=1)                                     ->  Nested Loop  
(cost=0.00..437.06 rows=62 width=8) (actual time=0.64..0.94 rows=2 
loops=1)                                           ->  Index Scan using 
watch_list_element_pkey on watch_list_element wle  (cost=0.00..229.64 
rows=62 width=4) (actual time=0.34..0.37 rows=2 loops=1)                                           ->  Index Scan using

element_pkey on element e  (cost=0.00..3.34 rows=1 width=4) (actual 
time=0.21..0.22 rows=1 loops=2)                                     ->  Index Scan using 
element_pathnames_pkey on element_pathnames ep  (cost=0.00..3.38 
rows=1 width=41) (actual time=0.21..0.23 rows=1 loops=2)                               ->  Seq Scan on
element_pathnamesep  
 
(cost=0.00..2355.70 rows=119570 width=41) (actual time=0.08..858.74 
rows=119570 loops=2)
Total runtime: 9464.51 msec

Clues please?
-- 
Dan Langille : http://www.langille.org/



Re: subselect instead of a view...

From
"Dan Langille"
Date:
On 25 Nov 2002 at 22:15, Dan Langille wrote:

> I know this can be done better, I just can't figure out how. I keep
> thinking of a subselect but I'm totally blocked.  It must be bed time.

It's odd what reading the paper, relaxing with a book, and then 
trying to sleep can generate.  There I was, almost dropping off, when 
I realised I needed this:

SELECT E.name, EP.pathname, E.id, EP2.pathname, CL.id  FROM element             E,       element_pathnames   EP,
watch_list_element WLE,       element_pathnames   EP2,       element             E2,       commit_log_elements CLE,
 commit_log          CL WHERE WLE.watch_list_id = 3724   and WLE.element_id    = E.id   and E.id              = EP.id
andEP2.pathname like   EP.pathname || '/%'   AND EP2.id            = E2.id   AND E2.id             = CLE.element_id
ANDCLE.commit_log_id = CL.id;
 

I am still suspicous of that like.  It seems to be the performance 
killer here.  There is an index which can be used:

# explain select * from element_pathnames WHERE pathname like 'abc%';
NOTICE:  QUERY PLAN:

Index Scan using element_pathnames_pathname on element_pathnames  
(cost=0.00..5.80 rows=1 width=41)

But in the main query, it doesn't get picked up.  The explain appears 
below (and at http://www.freshports.org/tmp/explain.txt which will be 
easier to read than this text-wrapped version).  There are quite a 
few sequential scans there.  I'm confused as to why the indexes are 
not being used.  A "vacuum analyze" has been run.

Thanks.

Hash Join  (cost=266574.28..279596.82 rows=67012 width=118) ->  Hash Join  (cost=263685.03..272372.74 rows=67012
width=114)      ->  Seq Scan on commit_log_elements cle  (cost=0.00..3936.75 
 
rows=216575 width=8)       ->  Hash  (cost=262995.54..262995.54 rows=36997 width=106)             ->  Hash Join
(cost=2994.62..262995.54rows=36997 
 
width=106)                   ->  Nested Loop  (cost=0.00..257416.50 rows=36997 
width=102)                         ->  Nested Loop  (cost=0.00..647.08 rows=62 
width=61)                               ->  Nested Loop  (cost=0.00..437.06 
rows=62 width=20)                                     ->  Index Scan using 
watch_list_element_pkey on watch_list_element wle  (cost=0.00..229.64 
rows=62 width=4)                                     ->  Index Scan using 
element_pkey on element e  (cost=0.00..3.34 rows=1 width=16)                               ->  Index Scan using 
element_pathnames_pkey on element_pathnames ep  (cost=0.00..3.38 
rows=1 width=41)                         ->  Seq Scan on element_pathnames ep2  
(cost=0.00..2355.70 rows=119570 width=41)                   ->  Hash  (cost=2286.70..2286.70 rows=119570 
width=4)                         ->  Seq Scan on element e2  
(cost=0.00..2286.70 rows=119570 width=4) ->  Hash  (cost=2543.20..2543.20 rows=58420 width=4)       ->  Seq Scan on
commit_logcl  (cost=0.00..2543.20 rows=58420 
 
width=4)
-- 
Dan Langille : http://www.langille.org/



Re: subselect instead of a view...

From
Tom Lane
Date:
"Dan Langille" <dan@langille.org> writes:
> SELECT E.name, EP.pathname, E.id, EP2.pathname, CL.id
> ...
>     and EP2.pathname like   EP.pathname || '/%'

> I am still suspicous of that like.  It seems to be the performance 
> killer here.  There is an index which can be used:

It won't be, though.  The LIKE-to-indexscan transformation happens at
plan time, and that means it can only happen if the pattern is a
constant.  Which it surely will not be in your example.
        regards, tom lane


Re: subselect instead of a view...

From
"Dan Langille"
Date:
On 26 Nov 2002 at 0:29, Tom Lane wrote:

> "Dan Langille" <dan@langille.org> writes:
> > SELECT E.name, EP.pathname, E.id, EP2.pathname, CL.id
> > ...
> >     and EP2.pathname like   EP.pathname || '/%'
> 
> > I am still suspicous of that like.  It seems to be the performance
> > killer here.  There is an index which can be used:
> 
> It won't be, though.  The LIKE-to-indexscan transformation happens at
> plan time, and that means it can only happen if the pattern is a
> constant.  Which it surely will not be in your example.

Thanks.  I'll see if I can come up with something else to do this.
-- 
Dan Langille : http://www.langille.org/