Thread: index usage for query

index usage for query

From
Tomas Berndtsson
Date:
Hi,

I have a question about index usage in PostgreSQL 7.2.1 on Solaris.

I have three tables like this:

port: element   text     portno    int
primary key: element, portno
index: element

port_s: element   text       portno    int       sname     text       pri       int
primary key: element, portno, sname
index: element
index: element, portno

port_s_p: element   text         portno    int         sname     text         pname     text         value     text
primary key: element, portno, sname, pname
index: element, portno, sname


At first, I did the query like this:

SELECT po.portno,po.element,s.sname,pri,p.pname,value
FROM port po, port_s s LEFT OUTER JOIN port_s_p p
USING (element, portno, sname) WHERE po.element=s.element
AND po.portno=s.portno AND po.element LIKE 'lab-el1'
ORDER BY po.element,po.portno,pri,s.sname;

And got this query plan using EXPLAIN:

Sort  (cost=43690.55..43690.55 rows=26 width=157) ->  Merge Join  (cost=41757.20..43689.93 rows=26 width=157)       ->
IndexScan using idx_p_element on port po (cost=0.00..72.65 rows=22 width=67)       ->  Materialize
(cost=42682.50..42682.50rows=370111 width=90)             ->  Merge Join  (cost=41757.20..42682.50 rows=370111
width=90)                  ->  Sort  (cost=41755.93..41755.93 rows=370111 width=42)                         ->  Seq
Scanon port_s s (cost=0.00..7525.11 rows=370111 width=42)                   ->  Sort  (cost=1.27..1.27 rows=10
width=48)                        ->  Seq Scan on port_s_p p (cost=0.00..1.10 rows=10 width=48)
 

EXPLAIN



What I don't understand is why the index port_s(element,portno)
is not used here.


If I changed the query to:

SELECT po.portno,po.element,s.sname,pri,p.pname,value
FROM port po INNER JOIN port_s s USING (element, portno) LEFT
OUTER JOIN port_s_p p USING (element, portno, sname)
WHERE po.element LIKE 'lab-el1'
ORDER BY po.element,po.portno,pri,s.sname;

I.e. using INNER JOIN instead of the WHERE case to join port and
port_s. This query gave this plan:

NOTICE:  QUERY PLAN:

Sort  (cost=239.17..239.17 rows=26 width=157) ->  Merge Join  (cost=1.27..238.55 rows=26 width=157)       ->  Nested
Loop (cost=0.00..237.19 rows=26 width=109)             ->  Index Scan using idx_p_element on port po (cost=0.00..72.65
rows=22width=67)             ->  Index Scan using idx_p_s_element_portno on port_s s  (cost=0.00..7.38 rows=1 width=42)
     ->  Sort  (cost=1.27..1.27 rows=10 width=48)             ->  Seq Scan on port_s_p p (cost=0.00..1.10 rows=10
width=48)

EXPLAIN



Can someone explain why the index is used in the second query, but not
in the first?


Greetings,

Tomas


Re: index usage for query

From
Richard Huxton
Date:
On Tuesday 19 Nov 2002 9:10 am, Tomas Berndtsson wrote:
> Hi,
>
> I have a question about index usage in PostgreSQL 7.2.1 on Solaris.

>
> At first, I did the query like this:
>
> SELECT po.portno,po.element,s.sname,pri,p.pname,value
> FROM port po, port_s s LEFT OUTER JOIN port_s_p p
> USING (element, portno, sname) WHERE po.element=s.element
> AND po.portno=s.portno AND po.element LIKE 'lab-el1'
> ORDER BY po.element,po.portno,pri,s.sname;
>
> And got this query plan using EXPLAIN:
>
> Sort  (cost=43690.55..43690.55 rows=26 width=157)
>   ->  Merge Join  (cost=41757.20..43689.93 rows=26 width=157)
>         ->  Index Scan using idx_p_element on port po (cost=0.00..72.65
> rows=22 width=67) ->  Materialize  (cost=42682.50..42682.50 rows=370111
> width=90) ->  Merge Join  (cost=41757.20..42682.50 rows=370111 width=90) ->
>  Sort  (cost=41755.93..41755.93 rows=370111 width=42) ->  Seq Scan on
> port_s s (cost=0.00..7525.11 rows=370111 width=42) ->  Sort
> (cost=1.27..1.27 rows=10 width=48)
>                           ->  Seq Scan on port_s_p p (cost=0.00..1.10
> rows=10 width=48)

> What I don't understand is why the index port_s(element,portno)
> is not used here.

Well - the estimate of 370111 rows shows where we have our problem.

An explicit JOIN overrides the planner's normal order of operation. Looking at
the plan, I think what's happening is that OUTER JOIN is being done first to
"post_s s". Then it does your implicit join.

> If I changed the query to:
>
> SELECT po.portno,po.element,s.sname,pri,p.pname,value
> FROM port po INNER JOIN port_s s USING (element, portno) LEFT
> OUTER JOIN port_s_p p USING (element, portno, sname)
> WHERE po.element LIKE 'lab-el1'
> ORDER BY po.element,po.portno,pri,s.sname;
>
> I.e. using INNER JOIN instead of the WHERE case to join port and
> port_s. This query gave this plan:
>
> NOTICE:  QUERY PLAN:
>
> Sort  (cost=239.17..239.17 rows=26 width=157)
>   ->  Merge Join  (cost=1.27..238.55 rows=26 width=157)
>         ->  Nested Loop  (cost=0.00..237.19 rows=26 width=109)
>               ->  Index Scan using idx_p_element on port po
> (cost=0.00..72.65 rows=22 width=67) ->  Index Scan using
> idx_p_s_element_portno on port_s s  (cost=0.00..7.38 rows=1 width=42) ->
> Sort  (cost=1.27..1.27 rows=10 width=48)
>               ->  Seq Scan on port_s_p p (cost=0.00..1.10 rows=10 width=48)

Here you only have one join in two parts "port po" - "port_s s" and then
"port_s_p p".

It's the overriding that's hitting you here. If you've got time could you
rewrite it so that all the joins are implicit and let us know what that does
to the plan?

--  Richard Huxton