Thread: LEFT JOINs takes forever...almost.

LEFT JOINs takes forever...almost.

From
"Niclas Gustafsson"
Date:
Hi, help needed!

I'm having a slight performace problem here, i'm using pg 7.2.1 on a
Linux(RH.7.2).
My query that is causing me problems looks like this:

SELECT v.*,vd.vessel_name,vd.start_date as vd_start_date,vd.end_date as
vd_end_date,d.firstname||' '||d.lastname as customer_name
FROM vessel as v
LEFT JOIN vesseldebtor as vd ON v.callsign=vd.callsign
LEFT JOIN comment as c ON v.comment_id=c.comment_id
LEFT JOIN debtor as d ON vd.customer_id=d.customer_id
WHERE vd.customer_id = '35034694'
ORDER BY vd.start_date ASC

As we see there is four tables in the query, the attentive reader
notices that
I don't select anything from the table comment, but I'm going to do that
later
on so please disregard that for now.

The four tables are:
vessel, vesseldebtor, debtor and comment.

When running a Explain Anaylze on the query this is returned:

Sort  (cost=24333.42..24333.42 rows=41392 width=157) (actual
time=3181.32..3181.33 rows=3 loops=1)
  ->  Hash Join  (cost=3083.88..15830.55 rows=41392 width=157) (actual
time=2943.43..3181.23 rows=3 loops=1)
        ->  Hash Join  (cost=1869.92..11545.77 rows=41392 width=153)
(actual time=2357.00..2672.09 rows=3 loops=1)
              ->  Merge Join  (cost=0.00..6737.57 rows=41392 width=128)
(actual time=1683.54..1787.73 rows=3 loops=1)
                    ->  Index Scan using vessel_pkey on vessel v
(cost=0.00..2498.30 rows=41392 width=95) (actual time=0.22..617.74
rows=41392 loops=1)
                    ->  Index Scan using vesseldebtor_pkey on
vesseldebtor vd  (cost=0.00..3289.35 rows=48368 width=33) (actual
time=0.21..495.45 rows=48368 loops=1)
              ->  Hash  (cost=1504.94..1504.94 rows=41194 width=25)
(actual time=408.73..408.73 rows=0 loops=1)
                    ->  Seq Scan on debtor d  (cost=0.00..1504.94
rows=41194 width=25) (actual time=0.06..184.87 rows=41194 loops=1)
        ->  Hash  (cost=979.16..979.16 rows=39516 width=4) (actual
time=221.50..221.50 rows=0 loops=1)
              ->  Seq Scan on comment c  (cost=0.00..979.16 rows=39516
width=4) (actual time=0.05..82.59 rows=39516 loops=1)
Total runtime: 3181.60 msec

Whereas a similar query using INNER joins like this:

EXPLAIN ANALYZE SELECT v.*,vd.vessel_name,vd.start_date as
vd_start_date,vd.end_date as
vd_end_date,d.firstname||' '||d.lastname as customer_name
FROM vessel as v, vesseldebtor as vd, comment as c, debtor as d
WHERE
v.callsign=vd.callsign AND
v.comment_id=c.comment_id AND
vd.customer_id=d.customer_id AND
vd.customer_id = '35034694'
ORDER BY vd.start_date ASC

And the output:

Sort  (cost=54.29..54.29 rows=4 width=157) (actual time=1.39..1.39
rows=3 loops=1)
  ->  Nested Loop  (cost=0.00..54.25 rows=4 width=157) (actual
time=0.83..1.31 rows=3 loops=1)
        ->  Nested Loop  (cost=0.00..40.40 rows=4 width=132) (actual
time=0.63..1.04 rows=3 loops=1)
              ->  Nested Loop  (cost=0.00..29.27 rows=4 width=128)
(actual time=0.44..0.70 rows=3 loops=1)
                    ->  Index Scan using vd_customer_id_idx on
vesseldebtor vd  (cost=0.00..8.45 rows=4 width=33) (actual
time=0.22..0.31 rows=3 loops=1)
                    ->  Index Scan using vessel_pkey on vessel v
(cost=0.00..5.71 rows=1 width=95) (actual time=0.11..0.12 rows=1
loops=3)
              ->  Index Scan using comment_pkey on comment c
(cost=0.00..3.05 rows=1 width=4) (actual time=0.10..0.10 rows=1 loops=3)
        ->  Index Scan using debtor_pkey on debtor d  (cost=0.00..3.79
rows=1 width=25) (actual time=0.07..0.07 rows=1 loops=3)
Total runtime: 1.64 msec

EXPLAIN

Can someone please explain why the first query is 3000 times slower?
Surely there must be a any way to speed it up?
If the solution is not that trivial, someone please tell me so that I
can describe the columns and indexes more in detail.


Regards,

Niclas Gustafsson
CodeSense AB




Re: LEFT JOINs takes forever...almost.

From
"Nikolaus Dilger"
Date:
Niclas,

You must be pretty impatient if 3 seconds is forever...

Anyway, the reason why the outer join takes so much
longer than the inner join is pretty simple.  The
database has to look through many more rows for your
query.  The more outer joins you have the worse it
gets.  And your explain plan clearly shows that.

Regards
Nikolaus


From the PostgreSQL documentation
LEFT [ OUTER ] JOIN

First, an INNER JOIN is performed. Then, for each row
in T1 that does not satisfy the join condition with any
row in T2, an additional joined row is returned with
null fields in the columns from T2.

Tip: The joined table unconditionally has a row for
each row in T1.




On Fri, 13 September 2002, "Niclas Gustafsson" wrote:

>
> Hi, help needed!
>
> I'm having a slight performace problem here, i'm using
> pg 7.2.1 on a
> Linux(RH.7.2).
> My query that is causing me problems looks like this:
>
> SELECT v.*,vd.vessel_name,vd.start_date as
> vd_start_date,vd.end_date as
> vd_end_date,d.firstname||' '||d.lastname as
> customer_name
> FROM vessel as v
> LEFT JOIN vesseldebtor as vd ON
v.callsign=vd.callsign
> LEFT JOIN comment as c ON v.comment_id=c.comment_id
> LEFT JOIN debtor as d ON vd.customer_id=d.customer_id
> WHERE vd.customer_id = '35034694'
> ORDER BY vd.start_date ASC
>
> As we see there is four tables in the query, the
> attentive reader
> notices that
> I don't select anything from the table comment, but
I'm
> going to do that
> later
> on so please disregard that for now.
>
> The four tables are:
> vessel, vesseldebtor, debtor and comment.
>
> When running a Explain Anaylze on the query this is
> returned:
>
> Sort  (cost=24333.42..24333.42 rows=41392 width=157)
> (actual
> time=3181.32..3181.33 rows=3 loops=1)
>   ->  Hash Join  (cost=3083.88..15830.55 rows=41392
> width=157) (actual
> time=2943.43..3181.23 rows=3 loops=1)
>         ->  Hash Join  (cost=1869.92..11545.77
> rows=41392 width=153)
> (actual time=2357.00..2672.09 rows=3 loops=1)
>               ->  Merge Join  (cost=0.00..6737.57
> rows=41392 width=128)
> (actual time=1683.54..1787.73 rows=3 loops=1)
>                     ->  Index Scan using
vessel_pkey on
> vessel v
> (cost=0.00..2498.30 rows=41392 width=95) (actual
> time=0.22..617.74
> rows=41392 loops=1)
>                     ->  Index Scan using
> vesseldebtor_pkey on
> vesseldebtor vd  (cost=0.00..3289.35 rows=48368
> width=33) (actual
> time=0.21..495.45 rows=48368 loops=1)
>               ->  Hash  (cost=1504.94..1504.94
> rows=41194 width=25)
> (actual time=408.73..408.73 rows=0 loops=1)
>                     ->  Seq Scan on debtor d
> (cost=0.00..1504.94
> rows=41194 width=25) (actual time=0.06..184.87
> rows=41194 loops=1)
>         ->  Hash  (cost=979.16..979.16 rows=39516
> width=4) (actual
> time=221.50..221.50 rows=0 loops=1)
>               ->  Seq Scan on comment c
> (cost=0.00..979.16 rows=39516
> width=4) (actual time=0.05..82.59 rows=39516 loops=1)
> Total runtime: 3181.60 msec
>
> Whereas a similar query using INNER joins like this:
>
> EXPLAIN ANALYZE SELECT
v.*,vd.vessel_name,vd.start_date
> as
> vd_start_date,vd.end_date as
> vd_end_date,d.firstname||' '||d.lastname as
> customer_name
> FROM vessel as v, vesseldebtor as vd, comment as c,
> debtor as d
> WHERE
> v.callsign=vd.callsign AND
> v.comment_id=c.comment_id AND
> vd.customer_id=d.customer_id AND
> vd.customer_id = '35034694'
> ORDER BY vd.start_date ASC
>
> And the output:
>
> Sort  (cost=54.29..54.29 rows=4 width=157) (actual
> time=1.39..1.39
> rows=3 loops=1)
>   ->  Nested Loop  (cost=0.00..54.25 rows=4
width=157)
> (actual
> time=0.83..1.31 rows=3 loops=1)
>         ->  Nested Loop  (cost=0.00..40.40 rows=4
> width=132) (actual
> time=0.63..1.04 rows=3 loops=1)
>               ->  Nested Loop  (cost=0.00..29.27
rows=4
> width=128)
> (actual time=0.44..0.70 rows=3 loops=1)
>                     ->  Index Scan using
> vd_customer_id_idx on
> vesseldebtor vd  (cost=0.00..8.45 rows=4 width=33)
> (actual
> time=0.22..0.31 rows=3 loops=1)
>                     ->  Index Scan using
vessel_pkey on
> vessel v
> (cost=0.00..5.71 rows=1 width=95) (actual
> time=0.11..0.12 rows=1
> loops=3)
>               ->  Index Scan using comment_pkey on
> comment c
> (cost=0.00..3.05 rows=1 width=4) (actual
> time=0.10..0.10 rows=1 loops=3)
>         ->  Index Scan using debtor_pkey on debtor
d
> (cost=0.00..3.79
> rows=1 width=25) (actual time=0.07..0.07 rows=1
loops=3)
> Total runtime: 1.64 msec
>
> EXPLAIN
>
> Can someone please explain why the first query is 3000
> times slower?
> Surely there must be a any way to speed it up?
> If the solution is not that trivial, someone please
> tell me so that I
> can describe the columns and indexes more in detail.
>
>
> Regards,
>
> Niclas Gustafsson
> CodeSense AB
>
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to
> majordomo@postgresql.org

Re: LEFT JOINs takes forever...almost.

From
Tom Lane
Date:
"Niclas Gustafsson" <niclas.gustafsson@codesense.com> writes:
> Can someone please explain why the first query is 3000 times slower?
> Surely there must be a any way to speed it up?

You're forcing a poor join order.  See
http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/explicit-joins.html

            regards, tom lane