Thread: Four table join with million records - performance improvement?

Four table join with million records - performance improvement?

From
Vijay Moses
Date:
Hi i have four sample tables ename, esal, edoj and esum
All of them have 1000000 records. Im running the following
query : select ename.eid, name, sal, doj, summary from
ename,esal,edoj,esum where ename.eid=esal.eid and ename.eid=edoj.eid
and ename.eid=esum.eid. Its a join of all four tables which returns
all 1 million records. The eid field in ename is a Primary Key and the
eid in all other tables are Foreign Keys. I have created an index for
all Foreign Keys. This query takes around 16 MINUTES to complete. Can
this time be reduced?
Thanks
Vijay

----------------------------------------------------------------

EXPLAIN OUTPUT

QUERY PLAN
Merge Join  (cost=647497.97..163152572.97 rows=25000025000000 width=80)
  Merge Cond: ("outer".eid = "inner".eid)
  ->  Merge Join  (cost=356059.69..75361059.69 rows=5000000000 width=44)
        Merge Cond: ("outer".eid = "inner".eid)
        ->  Sort  (cost=150295.84..152795.84 rows=1000000 width=8)
              Sort Key: edoj.eid
              ->  Seq Scan on edoj  (cost=0.00..15568.00 rows=1000000 width=8)
        ->  Sort  (cost=205763.84..208263.84 rows=1000000 width=36)
              Sort Key: esum.eid
              ->  Seq Scan on esum  (cost=0.00..31976.00 rows=1000000 width=36)
  ->  Sort  (cost=291438.28..293938.29 rows=1000002 width=48)
        Sort Key: ename.eid
        ->  Hash Join  (cost=26683.01..107880.23 rows=1000002 width=48)
              Hash Cond: ("outer".eid = "inner".eid)
              ->  Seq Scan on esal  (cost=0.00..21613.01 rows=1000001 width=12)
              ->  Hash  (cost=16370.01..16370.01 rows=1000001 width=36)
                    ->  Seq Scan on ename  (cost=0.00..16370.01
rows=1000001 width=36)

17 row(s)

Total runtime: 181.021 ms

----------------------------------------------------------------

EXPLAIN ANALYZE OUTPUT

QUERY PLAN

Merge Join  (cost=647497.97..163152572.97 rows=25000025000000
width=80) (actual time=505418.965..584981.013 rows=1000000 loops=1)
  Merge Cond: ("outer".eid = "inner".eid)
  ->  Merge Join  (cost=356059.69..75361059.69 rows=5000000000
width=44) (actual time=110394.376..138177.569 rows=1000000 loops=1)
        Merge Cond: ("outer".eid = "inner".eid)
        ->  Sort  (cost=150295.84..152795.84 rows=1000000 width=8)
(actual time=27587.622..31077.077 rows=1000000 loops=1)
              Sort Key: edoj.eid
              ->  Seq Scan on edoj  (cost=0.00..15568.00 rows=1000000
width=8) (actual time=144.000..10445.145 rows=1000000 loops=1)
        ->  Sort  (cost=205763.84..208263.84 rows=1000000 width=36)
(actual time=82806.646..90322.943 rows=1000000 loops=1)
              Sort Key: esum.eid
              ->  Seq Scan on esum  (cost=0.00..31976.00 rows=1000000
width=36) (actual time=20.312..29030.247 rows=1000000 loops=1)
  ->  Sort  (cost=291438.28..293938.29 rows=1000002 width=48) (actual
time=395024.482..426870.491 rows=1000001 loops=1)
        Sort Key: ename.eid
        ->  Hash Join  (cost=26683.01..107880.23 rows=1000002
width=48) (actual time=29234.472..198064.105 rows=1000001 loops=1)
              Hash Cond: ("outer".eid = "inner".eid)
              ->  Seq Scan on esal  (cost=0.00..21613.01 rows=1000001
width=12) (actual time=32.257..23999.163 rows=1000001 loops=1)
              ->  Hash  (cost=16370.01..16370.01 rows=1000001
width=36) (actual time=19362.095..19362.095 rows=0 loops=1)
                    ->  Seq Scan on ename  (cost=0.00..16370.01
rows=1000001 width=36) (actual time=26.744..13878.410 rows=1000001
loops=1)

Total runtime: 586226.831 ms

18 row(s)

Total runtime: 586,435.978 ms

----------------------------------------------------------------

Re: Four table join with million records - performance improvement?

From
Tom Lane
Date:
Vijay Moses <vijay.moses@gmail.com> writes:
> Hi i have four sample tables ename, esal, edoj and esum
> All of them have 1000000 records. Im running the following
> query : select ename.eid, name, sal, doj, summary from
> ename,esal,edoj,esum where ename.eid=esal.eid and ename.eid=edoj.eid
> and ename.eid=esum.eid. Its a join of all four tables which returns
> all 1 million records. The eid field in ename is a Primary Key and the
> eid in all other tables are Foreign Keys. I have created an index for
> all Foreign Keys. This query takes around 16 MINUTES to complete. Can
> this time be reduced?

The indexes will be completely useless for that sort of query; the
reasonable choices are sort/merge or hashjoin.  For either one, your
best way to speed it up is to increase sort_mem.

            regards, tom lane