Joining 2 tables with 300 million rows - Mailing list pgsql-performance

From Amit V Shah
Subject Joining 2 tables with 300 million rows
Date
Msg-id 0C072E7CC947D511AC9600A0CC7341200256D143@xeon400.tagaudit.com
Whole thread Raw
Responses Re: Joining 2 tables with 300 million rows
List pgsql-performance
Hi all,

First of all, please pardon if the question is dumb! Is it even feasible or
normal to do such a thing ! This query is needed by a webpage so needs to be
lightning fast. Anything beyond 2-3 seconds is unacceptable performance.

I have two tables

CREATE TABLE runresult
(
  id_runresult int8 NOT NULL,
  rundefinition_id_rundefinition int4 NOT NULL,
  measure_id_measure int4 NOT NULL,
  value float4 NOT NULL,
  "sequence" varchar(20) NOT NULL,
  CONSTRAINT pk_runresult_ars PRIMARY KEY (id_runresult),
)


CREATE TABLE runresult_has_catalogtable
(
  runresult_id_runresult int8 NOT NULL,
  catalogtable_id_catalogtable int4 NOT NULL,
  value int4 NOT NULL,
  CONSTRAINT pk_runresult_has_catalogtable PRIMARY KEY
(runresult_id_runresult, catalogtable_id_catalogtable, value)
  CONSTRAINT fk_temp FOREIGN KEY (runresult_id_runresult) REFERENCES
runresult(id_runresult) ON UPDATE RESTRICT ON DELETE RESTRICT
)

Each table has around 300 million records (will grow to probably billions).
Below is the query and the explain analyze --

explain analyze SELECT measure.description, runresult.value
FROM ((((rundefinition INNER JOIN runresult ON
rundefinition.id_rundefinition = runresult.rundefinition_id_rundefinition)
INNER JOIN runresult_has_catalogtable ON runresult.id_runresult =
runresult_has_catalogtable.runresult_id_runresult)
INNER JOIN runresult_has_catalogtable AS runresult_has_catalogtable_1 ON
runresult.id_runresult =
runresult_has_catalogtable_1.runresult_id_runresult)
INNER JOIN runresult_has_catalogtable AS runresult_has_catalogtable_2 ON
runresult.id_runresult =
runresult_has_catalogtable_2.runresult_id_runresult)
INNER JOIN measure ON runresult.measure_id_measure = measure.id_measure
WHERE (((runresult_has_catalogtable.catalogtable_id_catalogtable)=52)
AND ((runresult_has_catalogtable_1.catalogtable_id_catalogtable)=54)
AND ((runresult_has_catalogtable_2.catalogtable_id_catalogtable)=55)
AND ((runresult_has_catalogtable.value)=15806)
AND ((runresult_has_catalogtable_1.value)=1)
AND ((runresult_has_catalogtable_2.value) In (21,22,23,24))
AND ((rundefinition.id_rundefinition)=10106));

'Nested Loop  (cost=0.00..622582.70 rows=1 width=28) (actual
time=25.221..150.563 rows=22 loops=1)'
'  ->  Nested Loop  (cost=0.00..622422.24 rows=2 width=52) (actual
time=25.201..150.177 rows=22 loops=1)'
'        ->  Nested Loop  (cost=0.00..622415.97 rows=2 width=32) (actual
time=25.106..149.768 rows=22 loops=1)'
'              ->  Nested Loop  (cost=0.00..621258.54 rows=15 width=24)
(actual time=24.582..149.061 rows=30 loops=1)'
'                    ->  Index Scan using pk_rundefinition on rundefinition
(cost=0.00..3.86 rows=1 width=4) (actual time=0.125..0.147 rows=1 loops=1)'
'                          Index Cond: (id_rundefinition = 10106)'
'                    ->  Nested Loop  (cost=0.00..621254.54 rows=15
width=28) (actual time=24.443..148.784 rows=30 loops=1)'
'                          ->  Index Scan using
runresult_has_catalogtable_value on runresult_has_catalogtable
(cost=0.00..575069.35 rows=14437 width=8) (actual time=0.791..33.036
rows=10402 loops=1)'
'                                Index Cond: (value = 15806)'
'                                Filter: (catalogtable_id_catalogtable =
52)'
'                          ->  Index Scan using pk_runresult_ars on
runresult  (cost=0.00..3.19 rows=1 width=20) (actual time=0.007..0.007
rows=0 loops=10402)'
'                                Index Cond: (runresult.id_runresult =
"outer".runresult_id_runresult)'
'                                Filter: (10106 =
rundefinition_id_rundefinition)'
'              ->  Index Scan using runresult_has_catalogtable_id_runresult
on runresult_has_catalogtable runresult_has_catalogtable_1
(cost=0.00..76.65 rows=41 width=8) (actual time=0.015..0.017 rows=1
loops=30)'
'                    Index Cond:
(runresult_has_catalogtable_1.runresult_id_runresult =
"outer".runresult_id_runresult)'
'                    Filter: ((catalogtable_id_catalogtable = 54) AND (value
= 1))'
'        ->  Index Scan using pk_measure on measure  (cost=0.00..3.12 rows=1
width=28) (actual time=0.008..0.010 rows=1 loops=22)'
'              Index Cond: ("outer".measure_id_measure =
measure.id_measure)'
'  ->  Index Scan using runresult_has_catalogtable_id_runresult on
runresult_has_catalogtable runresult_has_catalogtable_2  (cost=0.00..79.42
rows=65 width=8) (actual time=0.007..0.010 rows=1 loops=22)'
'        Index Cond: (runresult_has_catalogtable_2.runresult_id_runresult =
"outer".runresult_id_runresult)'
'        Filter: ((catalogtable_id_catalogtable = 55) AND ((value = 21) OR
(value = 22) OR (value = 23) OR (value = 24)))'
'Total runtime: 150.863 ms'



pgsql-performance by date:

Previous
From: Evgeny Gridasov
Date:
Subject: slow COMMITs
Next
From: Scott Marlowe
Date:
Subject: Re: opinion on disk speed