Join runs for > 10 hours and then fills up >1.3TB of disk space - Mailing list pgsql-performance

Hi List;

I have a table with 9,961,914 rows in it (see the describe of
bigtab_stats_fact_tmp14 below)

I also have a table with 7,785 rows in it (see the describe of
xsegment_dim below)

I'm running the join shown below and it takes > 10 hours and
eventually runs out of disk space on a 1.4TB file system

I've included below a describe of both tables, the join and an explain
plan, any help / suggestions would be much appreciated !

I need to get this beast to run as quickly as possible (without
filling up my file system)


Thanks in advance...











select
f14.xpublisher_dim_id,
f14.xtime_dim_id,
f14.xlocation_dim_id,
f14.xreferrer_dim_id,
f14.xsite_dim_id,
f14.xsystem_cfg_dim_id,
f14.xaffiliate_dim_id,
f14.customer_id,
pf_dts_id,
episode_id,
sessionid,
bytes_received,
bytes_transmitted,
total_played_time_sec,
segdim.xsegment_dim_id as episode_level_segid
from
bigtab_stats_fact_tmp14 f14,
xsegment_dim segdim
where
f14.customer_id = segdim.customer_srcid
and f14.show_id = segdim.show_srcid
and f14.season_id = segdim.season_srcid
and f14.episode_id = segdim.episode_srcid
and segdim.segment_srcid is NULL;






QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
Merge Cond: ((segdim.episode_srcid = f14.episode_id) AND
(segdim.customer_srcid = f14.customer_id) AND (segdim.show_srcid =
f14.show_id) AND (segdim.season_srcid = f14.season_id))
->  Sort  (cost=1570.35..1579.46 rows=3643 width=40)
Sort Key: segdim.episode_srcid, segdim.customer_srcid,
segdim.show_srcid, segdim.season_srcid
->  Seq Scan on xsegment_dim segdim  (cost=0.00..1354.85 rows=3643
width=40)
Filter: (segment_srcid IS NULL)
->  Sort  (cost=1755323.26..1780227.95 rows=9961874 width=126)
Sort Key: f14.episode_id, f14.customer_id, f14.show_id, f14.season_id
->  Seq Scan on bigtab_stats_fact_tmp14 f14  (cost=0.00..597355.74
rows=9961874 width=126)
(9 rows)









# \d bigtab_stats_fact_tmp14
Table "public.bigtab_stats_fact_tmp14"
Column          |            Type             | Modifiers
--------------------------+-----------------------------+-----------
pf_dts_id     | bigint                      |
pf_device_id       | bigint                      |
segment_id               | bigint                      |
cdn_id                   | bigint                      |
collector_id             | bigint                      |
digital_envoy_id         | bigint                      |
maxmind_id               | bigint                      |
quova_id                 | bigint                      |
website_id               | bigint                      |
referrer_id              | bigint                      |
affiliate_id             | bigint                      |
custom_info_id           | bigint                      |
start_dt                 | timestamp without time zone |
total_played_time_sec    | numeric(18,5)               |
bytes_received           | bigint                      |
bytes_transmitted        | bigint                      |
stall_count              | integer                     |
stall_duration_sec       | numeric(18,5)               |
hiccup_count             | integer                     |
hiccup_duration_sec      | numeric(18,5)               |
watched_duration_sec     | numeric(18,5)               |
rewatched_duration_sec   | numeric(18,5)               |
requested_start_position | numeric(18,5)               |
requested_stop_position  | numeric(18,5)               |
post_position            | numeric(18,5)               |
is_vod                   | numeric(1,0)                |
sessionid                | bigint                      |
create_dt                | timestamp without time zone |
segment_type_id          | bigint                      |
customer_id              | bigint                      |
content_publisher_id     | bigint                      |
content_owner_id         | bigint                      |
episode_id               | bigint                      |
duration_sec             | numeric(18,5)               |
device_id                | bigint                      |
os_id                    | bigint                      |
browser_id               | bigint                      |
cpu_id                   | bigint                      |
xsystem_cfg_dim_id  | bigint                      |
xreferrer_dim_id    | bigint                      |
xaffiliate_dim_id   | bigint                      |
xsite_dim_id        | bigint                      |
xpublisher_dim_id   | bigint                      |
season_id                | bigint                      |
show_id                  | bigint                      |
xsegment_dim_id     | bigint                      |
location_id              | bigint                      |
zipcode                  | character varying(20)       |
xlocation_dim_id    | bigint                      |
location_srcid           | bigint                      |
timezone                 | real                        |
xtime_dim_id        | bigint                      |
Indexes:
"bigtab_stats_fact_tmp14_idx1" btree (customer_id)
"bigtab_stats_fact_tmp14_idx2" btree (show_id)
"bigtab_stats_fact_tmp14_idx3" btree (season_id)
"bigtab_stats_fact_tmp14_idx4" btree (episode_id)






# \d xsegment_dim
Table "public.xsegment_dim"
Column        |            Type             |
Modifiers
----------------------+-----------------------------
+-------------------------------------------------------------
xsegment_dim_id | bigint                      | not null default
nextval('xsegment_dim_seq'::regclass)
customer_srcid       | bigint                      | not null
show_srcid           | bigint                      | not null
show_name            | character varying(500)      | not null
season_srcid         | bigint                      | not null
season_name          | character varying(500)      | not null
episode_srcid        | bigint                      | not null
episode_name         | character varying(500)      | not null
segment_type_id      | integer                     |
segment_type         | character varying(500)      |
segment_srcid        | bigint                      |
segment_name         | character varying(500)      |
effective_dt         | timestamp without time zone | not null default
now()
inactive_dt          | timestamp without time zone |
last_update_dt       | timestamp without time zone | not null default
now()
Indexes:
"xsegment_dim_pk" PRIMARY KEY, btree (xsegment_dim_id)
"seg1" btree (customer_srcid)
"seg2" btree (show_srcid)
"seg3" btree (season_srcid)
"seg4" btree (episode_srcid)
"seg5" btree (segment_srcid)
"xsegment_dim_ix1" btree (customer_srcid)






pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: I/O on select count(*)
Next
From: "Claus Guttesen"
Date:
Subject: Re: Join runs for > 10 hours and then fills up >1.3TB of disk space