Re: Join runs for > 10 hours and then fills up >1.3TB of disk space - Mailing list pgsql-performance
From | kevin kempter |
---|---|
Subject | Re: Join runs for > 10 hours and then fills up >1.3TB of disk space |
Date | |
Msg-id | 18F4251F-5B1F-4A0D-A2EE-CA178F82F776@kevinkempterllc.com Whole thread Raw |
In response to | Join runs for > 10 hours and then fills up >1.3TB of disk space (kevin kempter <kevin@kevinkempterllc.com>) |
List | pgsql-performance |
Also, I'm running version 8.3 on a centOS box with 2 dual core CPU's and 32Gig of ram On May 16, 2008, at 12:58 AM, kevin kempter wrote: > Sorry I goofed on the query text Here's the correct query: > > select > f14.xpublisher_dim_id, > f14.xtime_dim_id, > f14.xlocation_dim_id, > f14.xreferrer_dim_id, > f14.xsite_dim_id, > f14.xsystem_cfg_dim_id, > f14.xaffiliate_dim_id, > f14.customer_id, > f14.pf_dts_id, > f14.episode_id, > f14.sessionid, > f14.bytes_received, > f14.bytes_transmitted, > f14.total_played_time_sec, > segdim.xsegment_dim_id as episode_level_segid > from > bigtab_stats_fact_tmp14 f14, > xsegment_dim segdim > where > f14.customer_id = segdim.customer_srcid > and f14.show_id = segdim.show_srcid > and f14.season_id = segdim.season_srcid > and f14.episode_id = segdim.episode_srcid > and segdim.segment_srcid is NULL; > > > > > > > On May 16, 2008, at 12:31 AM, kevin kempter wrote: > >> Hi List; >> >> I have a table with 9,961,914 rows in it (see the describe of >> bigtab_stats_fact_tmp14 below) >> >> I also have a table with 7,785 rows in it (see the describe of >> xsegment_dim below) >> >> I'm running the join shown below and it takes > 10 hours and >> eventually runs out of disk space on a 1.4TB file system >> >> I've included below a describe of both tables, the join and an >> explain plan, any help / suggestions would be much appreciated ! >> >> I need to get this beast to run as quickly as possible (without >> filling up my file system) >> >> >> Thanks in advance... >> >> >> >> >> >> >> >> >> >> >> >> select >> f14.xpublisher_dim_id, >> f14.xtime_dim_id, >> f14.xlocation_dim_id, >> f14.xreferrer_dim_id, >> f14.xsite_dim_id, >> f14.xsystem_cfg_dim_id, >> f14.xaffiliate_dim_id, >> f14.customer_id, >> pf_dts_id, >> episode_id, >> sessionid, >> bytes_received, >> bytes_transmitted, >> total_played_time_sec, >> segdim.xsegment_dim_id as episode_level_segid >> from >> bigtab_stats_fact_tmp14 f14, >> xsegment_dim segdim >> where >> f14.customer_id = segdim.customer_srcid >> and f14.show_id = segdim.show_srcid >> and f14.season_id = segdim.season_srcid >> and f14.episode_id = segdim.episode_srcid >> and segdim.segment_srcid is NULL; >> >> >> >> >> >> >> QUERY PLAN >> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> Merge Join (cost=1757001.74..73569676.49 rows=3191677219 width=118) >> Merge Cond: ((segdim.episode_srcid = f14.episode_id) AND >> (segdim.customer_srcid = f14.customer_id) AND (segdim.show_srcid = >> f14.show_id) AND (segdim.season_srcid = f14.season_id)) >> -> Sort (cost=1570.35..1579.46 rows=3643 width=40) >> Sort Key: segdim.episode_srcid, segdim.customer_srcid, >> segdim.show_srcid, segdim.season_srcid >> -> Seq Scan on xsegment_dim segdim (cost=0.00..1354.85 rows=3643 >> width=40) >> Filter: (segment_srcid IS NULL) >> -> Sort (cost=1755323.26..1780227.95 rows=9961874 width=126) >> Sort Key: f14.episode_id, f14.customer_id, f14.show_id, f14.season_id >> -> Seq Scan on bigtab_stats_fact_tmp14 f14 (cost=0.00..597355.74 >> rows=9961874 width=126) >> (9 rows) >> >> >> >> >> >> >> >> >> >> # \d bigtab_stats_fact_tmp14 >> Table "public.bigtab_stats_fact_tmp14" >> Column | Type | Modifiers >> --------------------------+-----------------------------+----------- >> pf_dts_id | bigint | >> pf_device_id | bigint | >> segment_id | bigint | >> cdn_id | bigint | >> collector_id | bigint | >> digital_envoy_id | bigint | >> maxmind_id | bigint | >> quova_id | bigint | >> website_id | bigint | >> referrer_id | bigint | >> affiliate_id | bigint | >> custom_info_id | bigint | >> start_dt | timestamp without time zone | >> total_played_time_sec | numeric(18,5) | >> bytes_received | bigint | >> bytes_transmitted | bigint | >> stall_count | integer | >> stall_duration_sec | numeric(18,5) | >> hiccup_count | integer | >> hiccup_duration_sec | numeric(18,5) | >> watched_duration_sec | numeric(18,5) | >> rewatched_duration_sec | numeric(18,5) | >> requested_start_position | numeric(18,5) | >> requested_stop_position | numeric(18,5) | >> post_position | numeric(18,5) | >> is_vod | numeric(1,0) | >> sessionid | bigint | >> create_dt | timestamp without time zone | >> segment_type_id | bigint | >> customer_id | bigint | >> content_publisher_id | bigint | >> content_owner_id | bigint | >> episode_id | bigint | >> duration_sec | numeric(18,5) | >> device_id | bigint | >> os_id | bigint | >> browser_id | bigint | >> cpu_id | bigint | >> xsystem_cfg_dim_id | bigint | >> xreferrer_dim_id | bigint | >> xaffiliate_dim_id | bigint | >> xsite_dim_id | bigint | >> xpublisher_dim_id | bigint | >> season_id | bigint | >> show_id | bigint | >> xsegment_dim_id | bigint | >> location_id | bigint | >> zipcode | character varying(20) | >> xlocation_dim_id | bigint | >> location_srcid | bigint | >> timezone | real | >> xtime_dim_id | bigint | >> Indexes: >> "bigtab_stats_fact_tmp14_idx1" btree (customer_id) >> "bigtab_stats_fact_tmp14_idx2" btree (show_id) >> "bigtab_stats_fact_tmp14_idx3" btree (season_id) >> "bigtab_stats_fact_tmp14_idx4" btree (episode_id) >> >> >> >> >> >> >> # \d xsegment_dim >> Table "public.xsegment_dim" >> Column | Type >> | Modifiers >> ----------------------+----------------------------- >> +------------------------------------------------------------- >> xsegment_dim_id | bigint | not null default >> nextval('xsegment_dim_seq'::regclass) >> customer_srcid | bigint | not null >> show_srcid | bigint | not null >> show_name | character varying(500) | not null >> season_srcid | bigint | not null >> season_name | character varying(500) | not null >> episode_srcid | bigint | not null >> episode_name | character varying(500) | not null >> segment_type_id | integer | >> segment_type | character varying(500) | >> segment_srcid | bigint | >> segment_name | character varying(500) | >> effective_dt | timestamp without time zone | not null >> default now() >> inactive_dt | timestamp without time zone | >> last_update_dt | timestamp without time zone | not null >> default now() >> Indexes: >> "xsegment_dim_pk" PRIMARY KEY, btree (xsegment_dim_id) >> "seg1" btree (customer_srcid) >> "seg2" btree (show_srcid) >> "seg3" btree (season_srcid) >> "seg4" btree (episode_srcid) >> "seg5" btree (segment_srcid) >> "xsegment_dim_ix1" btree (customer_srcid) >> >> >> >> >> >> >> -- >> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org >> ) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-performance >
pgsql-performance by date: