Re: Join runs for > 10 hours and then fills up >1.3TB of disk space - Mailing list pgsql-performance

From kevin kempter
Subject Re: Join runs for > 10 hours and then fills up >1.3TB of disk space
Date
Msg-id 614B0322-C9A7-45B2-9BB0-828075B041BE@kevinkempterllc.com
Whole thread Raw
In response to Join runs for > 10 hours and then fills up >1.3TB of disk space  (kevin kempter <kevin@kevinkempterllc.com>)
List pgsql-performance
Sorry I goofed on the query text Here's the correct query:

select
f14.xpublisher_dim_id,
f14.xtime_dim_id,
f14.xlocation_dim_id,
f14.xreferrer_dim_id,
f14.xsite_dim_id,
f14.xsystem_cfg_dim_id,
f14.xaffiliate_dim_id,
f14.customer_id,
f14.pf_dts_id,
f14.episode_id,
f14.sessionid,
f14.bytes_received,
f14.bytes_transmitted,
f14.total_played_time_sec,
segdim.xsegment_dim_id as episode_level_segid
from
bigtab_stats_fact_tmp14 f14,
xsegment_dim segdim
where
f14.customer_id = segdim.customer_srcid
and f14.show_id = segdim.show_srcid
and f14.season_id = segdim.season_srcid
and f14.episode_id = segdim.episode_srcid
and segdim.segment_srcid is NULL;






On May 16, 2008, at 12:31 AM, kevin kempter wrote:

> Hi List;
>
> I have a table with 9,961,914 rows in it (see the describe of
> bigtab_stats_fact_tmp14 below)
>
> I also have a table with 7,785 rows in it (see the describe of
> xsegment_dim below)
>
> I'm running the join shown below and it takes > 10 hours and
> eventually runs out of disk space on a 1.4TB file system
>
> I've included below a describe of both tables, the join and an
> explain plan, any help / suggestions would be much appreciated !
>
> I need to get this beast to run as quickly as possible (without
> filling up my file system)
>
>
> Thanks in advance...
>
>
>
>
>
>
>
>
>
>
>
> select
> f14.xpublisher_dim_id,
> f14.xtime_dim_id,
> f14.xlocation_dim_id,
> f14.xreferrer_dim_id,
> f14.xsite_dim_id,
> f14.xsystem_cfg_dim_id,
> f14.xaffiliate_dim_id,
> f14.customer_id,
> pf_dts_id,
> episode_id,
> sessionid,
> bytes_received,
> bytes_transmitted,
> total_played_time_sec,
> segdim.xsegment_dim_id as episode_level_segid
> from
> bigtab_stats_fact_tmp14 f14,
> xsegment_dim segdim
> where
> f14.customer_id = segdim.customer_srcid
> and f14.show_id = segdim.show_srcid
> and f14.season_id = segdim.season_srcid
> and f14.episode_id = segdim.episode_srcid
> and segdim.segment_srcid is NULL;
>
>
>
>
>
>
> QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Merge Join  (cost=1757001.74..73569676.49 rows=3191677219 width=118)
> Merge Cond: ((segdim.episode_srcid = f14.episode_id) AND
> (segdim.customer_srcid = f14.customer_id) AND (segdim.show_srcid =
> f14.show_id) AND (segdim.season_srcid = f14.season_id))
> ->  Sort  (cost=1570.35..1579.46 rows=3643 width=40)
> Sort Key: segdim.episode_srcid, segdim.customer_srcid,
> segdim.show_srcid, segdim.season_srcid
> ->  Seq Scan on xsegment_dim segdim  (cost=0.00..1354.85 rows=3643
> width=40)
> Filter: (segment_srcid IS NULL)
> ->  Sort  (cost=1755323.26..1780227.95 rows=9961874 width=126)
> Sort Key: f14.episode_id, f14.customer_id, f14.show_id, f14.season_id
> ->  Seq Scan on bigtab_stats_fact_tmp14 f14  (cost=0.00..597355.74
> rows=9961874 width=126)
> (9 rows)
>
>
>
>
>
>
>
>
>
> # \d bigtab_stats_fact_tmp14
> Table "public.bigtab_stats_fact_tmp14"
> Column          |            Type             | Modifiers
> --------------------------+-----------------------------+-----------
> pf_dts_id     | bigint                      |
> pf_device_id       | bigint                      |
> segment_id               | bigint                      |
> cdn_id                   | bigint                      |
> collector_id             | bigint                      |
> digital_envoy_id         | bigint                      |
> maxmind_id               | bigint                      |
> quova_id                 | bigint                      |
> website_id               | bigint                      |
> referrer_id              | bigint                      |
> affiliate_id             | bigint                      |
> custom_info_id           | bigint                      |
> start_dt                 | timestamp without time zone |
> total_played_time_sec    | numeric(18,5)               |
> bytes_received           | bigint                      |
> bytes_transmitted        | bigint                      |
> stall_count              | integer                     |
> stall_duration_sec       | numeric(18,5)               |
> hiccup_count             | integer                     |
> hiccup_duration_sec      | numeric(18,5)               |
> watched_duration_sec     | numeric(18,5)               |
> rewatched_duration_sec   | numeric(18,5)               |
> requested_start_position | numeric(18,5)               |
> requested_stop_position  | numeric(18,5)               |
> post_position            | numeric(18,5)               |
> is_vod                   | numeric(1,0)                |
> sessionid                | bigint                      |
> create_dt                | timestamp without time zone |
> segment_type_id          | bigint                      |
> customer_id              | bigint                      |
> content_publisher_id     | bigint                      |
> content_owner_id         | bigint                      |
> episode_id               | bigint                      |
> duration_sec             | numeric(18,5)               |
> device_id                | bigint                      |
> os_id                    | bigint                      |
> browser_id               | bigint                      |
> cpu_id                   | bigint                      |
> xsystem_cfg_dim_id  | bigint                      |
> xreferrer_dim_id    | bigint                      |
> xaffiliate_dim_id   | bigint                      |
> xsite_dim_id        | bigint                      |
> xpublisher_dim_id   | bigint                      |
> season_id                | bigint                      |
> show_id                  | bigint                      |
> xsegment_dim_id     | bigint                      |
> location_id              | bigint                      |
> zipcode                  | character varying(20)       |
> xlocation_dim_id    | bigint                      |
> location_srcid           | bigint                      |
> timezone                 | real                        |
> xtime_dim_id        | bigint                      |
> Indexes:
> "bigtab_stats_fact_tmp14_idx1" btree (customer_id)
> "bigtab_stats_fact_tmp14_idx2" btree (show_id)
> "bigtab_stats_fact_tmp14_idx3" btree (season_id)
> "bigtab_stats_fact_tmp14_idx4" btree (episode_id)
>
>
>
>
>
>
> # \d xsegment_dim
> Table "public.xsegment_dim"
> Column        |            Type
> |                          Modifiers
> ----------------------+-----------------------------
> +-------------------------------------------------------------
> xsegment_dim_id | bigint                      | not null default
> nextval('xsegment_dim_seq'::regclass)
> customer_srcid       | bigint                      | not null
> show_srcid           | bigint                      | not null
> show_name            | character varying(500)      | not null
> season_srcid         | bigint                      | not null
> season_name          | character varying(500)      | not null
> episode_srcid        | bigint                      | not null
> episode_name         | character varying(500)      | not null
> segment_type_id      | integer                     |
> segment_type         | character varying(500)      |
> segment_srcid        | bigint                      |
> segment_name         | character varying(500)      |
> effective_dt         | timestamp without time zone | not null
> default now()
> inactive_dt          | timestamp without time zone |
> last_update_dt       | timestamp without time zone | not null
> default now()
> Indexes:
> "xsegment_dim_pk" PRIMARY KEY, btree (xsegment_dim_id)
> "seg1" btree (customer_srcid)
> "seg2" btree (show_srcid)
> "seg3" btree (season_srcid)
> "seg4" btree (episode_srcid)
> "seg5" btree (segment_srcid)
> "xsegment_dim_ix1" btree (customer_srcid)
>
>
>
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


pgsql-performance by date:

Previous
From: kevin kempter
Date:
Subject: Re: Join runs for > 10 hours and then fills up >1.3TB of disk space
Next
From: Simon Riggs
Date:
Subject: Re: Join runs for > 10 hours and then fills up >1.3TB of disk space