Re: Looking for ideas on how to speed up warehouse loading - Mailing list pgsql-performance

From Tom Lane
Subject Re: Looking for ideas on how to speed up warehouse loading
Date
Msg-id 14459.1082685792@sss.pgh.pa.us
Whole thread Raw
In response to Looking for ideas on how to speed up warehouse loading  (Sean Shanny <shannyconsulting@earthlink.net>)
List pgsql-performance
Sean Shanny <shannyconsulting@earthlink.net> writes:
> explain analyze SELECT t1.id, t2.url FROM referral_temp t2 LEFT OUTER
> JOIN d_referral t1 ON t2.url = t1.referral_raw_url ORDER BY t1.id;

> QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=4012064.81..4013194.45 rows=451856 width=115) (actual
> time=1297320.823..1297739.813 rows=476176 loops=1)
>    Sort Key: t1.id
>    ->  Hash Left Join  (cost=1052345.95..3969623.10 rows=451856
> width=115) (actual time=1146650.487..1290230.590 rows=476176 loops=1)
>          Hash Cond: ("outer".url = "inner".referral_raw_url)
>          ->  Seq Scan on referral_temp t2  (cost=0.00..6645.56
> rows=451856 width=111) (actual time=20.285..1449.634 rows=476176 loops=1)
>          ->  Hash  (cost=729338.16..729338.16 rows=46034716 width=124)
> (actual time=1146440.710..1146440.710 rows=0 loops=1)
>                ->  Seq Scan on d_referral t1  (cost=0.00..729338.16
> rows=46034716 width=124) (actual time=14.502..-1064277.123 rows=46034715
> loops=1)
>  Total runtime: 1298153.193 ms
> (8 rows)

> What I would like to know is if there are better ways to do the join?

What have you got sort_mem set to?  You might try increasing it to a gig
or so, since you seem to have plenty of RAM in that box ...

            regards, tom lane

pgsql-performance by date:

Previous
From: Nicholas Shanny
Date:
Subject: Re: Looking for ideas on how to speed up warehouse loading
Next
From: "Anjan Dave"
Date:
Subject: Re: Wierd context-switching issue on Xeon