Re: General performance questions about postgres on Apple - Mailing list pgsql-performance
From | scott.marlowe |
---|---|
Subject | Re: General performance questions about postgres on Apple |
Date | |
Msg-id | Pine.LNX.4.33.0402201405160.11556-100000@css120.ihs.com Whole thread Raw |
In response to | General performance questions about postgres on Apple hardware... (Sean Shanny <shannyconsulting@earthlink.net>) |
List | pgsql-performance |
On Fri, 20 Feb 2004, Sean Shanny wrote: > max_connections = 100 > > # - Memory - > > shared_buffers = 16000 # min 16, at least max_connections*2, > 8KB each > sort_mem = 256000 # min 64, size in KB You might wanna drop sort_mem somewhat and just set it during your imports to something big like 512000 or larger. That way with 100 users during the day you won't have to worry about swap storms, and when you run your updates, you get all that sort_mem. > Actual row count in the temp table: > > select count(*) from referral_temp ; > 502347 > > Actual row count in d_referral table: > > select count(*) from d_referral ; > 27908024 > > > Note: that an analyze had not been performed on the referral_temp table > prior to the explain analyze run. > > explain analyze SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT > OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5 > > Nested Loop Left Join (cost=0.00..3046.00 rows=1001 width=68) (actual > time=136.513..6440616.541 rows=502347 loops=1) > -> Seq Scan on referral_temp t2 (cost=0.00..20.00 rows=1000 > width=64) (actual time=21.730..10552.421 rows=502347 loops=1) > -> Index Scan using d_referral_referral_md5_key on d_referral t1 > (cost=0.00..3.01 rows=1 width=40) (actual time=12.768..14.022 rows=1 > loops=502347) > Index Cond: ("outer".md5 = t1.referral_md5) > > > Thanks. > > --sean > Total runtime: 6441969.698 ms > (5 rows) > > > Here is an explain analyze after the analyze was done. Unfortunately I > think a lot of the data was still in cache when I did this again :-( > > explain analyze SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT > OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5; > > Nested Loop Left Join (cost=0.00..1468759.69 rows=480082 width=149) > (actual time=69.576..3226854.850 rows=502347 loops=1) > -> Seq Scan on referral_temp t2 (cost=0.00..16034.81 rows=480081 > width=145) (actual time=11.206..4003.521 rows=502347 loops=1) > -> Index Scan using d_referral_referral_md5_key on d_referral t1 > (cost=0.00..3.01 rows=1 width=40) (actual time=6.396..6.402 rows=1 > loops=502347) > Index Cond: ("outer".md5 = t1.referral_md5) > Total runtime: 3227830.752 ms Hmmm. It looks like postgresql is still picking a nested loop when it should be sorting something faster. Try doing a "set enable_nestloop = off" and see what you get. If that makes it faster, you may want to adjust the costs of the cpu_* stuff higher to see if that can force it to do the right thing. Looking at the amount of time taken by the nested loop, it looks like the problem to me. And why are you doing a left join of ONE row from one table against the whole temp table? Do you really need to do that? since there's only one row in the source table, and I'd guess is only matches one or a few rows from the temp table, this means you're gonna have that one row and a bunch of null filled rows to go with it.
pgsql-performance by date: