Re: General performance questions about postgres on Apple - Mailing list pgsql-performance

From scott.marlowe
Subject Re: General performance questions about postgres on Apple
Date
Msg-id Pine.LNX.4.33.0402201405160.11556-100000@css120.ihs.com
Whole thread Raw
In response to General performance questions about postgres on Apple hardware...  (Sean Shanny <shannyconsulting@earthlink.net>)
List pgsql-performance
On Fri, 20 Feb 2004, Sean Shanny wrote:

> max_connections = 100
>
> # - Memory -
>
> shared_buffers = 16000          # min 16, at least max_connections*2,
> 8KB each
> sort_mem = 256000               # min 64, size in KB

You might wanna drop sort_mem somewhat and just set it during your imports
to something big like 512000 or larger.  That way with 100 users during
the day you won't have to worry about swap storms, and when you run your
updates, you get all that sort_mem.

> Actual row count in the temp table:
>
> select count(*) from referral_temp ;
>   502347
>
> Actual row count in d_referral table:
>
> select count(*) from d_referral ;
>   27908024
>
>
> Note: that an analyze had not been performed on the referral_temp table
> prior to the explain analyze run.
>
> explain analyze  SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT
> OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5
>
> Nested Loop Left Join  (cost=0.00..3046.00 rows=1001 width=68) (actual
> time=136.513..6440616.541 rows=502347 loops=1)
>    ->  Seq Scan on referral_temp t2  (cost=0.00..20.00 rows=1000
> width=64) (actual time=21.730..10552.421 rows=502347 loops=1)
>    ->  Index Scan using d_referral_referral_md5_key on d_referral t1
> (cost=0.00..3.01 rows=1 width=40) (actual time=12.768..14.022 rows=1
> loops=502347)
>          Index Cond: ("outer".md5 = t1.referral_md5)
>
>
> Thanks.
>
> --sean
>  Total runtime: 6441969.698 ms
> (5 rows)
>
>
> Here is an explain analyze after the analyze was done.  Unfortunately I
> think a lot of the data was still in cache when I did this again :-(
>
> explain analyze  SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT
> OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5;
>
> Nested Loop Left Join  (cost=0.00..1468759.69 rows=480082 width=149)
> (actual time=69.576..3226854.850 rows=502347 loops=1)
>    ->  Seq Scan on referral_temp t2  (cost=0.00..16034.81 rows=480081
> width=145) (actual time=11.206..4003.521 rows=502347 loops=1)
>    ->  Index Scan using d_referral_referral_md5_key on d_referral t1
> (cost=0.00..3.01 rows=1 width=40) (actual time=6.396..6.402 rows=1
> loops=502347)
>          Index Cond: ("outer".md5 = t1.referral_md5)
>  Total runtime: 3227830.752 ms

Hmmm.  It looks like postgresql is still picking a nested loop when it
should be sorting something faster.  Try doing a "set enable_nestloop =
off" and see what you get.

If that makes it faster, you may want to adjust the costs of the cpu_*
stuff higher to see if that can force it to do the right thing.

Looking at the amount of time taken by the nested loop, it looks like the
problem to me.

And why are you doing a left join of ONE row from one table against the
whole temp table?  Do you really need to do that?  since there's only one
row in the source table, and I'd guess is only matches one or a few rows
from the temp table, this means you're gonna have that one row and a bunch
of null filled rows to go with it.


pgsql-performance by date:

Previous
From: Sean Shanny
Date:
Subject: Re: General performance questions about postgres on Apple
Next
From: "Andrew Lazarus"
Date:
Subject: JOIN order, 15K, 15K, 7MM rows