Re: Performance With Joins on Large Tables - Mailing list pgsql-performance

From Joshua Marsh
Subject Re: Performance With Joins on Large Tables
Date
Msg-id 38242de90609140718w2d10cfdfueb36bb53285c6cfa@mail.gmail.com
Whole thread Raw
In response to Re: Performance With Joins on Large Tables  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Wow, that correlation value is *way* away from order.  If they were
really in exact order by dsiacctno then I'd expect to see 1.0 in
that column.  Can you take another look at the tables and confirm
the ordering?  Does the correlation change if you do an ANALYZE on the
tables?  (Some small change is to be expected due to random sampling,
but this is way off.)

                       regards, tom lane
 
Thanks for pointing that out.  Generally we load the tables via COPY and then never touch the data.  Because of the slowdown, I have been updating tuples.  I reloaded it from scratch, set enable_seqscan=off and random_access_age=4 and I got the results I was looking for:
 
 
data=# analyze view_505;
ANALYZE
data=# analyze r3s169;
ANALYZE
data=# select tablename, attname, n_distinct, avg_width, correlation from pg_stats where tablename in ('view_505', 'r3s169') and attname = 'dsiacctno';
 tablename |  attname  | n_distinct | avg_width | correlation
-----------+-----------+------------+-----------+-------------
 view_505  | dsiacctno |         -1 |        13 |           1
 r3s169    | dsiacctno |      42140 |        13 |           1
(2 rows)

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..20099712.79 rows=285153952 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v  (cost=0.00..5147252.74 rows=112282976 width=20)
   ->  Index Scan using r3s169_dsiacctno on r3s169 s  (cost=0.00..8256331.47 rows=285153952 width=17)
(4 rows)

 Thanks for you help everyone. 

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: High CPU Load
Next
From: Scott Marlowe
Date:
Subject: Re: High CPU Load