Thread: Performance With Joins on Large Tables

Performance With Joins on Large Tables

From
"Joshua Marsh"
Date:
I am having problems performing a join on two large tables.  It seems to only
want to use a sequential scan on the join, but that method seems to be slower
than an index scan.  I've never actually had it complete the sequential scan
because I stop it after 24+ hours.  I've run joins against large tables before
and an index scan was always faster (a few hours at the most).

Here is some information on the two tables:
data=# analyze view_505;
ANALYZE
data=# analyze r3s169;
ANALYZE
data=# \d view_505
               Table "public.view_505"
      Column      |         Type          | Modifiers
------------------+-----------------------+-----------
dsiacctno        | numeric               |
name             | boolean               |
title            | boolean               |
company          | boolean               |
zip4             | boolean               |
acceptcall       | boolean               |
phonedirect      | smallint              |
phonetollfree    | smallint              |
fax              | smallint              |
editdrop         | boolean               |
postsuppress     | boolean               |
firstnameinit    | boolean               |
prefix           | integer               |
crrt             | boolean               |
dpbc             | boolean               |
executive        | integer               |
addressline      | integer               |
multibuyer       | integer               |
activemultibuyer | integer               |
active           | boolean               |
emails           | integer               |
domains          | integer               |
zip1             | character varying(1)  |
zip3             | character varying(3)  |
gender           | character varying(1)  |
topdomains       | bit varying           |
city             | character varying(35) |
state            | character varying(35) |
zip              | character varying(20) |
country          | character varying(30) |
selects          | bit varying           |
files            | integer[]             |
sics             | integer[]             |
custdate         | date                  |
Indexes:
    "view_505_city" btree (city)
    "view_505_dsiacctno" btree (dsiacctno)
    "view_505_state" btree (state)
    "view_505_zip" btree (zip)
    "view_505_zip1" btree (zip1)
    "view_505_zip3" btree (zip3)

data=# \d r3s169
              Table "public.r3s169"
   Column    |          Type          | Modifiers
-------------+------------------------+-----------
dsiacctno   | numeric                |
fileid      | integer                |
customerid  | character varying(20)  |
email       | character varying(100) |
sic2        | character varying(2)   |
sic4        | character varying(4)   |
sic6        | character varying(6)   |
custdate    | date                   |
inqdate     | date                   |
eentrydate  | date                   |
esubdate    | date                   |
efaildate   | date                   |
eunlistdate | date                   |
pentrydate  | date                   |
psubdate    | date                   |
punlistdate | date                   |
pexpiredate | date                   |
lastupdate  | date                   |
emaildrop   | numeric                |
sic8        | character varying(8)   |
Indexes:
    "r3s169_dsiacctno" btree (dsiacctno)

data=# select count(*) from view_505;
   count
-----------
112393845
(1 row)

data=# select count(*) from r3s169;
   count
-----------
285230264
(1 row)


Here is what EXPLAIN says:

data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
s.dsiacctno;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
Merge Join  (cost=293767607.69..305744319.52 rows=285392608 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Sort  (cost=127304933.87..127585815.71 rows=112352736 width=20)
         Sort Key: v.dsiacctno
         ->  Seq Scan on view_505 v  (cost=100000000.00..104604059.36
rows=112352736 width=20)
   ->  Sort  (cost=166462673.82..167176155.34 rows=285392608 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=100000000.00..106875334.08
rows=285392608 width=17)
(8 rows)



I can't really do an EXPLAIN ANALYZE because the query never really finishes.
Also, I use a cursor to loop through the data.  view_505 isn't a pgsql view,
it's just how we decided to name the table.  There is a one-to-many
relationship between view_505 and r3s169.

Since enable_seqscan is off, my understanding is that in order for the query
planner to use a sequential scan it must think there is no other alternative.
Both sides are indexed and analyzed, so that confuses me a little.
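
For reference, the setting is just toggled per session (or in postgresql.conf),
roughly like this:

SET enable_seqscan = off;
SHOW enable_seqscan;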

I tried it on a smaller sample set of the data and it works fine:

data=# select * into r3s169_test from r3s169 limit 1000000;
SELECT
data=# select * into view_505_test from view_505 limit 1000000;
SELECT
data=# create index r3s169_test_dsiacctno on r3s169_test (dsiacctno);
CREATE INDEX
data=# create index view_505_test_dsiacctno on view_505_test (dsiacctno);
CREATE INDEX
data=# analyze r3s169_test;
ANALYZE
data=# analyze view_505_test;
ANALYZE
data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
FROM s.custdate) FROM view_505_test v INNER JOIN r3s169_test s ON
v.dsiacctno = s.dsiacctno;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Merge Join  (cost=0.00..1976704.69 rows=1000187 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_test_dsiacctno on view_505_test v
(cost=0.00..1676260.67 rows=999985 width=20)
   ->  Index Scan using r3s169_test_dsiacctno on r3s169_test s
(cost=0.00..1089028.66 rows=1000186 width=17)
(4 rows)


Is there anything I'm missing that is preventing it from using the index?  It
just seems weird to me that other joins like this work fine and fast
with indexes,
but this one won't.

Re: Performance With Joins on Large Tables

From
"Jim C. Nasby"
Date:
On Tue, Sep 12, 2006 at 04:17:34PM -0600, Joshua Marsh wrote:
> data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
> v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
> FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
> s.dsiacctno;
>                                          QUERY PLAN
> -----------------------------------------------------------------------------------------------
> Merge Join  (cost=293767607.69..305744319.52 rows=285392608 width=11)
>   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
>   ->  Sort  (cost=127304933.87..127585815.71 rows=112352736 width=20)
>         Sort Key: v.dsiacctno
>         ->  Seq Scan on view_505 v  (cost=100000000.00..104604059.36
> rows=112352736 width=20)
>   ->  Sort  (cost=166462673.82..167176155.34 rows=285392608 width=17)
>         Sort Key: s.dsiacctno
>         ->  Seq Scan on r3s169 s  (cost=100000000.00..106875334.08
> rows=285392608 width=17)
> (8 rows)
>
>
> Since enable_seqscan is off, my understanding is that in order for the query
> planner to user a sequential scan it must think there is no other
> alternative.
> Both sides are indexed and anaylzed, so that confuses me a little.
>
> I tried it on a smaller sample set of the data and it works fine:

Actually, enable_seqscan=off just adds a fixed overhead to the seqscan
cost estimate. That's why the cost for the seqscans in that plan starts
at 100000000. I've suggested changing that to a variable overhead based
on the expected rowcount, but the counter-argument was that anyone with
so much data that the fixed amount wouldn't work would most likely be
having bigger issues anyway.

Other things you can try to get the index scan back would be to reduce
random_page_cost and to analyze the join fields in those tables with a
higher statistics target (though I'm not 100% certain the join cost
estimator actually takes that into account). Or if you don't mind
patching your source code, it wouldn't be difficult to make
enable_seqscan use a bigger 'penalty value' than 100000000.
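
For example, something along these lines (the statistics target of 500 here is
just an illustrative value to experiment with):

SET random_page_cost = 2;    -- try per-session before touching postgresql.conf
ALTER TABLE view_505 ALTER COLUMN dsiacctno SET STATISTICS 500;
ALTER TABLE r3s169 ALTER COLUMN dsiacctno SET STATISTICS 500;
ANALYZE view_505 (dsiacctno);
ANALYZE r3s169 (dsiacctno);
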
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Performance With Joins on Large Tables

From
"Joshua Marsh"
Date:
On 9/13/06, Jim C. Nasby <jim@nasby.net> wrote:
> On Tue, Sep 12, 2006 at 04:17:34PM -0600, Joshua Marsh wrote:
> > data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
> > v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
> > FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
> > s.dsiacctno;
> >                                          QUERY PLAN
> > -----------------------------------------------------------------------------------------------
> > Merge Join  (cost=293767607.69..305744319.52 rows=285392608 width=11)
> >   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
> >   ->  Sort  (cost=127304933.87..127585815.71 rows=112352736 width=20)
> >         Sort Key: v.dsiacctno
> >         ->  Seq Scan on view_505 v  (cost=100000000.00..104604059.36
> > rows=112352736 width=20)
> >   ->  Sort  (cost=166462673.82..167176155.34 rows=285392608 width=17)
> >         Sort Key: s.dsiacctno
> >         ->  Seq Scan on r3s169 s  (cost=100000000.00..106875334.08
> > rows=285392608 width=17)
> > (8 rows)
> >
> >
> > Since enable_seqscan is off, my understanding is that in order for the query
> > planner to user a sequential scan it must think there is no other
> > alternative.
> > Both sides are indexed and anaylzed, so that confuses me a little.
> >
> > I tried it on a smaller sample set of the data and it works fine:
>
> Actually, enable_seqscan=off just adds a fixed overhead to the seqscan
> cost estimate. That's why the cost for the seqscans in that plan starts
> at 100000000. I've suggested changing that to a variable overhead based
> on the expected rowcount, but the counter-argument was that anyone with
> so much data that the fixed amount wouldn't work would most likely be
> having bigger issues anyway.
>
> Other things you can try to get the index scan back would be to reduce
> random_page_cost and to analyze the join fields in those tables with a
> higher statistics target (though I'm not 100% certain the join cost
> estimator actually takes that into account). Or if you don't mind
> patching your source code, it wouldn't be difficult to make
> enable_seqscan use a bigger 'penalty value' than 10000000.
> --
> Jim Nasby                                            jim@nasby.net
> EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
>

Thanks for the tip. I lowered random_page_cost and got these results:

data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
s.dsiacctno;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..20921221.49 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v
(cost=0.00..2838595.79 rows=112393848 width=20)
   ->  Index Scan using r3s169_dsiacctno on r3s169 s
(cost=0.00..7106203.68 rows=285230272 width=17)
(4 rows)

That seems to have done it.  Are there any side effects to this
change?  I read about random_page_cost in the documentation and it
seems like this is strictly for planning.  All the tables on this
database will be indexed and of a size similar to these two, so I
don't see it causing any other problems.  Thought I would check though :)

Re: Performance With Joins on Large Tables

From
Jeff Davis
Date:
On Wed, 2006-09-13 at 08:49 -0600, Joshua Marsh wrote:
> That seems to have done it.  Are there any side effects to this
> change?  I read about random_page_cost in the documentation and it
> seems like this is strictly for planning.  All the tables on this
> database will be indexed and of a size similar to these two, so I
> don't see it  causing any other problems.  Though I would check though
> :)
>

Right, it's just used for planning. Avoid setting it too low, if it's
below about 2.0 you would most likely see some very strange plans.
Certainly it doesn't make sense at all to set it below 1.0, since that
is saying it's cheaper to get a random page than a sequential one.

What was your original random_page_cost, and what is the new value you
set it to?

Regards,
    Jeff Davis




Re: Performance With Joins on Large Tables

From
"Joshua Marsh"
Date:
On 9/13/06, Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2006-09-13 at 08:49 -0600, Joshua Marsh wrote:
> > That seems to have done it.  Are there any side effects to this
> > change?  I read about random_page_cost in the documentation and it
> > seems like this is strictly for planning.  All the tables on this
> > database will be indexed and of a size similar to these two, so I
> > don't see it  causing any other problems.  Though I would check though
> > :)
> >
>
> Right, it's just used for planning. Avoid setting it too low, if it's
> below about 2.0 you would most likely see some very strange plans.
> Certainly it doesn't make sense at all to set it below 1.0, since that
> is saying it's cheaper to get a random page than a sequential one.
>
> What was your original random_page_cost, and what is the new value you
> set it to?
>
> Regards,
>        Jeff Davis
>
>
>
>

I tried it at several levels.  It was initially at 4 (the default).  I
tried 3 and 2 with no changes.  When I set it to 1, it used an index
on view_505 but not r3s169:

data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
s.dsiacctno;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=154730044.01..278318711.49 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v
(cost=0.00..111923570.63 rows=112393848 width=20)
   ->  Sort  (cost=154730044.01..155443119.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=100000000.00..106873675.72
rows=285230272 width=17)


Setting to 0.1 finally gave me the result I was looking for. I know
that the index scan is faster though.  The seq scan never finished (I
killed it after 24+ hours) and I'm running the query now with indexes
and it's progressing nicely (will probably take 4 hours).

Re: Performance With Joins on Large Tables

From
Jeff Davis
Date:
On Wed, 2006-09-13 at 10:19 -0600, Joshua Marsh wrote:
> > Right, it's just used for planning. Avoid setting it too low, if it's
> > below about 2.0 you would most likely see some very strange plans.
> > Certainly it doesn't make sense at all to set it below 1.0, since that
> > is saying it's cheaper to get a random page than a sequential one.
> >
> > What was your original random_page_cost, and what is the new value you
> > set it to?
> >
> > Regards,
> >        Jeff Davis
> >
> >
> >
> >
>
> I tried it at several levels.  It was initially at 4 (the default).  I
> tried 3 and 2 with no changes.  When I set it to 1, it used and index
> on view_505 but no r3s169:
>
> data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
> v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
> FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
> s.dsiacctno;
>                                                  QUERY PLAN
> ------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=154730044.01..278318711.49 rows=285230272 width=11)
>    Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
>    ->  Index Scan using view_505_dsiacctno on view_505 v
> (cost=0.00..111923570.63 rows=112393848 width=20)
>    ->  Sort  (cost=154730044.01..155443119.69 rows=285230272 width=17)
>          Sort Key: s.dsiacctno
>          ->  Seq Scan on r3s169 s  (cost=100000000.00..106873675.72
> rows=285230272 width=17)
>
>
> Setting to 0.1 finally gave me the result I was looking for. I know
> that the index scan is faster though.  The seq scan never finished (i
> killed it after 24+ hours) and I'm running the query now with indexes
> and it's progressing nicely (will probably take 4 hours).

Hmm... that sounds bad. I'm sure your system will always choose indexes
with that value.

Is it overestimating the cost of using indexes or underestimating the
cost of a seq scan, or both? Maybe explain with the 0.1 setting will
help?

Regards,
    Jeff Davis




Query Progress (was: Performance With Joins on Large Tables)

From
"Bucky Jordan"
Date:
> Setting to 0.1 finally gave me the result I was looking for. I know
> that the index scan is faster though.  The seq scan never finished (i
> killed it after 24+ hours) and I'm running the query now with indexes
> and it's progressing nicely (will probably take 4 hours).


In regards to "progressing nicely (will probably take 4 hours)" - is
this just an estimate or is there some way to get progress status (or
something similar- e.g. on step 6 of 20 planned steps) on a query in pg?
I looked through Chap 24, Monitoring DB Activity, but most of that looks
like aggregate stats. Trying to relate these to a particular query
doesn't really seem feasible.

This would be useful in the case where you have a couple of long running
transactions or stored procedures doing analysis and you'd like to give
the user some feedback where you're at.

Thanks,

Bucky

Re: Performance With Joins on Large Tables

From
"Marcin Mank"
Date:
> Is there anything I'm missing that is preventing it from using the index?  It
> just seems weird to me that other joins like this work fine and fast
> with indexes, but this one won't.


Did you consider clustering both tables on the dsiacctno index?

I just checked that for a 4M-row table, even with enable_seqscan=on and
default *page_cost on PG 8.1.4, an index scan is being chosen for
select * from table order by serial_pkey_field

This is essentially the question in your case - sort it, or get it sorted
via the index at the expense of more random IO.

I think clustering should work for you, but I am no expert, check with
others.
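
Something like this, roughly (an untested sketch, using the CLUSTER syntax of
8.0/8.1; note that CLUSTER rewrites the whole table and takes an exclusive
lock, so it is not a cheap operation on tables this size):

CLUSTER view_505_dsiacctno ON view_505;
CLUSTER r3s169_dsiacctno ON r3s169;
ANALYZE view_505;
ANALYZE r3s169;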

Greetings
Marcin


Re: Query Progress (was: Performance With Joins on Large Tables)

From
"Joshua Marsh"
Date:
On 9/13/06, Bucky Jordan <bjordan@lumeta.com> wrote:
>
> Setting to 0.1 finally gave me the result I was looking for. I know
> that the index scan is faster though.  The seq scan never finished (i
> killed it after 24+ hours) and I'm running the query now with indexes
> and it's progressing nicely (will probably take 4 hours).
>
>
> In regards to "progressing nicely (will probably take 4 hours)" - is
> this just an estimate or is there some way to get progress status (or
> something similar- e.g. on step 6 of 20 planned steps) on a query in pg?
> I looked through Chap 24, Monitoring DB Activity, but most of that looks
> like aggregate stats. Trying to relate these to a particular query
> doesn't really seem feasible.
>
> This would be useful in the case where you have a couple of long running
> transactions or stored procedures doing analysis and you'd like to give
> the user some feedback where you're at.
>
> Thanks,
>
> Bucky
>

I do it programmatically, not through postgresql.  I'm using a cursor,
so I can keep track of how many records I've handled.  I'm not aware
of a way to do this in Postgresql.
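
Roughly, the loop looks like this (a simplified sketch; the cursor name and the
batch size are arbitrary, and the actual counting happens in the client code
that calls FETCH repeatedly):

BEGIN;
DECLARE acct_cur CURSOR FOR
    SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate),
           EXTRACT (EPOCH FROM s.custdate)
    FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno = s.dsiacctno;
-- repeat until no rows come back, adding each batch to a running total
FETCH FORWARD 100000 FROM acct_cur;
CLOSE acct_cur;
COMMIT;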

Re: Performance With Joins on Large Tables

From
"Joshua Marsh"
Date:
> Hmm... that sounds bad. I'm sure your system will always choose indexes
> with that value.
>
> Is it overestimating the cost of using indexes or underestimating the
> cost of a seq scan, or both? Maybe explain with the 0.1 setting will
> help?
>
> Regards,
>        Jeff Davis

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..51808909.26 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v
(cost=0.00..12755411.69 rows=112393848 width=20)
   ->  Index Scan using r3s169_dsiacctno on r3s169 s
(cost=0.00..32357747.90 rows=285230272 width=17)
(4 rows)

This is what I wanted, two index scans.  Just to give you an idea of
the difference in time, this plan would allow me to process 100,000
records every few seconds, while the sequential scan would only
produce 100,000 every 10 minutes.

Re: Performance With Joins on Large Tables

From
Terje Elde
Date:
Jeff Davis wrote:
> Is it overestimating the cost of using indexes or underestimating the
> cost of a seq scan, or both? Maybe explain with the 0.1 setting will
> help?
>

If enable_seqscan is off, and cost is still set to 100000000, it could
be that it's quite simply forcibly underestimating the cost of a seqscan
in this case.

If enable_seqscan was off for the mentioned plan, it'd be interesting to
see if things would be saner with seqscans enabled, and a more
reasonable random page cost.  If more 'sane' values still produce the
desired plan, it might be better for other plans etc.

Terje


Re: Performance With Joins on Large Tables

From
"Joshua Marsh"
Date:
On 9/13/06, Terje Elde <terje@elde.net> wrote:
> Jeff Davis wrote:
> > Is it overestimating the cost of using indexes or underestimating the
> > cost of a seq scan, or both? Maybe explain with the 0.1 setting will
> > help?
> >
>
> If enable_seqscan is off, and cost is still set to 100000000, it could
> be that it's quite simply forcibly underestimating the cost of a seqscan
> in this case.
>
> If enable_secscan was off for the mentioned plan, it'd be interesting to
> see if things would be saner with seqscans enabled, and a more
> reasonable random page cost.  If more 'sane' values still produce the
> desired plan, it might be better for other plans etc.
>
> Terje
>
>

I turned enable_seqscan to off and got similar results.

random_page_cost at 4.0:
data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Merge Join  (cost=293737539.01..301430139.34 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Sort  (cost=127311593.00..127592577.62 rows=112393848 width=20)
         Sort Key: v.dsiacctno
         ->  Seq Scan on view_505 v  (cost=100000000.00..104602114.48
rows=112393848 width=20)
   ->  Sort  (cost=166425946.01..167139021.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=100000000.00..106873675.72
rows=285230272 width=17)
(8 rows)



random_page_cost at 3.0:
data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Merge Join  (cost=288303269.01..295995869.34 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Sort  (cost=125775957.00..126056941.62 rows=112393848 width=20)
         Sort Key: v.dsiacctno
         ->  Seq Scan on view_505 v  (cost=100000000.00..104602114.48
rows=112393848 width=20)
   ->  Sort  (cost=162527312.01..163240387.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=100000000.00..106873675.72
rows=285230272 width=17)
(8 rows)



random_page_cost at 2.0:
data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Merge Join  (cost=282868999.01..290561599.34 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Sort  (cost=124240321.00..124521305.62 rows=112393848 width=20)
         Sort Key: v.dsiacctno
         ->  Seq Scan on view_505 v  (cost=100000000.00..104602114.48
rows=112393848 width=20)
   ->  Sort  (cost=158628678.01..159341753.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=100000000.00..106873675.72
rows=285230272 width=17)
(8 rows)



random_page_cost at 1.0:
data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=154730044.01..274040257.41 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v
(cost=0.00..111923570.63 rows=112393848 width=20)
   ->  Sort  (cost=154730044.01..155443119.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=100000000.00..106873675.72
rows=285230272 width=17)
(6 rows)



random_page_cost at 0.1:
data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..51808909.26 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v
(cost=0.00..12755411.69 rows=112393848 width=20)
   ->  Index Scan using r3s169_dsiacctno on r3s169 s
(cost=0.00..32357747.90 rows=285230272 width=17)
(4 rows)

I have a suspicion that pgsql isn't tuned to properly deal with tables
of this size.  Are there other things I should look at when dealing
with a database of this size?

Re: Performance With Joins on Large Tables

From
Tom Lane
Date:
"Joshua Marsh" <icub3d@gmail.com> writes:
> I have a suspision that pgsql isn't tuned to properly deal with tables
> of this size.

Actually, it is.  Most of the planner complaints we get are from people
whose tables fit in memory and they find that the default planner
behavior doesn't apply real well to that case.  I find your
indexscan-is-faster-than-sort results pretty suspicious for large
tables.  Are the tables perhaps nearly in order by the dsiacctno fields?
If that were the case, and the planner were missing it for some reason,
these results would be plausible.

BTW, what are you using for work_mem, and how does that compare to your
available RAM?

            regards, tom lane

Re: Performance With Joins on Large Tables

From
"Joshua Marsh"
Date:
> Are the tables perhaps nearly in order by the dsiacctno fields?
> If that were the case, and the planner were missing it for some reason,
> these results would be plausible.
>
> BTW, what are you using for work_mem, and how does that compare to your
> available RAM?
>
>                        regards, tom lane
>

My assumption would be they are in exact order.  The text file I used
in the COPY statement had them in order, so if COPY preserves that in
the database, then it is in order.

The system has 8GB of ram and work_mem is set to 256MB.
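
For reference, on 8.0 work_mem is specified in kilobytes, so 256MB corresponds
to roughly:

SET work_mem = 262144;    -- 256MB expressed in KB; unit suffixes like '256MB'
                          -- are only accepted in later releases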

I'll see if I can't make time to run the sort-seqscan method so we can
have an exact time to work with.

Re: Performance With Joins on Large Tables

From
Tom Lane
Date:
"Joshua Marsh" <icub3d@gmail.com> writes:
>> Are the tables perhaps nearly in order by the dsiacctno fields?

> My assumption would be they are in exact order.  The text file I used
> in the COPY statement had them in order, so if COPY preserves that in
> the database, then it is in order.

Ah.  So the question is why the planner isn't noticing that.  What do
you see in the pg_stats view for the two dsiacctno fields --- the
correlation field in particular?

> The system has 8GB of ram and work_mem is set to 256MB.

Seems reasonable enough.  BTW, I don't think you've mentioned exactly
which PG version you're using?

            regards, tom lane

Re: Performance With Joins on Large Tables

From
"Joshua Marsh"
Date:
On 9/13/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Joshua Marsh" <icub3d@gmail.com> writes:
> >> Are the tables perhaps nearly in order by the dsiacctno fields?
>
> > My assumption would be they are in exact order.  The text file I used
> > in the COPY statement had them in order, so if COPY preserves that in
> > the database, then it is in order.
>
> Ah.  So the question is why the planner isn't noticing that.  What do
> you see in the pg_stats view for the two dsiacctno fields --- the
> correlation field in particular?


Here are the results:
data=# select tablename, attname, n_distinct, avg_width, correlation
from pg_stats where tablename in ('view_505', 'r3s169') and attname =
'dsiacctno';
 tablename |  attname  | n_distinct | avg_width | correlation
-----------+-----------+------------+-----------+-------------
 view_505  | dsiacctno |         -1 |        13 |    -0.13912
 r3s169    | dsiacctno |      44156 |        13 |   -0.126824
(2 rows)


Someone suggested CLUSTER to make sure they are in fact ordered; I can
try that too and let everyone know the results.

> > The system has 8GB of ram and work_mem is set to 256MB.
>
> Seems reasonable enough.  BTW, I don't think you've mentioned exactly
> which PG version you're using?
>
>                        regards, tom lane
>

I am using 8.0.3.

Re: Performance With Joins on Large Tables

From
Tom Lane
Date:
"Joshua Marsh" <icub3d@gmail.com> writes:
>>> On 9/13/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Are the tables perhaps nearly in order by the dsiacctno fields?
>>
>>> My assumption would be they are in exact order.  The text file I used
>>> in the COPY statement had them in order, so if COPY preserves that in
>>> the database, then it is in order.
>>
>> Ah.  So the question is why the planner isn't noticing that.  What do
>> you see in the pg_stats view for the two dsiacctno fields --- the
>> correlation field in particular?

> Here are the results:
> data=# select tablename, attname, n_distinct, avg_width, correlation
> from pg_stats where tablename in ('view_505', 'r3s169') and attname =
> 'dsiacctno';
>  tablename |  attname  | n_distinct | avg_width | correlation
> -----------+-----------+------------+-----------+-------------
>  view_505  | dsiacctno |         -1 |        13 |    -0.13912
>  r3s169    | dsiacctno |      44156 |        13 |   -0.126824
> (2 rows)

Wow, that correlation value is *way* away from order.  If they were
really in exact order by dsiacctno then I'd expect to see 1.0 in
that column.  Can you take another look at the tables and confirm
the ordering?  Does the correlation change if you do an ANALYZE on the
tables?  (Some small change is to be expected due to random sampling,
but this is way off.)

            regards, tom lane

Re: Performance With Joins on Large Tables

From
"Joshua Marsh"
Date:
> Wow, that correlation value is *way* away from order.  If they were
> really in exact order by dsiacctno then I'd expect to see 1.0 in
> that column.  Can you take another look at the tables and confirm
> the ordering?  Does the correlation change if you do an ANALYZE on the
> tables?  (Some small change is to be expected due to random sampling,
> but this is way off.)
>
>                        regards, tom lane
 
Thanks for pointing that out.  Generally we load the tables via COPY and then
never touch the data.  Because of the slowdown, I have been updating tuples.
I reloaded it from scratch, set enable_seqscan=off and random_page_cost=4,
and I got the results I was looking for:
 
 
data=# analyze view_505;
ANALYZE
data=# analyze r3s169;
ANALYZE
data=# select tablename, attname, n_distinct, avg_width, correlation from pg_stats where tablename in ('view_505', 'r3s169') and attname = 'dsiacctno';
 tablename |  attname  | n_distinct | avg_width | correlation
-----------+-----------+------------+-----------+-------------
 view_505  | dsiacctno |         -1 |        13 |           1
 r3s169    | dsiacctno |      42140 |        13 |           1
(2 rows)

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..20099712.79 rows=285153952 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v  (cost=0.00..5147252.74 rows=112282976 width=20)
   ->  Index Scan using r3s169_dsiacctno on r3s169 s  (cost=0.00..8256331.47 rows=285153952 width=17)
(4 rows)

Thanks for your help, everyone.