Thread: Performance With Joins on Large Tables
I am having problems performing a join on two large tables. It seems to only want to use a sequential scan on the join, but that method seems to be slower than an index scan. I've never actually had it complete the sequential scan because I stop it after 24+ hours. I've run joins against large tables before and an index scan was always faster (a few hours at the most). Here is some information on the two tables:

data=# analyze view_505;
ANALYZE
data=# analyze r3s169;
ANALYZE
data=# \d view_505
              Table "public.view_505"
      Column      |         Type          | Modifiers
------------------+-----------------------+-----------
 dsiacctno        | numeric               |
 name             | boolean               |
 title            | boolean               |
 company          | boolean               |
 zip4             | boolean               |
 acceptcall       | boolean               |
 phonedirect      | smallint              |
 phonetollfree    | smallint              |
 fax              | smallint              |
 editdrop         | boolean               |
 postsuppress     | boolean               |
 firstnameinit    | boolean               |
 prefix           | integer               |
 crrt             | boolean               |
 dpbc             | boolean               |
 executive        | integer               |
 addressline      | integer               |
 multibuyer       | integer               |
 activemultibuyer | integer               |
 active           | boolean               |
 emails           | integer               |
 domains          | integer               |
 zip1             | character varying(1)  |
 zip3             | character varying(3)  |
 gender           | character varying(1)  |
 topdomains       | bit varying           |
 city             | character varying(35) |
 state            | character varying(35) |
 zip              | character varying(20) |
 country          | character varying(30) |
 selects          | bit varying           |
 files            | integer[]             |
 sics             | integer[]             |
 custdate         | date                  |
Indexes:
    "view_505_city" btree (city)
    "view_505_dsiacctno" btree (dsiacctno)
    "view_505_state" btree (state)
    "view_505_zip" btree (zip)
    "view_505_zip1" btree (zip1)
    "view_505_zip3" btree (zip3)

data=# \d r3s169
            Table "public.r3s169"
   Column    |          Type          | Modifiers
-------------+------------------------+-----------
 dsiacctno   | numeric                |
 fileid      | integer                |
 customerid  | character varying(20)  |
 email       | character varying(100) |
 sic2        | character varying(2)   |
 sic4        | character varying(4)   |
 sic6        | character varying(6)   |
 custdate    | date                   |
 inqdate     | date                   |
 eentrydate  | date                   |
 esubdate    | date                   |
 efaildate   | date                   |
 eunlistdate | date                   |
 pentrydate  | date                   |
 psubdate    | date                   |
 punlistdate | date                   |
 pexpiredate | date                   |
 lastupdate  | date                   |
 emaildrop   | numeric                |
 sic8        | character varying(8)   |
Indexes:
    "r3s169_dsiacctno" btree (dsiacctno)

data=# select count(*) from view_505;
   count
-----------
 112393845
(1 row)

data=# select count(*) from r3s169;
   count
-----------
 285230264
(1 row)

Here is what EXPLAIN says:

data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno = s.dsiacctno;
                                           QUERY PLAN
-----------------------------------------------------------------------------------------------
 Merge Join  (cost=293767607.69..305744319.52 rows=285392608 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Sort  (cost=127304933.87..127585815.71 rows=112352736 width=20)
         Sort Key: v.dsiacctno
         ->  Seq Scan on view_505 v  (cost=100000000.00..104604059.36 rows=112352736 width=20)
   ->  Sort  (cost=166462673.82..167176155.34 rows=285392608 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=100000000.00..106875334.08 rows=285392608 width=17)
(8 rows)

I can't really do an EXPLAIN ANALYZE because the query never finishes. Also, I use a cursor to loop through the data. view_505 isn't a pgsql view; it's just how we decided to name the table. There is a one-to-many relationship between view_505 and r3s169.

Since enable_seqscan is off, my understanding is that in order for the query planner to use a sequential scan it must think there is no other alternative. Both sides are indexed and analyzed, so that confuses me a little.
I tried it on a smaller sample set of the data and it works fine:

data=# select * into r3s169_test from r3s169 limit 1000000;
SELECT
data=# select * into view_505_test from view_505 limit 1000000;
SELECT
data=# create index r3s169_test_dsiacctno on r3s169_test (dsiacctno);
CREATE INDEX
data=# create index view_505_test_dsiacctno on view_505_test (dsiacctno);
CREATE INDEX
data=# analyze r3s169_test;
ANALYZE
data=# analyze view_505_test;
ANALYZE
data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH FROM s.custdate) FROM view_505_test v INNER JOIN r3s169_test s ON v.dsiacctno = s.dsiacctno;
                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..1976704.69 rows=1000187 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_test_dsiacctno on view_505_test v  (cost=0.00..1676260.67 rows=999985 width=20)
   ->  Index Scan using r3s169_test_dsiacctno on r3s169_test s  (cost=0.00..1089028.66 rows=1000186 width=17)
(4 rows)

Is there anything I'm missing that is preventing it from using the index? It just seems weird to me that other joins like this work fine and fast with indexes, but this one won't.
On Tue, Sep 12, 2006 at 04:17:34PM -0600, Joshua Marsh wrote:
> data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
> v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
> FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
> s.dsiacctno;
>                                            QUERY PLAN
> -----------------------------------------------------------------------------------------------
>  Merge Join  (cost=293767607.69..305744319.52 rows=285392608 width=11)
>    Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
>    ->  Sort  (cost=127304933.87..127585815.71 rows=112352736 width=20)
>          Sort Key: v.dsiacctno
>          ->  Seq Scan on view_505 v  (cost=100000000.00..104604059.36
>              rows=112352736 width=20)
>    ->  Sort  (cost=166462673.82..167176155.34 rows=285392608 width=17)
>          Sort Key: s.dsiacctno
>          ->  Seq Scan on r3s169 s  (cost=100000000.00..106875334.08
>              rows=285392608 width=17)
> (8 rows)
>
> Since enable_seqscan is off, my understanding is that in order for the query
> planner to use a sequential scan it must think there is no other alternative.
> Both sides are indexed and analyzed, so that confuses me a little.
>
> I tried it on a smaller sample set of the data and it works fine:

Actually, enable_seqscan=off just adds a fixed overhead to the seqscan cost estimate. That's why the cost for the seqscans in that plan starts at 100000000. I've suggested changing that to a variable overhead based on the expected rowcount, but the counter-argument was that anyone with so much data that the fixed amount wouldn't work would most likely be having bigger issues anyway.

Other things you can try to get the index scan back would be to reduce random_page_cost and to analyze the join fields in those tables with a higher statistics target (though I'm not 100% certain the join cost estimator actually takes that into account). Or if you don't mind patching your source code, it wouldn't be difficult to make enable_seqscan use a bigger 'penalty value' than 100000000.
--
Jim Nasby                  jim@nasby.net
EnterpriseDB               http://enterprisedb.com     512.569.9461 (cell)
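The fixed-penalty behavior Jim describes can be sketched with a toy cost model. This is an illustration only, not the planner's real arithmetic: the 1.0e8 constant matches the disable_cost visible in the plans above, but the page and row counts and cost formula below are invented for the example.

```python
# Toy illustration of how enable_seqscan=off works: the planner does not
# remove the seqscan path, it just adds a large constant to its cost,
# which is why the plan above shows seqscan costs starting at 100000000.00.
# All numbers here are made up for illustration.

DISABLE_COST = 100_000_000.0  # the fixed penalty seen in the plans above

def seqscan_cost(pages, rows, seq_page_cost=1.0, cpu_tuple_cost=0.01,
                 seqscan_enabled=True):
    """Very simplified sequential-scan total cost estimate."""
    cost = pages * seq_page_cost + rows * cpu_tuple_cost
    if not seqscan_enabled:
        cost += DISABLE_COST  # penalize, but never forbid, the path
    return cost

# With tables this large, a competing plan can still cost more than the
# penalized seqscan -- so the "disabled" seqscan can still be chosen.
penalized = seqscan_cost(pages=2_000_000, rows=285_000_000,
                         seqscan_enabled=False)
print(f"{penalized:.2f}")  # starts well above 100000000.00
```

The point of the sketch: turning the setting "off" only shifts the comparison by a constant; it never makes the sequential scan unavailable.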
On 9/13/06, Jim C. Nasby <jim@nasby.net> wrote:
> On Tue, Sep 12, 2006 at 04:17:34PM -0600, Joshua Marsh wrote:
> > data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
> > v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
> > FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
> > s.dsiacctno;
> >                                            QUERY PLAN
> > -----------------------------------------------------------------------------------------------
> >  Merge Join  (cost=293767607.69..305744319.52 rows=285392608 width=11)
> >    Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
> >    ->  Sort  (cost=127304933.87..127585815.71 rows=112352736 width=20)
> >          Sort Key: v.dsiacctno
> >          ->  Seq Scan on view_505 v  (cost=100000000.00..104604059.36
> >              rows=112352736 width=20)
> >    ->  Sort  (cost=166462673.82..167176155.34 rows=285392608 width=17)
> >          Sort Key: s.dsiacctno
> >          ->  Seq Scan on r3s169 s  (cost=100000000.00..106875334.08
> >              rows=285392608 width=17)
> > (8 rows)
> >
> > Since enable_seqscan is off, my understanding is that in order for the query
> > planner to use a sequential scan it must think there is no other alternative.
> > Both sides are indexed and analyzed, so that confuses me a little.
> >
> > I tried it on a smaller sample set of the data and it works fine:
>
> Actually, enable_seqscan=off just adds a fixed overhead to the seqscan
> cost estimate. That's why the cost for the seqscans in that plan starts
> at 100000000. I've suggested changing that to a variable overhead based
> on the expected rowcount, but the counter-argument was that anyone with
> so much data that the fixed amount wouldn't work would most likely be
> having bigger issues anyway.
>
> Other things you can try to get the index scan back would be to reduce
> random_page_cost and to analyze the join fields in those tables with a
> higher statistics target (though I'm not 100% certain the join cost
> estimator actually takes that into account).
> Or if you don't mind patching your source code, it wouldn't be difficult
> to make enable_seqscan use a bigger 'penalty value' than 100000000.
> --
> Jim Nasby                  jim@nasby.net
> EnterpriseDB               http://enterprisedb.com     512.569.9461 (cell)

Thanks for the tip. I lowered random_page_cost and got these results:

data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno = s.dsiacctno;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..20921221.49 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v  (cost=0.00..2838595.79 rows=112393848 width=20)
   ->  Index Scan using r3s169_dsiacctno on r3s169 s  (cost=0.00..7106203.68 rows=285230272 width=17)
(4 rows)

That seems to have done it. Are there any side effects to this change? I read about random_page_cost in the documentation and it seems like this is strictly for planning. All the tables on this database will be indexed and of a size similar to these two, so I don't see it causing any other problems. Thought I would check, though :)
On Wed, 2006-09-13 at 08:49 -0600, Joshua Marsh wrote:
> That seems to have done it. Are there any side effects to this
> change? I read about random_page_cost in the documentation and it
> seems like this is strictly for planning. All the tables on this
> database will be indexed and of a size similar to these two, so I
> don't see it causing any other problems. Thought I would check, though :)

Right, it's just used for planning. Avoid setting it too low; if it's below about 2.0 you would most likely see some very strange plans. Certainly it doesn't make sense at all to set it below 1.0, since that is saying it's cheaper to get a random page than a sequential one.

What was your original random_page_cost, and what is the new value you set it to?

Regards,
Jeff Davis
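Jeff's warning about going below 1.0 can be illustrated with a toy comparison. The parameters (page counts, the simplified cost formulas) are invented; PostgreSQL's real costing is considerably more involved.

```python
# Toy model of the random_page_cost trade-off.  Sequential pages are
# costed at 1.0; index access fetches pages in key order (random I/O),
# each at random_page_cost.  Below 1.0 the model claims a random read
# is cheaper than a sequential one, which no real storage delivers.
# All numbers are illustrative.

def seq_cost(pages, seq_page_cost=1.0):
    """Cost of reading every page of the table sequentially."""
    return pages * seq_page_cost

def index_cost(pages_touched, random_page_cost):
    """Cost of fetching heap pages in index order (random I/O)."""
    return pages_touched * random_page_cost

table_pages = 100_000
pages_via_index = 40_000  # pages actually touched via the index

for rpc in (4.0, 2.0, 1.0, 0.5):
    idx, seq = index_cost(pages_via_index, rpc), seq_cost(table_pages)
    choice = "index scan" if idx < seq else "seq scan"
    print(f"random_page_cost={rpc}: index={idx:.0f} seq={seq:.0f} -> {choice}")
```

Lowering the knob makes the index path cheaper on paper across the board, which is exactly why 0.1 "works" in this thread; the real question, raised later by Tom Lane, is why the planner misjudged the comparison in the first place.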
On 9/13/06, Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2006-09-13 at 08:49 -0600, Joshua Marsh wrote:
> > That seems to have done it. Are there any side effects to this
> > change? I read about random_page_cost in the documentation and it
> > seems like this is strictly for planning. All the tables on this
> > database will be indexed and of a size similar to these two, so I
> > don't see it causing any other problems. Thought I would check, though :)
>
> Right, it's just used for planning. Avoid setting it too low; if it's
> below about 2.0 you would most likely see some very strange plans.
> Certainly it doesn't make sense at all to set it below 1.0, since that
> is saying it's cheaper to get a random page than a sequential one.
>
> What was your original random_page_cost, and what is the new value you
> set it to?
>
> Regards,
> Jeff Davis

I tried it at several levels. It was initially at 4 (the default). I tried 3 and 2 with no changes. When I set it to 1, it used an index on view_505 but not on r3s169:

data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno = s.dsiacctno;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=154730044.01..278318711.49 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v  (cost=0.00..111923570.63 rows=112393848 width=20)
   ->  Sort  (cost=154730044.01..155443119.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=100000000.00..106873675.72 rows=285230272 width=17)
(6 rows)

Setting it to 0.1 finally gave me the result I was looking for. I know that the index scan is faster, though.
The seq scan never finished (I killed it after 24+ hours), and I'm running the query now with indexes; it's progressing nicely (will probably take 4 hours).
On Wed, 2006-09-13 at 10:19 -0600, Joshua Marsh wrote:
> > Right, it's just used for planning. Avoid setting it too low; if it's
> > below about 2.0 you would most likely see some very strange plans.
> > Certainly it doesn't make sense at all to set it below 1.0, since that
> > is saying it's cheaper to get a random page than a sequential one.
> >
> > What was your original random_page_cost, and what is the new value you
> > set it to?
> >
> > Regards,
> > Jeff Davis
>
> I tried it at several levels. It was initially at 4 (the default). I
> tried 3 and 2 with no changes. When I set it to 1, it used an index
> on view_505 but not on r3s169:
>
> data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
> v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
> FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
> s.dsiacctno;
>                                                  QUERY PLAN
> ------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=154730044.01..278318711.49 rows=285230272 width=11)
>    Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
>    ->  Index Scan using view_505_dsiacctno on view_505 v
>        (cost=0.00..111923570.63 rows=112393848 width=20)
>    ->  Sort  (cost=154730044.01..155443119.69 rows=285230272 width=17)
>          Sort Key: s.dsiacctno
>          ->  Seq Scan on r3s169 s  (cost=100000000.00..106873675.72
>              rows=285230272 width=17)
>
> Setting it to 0.1 finally gave me the result I was looking for. I know
> that the index scan is faster, though. The seq scan never finished (I
> killed it after 24+ hours) and I'm running the query now with indexes
> and it's progressing nicely (will probably take 4 hours).

Hmm... that sounds bad. I'm sure your system will always choose indexes with that value.

Is it overestimating the cost of using indexes or underestimating the cost of a seq scan, or both? Maybe an EXPLAIN with the 0.1 setting will help?

Regards,
Jeff Davis
> Setting it to 0.1 finally gave me the result I was looking for. I know
> that the index scan is faster, though. The seq scan never finished (I
> killed it after 24+ hours) and I'm running the query now with indexes
> and it's progressing nicely (will probably take 4 hours).

In regards to "progressing nicely (will probably take 4 hours)": is this just an estimate, or is there some way to get progress status (or something similar, e.g. "on step 6 of 20 planned steps") on a query in pg? I looked through Chapter 24, Monitoring Database Activity, but most of that looks like aggregate stats; trying to relate these to a particular query doesn't really seem feasible.

This would be useful in the case where you have a couple of long-running transactions or stored procedures doing analysis and you'd like to give the user some feedback on where you're at.

Thanks,

Bucky
> Is there anything I'm missing that is preventing it from using the index? It > just seems weird to me that other joins like this work fine and fast > with indexes, > but this one won't. Did You consider clustering both tables on the dsiacctno index? I just checked that for a 4M rows table even with enable_seqscan=on and default *page_cost on PG 8.1.4 an index scan is being chosen for select * from table order by serial_pkey_field This is essentially the question in Your case - sort it, or get it sorted via the index at the expense of more random IO. I think clustering should work for You, but I am no expert, check with others. Greetings Marcin
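Marcin's point, paying the sort cost once with CLUSTER so that an index-order scan does sequential rather than random I/O, can be sketched with a toy page model. All the geometry here (row counts, rows per page, the one-page "cache") is invented for the illustration.

```python
# Toy model of why clustering helps an index-order scan: if the heap is
# stored in index-key order, reading rows in key order stays on each
# page until it is exhausted; if the heap is unordered, almost every
# row lands on a different page and the scan degenerates into random
# I/O.  Page geometry is made up for illustration.

import random

ROWS = 100_000
ROWS_PER_PAGE = 100

def page_fetches(row_pages):
    """Count page reads, assuming only the current page stays cached."""
    fetches, current = 0, None
    for page in row_pages:
        if page != current:
            fetches += 1
            current = page
    return fetches

# Clustered: physical page number follows key order.
clustered = [i // ROWS_PER_PAGE for i in range(ROWS)]

# Unclustered: same pages, but keys scattered; an index-order scan hops.
random.seed(1)
unclustered = clustered[:]
random.shuffle(unclustered)

print(page_fetches(clustered))    # one fetch per page
print(page_fetches(unclustered))  # close to one fetch per row
```

This is the trade-off the thread is circling: sort the rows at query time, or keep the heap in key order so the index scan's I/O is effectively sequential.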
On 9/13/06, Bucky Jordan <bjordan@lumeta.com> wrote:
> > Setting it to 0.1 finally gave me the result I was looking for. I know
> > that the index scan is faster, though. The seq scan never finished (I
> > killed it after 24+ hours) and I'm running the query now with indexes
> > and it's progressing nicely (will probably take 4 hours).
>
> In regards to "progressing nicely (will probably take 4 hours)": is
> this just an estimate, or is there some way to get progress status (or
> something similar, e.g. "on step 6 of 20 planned steps") on a query in pg?
> I looked through Chapter 24, Monitoring Database Activity, but most of
> that looks like aggregate stats; trying to relate these to a particular
> query doesn't really seem feasible.
>
> This would be useful in the case where you have a couple of long-running
> transactions or stored procedures doing analysis and you'd like to give
> the user some feedback on where you're at.
>
> Thanks,
>
> Bucky

I do it programmatically, not through PostgreSQL. I'm using a cursor, so I can keep track of how many records I've handled. I'm not aware of a way to do this in PostgreSQL.
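Joshua's approach, counting rows on the client as the cursor hands them back, can be sketched roughly as below. The row source is simulated here; against a real database it would be something like a psycopg2 server-side (named) cursor with `fetchmany()`, and the function names are hypothetical.

```python
# Sketch of client-side progress reporting over a cursor: the server
# exposes no per-query progress, so the client counts rows as it
# fetches them in batches.  The batch generator stands in for a real
# database cursor's fetchmany() loop.

def fetch_in_batches(rows, batch_size):
    """Stand-in for repeated cursor.fetchmany(batch_size) calls."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

def process_with_progress(rows, total_expected, batch_size=100_000):
    """Process rows batch by batch, recording percent-complete reports."""
    done = 0
    reports = []
    for batch in fetch_in_batches(rows, batch_size):
        done += len(batch)  # ...real per-row work would go here...
        pct = 100.0 * done / total_expected
        reports.append(f"{done} rows ({pct:.1f}%)")
    return done, reports

done, reports = process_with_progress(list(range(250_000)),
                                      total_expected=250_000)
print(done)         # 250000
print(reports[-1])  # 250000 rows (100.0%)
```

The `total_expected` figure would come from a prior `count(*)` or from planner estimates; with only an estimate the percentage is approximate, which matches the "will probably take 4 hours" phrasing above.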
> Hmm... that sounds bad. I'm sure your system will always choose indexes
> with that value.
>
> Is it overestimating the cost of using indexes or underestimating the
> cost of a seq scan, or both? Maybe an EXPLAIN with the 0.1 setting will
> help?
>
> Regards,
> Jeff Davis

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..51808909.26 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v  (cost=0.00..12755411.69 rows=112393848 width=20)
   ->  Index Scan using r3s169_dsiacctno on r3s169 s  (cost=0.00..32357747.90 rows=285230272 width=17)
(4 rows)

This is what I wanted: two index scans. Just to give you an idea of the difference in time, this plan lets me process 100,000 records every few seconds, while the sequential scan would only produce 100,000 every 10 minutes.
Jeff Davis wrote:
> Is it overestimating the cost of using indexes or underestimating the
> cost of a seq scan, or both? Maybe an EXPLAIN with the 0.1 setting will
> help?

If enable_seqscan is off, and the cost is still set to 100000000, it could be that it's quite simply forcibly underestimating the cost of a seqscan in this case.

If enable_seqscan was off for the mentioned plan, it'd be interesting to see if things would be saner with seqscans enabled and a more reasonable random_page_cost. If more 'sane' values still produce the desired plan, it might be better for other plans etc.

Terje
On 9/13/06, Terje Elde <terje@elde.net> wrote:
> Jeff Davis wrote:
> > Is it overestimating the cost of using indexes or underestimating the
> > cost of a seq scan, or both? Maybe an EXPLAIN with the 0.1 setting will
> > help?
>
> If enable_seqscan is off, and the cost is still set to 100000000, it could
> be that it's quite simply forcibly underestimating the cost of a seqscan
> in this case.
>
> If enable_seqscan was off for the mentioned plan, it'd be interesting to
> see if things would be saner with seqscans enabled and a more
> reasonable random_page_cost. If more 'sane' values still produce the
> desired plan, it might be better for other plans etc.
>
> Terje

I turned enable_seqscan to off and got similar results.

random_page_cost at 4.0:

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                           QUERY PLAN
-----------------------------------------------------------------------------------------------
 Merge Join  (cost=293737539.01..301430139.34 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Sort  (cost=127311593.00..127592577.62 rows=112393848 width=20)
         Sort Key: v.dsiacctno
         ->  Seq Scan on view_505 v  (cost=100000000.00..104602114.48 rows=112393848 width=20)
   ->  Sort  (cost=166425946.01..167139021.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=100000000.00..106873675.72 rows=285230272 width=17)
(8 rows)

random_page_cost at 3.0:

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                           QUERY PLAN
-----------------------------------------------------------------------------------------------
 Merge Join  (cost=288303269.01..295995869.34 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Sort  (cost=125775957.00..126056941.62 rows=112393848 width=20)
         Sort Key: v.dsiacctno
         ->  Seq Scan on view_505 v  (cost=100000000.00..104602114.48 rows=112393848 width=20)
   ->  Sort  (cost=162527312.01..163240387.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=100000000.00..106873675.72 rows=285230272 width=17)
(8 rows)

random_page_cost at 2.0:

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                           QUERY PLAN
-----------------------------------------------------------------------------------------------
 Merge Join  (cost=282868999.01..290561599.34 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Sort  (cost=124240321.00..124521305.62 rows=112393848 width=20)
         Sort Key: v.dsiacctno
         ->  Seq Scan on view_505 v  (cost=100000000.00..104602114.48 rows=112393848 width=20)
   ->  Sort  (cost=158628678.01..159341753.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=100000000.00..106873675.72 rows=285230272 width=17)
(8 rows)

random_page_cost at 1.0:

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=154730044.01..274040257.41 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v  (cost=0.00..111923570.63 rows=112393848 width=20)
   ->  Sort  (cost=154730044.01..155443119.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=100000000.00..106873675.72 rows=285230272 width=17)
(6 rows)

random_page_cost at 0.1:

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..51808909.26 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v  (cost=0.00..12755411.69 rows=112393848 width=20)
   ->  Index Scan using r3s169_dsiacctno on r3s169 s  (cost=0.00..32357747.90 rows=285230272 width=17)
(4 rows)

I have a suspicion that pgsql isn't tuned to properly deal with tables of this size. Are there other things I should look at when dealing with a database of this size?
"Joshua Marsh" <icub3d@gmail.com> writes:
> I have a suspicion that pgsql isn't tuned to properly deal with tables
> of this size.

Actually, it is. Most of the planner complaints we get are from people whose tables fit in memory and they find that the default planner behavior doesn't apply real well to that case.

I find your indexscan-is-faster-than-sort results pretty suspicious for large tables. Are the tables perhaps nearly in order by the dsiacctno fields? If that were the case, and the planner were missing it for some reason, these results would be plausible.

BTW, what are you using for work_mem, and how does that compare to your available RAM?

regards, tom lane
> Are the tables perhaps nearly in order by the dsiacctno fields?
> If that were the case, and the planner were missing it for some reason,
> these results would be plausible.
>
> BTW, what are you using for work_mem, and how does that compare to your
> available RAM?
>
> regards, tom lane

My assumption would be that they are in exact order. The text file I used in the COPY statement had them in order, so if COPY preserves that in the database, then it is in order.

The system has 8GB of RAM and work_mem is set to 256MB.

I'll see if I can't make time to run the sort-seqscan method so we can have an exact time to work with.
"Joshua Marsh" <icub3d@gmail.com> writes:
>> Are the tables perhaps nearly in order by the dsiacctno fields?

> My assumption would be that they are in exact order. The text file I used
> in the COPY statement had them in order, so if COPY preserves that in
> the database, then it is in order.

Ah. So the question is why the planner isn't noticing that. What do you see in the pg_stats view for the two dsiacctno fields --- the correlation field in particular?

> The system has 8GB of RAM and work_mem is set to 256MB.

Seems reasonable enough. BTW, I don't think you've mentioned exactly which PG version you're using?

regards, tom lane
On 9/13/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Joshua Marsh" <icub3d@gmail.com> writes:
> >> Are the tables perhaps nearly in order by the dsiacctno fields?
>
> > My assumption would be that they are in exact order. The text file I used
> > in the COPY statement had them in order, so if COPY preserves that in
> > the database, then it is in order.
>
> Ah. So the question is why the planner isn't noticing that. What do
> you see in the pg_stats view for the two dsiacctno fields --- the
> correlation field in particular?

Here are the results:

data=# select tablename, attname, n_distinct, avg_width, correlation from pg_stats where tablename in ('view_505', 'r3s169') and attname = 'dsiacctno';
 tablename |  attname  | n_distinct | avg_width | correlation
-----------+-----------+------------+-----------+-------------
 view_505  | dsiacctno |         -1 |        13 |    -0.13912
 r3s169    | dsiacctno |      44156 |        13 |   -0.126824
(2 rows)

Someone suggested CLUSTER to make sure they are in fact ordered; I can try that too and let everyone know the results.

> > The system has 8GB of RAM and work_mem is set to 256MB.
>
> Seems reasonable enough. BTW, I don't think you've mentioned exactly
> which PG version you're using?

I am using 8.0.3.
"Joshua Marsh" <icub3d@gmail.com> writes:
>>> On 9/13/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Are the tables perhaps nearly in order by the dsiacctno fields?
>>
>>> My assumption would be that they are in exact order. The text file I used
>>> in the COPY statement had them in order, so if COPY preserves that in
>>> the database, then it is in order.
>>
>> Ah. So the question is why the planner isn't noticing that. What do
>> you see in the pg_stats view for the two dsiacctno fields --- the
>> correlation field in particular?

> Here are the results:

> data=# select tablename, attname, n_distinct, avg_width, correlation
> from pg_stats where tablename in ('view_505', 'r3s169') and attname =
> 'dsiacctno';
>  tablename |  attname  | n_distinct | avg_width | correlation
> -----------+-----------+------------+-----------+-------------
>  view_505  | dsiacctno |         -1 |        13 |    -0.13912
>  r3s169    | dsiacctno |      44156 |        13 |   -0.126824
> (2 rows)

Wow, that correlation value is *way* away from order. If they were really in exact order by dsiacctno then I'd expect to see 1.0 in that column. Can you take another look at the tables and confirm the ordering? Does the correlation change if you do an ANALYZE on the tables? (Some small change is to be expected due to random sampling, but this is way off.)

regards, tom lane
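The pg_stats correlation column measures how well a column's values track the physical row order. A rough illustration using plain Pearson correlation (close in spirit to, but not identical with, what ANALYZE actually computes over its sample):

```python
# Rough illustration of what pg_stats.correlation measures: correlation
# between a column's values and the physical order of the rows.
# Perfectly key-ordered data gives 1.0; shuffled data gives something
# near 0 -- which is why -0.13 was a red flag for tables believed to be
# loaded in key order via COPY.

import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

positions = list(range(10_000))  # physical row positions

ordered = list(range(10_000))    # values loaded in key order
print(round(pearson(positions, ordered), 3))   # 1.0

shuffled = ordered[:]
random.seed(42)
random.shuffle(shuffled)
print(round(pearson(positions, shuffled), 3))  # near 0
```

High correlation matters to the planner because it means an index scan's heap fetches arrive nearly sequentially instead of randomly, which is exactly the cost assumption being fought over in this thread.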
> Wow, that correlation value is *way* away from order.  If they were
> really in exact order by dsiacctno then I'd expect to see 1.0 in
> that column.  Can you take another look at the tables and confirm
> the ordering?  Does the correlation change if you do an ANALYZE on the
> tables?  (Some small change is to be expected due to random sampling,
> but this is way off.)
>
> regards, tom lane
Thanks for pointing that out. Generally we load the tables via COPY and then never touch the data. Because of the slowdown, I have been updating tuples. I reloaded the tables from scratch, set enable_seqscan=off and random_page_cost=4, and I got the results I was looking for:
data=# analyze view_505;
ANALYZE
data=# analyze r3s169;
ANALYZE
data=# select tablename, attname, n_distinct, avg_width, correlation from pg_stats where tablename in ('view_505', 'r3s169') and attname = 'dsiacctno';
tablename | attname | n_distinct | avg_width | correlation
-----------+-----------+------------+-----------+-------------
view_505 | dsiacctno | -1 | 13 | 1
r3s169 | dsiacctno | 42140 | 13 | 1
(2 rows)
data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Merge Join (cost=0.00..20099712.79 rows=285153952 width=11)
Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
-> Index Scan using view_505_dsiacctno on view_505 v (cost=0.00..5147252.74 rows=112282976 width=20)
-> Index Scan using r3s169_dsiacctno on r3s169 s (cost=0.00..8256331.47 rows=285153952 width=17)
(4 rows)
Thanks for your help, everyone.