Thread: slow joining very large table to smaller ones
I'm trying to improve the speed of this query:

explain select recordtext from eventactivity
  inner join ( select incidentid from k_r where id = 94 ) a using ( incidentid )
  inner join ( select incidentid from k_b where id = 107 ) b using ( incidentid );

                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=2747.29..4249364.96 rows=11968693 width=35)
   Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
   ->  Merge Join  (cost=1349.56..4230052.73 rows=4413563 width=117)
         Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
         ->  Index Scan using eventactivity1 on eventactivity  (cost=0.00..4051200.28 rows=44519781 width=49)
         ->  Sort  (cost=1349.56..1350.85 rows=517 width=68)
               Sort Key: (k_b.incidentid)::text
               ->  Index Scan using k_b_idx on k_b  (cost=0.00..1326.26 rows=517 width=68)
                     Index Cond: (id = 107)
   ->  Sort  (cost=1397.73..1399.09 rows=542 width=68)
         Sort Key: (k_r.incidentid)::text
         ->  Index Scan using k_r_idx on k_r  (cost=0.00..1373.12 rows=542 width=68)
               Index Cond: (id = 94)
(13 rows)

There are many millions of rows in eventactivity. There are a few tens of
thousands of rows in k_r and k_b. There is an index on 'incidentid' in all
three tables. Fewer than 100 rows total should match in k_r and k_b, and
that part on its own is very fast. My intention is that those 100 or so
incidentids get extracted in under a second, and only AFTER that does the
query go into eventactivity.

Right now, it looks like pg is trying to sort the entire eventactivity
table for the merge join, which is taking several minutes. Can I rephrase
this so that it does the searching through k_r and k_b FIRST and then goes
into eventactivity using the index on incidentid? It seems like that
shouldn't be too hard to make fast, but my SQL query skills are only
average.

Thanks
-Dan
Dan Harris wrote:
> I'm trying to improve the speed of this query:
>
> explain select recordtext from eventactivity inner join ( select
> incidentid from k_r where id = 94 ) a using ( incidentid ) inner join (
> select incidentid from k_b where id = 107 ) b using ( incidentid );

You might try giving it a little bit more freedom with:

EXPLAIN ANALYZE
SELECT recordtext FROM eventactivity, k_r, k_b
 WHERE eventactivity.incidentid = k_r.incidentid
   AND eventactivity.incidentid = k_b.incidentid
   AND k_r.id = 94
   AND k_b.id = 107
-- AND k_r.incidentid = k_b.incidentid
;

I'm pretty sure that would give identical results, just let the planner
have a little bit more freedom about how it does it.
Also the last line is commented out, because I think it is redundant.

You might also try:

EXPLAIN ANALYZE
SELECT recordtext
  FROM eventactivity JOIN k_r USING (incidentid)
  JOIN k_b USING (incidentid)
 WHERE k_r.id = 94
   AND k_b.id = 107
;

Also, if possible give us the EXPLAIN ANALYZE so that we know if the
planner is making accurate estimates. (You might send an EXPLAIN while
waiting for the EXPLAIN ANALYZE to finish.)

You can also try disabling merge joins, and see how that changes things.

>                                                   QUERY PLAN
> --------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=2747.29..4249364.96 rows=11968693 width=35)
>    Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>    ->  Merge Join  (cost=1349.56..4230052.73 rows=4413563 width=117)
>          Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>          ->  Index Scan using eventactivity1 on eventactivity
>              (cost=0.00..4051200.28 rows=44519781 width=49)
>          ->  Sort  (cost=1349.56..1350.85 rows=517 width=68)
>                Sort Key: (k_b.incidentid)::text
>                ->  Index Scan using k_b_idx on k_b  (cost=0.00..1326.26 rows=517 width=68)
>                      Index Cond: (id = 107)
>    ->  Sort  (cost=1397.73..1399.09 rows=542 width=68)
>          Sort Key: (k_r.incidentid)::text
>          ->  Index Scan using k_r_idx on k_r  (cost=0.00..1373.12 rows=542 width=68)
>                Index Cond: (id = 94)
> (13 rows)
>
> There are many millions of rows in eventactivity. There are a few
> ten-thousand rows in k_r and k_b. There is an index on 'incidentid' in
> all three tables. There should only be less than 100 rows matched in
> k_r and k_b total. That part on its own is very very fast. But, it
> should have those 100 or so incidentids extracted in under a second and
> then go into eventactivity AFTER doing that. At least, that's my
> intention to make this fast.

Well, postgres is estimating around 500 rows each, is that way off? Try
just doing:
EXPLAIN ANALYZE SELECT incidentid FROM k_b WHERE id = 107;
EXPLAIN ANALYZE SELECT incidentid FROM k_r WHERE id = 94;

And see if postgres estimates the number of rows properly.

I assume you have recently VACUUM ANALYZEd, which means you might need
to update the statistics target (ALTER TABLE k_b ALTER COLUMN
incidentid SET STATISTICS 100). The default is IIRC 10, the range is
1-1000; higher is more accurate, but makes ANALYZE slower.

> Right now, it looks like pg is trying to sort the entire eventactivity
> table for the merge join which is taking several minutes to do. Can I
> rephrase this so that it does the searching through k_r and k_b FIRST
> and then go into eventactivity using the index on incidentid? It seems
> like that shouldn't be too hard to make fast but my SQL query skills
> are only average.
To me, it looks like it is doing an index scan (on k_b.id) through k_b
first, sorting the results by incidentid, then merge joining that with
eventactivity.

I'm guessing you actually want it to merge k_b and k_r to get extra
selectivity before joining against eventactivity.
I think my alternate forms would let postgres realize this. But if not,
you could try:

EXPLAIN ANALYZE
SELECT recordtext FROM eventactivity
 JOIN (SELECT incidentid FROM k_r JOIN k_b USING (incidentid)
        WHERE k_r.id = 94 AND k_b.id = 107)
USING (incidentid);

I don't know how selective your keys are, but one of these queries
should probably structure it better for the planner. It depends a lot on
how selective your query is. If you have 100M rows, the above query
looks like it expects k_r to restrict it to 44M rows, and k_r + k_b
down to 11M rows, which really should be a seq scan (> 10% of the rows
= seq scan). But if you are saying the selectivity is mis-estimated it
could be different.

John
=:->

> Thanks
> -Dan
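Putting those suggestions together, a rough session-level sketch (an
illustration only; the statistics target of 100 is an arbitrary example
value, and the table and column names are simply the ones used in this
thread) would be:

-- Sketch: try the query with merge joins disabled for this session only,
-- then raise the statistics target on the join column and re-ANALYZE so
-- the planner has better row estimates to work from.
SET enable_mergejoin TO off;

EXPLAIN ANALYZE
SELECT recordtext
  FROM eventactivity
  JOIN k_r USING (incidentid)
  JOIN k_b USING (incidentid)
 WHERE k_r.id = 94
   AND k_b.id = 107;

RESET enable_mergejoin;

-- 100 is just an example; the default statistics target is much lower.
ALTER TABLE k_b ALTER COLUMN incidentid SET STATISTICS 100;
ALTER TABLE k_r ALTER COLUMN incidentid SET STATISTICS 100;
ANALYZE k_b;
ANALYZE k_r;

Both the SET and the RESET affect only the current session, so other
connections keep the default planner behavior.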
On Jul 14, 2005, at 9:42 AM, John A Meinel wrote:
>
> You might try giving it a little bit more freedom with:
>
> EXPLAIN ANALYZE
> SELECT recordtext FROM eventactivity, k_r, k_b
>  WHERE eventactivity.incidentid = k_r.incidentid
>    AND eventactivity.incidentid = k_b.incidentid
>    AND k_r.id = 94
>    AND k_b.id = 107
> -- AND k_r.incidentid = k_b.incidentid
> ;
>
> I'm pretty sure that would give identical results, just let the planner
> have a little bit more freedom about how it does it.
> Also the last line is commented out, because I think it is redundant.
>

Ok, I tried this one. My ssh keeps getting cut off by a router somewhere
between me and the server due to inactivity timeouts, so all I know is
that both the select and explain analyze are taking over an hour to run.
Here's the explain select for that one, since that's the best I can get.

explain select recordtext from eventactivity,k_r,k_b where
eventactivity.incidentid = k_r.incidentid and
eventactivity.incidentid = k_b.incidentid and k_r.id = 94 and k_b.id = 107;

                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=9624.61..4679590.52 rows=151009549 width=35)
   Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
   ->  Merge Join  (cost=4766.92..4547684.26 rows=16072733 width=117)
         Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
         ->  Index Scan using eventactivity1 on eventactivity  (cost=0.00..4186753.16 rows=46029271 width=49)
         ->  Sort  (cost=4766.92..4771.47 rows=1821 width=68)
               Sort Key: (k_b.incidentid)::text
               ->  Index Scan using k_b_idx on k_b  (cost=0.00..4668.31 rows=1821 width=68)
                     Index Cond: (id = 107)
   ->  Sort  (cost=4857.69..4862.39 rows=1879 width=68)
         Sort Key: (k_r.incidentid)::text
         ->  Index Scan using k_r_idx on k_r  (cost=0.00..4755.52 rows=1879 width=68)
               Index Cond: (id = 94)
(13 rows)

> You might also try:
> EXPLAIN ANALYZE
> SELECT recordtext
>   FROM eventactivity JOIN k_r USING (incidentid)
>   JOIN k_b USING (incidentid)
>  WHERE k_r.id = 94
>    AND k_b.id = 107
> ;
>

Similar results here. The query is taking at least an hour to finish.

explain select recordtext from eventactivity join k_r using ( incidentid )
join k_b using (incidentid ) where k_r.id = 94 and k_b.id = 107;

                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=9542.77..4672831.12 rows=148391132 width=35)
   Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
   ->  Merge Join  (cost=4726.61..4542825.87 rows=15930238 width=117)
         Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
         ->  Index Scan using eventactivity1 on eventactivity  (cost=0.00..4184145.43 rows=46000104 width=49)
         ->  Sort  (cost=4726.61..4731.13 rows=1806 width=68)
               Sort Key: (k_b.incidentid)::text
               ->  Index Scan using k_b_idx on k_b  (cost=0.00..4628.92 rows=1806 width=68)
                     Index Cond: (id = 107)
   ->  Sort  (cost=4816.16..4820.82 rows=1863 width=68)
         Sort Key: (k_r.incidentid)::text
         ->  Index Scan using k_r_idx on k_r  (cost=0.00..4714.97 rows=1863 width=68)
               Index Cond: (id = 94)
(13 rows)

> Also, if possible give us the EXPLAIN ANALYZE so that we know if the
> planner is making accurate estimates. (You might send an EXPLAIN while
> waiting for the EXPLAIN ANALYZE to finish)
>
> You can also try disabling merge joins, and see how that changes
> things.
>

Are there any negative side effects of doing this?

> Well, postgres is estimating around 500 rows each, is that way off?
> Try just doing:
> EXPLAIN ANALYZE SELECT incidentid FROM k_b WHERE id = 107;
> EXPLAIN ANALYZE SELECT incidentid FROM k_r WHERE id = 94;
>
> And see if postgres estimates the number of rows properly.
>
> I assume you have recently VACUUM ANALYZEd, which means you might need
> to update the statistics target (ALTER TABLE k_b ALTER COLUMN
> incidentid SET STATISTICS 100) default is IIRC 10, ranges from 1-1000,
> higher is more accurate, but makes ANALYZE slower.
>
>> Right now, it looks like pg is trying to sort the entire eventactivity
>> table for the merge join which is taking several minutes to do. Can I
>> rephrase this so that it does the searching through k_r and k_b FIRST
>> and then go into eventactivity using the index on incidentid? It seems
>> like that shouldn't be too hard to make fast but my SQL query skills
>> are only average.
>
> To me, it looks like it is doing an index scan (on k_b.id) through k_b
> first, sorting the results by incidentid, then merge joining that with
> eventactivity.
>
> I'm guessing you actually want it to merge k_b and k_r to get extra
> selectivity before joining against eventactivity.
> I think my alternate forms would let postgres realize this. But if not,
> you could try:
>
> EXPLAIN ANALYZE
> SELECT recordtext FROM eventactivity
>  JOIN (SELECT incidentid FROM k_r JOIN k_b USING (incidentid)
>         WHERE k_r.id = 94 AND k_b.id = 107)
> USING (incidentid);
>

This one looks like the same plan as the others:

explain select recordtext from eventactivity join ( select incidentid
from k_r join k_b using (incidentid) where k_r.id = 94 and k_b.id = 107 )
a using (incidentid );

                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=9793.33..4693149.15 rows=156544758 width=35)
   Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
   ->  Merge Join  (cost=4847.75..4557237.59 rows=16365843 width=117)
         Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
         ->  Index Scan using eventactivity1 on eventactivity  (cost=0.00..4191691.79 rows=46084161 width=49)
         ->  Sort  (cost=4847.75..4852.38 rows=1852 width=68)
               Sort Key: (k_b.incidentid)::text
               ->  Index Scan using k_b_idx on k_b  (cost=0.00..4747.24 rows=1852 width=68)
                     Index Cond: (id = 107)
   ->  Sort  (cost=4945.58..4950.36 rows=1913 width=68)
         Sort Key: (k_r.incidentid)::text
         ->  Index Scan using k_r_idx on k_r  (cost=0.00..4841.30 rows=1913 width=68)
               Index Cond: (id = 94)
(13 rows)

> I don't know how selective your keys are, but one of these queries
> should probably structure it better for the planner. It depends a lot on
> how selective your query is.

eventactivity currently has around 36 million rows in it. There should
only be maybe 200-300 incidentids at most that will be matched with the
combination of k_b and k_r. That's why I was thinking I could somehow
get a list of just the incidentids that matched the id = 94 and id = 107
in k_b and k_r first. Then, I would only need to grab a few hundred out
of 36 million rows from eventactivity.

> If you have 100M rows, the above query looks like it expects k_r to
> restrict it to 44M rows, and k_r + k_b down to 11M rows, which really
> should be a seq scan (> 10% of the rows = seq scan). But if you are
> saying the selectivity is mis-estimated it could be different.
>

Yeah, if I understand you correctly, I think the previous paragraph
shows this is a significant misestimate.
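For reference, the intent Dan describes here (compute the short
incidentid list first, then probe eventactivity with it) can also be
written directly. This is only a sketch, not a form tested in the
thread, and it assumes incidentid is comparable across the three tables:

-- Sketch: build the small set of incidentids first, then look up only
-- those rows in eventactivity. INTERSECT keeps just the incidentids
-- that appear under both id = 94 in k_r and id = 107 in k_b.
SELECT recordtext
  FROM eventactivity
 WHERE incidentid IN (
         SELECT incidentid FROM k_r WHERE id = 94
         INTERSECT
         SELECT incidentid FROM k_b WHERE id = 107
       );

Unlike the join form, the IN form will not repeat an eventactivity row
when the same incidentid appears more than once in k_r or k_b, which may
or may not be what the application wants.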
Dan Harris wrote:
> On Jul 14, 2005, at 9:42 AM, John A Meinel wrote:
...

Did you try doing this to see how good the planner's selectivity
estimates are?

>> Well, postgres is estimating around 500 rows each, is that way off? Try
>> just doing:
>> EXPLAIN ANALYZE SELECT incidentid FROM k_b WHERE id = 107;
>> EXPLAIN ANALYZE SELECT incidentid FROM k_r WHERE id = 94;

These should be fast queries.

John
=:->
On Thu, Jul 14, 2005 at 04:29:58PM -0600, Dan Harris wrote:
> Ok, I tried this one. My ssh keeps getting cut off by a router
> somewhere between me and the server due to inactivity timeouts, so
> all I know is that both the select and explain analyze are taking
> over an hour to run.

Try running the query as a script with nohup & redirect the output to
a file.

Mike Stone
Dan Harris <fbsd@drivefaster.net> writes:
> Here's the explain select for that one, since that's the best I can get.
>
> explain select recordtext from eventactivity,k_r,k_b where
> eventactivity.incidentid = k_r.incidentid and
> eventactivity.incidentid = k_b.incidentid and k_r.id = 94 and k_b.id = 107;
>                                                   QUERY PLAN
> --------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=9624.61..4679590.52 rows=151009549 width=35)
>    Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>    ->  Merge Join  (cost=4766.92..4547684.26 rows=16072733 width=117)
>          Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>          ->  Index Scan using eventactivity1 on eventactivity
>              (cost=0.00..4186753.16 rows=46029271 width=49)
>          ->  Sort  (cost=4766.92..4771.47 rows=1821 width=68)
>                Sort Key: (k_b.incidentid)::text
>                ->  Index Scan using k_b_idx on k_b  (cost=0.00..4668.31 rows=1821 width=68)
>                      Index Cond: (id = 107)
>    ->  Sort  (cost=4857.69..4862.39 rows=1879 width=68)
>          Sort Key: (k_r.incidentid)::text
>          ->  Index Scan using k_r_idx on k_r  (cost=0.00..4755.52 rows=1879 width=68)
>                Index Cond: (id = 94)
> (13 rows)

There's something awfully fishy here. The 8.0 planner is definitely
capable of figuring out that it ought to join the smaller relations
first. As an example, using 8.0.3+ (CVS branch tip) I did

regression=# create table eventactivity(incidentid varchar, recordtext text);
CREATE TABLE
regression=# create table k_r(incidentid varchar);
CREATE TABLE
regression=# create table k_b(incidentid varchar);
CREATE TABLE
regression=# explain select recordtext from eventactivity
  inner join (select incidentid from k_r) a using (incidentid)
  inner join (select incidentid from k_b) b using (incidentid);

(Being too impatient to actually fill the eventactivity table with 36M
rows of data, I just did some debugger magic to make the planner think
that that was the table size...)

The default plan looks like

 Merge Join  (cost=16137814.70..36563453.23 rows=1361700000 width=32)
   Merge Cond: (("outer".incidentid)::text = "inner"."?column3?")
   ->  Merge Join  (cost=170.85..290.48 rows=7565 width=64)
         Merge Cond: ("outer"."?column2?" = "inner"."?column2?")
         ->  Sort  (cost=85.43..88.50 rows=1230 width=32)
               Sort Key: (k_r.incidentid)::text
               ->  Seq Scan on k_r  (cost=0.00..22.30 rows=1230 width=32)
         ->  Sort  (cost=85.43..88.50 rows=1230 width=32)
               Sort Key: (k_b.incidentid)::text
               ->  Seq Scan on k_b  (cost=0.00..22.30 rows=1230 width=32)
   ->  Sort  (cost=16137643.84..16227643.84 rows=36000000 width=64)
         Sort Key: (eventactivity.incidentid)::text
         ->  Seq Scan on eventactivity  (cost=0.00..1080000.00 rows=36000000 width=64)

and if I "set enable_mergejoin TO 0;" I get

 Hash Join  (cost=612.54..83761451.54 rows=1361700000 width=32)
   Hash Cond: (("outer".incidentid)::text = ("inner".incidentid)::text)
   ->  Seq Scan on eventactivity  (cost=0.00..1080000.00 rows=36000000 width=64)
   ->  Hash  (cost=504.62..504.62 rows=7565 width=64)
         ->  Hash Join  (cost=25.38..504.62 rows=7565 width=64)
               Hash Cond: (("outer".incidentid)::text = ("inner".incidentid)::text)
               ->  Seq Scan on k_r  (cost=0.00..22.30 rows=1230 width=32)
               ->  Hash  (cost=22.30..22.30 rows=1230 width=32)
                     ->  Seq Scan on k_b  (cost=0.00..22.30 rows=1230 width=32)

which is the plan I would judge Most Likely To Succeed based on what we
know about Dan's problem.
(The fact that the planner is estimating it as twice as expensive as the
mergejoin comes from the fact that with no statistics about the join keys,
the planner deliberately estimates hash join as expensive, because it can
be pretty awful in the presence of many equal keys.)

So the planner is certainly capable of finding the desired plan, even
without any tweaking of the query text. This means that what we have is
mainly a statistical problem. Have you ANALYZEd these tables recently?
If so, may we see the pg_stats rows for incidentid in all three tables?

			regards, tom lane
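A query along these lines should pull exactly the rows being asked for
here (a sketch; the column list is trimmed to the statistics that matter
most for join-size estimates):

-- Sketch: show the planner's statistics for incidentid in the three tables.
-- n_distinct and most_common_freqs are what drive the join estimates.
SELECT tablename, attname, null_frac, n_distinct,
       most_common_vals, most_common_freqs, correlation
  FROM pg_stats
 WHERE attname = 'incidentid'
   AND tablename IN ('eventactivity', 'k_r', 'k_b');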
Dan Harris wrote:
> On Jul 14, 2005, at 9:42 AM, John A Meinel wrote:
>
>> You might try giving it a little bit more freedom with:
>>
>> EXPLAIN ANALYZE
>> SELECT recordtext FROM eventactivity, k_r, k_b
>>  WHERE eventactivity.incidentid = k_r.incidentid
>>    AND eventactivity.incidentid = k_b.incidentid
>>    AND k_r.id = 94
>>    AND k_b.id = 107
>> -- AND k_r.incidentid = k_b.incidentid
>> ;
>>
>> I'm pretty sure that would give identical results, just let the planner
>> have a little bit more freedom about how it does it.
>> Also the last line is commented out, because I think it is redundant.
>
> Ok, I tried this one. My ssh keeps getting cut off by a router
> somewhere between me and the server due to inactivity timeouts, so all
> I know is that both the select and explain analyze are taking over an
> hour to run. Here's the explain select for that one, since that's the
> best I can get.
>
> explain select recordtext from eventactivity,k_r,k_b where
> eventactivity.incidentid = k_r.incidentid and eventactivity.incidentid
> = k_b.incidentid and k_r.id = 94 and k_b.id = 107;
>                                                   QUERY PLAN
> --------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=9624.61..4679590.52 rows=151009549 width=35)
>    Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>    ->  Merge Join  (cost=4766.92..4547684.26 rows=16072733 width=117)
>          Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>          ->  Index Scan using eventactivity1 on eventactivity
>              (cost=0.00..4186753.16 rows=46029271 width=49)
>          ->  Sort  (cost=4766.92..4771.47 rows=1821 width=68)
>                Sort Key: (k_b.incidentid)::text
>                ->  Index Scan using k_b_idx on k_b  (cost=0.00..4668.31 rows=1821 width=68)
>                      Index Cond: (id = 107)
>    ->  Sort  (cost=4857.69..4862.39 rows=1879 width=68)
>          Sort Key: (k_r.incidentid)::text
>          ->  Index Scan using k_r_idx on k_r  (cost=0.00..4755.52 rows=1879 width=68)
>                Index Cond: (id = 94)
> (13 rows)

If anything, the estimations have gotten worse. As now it thinks there
will be 1800 rows returned each, whereas you were thinking it would be
more around 100.

Since you didn't say, you did VACUUM ANALYZE recently, right?

> ...
>> You can also try disabling merge joins, and see how that changes things.
>
> Are there any negative side effects of doing this?

If the planner is estimating things correctly, you want to give it the
most flexibility of plans to pick from, because sometimes a merge join
is faster (postgres doesn't pick things because it wants to go slower).
The only reason for the disable flags is that sometimes the planner
doesn't estimate correctly. Usually disabling a method is not the final
solution, but a way to try out different methods and see what happens to
the results.

Using:
SET enable_mergejoin TO off;

You can disable it just for the current session (not for the entire
database). Which is the recommended way if you have a query that
postgres is messing up on. (Usually it is correct elsewhere.)

>> Well, postgres is estimating around 500 rows each, is that way off? Try
>> just doing:
>> EXPLAIN ANALYZE SELECT incidentid FROM k_b WHERE id = 107;
>> EXPLAIN ANALYZE SELECT incidentid FROM k_r WHERE id = 94;

Once again, do this and post the results. We might just need to tweak
your settings so that it estimates the number of rows correctly, and we
don't need to do anything else.

>> And see if postgres estimates the number of rows properly.
>> I assume you have recently VACUUM ANALYZEd, which means you might need
>> to update the statistics target (ALTER TABLE k_b ALTER COLUMN
>> incidentid SET STATISTICS 100) default is IIRC 10, ranges from 1-1000,
>> higher is more accurate, but makes ANALYZE slower.
>> ...
>> EXPLAIN ANALYZE
>> SELECT recordtext FROM eventactivity
>>  JOIN (SELECT incidentid FROM k_r JOIN k_b USING (incidentid)
>>         WHERE k_r.id = 94 AND k_b.id = 107)
>> USING (incidentid);
>
> This one looks like the same plan as the others:
>
> explain select recordtext from eventactivity join ( select incidentid
> from k_r join k_b using (incidentid) where k_r.id = 94 and k_b.id = 107
> ) a using (incidentid );

Well, the planner is powerful enough to flatten nested selects. To make
it less "intelligent" you can do:
SET join_collapse_limit TO 1;
or
SET join_collapse_limit TO 0;
Which should tell postgres to not try and get tricky with your query.

Again, *usually* the planner knows better than you do. So again just do
it to see what you get. The problem is that if you are only using
EXPLAIN SELECT, you will probably get something which *looks* worse.
Because if it looked better, the planner would have used it. That is
why you really need the EXPLAIN ANALYZE, so that you can see where the
planner is incorrect in its estimates.

>                                                   QUERY PLAN
> --------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=9793.33..4693149.15 rows=156544758 width=35)
>    Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>    ->  Merge Join  (cost=4847.75..4557237.59 rows=16365843 width=117)
>          Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>          ->  Index Scan using eventactivity1 on eventactivity
>              (cost=0.00..4191691.79 rows=46084161 width=49)
>          ->  Sort  (cost=4847.75..4852.38 rows=1852 width=68)
>                Sort Key: (k_b.incidentid)::text
>                ->  Index Scan using k_b_idx on k_b  (cost=0.00..4747.24 rows=1852 width=68)
>                      Index Cond: (id = 107)
>    ->  Sort  (cost=4945.58..4950.36 rows=1913 width=68)
>          Sort Key: (k_r.incidentid)::text
>          ->  Index Scan using k_r_idx on k_r  (cost=0.00..4841.30 rows=1913 width=68)
>                Index Cond: (id = 94)
> (13 rows)

What I don't understand is that the planner is actually estimating that
joining against the new table is going to *increase* the number of
returned rows. Because the final number of rows here is 156M while the
initial join shows only 16M. And it also thinks that it will only grab
46M rows from eventactivity.

If you have analyzed recently can you do:
SELECT relname, reltuples FROM pg_class WHERE relname='eventactivity';

It is a cheaper form than "SELECT count(*) FROM eventactivity" to get an
approximate estimate of the number of rows. But if it isn't too
expensive, please also give the value from SELECT count(*) FROM
eventactivity.

Again, that helps us know if your tables are up-to-date.

>> I don't know how selective your keys are, but one of these queries
>> should probably structure it better for the planner. It depends a lot on
>> how selective your query is.
>
> eventactivity currently has around 36 million rows in it. There should
> only be maybe 200-300 incidentids at most that will be matched with the
> combination of k_b and k_r. That's why I was thinking I could somehow
> get a list of just the incidentids that matched the id = 94 and id =
> 107 in k_b and k_r first. Then, I would only need to grab a few hundred
> out of 36 million rows from eventactivity.
Well, you can also try:
SELECT count(*) FROM k_b JOIN k_r USING (incidentid)
 WHERE k_b.id=?? AND k_r.id=??
;

That will tell you how many rows they have in common.

>> If you have 100M rows, the above query looks like it expects k_r to
>> restrict it to 44M rows, and k_r + k_b down to 11M rows, which really
>> should be a seq scan (> 10% of the rows = seq scan). But if you are
>> saying the selectivity is mis-estimated it could be different.
>
> Yeah, if I understand you correctly, I think the previous paragraph
> shows this is a significant misestimate.

Well, if you look at the latest plans, things have gone up from 44M to
156M. I don't know why it is worse, but it is getting there.

John
=:->
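For clarity, the join_collapse_limit experiment described above is a
per-session setting; a sketch of the full sequence, reusing the ids from
this thread and adding the subquery alias PostgreSQL requires, looks
like this:

-- Sketch: lower the collapse limit only for this session, run the
-- pre-joined form, then put the setting back.
SET join_collapse_limit TO 1;

EXPLAIN ANALYZE
SELECT recordtext
  FROM eventactivity
  JOIN (SELECT incidentid
          FROM k_r JOIN k_b USING (incidentid)
         WHERE k_r.id = 94 AND k_b.id = 107) a USING (incidentid);

RESET join_collapse_limit;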
John A Meinel <john@arbash-meinel.com> writes:
> What I don't understand is that the planner is actually estimating that
> joining against the new table is going to *increase* the number of
> returned rows.

It evidently thinks that incidentid in the k_r table is pretty
nonunique. We really need to look at the statistics data to see what's
going on.

			regards, tom lane
Tom Lane wrote:
> John A Meinel <john@arbash-meinel.com> writes:
>> What I don't understand is that the planner is actually estimating that
>> joining against the new table is going to *increase* the number of
>> returned rows.
>
> It evidently thinks that incidentid in the k_r table is pretty
> nonunique. We really need to look at the statistics data to
> see what's going on.
>
> 			regards, tom lane

Okay, sure. What about doing this, then:

EXPLAIN ANALYZE
SELECT recordtext FROM eventactivity
  JOIN (SELECT DISTINCT incidentid FROM k_r JOIN k_b USING (incidentid)
         WHERE k_r.id = ?? AND k_b.id = ??)
USING (incidentid)
;

Since I assume that eventactivity is the only table with "recordtext",
and that you don't get any columns from k_r and k_b, meaning it would be
pointless to get duplicate incidentids.

I may be misunderstanding what the query is trying to do, but depending
on what is in k_r and k_b, is it possible to use a UNIQUE INDEX rather
than just an index on incidentid?

There is also the possibility of

EXPLAIN ANALYZE
SELECT recordtext FROM eventactivity
  JOIN (SELECT incidentid FROM k_r WHERE k_r.id = ??
        UNION
        SELECT incidentid FROM k_b WHERE k_b.id = ??)
USING (incidentid)
;

But both of these would mean that you don't actually want columns from
k_r or k_b, just a unique list of incident ids.

But first, I agree, we should make sure the pg_stats values are
reasonable.

John
=:->
On Jul 14, 2005, at 5:12 PM, John A Meinel wrote:
> Dan Harris wrote:
>
>>> Well, postgres is estimating around 500 rows each, is that way off? Try
>>> just doing:
>>> EXPLAIN ANALYZE SELECT incidentid FROM k_b WHERE id = 107;
>>> EXPLAIN ANALYZE SELECT incidentid FROM k_r WHERE id = 94;
>
> Once again, do this and post the results. We might just need to tweak
> your settings so that it estimates the number of rows correctly, and we
> don't need to do anything else.

Ok, sorry I missed these the first time through:

explain analyze select incidentid from k_b where id = 107;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using k_b_idx on k_b  (cost=0.00..1926.03 rows=675 width=14) (actual time=0.042..298.394 rows=2493 loops=1)
   Index Cond: (id = 107)
 Total runtime: 299.103 ms

select count(*) from k_b;
 count
--------
 698350

( sorry! I think I said this one only had tens of thousands in it )

explain analyze select incidentid from k_r where id = 94;
                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Index Scan using k_r_idx on k_r  (cost=0.00..2137.61 rows=757 width=14) (actual time=0.092..212.187 rows=10893 loops=1)
   Index Cond: (id = 94)
 Total runtime: 216.498 ms
(3 rows)

select count(*) from k_r;
 count
--------
 671670

That one is quite a bit slower, yet it's the same table structure and
the same index as k_b, and it has fewer records.

I did run VACUUM ANALYZE immediately before running these queries. It
seems a lot better with join_collapse_limit set.

> Well, the planner is powerful enough to flatten nested selects. To make
> it less "intelligent" you can do:
> SET join_collapse_limit TO 1;
> or
> SET join_collapse_limit TO 0;
> Which should tell postgres to not try and get tricky with your query.
> Again, *usually* the planner knows better than you do. So again just do
> it to see what you get.

Ok, when join_collapse_limit = 1 I get this now:

explain analyze select recordtext from eventactivity join ( select
incidentid from k_r join k_b using (incidentid) where k_r.id = 94 and
k_b.id = 107 ) a using (incidentid );

                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..156509.08 rows=2948 width=35) (actual time=1.555..340.625 rows=24825 loops=1)
   ->  Nested Loop  (cost=0.00..5361.89 rows=6 width=28) (actual time=1.234..142.078 rows=366 loops=1)
         ->  Index Scan using k_b_idx on k_b  (cost=0.00..1943.09 rows=681 width=14) (actual time=0.423..56.974 rows=2521 loops=1)
               Index Cond: (id = 107)
         ->  Index Scan using k_r_idx on k_r  (cost=0.00..5.01 rows=1 width=14) (actual time=0.031..0.031 rows=0 loops=2521)
               Index Cond: ((k_r.id = 94) AND ((k_r.incidentid)::text = ("outer".incidentid)::text))
   ->  Index Scan using eventactivity1 on eventactivity  (cost=0.00..25079.55 rows=8932 width=49) (actual time=0.107..0.481 rows=68 loops=366)
         Index Cond: ((eventactivity.incidentid)::text = ("outer".incidentid)::text)
 Total runtime: 347.975 ms

MUCH better! Maybe you can help me understand what I did, and whether I
need to make something permanent to get this behavior from now on?
> If you have analyzed recently can you do:
> SELECT relname, reltuples FROM pg_class WHERE relname='eventactivity';
>
> It is a cheaper form than "SELECT count(*) FROM eventactivity" to get an
> approximate estimate of the number of rows. But if it isn't too
> expensive, please also give the value from SELECT count(*) FROM
> eventactivity.
>
> Again, that helps us know if your tables are up-to-date.

Sure:

select relname, reltuples from pg_class where relname='eventactivity';
    relname    |  reltuples
---------------+-------------
 eventactivity | 3.16882e+07

select count(*) from eventactivity;
  count
----------
 31871142

>>> I don't know how selective your keys are, but one of these queries
>>> should probably structure it better for the planner. It depends a lot on
>>> how selective your query is.
>>
>> eventactivity currently has around 36 million rows in it. There should
>> only be maybe 200-300 incidentids at most that will be matched with the
>> combination of k_b and k_r. That's why I was thinking I could somehow
>> get a list of just the incidentids that matched the id = 94 and id =
>> 107 in k_b and k_r first. Then, I would only need to grab a few hundred
>> out of 36 million rows from eventactivity.
>
> Well, you can also try:
> SELECT count(*) FROM k_b JOIN k_r USING (incidentid)
>  WHERE k_b.id=?? AND k_r.id=??
> ;
>
> That will tell you how many rows they have in common.

select count(*) from k_b join k_r using (incidentid) where k_b.id=107
and k_r.id=94;
 count
-------
   373

> Well, if you look at the latest plans, things have gone up from 44M to
> 156M, I don't know why it is worse, but it is getting there.

I assume this is because k_r and k_b are growing fairly rapidly right
now. The time in between queries contained a lot of inserts. I was
careful to vacuum analyze before sending statistics, as I did this
time. I'm sorry if this has confused the issue.
On Jul 14, 2005, at 7:15 PM, John A Meinel wrote:
> Is the distribution of your rows uneven? Meaning do you have more rows
> with a later id than an earlier one?

There are definitely some id's that will have many times more than the
others. If I group and count them, the top 10 are fairly dominant in
the table.

> Hmm.. How to do it permanently? Well you could always issue
> "set join_collapse_limit TO 1; select * from ...."
> But obviously that isn't what you prefer. :)
>
> I think there are things you can do to make merge join more expensive
> than a nested loop, but I'm not sure what they are.

Maybe someone else has some ideas to encourage this behavior for future
work? Setting it on a per-connection basis is doable, but would add
some burden to us in code.

> What I really don't understand is that the estimates dropped as well.
> The actual number of estimate rows drops to 3k instead of > 1M.
> The real question is why does the planner think it will be so expensive?
>
>> select count(*) from k_b join k_r using (incidentid) where k_b.id=107
>> and k_r.id=94;
>>  count
>> -------
>>    373
>
> Well, this says that they are indeed much more selective.
> Each one has > 1k rows, but together you end up with only 400.

Is this a bad thing? Is this not "selective enough" to make it much
faster?

Overall, I'm much happier now after seeing the new plan come about. If
I can find a way to make that join_collapse behavior permanent, I can
certainly live with these numbers.

Thanks again for your continued efforts.

-Dan
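One way to avoid issuing the SET from application code on every
connection (an editor's sketch, not something proposed in the thread;
"yourdb" and "appuser" are placeholder names) is to attach the setting
to the database or to the application's role, so new sessions inherit
it:

-- Sketch: make join_collapse_limit = 1 the default for new sessions.
-- 'yourdb' and 'appuser' are hypothetical names; substitute your own.
ALTER DATABASE yourdb SET join_collapse_limit = 1;
ALTER USER appuser SET join_collapse_limit = 1;

-- A session can still override it when needed:
-- SET join_collapse_limit TO 8;   -- the normal default, for one session

Whether pinning the planner this way is wise is a separate question,
since it constrains every query in that database, not just this one.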
Dan Harris wrote:
> On Jul 14, 2005, at 7:15 PM, John A Meinel wrote:
>> Is the distribution of your rows uneven? Meaning do you have more rows
>> with a later id than an earlier one?
>
> There are definitely some id's that will have many times more than the
> others. If I group and count them, the top 10 are fairly dominant in
> the table.

That usually skews the estimates, since the estimate is more of an
average (unless the statistics target is higher).

>> Hmm.. How to do it permanently? Well you could always issue
>> "set join_collapse_limit TO 1; select * from ...."
>> But obviously that isn't what you prefer. :)
>>
>> I think there are things you can do to make merge join more expensive
>> than a nested loop, but I'm not sure what they are.
>
> Maybe someone else has some ideas to encourage this behavior for future
> work? Setting it on a per-connection basis is doable, but would add
> some burden to us in code.

My biggest question is why the planner thinks the Nested Loop would be
so expensive. Have you tuned any of the parameters? It seems like
something is out of whack. (cpu_tuple_cost, random_page_cost, etc...)

>> What I really don't understand is that the estimates dropped as well.
>> The actual number of estimate rows drops to 3k instead of > 1M.
>> The real question is why does the planner think it will be so expensive?
>>
>>> select count(*) from k_b join k_r using (incidentid) where k_b.id=107
>>> and k_r.id=94;
>>>  count
>>> -------
>>>    373
>>
>> Well, this says that they are indeed much more selective.
>> Each one has > 1k rows, but together you end up with only 400.
>
> Is this a bad thing? Is this not "selective enough" to make it much
> faster?

Yes, being more selective is what makes it faster. But the planner
doesn't seem to notice it properly.

> Overall, I'm much happier now after seeing the new plan come about. If
> I can find a way to make that join_collapse behavior permanent, I can
> certainly live with these numbers.

I'm sure there are pieces to tune, but I've reached my limits of
parameters to tweak :)

> Thanks again for your continued efforts.
>
> -Dan

John
=:->
On Jul 14, 2005, at 10:12 PM, John A Meinel wrote:
> My biggest question is why the planner thinks the Nested Loop would be
> so expensive.
> Have you tuned any of the parameters? It seems like something is out of
> whack. (cpu_tuple_cost, random_page_cost, etc...)

Here's some of my postgresql.conf. Feel free to blast me if I did
something idiotic here.

shared_buffers = 50000
effective_cache_size = 1348000
random_page_cost = 3
work_mem = 512000
max_fsm_pages = 80000
log_min_duration_statement = 60000
fsync = true   ( not sure if I'm daring enough to run without this )
wal_buffers = 1000
checkpoint_segments = 64
checkpoint_timeout = 3000

#---- FOR PG_AUTOVACUUM --#
stats_command_string = true
stats_row_level = true
On Jul 15, 2005, at 9:09 AM, Dan Harris wrote:
> On Jul 14, 2005, at 10:12 PM, John A Meinel wrote:
>> My biggest question is why the planner thinks the Nested Loop would be
>> so expensive.
>> Have you tuned any of the parameters? It seems like something is out of
>> whack. (cpu_tuple_cost, random_page_cost, etc...)
>
> Here's some of my postgresql.conf. Feel free to blast me if I did
> something idiotic here.
>
> shared_buffers = 50000
> effective_cache_size = 1348000
> random_page_cost = 3
> work_mem = 512000
> max_fsm_pages = 80000
> log_min_duration_statement = 60000
> fsync = true   ( not sure if I'm daring enough to run without this )
> wal_buffers = 1000
> checkpoint_segments = 64
> checkpoint_timeout = 3000
>
> #---- FOR PG_AUTOVACUUM --#
> stats_command_string = true
> stats_row_level = true

Sorry, I forgot to re-post my hardware specs.

HP DL585
4 x 2.2 GHz Opteron
12GB RAM
SmartArray RAID controller, 1GB hardware cache, 4x73GB 10k SCSI in RAID 0+1
ext2 filesystem

Also, there are 30 databases on the machine, 27 of them with identical
schemas.
On Thu, Jul 14, 2005 at 16:29:58 -0600,
  Dan Harris <fbsd@drivefaster.net> wrote:
>
> Ok, I tried this one. My ssh keeps getting cut off by a router
> somewhere between me and the server due to inactivity timeouts, so
> all I know is that both the select and explain analyze are taking
> over an hour to run. Here's the explain select for that one, since
> that's the best I can get.

Are you using NAT at home? That's probably where the issue is. If you
have control of that box you can probably increase the timeout to a
couple of hours.
>> Ok, I tried this one. My ssh keeps getting cut off by a router
>> somewhere between me and the server due to inactivity timeouts, so
>> all I know is that both the select and explain analyze are taking
>> over an hour to run. Here's the explain select for that one, since
>> that's the best I can get.

One word: screen!

One of the most useful little command line utilities...
Dan Harris wrote:
> On Jul 14, 2005, at 10:12 PM, John A Meinel wrote:
>> My biggest question is why the planner thinks the Nested Loop would be
>> so expensive.
>> Have you tuned any of the parameters? It seems like something is out of
>> whack. (cpu_tuple_cost, random_page_cost, etc...)
>
> Here's some of my postgresql.conf. Feel free to blast me if I did
> something idiotic here.
>
> shared_buffers = 50000
> effective_cache_size = 1348000
> random_page_cost = 3
> work_mem = 512000

Unless you are the only person connecting to this database, your
work_mem is very high. And if you haven't modified maintenance_work_mem
it is probably very low. work_mem might be causing postgres to think it
can fit all of a merge into ram, making it faster; I can't say for sure.

> max_fsm_pages = 80000

This seems high, but it depends how many updates/deletes you get
in-between vacuums. It may not be too excessive. VACUUM [FULL] VERBOSE
replies with how many free pages are left, if you didn't use that
already for tuning. Though it should be tuned based on a steady-state
situation, not a one-time test.

> log_min_duration_statement = 60000
> fsync = true   ( not sure if I'm daring enough to run without this )
> wal_buffers = 1000
> checkpoint_segments = 64
> checkpoint_timeout = 3000

These seem fine to me.

Can you include the output of EXPLAIN SELECT both with and without SET
join_collapse_limit? Since your tables have grown, I can't compare the
estimated number of rows and costs very well. EXPLAIN without ANALYZE
is fine, since I am wondering what the planner is thinking things cost.

John
=:->

> #---- FOR PG_AUTOVACUUM --#
> stats_command_string = true
> stats_row_level = true
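To confirm what the server is actually running with, the settings
discussed in this thread can be read back in any session (a sketch; the
parameter list is just the ones mentioned above):

-- Sketch: show the values the server is currently using for the
-- parameters under discussion.
SELECT name, setting
  FROM pg_settings
 WHERE name IN ('work_mem', 'maintenance_work_mem', 'shared_buffers',
                'effective_cache_size', 'random_page_cost',
                'join_collapse_limit', 'enable_mergejoin');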
On 7/15/05, Bruno Wolff III <bruno@wolff.to> wrote:
> On Thu, Jul 14, 2005 at 16:29:58 -0600,
>   Dan Harris <fbsd@drivefaster.net> wrote:
>>
>> Ok, I tried this one. My ssh keeps getting cut off by a router
>> somewhere between me and the server due to inactivity timeouts, so
>> all I know is that both the select and explain analyze are taking
>> over an hour to run. Here's the explain select for that one, since
>> that's the best I can get.
>
> Are you using NAT at home? That's probably where the issue is. If you
> have control of that box you can probably increase the timeout to a
> couple of hours.

Some versions of ssh have such a configuration option (in .ssh/config):

Host *
	ServerAliveInterval 600

...it means that ssh will send a "ping" packet to the sshd after every
10 minutes of inactivity. This way NAT will see activity and won't kill
the session. I'm using OpenSSH_4.1p1 for this...

Oh, and it doesn't have anything to do with TCP keepalive, which is
rather for finding dead connections than for keeping connections
alive. ;)

Regards,
Dawid