Thread: Wrong plan or what ?
Hi all, I'm running Postgres7.3.3 and I'm performing this simple select: select * from user_logs ul, user_data ud, class_default cd where ul.id_user = ud.id_user and ud.id_class = cd.id_class and cd.id_provider = 39; these are the number of rows for each table: user_logs: 1258955 class_default: 31 ( only one with id_provider = 39 ) user_data: 10274; this is the explain analyze for that query: QUERY PLAN Hash Join (cost=265.64..32000.76 rows=40612 width=263) (actual time=11074.21..11134.28 rows=10 loops=1) Hash Cond: ("outer".id_user = "inner".id_user) -> Seq Scan on user_logs ul (cost=0.00..24932.65 rows=1258965 width=48) (actual time=0.02..8530.21 rows=1258966 loops=1) -> Hash (cost=264.81..264.81 rows=331 width=215) (actual time=30.22..30.22 rows=0 loops=1) -> Nested Loop (cost=0.00..264.81 rows=331 width=215) (actual time=29.95..30.20 rows=6 loops=1) -> Seq Scan on class_default cd (cost=0.00..1.39 rows=1 width=55) (actual time=0.08..0.10 rows=1 loops=1) Filter: (id_provider = 39) -> Index Scan using idx_user_data_class on user_data ud (cost=0.00..258.49 rows=395 width=160) (actual time=29.82..29.96 rows=6 loops=1) Index Cond: (ud.id_class = "outer".id_class) Total runtime: 11135.65 msec (10 rows) I'm able to performe that select with these 3 steps: SELECT id_class from class_default where id_provider = 39; id_class ---------- 48 (1 row) SELECT id_user from user_data where id_class in ( 48 ); id_user --------- 10943 10942 10934 10927 10910 10909 (6 rows) SELECT * from user_logs where id_user in ( 10943, 10942, 10934, 10927, 10910, 10909 ); [SNIPPED] and the time ammount is a couple of milliseconds. Why the planner or the executor ( I don't know ) do not follow the same strategy ? Thank you Gaetano Mendola
"Josh Berkus" <josh@agliodbs.com> > Gaetano, > > > SELECT * from user_logs where id_user in ( > > 10943, 10942, 10934, 10927, 10910, 10909 > > ); > > [SNIPPED] > > > Why the planner or the executor ( I don't know ) do not follow > > the same strategy ? > > It is, actually, according to the query plan. > > Can you post the EXPLAIN ANALYZE for the above query? Index Scan using idx_user_user_logs, idx_user_user_logs, idx_user_user_logs, idx_user_user_logs, idx_user_user_logs, idx_user_user_logs on user_logs (cost=0.00..5454.21 rows=2498 width=48) (actual time=0.09..0.28 rows=10 loops=1) Index Cond: ((id_user = 10943) OR (id_user = 10942) OR (id_user = 10934) OR (id_user = 10927) OR (id_user = 10910) OR (id_user = 10909)) Total runtime: 0.41 msec (3 rows) Thank you Gaetano PS: if I execute the query I obtain 10 rows instead of 3 that say the explain analyze.
Forget my PS to last message.
"Josh Berkus" <josh@agliodbs.com> > Gaetano, > > > QUERY PLAN > > Hash Join (cost=265.64..32000.76 rows=40612 width=263) (actual > > time=11074.21..11134.28 rows=10 loops=1) > > Hash Cond: ("outer".id_user = "inner".id_user) > > -> Seq Scan on user_logs ul (cost=0.00..24932.65 rows=1258965 width=48) > > (actual time=0.02..8530.21 rows=1258966 loops=1) > > OK, here's your problem > > The planner thinks that you're going to get 40162 rows out of the final join, > not 10. If the row estimate was correct, then the Seq Scan would be a > reasonable plan. But it's not. Here's some steps you can take to clear > things up for the planner: > > 1) Make sure you've VACUUM ANALYZED > 2) Adjust the following postgresql.conf statistics: > a) effective_cache_size: increase to 70% of available (not used by other > processes) RAM. > b) random_page_cost: decrease, maybe to 2. > c) default_statistics_target: try increasing to 100 > (warning: this will significantly increase the time required to do ANALYZE) > > Then test again! No improvement at all, I pushed default_statistics_target to 1000 but the rows expected are still 40612 :-( Of course I restarted the postmaster and I vacuumed analyze the DB Thank you Gaetano
Gaetano, > SELECT * from user_logs where id_user in ( > 10943, 10942, 10934, 10927, 10910, 10909 > ); > [SNIPPED] > Why the planner or the executor ( I don't know ) do not follow > the same strategy ? It is, actually, according to the query plan. Can you post the EXPLAIN ANALYZE for the above query? -- -Josh Berkus Aglio Database Solutions San Francisco
In response to "Mendola Gaetano": > I'm running Postgres7.3.3 and I'm performing this simple select: Looking at your fast three step plan > SELECT id_class from class_default where id_provider = 39; > SELECT id_user from user_data where id_class in ( 48 ); > SELECT * from user_logs where id_user in ( > 10943, 10942, 10934, 10927, 10910, 10909 ); I'ld stem for reordering the from and where clauses alike: select * from class_default cd, user_data ud, user_logs ul where cd.id_provider = 39 and ud.id_class = cd.id_class and ul.id_user = ud.id_user; Personally I dislike implied joins and rather go for _about_ this: select * from ( class_default cd LEFT JOIN user_data ud ON ud.id_class = cd.id_class ) LEFT JOIN user_logs ul ON ul.id_user = ud.id_user, where cd.id_provider = 39; Good luck, HansH
""HansH"" <hartenhans@op.het.net> > In response to "Mendola Gaetano": > > I'm running Postgres7.3.3 and I'm performing this simple select: > > Looking at your fast three step plan > > SELECT id_class from class_default where id_provider = 39; > > SELECT id_user from user_data where id_class in ( 48 ); > > SELECT * from user_logs where id_user in ( > > 10943, 10942, 10934, 10927, 10910, 10909 ); > I'ld stem for reordering the from and where clauses alike: > select * > from > class_default cd, > user_data ud, > user_logs ul > where > cd.id_provider = 39 and > ud.id_class = cd.id_class and > ul.id_user = ud.id_user; still wrong: Hash Join (cost=267.10..32994.34 rows=41881 width=264) (actual time=6620.17..6847.20 rows=94 loops=1) Hash Cond: ("outer".id_user = "inner".id_user) -> Seq Scan on user_logs ul (cost=0.00..25712.15 rows=1298315 width=48) (actual time=0.01..5381.69 rows=1298351 loops=1) -> Hash (cost=266.25..266.25 rows=339 width=216) (actual time=0.89..0.89 rows=0 loops=1) -> Nested Loop (cost=0.00..266.25 rows=339 width=216) (actual time=0.16..0.83 rows=21 loops=1) -> Seq Scan on class_default cd (cost=0.00..1.39 rows=1 width=55) (actual time=0.08..0.09 rows=1 loops=1) Filter: (id_provider = 39) -> Index Scan using idx_user_data_class on user_data ud (cost=0.00..260.00 rows=389 width=161) (actual time=0.06..0.40 rows=21 loops=1) Index Cond: (ud.id_class = "outer".id_class) Total runtime: 6847.60 msec (10 rows) the returned are 94. > Personally I dislike implied joins and rather go for _about_ this: > select * > from > ( class_default cd > LEFT JOIN user_data ud ON ud.id_class = cd.id_class ) > LEFT JOIN user_logs ul ON ul.id_user = ud.id_user, > where > cd.id_provider = 39; worst: Merge Join (cost=280.48..55717.14 rows=41881 width=264) (actual time=18113.64..18182.94 rows=105 loops=1) Merge Cond: ("outer".id_user = "inner".id_user) -> Index Scan using idx_user_user_logs on user_logs ul (cost=0.00..51665.66 rows=1298315 width=48) (actual time=10.78..15459.37 rows=1298354 loops=1) -> Sort (cost=280.48..281.33 rows=339 width=216) (actual time=1.11..1.20 rows=105 loops=1) Sort Key: ud.id_user -> Nested Loop (cost=0.00..266.25 rows=339 width=216) (actual time=0.14..0.82 rows=21 loops=1) -> Seq Scan on class_default cd (cost=0.00..1.39 rows=1 width=55) (actual time=0.07..0.07 rows=1 loops=1) Filter: (id_provider = 39) -> Index Scan using idx_user_data_class on user_data ud (cost=0.00..260.00 rows=389 width=161) (actual time=0.05..0.39 rows=21 loops=1) Index Cond: (ud.id_class = "outer".id_class) Total runtime: 18185.61 msec :-( thank you anyway. Gaetano