Thread: Wrong plan or what ?
Hi all, I'm running Postgres7.3.3 and I'm performing this simple select: select * from user_logs ul, user_data ud, class_default cd where ul.id_user = ud.id_user and ud.id_class = cd.id_class and cd.id_provider = 39; these are the number of rows for each table: user_logs: 1258955 class_default: 31 ( only one with id_provider = 39 ) user_data: 10274; this is the explain analyze for that query: QUERY PLAN Hash Join (cost=265.64..32000.76 rows=40612 width=263) (actual time=11074.21..11134.28 rows=10 loops=1) Hash Cond: ("outer".id_user = "inner".id_user) -> Seq Scan on user_logs ul (cost=0.00..24932.65 rows=1258965 width=48) (actual time=0.02..8530.21 rows=1258966 loops=1) -> Hash (cost=264.81..264.81 rows=331 width=215) (actual time=30.22..30.22 rows=0 loops=1) -> Nested Loop (cost=0.00..264.81 rows=331 width=215) (actual time=29.95..30.20 rows=6 loops=1) -> Seq Scan on class_default cd (cost=0.00..1.39 rows=1 width=55) (actual time=0.08..0.10 rows=1 loops=1) Filter: (id_provider = 39) -> Index Scan using idx_user_data_class on user_data ud (cost=0.00..258.49 rows=395 width=160) (actual time=29.82..29.96 rows=6 loops=1) Index Cond: (ud.id_class = "outer".id_class) Total runtime: 11135.65 msec (10 rows) I'm able to performe that select with these 3 steps: SELECT id_class from class_default where id_provider = 39; id_class ---------- 48 (1 row) SELECT id_user from user_data where id_class in ( 48 ); id_user --------- 10943 10942 10934 10927 10910 10909 (6 rows) SELECT * from user_logs where id_user in ( 10943, 10942, 10934, 10927, 10910, 10909 ); [SNIPPED] and the time ammount is a couple of milliseconds. Why the planner or the executor ( I don't know ) do not follow the same strategy ? Thank you Gaetano Mendola
Gaetano, > SELECT * from user_logs where id_user in ( > 10943, 10942, 10934, 10927, 10910, 10909 > ); > [SNIPPED] > Why the planner or the executor ( I don't know ) do not follow > the same strategy ? It is, actually, according to the query plan. Can you post the EXPLAIN ANALYZE for the above query? -- -Josh Berkus Aglio Database Solutions San Francisco
"Josh Berkus" <josh@agliodbs.com> > Gaetano, > > > SELECT * from user_logs where id_user in ( > > 10943, 10942, 10934, 10927, 10910, 10909 > > ); > > [SNIPPED] > > > Why the planner or the executor ( I don't know ) do not follow > > the same strategy ? > > It is, actually, according to the query plan. > > Can you post the EXPLAIN ANALYZE for the above query? Index Scan using idx_user_user_logs, idx_user_user_logs, idx_user_user_logs, idx_user_user_logs, idx_user_user_logs, idx_user_user_logs on user_logs (cost=0.00..5454.21 rows=2498 width=48) (actual time=0.09..0.28 rows=10 loops=1) Index Cond: ((id_user = 10943) OR (id_user = 10942) OR (id_user = 10934) OR (id_user = 10927) OR (id_user = 10910) OR (id_user = 10909)) Total runtime: 0.41 msec (3 rows) Thank you Gaetano PS: if I execute the query I obtain 10 rows instead of 3 that say the explain analyze.
Forget my PS to last message.
Gaetano, > QUERY PLAN > Hash Join (cost=265.64..32000.76 rows=40612 width=263) (actual > time=11074.21..11134.28 rows=10 loops=1) > Hash Cond: ("outer".id_user = "inner".id_user) > -> Seq Scan on user_logs ul (cost=0.00..24932.65 rows=1258965 width=48) > (actual time=0.02..8530.21 rows=1258966 loops=1) OK, here's your problem The planner thinks that you're going to get 40162 rows out of the final join, not 10. If the row estimate was correct, then the Seq Scan would be a reasonable plan. But it's not. Here's some steps you can take to clear things up for the planner: 1) Make sure you've VACUUM ANALYZED 2) Adjust the following postgresql.conf statistics: a) effective_cache_size: increase to 70% of available (not used by other processes) RAM. b) random_page_cost: decrease, maybe to 2. c) default_statistics_target: try increasing to 100 (warning: this will significantly increase the time required to do ANALYZE) Then test again! -- -Josh Berkus Aglio Database Solutions San Francisco
"Josh Berkus" <josh@agliodbs.com> > Gaetano, > > > QUERY PLAN > > Hash Join (cost=265.64..32000.76 rows=40612 width=263) (actual > > time=11074.21..11134.28 rows=10 loops=1) > > Hash Cond: ("outer".id_user = "inner".id_user) > > -> Seq Scan on user_logs ul (cost=0.00..24932.65 rows=1258965 width=48) > > (actual time=0.02..8530.21 rows=1258966 loops=1) > > OK, here's your problem > > The planner thinks that you're going to get 40162 rows out of the final join, > not 10. If the row estimate was correct, then the Seq Scan would be a > reasonable plan. But it's not. Here's some steps you can take to clear > things up for the planner: > > 1) Make sure you've VACUUM ANALYZED > 2) Adjust the following postgresql.conf statistics: > a) effective_cache_size: increase to 70% of available (not used by other > processes) RAM. > b) random_page_cost: decrease, maybe to 2. > c) default_statistics_target: try increasing to 100 > (warning: this will significantly increase the time required to do ANALYZE) > > Then test again! No improvement at all, I pushed default_statistics_target to 1000 but the rows expected are still 40612 :-( Of course I restarted the postmaster and I vacuumed analyze the DB Thank you Gaetano