Thread: Nested Loop Left Join always shows rows=1
Query is: "explain analyze select sum(A), count(*) from tab1 left outer join tab_outer on id=out_id where id=10;" output: Aggregate (cost=31.91..31.92 rows=1 width=4) (actual time=14.185..14.185 rows=1 loops=1) -> Nested Loop Left Join (cost=0.00..17.11 rows=2959 width=4) (actual time=8.608..13.400 rows=2953 loops=1) -> Index Scan using id_idx on tab1 (cost=0.00..8.27 rows=1 width=4) (actual time=0.010..0.011 rows=1 loops=1) Index Cond: (id = 10) -> Index Scan using out_id_idx on tab_outer (cost=0.00..8.83 rows=1 width=8) (actual time=8.590..11.924 rows=2953 loops=1) Index Cond: ((tab_outer.out_id = 10) AND (tab1.id = tab_outer.out_id)) http://explain-analyze.info/query_plans/2519-query-plan-1240 I ran ANALYZE on both tables, but the plan shows rows=1 in the inner table results. If I change the "left outer join" into an "inner join" the bitmap index scan of the inner table of the nested loop shows a correct number of rows (not the exact same thing as "actual rows", but very close). Am I reading the explain output in the wrong way? I'm not complaining about performance issues, I'm just trying to learn how to read plans.
Scara Maccai <m_lists@yahoo.it> writes: > Aggregate (cost=31.91..31.92 rows=1 width=4) (actual > time=14.185..14.185 rows=1 loops=1) > -> Nested Loop Left Join (cost=0.00..17.11 rows=2959 width=4) > (actual time=8.608..13.400 rows=2953 loops=1) > -> Index Scan using id_idx on tab1 (cost=0.00..8.27 rows=1 > width=4) (actual time=0.010..0.011 rows=1 loops=1) > Index Cond: (id = 10) > -> Index Scan using out_id_idx on tab_outer (cost=0.00..8.83 > rows=1 width=8) (actual time=8.590..11.924 rows=2953 loops=1) > Index Cond: ((tab_outer.out_id = 10) AND (tab1.id = > tab_outer.out_id)) This seems a bit broken :-( ... ideally it shouldn't be generating the redundant index condition, either. I think the redundant condition is probably indirectly responsible for the low rowcount estimate, too. It doesn't really hurt anything, since the join output estimate is correct, but it'd be nice to make it look better. regards, tom lane
Tom Lane wrote: > Scara Maccai <m_lists@yahoo.it> writes: >> -> Index Scan using id_idx on tab1 (cost=0.00..8.27 rows=1 >> width=4) (actual time=0.010..0.011 rows=1 loops=1) >> Index Cond: (id = 10) >> -> Index Scan using out_id_idx on tab_outer (cost=0.00..8.83 >> rows=1 width=8) (actual time=8.590..11.924 rows=2953 loops=1) >> Index Cond: ((tab_outer.out_id = 10) AND (tab1.id = >> tab_outer.out_id)) > > This seems a bit broken :-( ... ideally it shouldn't be generating the > redundant index condition, either. Why is the index condition redundant? I guess the redundant part is "AND (tab1.id = tab_outer.out_id)" but I'm not sure, and I would really appreciate if you could explain a little bit more... The reason I'm asking is because I'm looking at the implementation of some VERY basic form of progress indicator based on"estimated rows vs rows already processed". I guess this would work for our DB because the planner is usually very good at guessing the number of rows for each stepof the plan, since our distributions are pretty simple. But this left outer join thing I'm afraid would invalidate the method, since the inner table row count is always off... Thank you for the reply. (I added hackers too, since this seems a place where some fixing/development would be necessary. I hope it's ok.)
Scara Maccai <m_lists@yahoo.it> writes: > I ran ANALYZE on both tables, but the plan shows rows=1 in the inner > table results. > If I change the "left outer join" into an "inner join" the bitmap index > scan of the inner table of the nested loop shows a correct number of > rows (not the exact same thing as "actual rows", but very close). Yeah, this is a bug: it's effectively double-counting the selectivity of the index clause. The error isn't enormously critical, since the join size estimate is right; but it could perhaps lead to choosing a plain indexscan when a bitmap scan would be better. I've applied a patch. regards, tom lane
> Yeah, this is a bug: it's effectively double-counting the > selectivity of the index clause. The error isn't enormously critical, > since the join size estimate is right; but it could perhaps lead to > choosing a plain indexscan when a bitmap scan would be better. I've > applied a patch. Thank you. I'll try that. I was able to change the PgBackendStatus struct to hold a "percentage of completion" field, which shows up when calling thepg_stat_get_activity function. As I said in a previous mail, the progress indicator gives very good estimates for the simple queries I need. If I can comeup with something that is good in general I'll post it. I'm reading some papers about the argument (mostly the ones listedin http://wiki.postgresql.org/wiki/Query_progress_indication).