Thread: Nested Loop Left Join always shows rows=1

Nested Loop Left Join always shows rows=1

From

Scara Maccai

Date:

27 November 2008, 12:51:23

Query is:

"explain analyze  select sum(A), count(*) from tab1 left outer join
tab_outer on id=out_id where id=10;"

output:

  Aggregate  (cost=31.91..31.92 rows=1 width=4) (actual
time=14.185..14.185 rows=1 loops=1)
    ->  Nested Loop Left Join  (cost=0.00..17.11 rows=2959 width=4)
(actual time=8.608..13.400 rows=2953 loops=1)
          ->  Index Scan using id_idx on tab1  (cost=0.00..8.27 rows=1
width=4) (actual time=0.010..0.011 rows=1 loops=1)
                Index Cond: (id = 10)
          ->  Index Scan using out_id_idx on tab_outer  (cost=0.00..8.83
rows=1 width=8) (actual time=8.590..11.924 rows=2953 loops=1)
                Index Cond: ((tab_outer.out_id = 10) AND (tab1.id =
tab_outer.out_id))



http://explain-analyze.info/query_plans/2519-query-plan-1240

I ran ANALYZE on both tables, but the plan shows rows=1 in the inner
table results.

If I change the "left outer join" into an "inner join" the bitmap index
scan of the inner table of the nested loop shows a correct number of
rows (not the exact same thing as "actual rows", but very close).

Am I reading the explain output in the wrong way?
I'm not complaining about performance issues, I'm just trying to learn
how to read plans.

Re: Nested Loop Left Join always shows rows=1

From

Tom Lane

Date:

27 November 2008, 17:49:28

Scara Maccai <m_lists@yahoo.it> writes:
>   Aggregate  (cost=31.91..31.92 rows=1 width=4) (actual
> time=14.185..14.185 rows=1 loops=1)
>     ->  Nested Loop Left Join  (cost=0.00..17.11 rows=2959 width=4)
> (actual time=8.608..13.400 rows=2953 loops=1)
>           ->  Index Scan using id_idx on tab1  (cost=0.00..8.27 rows=1
> width=4) (actual time=0.010..0.011 rows=1 loops=1)
>                 Index Cond: (id = 10)
>           ->  Index Scan using out_id_idx on tab_outer  (cost=0.00..8.83
> rows=1 width=8) (actual time=8.590..11.924 rows=2953 loops=1)
>                 Index Cond: ((tab_outer.out_id = 10) AND (tab1.id =
> tab_outer.out_id))

This seems a bit broken :-( ... ideally it shouldn't be generating the
redundant index condition, either.  I think the redundant condition is
probably indirectly responsible for the low rowcount estimate, too.
It doesn't really hurt anything, since the join output estimate is
correct, but it'd be nice to make it look better.

            regards, tom lane

Re: Nested Loop Left Join always shows rows=1

From

Scara Maccai

Date:

28 November 2008, 01:18:53

Tom Lane wrote:
> Scara Maccai <m_lists@yahoo.it> writes:
>>           ->  Index Scan using id_idx on tab1  (cost=0.00..8.27 rows=1
>> width=4) (actual time=0.010..0.011 rows=1 loops=1)
>>                 Index Cond: (id = 10)
>>           ->  Index Scan using out_id_idx on tab_outer  (cost=0.00..8.83
>> rows=1 width=8) (actual time=8.590..11.924 rows=2953 loops=1)
>>                 Index Cond: ((tab_outer.out_id = 10) AND (tab1.id =
>> tab_outer.out_id))
>
> This seems a bit broken :-( ... ideally it shouldn't be generating the
> redundant index condition, either.

Why is the index condition redundant? I guess the redundant part is

"AND (tab1.id = tab_outer.out_id)" but I'm not sure, and I would really appreciate if you could explain a little bit
more...

The reason I'm asking is because I'm looking at the implementation of some VERY basic form of progress indicator based
on"estimated rows vs rows already processed". 
I guess this would work for our DB because the planner is usually very good at guessing the number of rows for each
stepof the plan, since our distributions are pretty simple. 

But this left outer join thing I'm afraid would invalidate the method, since the inner table row count is always off...

Thank you for the reply.

(I added hackers too, since this seems a place where some
fixing/development would be necessary. I hope it's ok.)

Re: Nested Loop Left Join always shows rows=1

From

Tom Lane

Date:

01 December 2008, 17:09:44

Scara Maccai <m_lists@yahoo.it> writes:
> I ran ANALYZE on both tables, but the plan shows rows=1 in the inner
> table results.

> If I change the "left outer join" into an "inner join" the bitmap index
> scan of the inner table of the nested loop shows a correct number of
> rows (not the exact same thing as "actual rows", but very close).

Yeah, this is a bug: it's effectively double-counting the selectivity of
the index clause.  The error isn't enormously critical, since the join
size estimate is right; but it could perhaps lead to choosing a plain
indexscan when a bitmap scan would be better.  I've applied a patch.

            regards, tom lane

Re: Nested Loop Left Join always shows rows=1

From

Scara Maccai

Date:

02 December 2008, 15:49:40

> Yeah, this is a bug: it's effectively double-counting the
> selectivity of the index clause. The error isn't enormously critical,
> since the join size estimate is right; but it could perhaps lead to
> choosing a plain indexscan when a bitmap scan would be better. I've
> applied a patch.

Thank you. I'll try that.

I was able to change the PgBackendStatus struct to hold a "percentage of completion" field, which shows up when calling
thepg_stat_get_activity function. 

As I said in a previous mail, the progress indicator gives very good estimates for the simple queries I need. If I can
comeup with something that is good in general I'll post it. I'm reading some papers about the argument (mostly the ones
listedin http://wiki.postgresql.org/wiki/Query_progress_indication).