Thread: strange nested loop row count estimates

strange nested loop row count estimates

From
Sergey Koposov
Date:
Hi, 

I'm currently trying to understand the expected row counts for a query involving a nested loop join and bitmap index
scan
 
on the functional index and a custom operator. And the numbers that I see don't make sense to me currently. Hopefully 
somebody here can shed some light on it, or confirm this is some kind of issue.  

Here is the query and explain analyze

explain analyze select * from twomass.psc as t , gaia_dr2.gaia_source as g where 
    (
        (q3c_ang2ipix(g.ra,g.dec) between q3c_nearby_it(t.ra, t.decl, 0.0003, 0)  and  
                        q3c_nearby_it(t.ra, t.decl, 0.0003, 1))  
    or  
        (q3c_ang2ipix(g.ra,g.dec) between q3c_nearby_it(t.ra, t.decl, 0.0003, 1)  and  
                        q3c_nearby_it(t.ra, t.decl, 0.0003, 3))
    ) 
    and
    0.0003 ==<<>>== (g.ra,g.dec,t.ra,t.decl)::q3c_type limit 10;

https://explain.depesz.com/s/vcNd

What I can't understand at all is how the estimate of 3E15 rows!!!! is obtained by the nested loop 
given that the bitmap heap scan is expected to return *one* single row for each row of the 'left' table. 
So in my mind the estimate of the total number of rows
should be ~ 1e9 rows after the nested loop. Because of this crazy overestimate, I actually have to force the nested
loop
 
in this query by disabling seqscan. 
(if I don't disable the seqscan -- this is the plan I get which ignores the indices:
https://explain.depesz.com/s/EIiG

Some more details about the query: 
q3c_ang2ipix(ra,dec) is the function mapping (double,double) -> bigint and the tables have a functional index on that.
Like this: 
                   Table "gaia_dr2.gaia_source"
              Column              |       Type        | Modifiers 
----------------------------------+-------------------+-----------
 ra                               | double precision  | 
 dec                              | double precision  | 
.......
Indexes:
    "gaia_source2_q3c_ang2ipix_idx" btree (q3c_ang2ipix(ra, "dec"))

q3c_nearby_() function just returns bigint.

The ==<<>== is the custom operator with custom low selectivity (1e-12 in this case)

The tables in the join in question have 450 mill  and 1.5 billion rows. 

I hope somebody can help me understand what's going on. 

Thank you in advance. 

  Sergey


PS the kind of query that I show  comes from the q3c module ( https://github.com/segasai/q3c ) 
that is used for spatial queries of large astronomical catalogues. 


Re: strange nested loop row count estimates

From
Tom Lane
Date:
Sergey Koposov <skoposov@cmu.edu> writes:
> I'm currently trying to understand the expected row counts for a query involving a nested loop join and bitmap index
scan 
> on the functional index and a custom operator. And the numbers that I see don't make sense to me currently.

What sort of selectivity estimator have you got attached to that custom
operator?

            regards, tom lane



Re: strange nested loop row count estimates

From
Sergey Koposov
Date:
On Thu, 2019-05-02 at 00:36 -0400, Tom Lane wrote:
> Sergey Koposov <skoposov@cmu.edu> writes:
> > 
> > I'm currently trying to understand the expected row counts for a query involving a nested loop join and bitmap
indexscan 
 
> > on the functional index and a custom operator. And the numbers that I see don't make sense to me currently.
> What sort of selectivity estimator have you got attached to that custom
> operator?

This is the code, but basically it is just a constant based on the search radius (which is the leftmost float argument
ofthe operator)
 
https://github.com/segasai/q3c/blob/361140d4f1f36bf16c9c53721d1c4f03cb4de930/q3c.c#L89
For the query in question it should be ~ 1e-12

The whole idea of the operator was to specifically inform PG that this query returns a small number of rows. 

(the underlying idea of the query is that it does positional crossmatch between datasets on the sphere with a certain
smallradius). And 
 
obviously the selectivity of this is is extremely tiny). 


Re: strange nested loop row count estimates

From
Tom Lane
Date:
Sergey Koposov <skoposov@cmu.edu> writes:
> On Thu, 2019-05-02 at 00:36 -0400, Tom Lane wrote:
>> What sort of selectivity estimator have you got attached to that custom
>> operator?

> This is the code, but basically it is just a constant based on the search radius (which is the leftmost float
argumentof the operator) 
> https://github.com/segasai/q3c/blob/361140d4f1f36bf16c9c53721d1c4f03cb4de930/q3c.c#L89

Hm, that query should be paying attention to join selectivity, and
you don't have a join selectivity function.

I think that it applies the restriction selectivity while
estimating the size of the bitmap scan's output.  But that's not
what's going to determine the estimated size of the join output.

Too tired to look at this really closely, but I think basically
the inconsistency boils down to the lack of consistency between
your restriction estimator (1e-12) and your join estimator
(which, since you haven't got one, is going to default to
something way larger, possibly 0.5).

            regards, tom lane



Re: strange nested loop row count estimates

From
Sergey Koposov
Date:
On Thu, 2019-05-02 at 01:05 -0400, Tom Lane wrote:
> Sergey Koposov <skoposov@cmu.edu> writes:
> > 
> > On Thu, 2019-05-02 at 00:36 -0400, Tom Lane wrote:
> > > 
> > > What sort of selectivity estimator have you got attached to that custom
> > > operator?
> > 
> > This is the code, but basically it is just a constant based on the search radius (which is the leftmost float
argumentof the operator)
 
> > https://github.com/segasai/q3c/blob/361140d4f1f36bf16c9c53721d1c4f03cb4de930/q3c.c#L89
> Hm, that query should be paying attention to join selectivity, and
> you don't have a join selectivity function.
> 
> I think that it applies the restriction selectivity while
> estimating the size of the bitmap scan's output.  But that's not
> what's going to determine the estimated size of the join output.
> 
> Too tired to look at this really closely, but I think basically
> the inconsistency boils down to the lack of consistency between
> your restriction estimator (1e-12) and your join estimator
> (which, since you haven't got one, is going to default to
> something way larger, possibly 0.5).

Thanks very much checking, Tom!  
Adding the join selectivity estimator fixed the problem. 
I think I initially tried it, but it wasn't clear whether it was called at all or not.
Plus I was confused by the fact that the bitmap scan prediction showed 1 row, so it looked like the selectivity
worked. 

        Sergey