strange nested loop row count estimates - Mailing list pgsql-general

From Sergey Koposov
Subject strange nested loop row count estimates
Msg-id e5cc7ba42336483fa7d072d92718573fc250bcb1.camel@cmu.edu
List pgsql-general
Hi, 

I'm currently trying to understand the expected row counts for a query involving a nested loop join and a bitmap index
scan on a functional index with a custom operator. The numbers that I see don't make sense to me. Hopefully
somebody here can shed some light on this, or confirm that it is some kind of issue.

Here is the query and the EXPLAIN ANALYZE output:

explain analyze select * from twomass.psc as t , gaia_dr2.gaia_source as g where 
    (
        (q3c_ang2ipix(g.ra,g.dec) between q3c_nearby_it(t.ra, t.decl, 0.0003, 0)  and  
                        q3c_nearby_it(t.ra, t.decl, 0.0003, 1))  
    or  
        (q3c_ang2ipix(g.ra,g.dec) between q3c_nearby_it(t.ra, t.decl, 0.0003, 1)  and  
                        q3c_nearby_it(t.ra, t.decl, 0.0003, 3))
    ) 
    and
    0.0003 ==<<>>== (g.ra,g.dec,t.ra,t.decl)::q3c_type limit 10;

https://explain.depesz.com/s/vcNd

What I can't understand at all is how the nested loop arrives at an estimate of 3E15 rows,
given that the bitmap heap scan is expected to return a *single* row for each row of the 'left' table.
In my mind, the estimated total number of rows after the nested loop should be ~1e9.
Because of this huge overestimate, I actually have to force the nested loop in this query
by disabling seqscan.
(If I don't disable seqscan, this is the plan I get, which ignores the indexes:
https://explain.depesz.com/s/EIiG )

Some more details about the query: 
q3c_ang2ipix(ra,dec) is a function mapping (double,double) -> bigint, and the tables have a functional index on it,
like this:
                   Table "gaia_dr2.gaia_source"
              Column              |       Type        | Modifiers 
----------------------------------+-------------------+-----------
 ra                               | double precision  | 
 dec                              | double precision  | 
.......
Indexes:
    "gaia_source2_q3c_ang2ipix_idx" btree (q3c_ang2ipix(ra, "dec"))

The q3c_nearby_it() function just returns a bigint.

The ==<<>>== is the custom operator, with a custom low selectivity (1e-12 in this case).

The tables in the join in question have 450 million and 1.5 billion rows.
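
As a rough sanity check on these numbers, here is a minimal sketch of the arithmetic. It assumes the planner sizes the join as roughly outer rows x inner rows x join selectivity; that is an assumption about the estimator, not something read from the plan output. The variable names are purely illustrative, and the row counts are the approximate figures quoted in this message.

```python
# Approximate figures quoted in this message.
rows_small = 450e6        # table with ~450 million rows
rows_large = 1.5e9        # table with ~1.5 billion rows
planner_estimate = 3e15   # row estimate reported for the nested loop
expected_rows = 1e9       # order of magnitude expected: ~1 match per outer row

# Assumption: join size ~ outer_rows * inner_rows * join_selectivity.
cross_product = rows_small * rows_large

# Selectivity the planner would have to be using to arrive at 3e15 rows.
implied_selectivity = planner_estimate / cross_product

# How far the reported estimate is from the expected one.
overestimate_factor = planner_estimate / expected_rows

print(f"cross product:        {cross_product:.3g}")
print(f"implied selectivity:  {implied_selectivity:.3g}")
print(f"overestimate factor:  {overestimate_factor:.3g}")
```

Under that assumption, the implied selectivity comes out many orders of magnitude larger than the 1e-12 attached to the custom operator, which is the part I can't explain.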

I hope somebody can help me understand what's going on. 

Thank you in advance. 

  Sergey


PS: the kind of query shown here comes from the q3c module ( https://github.com/segasai/q3c ),
which is used for spatial queries on large astronomical catalogues.

