Re: selectivity function - Mailing list pgsql-hackers

From Greg Hennessy
Subject Re: selectivity function
Msg-id e23e9771-463f-c189-8502-f2c3c748f8fb@gmail.com
In response to Re: selectivity function  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, May 26, 2022 at 3:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can you do anything useful with attaching selectivity estimates
> to the functions it references, instead?
I may have been going down a bad path before. The function I'm
working to improve has five arguments, the last being "degrees", which
is the match radius. Obviously, a larger match radius should produce more
matches.
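
As a back-of-the-envelope check (assuming the test points are spread
roughly uniformly over the sphere), the fraction of points within angular
radius r of a given point is (1 - cos r)/2, so going from 0.005 degrees to
5 degrees should scale the match count by roughly a factor of a million:

/* Rough scaling check (assumes roughly uniform sky coverage): the fraction
 * of the sphere within angular radius r of a point is (1 - cos r) / 2. */
#include <math.h>
#include <stdio.h>

static double cap_fraction(double radius_deg)
{
    return (1.0 - cos(radius_deg * acos(-1.0) / 180.0)) / 2.0;
}

int main(void)
{
    double small = cap_fraction(0.005);  /* ~1.9e-9 */
    double large = cap_fraction(5.0);    /* ~1.9e-3 */

    printf("0.005 deg: %.3g   5 deg: %.3g   ratio: %.3g\n",
           small, large, large / small);
    return 0;
}

The ratio of actual row counts in the two plans below (31 vs about 38
million) is in the same ballpark.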

For a small value of a match radius (0.005 degrees):

q3c_test=# explain (analyze, buffers) select * from test as a, test1 as b where q3c_join(a.ra,a.dec,b.ra,b.dec,.005);
QUERY PLAN
Nested Loop  (cost=92.28..22787968818.00 rows=5 width=32) (actual time=7.799..10758.566 rows=31 loops=1)
   Buffers: shared hit=8005684
   ->  Seq Scan on test a  (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.008..215.570 rows=1000000 loops=1)
         Buffers: shared hit=5406
   ->  Bitmap Heap Scan on test1 b  (cost=92.28..22785.45 rows=250 width=16) (actual time=0.009..0.009 rows=0 loops=1000000)

(note: I deleted some of the output, since I think I'm keeping the 
important bits)

So the cost of the query is calculated as about 2e10: it expected five rows,
found 31, and with a hot cache read 8 million units of disk space (I'd have
to check the fine manual to remind myself of the exact units).
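
If I'm remembering right, those counts are in blocks (8 kB by default) and
are cumulative over all the inner-loop iterations, so they mostly reflect
repeated hits on the same cached pages. A quick sanity check, assuming the
default block size:

/* Quick unit check (assumes PostgreSQL's default 8 kB block size). */
#include <stdio.h>

int main(void)
{
    const double hits = 8005684.0;      /* "shared hit" count from the plan above */
    const double block_bytes = 8192.0;  /* default block size */

    printf("%.1f GiB of buffer accesses\n",
           hits * block_bytes / (1024.0 * 1024.0 * 1024.0));
    return 0;
}

That works out to roughly 61 GiB of (cached) buffer accesses for the
small-radius query.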

When I do the same sort of query on a much larger match radius (5 deg) I 
get:
q3c_test=# explain (analyze, buffers) select * from test as a, test1 as b where q3c_join(a.ra,a.dec,b.ra,b.dec,5);
QUERY PLAN
Nested Loop  (cost=92.28..22787968818.00 rows=4766288 width=32) (actual time=0.086..254995.691 rows=38051626 loops=1)
   Buffers: shared hit=104977026
   ->  Seq Scan on test a  (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.008..261.425 rows=1000000 loops=1)
         Buffers: shared hit=5406
   ->  Bitmap Heap Scan on test1 b  (cost=92.28..22785.45 rows=250 width=16) (actual time=0.053..0.247 rows=38 loops=1000000)

The "total cost" is the same identical 2e10, this time the number of 
rows expectd
is 4.7 million, the number of rows delivered is 38 million (so the 
calculation is off
by a factor of 8, I'm not sure that is important), but the io is now 104 
million units.
So while we are doing a lot more IO, and dealing with a lot more rows, the
calculated cost is identical. That seems strange me me. Is that a normal 
thing?
Is it possible that the cost calculation isn't including the selectivity 
calculation?
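
For what it's worth, here's a rough sketch of what Tom's suggestion might
look like for q3c_join: a planner support function (PostgreSQL 12+) that
handles SupportRequestSelectivity and derives a selectivity from the radius
argument when it's a constant. The name q3c_join_support and the cap-area
formula are placeholders I made up to illustrate the mechanism, not anything
that exists in q3c today:

/*
 * Hypothetical planner support function for
 * q3c_join(ra1, dec1, ra2, dec2, radius_deg).  Sketch only: assumes the
 * fifth argument is a float8 Const and that points are spread roughly
 * uniformly, so selectivity ~ spherical-cap area.
 */
#include "postgres.h"

#include <math.h>

#include "fmgr.h"
#include "nodes/pg_list.h"
#include "nodes/primnodes.h"
#include "nodes/supportnodes.h"

PG_MODULE_MAGIC;                        /* once per loadable module */

PG_FUNCTION_INFO_V1(q3c_join_support);

Datum
q3c_join_support(PG_FUNCTION_ARGS)
{
    Node       *rawreq = (Node *) PG_GETARG_POINTER(0);
    Node       *ret = NULL;

    if (IsA(rawreq, SupportRequestSelectivity))
    {
        SupportRequestSelectivity *req = (SupportRequestSelectivity *) rawreq;

        /* q3c_join has five arguments; the radius (degrees) is the last */
        if (list_length(req->args) == 5)
        {
            Node       *arg = (Node *) list_nth(req->args, 4);

            if (IsA(arg, Const) && !((Const *) arg)->constisnull)
            {
                double      radius = DatumGetFloat8(((Const *) arg)->constvalue);
                double      frac;

                /* fraction of the sphere within "radius" degrees of a point */
                frac = (1.0 - cos(radius * M_PI / 180.0)) / 2.0;

                req->selectivity = Max(1.0e-10, Min(1.0, frac));
                ret = (Node *) req;     /* non-NULL: we supplied an estimate */
            }
        }
    }

    PG_RETURN_POINTER(ret);
}

It would then be hooked up with something like CREATE FUNCTION
q3c_join_support(internal) RETURNS internal ... LANGUAGE C, followed by
ALTER FUNCTION q3c_join(...) SUPPORT q3c_join_support.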

Greg




