Re: join selectivity - Mailing list pgsql-hackers
From | strk@refractions.net |
---|---|
Subject | Re: join selectivity |
Date | |
Msg-id | 20041223093351.GC96913@freek.keybit.net Whole thread Raw |
In response to | Re: join selectivity (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Thu, Dec 16, 2004 at 01:56:29PM -0500, Tom Lane wrote: > "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes: > > ... But in the case of <column> <op> > > <unknown constant>, if we're estimating the number of rows to return then > > that becomes harder > > I didn't say it was easy ;-). The existing selectivity functions can't > do better than a rough guess in such cases, and I don't expect you can > either. Tom, correct me if I'm wrong. Doing some tests I've found out that the returned value from the JOINSEL is applied to REL1.rows X REL2.rows, but REL1 and REL2 are not 'base' table, rather relations with a number of rows once again estimated by other selectivity functions. For example, if JOINSEL always returns 1.0, you get a different 'estimated' number of rows for a Nested Loop depending on the presence of a condition filtering one of the tables. Example: test1 has 34 rowstest2 has 32 rows a full join makes the estimate=1088 rows ( 34*32 )a join with a filter on test2 makes estimate=34 ( 34*1 ? ) strk=# explain analyze select * from test1, test2 where test1.geom && test2.geom; NOTICE: LWGEOM_gist_joinsel called (returning 1.000000) QUERY PLAN --------------------------------------------------------------------------------------------------------------Nested Loop (cost=3.37..32.17 rows=1088 width=36) (actual time=0.193..70.691 rows=983 loops=1) Join Filter: ("inner".geom && "outer".geom) -> Seq Scan on test2 (cost=0.00..4.32 rows=32 width=4) (actual time=0.074..0.267 rows=32 loops=1) -> Materialize (cost=3.37..3.71 rows=34 width=32) (actual time=0.002..0.026 rows=34 loops=32) -> Seq Scan on test1 (cost=0.00..3.34 rows=34 width=32) (actual time=0.042..0.159 rows=34 loops=1)Total runtime: 71.426 ms (6 rows) trk=# explain analyze select * from test1, test2 where test1.geom && test2.geom and test2.id = 1; NOTICE: LWGEOM_gist_joinsel called (returning 1.000000) QUERY PLAN --------------------------------------------------------------------------------------------------------Nested Loop (cost=0.00..8.17rows=34 width=44) (actual time=0.179..2.704 rows=17 loops=1) Join Filter: ("inner".geom && "outer".geom) -> Seq Scan on test2 (cost=0.00..4.40 rows=1 width=8) (actual time=0.078..0.208 rows=1 loops=1) Filter:(id = 1) -> Seq Scan on test1 (cost=0.00..3.34 rows=34 width=36) (actual time=0.041..0.181 rows=34 loops=1)Totalruntime: 2.819 ms (6 rows) Now, is the number 1 what has been estimated bythe RESTRICT selectivity estimator forSERIAL = <constant> ?If it is, doesour JOINSEL function have access to thisinformation ? TIA --strk;
pgsql-hackers by date: