Thread: Re: join selectivity

Re: join selectivity

From

"Mark Cave-Ayland"

Date:

13 December 2004, 15:15:29

Hi strk,

(cut)

> > Taking a look at join selectivity...
> > For a query like this:
> >
> >    SELECT id FROM table1, table2
> >    WHERE table1.geom && table2.geom;
> >
> > RESTRICT selectivity is invoked twice and
> > JOIN selectivity is invoked once.
> > The RESTRICT code is not able to find a costant part
> > and thus returns the default value (0.000005),
> > JOIN selectivity so far returns an hard-wired 0.1.
> >
> > Questions:
> > (1) What should RESTRICT selectivity do in this case ?!

> Maybe that's how the planner decide what to do:
>     1) sequencially scan table1 and use index for each row
> (RESTRICT)
>     2) sequencially scan table2 and use index for each row
> (RESTRICT)
>     3) ... some other magic I'm missing .. (JOIN)

Indeed, you could be on the right lines here in thinking the planner
considers some form of individual scan on each first before finalising on a
plan type (although unless the tables are small I would have thought this
would not have been an option). Does this change if you do a SET
ENABLE_SEQSCAN = 'f' before the query?

It just seems strange for a <column> <operator> <column> clause to call a
function involving a constant. Again, I'd probably ask on pgsql-hackers just
to clarify - I think Tom Lane was involved with the planner, so will be able
to answer this one fairly quickly.

> (2) Is JOIN selectivity a fraction of table2 X table1
>    records ?
> I've tested this. It is a fraction of table2.rows X
> table1.rows. 0.1 is probably a big number for that...

Hehe indeed :) The reason this hit my TODO list was that I attempted a join
on two large geometry columns and ended up with a query plan that was doomed
to failure..... Maybe we should suggest some improved wording for the
documentation?


Kind regards,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk

Re: join selectivity

From

Tom Lane

Date:

13 December 2004, 20:16:13

"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> For a query like this:
> 
> SELECT id FROM table1, table2 
> WHERE table1.geom && table2.geom;
> 
> RESTRICT selectivity is invoked twice and
> JOIN selectivity is invoked once.

Hm, are you testing in a context where both tables have indexes that are
relevant to the && operator?

The estimated join result size is computed from the join selectivity
estimate for the && operator.  I was about to say that restriction
selectivity wouldn't be used at all, but on second thought I believe
that it would be invoked while considering nestloop with inner indexscan
plans.  That is, we'd consider
NestLoop    Seq Scan on table2    Indexscan on table1        IndexCond: table1.geom && outer.geom

and to determine the estimated cost of each indexscan, we would invoke
restriction selectivity for &&, with varRelid referencing table1.
Given this call you are supposed to treat table2.geom as a constant of
uncertain value, so the thing is semantically sensible as a restriction
clause for table1 (whether you can produce a really good estimate is
another question :-().

Similarly, we'd consider the reverse plan with table1 as outer, and
that would give rise to another restriction selectivity check with
varRelid = table2.

>> (2) Is JOIN selectivity a fraction of table2 X table1
>> records ?

Yes.  Similarly restriction selectivity is a fraction of records in the
table under consideration.
        regards, tom lane

Re: join selectivity

From

strk

Date:

14 December 2004, 14:24:18

On Mon, Dec 13, 2004 at 12:16:05PM -0500, Tom Lane wrote:
> "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> > For a query like this:
> > 
> > SELECT id FROM table1, table2 
> > WHERE table1.geom && table2.geom;
> > 
> > RESTRICT selectivity is invoked twice and
> > JOIN selectivity is invoked once.
> 
> Hm, are you testing in a context where both tables have indexes that are
> relevant to the && operator?

Single index relevant to the && operator makes 2 calls to RESTRICT.
Double index (one for each table) makes 4 calls to RESTRICT.
In both cases JOIN is called once.

--strk;

> The estimated join result size is computed from the join selectivity
> estimate for the && operator.  I was about to say that restriction
> selectivity wouldn't be used at all, but on second thought I believe
> that it would be invoked while considering nestloop with inner indexscan
> plans.  That is, we'd consider
> 
>     NestLoop
>         Seq Scan on table2
>         Indexscan on table1
>             IndexCond: table1.geom && outer.geom
> 
> and to determine the estimated cost of each indexscan, we would invoke
> restriction selectivity for &&, with varRelid referencing table1.
> Given this call you are supposed to treat table2.geom as a constant of
> uncertain value, so the thing is semantically sensible as a restriction
> clause for table1 (whether you can produce a really good estimate is
> another question :-().
> 
> Similarly, we'd consider the reverse plan with table1 as outer, and
> that would give rise to another restriction selectivity check with
> varRelid = table2.
> 
> >> (2) Is JOIN selectivity a fraction of table2 X table1
> >> records ?
> 
> Yes.  Similarly restriction selectivity is a fraction of records in the
> table under consideration.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: join selectivity

From

"Mark Cave-Ayland"

Date:

16 December 2004, 17:29:41

Hi Tom,

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 13 December 2004 17:16
> To: Mark Cave-Ayland
> Cc: strk@refractions.net; pgsql-hackers@postgresql.org;
> postgis-devel@postgis.refractions.net
> Subject: Re: [HACKERS] join selectivity
>
>
> "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> > For a query like this:
> >
> > SELECT id FROM table1, table2
> > WHERE table1.geom && table2.geom;
> >
> > RESTRICT selectivity is invoked twice and
> > JOIN selectivity is invoked once.
>
> Hm, are you testing in a context where both tables have
> indexes that are relevant to the && operator?
>
> The estimated join result size is computed from the join
> selectivity estimate for the && operator.  I was about to say
> that restriction selectivity wouldn't be used at all, but on
> second thought I believe that it would be invoked while
> considering nestloop with inner indexscan plans.  That is,
> we'd consider
>
>     NestLoop
>         Seq Scan on table2
>         Indexscan on table1
>             IndexCond: table1.geom && outer.geom
>
> and to determine the estimated cost of each indexscan, we
> would invoke restriction selectivity for &&, with varRelid
> referencing table1. Given this call you are supposed to treat
> table2.geom as a constant of uncertain value, so the thing is
> semantically sensible as a restriction clause for table1
> (whether you can produce a really good estimate is another
> question :-().
>
> Similarly, we'd consider the reverse plan with table1 as
> outer, and that would give rise to another restriction
> selectivity check with varRelid = table2.

Just to clarify, here are the explain results from strk's query:


strk=# explain analyze select * from test1, test2 where test1.geom &&
test2.geom;
NOTICE:  LWGEOM_gist_joinsel called (returning 0.000005)                                                 QUERY PLAN

----------------------------------------------------------------------------
----------------------------------Nested Loop  (cost=3.27..105.84 rows=1 width=64) (actual time=0.217..39.305
rows=2700 loops=1)  Join Filter: ("inner".geom && "outer".geom)  ->  Seq Scan on test2  (cost=0.00..28.32 rows=132
width=32)(actual 
time=0.081..1.111 rows=108 loops=1)  ->  Materialize  (cost=3.27..3.52 rows=25 width=32) (actual
time=0.001..0.011 rows=25 loops=108)        ->  Seq Scan on test1  (cost=0.00..3.25 rows=25 width=32) (actual
time=0.043..0.129 rows=25 loops=1)  Total runtime: 40.471 ms (6 rows)


.... so with no indices the JOIN function is called once, RESTRICT never. I
can understand this :)


strk=# create index test2_gist on test2 using gist (geom gist_geometry_ops);
CREATE INDEX
strk=# explain analyze select * from test1, test2 where test1.geom &&
test2.geom;
NOTICE:  LWGEOM_gist_joinsel called (returning 0.000005)
NOTICE:  LWGEOM_gist_sel called
NOTICE:   no constant arguments - returning default selectivity
NOTICE:  LWGEOM_gist_sel called
NOTICE:   no constant arguments - returning default selectivity                                                 QUERY
PLAN

----------------------------------------------------------------------------
----------------------------------Nested Loop  (cost=3.27..92.11 rows=1 width=64) (actual time=0.046..39.219
rows=2700 loops=1)  Join Filter: ("inner".geom && "outer".geom)  ->  Seq Scan on test2  (cost=0.00..28.08 rows=108
width=32)(actual 
time=0.009..0.198 rows=108 loops=1)  ->  Materialize  (cost=3.27..3.52 rows=25 width=32) (actual
time=0.000..0.013 rows=25 loops=108)        ->  Seq Scan on test1  (cost=0.00..3.25 rows=25 width=32) (actual
time=0.002..0.052 rows=25 loops=1)  Total runtime: 40.307 ms (6 rows)


...with one index RESTRICT is called twice.....


strk=# create index test1_gist on test1 using gist (geom gist_geometry_ops);
CREATE INDEX
strk=# explain analyze select * from test1, test2 where test1.geom &&
test2.geom;
NOTICE:  LWGEOM_gist_joinsel called (returning 0.000005)
NOTICE:  LWGEOM_gist_sel called
NOTICE:   no constant arguments - returning default selectivity
NOTICE:  LWGEOM_gist_sel called
NOTICE:   no constant arguments - returning default selectivity
NOTICE:  LWGEOM_gist_sel called
NOTICE:   no constant arguments - returning default selectivity
NOTICE:  LWGEOM_gist_sel called
NOTICE:   no constant arguments - returning default selectivity                                                 QUERY
PLAN

----------------------------------------------------------------------------
----------------------------------Nested Loop  (cost=3.27..92.11 rows=1 width=64) (actual time=0.052..38.867
rows=2700 loops=1)  Join Filter: ("inner".geom && "outer".geom)  ->  Seq Scan on test2  (cost=0.00..28.08 rows=108
width=32)(actual 
time=0.012..0.181 rows=108 loops=1)  ->  Materialize  (cost=3.27..3.52 rows=25 width=32) (actual
time=0.000..0.010 rows=25 loops=108)        ->  Seq Scan on test1  (cost=0.00..3.25 rows=25 width=32) (actual
time=0.002..0.032 rows=25 loops=1)  Total runtime: 40.027 ms (6 rows)


...and with two indices RESTRICT is called four times. The part I find
confusing is why with one index that RESTRICT is called twice. Surely for a
nested loop plan of the form you gave before which was:
    NestLoop        Seq Scan on table2        Indexscan on table1            IndexCond: table1.geom && outer.geom

Then if we just have an index on table1.geom this is the only plan that can
be considered - surely we cannot begin to consider the reverse plan because
there is no index on table2.geom. Based upon what you have suggested, I
would have expected that with one index RESTRICT would be called once, and
with two indices RESTRICT would be called twice.

I was also thinking whether calling RESTRICT when comparing with an unknown
value is worth doing at all, however I did think that perhaps if you are
using a cast to perform an operation on two datatypes, then you may be able
to imply something from the index, such as its physical size, and hint that
the planner should use a particular index in preference for the other. Would
it be correct to assume that if returning the same value for RESTRICT for
both means that the planner will choose one at random?


Many thanks,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk

Re: join selectivity

From

Tom Lane

Date:

16 December 2004, 18:54:51

"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> ...and with two indices RESTRICT is called four times. The part I find
> confusing is why with one index that RESTRICT is called twice.

[ shrug... ]  clause_selectivity doesn't try to cache the result.

> I was also thinking whether calling RESTRICT when comparing with an unknown
> value is worth doing at all, however I did think that perhaps if you are
> using a cast to perform an operation on two datatypes, then you may be able
> to imply something from the index, such as its physical size, and hint that
> the planner should use a particular index in preference for the other.

That would be inappropriate; the index size is factored in elsewhere
(gistcostestimate() to be specific).  Restriction selectivity shouldn't
directly consider the existence of indexes at all.

> Would it be correct to assume that if returning the same value for
> RESTRICT for both means that the planner will choose one at random?

If the tables/indexes are exactly the same size then you'd get the same
cost and the choice would be effectively random.
        regards, tom lane

Re: join selectivity

From

"Mark Cave-Ayland"

Date:

16 December 2004, 20:44:08

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 16 December 2004 15:55
> To: Mark Cave-Ayland
> Cc: strk@refractions.net; pgsql-hackers@postgresql.org;
> postgis-devel@postgis.refractions.net
> Subject: Re: [HACKERS] join selectivity
>
>
> "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> > ...and with two indices RESTRICT is called four times. The part I
> > find
> > confusing is why with one index that RESTRICT is called twice.
>
> [ shrug... ]  clause_selectivity doesn't try to cache the result.

Hi Tom,

OK I think I've misunderstood something more fundamental than that; I
understood from what you said that the RESTRICT clause is used to evaluate
the cost of table1.geom && table2.geom against table2.geom && table1.geom
(i.e. it is used to help decide which one should be seq scanned and which
should be index scanned in a nested loop node). So is the trick here for a
commutative operator to simply return the same value for both cases, as
other factors such as index size costs are considered elsewhere?

My final question would be how would can we detect the difference between
RESTRICT being called in this manner (as part of <column> <op> <column> with
an unknown constant) as opposed to <column> <op> <constant> with a known
constant?

Many thanks,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk

> > I was also thinking whether calling RESTRICT when comparing with an
> > unknown value is worth doing at all, however I did think
> that perhaps
> > if you are using a cast to perform an operation on two
> datatypes, then
> > you may be able to imply something from the index, such as its
> > physical size, and hint that the planner should use a
> particular index
> > in preference for the other.
>
> That would be inappropriate; the index size is factored in elsewhere
> (gistcostestimate() to be specific).  Restriction selectivity
> shouldn't directly consider the existence of indexes at all.
>
> > Would it be correct to assume that if returning the same value for
> > RESTRICT for both means that the planner will choose one at random?
>
> If the tables/indexes are exactly the same size then you'd
> get the same cost and the choice would be effectively random.
>
>             regards, tom lane
>

Re: join selectivity

From

Tom Lane

Date:

16 December 2004, 20:56:31

"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> OK I think I've misunderstood something more fundamental than that; I
> understood from what you said that the RESTRICT clause is used to evaluate
> the cost of table1.geom && table2.geom against table2.geom && table1.geom
> (i.e. it is used to help decide which one should be seq scanned and which
> should be index scanned in a nested loop node). So is the trick here for a
> commutative operator to simply return the same value for both cases, as
> other factors such as index size costs are considered elsewhere?

If the operator is commutative then the result should be too.  Really
you should not be thinking about costs at all when coding a selectivity
estimator: its charter is to estimate how many rows will match the
condition, not to estimate costs per se.

Note however that these aren't really the "same case", as you'd be
referencing two different columns with presumably different statistics.

> My final question would be how would can we detect the difference between
> RESTRICT being called in this manner (as part of <column> <op> <column> with
> an unknown constant) as opposed to <column> <op> <constant> with a known
> constant?

You should probably read the existing selectivity estimators in
utils/adt/selfuncs.c.  There's a fair amount of infrastructure code in
that file that you could borrow.  (It's not currently exported because
it tends to change from version to version, but maybe we could think
about making some of the routines global.)
        regards, tom lane

Re: join selectivity

From

"Mark Cave-Ayland"

Date:

16 December 2004, 21:50:17

Hi Tom,

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 16 December 2004 17:56
> To: Mark Cave-Ayland
> Cc: strk@refractions.net; pgsql-hackers@postgresql.org;
> postgis-devel@postgis.refractions.net
> Subject: Re: [HACKERS] join selectivity
>
>
> "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> > OK I think I've misunderstood something more fundamental
> than that; I
> > understood from what you said that the RESTRICT clause is used to
> > evaluate the cost of table1.geom && table2.geom against
> table2.geom &&
> > table1.geom (i.e. it is used to help decide which one should be seq
> > scanned and which should be index scanned in a nested loop
> node). So
> > is the trick here for a commutative operator to simply
> return the same
> > value for both cases, as other factors such as index size costs are
> > considered elsewhere?
>
> If the operator is commutative then the result should be too.
>  Really you should not be thinking about costs at all when
> coding a selectivity
> estimator: its charter is to estimate how many rows will
> match the condition, not to estimate costs per se.
>
> Note however that these aren't really the "same case", as
> you'd be referencing two different columns with presumably
> different statistics.

Well at the moment PostGIS has a RESTRICT function that takes an expression
of the form <column> <op> <constant> where column is a column consisting of
geometries and constant is a bounding box. This is based upon histogram
statistics and works well.

The surprise came when writing the JOIN function and finding that the
RESTRICT clause was being called. Now I understand that this is part of the
nested loop and not the JOIN so that helps. But in the case of <column> <op>
<unknown constant>, if we're estimating the number of rows to return then
that becomes harder - I'm thinking pick a rectangle half the area of the
statistical rectangle for the column and return the number of rows within
that area.

> You should probably read the existing selectivity estimators
> in utils/adt/selfuncs.c.  There's a fair amount of
> infrastructure code in that file that you could borrow.
> (It's not currently exported because it tends to change from
> version to version, but maybe we could think about making
> some of the routines global.)

OK will try and find some inspiration within.


Many thanks,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk

Re: join selectivity

From

Tom Lane

Date:

16 December 2004, 21:56:36

"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> ... But in the case of <column> <op>
> <unknown constant>, if we're estimating the number of rows to return then
> that becomes harder

I didn't say it was easy ;-).  The existing selectivity functions can't
do better than a rough guess in such cases, and I don't expect you can
either.
        regards, tom lane

Re: join selectivity

From

Greg Stark

Date:

16 December 2004, 23:12:51

"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:

> Well at the moment PostGIS has a RESTRICT function that takes an expression
> of the form <column> <op> <constant> where column is a column consisting of
> geometries and constant is a bounding box. This is based upon histogram
> statistics and works well.

Are these functions that would be useful for GiST indexes in general? 

What's involved in pulling them into a system? I mean, for example, a database
using RTREE (or GiST I guess) boxes and the @ operator.

I didn't realize anyone really had any idea where to start with gathering
statistics or writing selectivity functions for geometric types. It's great
news to hear there's actually work in this area.

-- 
greg

Re: join selectivity

From

strk

Date:

23 December 2004, 12:17:36

On Thu, Dec 16, 2004 at 03:12:21PM -0500, Greg Stark wrote:
> 
> "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> 
> > Well at the moment PostGIS has a RESTRICT function that takes an expression
> > of the form <column> <op> <constant> where column is a column consisting of
> > geometries and constant is a bounding box. This is based upon histogram
> > statistics and works well.
> 
> Are these functions that would be useful for GiST indexes in general? 

They provide selectivity for an 'overlap' operator.
GiST is not involved in any way.
Basically it provides statistical gathering for box types columns
and it's analysys in estimating the number of boxes that would
overlap a constant box.

> What's involved in pulling them into a system? I mean, for example, a database
> using RTREE (or GiST I guess) boxes and the @ operator.

It uses BOX2D as a key, maybe if you provide a cast from your
type to BOX2D it could work... I'd like to hear about attempt
at this.

> I didn't realize anyone really had any idea where to start with gathering
> statistics or writing selectivity functions for geometric types. It's great
> news to hear there's actually work in this area.

Statistics in postgis have been available for a long time:

2002-10-12 00:52  dblasby
       * postgis_estimate.c: New file with original estimation methods.

--strk;

> 
> -- 
> greg
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Re: join selectivity

From

Tom Lane

Date:

23 December 2004, 18:01:44

strk@refractions.net writes:
> Doing some tests I've found out that the returned value from the
> JOINSEL is applied to REL1.rows X REL2.rows, but REL1 and REL2
> are not 'base' table, rather relations with a number of 
> rows once again estimated by other selectivity functions.

Right.  This amounts to assuming that the join conditions and the
restriction conditions are independent, which of course is bogus,
but we really don't have enough information to do better.
        regards, tom lane

Re: join selectivity

From

Tom Lane

Date:

23 December 2004, 18:13:22

strk@refractions.net writes:
> On Thu, Dec 23, 2004 at 10:01:33AM -0500, Tom Lane wrote:
>> Right.  This amounts to assuming that the join conditions and the
>> restriction conditions are independent, which of course is bogus,
>> but we really don't have enough information to do better.

> Doesn't JOINSEL have access to RESTRICTSEL output for REL1 and REL2 ?

You could probably compare the fields of the RelOptInfo structures,
but what are you going to do with it?  AFAICS you *should not* make
the join selectivity depend on that.
        regards, tom lane

Re: join selectivity

From

Tom Lane

Date:

23 December 2004, 18:49:00

strk@refractions.net writes:
> So it should NOT depend on full number of rows either, is this right ?

No, it's supposed to return a fraction.
        regards, tom lane

Re: join selectivity

From

strk@refractions.net

Date:

24 December 2004, 17:11:18

On Thu, Dec 16, 2004 at 01:56:29PM -0500, Tom Lane wrote:
> "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> > ... But in the case of <column> <op>
> > <unknown constant>, if we're estimating the number of rows to return then
> > that becomes harder
> 
> I didn't say it was easy ;-).  The existing selectivity functions can't
> do better than a rough guess in such cases, and I don't expect you can
> either.

Tom, correct me if I'm wrong.
Doing some tests I've found out that the returned value from the
JOINSEL is applied to REL1.rows X REL2.rows, but REL1 and REL2
are not 'base' table, rather relations with a number of 
rows once again estimated by other selectivity functions.

For example, if JOINSEL always returns 1.0, you get a different
'estimated' number of rows for a Nested Loop depending on the
presence of a condition filtering one of the tables.

Example:
test1 has 34 rowstest2 has 32 rows
a full join makes the estimate=1088 rows  ( 34*32  )a join with a filter on test2 makes estimate=34 ( 34*1 ? )

strk=# explain analyze select * from test1, test2 where test1.geom && test2.geom;
NOTICE:  LWGEOM_gist_joinsel called (returning 1.000000)                                                 QUERY PLAN
              
 
--------------------------------------------------------------------------------------------------------------Nested
Loop (cost=3.37..32.17 rows=1088 width=36) (actual time=0.193..70.691 rows=983 loops=1)  Join Filter: ("inner".geom &&
"outer".geom) ->  Seq Scan on test2  (cost=0.00..4.32 rows=32 width=4) (actual time=0.074..0.267 rows=32 loops=1)  ->
Materialize (cost=3.37..3.71 rows=34 width=32) (actual time=0.002..0.026 rows=34 loops=32)        ->  Seq Scan on test1
(cost=0.00..3.34 rows=34 width=32) (actual time=0.042..0.159 rows=34 loops=1)Total runtime: 71.426 ms
 
(6 rows)


trk=# explain analyze select * from test1, test2 where test1.geom && test2.geom and test2.id = 1;
NOTICE:  LWGEOM_gist_joinsel called (returning 1.000000)                                              QUERY PLAN
              
 
--------------------------------------------------------------------------------------------------------Nested Loop
(cost=0.00..8.17rows=34 width=44) (actual time=0.179..2.704 rows=17 loops=1)  Join Filter: ("inner".geom &&
"outer".geom) ->  Seq Scan on test2  (cost=0.00..4.40 rows=1 width=8) (actual time=0.078..0.208 rows=1 loops=1)
Filter:(id = 1)  ->  Seq Scan on test1  (cost=0.00..3.34 rows=34 width=36) (actual time=0.041..0.181 rows=34
loops=1)Totalruntime: 2.819 ms
 
(6 rows)

Now, is the number 1 what has been estimated bythe RESTRICT selectivity estimator forSERIAL = <constant> ?If it is,
doesour JOINSEL function have access to thisinformation ?
 

TIA
--strk;

Re: join selectivity

From

strk@refractions.net

Date:

24 December 2004, 17:11:22

On Thu, Dec 23, 2004 at 10:13:03AM -0500, Tom Lane wrote:
> strk@refractions.net writes:
> > On Thu, Dec 23, 2004 at 10:01:33AM -0500, Tom Lane wrote:
> >> Right.  This amounts to assuming that the join conditions and the
> >> restriction conditions are independent, which of course is bogus,
> >> but we really don't have enough information to do better.
> 
> > Doesn't JOINSEL have access to RESTRICTSEL output for REL1 and REL2 ?
> 
> You could probably compare the fields of the RelOptInfo structures,
> but what are you going to do with it?  AFAICS you *should not* make
> the join selectivity depend on that.

So it should NOT depend on full number of rows either, is this right ?

--strk;

> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
>                http://archives.postgresql.org

Re: join selectivity

From

strk@refractions.net

Date:

24 December 2004, 17:11:27

On Thu, Dec 23, 2004 at 10:01:33AM -0500, Tom Lane wrote:
> strk@refractions.net writes:
> > Doing some tests I've found out that the returned value from the
> > JOINSEL is applied to REL1.rows X REL2.rows, but REL1 and REL2
> > are not 'base' table, rather relations with a number of 
> > rows once again estimated by other selectivity functions.
> 
> Right.  This amounts to assuming that the join conditions and the
> restriction conditions are independent, which of course is bogus,
> but we really don't have enough information to do better.
> 
>             regards, tom lane

Doesn't JOINSEL have access to RESTRICTSEL output for REL1 and REL2 ?

--strk;