Re: Hypothetical suggestions for planner, indexing improvement - Mailing list pgsql-hackers

From Zeugswetter Andreas SB SD
Subject Re: Hypothetical suggestions for planner, indexing improvement
Date
Msg-id 46C15C39FEB2C44BA555E356FBCD6FA4961F99@m0114.s-mxs.net
Whole thread Raw
In response to Hypothetical suggestions for planner, indexing improvement  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Hypothetical suggestions for planner, indexing improvement
List pgsql-hackers
> > The reason that I mention EXISTS is because that's where the lack of
> > cross-column corellation is most dramatic; the planner seems to estimate a
> > flat 50% for EXISTS clauses regardless of the content.
>
> No "seems to" about that one: see
> src/backend/optimizer/path/clausesel.c
>
>     else if (is_subplan(clause))
>     {
>         /*
>          * Just for the moment! FIX ME! - vadim 02/04/98
>          */
>         s1 = (Selectivity) 0.5;
>     }

I think the main issue cannot be the correlation in this case.
In this FK PK case an events row exists for each row in the subselect,
so the correlation must be above 1, how much above is irrelevant for the
selectivity estimate (it is only relevant in the below 1 cases).
The selectivity in this case can be estimated by estimating the number of rows
returned from the subselect where the (in the example missing) PK-FK join condition
is removed (here: select distinct event_id from event_day where event_day BETWEEN
'2003-04-08' AND '2003-05-18'). (selectivity = min (1, e. rows of subselect / e. rows of
main select))

The "event_day BETWEEN '2003-04-08' AND '2003-05-18'" is what really reduces the
result set here, and that is not used.

Andreas

PS: in the example the subselect join clause event_id=events.event_id is missing



pgsql-hackers by date:

Previous
From: "Dave Page"
Date:
Subject: Re: 7.4 features list
Next
From: Andreas Pflug
Date:
Subject: contribute pg_get_viewdef2 et al