On Mon, 28 Sep 2020 at 13:22, David Rowley <dgrowleyml@gmail.com> wrote:
> I'm more leaning towards the fact that the planner picked the seqscan
> path in the first place. Unfotunately I don't have any great ideas on
> how to fix without fudging the costs a bit. Maybe we can make
> var_eq_non_const() divide down the selectivity by:
>
> ndistinct = Max(get_variable_numdistinct(vardata, &isdefault),
> DEFAULT_NUM_DISTINCT);
>
> instead of :
>
> ndistinct = get_variable_numdistinct(vardata, &isdefault);
>
> But that's going to make the row estimate in this case 50 rows (10000
> / 200). With a subplan that the stats say can only either return 10000
> or 0 rows. Is that too weird?
I figured it was easier to have a discussion around a patch, so, for
discussion purposes, I wrote one and attached it.
The patch is along the lines of what I mentioned above, so aims to be
a more generic fix for this problem rather than something that only
aims to change EXISTS type subqueries.
It affects both of the examples that I mentioned in my previous email.
So we now get an index scan in the following case;
postgres=# set plan_cache_mode = force_generic_plan;
SET
postgres=# prepare q1(int) as select b from t where b = $1 limit 1;
PREPARE
postgres=# explain (analyze, costs off, timing off) execute q1(1);
QUERY PLAN
------------------------------------------------------------------
Limit (actual rows=1 loops=1)
-> Index Only Scan using t_b_idx on t (actual rows=1 loops=1)
Index Cond: (b = $1)
Heap Fetches: 0
Planning Time: 0.166 ms
Execution Time: 1.772 ms
(6 rows)
and we get a hashed subplan in the other case I mentioned. (Which does
win out over the Index Scan subplan in fix_alternative_subplan())
While it is great to fix both those cases, this patch could have
repercussions later in planning. The selectivity estimate for these
parameter equality scans is now going to be at most 1.0 / 200.0, where
before it could have been 1.0 given an ndistinct estimate of 1.
There's probably more smarts that could be added, like checking if the
parameter is from a Var and if there's a foreign key to verify that
we'll never pass in a parameter value that's not in the table. I'm not
sure how hard that will be exactly. I see we do at least pass down the
PlannerInfo into the operator's oprcode function, which is a good
start, but I suspect it'll still be tricky to do this efficiently.
Any opinions on this?
David