On 11/17/21 16:39, Xiaozhe Yao wrote:
> Hi Tom,
>
> Thanks for your feedback. I completely agree with you that a
> higher-level hook is better suited for this case. I have adjusted the
> PoC patch to this email.
>
> Now it is located in the clauselist_selectivity_ext function, where we
> first check if the hook is defined. If so, we let the hook estimate the
> selectivity and return the result. With this one, I can also develop
> extensions to better estimate the selectivity.
>
I think clauselist_selectivity is the right level, because this is
pretty similar to what extended statistics are doing. I'm not sure if
the hook should be called in clauselist_selectivity_ext or in the plain
clauselist_selectivity. But it should be in clauselist_selectivity_or
too, probably.

The way the hook is used seems pretty inconvenient, though. I mean, if
you do this

    if (clauselist_selectivity_hook)
        return clauselist_selectivity_hook(...);

then what will happen when the ML model has no information applicable to
a query? This is called for all relations, all conditions, etc. and
you've short-circuited all the regular code, so the hook will have to
copy all of that. Seems pretty silly and fragile.

IMO the right approach is what statext_clauselist_selectivity is doing,
i.e. estimate clauses, mark them as estimated in a bitmap, and let the
rest of the existing code take care of the remaining clauses. So more
something like

    if (clauselist_selectivity_hook)
        s1 *= clauselist_selectivity_hook(..., &estimatedclauses);
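To make the difference concrete, here is a rough standalone sketch of the
"estimate what you can, mark it, leave the rest" pattern. It is plain C and
does not use the real planner API: DemoClause, demo_clauselist_hook,
demo_clauselist_selectivity and demo_hook are all made-up names, the bitmap is
a simple uint64_t rather than a Bitmapset, and the per-clause default_sel
field only stands in for whatever the existing code would compute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef double Selectivity;

/* Hypothetical stand-in for a restriction clause; not a PostgreSQL type. */
typedef struct DemoClause
{
    int         id;             /* lets the hook recognize the clause */
    Selectivity default_sel;    /* what the built-in code would estimate */
} DemoClause;

/*
 * Hypothetical hook signature. The hook returns the combined selectivity of
 * the clauses it can handle and sets their bits in *estimated. If it knows
 * nothing, it returns 1.0 and leaves *estimated untouched.
 */
typedef Selectivity (*demo_clauselist_hook_type) (const DemoClause *clauses,
                                                  size_t nclauses,
                                                  uint64_t *estimated);

demo_clauselist_hook_type demo_clauselist_hook = NULL;

Selectivity
demo_clauselist_selectivity(const DemoClause *clauses, size_t nclauses)
{
    Selectivity s1 = 1.0;
    uint64_t    estimated = 0;  /* bit i set => clause i already estimated */

    assert(nclauses <= 64);     /* limit of this toy bitmap */

    /* Let the hook estimate whatever it can, without short-circuiting. */
    if (demo_clauselist_hook)
        s1 *= demo_clauselist_hook(clauses, nclauses, &estimated);

    /* The regular code still handles every clause the hook skipped. */
    for (size_t i = 0; i < nclauses; i++)
    {
        if (estimated & (UINT64_C(1) << i))
            continue;
        s1 *= clauses[i].default_sel;
    }

    return s1;
}

/* Example hook that only has information about the clause with id 7. */
Selectivity
demo_hook(const DemoClause *clauses, size_t nclauses, uint64_t *estimated)
{
    Selectivity s = 1.0;

    for (size_t i = 0; i < nclauses; i++)
    {
        if (clauses[i].id == 7)
        {
            s *= 0.25;
            *estimated |= UINT64_C(1) << i;
        }
    }

    return s;
}
```

With this shape, a hook that knows nothing about a query just returns 1.0 and
sets no bits, and the result is the same as if no hook were installed, so
the extension does not need to copy any of the regular estimation code.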
regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company