On 10/23/24 15:05, Robert Haas wrote:
> On Sat, Oct 19, 2024 at 6:00 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> Generally, a hash value doesn't 100% guarantee the uniqueness of a node
>> identification. Also, RelOptInfo corresponds to a subtree in the final
>> plan, and sometimes, it takes work to find which node in the partially
>> executed plan corresponds to this specific estimation on row number
>> during selectivity estimation. Remember parameterised paths - you should
>> attach some signature for each path. So, it is not fully strict method.
>> If you are interested, I can perhaps explain the method a little bit
>> more at some meetup.
>
> Yeah, I agree that this is not the best method. While it's true that
> you could get a false match in case of a hash value collision, IMHO
> the bigger problem is that it seems like an expensive way of
> determining something that we really should know already. If the user
> types the same query, mentioning the same relations, in the same
> order, with the same constructs around them, it's hard to believe that
> hashing is the cheapest way of matching up the old and new ones. I'm
> not sure exactly what we should do instead, but it feels like we more
> or less have this information during parsing and then we lose track of
> it as the query goes through the rewrite and planning phases.
A parse tree may be implemented by multiple execution plans. Even
clauses can be transformed during optimisation (remember OR -> ANY).
Also, the cardinality of a join in the middle of the tree depends on its
inner and outer subtrees. Because of that, computing a hash over a
RelOptInfo's relids and restrictions, combining it with the hashes of
its child RelOptInfos, and carrying the result through all later stages
up to the end of execution is the most stable approach I know.
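
To make the idea concrete, here is a rough sketch in Python rather than
PostgreSQL C. RelNode and signature() are hypothetical names used only
for illustration; restrictions are plain strings standing in for
normalised clauses, and hashing children in an order-insensitive way is
my assumption, not something the planner does today.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class RelNode:
    """Hypothetical stand-in for a RelOptInfo, not the real struct."""
    relids: FrozenSet[int]               # base relation indexes covered
    restrictions: Tuple[str, ...] = ()   # normalised clause texts (assumed)
    children: Tuple["RelNode", ...] = () # subtrees joined to form this rel


def signature(node: RelNode) -> str:
    """Hash relids + restrictions + child signatures, bottom-up."""
    h = hashlib.sha256()
    # Sort so that set/tuple ordering does not affect the result.
    h.update(repr(sorted(node.relids)).encode())
    h.update(repr(sorted(node.restrictions)).encode())
    # Child signatures are sorted too: swapping the two join inputs
    # yields the same signature (an assumption of this sketch).
    for child_sig in sorted(signature(c) for c in node.children):
        h.update(child_sig.encode())
    return h.hexdigest()


# Two joins over the same relids that differ only in a child restriction.
a1 = RelNode(frozenset({1}), ("a.x = 1",))
a2 = RelNode(frozenset({1}), ("a.x = 2",))
b = RelNode(frozenset({2}))
join1 = RelNode(frozenset({1, 2}), ("a.id = b.id",), (a1, b))
join2 = RelNode(frozenset({1, 2}), ("a.id = b.id",), (a2, b))
```

Here signature(join1) and signature(join2) differ although both joins
cover the same relids and carry the same join clause, because the
restriction in the child subtree differs. A collision is still possible
in principle, so this remains an identification heuristic, not a
guarantee.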
--
regards, Andrei Lepikhov