Hello,
On 11/26/23 12:22, Tom Lane wrote:
> Yes, this regression test is entirely unacceptable; the numbers will
> not be stable enough. Even aside from the different-settings issue,
> you can't rely on ANALYZE deriving exactly the same stats every time.
> Usually what we try to do is devise a query where the plan shape
> changes because of the better estimate.
Here is a patch with an improved test. With the old hardcoded estimate of 10 we get a Merge Join,
but now that the planner can see there are only ~4 elements per array, we get a Nested Loop.
It was actually hard to get a new plan, since all our regress tables' arrays average around 5
elements, which isn't so far from 10. Adding a table with 1- or 2-element arrays would be more
dramatic. So I resorted to tuning the query with `WHERE seqno <= 50`. Hopefully that's not cheating
too much.
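For reference, the query is shaped roughly like this (a sketch only, spelling the array dependency
via unnest; the exact query and expected plan are in the attached patch):

    -- Hypothetical sketch, not the literal regression-test query: the row
    -- estimate for unnest(t1.i) comes from estimate_array_length. The old
    -- hardcoded estimate of 10 elements per array made a Merge Join look
    -- cheapest; with the statistics-based estimate of ~4 elements, the
    -- planner switches to a Nested Loop.
    EXPLAIN (COSTS OFF)
    SELECT *
    FROM array_op_test t1
         CROSS JOIN LATERAL unnest(t1.i) AS u(elt)
         JOIN array_op_test t2 ON t2.seqno = u.elt
    WHERE t1.seqno <= 50;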
I thought about also adding a test where the old code *underestimates*, but then I'd have to add a
new table with big arrays. If you think it's worth it, let me know.
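For concreteness, that setup might look something like this (purely hypothetical, not part of the
patch; `array_big_test` is a made-up name):

    -- Hypothetical table whose arrays have 100 elements each, so the old
    -- hardcoded estimate of 10 would be a 10x underestimate rather than
    -- an overestimate.
    CREATE TABLE array_big_test AS
      SELECT g AS seqno, array_agg(i) AS arr
      FROM generate_series(1, 50) g,
           generate_series(1, 100) i
      GROUP BY g;
    ANALYZE array_big_test;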
On 11/26/23 23:05, jian he wrote:
> I found using table array_op_test test more convincing.
True, arrtest is pretty small. The new test uses array_op_test instead.
On 11/29/23 20:35, jian he wrote:
> I did a minor change. change estimate_array_length return type to double
I'm not sure I want to change estimate_array_length from returning ints to returning doubles, since
it's called in many places. I can see how that might yield still more accurate plans, so maybe it's
worth it? It feels to me like it ought to be a separate patch, but if others want me to include it
here, please let me know.
I did add clamp_row_est, since it's essentially free and may give some extra safety.
Rebased onto d43bd090a8.
Yours,
--
Paul ~{:-)
pj@illuminatedcomputing.com