On 6/14/24 14:54, Kohei KaiGai wrote:
> ...
>
> I traced estimate_num_groups() line by line in gdb to observe how
> 'input_rows' changes and how it affects the result.
> According to the call trace, the problematic estimate_num_groups()
> invocation is called with "input_rows=3251872.916666667", which
> clamp_row_est() then rounds up to 3251873. The computed estimate came out
> larger than that upper limit, so the return value was capped at 3251873,
> which is a tiny bit larger than the original input value!
>
> Back to cost_memoize_rescan().
> The hit_ratio is calculated as follows:
>
> hit_ratio = ((calls - ndistinct) / calls) *
> (est_cache_entries / Max(ndistinct, est_cache_entries));
>
> Here "calls" is the "input_rows" above, and "ndistinct" is the return
> value of estimate_num_groups().
> What happens if "ndistinct" is a tiny bit larger than "calls"?
> As a result, "hit_ratio" is calculated as a very small negative value,
> and the Assert() then fails.
>
> How should we fix the logic? Any ideas would be welcome.
>
Interesting. Seems like a bug due to the two places clamping the values
inconsistently. It probably does not matter in other contexts because we
don't subtract the values like this, but here it triggers the assert.
I guess the simplest fix would be to clamp "calls" the same way before
calculating hit_ratio. That makes the ">= 0" part of the assert somewhat
pointless, though.

regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company