Re: assertion failure at cost_memoize_rescan() - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: assertion failure at cost_memoize_rescan()
Date
Msg-id cce7141f-b0c8-4c3b-a16c-46aad7bf5cdd@enterprisedb.com
Whole thread Raw
In response to assertion failure at cost_memoize_rescan()  (Kohei KaiGai <kaigai@heterodb.com>)
Responses Re: assertion failure at cost_memoize_rescan()
List pgsql-hackers
On 6/14/24 14:54, Kohei KaiGai wrote:
> ...
>
> I tracked the behavior of estimate_num_groups() using gdb line-by-line to
> observe how 'input_rows' is changed
> and how it affects the result value.
> According to the call trace, the problematic estimate_num_groups()
> invocation is called with "input_rows=3251872.916666667",
> then it was rounded up to 3251873 by the clamp_row_est(). Eventually, its
> result value was calculated larger than the upper
> limit, so the return value was suppressed by 3251873, but it is a tiny bit
> larger than the input value!
> 
> Back to the cost_memoize_rescan().
> The hit_ratio is calculated as follows:
> 
>     hit_ratio = ((calls - ndistinct) / calls) *
>         (est_cache_entries / Max(ndistinct, est_cache_entries));
> 
> The "calls" is the "input_rows" above, and "ndistinct"  is the return value
> of the estimate_num_groups().
> What happen if "ndistinct" is a tiny bit larger than "calls"?
> In the results, the "hit_ratio" is calculated as a very small negative
> value, then it was terminated by Assert().
> 
> How do we fix the logic? Please some ideas.
> 

Interesting. Seems like a bug due to the two places clamping the values
inconsistently. It probably does not matter in other contexts because we
don't subtract the values like this, but here it triggers the assert.

I guess the simplest fix would be to clamp "calls" the same way before
calculating hit_ratio. That makes the ">= 0" part of the assert somewhat
pointless, though.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Inval reliability, especially for inplace updates
Next
From: Andres Freund
Date:
Subject: Re: Using LibPq in TAP tests via FFI