On Tue, Mar 11, 2025 at 5:58 AM Ilia Evdokimov
<ilya.evdokimov@tantorlabs.com> wrote:
> In the stats_ext regression test, there is a function
> check_estimated_rows that returns actual rows as an integer. Currently,
> none of the test cases produce a non-zero fractional part in actual rows.
>
> The question is: should this function be modified to return a fractional
> number instead of an integer?
>
> Personally, I don’t see much value in doing so, because the purpose of
> the test is to compare the estimated row count with the actual number of
> rows. It is also unlikely that there will be test cases where loops > 1,
> and the presence of a fractional part does not change the essence of the
> test.
>
> What do you think?
I suppose the way this is currently coded, it will have the effect of
extracting the integer part of the row estimate. That doesn't really
seem like a bad idea to me, so I'm not sure it's worth trying to
adjust anything here.
One thing that I just noticed, though, is that we added two decimal
places to the actual row count, but not the estimated row count. So
now we get stuff like this:
-> Seq Scan on pgbench_branches b (cost=0.00..1.10 rows=10
width=364) (actual time=0.007..0.009 rows=10.00 loops=1)
But why isn't it just as valuable to have two decimal places for the
estimate? I theorize that the cases that are really a problem here are
those where the row count estimate is between 0 and 1 per row, and
rounding to an integer loses all precision.
--
Robert Haas
EDB: http://www.enterprisedb.com