David Rowley <dgrowleyml@gmail.com> writes:
> Because there's been quite a few of these, and this report is yet
> another one, I wonder if it's time to try and stamp these out at the
> source rather than where the row counts are being used.

I'm on board with trying to get rid of NaN rowcount estimates more
centrally. I do not think it is a good idea to try to wire in a
prohibition against zero rowcounts. That is actually the correct
thing in assorted scenarios --- one example recently under discussion
was ModifyTable without RETURNING, and another is where we can prove
that a restriction clause is constant-false. At some point I think
we are going to want to deal honestly with those cases instead of
sweeping them under the rug. So I'm disinclined to remove zero
defenses that we'll just have to put back someday.

I think converting Inf to DBL_MAX, in hopes of avoiding creation of
NaNs later, is fine. (Note that applying rint() to that is quite
useless --- in every floating-point system, values bigger than
2^number-of-mantissa-bits are certainly integral.)
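
To see why: at and above 2^52 the spacing between adjacent doubles is
already 1.0 or more, so every representable value up there is an exact
integer.  A trivial standalone check of that, not PostgreSQL code:

#include <float.h>
#include <math.h>
#include <stdio.h>

int
main(void)
{
    double      big = ldexp(1.0, 53);       /* 2^53 */

    /* these values are exact integers already, so rint() is a no-op */
    printf("%d\n", rint(big) == big);               /* prints 1 */
    printf("%d\n", rint(DBL_MAX) == DBL_MAX);       /* prints 1 */
    printf("%g\n", fmod(DBL_MAX, 1.0));             /* prints 0 */
    return 0;
}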

I'm not sure why you propose to map NaN to one. Wouldn't mapping it
to Inf (and thence to DBL_MAX) make at least as much sense? Probably
more in fact. We know that unwarranted one-row estimates are absolute
death to our chances of picking a well-chosen plan.

> I toyed around with the attached patch, but I'm still not that excited
> about the clamping of infinite values to DBL_MAX. The test case I
> showed above with generate_series(1,379) still ends up with NaN cost
> estimates due to costing a sort with DBL_MAX rows. When I was writing
> the patch, I had it in my head that the costs per row would always be
> lower than 1.

Yeah, that is a good point.  Maybe instead of clamping to DBL_MAX,
we should clamp rowcounts to something that provides some headroom
for multiplication by per-row costs. A max rowcount of say 1e100
should serve fine, while still being comfortably more than any
non-insane estimate.
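
Concretely (a standalone illustration, using a simplified stand-in for
cost_sort's comparison-cost term rather than the real code):

#include <float.h>
#include <math.h>
#include <stdio.h>

int
main(void)
{
    double      cpu_operator_cost = 0.0025;     /* the default setting */
    double      rows = DBL_MAX;
    double      cost;

    /* roughly 2 * cpu_operator_cost * N * log2(N), as for a sort */
    cost = 2.0 * cpu_operator_cost * rows * (log(rows) / log(2.0));
    printf("%g\n", cost);           /* inf: the multiply overflowed */
    printf("%g\n", cost - cost);    /* nan: Inf minus Inf, and it spreads */

    rows = 1e100;
    cost = 2.0 * cpu_operator_cost * rows * (log(rows) / log(2.0));
    printf("%g\n", cost);           /* ~1.7e+100, far below DBL_MAX */
    return 0;
}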

So now I'm imagining something like

#define MAXIMUM_ROWCOUNT 1e100

clamp_row_est(double nrows)
{
    /* Get rid of NaN, Inf, and impossibly large row counts */
    if (isnan(nrows) || nrows >= MAXIMUM_ROWCOUNT)
        nrows = MAXIMUM_ROWCOUNT;
    else
        ... existing logic ...
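
Spelled out a bit more -- with the "existing logic" branch filled in
along the lines of what clamp_row_est() does today, and still just a
sketch rather than a finished patch:

#define MAXIMUM_ROWCOUNT 1e100

double
clamp_row_est(double nrows)
{
    /* Get rid of NaN, Inf, and impossibly large row counts */
    if (isnan(nrows) || nrows >= MAXIMUM_ROWCOUNT)
        nrows = MAXIMUM_ROWCOUNT;
    else if (nrows <= 1.0)
    {
        /*
         * Force the estimate to be at least one row, to make explain
         * output look better and to avoid divide-by-zero when
         * interpolating costs.
         */
        nrows = 1.0;
    }
    else
    {
        /* Make it an integer; a no-op for very large values, per above */
        nrows = rint(nrows);
    }

    return nrows;
}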

Perhaps we should also have some sort of clamp for path cost
estimates, at least to prevent them from being NaNs, which is
going to confuse add_path terribly.
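
Something along these lines, perhaps -- clamp_cost_est() is a made-up
name here, and exactly where the costing code would apply it is left
open:

/* Hypothetical cap; only needs to stay well clear of double overflow */
#define MAXIMUM_COST 1e200

static double
clamp_cost_est(double cost)
{
    /* a NaN or runaway cost would make add_path's comparisons nonsense */
    if (isnan(cost) || cost >= MAXIMUM_COST)
        cost = MAXIMUM_COST;
    return cost;
}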

			regards, tom lane