Thread: min() and NaN
Hi, I have a table containing a double precision column. That column contains at least one judiciously placed NaN. I'd expect the aggregate function min() to return the minimum, valid numeric value. Instead, it seems to return the minimum value from the subset of rows following the 'NaN'. What's going here? What should I expect the aggregate function min() to return in this case? And why? Any help is appreciated, Mike testdb=> \d min_with_nan Table "public.min_with_nan" Column | Type | Modifiers --------+------------------+----------- col1 | double precision | testdb=> select * from min_with_nan ; col1 ------- 3.141 2.718 NaN 10 (4 rows) testdb=> select min(col1) from min_with_nan ; min ----- 10 (1 row) testdb=> select min(col1) from min_with_nan where col1 != 'NaN'; min ------- 2.718 (1 row)
"Michael S. Tibbetts" <mtibbetts@head-cfa.cfa.harvard.edu> writes: > I'd expect the aggregate function min() to return the minimum, valid > numeric value. Instead, it seems to return the minimum value from the > subset of rows following the 'NaN'. Not real surprising given than min() is implemented with float8smaller, which does this: result = ((arg1 > arg2) ? arg1 : arg2); In most C implementations, any comparison involving a NaN will return "false". So when we hit the NaN, we have arg1 = min so far, arg2 = NaN, comparison yields false, result is NaN. On the next row, we have arg1 = NaN, arg2 = next value, comparison yields false, result is next value; and away it goes. We could probably make it work the way you want with explicit tests for NaN in float8smaller, arranged to make sure that the result is not NaN unless both inputs are NaN. But I'm not entirely convinced that we should make it work like that. The other float8 comparison operators are designed to treat NaN as larger than every other float8 value (so that it has a well-defined position when sorting), and I'm inclined to think that float8smaller and float8larger probably should behave likewise. (That actually is the same as what you want for MIN(), but not for MAX() ...) Comments anyone? regards, tom lane
On Sun, 20 Jul 2003, Tom Lane wrote: > "Michael S. Tibbetts" <mtibbetts@head-cfa.cfa.harvard.edu> writes: > > I'd expect the aggregate function min() to return the minimum, valid > > numeric value. Instead, it seems to return the minimum value from the > > subset of rows following the 'NaN'. > > Not real surprising given than min() is implemented with float8smaller, > which does this: > > result = ((arg1 > arg2) ? arg1 : arg2); > > In most C implementations, any comparison involving a NaN will return > "false". So when we hit the NaN, we have arg1 = min so far, arg2 = NaN, > comparison yields false, result is NaN. On the next row, we have > arg1 = NaN, arg2 = next value, comparison yields false, result is next > value; and away it goes. > > We could probably make it work the way you want with explicit tests for > NaN in float8smaller, arranged to make sure that the result is not NaN > unless both inputs are NaN. But I'm not entirely convinced that we > should make it work like that. The other float8 comparison operators > are designed to treat NaN as larger than every other float8 value (so > that it has a well-defined position when sorting), and I'm inclined to > think that float8smaller and float8larger probably should behave > likewise. (That actually is the same as what you want for MIN(), but > not for MAX() ...) The spec seems to say that min/max should work the same way as the comparison operators by saying that it returns the maximum or minimum value as determined by the comparison rules of the comparison predicate section. That'd seem to be asking for the second version.
On Tue, 15 Jul 2003, Michael S. Tibbetts wrote: > Hi, > > I have a table containing a double precision column. That column > contains at least one judiciously placed NaN. > > I'd expect the aggregate function min() to return the minimum, valid > numeric value. Instead, it seems to return the minimum value from the > subset of rows following the 'NaN'. This appears to be a bug in the min (and presumably max) aggregate function for floats. AFAICT It should use the same logic as the < (or >) comparison operator for the type (SQL92 6.5 GR2b iii I believe), but it's just doing a < (or >) in C on the two arguments which isn't the same. Changing it to follow those rules would give 2.718 for min and it looks like NaN for max.
If a compare with NaN is always false, how about rewriting it as: result = ((arg1 < arg2) ? arg2 : arg1). Or better yet, swap arg1 and arg2 when calling float8smaller. Use flaost8smaller( current_min, value). JLL Tom Lane wrote: > > "Michael S. Tibbetts" <mtibbetts@head-cfa.cfa.harvard.edu> writes: > > I'd expect the aggregate function min() to return the minimum, valid > > numeric value. Instead, it seems to return the minimum value from the > > subset of rows following the 'NaN'. > > Not real surprising given than min() is implemented with float8smaller, > which does this: > > result = ((arg1 > arg2) ? arg1 : arg2); > > In most C implementations, any comparison involving a NaN will return > "false". So when we hit the NaN, we have arg1 = min so far, arg2 = NaN, > comparison yields false, result is NaN. On the next row, we have > arg1 = NaN, arg2 = next value, comparison yields false, result is next > value; and away it goes. > > We could probably make it work the way you want with explicit tests for > NaN in float8smaller, arranged to make sure that the result is not NaN > unless both inputs are NaN. But I'm not entirely convinced that we > should make it work like that. The other float8 comparison operators > are designed to treat NaN as larger than every other float8 value (so > that it has a well-defined position when sorting), and I'm inclined to > think that float8smaller and float8larger probably should behave > likewise. (That actually is the same as what you want for MIN(), but > not for MAX() ...) > > Comments anyone? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly
Jean-Luc Lachance <jllachan@nsd.ca> writes: > If a compare with NaN is always false, how about rewriting it as: > result = ((arg1 < arg2) ? arg2 : arg1). That just changes the failure mode. regards, tom lane
Is this a TODO? --------------------------------------------------------------------------- Tom Lane wrote: > Jean-Luc Lachance <jllachan@nsd.ca> writes: > > If a compare with NaN is always false, how about rewriting it as: > > result = ((arg1 < arg2) ? arg2 : arg1). > > That just changes the failure mode. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Is this a TODO? It'll only take ten minutes to make it a DONE, once we figure out what the behavior ought to be. So far I think both Stephan and I argued that MIN/MAX ought to treat NaN as larger than all ordinary values, for consistency with the comparison operators. That was not the behavior Michael wanted, but I don't see that we have much choice given the wording of the SQL spec. Does anyone want to argue against that definition? regards, tom lane
Well, my 2 cents is that though we consider NULL when ordering via ORDER BY, we ignore it in MAX because it really isn't a value, and NaN seems to be similar to NULL. When doing ORDER BY, we have to put the NULL value somewhere, so we put it at the end, but with aggregates, we aren't required to put the NULL somewhere, so we ignore it. Should that be the same for NaN? I just don't see how we can arbitrarly say it is greater/less than other values. --------------------------------------------------------------------------- Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Is this a TODO? > > It'll only take ten minutes to make it a DONE, once we figure out what > the behavior ought to be. So far I think both Stephan and I argued that > MIN/MAX ought to treat NaN as larger than all ordinary values, for > consistency with the comparison operators. That was not the behavior > Michael wanted, but I don't see that we have much choice given the > wording of the SQL spec. Does anyone want to argue against that > definition? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Tue, 22 Jul 2003, Bruce Momjian wrote: > Well, my 2 cents is that though we consider NULL when ordering via ORDER > BY, we ignore it in MAX because it really isn't a value, and NaN seems > to be similar to NULL. > > When doing ORDER BY, we have to put the NULL value somewhere, so we put > it at the end, but with aggregates, we aren't required to put the NULL > somewhere, so we ignore it. Should that be the same for NaN? I just > don't see how we can arbitrarly say it is greater/less than other > values. But we already do. When doing a less than/greater than comparison, 'NaN' is considered greater than normal values which is different from NULL which returns unknown for both.
Hey! here is a (stupid maybe) idea. Why not disallow 'NaN' for a float? JLL Stephan Szabo wrote: > > On Tue, 22 Jul 2003, Bruce Momjian wrote: > > > Well, my 2 cents is that though we consider NULL when ordering via ORDER > > BY, we ignore it in MAX because it really isn't a value, and NaN seems > > to be similar to NULL. > > > > When doing ORDER BY, we have to put the NULL value somewhere, so we put > > it at the end, but with aggregates, we aren't required to put the NULL > > somewhere, so we ignore it. Should that be the same for NaN? I just > > don't see how we can arbitrarly say it is greater/less than other > > values. > > But we already do. When doing a less than/greater than comparison, 'NaN' > is considered greater than normal values which is different from NULL > which returns unknown for both.
Stephan Szabo wrote: > > On Tue, 22 Jul 2003, Bruce Momjian wrote: > > > Well, my 2 cents is that though we consider NULL when ordering via ORDER > > BY, we ignore it in MAX because it really isn't a value, and NaN seems > > to be similar to NULL. > > > > When doing ORDER BY, we have to put the NULL value somewhere, so we put > > it at the end, but with aggregates, we aren't required to put the NULL > > somewhere, so we ignore it. Should that be the same for NaN? I just > > don't see how we can arbitrarly say it is greater/less than other > > values. > > But we already do. When doing a less than/greater than comparison, 'NaN' > is considered greater than normal values which is different from NULL > which returns unknown for both. But maybe that logic is the same as ORDER BY, where we have to give it some location in sorting order, while with aggregates we don't. I am not strong on this, but just point it out. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Well, my 2 cents is that though we consider NULL when ordering via ORDER > BY, we ignore it in MAX because it really isn't a value, and NaN seems > to be similar to NULL. Good idea, but I don't think we can get away with it. The spec says that MAX/MIN have to be consistent with the comparison operators (and therefore with ORDER BY): iii) If MAX or MIN is specified, then the result is respec- tively the maximum or minimum valuein TXA. These results are determined using the comparison rules specified in Subclause8.2, "<comparison predicate>". NULL can be special, because it acts specially in comparisons anyway. But NaN is just a value of the datatype. I'd be willing to go against the spec if I thought that having ignore-NaNs behavior was sufficiently important, but I don't think it's important enough to disregard the spec... regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Well, my 2 cents is that though we consider NULL when ordering via ORDER > > BY, we ignore it in MAX because it really isn't a value, and NaN seems > > to be similar to NULL. > > Good idea, but I don't think we can get away with it. The spec says > that MAX/MIN have to be consistent with the comparison operators (and > therefore with ORDER BY): > > iii) If MAX or MIN is specified, then the result is respec- > tively the maximum or minimum value in TXA. These results > are determined using the comparison rules specified in > Subclause 8.2, "<comparison predicate>". > > NULL can be special, because it acts specially in comparisons anyway. > But NaN is just a value of the datatype. > > I'd be willing to go against the spec if I thought that having > ignore-NaNs behavior was sufficiently important, but I don't think it's > important enough to disregard the spec... Yep. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Treating NaN's as larger(or smaller) than all ordinary values seems a fine way to go. It avoids the situation where you request MIN and get an ordinary value which is greater than the minimum ordinary value in the table. If MIN(or MAX given the ordering you're suggesting) returns NaN, the user would stand better odds of figuring out that something about the query needs to be changed. Returning an plausible, though possibly incorrect, ordinary value from MIN or MAX if there are NaN's in the column can lead users to make some unfortunate mistakes(voice of experience?). Thanks for your help, Mike > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Is this a TODO? > > It'll only take ten minutes to make it a DONE, once we figure out what > the behavior ought to be. So far I think both Stephan and I argued that > MIN/MAX ought to treat NaN as larger than all ordinary values, for > consistency with the comparison operators. That was not the behavior > Michael wanted, but I don't see that we have much choice given the > wording of the SQL spec. Does anyone want to argue against that > definition? > > regards, tom lane
Greg Stark <gsstark@mit.edu> writes: > Does postgres intend to support all the different types of NaN? Does you > intend to have +Inf and -Inf and underflow detection and all the other goodies > you actually need to make it useful? We have some of that; it needs work, and it's always going to be dependent on the platform having proper IEEE support, but that's no excuse to throw it away. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > NULL can be special, because it acts specially in comparisons anyway. > But NaN is just a value of the datatype. Does postgres intend to support all the different types of NaN? Does you intend to have +Inf and -Inf and underflow detection and all the other goodies you actually need to make it useful? If not it seems more useful to just use the handy unknown-value thing SQL already has and turn NaN into a NULL. -- greg