Thread: avg() with floating-point types

avg() with floating-point types

From
"George Pavlov"
Date:
I have city and postal_code tables linked by city_postal_code through a
city_id and postal_code_id. The postal_codes have latitude/longitude,
the cities don't. I want to set the city lat/long to the average of the
associated postal codes (abstract for a minute on whether that actually
makes sense from a geographical perspective), so I have a statement:

update city set latitude = city2.lat from (select c.city_id, avg(pc.latitude) as lat    from city c      left join
city_postal_codecpc using (city_id)      left join postal_code pc using (postal_code_id)    group by c.city_id) city2 
where city2.city_id = city.city_id

The datatype of both city.latitude and postal_code.latitude is
number(16,12).

This works, but I would like to understand why there is sometimes a
discrepancy between avg(pc.latitude) and what actually gets inserted
into the city table -- is it the usual floating-point discrepancy or is
there something I can do about it? E.g. after the above update:

select c.latitude, avg(pc.latitude), c.latitude-avg(pc.latitude) as diff from city c   left join city_postal_code cpc
using(city_id)     left join postal_code pc using (postal_code_id)   group by c.city_id,c.latitude having
avg(pc.latitude)!= c.latitude 
   latitude     |         avg         |        diff
-----------------+---------------------+---------------------36.709374333333 | 36.7093743333333333 |
-0.000000000000333341.078385733333| 41.0783857333333333 | -0.000000000000333331.576437888889 | 31.5764378888888889 |
0.000000000000111142.666669666667| 42.6666696666666667 |  0.000000000000333335.104581166667 | 35.1045811666666667 |
0.000000000000333331.263006142857| 31.2630061428571429 | -0.000000000000142938.805648772727 | 38.8056487727272727 |
-0.0000000000002727
...

An additional question -- is the UPDATE above written as cleanly as
possible (I am not very confident on my understanding of UPDATE-SET-FROM
syntax)?


Re: avg() with floating-point types

From
Tom Lane
Date:
"George Pavlov" <gpavlov@mynewplace.com> writes:
> The datatype of both city.latitude and postal_code.latitude is
> number(16,12).
> This works, but I would like to understand why there is sometimes a
> discrepancy between avg(pc.latitude) and what actually gets inserted
> into the city table -- is it the usual floating-point discrepancy or is
> there something I can do about it? E.g. after the above update:

You're forcing the result of the avg() calculation to be rounded to 12
digits when you store it into city.latitude.  Your example simply shows
that the avg() calculation is being done to more precision than that.
        regards, tom lane


Re: avg() with floating-point types

From
Michael Fuhr
Date:
On Sun, Jan 01, 2006 at 04:40:18PM -0800, George Pavlov wrote:
> The datatype of both city.latitude and postal_code.latitude is
> number(16,12).
> 
> This works, but I would like to understand why there is sometimes a
> discrepancy between avg(pc.latitude) and what actually gets inserted
> into the city table -- is it the usual floating-point discrepancy or is
> there something I can do about it? E.g. after the above update:

You've declared the numeric column as (16,12) so the inserted values
are rounded to the 12th decimal place.  Notice that that's where the
values start to differ:

>     latitude     |         avg         |        diff         
> -----------------+---------------------+---------------------
>  36.709374333333 | 36.7093743333333333 | -0.0000000000003333

Is such a difference significant to your application?  The distance
discrepancy is on the order of tens of nanometers, which seems
absurdly precise.  With lat/lon coordinates five places after the
decimal point is about meter-precision, which is probably more than
adequate for whatever you're doing.

Incidentally, if you're doing anything geospatial then you might
want to check out PostGIS:

http://www.postgis.org/

If you look around you should be able to find shapefiles with points
or polygons for cities and postal codes.

-- 
Michael Fuhr