Thread: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Hi guys of the pgsql-hackers list.

I've received a bug report on the OLE DB list, which I suspect is actually a server bug. The correspondence so far is listed further on, but, in a nutshell, the user runs an OLE DB client on Windows (OLE DB uses the binary interface) against server version 8.1.9 on Windows, and all is fine. When the server is switched to 8.1.8 on Debian Etch on ARM9, the float8 type is not received properly by OLE DB. Since OLE DB doesn't really care what version the server is running, the chances of this being a server-side bug are really high. I don't know ARM9 well enough to comment on the floating point format there.

Julian Heeb wrote:
> Shachar Shemesh wrote:
>> Julian Heeb wrote:
>>> Hello
>>>
>>> Our accounting software can use the PostgreSQL OLE DB driver to access
>>> a PostgreSQL database. With the pg server installed on Windows,
>>> everything works fine.
>>>
>>> I moved the database to a PostgreSQL server on a Linux server, but
>>> now every floating point number gets wrongly interpreted by the
>>> accounting software, either by replacing it with a 0 or a very large
>>> number (e.g. xxxE+308). Only the floating point numbers are affected;
>>> integers and characters are correct. pgAdmin shows even the fp numbers
>>> correctly, so I guess it has something to do with the pgoledb driver.
>>>
>>> Can someone give me a hint how to solve the problem?
>>
>> It's hard to give a precise answer. Let's try a couple of avenues.
>>
>> First of all, what platform is the Linux server? Is that an Intel, or
>> something else?
>
> It is an ARM9 platform with Debian Etch (Linkstation Pro Fileserver with
> Freelink).
>
>> Also, what is the precise type of the floating point var on the server?
>> Can you give the SQL line that generated the table?
>
> The table has been generated by the following SQL line. The problem
> occurs at the double precision fields.

I have some bad news. This is the comment in the PostgreSQL source code. This seems to be a core problem at the server side of things:

> /* --------------------------------
>  * pq_sendfloat8 - append a float8 to a StringInfo buffer
>  *
>  * The point of this routine is to localize knowledge of the external binary
>  * representation of float8, which is a component of several datatypes.
>  *
>  * We currently assume that float8 should be byte-swapped in the same way
>  * as int8. This rule is not perfect but it gives us portability across
>  * most IEEE-float-using architectures.
>  * --------------------------------
>  */

Could it be that ARM9 does not follow the IEEE float standard? Can anyone from the "hackers" list give any insight into this? The function for the data type import on the client side seems to be in order (switch the byte order around, and assume it's a valid "double" C type).

Shachar
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Stefan Kaltenbrunner
Date:
Shachar Shemesh wrote:
> Hi guys of the pgsql-hackers list.
>
> I've received a bug report on the OLE DB list, which I suspect is
> actually a server bug. The correspondence so far is listed further on,
> but, in a nutshell, the user runs an OLE DB client on Windows (OLE DB
> uses the binary interface) against server version 8.1.9 on Windows, and
> all is fine. When the server is switched to 8.1.8 on Debian Etch on
> ARM9, the float8 type is not received properly by OLE DB.

If 8.1.8 is built based on the Debian packages it is probably compiled with --enable-integer-datetimes. Is the oledb client library able to cope with that?

Stefan
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Tom Lane
Date:
Shachar Shemesh <shachar@shemesh.biz> writes:
> I've received a bug report on the OLE DB list, which I suspect is
> actually a server bug. The correspondence so far is listed further on,
> but, in a nutshell, the user runs an OLE DB client on Windows (OLE DB
> uses the binary interface) against server version 8.1.9 on Windows, and
> all is fine. When the server is switched to 8.1.8 on Debian Etch on
> ARM9, the float8 type is not received properly by OLE DB.

Perhaps OLE is trying to use binary instead of text transmission of data? It's not a server bug if ARM has a different FP format than the client hardware; it's the client's responsibility to either use text format or be prepared to cope with whatever the binary format is.

regards, tom lane
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Tom Lane
Date:
Shachar Shemesh <shachar@shemesh.biz> writes:
> I'll reiterate - the problem is not that PG is exporting the internal
> ARM FP format. The problem is that the server is exporting the internal
> ARM FP format when the server is ARM, and the IEEE format when the
> server is Intel. It's not the format, it's the inconsistency.

This is not a bug, it's intentional. While IEEE-spec floats are reasonably interchangeable these days (modulo the endianness issue), other FP formats tend to have different ranges, special cases, etc. If we try to force them to IEEE spec we may have problems with overflow, loss of precision, who knows what.

> Like I said elsewhere, I'm willing to write a patch to "pq_sendfloat8"
> (and probably "pq_getmsgfloat8" too) to make sure it does the conversion
> on ARM platforms. Hell, I think I can even write it portable enough so
> that it will work on all non-IEEE platforms

Really? Will it be faster and more reliable than conversion to text? (In this context "reliable" means "can reproduce the original datum exactly when transmitted back".)

regards, tom lane
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Stefan Kaltenbrunner wrote:
> Shachar Shemesh wrote:
>> Hi guys of the pgsql-hackers list.
>>
>> I've received a bug report on the OLE DB list, which I suspect is
>> actually a server bug. The correspondence so far is listed further on,
>> but, in a nutshell, the user runs an OLE DB client on Windows (OLE DB
>> uses the binary interface) against server version 8.1.9 on Windows, and
>> all is fine. When the server is switched to 8.1.8 on Debian Etch on
>> ARM9, the float8 type is not received properly by OLE DB.
>
> If 8.1.8 is built based on the Debian packages it is probably compiled
> with --enable-integer-datetimes.
> Is the oledb client library able to cope with that?

I'm calling "GetPgStatus(&stat, _bstr_t("integer_datetimes") );" to find out what the server representation is, and cope accordingly. This only doesn't work on 7.4 servers, but otherwise I'm fine. Either way, this is off topic for this problem.

From further reading, it seems that ARM indeed uses its own representation for IEEE floats. I'll try to poll my sources, try and find out what the %(!@#&$ this format actually is (google was no help), and try to formulate a patch for PG to export it in IEEE despite the discrepancies.

> Stefan

Shachar
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Tom Lane wrote:
> Shachar Shemesh <shachar@shemesh.biz> writes:
>> I've received a bug report on the OLE DB list, which I suspect is
>> actually a server bug. The correspondence so far is listed further on,
>> but, in a nutshell, the user runs an OLE DB client on Windows (OLE DB
>> uses the binary interface) against server version 8.1.9 on Windows, and
>> all is fine. When the server is switched to 8.1.8 on Debian Etch on
>> ARM9, the float8 type is not received properly by OLE DB.
>
> Perhaps OLE is trying to use binary instead of text transmission of
> data?

Of course it does. That's what the OLE DB specs say. Said so in my original email.

> It's not a server bug if ARM has a different FP format than
> the client hardware;

No. The server can use, internally, whatever it wants.

> it's the client's responsibility to either use
> text format or be prepared to cope with whatever the binary format is.

I agree 100%.

I'll reiterate - the problem is not that PG is exporting the internal ARM FP format. The problem is that the server is exporting the internal ARM FP format when the server is ARM, and the IEEE format when the server is Intel. It's not the format, it's the inconsistency.

I can (and I do) handle, in PgOleDb, binary formats that are vastly different from those that I need (anyone said "timestamps"?). Handling a format that is inconsistent across same-version backends merely because of platform, now that's a server bug if I ever saw one.

> regards, tom lane

Like I said elsewhere, I'm willing to write a patch to "pq_sendfloat8" (and probably "pq_getmsgfloat8" too) to make sure it does the conversion on ARM platforms. Hell, I think I can even write it portable enough so that it will work on all non-IEEE platforms (I'm not sure yet, but I have a general idea). What I'm hoping for, however, is that if I do, you (or another committer) will get it in.

Shachar
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Tom Lane wrote:
> Shachar Shemesh <shachar@shemesh.biz> writes:
>> I'll reiterate - the problem is not that PG is exporting the internal
>> ARM FP format. The problem is that the server is exporting the internal
>> ARM FP format when the server is ARM, and the IEEE format when the
>> server is Intel. It's not the format, it's the inconsistency.
>
> This is not a bug, it's intentional. While IEEE-spec floats are
> reasonably interchangeable these days (modulo the endianness issue),
> other FP formats tend to have different ranges, special cases, etc.
> If we try to force them to IEEE spec we may have problems with overflow,
> loss of precision, who knows what.

Yes, but if we do not, then we have a data interchange library that is useless for data interchange. I think overflow and precision loss are preferable.

Please remember that I'm only trying to help PostgreSQL here. I have a spec to work with on the outside. I'm more than willing to do what's necessary (see the STRRRANGE date conversion code) in order to adapt whatever PG throws my way to the no less strange representation expected of me. That's what I do as a driver hacker.

Sometimes, the specs don't help me. Windows' notion of "timezone free timestamps" is nothing short of a disgrace, and some of the hacks that are needed around that issue are, well, hacks. I don't come complaining here, because this has nothing to do with PG. It's bad design on the other end of the two ends that a driver has to make meet.

But sometimes, like now, PG puts me in an impossible position. You are essentially telling me "you will get the numbers in an unknown format, you will not have any way of knowing whether you got them in a strange format or not, nor will you have any docs on what that format is going to be". That is no way to treat your driver developers.

>> Like I said elsewhere, I'm willing to write a patch to "pq_sendfloat8"
>> (and probably "pq_getmsgfloat8" too) to make sure it does the conversion
>> on ARM platforms. Hell, I think I can even write it portable enough so
>> that it will work on all non-IEEE platforms
>
> Really? Will it be faster

Absolutely. Do you honestly believe that turning a 64-bit binary number into a 40-something-byte decimal number will be quicker than turning a 64-bit binary number into another 64-bit number? For one thing, I really doubt that my technique will require division, modulo or, in fact, any math operations at all. It will likely be done with a bit of bit shifting and that's it.

I also find it strange, though, that you berate me for using binary rather than text format, and then complain about speed. That's what makes OLE DB faster than ODBC - the binary interface.

> and more reliable than conversion to text?

Well, that depends on how you define "more reliable". If you define it to mean "exactly represents what happens in the server internals", then the answer is "no". If you define it to mean "makes more sense to the client, and has a better chance of producing results that more closely approximate the right number than the current code", then the answer is a definite yes.

> (In this context "reliable" means "can reproduce the original datum
> exactly when transmitted back".)

Who cares? If you are using the same function for binary communication inside the server and for communication with the clients (or, for that matter, another server), then there is something wrong in your design. What are the "send" functions used for, besides server-to-client communication, anyway?

You are asking me to treat the binary data as opaque. Well, I'll counter with a question - what good is that to me? Please note that the current code is useless for communicating binary data between two servers, even if they are guaranteed to be of the same version! How much less reliable can you get?

Please, give your own interface designers something to work with. Your attitude essentially leaves me out in the cold.

> regards, tom lane

Shachar
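For a concrete sense of the two conversion paths being argued about here, a minimal sketch (illustrative only; the names are made up and this is not PostgreSQL or PgOleDb code) of what each path does for an IEEE double:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

/* Binary path: a fixed 64-bit byte swap, no arithmetic at all. */
static uint64_t swap64(uint64_t x)
{
    return ((x & 0x00000000000000ffULL) << 56) |
           ((x & 0x000000000000ff00ULL) << 40) |
           ((x & 0x0000000000ff0000ULL) << 24) |
           ((x & 0x00000000ff000000ULL) <<  8) |
           ((x & 0x000000ff00000000ULL) >>  8) |
           ((x & 0x0000ff0000000000ULL) >> 24) |
           ((x & 0x00ff000000000000ULL) >> 40) |
           ((x & 0xff00000000000000ULL) >> 56);
}

int main(void)
{
    double d = 0.1;

    /* Text path: format to up to ~25 bytes of decimal, then parse back. */
    char buf[32];
    snprintf(buf, sizeof buf, "%.17g", d);
    double from_text = strtod(buf, NULL);

    /* Binary path: copy the 8 bytes, swap to wire order and back again. */
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    bits = swap64(swap64(bits));
    double from_binary;
    memcpy(&from_binary, &bits, sizeof from_binary);

    printf("text:   \"%s\" -> %s\n", buf, from_text == d ? "exact" : "lossy");
    printf("binary: %s\n", from_binary == d ? "exact" : "lossy");
    return 0;
}

Between two IEEE machines both round trips are exact (17 significant decimal digits are enough, in principle, to reproduce any IEEE double); the dispute in this thread is about the cost of the text path and about what happens when one end is not IEEE.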
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Heikki Linnakangas
Date:
Shachar Shemesh wrote:
> Tom Lane wrote:
>> Shachar Shemesh <shachar@shemesh.biz> writes:
>>> I'll reiterate - the problem is not that PG is exporting the internal
>>> ARM FP format. The problem is that the server is exporting the internal
>>> ARM FP format when the server is ARM, and the IEEE format when the
>>> server is Intel. It's not the format, it's the inconsistency.
>>
>> This is not a bug, it's intentional. While IEEE-spec floats are
>> reasonably interchangeable these days (modulo the endianness issue),
>> other FP formats tend to have different ranges, special cases, etc.
>> If we try to force them to IEEE spec we may have problems with overflow,
>> loss of precision, who knows what.
>
> Yes, but if we do not, then we have a data interchange library that is
> useless for data interchange. I think overflow and precision loss are
> preferable.

I agree in principle that the wire protocol should be platform-independent. At the very least, if the server always sends in native format, the client needs to know which format it's receiving and be able to convert between all formats. Switching to a common format on all platforms would of course make life easier for client library developers.

> But sometimes, like now, PG puts me in an impossible position. You are
> essentially telling me "you will get the numbers in an unknown format,
> you will not have any way of knowing whether you got them in a strange
> format or not, nor will you have any docs on what that format is going
> to be". That is no way to treat your driver developers.

You seem to be ignoring the fact that the text format is platform-independent. That's what for example JDBC uses, and I'd imagine other interfaces as well. Is it not possible to use text format in OLE DB, for floating points?

>> (In this context "reliable" means "can reproduce the original datum
>> exactly when transmitted back".)
>
> Who cares? If you are using the same function for binary communication
> inside the server and for communication with the clients (or, for that
> matter, another server), then there is something wrong in your design.
> What are the "send" functions used for, besides server-to-client
> communication, anyway? You are asking me to treat the binary data as
> opaque. Well, I'll counter with a question - what good is that to me?

Imagine an application that does this:

1. Issue the query "SELECT foofloat FROM footable", and store the value in a variable.
2. Issue "INSERT INTO footable2 VALUES (?)", and send back the same value.

Don't you think footable and footable2 should now have the same value? If "SELECT * FROM footable, footable2 WHERE foofloat=foofloat" returns 0 rows, I'd be pissed.

It's possible that the conversion within the driver loses some precision, depending on the data types supported by the language and platform, but the wire protocol should at least give the driver a chance to get it right.

> Please note that the current code is useless for communicating binary
> data between two servers, even if they are guaranteed to be of the same
> version! How much less reliable can you get?

The current code is not used for communicating between two servers. And it works fine as long as the client and the server are on the same platform.

Don't get me wrong, I agree that the binary format is broken as it is, but the cure mustn't be worse than the disease.

> Please, give your own interface designers something to work with. Your
> attitude essentially leaves me out in the cold.

Design a wire protocol that

1. Doesn't lose information on any platform
2. Is more efficient than text format

and I'm pretty sure it'll be accepted.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> I agree in principle that the wire protocol should be
> platform-independent.

The *TEXT* format is for that. The problem here is that Shachar is insisting on using binary format in a context where it is inappropriate. Binary format has other goals that are not always compatible with 100% platform independence --- that's unfortunate, sure, but it's reality.

regards, tom lane
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Tom Lane wrote:
> Binary format has other goals that are not always compatible with 100%
> platform independence --- that's unfortunate, sure, but it's reality.

Maybe the misunderstanding is mine. What are the goals for the binary format?

Shachar
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Heikki Linnakangas wrote:
>> But sometimes, like now, PG puts me in an impossible position. You are
>> essentially telling me "you will get the numbers in an unknown format,
>> you will not have any way of knowing whether you got them in a strange
>> format or not, nor will you have any docs on what that format is going
>> to be". That is no way to treat your driver developers.
>
> You seem to be ignoring the fact that the text format is
> platform-independent. That's what for example JDBC uses, and I'd
> imagine other interfaces as well. Is it not possible to use text
> format in OLE DB, for floating points?

It is impossible to use text format for just floating point. I often don't know in advance what type the result is going to be. I can switch EVERYTHING to text mode, but as the other end of the interface requires me to get things out in binary format, I get the following absurd path:

PG stores in binary
PG translates to text
PG sends to driver
driver translates to binary

As long as I get a consistent binary format, I prefer to translate between binary formats rather than between text and binary.

>> What are the "send" functions used for, besides server-to-client
>> communication, anyway? You are asking me to treat the binary data as
>> opaque. Well, I'll counter with a question - what good is that to me?
>
> Imagine an application that does this:
>
> 1. Issue the query "SELECT foofloat FROM footable", and store the value
>    in a variable.
> 2. Issue "INSERT INTO footable2 VALUES (?)", and send back the same value.

Why would I want to do that over "insert into footable2 select foofloat from footable"? I know, even if it makes no sense you'd want it to work. All I'm saying is that something has got to give, and there is no reason to assume that your usage is more likely than mine. For that reason, without looking into the ARM float implementation, it is just as likely that it contains LESS precision than the IEEE one. Would that change the objection?

> Don't you think footable and footable2 should now have the same value?
> If "SELECT * FROM footable, footable2 WHERE foofloat=foofloat" returns
> 0 rows, I'd be pissed.
>
> It's possible that the conversion within the driver loses some
> precision, depending on the data types supported by the language and
> platform, but the wire protocol should at least give the driver a
> chance to get it right.

I'm not sure why there is a difference here. How is the above example different from the exact same example written in ADODB (and thus passing through driver conversion)?

I'll take this one step further. Does the text representation never lose precision? Ever?

What if I send a patch that sends a 64-bit float as a 128-bit number, containing two integers, one for the mantissa and one for the exponent? This format is guaranteed to never lose precision, but is wire compatible across platforms. Would that be considered a good solution?

> The current code is not used for communicating between two servers.
> And it works fine as long as the client and the server are on the same
> platform.

So does the driver. And yet, a user complained! In my dictionary, this means that "as long as the client and server are on the same platform" is not a good enough requirement.

> Don't get me wrong, I agree that the binary format is broken as it is,
> but the cure mustn't be worse than the disease.
>
>> Please, give your own interface designers something to work with. Your
>> attitude essentially leaves me out in the cold.
>
> Design a wire protocol that
>
> 1. Doesn't lose information on any platform
> 2. Is more efficient than text format
>
> and I'm pretty sure it'll be accepted.

I just offered one. I would hate it myself, and it would mean that pre-8.3 (or whenever it is that it will go in) will have a different representation than post the change, but it will live up to your requests.

Shachar
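The two-integer encoding Shachar describes can be sketched in a few lines of portable C (names hypothetical; NaN and Infinity handling deliberately omitted, which is exactly the gap Tom points at next). frexp() and ldexp() are C89 and make the transform exactly reversible for finite values on any binary-radix platform:

#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Split a finite double into (integer mantissa, integer exponent),
 * so that v == mant * 2^exp2 exactly. */
static void float8_split(double v, int64_t *mant, int32_t *exp2)
{
    int e;
    double m = frexp(v, &e);        /* v = m * 2^e, with 0.5 <= |m| < 1 (or 0) */
    *mant = (int64_t)ldexp(m, 53);  /* scale the <=53 mantissa bits to an integer */
    *exp2 = e - 53;
}

static double float8_join(int64_t mant, int32_t exp2)
{
    return ldexp((double)mant, exp2);
}

int main(void)
{
    double v = -0.3;
    int64_t m;
    int32_t e;
    float8_split(v, &m, &e);
    printf("%.17g == %lld * 2^%d == %.17g\n",
           v, (long long)m, e, float8_join(m, e));
    return 0;
}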
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Tom Lane
Date:
Shachar Shemesh <shachar@shemesh.biz> writes:
> Tom Lane wrote:
>> Binary format has other goals that are not always compatible with 100%
>> platform independence --- that's unfortunate, sure, but it's reality.
>
> Maybe the misunderstanding is mine. What are the goals for the binary
> format?

Well, the one that I'm concerned about at the moment is that dump/reload using COPY BINARY ought to be 100% guaranteed to reproduce the original datum. Obviously, if you are transporting the dump across platforms then that may be an impossibility. In that case you use a text dump and accept that you get an approximation. But there should be a way to ensure that you can have a lossless dump of whatever strange FP format your server may use, as long as you are restoring onto a similar machine.

If there is a guaranteed-reversible transform between the ARM FP format and IEEE format, then I'd be interested in hacking things the way you suggest --- but what I suspect is that such a conversion must lose either range or precision. There are only so many bits in a double.

regards, tom lane
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Andrew Dunstan
Date:
Shachar Shemesh wrote:
>> Perhaps OLE is trying to use binary instead of text transmission of
>> data?
>
> Of course it does. That's what the OLE DB specs say. Said so in my
> original email.

Why the heck do the OLE DB specs care about the internals of the client-server protocol? It is documented fairly clearly that text is the only portable way to transfer data.

Perhaps we need to expand this sentence in the docs: "Keep in mind that binary representations for complex data types may change across server versions;"

The COPY docs are probably more correct: "The BINARY key word causes all data to be stored/read as binary format rather than as text. It is somewhat faster than the normal text mode, but a binary-format file is less portable across machine architectures and PostgreSQL versions."

I do recall someone telling me that text mode transfer could actually be faster than binary, somewhat to their (and my) surprise.

cheers

andrew
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Tom Lane
Date:
Shachar Shemesh <shachar@shemesh.biz> writes:
> Heikki Linnakangas wrote:
>> Is it not possible to use text format in OLE DB, for floating points?
>
> It is impossible to use text format for just floating point. I often
> don't know in advance what type the result is going to be.

Sure it's "possible". Send a Parse command, ask for Describe Statement output, then specify the column formats as desired in Bind. Now this does imply an extra server round trip, which might be annoying if your client code doesn't have another reason to need to peek at Describe output.

An idea that's been in the back of my mind for a while is to provide some way to let the client say things like "I want float and timestamp results in text and everything else in binary", so that one setup step at the start of the session avoids the need for the extra round trips. Haven't got a detailed proposal at the moment though.

> What if I send a patch that sends a 64-bit float as a 128-bit number,
> containing two integers, one for the mantissa and one for the exponent?
> This format is guaranteed to never lose precision, but is wire
> compatible across platforms. Would that be considered a good solution?

No, not unless you can make the case why this handles NaNs and denormalized numbers compatibly across platforms...

regards, tom lane
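In libpq terms, the flow Tom describes looks roughly like this (a sketch with error handling omitted; PQdescribePrepared needs an 8.2-or-newer libpq, and 701 is FLOAT8OID from pg_type.h). Since libpq's PQexecPrepared takes a single result format for all columns, the sketch falls back to text for the whole result when any float8 column is present:

#include <libpq-fe.h>

/* Prepare once, describe, then pick the result format before fetching. */
static int choose_result_format(PGconn *conn, const char *query)
{
    PGresult *desc;
    int i, fmt = 1;                   /* prefer binary */

    PQclear(PQprepare(conn, "s1", query, 0, NULL));
    desc = PQdescribePrepared(conn, "s1");
    for (i = 0; i < PQnfields(desc); i++)
        if (PQftype(desc, i) == 701)  /* FLOAT8OID: take it as text instead */
            fmt = 0;
    PQclear(desc);
    return fmt;                       /* pass as resultFormat to PQexecPrepared */
}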
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I do recall someone telling me that text mode transfer could actually be
> faster than binary, somewhat to their (and my) surprise.

Seems a bit improbable --- what was their test case? The only such situation that comes to mind is that some values are smaller as text than binary (eg "2" is shorter as text than as any binary numeric format), so in a situation where number of bytes sent dominates all other costs, text would win. But of course there are also many values that're smaller in binary format, so I'd think this would only happen in restricted test cases.

regards, tom lane
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Andrew Dunstan
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I do recall someone telling me that text mode transfer could actually be
>> faster than binary, somewhat to their (and my) surprise.
>
> Seems a bit improbable --- what was their test case?

No idea - this was idle chat on IRC I think. I am similarly skeptical. After all, we just had a discussion about improving performance of PLs by avoiding use of the input/output functions in some cases.

cheers

andrew
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Tom Lane wrote:
> Sure it's "possible". Send a Parse command, ask for Describe Statement
> output, then specify the column formats as desired in Bind. Now this
> does imply an extra server round trip, which might be annoying if your
> client code doesn't have another reason to need to peek at Describe
> output.

No, it's far worse than annoying. It kills my performance. OLE DB does have a mechanism for explicit "prepare", and I certainly COULD do it only for those cases, but it is meaningless.

>> What if I send a patch that sends a 64-bit float as a 128-bit number,
>> containing two integers, one for the mantissa and one for the exponent?
>> This format is guaranteed to never lose precision, but is wire
>> compatible across platforms. Would that be considered a good solution?
>
> No, not unless you can make the case why this handles NaNs and
> denormalized numbers compatibly across platforms...

NaNs and infinities (plus and minus) should not be a problem. I'm not sure what denormalized numbers are. If you mean (switching to base 10 for a second) 2*10^3 vs. 20*10^2, then I would have to ask why you want them treated differently. What is the scenario in which you would want to tell them apart? Likewise, would you really want to tell +0 and -0 apart? If I have an export/import round trip that turns -0 into +0, is that really a problem?

> regards, tom lane

Shachar
Re: [Oledb-dev] Re: double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Tom Lane wrote:
> Obviously, if you are transporting the dump across platforms then that
> may be an impossibility. In that case you use a text dump and accept
> that you get an approximation.

That's something that I've been meaning to ask about, but you all seemed so sure of yourselves. What you are essentially saying is that no one aside from the server itself is allowed to get full precision. That seems like a strange decision to make.

> But there should be a way to ensure that
> you can have a lossless dump of whatever strange FP format your server
> may use, as long as you are restoring onto a similar machine.

Personally (and I know my opinion "doesn't count"), I find the "similar machine" requirement a bit hard. It is usually accepted for HA or live load balancing, but is really inappropriate for backups or data transfers. Just my humble opinion.

> If there is a guaranteed-reversible transform between the ARM FP format
> and IEEE format, then I'd be interested in hacking things the way you
> suggest

I find it highly likely that there will be. After all, the ARM format was not designed to be better packed than IEEE, just easier to implement efficiently in hardware. However

> --- but what I suspect is that such a conversion must lose
> either range or precision. There are only so many bits in a double.

Like I said elsewhere, a 64-bit FP only has 64 bits, but there is nothing constraining us to export a 64-bit number as 64 bits.

> regards, tom lane

What I'm worried about in that regard is other platforms that PG may be running on. Like I said above, I'm fairly sure (will get the specs and make sure) that there shouldn't be a problem in exporting ARM FP into 64-bit IEEE with no loss at all. This says nothing, however, about other platforms. Unless we map all cases, we had better choose an export format that is capable of extension.

Shachar
Re: [Oledb-dev] Re: double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Andrew Dunstan wrote:
> Why the heck do the OLE DB specs care about the internals of the
> client-server protocol? It is documented fairly clearly that text is
> the only portable way to transfer data.

Is it?

> Perhaps we need to expand this sentence in the docs: "Keep in mind that
> binary representations for complex data types may change across server
> versions;"

Where is that in the docs? It does not appear in the page discussing libpq and binary data transfers (http://www.postgresql.org/docs/8.2/interactive/libpq-exec.html).

Even the original sentence does not describe the problem we're seeing here. It does not mention cross-platform incompatibility.

> The COPY docs are probably more correct: "The BINARY key word causes all
> data to be stored/read as binary format rather than as text. It is
> somewhat faster than the normal text mode, but a binary-format file is
> less portable across machine architectures and PostgreSQL versions."

Again, to me this sentence spells "you have a problem, we're not going to help you out, deal with it". This is especially true if what Tom said was true, that the text format does not maintain total precision. You are essentially telling the user "you cannot move your data reliably even between servers of the same version".

Shachar
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Tom Lane
Date:
Shachar Shemesh <shachar@shemesh.biz> writes:
> Tom Lane wrote:
>> No, not unless you can make the case why this handles NaNs and
>> denormalized numbers compatibly across platforms...
>
> NaNs and infinities (plus and minus) should not be a problem.

Really? Need I point out that these concepts, let alone their representation, are not standardized in non-IEEE float implementations?

> I'm not sure what denormalized numbers are.

You should find out before proposing representation replacements for floats.

> What is the scenario in which you would want to tell them apart?
> Likewise, would you really want to tell +0 and -0 apart?

There are competent authorities --- for instance, the guys who created the IEEE float standard --- who think it's worth distinguishing them. IIRC (it's been fifteen or so years since I did any serious numerical analysis) the arguments in favor have mostly to do with preserving maximal accuracy for intermediate results in a series of calculations. So maybe you could claim that these arguments are not so relevant to storage in a database. But personally I don't think it's the province of a database to decide that it need not accurately preserve the data it's given to store.

regards, tom lane
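Since the term keeps coming up: denormalized (subnormal) numbers are the IEEE values below the smallest normal magnitude, trading precision for gradual underflow near zero. A quick C99 illustration:

#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void)
{
    double tiny = DBL_MIN / 4;   /* below the smallest normal double */
    printf("DBL_MIN     = %g\n", DBL_MIN);
    printf("DBL_MIN / 4 = %g (%s)\n", tiny,
           fpclassify(tiny) == FP_SUBNORMAL ? "subnormal" : "normal");
    /* Gradual underflow: the small value is preserved, not flushed to 0. */
    printf("recovered exactly: %d\n", tiny * 4 == DBL_MIN);
    return 0;
}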
Re: [Oledb-dev] Re: double precision error with pg linux server, but not with windows pg server
From
Andrew Dunstan
Date:
Shachar Shemesh wrote:
> Even the original sentence does not describe the problem we're seeing
> here. It does not mention cross-platform incompatibility.

That's why I suggested it should be improved.

>> The COPY docs are probably more correct: "The BINARY key word causes all
>> data to be stored/read as binary format rather than as text. It is
>> somewhat faster than the normal text mode, but a binary-format file is
>> less portable across machine architectures and PostgreSQL versions."
>
> Again, to me this sentence spells "you have a problem, we're not going
> to help you out, deal with it".

Eh? It spells out what we provide and what the limitations are. You could put that spin on every documented limitation, if you were so inclined.

> This is especially true if what Tom said
> was true, that the text format does not maintain total precision. You
> are essentially telling the user "you cannot move your data reliably
> even between servers of the same version".

Since this is the exact mechanism used by pg_dump, we would surely have been long since deafened by complaints if this were a problem of any significance.

cheers

andrew
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Tom Lane wrote:
> Shachar Shemesh <shachar@shemesh.biz> writes:
>> Tom Lane wrote:
>>> No, not unless you can make the case why this handles NaNs and
>>> denormalized numbers compatibly across platforms...
>>
>> NaNs and infinities (plus and minus) should not be a problem.
>
> Really? Need I point out that these concepts, let alone their
> representation, are not standardized in non-IEEE float implementations?

So what? They only need be standardized in our export format, which we define. That's why we define it, after all....

>> I'm not sure what denormalized numbers are.
>
> You should find out before proposing representation replacements for
> floats.

Yes, but we're still discussing the preliminary stages. At these stages, it is enough to know that they CAN be defined (and they can).

>> What is the scenario in which you would want to tell them apart?
>> Likewise, would you really want to tell +0 and -0 apart?
>
> IIRC (it's been fifteen or so years since I did any serious numerical
> analysis) the arguments in favor have mostly to do with preserving
> maximal accuracy for intermediate results in a series of calculations.
> So maybe you could claim that these arguments are not so relevant to
> storage in a database. But personally I don't think it's the province
> of a database to decide that it need not accurately preserve the data
> it's given to store.

This is not data given to store. It's data being exported.

I think you are failing to consider something. The simple truth of the matter is that drivers are used far more often to access the server than libpq or direct TCP programming. OLE DB has been stagnant for over two years now, is only available for one platform, and the easiest install option for it is through the PostgreSQL installer, and yet it is still the third most popular download on pgFoundry (with the .NET provider being #1).

What I'm getting at is that drivers are important. It is important that they have good performance. It is important that they be stable. I really think the backend should take driver considerations more seriously. The suggested method, of switching to text mode, will surely work, but it will also hurt performance.

I've said it before. I find it highly unlikely that the ARM FP format will have any problem with being exported, even to a 64-bit IEEE number. Not knowing the FP format, but knowing the platform, it likely just got rid of all the corner cases (NaN, denormalized numbers) merely so they can implement it more efficiently in hardware. I find the chances that it will have a wider range than IEEE in either mantissa or exponent unlikely.

The question here is a broader one, though. Should we strive for binary compatibility across all platforms of a given version? The benefit is faster drivers and being able to COPY across platforms (but, still, not across versions). The cost is a couple of types (I would really change timestamp too, while at it) that need a non-straightforward export/import function. Tom seems to think this is not a goal (though, aside from his disbelief that such a goal is attainable, I have heard no arguments against it). What do the other people think?

Shachar
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Greg Smith
Date:
On Sun, 20 May 2007, Shachar Shemesh wrote:
> This is not data given to store. It's data being exported.

Data being exported has a funny way of turning around and being stored in the database again. It's kind of nice to know the damage done during that round trip is minimized.

> Tom seems to think this is not a goal (though, aside from his disbelief
> that such a goal is attainable, I have heard no arguments against it).

If Tom thinks it's not attainable, the best way to convince him otherwise would be to demonstrate that it is. From here, it looks like your response to his concerns for the pitfalls he pointed out has been waving your hands and saying "no, that can't really be a problem" while making it clear you haven't dug into the details. One reason people use text formats for cross-platform exchanges is that getting portable binary compatibility for things like floating point numbers is much harder than you seem to think it is.

Stepping back for a second, your fundamental argument seems to be based on the idea that doing conversions to text is such a performance issue in a driver that it's worth going through these considerable contortions to avoid it. Given how many other places performance can be throttled along that path, that itself is a position that requires defending nowadays. In the typical driver-bound setups I work with, there's plenty of CPU time to burn for simple data conversion work because either the network wire speed or the speed of the underlying database I/O are the real bottlenecks.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Greg Smith wrote:
> On Sun, 20 May 2007, Shachar Shemesh wrote:
>> This is not data given to store. It's data being exported.
>
> Data being exported has a funny way of turning around and being stored
> in the database again. It's kind of nice to know the damage done
> during that round trip is minimized.

I agree. All I'm asking, and have not received an answer yet, is whether assuring that we don't have any SEMANTIC damage is enough. In other words, if I can assure that data exported and then imported will always, under all circumstances, compare the same as the original, would that be enough of a requirement? In other words, if I offer a format that is assured of preserving both mantissa and exponent precision and range, as well as all extra attributes (+/-Infinity and NaN), but does not guarantee that semantically identical constructs are told apart (+0 vs. -0, and the different NaNs), would that format be acceptable?

>> Tom seems to think this is not a goal (though, aside from his disbelief
>> that such a goal is attainable, I have heard no arguments against it).
>
> If Tom thinks it's not attainable, the best way to convince him
> otherwise would be to demonstrate that it is.

Granted. That's why I've been quiet. I'm pulling my sources for the ARM FP format details, to make sure what I have in mind would work.

> One reason people use text formats for cross-platform exchanges is
> that getting portable binary compatibility for things like floating
> point numbers is much harder than you seem to think it is.

I'll just point out that none of the things that Tom seems to be concerned about are preserved over text format.

> Stepping back for a second, your fundamental argument seems to be based
> on the idea that doing conversions to text is such a performance issue
> in a driver that it's worth going through these considerable
> contortions to avoid it.

Converting to text adds a CPU overhead in both client and server, as well as a network transmission overhead. Even if it's not detrimental to performance, I'm wondering why insist on paying it.

You are right that I offered no concrete implementation. I'll do it now, but it is dependent on an important question - what is the range for the ARM floating point? Not having either an ARM to test on, nor the floating point specs, it may be that a simpler implementation is possible. I offer this implementation up because I see people think I'm talking up my ass.

A 64-bit IEEE float can distinguish almost all of the 2^64 distinct bit patterns. It loses two combinations for the + and - infinity, one combination for the dual zero notation, and we also lose all of the NaNs, which means (2^mantissa)-2 combinations. Overall, an n-bit IEEE float with m bits of mantissa will be able to represent 2^n - 2^m - 1 actual floating point numbers. That means that if we take a general signed floating point number, of which representation we know nothing but the fact it is n bits wide, and that it has a mantissa and an exponent, and we want to encode it as an IEEE number of the same width with mantissa size m and exponent of size e=n-m-1, we will have at most 2^m+1 unrepresentable numbers.

In a nutshell, what I suggest is that we export floating points in binary form in IEEE format, and add a status word to it. The status word will dictate how many bits of mantissa there are in the IEEE format, what the exponent bias is, and also add between one and two bits to the actual number, in case the number of floats on the exporting platform is larger than the number of floats that can be represented in IEEE with the same word length.

The nice thing about this format is that exporting from an IEEE platform is as easy as exporting the binary image of the float, plus a status word that is a constant. Virtually no overhead. Importing from an IEEE platform to an IEEE platform is, likewise, as easy as comparing the status word to your own constant, and if they match, just copying the binary. This maintains all of Tom's strict round trip requirements. In fact, for export/import on the same IEEE platform no data conversion of any kind takes place at all.

There are questions that need to be answered. For example, what happens if you try to import a NaN into a platform that has no such concept? You'd have to put in a NULL or something similar. Similarly, how do you import Infinity? These, however, are questions that should be answered the same way for text imports, so there is nothing binary specific here.

I hope that, at least, presents a workable plan. As I said before, I'm waiting for the specs for ARM's floating point before I can move forward. If, as I suspect, ARM's range is even more limited, then I may try to suggest a more compact export representation, pending the question of whether we have any other platform that is non-IEEE, and what the situation is there.

Shachar
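A hypothetical rendering of that status-word layout, purely to make the proposal concrete (every field name and width here is illustrative, not an agreed wire format):

#include <stdint.h>

/* Proposed on-the-wire float8: an IEEE-style payload plus a status word
 * describing the layout the sending platform actually used. */
typedef struct {
    uint16_t mantissa_bits;   /* 52 for a standard IEEE double */
    int16_t  exponent_bias;   /* 1023 for a standard IEEE double */
    uint8_t  extra_bits;      /* 0-2 extra payload bits for wider source formats */
    uint8_t  flags;           /* e.g. bit 0 set: payload is verbatim IEEE-754 */
    /* ...followed by the big-endian sign/exponent/mantissa payload */
} Float8StatusWord;

/* On an IEEE sender this is the same constant every time, so export is a
 * header fill plus a byte swap, and same-platform import is a memcmp. */
static const Float8StatusWord ieee_status = { 52, 1023, 0, 1 };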
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Peter Eisentraut
Date:
On Monday, 21 May 2007 07:04, Shachar Shemesh wrote:
> In other words, if I offer a
> format that is assured of preserving both mantissa and exponent
> precision and range, as well as all extra attributes (+/-Infinity and
> NaN), but does not guarantee that the semantically identical constructs
> are told apart (+0 vs. -0, and the different NaNs), would that format be
> acceptable?

I don't think so. We have fought many years to get closer to IEEE 754 conformance. I don't think we want to throw that away without a good reason. "I want to use the binary format but I don't like what it does" is not a good enough reason in my mind. For one thing, I'd want to see some performance measurements.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Csaba Nagy
Date:
> In other words, if I can assure that data exported and then imported
> will always, under all circumstances, compare the same as the original,
> would that be enough of a requirement? In other words, if I offer a
> format that is assured of preserving both mantissa and exponent
> precision and range, as well as all extra attributes (+/-Infinity and
> NaN), but does not guarantee that the semantically identical constructs
> are told apart (+0 vs. -0, and the different NaNs), would that format be
> acceptable?

If you care about the +/- for +/-Infinity, you must also care about +/-0 too, so you get the right type of infinity if you divide by 0... so +0 and -0 are far from being semantically identical.

Cheers,
Csaba.
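Csaba's point is easy to demonstrate on any IEEE machine; the two zeros compare equal yet divide to different infinities:

#include <stdio.h>

int main(void)
{
    double pz = 0.0, nz = -0.0;
    printf("pz == nz: %d\n", pz == nz);              /* 1: they compare equal... */
    printf("1/pz = %g, 1/nz = %g\n", 1.0 / pz, 1.0 / nz);  /* ...but inf vs -inf */
    return 0;
}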
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Csaba Nagy wrote:
> If you care about the +/- for +/-Infinity, you must also care about +/-0
> too, so you get the right type of infinity if you divide by 0... so +0
> and -0 are far from being semantically identical.

My suggestion accommodates that.

Shachar
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Peter Eisentraut wrote:
> "I want to use the binary format but I don't like what it does."

It doesn't do anything. The set of circumstances under which it is useful has been limited, on purpose, and, as far as I can see, without any good reason. Spending not much time (and I suggested spending it myself, so it's not like I'm asking anyone to do this for me), this format can be made to allow transfer of data between installations. Why wouldn't we want that?

> We have fought many years to get closer to IEEE 754
> conformance.

Please notice that the format I offered *is* IEEE. In fact, what I'm offering is to export the binary in IEEE format EVEN IF THE NATIVE FORMAT ISN'T.

As for the ARM architecture, I've pulled my sources, and the answer is this: ARM doesn't have one standard floating point format. Different ARM architectures will use different formats. Most architectures will actually use IEEE, but some will use decimal based and such. According to my source (a distributor of ARM based hardware), none of the other formats will lose precision if translated to IEEE.

So, I will repeat my original question. I can write portable code that will translate the native format to IEEE (if it's not already the same). It seems that it will be good enough for all platforms discussed here. Failing that, we can adopt my later proposal, which is IEEE + status for all places where that is good enough.

Shachar
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
"Kevin Grittner"
Date:
>>> On Mon, May 21, 2007 at 9:02 AM, in message
<4651A689.6010909@shemesh.biz>, Shachar Shemesh <shachar@shemesh.biz> wrote:
>> We have fought many years to get closer to IEEE 754
>> conformance.
>
> Please notice that the format I offered *is* IEEE. In fact, what I'm
> offering is to export the binary in IEEE format EVEN IF THE NATIVE
> FORMAT ISN'T.
>
> As for the ARM architecture, I've pulled my sources, and the answer is
> this: ARM doesn't have one standard floating point format. Different ARM
> architectures will use different formats. Most architectures will
> actually use IEEE, but some will use decimal based and such. According
> to my source (a distributor of ARM based hardware), none of the other
> formats will lose precision if translated to IEEE.
>
> So, I will repeat my original question. I can write portable code that
> will translate the native format to IEEE (if it's not already the same).
> It seems that it will be good enough for all platforms discussed here.
> Failing that, we can adopt my later proposal, which is IEEE + status for
> all places where that is good enough.

It sounds to me like there are two issues:

(1) How do you get things to work under the current communications protocol?

(2) Should PostgreSQL consider moving toward a platform-independent binary protocol in some future release?

Based on what Tom and others have said, you need to resort to text representation for portability with the current protocol. You might be surprised at how minimal the impact is, especially if the CPUs aren't saturated.

Clearly a platform-independent protocol is possible. I send binary information between machines with different hardware and operating systems all the time. A big question for any PostgreSQL implementation of this has been whether any of the internal representations used on supported platforms are incapable of IEEE representation without data loss. You've asserted that you've done research which shows compatibility. Can anyone show a counter-example, where IEEE representation on the wire would not work?

For reference on how Java has addressed this issue for floating point numbers and how they go over the wire, see:

http://java.sun.com/docs/books/jvms/second_edition/html/Concepts.doc.html#33377
http://java.sun.com/docs/books/jvms/second_edition/html/Overview.doc.html#28147
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Double.html#doubleToLongBits(double)
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Double.html#doubleToRawLongBits(double)
http://java.sun.com/j2se/1.5.0/docs/api/java/io/DataOutput.html#writeLong(long)

-Kevin
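The Java references boil down to "reinterpret the IEEE double as a 64-bit integer and write it big-endian"; an equivalent C sketch (assuming the host double is IEEE-754, as Java does):

#include <stdint.h>
#include <string.h>

/* C analogue of DataOutput.writeLong(Double.doubleToRawLongBits(d)). */
static void put_float8_be(double d, unsigned char out[8])
{
    uint64_t bits;
    int i;

    memcpy(&bits, &d, sizeof bits);   /* the raw IEEE-754 image */
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (56 - 8 * i));   /* big-endian */
}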
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Tom Lane
Date:
Shachar Shemesh <shachar@shemesh.biz> writes:
> As for the ARM architecture, I've pulled my sources, and the answer is
> this: ARM doesn't have one standard floating point format. Different ARM
> architectures will use different formats.

So how will you know which one is in use, which I'd think you'd need to know to translate it?

> Most architectures will
> actually use IEEE, but some will use decimal based and such. According
> to my source (a distributor of ARM based hardware), none of the other
> formats will lose precision if translated to IEEE.

Your source appears fairly ignorant of things-float. If they really are using decimal FP, it's easy to demonstrate that a lossless conversion to/from binary representation of similar size is impossible. The set of exactly representable values is simply different.

I have no objection to standardizing on IEEE-on-the-wire if you can prove that's a superset of everything else. Unfortunately, you can't, because it isn't. As soon as you get into lossy conversions, you might as well use text and avoid the issue.

regards, tom lane
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Tom Lane
Date:
Shachar Shemesh <shachar@shemesh.biz> writes:
> As for the ARM architecture, I've pulled my sources, and the answer is
> this: ARM doesn't have one standard floating point format. Different ARM
> architectures will use different formats. Most architectures will
> actually use IEEE, but some will use decimal based and such.

Okay, I spent some time googling this question, and I can't find any suggestion that any ARM variant uses non-IEEE-compliant float format. What *is* real clear is that depending on ARM model and a run time (!) CPU endianness flag, there are three or four different possibilities for the endianness of the data, including a PDP-endian-like alternative in which the order of the high and low words is at variance with the order of bytes within the words. (Pardon me while I go vomit...)

The intent of the FP binary I/O code we have is that for platforms supporting IEEE-compliant FP formats, the on-the-wire representation should be uniformly big-endian, same as is true for integers. So I would concur with a patch that ensures that this is what happens on the different ARM variants ... though I'll still be interested to see how you make that happen given the rather poor visibility into which model and endianness we are running on.

PS: Of course this does not resolve the generic issue of what to do with platforms that have outright non-IEEE-format floats. But at the moment I don't see evidence that we need reach that issue for ARM.

PPS: I'm sort of wondering if the PDP-endian business doesn't afflict int8 too on this platform.

regards, tom lane
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Martijn van Oosterhout
Date:
On Mon, May 21, 2007 at 11:58:35PM -0400, Tom Lane wrote:
> The intent of the FP binary I/O code we have is that for platforms
> supporting IEEE-compliant FP formats, the on-the-wire representation
> should be uniformly big-endian, same as is true for integers. So
> I would concur with a patch that ensures that this is what happens
> on the different ARM variants ... though I'll still be interested
> to see how you make that happen given the rather poor visibility
> into which model and endianness we are running on.

Well, I have an idea how you might do this: figure out the ordering of the float at runtime. You can easily construct a float with any given bit pattern. You can then examine the bytes to determine the order and build a mapping table to reorder them.

The program below creates a float with the bit pattern 01020304. You can then examine the bits of the float to determine the rearranging needed. You could do the same for 64-bit floats. This is obviously only needed for systems where the order can't be determined at compile time. ldexp is in SVr4, 4.3BSD and C89.

#include <stdio.h>
#include <math.h>

int main()
{
    /* IEEE-754 image 0x04030201, so the in-memory bytes read 01 02 03 04
     * on a little-endian machine */
    float f = ldexp(1.0, -119) + ldexp(1.0, -125) + ldexp(1.0, -126) +
              ldexp(1.0, -133) + ldexp(1.0, -142);
    unsigned char *a = (unsigned char *)&f;

    printf("Float: %g, char: %02X%02X%02X%02X\n", f, a[0], a[1], a[2], a[3]);
    return 0;
}

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
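Extending the same probe to doubles along the lines Martijn suggests (a sketch; the constant below is chosen so the IEEE-754 image is the bytes 01 02 ... 08, letting the program read off the platform's float byte order directly):

#include <stdio.h>
#include <string.h>
#include <math.h>

int main(void)
{
    /* IEEE image 0x0102030405060708: exponent field 0x010, mantissa 0x2030405060708 */
    double f = ldexp(1.0, -1007) + ldexp((double)0x2030405060708ULL, -1059);
    unsigned char a[8], map[8];
    int i, j;

    memcpy(a, &f, sizeof a);
    for (i = 0; i < 8; i++)          /* map[i]: which memory byte holds wire byte i */
        for (j = 0; j < 8; j++)
            if (a[j] == i + 1)
                map[i] = (unsigned char)j;
    for (i = 0; i < 8; i++)
        printf("wire byte %d -> memory byte %d\n", i, map[i]);
    return 0;
}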
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Tom Lane wrote: > Okay, I spent some time googling this question, and I can't find any > suggestion that any ARM variant uses non-IEEE-compliant float format. > What *is* real clear is that depending on ARM model and a run time (!) > CPU endianness flag, there are three or four different possibilities > for the endianness of the data, including a PDP-endian-like alternative > in which the order of the high and low words is at variance with the > order of bytes within the words. (Pardon me while I go vomit...) > Welcome to the wonderful world of embedded CPUs. These buggers will do ANYTHING, and I do mean anything, in order to squeeze a little more performance with a little less power consumption, while keeping the end price tag under 10$. The ARM9, for example, can switch, on the fly, between 32 and 16 bit machine language in order to save a few bytes in code size and gain a few MIPS in execution speed. As an amusing side note, I have heard a claim that the only reason we need endianity at all is because the Europeans didn't understand that Arabic is written from right to left. In Arabic you read "17" as "seven and ten", which means that it is already little endian. Just one request, please don't quote this story without also mentioning that this story is wrong, and that 1234 is said, in Arabic, as "one thousand two hundred four and thirty". Mixed endianity is usually relic of a 16bit processor that was enhanced to 32bit. The parts that were atomic before would be big endian, but the parts that the old CPU required to do in separate operations are stored low to high. > So > I would concur with a patch that ensures that this is what happens > on the different ARM variants ... though I'll still be interested > to see how you make that happen given the rather poor visibility > into which model and endianness we are running on. > You do it semantically. Attached is the outline for the code (I can form a patch only after we agree where it should go) I should note a few things: On IEEE platforms, the code will, of course, translate to/from the same format. This can be verified by the dump at the end. I have tested the code on several numbers, and it does work for normal and for denormalized numbers. I have not tested whether the detection whether we should generate one or the other actually works, so there may be an off by one there. The are a few corner cases that are not yet handled. Two are documented (underflow and rounding on denormalized numbers). There is one undocumented, of overflow. The IEEE -> native code is not yet written, but I think it should be fairly obvious how it will look once it is. There is also a function in the code called "calcsize". It's the beginning of a function to calculate the parameters for the current platform, again, without knowing the native format. I was thinking of putting it in the "configure" test, except, of course, the platforms we refer to are, typically, ones for which you cross compile. See below. Comments welcome. > PS: Of course this does not resolve the generic issue of what to do > with platforms that have outright non-IEEE-format floats. But at the > moment I don't see evidence that we need reach that issue for ARM. > The code above does detect when the float isn't being precisely represented by the IEEE float. We could have another format for those cases, and distinguish between the cases on import by testing its size. > PPS: I'm sort of wondering if the PDP-endian business doesn't afflict > int8 too on this platform. > It's likely. 
I would say that a configure test would be the best way to test it, but
I suspect that most programs for ARM are cross compiled. I'm not sure
how to resolve that. Maybe if there's a way to automatically test what
gets into memory when you let the compiler create the constant
0123456789abcdef.

At least for anything smaller than 8 bytes, the "hton" functions SHOULD
always do the right thing. I COULD go back to my source (he's on
vacation until Sunday anyway), but I'll throw in a guess: since the
ARMs (at least the 7 and the 9) are not 64-bit native, it's compiler
dependent. There are two main compilers for the ARM, one of them being
gcc. That's, more or less, where my insights into this end.

Shachar

#include <stdio.h>
#include <stdlib.h>
#include <float.h>
#include <math.h>
#include <assert.h>

// What type would we be working on?
#if 1
// Double
#define TYPE double
#define FRAC_BITS 52
#define EXP_BITS 11
#define EXP_BIAS 1023
#else
// Float
#define TYPE float
#define FRAC_BITS 23
#define EXP_BITS 8
#define EXP_BIAS 127
#endif

union fp {
    TYPE flt;
    struct {
        unsigned int low;  /* 32-bit halves; "unsigned long" would break on LP64 */
        unsigned int high;
    } i;
    unsigned long long l;
    struct {
        unsigned long long int frac:FRAC_BITS;
        unsigned long long int exp:EXP_BITS;
        unsigned long long int sign:1;
    } fp;
};

// Dump the IEEE image of a number, both raw and field by field
void dumpnum( TYPE n )
{
    union fp val;

    val.flt=n;
    printf("%g %08x%08x\n", val.flt, val.i.high, val.i.low );
    printf("Sign: %d, exp: %d(%d), frac: %013llx\n", (int)val.fp.sign,
        (int)val.fp.exp, (int)val.fp.exp-EXP_BIAS,
        (unsigned long long)val.fp.frac );
}

int calcsize(volatile TYPE v)
{
    /* Find out the current mantissa, exponent and other attributes' sizes */
    /* Find out the number of bits in the mantissa */
    int numbits=0;

    // Keep halving and re-adding 1 until precision runs out and v sticks at 2
    for( v=0; v!=2; numbits++ ) {
        v/=2;
        v+=1;
        printf("v %.20g\n", v);
    }

    printf("Number of bits in the mantissa: %d\n", numbits );

    return 0;
}

int main( int argc, char *argv[] )
{
    // The number to convert; allow overriding it from the command line for testing
    TYPE v=0;
    if( argc>1 )
        v=strtod( argv[1], NULL );

    assert(FLT_RADIX==2); // We can get around this limitation, but not easily

    union fp res;

    printf("%.20g\n", v);

    // Copy the sign bit over. signbit is only defined in C99
    res.fp.sign=signbit(v)!=0?1:0;

    // Explicitly check for NaN and Infinity. fpclassify is C99 only
    switch(fpclassify(v)) {
    case FP_NAN:
        res.fp.exp=~0;
        res.fp.frac=1;
        break;
    case FP_INFINITE:
        res.fp.exp=~0;
        res.fp.frac=0;
        break;
    case FP_ZERO:
        res.fp.exp=0;
        res.fp.frac=0;
        break;
    default:
        // This is a number. We don't want to assume that denormalized
        // numbers on the platform will also be denormalized in IEEE and
        // vice versa, so we detect it in a different way
        {
            int exp;
            TYPE frac=frexp( v, &exp );

            // Accumulate the mantissa here
            unsigned long long mantissa=0;
            int currentbit=FRAC_BITS-1;
            int denormal=0;

            // Is the number in the denormalized area?
            if( exp<-EXP_BIAS ) {
                denormal=1;
                currentbit-=-(EXP_BIAS-1)-exp;

                if( currentbit<0 )
                    // We have an underflow here!
                    // XXX What to do?
                    return 1;

                res.fp.exp=0;
            } else {
                // We need a leading 1, we have a leading zero
                frac*=FLT_RADIX;
                frac-=1;
                exp--;

                res.fp.exp=exp+EXP_BIAS;
            }

            while(frac!=0 && currentbit>=0) {
                frac*=2; // We no longer work with the native radix

                if( frac>=1 ) {
                    frac-=1;
                    mantissa|=1ull<<currentbit;
                }

                currentbit--;
            }

            if( frac!=0 ) {
                // We failed to provide an accurate representation.
                // Round the result. Will the rounding overflow?
                if( mantissa==(1ull<<FRAC_BITS)-1 ) {
                    mantissa=0;
                    // XXX - bug here in case original is denormalized
                    res.fp.exp++;
                } else {
                    mantissa+=1;
                }
            }

            res.fp.frac=mantissa;

            (void)denormal; // Not used yet; see the XXX notes above
        }
        break;
    }

    {
        union fp old;
        old.flt=v;
        printf("Original: sign %d, exp %d, frac %014llx %.20g\n",
            (int)old.fp.sign, (int)old.fp.exp,
            (unsigned long long)old.fp.frac, old.flt );
        printf("New:      sign %d, exp %d, frac %014llx %.20g\n",
            (int)res.fp.sign, (int)res.fp.exp,
            (unsigned long long)res.fp.frac, res.flt );
    }

    return 0;
}
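To make the mixed-endian ("PDP-endian-like") case concrete: on the old
ARM FPA layout the two 32-bit halves of a double are stored high word
first, while the bytes inside each word stay little endian, so
producing a fully little-endian IEEE image takes a single word swap
rather than a byte reversal. A minimal sketch of that swap; the helper
name is invented and it is not part of the attachment above:

#include <string.h>

/* Hypothetical helper: undo an FPA-style mixed-endian double by
 * exchanging the two 32-bit halves of its 8-byte image. The bytes
 * inside each half are assumed to already be in the right order,
 * so no per-byte reversal is needed. */
static void swap_double_words(unsigned char img[8])
{
    unsigned char tmp[4];

    memcpy(tmp, img, 4);      /* save the first 32-bit word      */
    memcpy(img, img + 4, 4);  /* move the second word forward    */
    memcpy(img + 4, tmp, 4);  /* put the saved word at the end   */
}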
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Martijn van Oosterhout
Date:
On Tue, May 22, 2007 at 05:14:54PM +0300, Shachar Shemesh wrote:
> As an amusing side note, I have heard a claim that the only reason we
> need endianity at all is because the Europeans didn't understand that
> Arabic is written from right to left. In Arabic you read "17" as "seven
> and ten", which means that it is already little endian. Just one
> request, please don't quote this story without also mentioning that this
> story is wrong, and that 1234 is said, in Arabic, as "one thousand two
> hundred four and thirty".

For the record, Dutch works like that too, which leads to a fascinating
way of reading phone numbers. 345678 becomes: four and thirty, six and
fifty, eight and seventy. Takes a while to get used to that...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Tom Lane
Date:
Shachar Shemesh <shachar@shemesh.biz> writes:
> Tom Lane wrote:
>> I would concur with a patch that ensures that this is what happens
>> on the different ARM variants ... though I'll still be interested
>> to see how you make that happen given the rather poor visibility
>> into which model and endianness we are running on.
>
> You do it semantically. Attached is the outline for the code (I can
> form a patch only after we agree where it should go)

Cross-compile situations make life interesting. [ hold your nose before
reading further... ]

After studying how AC_C_BIGENDIAN does it, I propose that the best
answer might be to compile a test program that contains
carefully-chosen "double" constants, then grep the object file for the
expected patterns. This works as long as the compiler knows what format
it's supposed to emit (and if it doesn't, lots of other stuff will fall
over).

The only alternative that would work reliably is to run the test once
when the result is first needed, which is kind of unfortunate because
it involves continuing runtime overhead (at least a "switch" on every
conversion). We in fact did things that way for integer endianness a
while back, but since we are now depending on AC_C_BIGENDIAN to get it
right, I'd feel more comfortable using a similar solution for float
endianness.

			regards, tom lane
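To sketch what such a test file could look like (an illustration only,
with invented names; the actual constants may well be chosen
differently): a C99 hex-float literal can pin down a double whose
big-endian IEEE-754 image is the distinctive byte sequence
01 02 03 04 05 06 07 08, placed next to an ASCII sentinel that the
configure script can grep for in the object file.

/* Illustrative probe in the AC_C_BIGENDIAN style. The hex-float
 * literal encodes sign 0, biased exponent 0x010 and mantissa
 * 0x2030405060708, i.e. the big-endian image 01 02 .. 08. A script
 * can locate the sentinel in the object file and read off the byte
 * order the compiler emitted. External linkage keeps the constant
 * from being optimized away. */
const struct {
    char   sentinel[8]; /* "FPPROBE" plus NUL: the grep target      */
    double probe;       /* its stored byte order reveals the format */
} pg_float_order_probe = { "FPPROBE", 0x1.2030405060708p-1007 };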
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
"Andrej Ricnik-Bay"
Date:
On 5/23/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
> > As an amusing side note, I have heard a claim that the only reason we
> > need endianity at all is because the Europeans didn't understand that
> > Arabic is written from right to left. In Arabic you read "17" as "seven
> > and ten", which means that it is already little endian. Just one
> > request, please don't quote this story without also mentioning that this
> > story is wrong, and that 1234 is said, in Arabic, as "one thousand two
> > hundred four and thirty".
>
> For the record, Dutch works like that too,

Same for German and Slovene: "Ein tausend zwei hundert vier und
dreissig." "Tisoch dvesto shtiri in trideset." (Sorry, I can't
trivially produce the s and c with the hacek here, so I've replaced
them with sh and ch respectively...)

Cheers,
Andrej
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Please note - I'm not trying to pick up a fight.

Tom Lane wrote:
> Your source appears fairly ignorant of things-float.

That is possible, and even likely, however

> If they really are using decimal FP, it's easy to demonstrate that a
> lossless conversion to/from binary representation of similar size is
> impossible. The set of exactly representable values is simply
> different.

When I originally read this statement my initial response was *d'oh*.
After having had time to sleep on it, however, I'm no longer as certain
as I was.

Before you explode at me (again :), I'm not arguing that you can do
binary-based calculations of decimal numbers without having rounding
errors that come back to bite you. I know you can't. What I'm saying is
that we have two cases to consider. In one of them the above is
irrelevant, and in the other I'm not so sure it's true.

The first case to consider is that of the client getting a number from
the server and doing calculations on it. Since the client works in base
2, the inaccuracies are built into the model no matter what we do and
how we export the actual number. As such, I don't think we need worry
about it. If the client also works in base 10, see the second case.

The second case is of a number being exported from the server, stored
in binary (excuse the pun) format on the client, and then resent back
to the server, where it is translated from base 2 to base 10 again. You
will notice that no actual calculation is performed on the number while
in base 2. The only question is whether the number, when translated to
base 2 and then back to base 10, is guaranteed to maintain its original
value.

I don't have a definite answer to that, but I did calculate the
difference in representation. A 64-bit IEEE floating point number has 1
bit of sign, 52 bits of mantissa and 11 bits of exponent. The number
actually has 53 bits of mantissa for non-denormalized numbers, as there
is another implied "1" at the beginning. I'm going to assume, however,
that all binary numbers are denormalized, and only use 52.

I'm allowed to assume that for two reasons. The first is that it
decreases the accuracy of the base 2 representation, and thus makes my
own argument harder to prove. If I can prove it under this assumption,
it's obvious that it will still hold true with an extra bit of
accuracy. The second reason is that I don't see how we can have
"normalized" numbers under the base 10 representation. The assumed "1"
is there because a base 2 number must have a leading "1" somewhere, and
having it at the start gives the best accuracy. The moment the leading
digit can be 1-9, it is no longer possible to assume it. In other
words, I don't see how a base 10 representation can assume that bit,
and it is thus losing it. Since this assumption may be wrong, I am
"penalizing" the base 2 representation as well to compensate.

To recap, then: with base 2 we have 52 bits of mantissa, which gets us
4,503,599,627,370,496 (2^52) combinations. These have an effective
exponent range (not including denormalized numbers) of 2,048 different
combinations, which can get us (let's assume no fractions on both
bases) as high as 2^2048, or 616.51 decimal digits. With decimal
representation, each 4 bits are one digit, so the same 52 bits account
for 13 digits, giving 10,000,000,000,000 possible mantissas, with an
exponent range of 11 bits, but raised to the power of 10, resulting in
a range of 2048 decimal digits.
Of course, we have no use for such a huge exponent range with such a
small mantissa, so we are likely to move bits from the exponent to the
mantissa. Since we have no use for fractions of a decimal digit, we
will move the bits in multiples of 4.

I'm now going to make an absurd assumption: that we move 8 bits from
the exponent to the mantissa. This leaves us with only three bits of
exponent, which will only cover 8 decimal digits, but gives us 60 bits,
or 15 decimal digits, in the mantissa: a range of 1,000,000,000,000,000
numbers. Please note that the base 2 representation can still represent
4.5 times more mantissas using only 52 bits.

So what have we got so far? A 64-bit decimal-based floating point can
give up almost all of its exponent in order to create a mantissa that
has roughly the same range as the base 2 one, and still be outnumbered
by 2.17 bits' worth, ASSUMING WE DON'T USE THE IMPLIED BIT IN THE BASE
2 REPRESENTATION. Now, I suggest that even with "just" 2.17 bits extra,
the binary representation is accurate enough to hold the approximation
of the decimal number to such precision that the back and forth
translation will reliably produce the original number. Of course, if we
do use the extra bit, it's 3.17 bits extra.

If we give up not 8 but only 4 bits of the exponent, we have 6.49 bits
extra (5.49 with the above assumption), while having an exponent range
of only 128 decimal digits (as opposed to 616 with IEEE).

Now, I am by no means as knowledgeable about these things as Tom, so it
is possible that rounding considerations will STILL cause us to lose
precision. I'm just claiming that the safety margins we have are quite
wide.

Shachar
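The round-trip half of this argument is easy to try for the IEEE case.
A small sketch (assuming IEEE doubles and a C99 library; an
illustration, not anything from the thread's patches): <float.h>
guarantees that any decimal string of up to DBL_DIG (15 for IEEE
doubles) significant digits survives the decimal -> binary -> decimal
conversion unchanged.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <float.h>

int main(void)
{
    /* 15 significant decimal digits, the DBL_DIG guarantee */
    const char *original = "12345.6789012345";
    char back[32];

    double d = strtod(original, NULL);               /* decimal -> binary */
    snprintf(back, sizeof back, "%.*g", DBL_DIG, d); /* binary -> decimal */

    printf("%s -> %s: %s\n", original, back,
           strcmp(original, back) == 0 ? "round trip intact" : "LOST");

    return 0;
}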
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Tom Lane
Date:
Shachar Shemesh <shachar@shemesh.biz> writes:
> Before you explode at me (again :), I'm not arguing that you can do
> binary based calculations of decimal numbers without having rounding
> errors that come to bite you. I know you can't. What I'm saying is that
> we have two cases to consider. In one of them the above is irrelevant,
> and in the other I'm not so sure it's true.

You're setting up a straw-man argument, though. The real-world problem
cases here are not decimal, they are non-IEEE binary floating
arithmetic. The typical difference from IEEE is slightly different
tradeoffs in number of mantissa bits vs number of exponent bits within
a 32- or 64-bit value. I seem to recall also that there are machines
that treat the exponent as power-of-16 not power-of-2. So depending on
which way the tradeoffs went, the other format will have either more
precision or more range than IEEE.

			regards, tom lane
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Shachar Shemesh
Date:
Tom Lane wrote:
> Shachar Shemesh <shachar@shemesh.biz> writes:
>> Before you explode at me (again :), I'm not arguing that you can do
>> binary based calculations of decimal numbers without having rounding
>> errors that come to bite you. I know you can't. What I'm saying is that
>> we have two cases to consider. In one of them the above is irrelevant,
>> and in the other I'm not so sure it's true.
>
> You're setting up a straw-man argument, though.

I was answering your claim that it's impossible to convert decimal to
binary-based floats without rounding errors.

> The real-world problem cases here are not decimal, they are non-IEEE
> binary floating arithmetic. The typical difference from IEEE is
> slightly different tradeoffs in number of mantissa bits vs number of
> exponent bits within a 32- or 64-bit value.

I answered that elsewhere while suggesting a different format that
would address it. These numbers do not appear to be a concern in our
situation, however.

> I seem to recall also that there are machines that treat the exponent
> as power-of-16 not power-of-2.

I'm pretty sure I don't understand this. Maybe I misunderstood the
format, but wouldn't that actually lose you precision with, at most, a
marginal gain in range? As far as I can see, the moment you no longer
work in base 2 you lose the implicit bit, which means you have a one
bit worse starting point than base 2 notations (all numbers are
denormalized).

> So depending on which way the tradeoffs went, the other format will
> have either more precision or more range than IEEE.

Again, should that be a real concern, see my message at
http://archives.postgresql.org/pgsql-hackers/2007-05/msg00892.php for
details about what the suggestion actually is. Just be sure to read
"IEEE" there as meaning "IEEE-like"; I allowed different sizes for the
fields.

Shachar
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Tom Lane
Date:
Shachar Shemesh <shachar@shemesh.biz> writes:
> Tom Lane wrote:
>> I seem to recall also that there are machines that
>> treat the exponent as power-of-16 not power-of-2.
>
> I'm pretty sure I don't understand this. Maybe I misunderstood the
> format, but wouldn't that actually lose you precision with, at most,
> marginal gain in range?

I didn't say it was *better*, only that it was *different*. Some
marginal googling suggests that only IBM made this choice, and that
they now (thirty-some years later) concede the IEEE format is better.
Still, if we're running on s/390 hardware ... or ARM, or whatever ...
it behooves us not to lose data. I'm fine with doing byte-swapping to
present a uniform face for platforms that are at bottom implementing
the same FP format; but I've got a problem with trying to force
noncompatible platforms into that format.

			regards, tom lane
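For reference, the IBM System/360 "hex float" layout under discussion
keeps a sign bit, a 7-bit excess-64 exponent that scales by powers of
16, and a 56-bit fraction with no implied leading bit; normalization
only guarantees a nonzero leading hex digit, so up to three leading
fraction bits may be zero, which is exactly the precision loss
mentioned above. A rough decoding sketch (the function name is
invented, and the cast to double can itself round, so this is
illustrative rather than lossless):

#include <stdint.h>
#include <math.h>

/* Decode a big-endian 64-bit IBM hex-float image:
 *   value = (-1)^sign * (frac / 2^56) * 16^(exp - 64)
 * The 56-bit fraction is wider than a double's 53-bit mantissa,
 * so the conversion below may round; fine for illustration, not
 * for a server. */
static double ibm_hexfloat_to_double(uint64_t img)
{
    int      sign = (int)(img >> 63);
    int      exp  = (int)((img >> 56) & 0x7f) - 64; /* excess-64, base 16 */
    uint64_t frac = img & 0x00ffffffffffffffULL;    /* 56 fraction bits   */

    double v = ldexp((double)frac, 4 * exp - 56);
    return sign ? -v : v;
}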
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
From
Peter Eisentraut
Date:
On Tuesday, 22 May 2007 05:58, Tom Lane wrote:
> Okay, I spent some time googling this question, and I can't find any
> suggestion that any ARM variant uses non-IEEE-compliant float format.

Some news I'm picking up at DebConf is that the existing Debian "arm"
port will be replaced by a new "armel" port with a new ABI that has,
among other things, the effect that the double format ceases to be
"completely different from everyone else". So the problem under
discussion here will probably go away soon.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/