Thread: float output precision questions
Hi All, I wrote a Matlab interface to PostgreSQL based on libpq. It is working fine, supports multiple connections, supports all data types and arrays. It is a C program to do the interface and some Matlab wrapper functions around it to do the job at application level. Matlab has an ODBC toolbox but we dont want to buy it since the interface does the job and we have been using PostgreSQL for long time. We want to use PostgreSQl to store numeric data from simulation, computation and data acquisition sources. Basically a big bunch of float numbers. There is still one problem, regarding float output formatting in querys and dumps, which essential for this type of application. If I have a float8 field (named real8 below) in a table and insert the following, insert into test(real8) values (4503599627370496); INSERT 21192 1 A select produces, select real8 from test; real8 --------------------- 4.5035996273705e+15 (1 row) This is the string I would get from libpq's PQgetvalue(). PQftype() correctly returns float8. pg_dump produces the same result. If I convert PQgetvalue() to a C double I wont get the same number I inserted. If I do a restore from a dump I also wont have the same number. I can see that the original number is correctly stored because, select to_char(real8,'9999999999999999999.99999') from test; to_char ---------------------- 4503599627370496 (1 row) This way PQftype wont tell this is a float8. Is there a way to set query output precision to maximum precision ? For the type of application I mentioned this is crucial. People want to get the 'same' numbers, from querys or dumps, as they inserted them. Matlab has a toolbox fucntion, claiming maximum precision, to convert from its double type (PostgreSQL float8) to string which does a sprintf('%25.18g',number). Would it be possible to have a a parameter which could be changed by a SET command, in order to control output precision ? I searched the docs but could not find this. -- ---------------------------------------------------------------------- Pedro Miguel Frazao Fernandes Ferreira Universidade do Algarve Faculdade de Ciencias e Tecnologia Campus de Gambelas 8000-117 Faro Portugal Tel./Fax: (+351) 289 800950 / 289 819403 http://w3.ualg.pt/~pfrazao
Sorry. I forgot to thank for any help from all of you in the previous message. Thanks! :) Just one more thing: I now I can go to the source and change the code which converts floats to strings, and have my problem solved. But this wont be general. Others might need this kind of application. I could post this interface to postgres interfaces site. Do you think this is ok ? Thanks again ! Pedro M. Ferreira Pedro Miguel Frazao Fernandes Ferreira wrote: > Hi All, > > I wrote a Matlab interface to PostgreSQL based on libpq. It is working > fine, supports multiple connections, supports all data types and arrays. > It is a C program to do the interface and some Matlab wrapper functions > around it to do the job at application level. > > Matlab has an ODBC toolbox but we dont want to buy it since the > interface does the job and we have been using PostgreSQL for long time. > We want to use PostgreSQl to store numeric data from simulation, > computation and data acquisition sources. Basically a big bunch of float > numbers. > > There is still one problem, regarding float output formatting in querys > and dumps, which essential for this type of application. > > If I have a float8 field (named real8 below) in a table and insert the > following, > > insert into test(real8) values (4503599627370496); > INSERT 21192 1 > > A select produces, > > select real8 from test; > real8 > --------------------- > 4.5035996273705e+15 > (1 row) > > This is the string I would get from libpq's PQgetvalue(). PQftype() > correctly returns float8. pg_dump produces the same result. If I convert > PQgetvalue() to a C double I wont get the same number I inserted. If I > do a restore from a dump I also wont have the same number. I can see > that the original number is correctly stored because, > > select to_char(real8,'9999999999999999999.99999') from test; > to_char > ---------------------- > 4503599627370496 > (1 row) > > This way PQftype wont tell this is a float8. > > Is there a way to set query output precision to maximum precision ? > For the type of application I mentioned this is crucial. People want to > get the 'same' numbers, from querys or dumps, as they inserted them. > > Matlab has a toolbox fucntion, claiming maximum precision, to convert > from its double type (PostgreSQL float8) to string which does a > sprintf('%25.18g',number). > > Would it be possible to have a a parameter which could be changed by a > SET command, in order to control output precision ? I searched the docs > but could not find this. > -- ---------------------------------------------------------------------- Pedro Miguel Frazao Fernandes Ferreira Universidade do Algarve Faculdade de Ciencias e Tecnologia Campus de Gambelas 8000-117 Faro Portugal Tel./Fax: (+351) 289 800950 / 289 819403 http://w3.ualg.pt/~pfrazao
TODO has: o Add SET REAL_FORMAT and SET DOUBLE_PRECISION_FORMAT using printf args so we have not implemented it yet. --------------------------------------------------------------------------- Pedro Miguel Frazao Fernandes Ferreira wrote: > Hi All, > > I wrote a Matlab interface to PostgreSQL based on libpq. It is working > fine, supports multiple connections, supports all data types and arrays. > It is a C program to do the interface and some Matlab wrapper functions > around it to do the job at application level. > > Matlab has an ODBC toolbox but we dont want to buy it since the > interface does the job and we have been using PostgreSQL for long time. > We want to use PostgreSQl to store numeric data from simulation, > computation and data acquisition sources. Basically a big bunch of float > numbers. > > There is still one problem, regarding float output formatting in querys > and dumps, which essential for this type of application. > > If I have a float8 field (named real8 below) in a table and insert the > following, > > insert into test(real8) values (4503599627370496); > INSERT 21192 1 > > A select produces, > > select real8 from test; > real8 > --------------------- > 4.5035996273705e+15 > (1 row) > > This is the string I would get from libpq's PQgetvalue(). PQftype() > correctly returns float8. pg_dump produces the same result. If I convert > PQgetvalue() to a C double I wont get the same number I inserted. If I > do a restore from a dump I also wont have the same number. I can see > that the original number is correctly stored because, > > select to_char(real8,'9999999999999999999.99999') from test; > to_char > ---------------------- > 4503599627370496 > (1 row) > > This way PQftype wont tell this is a float8. > > Is there a way to set query output precision to maximum precision ? > For the type of application I mentioned this is crucial. People want to > get the 'same' numbers, from querys or dumps, as they inserted them. > > Matlab has a toolbox fucntion, claiming maximum precision, to convert > from its double type (PostgreSQL float8) to string which does a > sprintf('%25.18g',number). > > Would it be possible to have a a parameter which could be changed by a > SET command, in order to control output precision ? I searched the docs > but could not find this. > > -- > ---------------------------------------------------------------------- > Pedro Miguel Frazao Fernandes Ferreira > Universidade do Algarve > Faculdade de Ciencias e Tecnologia > Campus de Gambelas > 8000-117 Faro > Portugal > Tel./Fax: (+351) 289 800950 / 289 819403 > http://w3.ualg.pt/~pfrazao > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Pedro Miguel Frazao Fernandes Ferreira writes: > Is there a way to set query output precision to maximum precision ? > For the type of application I mentioned this is crucial. People want to > get the 'same' numbers, from querys or dumps, as they inserted them. There isn't a way right now, but it's planned to be able to dump floating-point numbers in some binary form (like printf("%A")) to be able to restore them exactly. Not sure how that would satisfy the needs of client interfaces, though. > Matlab has a toolbox fucntion, claiming maximum precision, to convert > from its double type (PostgreSQL float8) to string which does a > sprintf('%25.18g',number). Do we have some mathematical guarantee that this is sufficient and necessary? If so, then it might do. -- Peter Eisentraut peter_e@gmx.net
Bruce Momjian <pgman@candle.pha.pa.us> writes: > TODO has: > o Add SET REAL_FORMAT and SET DOUBLE_PRECISION_FORMAT > using printf args > so we have not implemented it yet. IIRC, the last time it was discussed there was disagreement about how it should work; check the pghackers archives for details. The feature probably won't go anywhere until those issues are resolved. regards, tom lane
On Tue, 29 Oct 2002, Peter Eisentraut wrote: > Pedro Miguel Frazao Fernandes Ferreira writes: > > > Is there a way to set query output precision to maximum precision ? > > For the type of application I mentioned this is crucial. People want to > > get the 'same' numbers, from querys or dumps, as they inserted them. How do you define maximum precision and "same"? With simple test programs in C, using two digits more than DBL_DIG for printf specifier, it's easy for me to find numbers that "change" decimal string representation in the decimal representation -> double -> decimal representation conversion(*). The final double you get from the second conversion should be the same as the first, but is that what you need or do you need a stronger guarantee than that?
On Tue, Oct 29, 2002 at 23:19:05 +0100, Peter Eisentraut <peter_e@gmx.net> wrote: > > There isn't a way right now, but it's planned to be able to dump > floating-point numbers in some binary form (like printf("%A")) to be able > to restore them exactly. Not sure how that would satisfy the needs of > client interfaces, though. Why not print it as a floating binary number instead of a floating decimal number? I would think that would give you better portability than a system specific binary representation.
Peter Eisentraut wrote: > >>Matlab has a toolbox fucntion, claiming maximum precision, to convert >>from its double type (PostgreSQL float8) to string which does a >>sprintf('%25.18g',number). > > > Do we have some mathematical guarantee that this is sufficient and > necessary? If so, then it might do. It is necessary if you want to do this type of (huge amount of number storage) application: [Some client] (insert) [PostgreSQL] (query) [Some client] (double number a)-------->(float8 number)------->(double number b) In order for a=b this is necessary. With current float8 output this is not allways true. Here is the help for that particular Matlab function: NUM2MSTR Convert number to string in maximum precision. S = NUM2MSTR(N) converts real numbers of input matrix N tostring output vector S, in maximum precision. See also NUM2STR. If you want I can try to contact the guys who coded this function to know if this is sufficient. Thanks, Pedro M. Ferreira -- ---------------------------------------------------------------------- Pedro Miguel Frazao Fernandes Ferreira Universidade do Algarve Faculdade de Ciencias e Tecnologia Campus de Gambelas 8000-117 Faro Portugal Tel./Fax: (+351) 289 800950 / 289 819403 http://w3.ualg.pt/~pfrazao
Stephan Szabo wrote: > On Tue, 29 Oct 2002, Peter Eisentraut wrote: > > >>Pedro Miguel Frazao Fernandes Ferreira writes: >> >> >>>Is there a way to set query output precision to maximum precision ? >>>For the type of application I mentioned this is crucial. People want to >>>get the 'same' numbers, from querys or dumps, as they inserted them. >> > > How do you define maximum precision and "same"? With simple test programs > in C, using two digits more than DBL_DIG for printf specifier, it's easy > for me to find numbers that "change" decimal string representation in the > decimal representation -> double -> decimal representation conversion(*). > The final double you get from the second conversion should be the same as > the first, but is that what you need or do you need a stronger guarantee > than that? When I say "same" I am talking about having a number 'stored' in double format in some client, inserting it in PostgreSQL float8 field and get it to the client as it was before: [Some client] (insert) [PostgreSQL] (query) [Some client] (double number a)-------->(float8 number)------->(double number b) "same" is so that a==b is true. With current float8 output this is not allways true. I believe this should allways be true for numbers which are originally stored in double format. Thanks, Pedro M. Ferreira > > > > -- ---------------------------------------------------------------------- Pedro Miguel Frazao Fernandes Ferreira Universidade do Algarve Faculdade de Ciencias e Tecnologia Campus de Gambelas 8000-117 Faro Portugal Tel./Fax: (+351) 289 800950 / 289 819403 http://w3.ualg.pt/~pfrazao
Bruno Wolff III wrote: > On Tue, Oct 29, 2002 at 23:19:05 +0100, > Peter Eisentraut <peter_e@gmx.net> wrote: > >>There isn't a way right now, but it's planned to be able to dump >>floating-point numbers in some binary form (like printf("%A")) to be able >>to restore them exactly. Not sure how that would satisfy the needs of >>client interfaces, though. > > > Why not print it as a floating binary number instead of a floating decimal > number? I would think that would give you better portability than a system > specific binary representation. Having a way to get the binary storage representation for float numbers would be good and would suffice within the same float number standard. Thanks, Pedro M. Ferreira -- ---------------------------------------------------------------------- Pedro Miguel Frazao Fernandes Ferreira Universidade do Algarve Faculdade de Ciencias e Tecnologia Campus de Gambelas 8000-117 Faro Portugal Tel./Fax: (+351) 289 800950 / 289 819403 http://w3.ualg.pt/~pfrazao
On Wed, 30 Oct 2002, Pedro Miguel Frazao Fernandes Ferreira wrote: > Stephan Szabo wrote: > > On Tue, 29 Oct 2002, Peter Eisentraut wrote: > > > > > >>Pedro Miguel Frazao Fernandes Ferreira writes: > >> > >> > >>>Is there a way to set query output precision to maximum precision ? > >>>For the type of application I mentioned this is crucial. People want to > >>>get the 'same' numbers, from querys or dumps, as they inserted them. > >> > > > > How do you define maximum precision and "same"? With simple test programs > > in C, using two digits more than DBL_DIG for printf specifier, it's easy > > for me to find numbers that "change" decimal string representation in the > > decimal representation -> double -> decimal representation conversion(*). > > The final double you get from the second conversion should be the same as > > the first, but is that what you need or do you need a stronger guarantee > > than that? > > When I say "same" I am talking about having a number 'stored' in double > format in some client, inserting it in PostgreSQL float8 field and get > it to the client as it was before: > > [Some client] (insert) [PostgreSQL] (query) [Some client] > (double number a)-------->(float8 number)------->(double number b) > > "same" is so that a==b is true. > With current float8 output this is not allways true. > > I believe this should allways be true for numbers which are originally > stored in double format. The problem is that there are two competing needs here. One is the above, the other other is that you get something that has the same decimal representation (within the float's ability to store the number). Right now the system does the latter since for most people, that seems to be the guarantee they want. This would probably make sense as an option, so why don't you look at the past discussions and see if you can come up with a solution that keeps everyone happy (and preferably implement it, but...) :)
Stephan Szabo wrote: > On Wed, 30 Oct 2002, Pedro Miguel Frazao Fernandes Ferreira wrote: >>Stephan Szabo wrote: >>>On Tue, 29 Oct 2002, Peter Eisentraut wrote: >>>>Pedro Miguel Frazao Fernandes Ferreira writes: >>>>>Is there a way to set query output precision to maximum precision ? >>>>>For the type of application I mentioned this is crucial. People want to >>>>>get the 'same' numbers, from querys or dumps, as they inserted them. >>>> >>>How do you define maximum precision and "same"? With simple test programs >>>in C, using two digits more than DBL_DIG for printf specifier, it's easy >>>for me to find numbers that "change" decimal string representation in the >>>decimal representation -> double -> decimal representation conversion(*). >>>The final double you get from the second conversion should be the same as >>>the first, but is that what you need or do you need a stronger guarantee >>>than that? >> >>When I say "same" I am talking about having a number 'stored' in double >>format in some client, inserting it in PostgreSQL float8 field and get >>it to the client as it was before: >> >> [Some client] (insert) [PostgreSQL] (query) [Some client] >>(double number a)-------->(float8 number)------->(double number b) >> >>"same" is so that a==b is true. >>With current float8 output this is not allways true. >> >>I believe this should allways be true for numbers which are originally >>stored in double format. > > The problem is that there are two competing needs here. One is the above, > the other other is that you get something that has the same decimal > representation (within the float's ability to store the number). Right now > the system does the latter since for most people, that seems to be the > guarantee they want. Look at this example: 1.79769313486231571e+308 is the largest floating point number representable by a C double in x86. In C this is possible: #include <stdio.h> #include <stdlib.h> int main(void) { double v; char a[30]; v=1.79769313486231571e+308; printf(" Stored double number: %25.18g\n",v); sprintf(a,"%25.18g",v); printf(" Converted to string:%s\n",a); v=atof(a); printf("Converted from string to double: %25.18g\n",v); } Using standard PostgreSQL query output it would not be possible to get this number, which has representation as a double. I fetched the PostgreSQL source from Debian, changed src/backend/utils/adt/float.c to do sprintf(ascii, "%25.18g", num) instead of sprintf(ascii, "%.*g", DBL_DIG, num), compiled and installed. Now I can get the number as is. I understand that if people insert a value of 1.1 in a double, they want to get 1.1 without knowing that in fact the stored number is 1.10000000000000009. But do you understand that if some people insert, for example, a value of 1.79769313486231571e+308 they dont want to get 1.79769313486232e+308 which does not compare equal (in Matlab or C) to the first ? This is a bug. > This would probably make sense as an option, so why don't you look at the > past discussions and see if you can come up with a solution that keeps > everyone happy (and preferably implement it, but...) :) but ??? I have a sugestion: To have parameters, say DOUBLE_FORMAT and FLOAT_FORMAT, which could have option values of 'SHORT' and 'LONG'. Option 'SHORT' would be default and produce the standard sprintf(ascii,... Option 'LONG' would produce sprintf(ascii, "%25.18g", num). Other way would be to have number parameters to be used in the sprintf calls, in place of 25 and 18, in the format string. > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > > -- ---------------------------------------------------------------------- Pedro Miguel Frazao Fernandes Ferreira Universidade do Algarve Faculdade de Ciencias e Tecnologia Campus de Gambelas 8000-117 Faro Portugal Tel./Fax: (+351) 289 800950 / 289 819403 http://w3.ualg.pt/~pfrazao
Bruno Wolff III writes: > On Tue, Oct 29, 2002 at 23:19:05 +0100, > Peter Eisentraut <peter_e@gmx.net> wrote: > > > > There isn't a way right now, but it's planned to be able to dump > > floating-point numbers in some binary form (like printf("%A")) to be able > > to restore them exactly. Not sure how that would satisfy the needs of > > client interfaces, though. > > Why not print it as a floating binary number instead of a floating decimal > number? I would think that would give you better portability than a system > specific binary representation. The printf("%A") output is not system-specific. -- Peter Eisentraut peter_e@gmx.net
On Wed, 30 Oct 2002, Pedro Miguel Frazao Fernandes Ferreira wrote: > In C this is possible: > > #include <stdio.h> > #include <stdlib.h> > > int main(void) > { > double v; > char a[30]; > > v=1.79769313486231571e+308; > > printf(" Stored double number: %25.18g\n",v); > sprintf(a,"%25.18g",v); > printf(" Converted to string: %s\n",a); > v=atof(a); > printf("Converted from string to double: %25.18g\n",v); > } AFAICT, this is not guaranteed to give you the same representation that you're using in the initializer however. > Using standard PostgreSQL query output it would not be possible to get > this number, which has representation as a double. > I fetched the PostgreSQL source from Debian, changed > src/backend/utils/adt/float.c to do sprintf(ascii, "%25.18g", num) > instead of sprintf(ascii, "%.*g", DBL_DIG, num), compiled and installed. > Now I can get the number as is. > > I understand that if people insert a value of 1.1 in a double, they want > to get 1.1 without knowing that in fact the stored number is > 1.10000000000000009. But do you understand that if some people insert, > for example, a value of 1.79769313486231571e+308 they dont want to get > 1.79769313486232e+308 which does not compare equal (in Matlab or C) to > the first ? This is a bug. I disagree to some extent. I'm not sure it's meaningful to expect that (what if the database and the client are on different architectures) in general. In any case, you're effectively going from decimal representation to double to decimal representation (the string you used to insert it -> internal representation -> string used to output it) and that's only guaranteed to be correct up to DBL_DIG digits as far as I can tell. I think it'd be nice to have an option to get more digits for those sorts of applications, however. > > This would probably make sense as an option, so why don't you look at the > > past discussions and see if you can come up with a solution that keeps > > everyone happy (and preferably implement it, but...) :) > > but ??? , but I realize that you might not be interested in doing such. (I figured the last part was implied) > I have a sugestion: > > To have parameters, say DOUBLE_FORMAT and FLOAT_FORMAT, which could have > option values of 'SHORT' and 'LONG'. > Option 'SHORT' would be default and produce the standard sprintf(ascii,... > Option 'LONG' would produce sprintf(ascii, "%25.18g", num). > > Other way would be to have number parameters to be used in the sprintf > calls, in place of 25 and 18, in the format string. From what Tom said, something similar was suggested and there were issues brought up. I don't know what they were, since I wasn't personally terribly interested, but it should be in the archives. If there were any concerns, you'll probably need to deal with those as well.
On Wed, Oct 30, 2002 at 19:27:57 +0100, Peter Eisentraut <peter_e@gmx.net> wrote: > > The printf("%A") output is not system-specific. Just out of curiosity, can you tell me a web page or keywords to use in a search to see what that format does? I tried using google, but searching for printf with and "A" format didn't go to well. I only found one correct reference and it didn't explain what %A did. Thanks.
Stephan Szabo wrote: > On Wed, 30 Oct 2002, Pedro Miguel Frazao Fernandes Ferreira wrote: >> >>I understand that if people insert a value of 1.1 in a double, they want >>to get 1.1 without knowing that in fact the stored number is >>1.10000000000000009. But do you understand that if some people insert, >>for example, a value of 1.79769313486231571e+308 they dont want to get >>1.79769313486232e+308 which does not compare equal (in Matlab or C) to >>the first ? This is a bug. > > > I disagree to some extent. I'm not sure it's meaningful to expect that > (what if the database and the client are on different architectures) in > general. In any case, you're effectively going from decimal > representation to double to decimal representation (the string you used > to insert it -> internal representation -> string used to output it) and > that's only guaranteed to be correct up to DBL_DIG digits as far as I can > tell. I think it'd be nice to have an option to get more digits for those > sorts of applications, however. In the previous email example, in C, I was going from decimal to double and so on, but this is not the case when I do some simulation. In this case it will allways be from Matlab double to PostgreSQL float8 and from libpq PQgetvalue() string to Matlab double. The example was just a x86 number example where (got the string from Matlab double) query output would fail. >>>This would probably make sense as an option, so why don't you look at the >>>past discussions and see if you can come up with a solution that keeps >>>everyone happy (and preferably implement it, but...) :) >> >>but ??? > , but I realize that you might not be interested in doing such. (I figured > the last part was implied) ok! :) No problem. I have seen the GUC thing in the source (guc.c etc...) and it does not look too dificult. It has lots of examples in the code itself. What I am saying is that I can do it if pg-people agree on the (some) way to do it. >>To have parameters, say DOUBLE_FORMAT and FLOAT_FORMAT, which could have >> option values of 'SHORT' and 'LONG'. >>Option 'SHORT' would be default and produce the standard sprintf(ascii,... >>Option 'LONG' would produce sprintf(ascii, "%25.18g", num). >> >>Other way would be to have number parameters to be used in the sprintf >>calls, in place of 25 and 18, in the format string. > > >>From what Tom said, something similar was suggested and there were issues > brought up. I don't know what they were, since I wasn't personally > terribly interested, but it should be in the archives. If there were any > concerns, you'll probably need to deal with those as well. I looked at some of these emails and it seemed to me that the problem was that Tom did'nt want a parameter that would force people to know about printf number formatting. I think the first solution above (the SHORT and LONG way) is simple, maintains usual output as default and enables 'maximum' precision at request. > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > > -- ---------------------------------------------------------------------- Pedro Miguel Frazao Fernandes Ferreira Universidade do Algarve Faculdade de Ciencias e Tecnologia Campus de Gambelas 8000-117 Faro Portugal Tel./Fax: (+351) 289 800950 / 289 819403 http://w3.ualg.pt/~pfrazao
On Wed, 30 Oct 2002, Pedro M. Ferreira wrote: > Stephan Szabo wrote: > > On Wed, 30 Oct 2002, Pedro Miguel Frazao Fernandes Ferreira wrote: > >> > >>I understand that if people insert a value of 1.1 in a double, they want > >>to get 1.1 without knowing that in fact the stored number is > >>1.10000000000000009. But do you understand that if some people insert, > >>for example, a value of 1.79769313486231571e+308 they dont want to get > >>1.79769313486232e+308 which does not compare equal (in Matlab or C) to > >>the first ? This is a bug. > > > > > > I disagree to some extent. I'm not sure it's meaningful to expect that > > (what if the database and the client are on different architectures) in > > general. In any case, you're effectively going from decimal > > representation to double to decimal representation (the string you used > > to insert it -> internal representation -> string used to output it) and > > that's only guaranteed to be correct up to DBL_DIG digits as far as I can > > tell. I think it'd be nice to have an option to get more digits for those > > sorts of applications, however. > > In the previous email example, in C, I was going from decimal to double > and so on, but this is not the case when I do some simulation. In this > case it will allways be from Matlab double to PostgreSQL float8 and from > libpq PQgetvalue() string to Matlab double. The example was just a x86 > number example where (got the string from Matlab double) query output > would fail. I was just responding to it being a bug. I don't think that expecting a float->db->float (double->db->double) giving the same double is always safe when you assume that the PostgreSQL machine might be on a system with different guarantees about precision. In practice, it's probably not a big deal. > > , but I realize that you might not be interested in doing such. (I figured > > the last part was implied) > > ok! :) > > No problem. I have seen the GUC thing in the source (guc.c etc...) and it > does not look too dificult. It has lots of examples in the code itself. > What I am saying is that I can do it if pg-people agree on the (some) > way to do it. Yeah, I didn't think it'd be hard, but sometimes people are unable or unwilling to do C code for things. > I looked at some of these emails and it seemed to me that the problem > was that Tom did'nt want a parameter that would force people to know > about printf number formatting. I think the first solution above (the > SHORT and LONG way) is simple, maintains usual output as default and > enables 'maximum' precision at request. That seems reasonable then, Tom'll probably give any other objections he might have if he has any.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > On Wed, 30 Oct 2002, Pedro M. Ferreira wrote: >> I looked at some of these emails and it seemed to me that the problem >> was that Tom did'nt want a parameter that would force people to know >> about printf number formatting. I think the first solution above (the >> SHORT and LONG way) is simple, maintains usual output as default and >> enables 'maximum' precision at request. > That seems reasonable then, Tom'll probably give any other objections he > might have if he has any. My recollection is that other people (perhaps Peter?) were the ones objecting before. However I'd be somewhat unhappy with the proposal as given: >>Option 'SHORT' would be default and produce the standard sprintf(ascii,... >>Option 'LONG' would produce sprintf(ascii, "%25.18g", num). since this seems to me to hardwire inappropriate assumptions about the number of significant digits in a double. (Yes, I know practically everyone uses IEEE floats these days. But it's inappropriate for PG to assume that.) AFAICT the real issue here is that binary float representations will have a fractional decimal digit of precision beyond what DBL_DIG claims. I think I could support adding an option that switches between the current output format:sprintf(ascii, "%.*g", DBL_DIG, num); and:sprintf(ascii, "%.*g", DBL_DIG+1, num); and similarly for float4. Given carefully written float I/O routines, reading the latter output should reproduce the originally stored value. (And if the I/O routines are not carefully written, you probably lose anyway.) I don't see a need for allowing more flexibility than that. Comments? regards, tom lane
On Wed, 30 Oct 2002, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > On Wed, 30 Oct 2002, Pedro M. Ferreira wrote: > >> I looked at some of these emails and it seemed to me that the problem > >> was that Tom did'nt want a parameter that would force people to know > >> about printf number formatting. I think the first solution above (the > >> SHORT and LONG way) is simple, maintains usual output as default and > >> enables 'maximum' precision at request. > > > That seems reasonable then, Tom'll probably give any other objections he > > might have if he has any. > > My recollection is that other people (perhaps Peter?) were the ones > objecting before. However I'd be somewhat unhappy with the proposal > as given: > > >>Option 'SHORT' would be default and produce the standard sprintf(ascii,... > >>Option 'LONG' would produce sprintf(ascii, "%25.18g", num). > > since this seems to me to hardwire inappropriate assumptions about the > number of significant digits in a double. (Yes, I know practically > everyone uses IEEE floats these days. But it's inappropriate for PG > to assume that.) True (which I actually was trying to get at in my messages as well). I'll admit to having not read the precise proposal. It's really pretty outside what I work with in any case. > AFAICT the real issue here is that binary float representations will > have a fractional decimal digit of precision beyond what DBL_DIG claims. > I think I could support adding an option that switches between the > current output format: > sprintf(ascii, "%.*g", DBL_DIG, num); > and: > sprintf(ascii, "%.*g", DBL_DIG+1, num); > and similarly for float4. Given carefully written float I/O routines, > reading the latter output should reproduce the originally stored value. > (And if the I/O routines are not carefully written, you probably lose > anyway.) I don't see a need for allowing more flexibility than that. Well, on my system, it doesn't look like doing the above sprintfs will actually work for all numbers. I did a simple program using an arbitrary big number and the DBL_DIG+1 output when stuck into another double actually was a different double value. DBL_DIG+2 worked on my system, but...
Tom Lane wrote: > My recollection is that other people (perhaps Peter?) were the ones > objecting before. However I'd be somewhat unhappy with the proposal > as given: > >>>Option 'SHORT' would be default and produce the standard sprintf(ascii,... >>>Option 'LONG' would produce sprintf(ascii, "%25.18g", num).> > since this seems to me to hardwire inappropriate assumptions about the > number of significant digits in a double. (Yes, I know practically > everyone uses IEEE floats these days. But it's inappropriate for PG > to assume that.) I understand this. Unfortunately I only have IEEE compliant stuff. > AFAICT the real issue here is that binary float representations will > have a fractional decimal digit of precision beyond what DBL_DIG claims. In fact, for some numbers I have been testing with, the double representation can distinguish up to DBL_BIG+2. > I think I could support adding an option that switches between the > current output format: > sprintf(ascii, "%.*g", DBL_DIG, num); > and: > sprintf(ascii, "%.*g", DBL_DIG+1, num); Easy to find numbers with double representation which would need DBL_BIG+2. > and similarly for float4. Given carefully written float I/O routines, > reading the latter output should reproduce the originally stored value. For some numbers it does not. Not true as I said above. > (And if the I/O routines are not carefully written, you probably lose > anyway.) I don't see a need for allowing more flexibility than that. Tests like a==b will fail for some numbers with DBL_BIG+1. Its like I said before, the guys from matlab (in x86 IEEE float) go to DBL_BIG+3 to have 'maximum precision'. > > Comments? Yes. I think there are several options. I checked the sprintf(ascii, "%A", num) output format and all the numbers that would fail because of DBL_DIG=15 are ok. After insertion on a table and conversion to double after a query, comparison a==b holds. AFAICT "%A" is system independent. I would (if I may) propose the following: Have two parameters, say DOUBLE_OUTPUT and EXTRA_DIGITS. DOUBLE_OUTPUT would select from decimal output or normalized output. EXTRA_DIGITS would add the required extra digits, from 0 (default) to 3, when output is decimal. EXTRA_DIGITS: in the range [0:3]. 0 as defualt. DOUBLE_OUTPUT: 'DECIMAL': sprintf(ascii, "%.*g", DBL_DIG+EXTRA_DIGITS, num); (default) 'NORMALIZED': sprintf(ascii, "%A", num); The same could be done for floats (float4). This way PG does not assume anything (DOUBLE_OUTPUT as 'NORMALIZED'), it does not hardwire 'inappropriate' assumptions about the number of significant digits in a double (default EXTRA_DIGITS=0), and it gives flexibility (EXTRA_DIGITS!=0) if needed. I think this is functional and reasonable. Regards, Pedro M. Ferreira > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > > -- ---------------------------------------------------------------------- Pedro Miguel Frazao Fernandes Ferreira Universidade do Algarve Faculdade de Ciencias e Tecnologia Campus de Gambelas 8000-117 Faro Portugal Tel./Fax: (+351) 289 800950 / 289 819403 http://w3.ualg.pt/~pfrazao
Just one more note, Maybe it makes sense that in the proposal below the parameter EXTRA_DIGITS could be SIGNIFICANT_DIGITS with a default value of 15 and maximum 18. Its more 'documentable' and maybe easy to understand in general. Pedro M. Ferreira wrote: > Yes. I think there are several options. > I checked the sprintf(ascii, "%A", num) output format and all the > numbers that would fail because of DBL_DIG=15 are ok. After insertion on > a table and conversion to double after a query, comparison a==b holds. > AFAICT "%A" is system independent. > > I would (if I may) propose the following: > > Have two parameters, say DOUBLE_OUTPUT and EXTRA_DIGITS. DOUBLE_OUTPUT > would select from decimal output or normalized output. EXTRA_DIGITS > would add the required extra digits, from 0 (default) to 3, when output > is decimal. > > EXTRA_DIGITS: > in the range [0:3]. 0 as defualt. > > DOUBLE_OUTPUT: > > 'DECIMAL': sprintf(ascii, "%.*g", DBL_DIG+EXTRA_DIGITS, num); (default) > 'NORMALIZED': sprintf(ascii, "%A", num); > > The same could be done for floats (float4). > > This way PG does not assume anything (DOUBLE_OUTPUT as 'NORMALIZED'), it > does not hardwire 'inappropriate' assumptions about the number of > significant digits in a double (default EXTRA_DIGITS=0), and it gives > flexibility (EXTRA_DIGITS!=0) if needed. > I think this is functional and reasonable. > > Regards, > Pedro M. Ferreira > >> >> regards, tom lane >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 2: you can get off all lists at once with the unregister command >> (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >> >> > > -- ---------------------------------------------------------------------- Pedro Miguel Frazao Fernandes Ferreira Universidade do Algarve Faculdade de Ciencias e Tecnologia Campus de Gambelas 8000-117 Faro Portugal Tel./Fax: (+351) 289 800950 / 289 819403 http://w3.ualg.pt/~pfrazao
> Maybe it makes sense that in the proposal below the parameter > EXTRA_DIGITS could be SIGNIFICANT_DIGITS with a default value > of 15 and maximum 18. > Its more 'documentable' and maybe easy to understand in general. Yes agree (or double_significant_digits or format_double_digits ?), but default to DBL_DIG and allow range between 1 and DBL_DIG + 3. format_* could be used for all future output format tweaks. Unfortunately %A is not portable :-( Andreas
Zeugswetter Andreas SB SD wrote: > Yes agree (or double_significant_digits or format_double_digits ?), > but default to DBL_DIG and allow range between 1 and DBL_DIG + 3. > format_* could be used for all future output format tweaks. > > Unfortunately %A is not portable :-( What do you mean ? It is C99, introduced in glibc 2.1. What are the requirements for PostgreSQL ? Pedro
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > On Wed, 30 Oct 2002, Tom Lane wrote: >> sprintf(ascii, "%.*g", DBL_DIG+1, num); >> and similarly for float4. Given carefully written float I/O routines, >> reading the latter output should reproduce the originally stored value. > Well, on my system, it doesn't look like doing the above sprintfs will > actually work for all numbers. I did a simple program using an arbitrary > big number and the DBL_DIG+1 output when stuck into another double > actually was a different double value. DBL_DIG+2 worked on my system, > but... Oh, you're right; I had forgotten about the effects of scale. DBL_DIG=15 means that the system claims to distinguish all 15-digit values, but in a binary system there's more headroom at the bottom end of a decimal order of magnitude. For example, 15-digit values are fine: regression=# select 100000000000001::float8 - 100000000000000::float8;?column? ---------- 1 (1 row) regression=# select 999999999999999::float8 - 999999999999998::float8;?column? ---------- 1 (1 row) but the 9-etc values are over three binary orders of magnitude larger than the 1-etc values, and so they have three less spare bits at the right end. The system would be lying to claim DBL_DIG=16: regression=# select 9999999999999999::float8 - 9999999999999998::float8;?column? ---------- 2 (1 row) even though values a little over 1e15 are represented perfectly accurately: regression=# select 1000000000000001::float8 - 1000000000000000::float8;?column? ---------- 1 (1 row) If you experiment with 17-digit values, you find that the representable values are about 2 counts apart near 1e16: regression=# select 10000000000000001::float8 - 10000000000000000::float8;?column? ---------- 0 (1 row) regression=# select 10000000000000002::float8 - 10000000000000000::float8;?column? ---------- 2 (1 row) but they're about 16 counts apart near 9e16: regression=# select 99999999999999992::float8 - 99999999999999990::float8;?column? ---------- 16 (1 row) regression=# select 99999999999999991::float8 - 99999999999999990::float8;?column? ---------- 0 (1 row) which is exactly what you'd expect seeing that the values are about a factor of 8 apart. Bottom line: if DBL_DIG=15 and the float arithmetic is binary, then there are some double values that require 17 displayed digits to distinguish, even though not all 16-digit numbers are distinct. So I retract my original proposal and instead suggest that we offer a switch to display either DBL_DIG or DBL_DIG+2 significant digits (and correspondingly increase the digits for float4). The DBL_DIG+2 case should handle the need for exact dump/reload. regards, tom lane
"Pedro M. Ferreira" <pfrazao@ualg.pt> writes: > Zeugswetter Andreas SB SD wrote: >> Unfortunately %A is not portable :-( > What do you mean ? Just what he said. > It is C99, introduced in glibc 2.1. > What are the requirements for PostgreSQL ? glibc does not define the universe; nor are all platforms supporting C99 yet. regards, tom lane
"Pedro M. Ferreira" <pfrazao@ualg.pt> writes: > Its like I said before, the guys from matlab (in x86 IEEE float) go to > DBL_BIG+3 to have 'maximum precision'. Apparently they have not read the canonical papers in the field. [ googles for a moment... ] See How to read floating point numbers accuratelyWilliam D. Clinger How to print floating-point numbers accuratelyGuy L. Steele,Jr., Jon L. White both published at the 1990 ACM Conference on Programming Language Design and Implementation and subsequently reprinted in ACM SIGPLAN Notices Volume 25, Issue 6 (June 1990). I was misremembering these papers to claim DBL_DIG+1 is enough, but actually they prove that DBL_DIG+2 is necessary and sufficient (and give code to do it correctly, too). Printing DBL_DIG+3 is just producing an extra garbage digit; it won't help matters. Any reasonably well-written C library is going to be able to reproduce a double value with DBL_DIG+2 digits of I/O; and if it's not well-written, I would have no confidence in its ability to do so with DBL_DIG+3 digits... regards, tom lane
Tom Lane wrote: > > I was misremembering these papers to claim DBL_DIG+1 is enough, but > actually they prove that DBL_DIG+2 is necessary and sufficient (and > give code to do it correctly, too). Yeahh! If there's a proof its safe to implement. I also Googled a bit and found another paper saying that 17 is the minimum number of significant digits guaranteed to distinguish among IEEE double-precision floating point numbers: "Robert G. Burger and R. Kent Dybvig. Printing floating-point numbers quickly and accurately. In Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, pages 108--116" http://citeseer.nj.nec.com/28233.html > > Printing DBL_DIG+3 is just producing an extra garbage digit; it won't > help matters. Any reasonably well-written C library is going to be > able to reproduce a double value with DBL_DIG+2 digits of I/O; and if > it's not well-written, I would have no confidence in its ability to do > so with DBL_DIG+3 digits... Off course. This is also good in terms of dump storage for big float8 databases. Its one byte less for every float8. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > -- ---------------------------------------------------------------------- Pedro Miguel Frazao Fernandes Ferreira Universidade do Algarve Faculdade de Ciencias e Tecnologia Campus de Gambelas 8000-117 Faro Portugal Tel./Fax: (+351) 289 800950 / 289 819403 http://w3.ualg.pt/~pfrazao
"Pedro M. Ferreira" <pfrazao@ualg.pt> writes: > Have two parameters, say DOUBLE_OUTPUT and EXTRA_DIGITS. DOUBLE_OUTPUT > would select from decimal output or normalized output. EXTRA_DIGITS > would add the required extra digits, from 0 (default) to 3, when output > is decimal. I'm not happy with adding the hex-output option, since it's not very portable and doesn't seem necessary to solve the problem anyway. But I think an EXTRA_DIGITS setting might be interesting. In particular, suppose we allowed EXTRA_DIGITS to be negative? Setting it to -1 or -2 would go a long way towards eliminating our problems with platform variations in the geometry regression test. Perhaps something like extra_float_digits int range -2 to 2, default 0 extra_float_digits adjusts the number of digits displayed for float4 and float8 output; the base value of 0 means we output FLT_DIG or DBL_DIG digits respectively. Per discussion, there's no reason to allow a value greater than 2, but I'm not as sure what the lower limit should be --- maybe there's some use in setting it less than -2? regards, tom lane
Tom Lane wrote: > > Bottom line: if DBL_DIG=15 and the float arithmetic is binary, then > there are some double values that require 17 displayed digits to > distinguish, even though not all 16-digit numbers are distinct. > > So I retract my original proposal and instead suggest that we offer > a switch to display either DBL_DIG or DBL_DIG+2 significant digits > (and correspondingly increase the digits for float4). The DBL_DIG+2 > case should handle the need for exact dump/reload. Nice. This will be good for number storage purposes. Shall it be done with two parameters, 'DOUBLE_FORMAT' and 'SINGLE_FORMAT', with options 'SHORT' and 'LONG' controlling how the sprintf's are done ? Will someone from pg-people do it or shall I do it for you ? As I said previously, I have seen the GUC stuff and it seem's ok for me to do it. I really do not know if there are any restrictions on who implements what respecting PostgreSQL. Tomorrow we have an holyday in Portugal and I shall leave for the whole week-end, but I can do it on monday. Best regards, Pedro M. Ferreira > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > > -- ---------------------------------------------------------------------- Pedro Miguel Frazao Fernandes Ferreira Universidade do Algarve Faculdade de Ciencias e Tecnologia Campus de Gambelas 8000-117 Faro Portugal Tel./Fax: (+351) 289 800950 / 289 819403 http://w3.ualg.pt/~pfrazao
I sent an email before receiving the one below. I am happier also with the extra_digits way (from the previous email I thought the options were DBL_DIG or DBL_DIG+2). > I'm not happy with adding the hex-output option, since it's not > very portable and doesn't seem necessary to solve the problem anyway. Agree. > Perhaps something like > > extra_float_digits int range -2 to 2, default 0 > > extra_float_digits adjusts the number of digits displayed for float4 and > float8 output; the base value of 0 means we output FLT_DIG or DBL_DIG > digits respectively. Agree. > > Per discussion, there's no reason to allow a value greater than 2, but > I'm not as sure what the lower limit should be --- maybe there's some > use in setting it less than -2? I could see some use. At least in my type of application. When people are shure they only need p significant digits, they can set extra_float_digits to an apropriate negative value and spare a lot in storage for dumps and backups. In this case it would make sense to let extra_float_digits go to -13. Regards, Pedro > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > > -- ---------------------------------------------------------------------- Pedro Miguel Frazao Fernandes Ferreira Universidade do Algarve Faculdade de Ciencias e Tecnologia Campus de Gambelas 8000-117 Faro Portugal Tel./Fax: (+351) 289 800950 / 289 819403 http://w3.ualg.pt/~pfrazao
On Thu, Oct 31, 2002 at 12:58:21 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > But I think an EXTRA_DIGITS setting might be interesting. In > particular, suppose we allowed EXTRA_DIGITS to be negative? Setting > it to -1 or -2 would go a long way towards eliminating our problems > with platform variations in the geometry regression test. My attempt to avoid this problem with the earthdistance regression used a cast to numeric to limit the number of digits to the right of the decimal point. If the normal number of digits displayed is different between two systems, than displaying a fixed number less digits is still going to result in differences.
Bruno Wolff III <bruno@wolff.to> writes: > If the normal number of digits displayed is different between two systems, > than displaying a fixed number less digits is still going to result in > differences. That is, however, not the problem we have with the geometry test. regards, tom lane
Added to TODO: * Add GUC variables extra_float_digits and extra_double_digitsto control output digits --------------------------------------------------------------------------- Tom Lane wrote: > "Pedro M. Ferreira" <pfrazao@ualg.pt> writes: > > Have two parameters, say DOUBLE_OUTPUT and EXTRA_DIGITS. DOUBLE_OUTPUT > > would select from decimal output or normalized output. EXTRA_DIGITS > > would add the required extra digits, from 0 (default) to 3, when output > > is decimal. > > I'm not happy with adding the hex-output option, since it's not > very portable and doesn't seem necessary to solve the problem anyway. > > But I think an EXTRA_DIGITS setting might be interesting. In > particular, suppose we allowed EXTRA_DIGITS to be negative? Setting > it to -1 or -2 would go a long way towards eliminating our problems > with platform variations in the geometry regression test. > > Perhaps something like > > extra_float_digits int range -2 to 2, default 0 > > extra_float_digits adjusts the number of digits displayed for float4 and > float8 output; the base value of 0 means we output FLT_DIG or DBL_DIG > digits respectively. > > Per discussion, there's no reason to allow a value greater than 2, but > I'm not as sure what the lower limit should be --- maybe there's some > use in setting it less than -2? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073