Thread: numeric_to_number() function skipping some digits
Hi,<br /><br />With PG84, I have tried something like this which seem incorrect to me.<br /><br /># SELECT '' AS to_number_2, to_number('-347,58', '99G999');<br /> to_number_2 | to_number <br />-------------+-----------<br /> | -3458<br />(1 row)<br /><br />After browsing the code (numeric_to_number), I have found that number stringis processed according to the number format. Which means, when 'G' is found in the format we have moved the numberpointer to next character. This is why the digit '7' is skipped. And while processing ',', as it is not a digit itis ignored.<br /><br />Is this expected behaviour? or a bug?<br /><br />Thanks<br /><br />-- <br />Jeevan B Chalke<br />EnterpriseDBSoftware India Private Limited, Pune<br />Visit us at: <a href="http://www.enterprisedb.com">www.enterprisedb.com</a><br/> --- <br />If better is possible, then good is not enough<br/><br />
Jeevan Chalke <jeevan.chalke@enterprisedb.com> writes: > With PG84, I have tried something like this which seem incorrect to me. > # SELECT '' AS to_number_2, to_number('-347,58', '99G999'); Well, the input doesn't actually match the format, so I'm not totally excited about this. You do get sane results from: regression=# select to_number('-347,58', '999G99');to_number ----------- -34758 (1 row) Should we have it throw an error if the input corresponding to a G symbol doesn't match the expected group separator? I'm concerned that that would break applications that work okay today. regards, tom lane
2009/9/19 Tom Lane <tgl@sss.pgh.pa.us>: > Should we have it throw an error if the input corresponding to a G > symbol doesn't match the expected group separator? I'm concerned that > that would break applications that work okay today. > It would be a substantial change to the behaviour, and to do it properly we'd have to change to_date() to actually parse separator characters as well. That is, you can currently write to_date('2009/09/19', 'YYYY-MM-DD') -- it doesn't matter what the separator characters actually look like, since per the format pattern they cannot affect the date outcome. This naturally leads to the question we always have to ask with these functions: What Does Oracle Do? But FWIW, a -1 from me for changing this. Cheers, BJ
Hi,
--
Jeevan B Chalke
EnterpriseDB Software India Private Limited, Pune
Visit us at: www.enterprisedb.com
---
If better is possible, then good is not enough
On Sat, Sep 19, 2009 at 1:52 AM, Brendan Jurd <direvus@gmail.com> wrote:
Oracle returns "19-SEP-09" irrespective of the format.
Here in PG, we have getting the proper date irrespective of the format as Oracle. But in the case to to_number the returned value is wrong. For example following query returns '340' on PG where as it returns '3450' on Oracle.
select to_number('34,50','999,99') from dual;
Do you mean this is the expected behaviour then?
2009/9/19 Tom Lane <tgl@sss.pgh.pa.us>:> Should we have it throw an error if the input corresponding to a GIt would be a substantial change to the behaviour, and to do it
> symbol doesn't match the expected group separator? I'm concerned that
> that would break applications that work okay today.
>
properly we'd have to change to_date() to actually parse separator
characters as well.
That is, you can currently write to_date('2009/09/19', 'YYYY-MM-DD')
-- it doesn't matter what the separator characters actually look like,
since per the format pattern they cannot affect the date outcome.
This naturally leads to the question we always have to ask with these
functions: What Does Oracle Do?
Oracle returns "19-SEP-09" irrespective of the format.
Here in PG, we have getting the proper date irrespective of the format as Oracle. But in the case to to_number the returned value is wrong. For example following query returns '340' on PG where as it returns '3450' on Oracle.
select to_number('34,50','999,99') from dual;
But FWIW, a -1 from me for changing this.
Do you mean this is the expected behaviour then?
Cheers,
BJ
--
Jeevan B Chalke
EnterpriseDB Software India Private Limited, Pune
Visit us at: www.enterprisedb.com
---
If better is possible, then good is not enough
2009/9/21 Jeevan Chalke <jeevan.chalke@enterprisedb.com>: > Oracle returns "19-SEP-09" irrespective of the format. > Here in PG, we have getting the proper date irrespective of the format as > Oracle. But in the case to to_number the returned value is wrong. For > example following query returns '340' on PG where as it returns '3450' on > Oracle. > > select to_number('34,50','999,99') from dual; > Hi Jeevan, Thanks for checking up on the Oracle behaviour. It appears to silently disregard grouping characters in the format pattern, and also disregard them wherever they appear in the input string (or else it reads the string from right-to-left?). It seems that, to match Oracle, we'd need to teach the code that 'G' and ',' are no-ops for to_number(), and also that such characters should be ignored in the input. To be honest, though, I'm not sure it's worth pursuing. If you want to feed in numbers that have decorative characters all through them, it's far more predictable to just regex out the cruft and use ordinary numeric parsing than to use to_number(), which is infamous for its idiosyncrasies: # SELECT regexp_replace('34,50', E'[\\d.]', '', 'g')::numeric; 3450 Cheers, BJ
2009/9/21 Brendan Jurd <direvus@gmail.com>: > > # SELECT regexp_replace('34,50', E'[\\d.]', '', 'g')::numeric; > 3450 > Sorry, that regex ought to have read E'[^\\d.]'.
Hi,
Thanks
--
Jeevan B Chalke
EnterpriseDB Software India Private Limited, Pune
Visit us at: www.enterprisedb.com
---
If better is possible, then good is not enough
On Mon, Sep 21, 2009 at 12:36 PM, Brendan Jurd <direvus@gmail.com> wrote:
It seems that Oracle reads formatting string from right-to-left. Here are few results:
('number','format') ==> Oracle PG
--------------------------------------------
('34,50','999,99') ==> 3450 340
('34,50','99,99') ==> 3450 3450
('34,50','99,999') ==> Invalid Number 3450
('34,50','999,999') ==> Invalid Number 340
That means we cannot simply ignore such characters from the input. Rather we can process the string R-L. But yes this will definitely going to break the current applications running today.
This (with E'[^\\d.]') ignores/replaces all the characters except digits from the input which we certainly not wishing to do. Instead we can continue with the current implementation. But IMHO, somewhere in the time-line we need to fix this.
2009/9/21 Jeevan Chalke <jeevan.chalke@enterprisedb.com>:> Oracle returns "19-SEP-09" irrespective of the format.Hi Jeevan,
> Here in PG, we have getting the proper date irrespective of the format as
> Oracle. But in the case to to_number the returned value is wrong. For
> example following query returns '340' on PG where as it returns '3450' on
> Oracle.
>
> select to_number('34,50','999,99') from dual;
>
Thanks for checking up on the Oracle behaviour. It appears to
silently disregard grouping characters in the format pattern, and also
disregard them wherever they appear in the input string (or else it
reads the string from right-to-left?).
It seems that Oracle reads formatting string from right-to-left. Here are few results:
('number','format') ==> Oracle PG
--------------------------------------------
('34,50','999,99') ==> 3450 340
('34,50','99,99') ==> 3450 3450
('34,50','99,999') ==> Invalid Number 3450
('34,50','999,999') ==> Invalid Number 340
It seems that, to match Oracle, we'd need to teach the code that 'G'
and ',' are no-ops for to_number(), and also that such characters
should be ignored in the input.
That means we cannot simply ignore such characters from the input. Rather we can process the string R-L. But yes this will definitely going to break the current applications running today.
To be honest, though, I'm not sure it's worth pursuing. If you want
to feed in numbers that have decorative characters all through them,
it's far more predictable to just regex out the cruft and use ordinary
numeric parsing than to use to_number(), which is infamous for its
idiosyncrasies:
# SELECT regexp_replace('34,50', E'[\\d.]', '', 'g')::numeric;
3450
This (with E'[^\\d.]') ignores/replaces all the characters except digits from the input which we certainly not wishing to do. Instead we can continue with the current implementation. But IMHO, somewhere in the time-line we need to fix this.
Cheers,
BJ
Thanks
--
Jeevan B Chalke
EnterpriseDB Software India Private Limited, Pune
Visit us at: www.enterprisedb.com
---
If better is possible, then good is not enough
Jeevan Chalke escreveu: > That means we cannot simply ignore such characters from the input. > Rather we can process the string R-L. But yes this will definitely going > to break the current applications running today. > IIRC we tight the to_char() and to_timestamp() input for 8.4. Why don't we do it for to_number() too? The fact of breaking application is not a strong argument against having compatibility with Oracle to_*() functions. People that are building applications around these two databases and are using the to_*() functions are expecting that the behavior being the same (i didn't remember any complaints about that but ...). So +1 for this TODO. -- Euler Taveira de Oliveira http://www.timbira.com/
On Tue, Sep 22, 2009 at 10:27:19AM +0530, Jeevan Chalke wrote: > > It seems that Oracle reads formatting string from right-to-left. Here are > few results: > ('number','format') ==> Oracle PG > -------------------------------------------- > ('34,50','999,99') ==> 3450 340 > ('34,50','99,99') ==> 3450 3450 > ('34,50','99,999') ==> Invalid Number 3450 > ('34,50','999,999') ==> Invalid Number 340 It seems worse to to give a wrong answer silently then to throw an error. What we do now seems sort of MySqlish. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
daveg <daveg@sonic.net> wrote: > On Tue, Sep 22, 2009 at 10:27:19AM +0530, Jeevan Chalke wrote: >> >> It seems that Oracle reads formatting string from right-to-left. >> Here are few results: >> ('number','format') ==> Oracle PG >> -------------------------------------------- >> ('34,50','999,99') ==> 3450 340 >> ('34,50','99,99') ==> 3450 3450 >> ('34,50','99,999') ==> Invalid Number 3450 >> ('34,50','999,999') ==> Invalid Number 340 > > It seems worse to to give a wrong answer silently then to throw an > error. What we do now seems sort of MySqlish. I agree with David on that. Further, it seems to me that functions which are there only for Oracle compatibility (i.e., they are not part of the SQL standard and are not particularly sensible, but are popular in Oracle) should behave like the corresponding Oracle function. An argument could even be made for "bug compatibility" for such functions, at least to some extent; although we're not talking about that here -- the Oracle results seem more sane than current PostgreSQL behavior.. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > daveg <daveg@sonic.net> wrote: >> On Tue, Sep 22, 2009 at 10:27:19AM +0530, Jeevan Chalke wrote: >>> It seems that Oracle reads formatting string from right-to-left. >> It seems worse to to give a wrong answer silently then to throw an >> error. What we do now seems sort of MySqlish. > I agree with David on that. So who is volunteering to do the presumably rather major rewrite involved? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> daveg <daveg@sonic.net> wrote: >>> On Tue, Sep 22, 2009 at 10:27:19AM +0530, Jeevan Chalke wrote: >>>> It seems that Oracle reads formatting string from right-to-left. > >>> It seems worse to to give a wrong answer silently then to throw an >>> error. What we do now seems sort of MySqlish. > >> I agree with David on that. > > So who is volunteering to do the presumably rather major rewrite > involved? We don't typically identify an author before putting something on the TODO list. That said, this one seems like it would be within my skill set. I seem to have trouble finding tasks that are for which we can get consensus, so I just might pick this one up if we achieve such consensus on it. The biggest problem, should I take it on, would be that I don't currently have access to an Oracle database, so someone would have to supply me with accurate specs for how it should behave, or point to unambiguous documentation. -Kevin
On Wed, 2009-09-30 at 19:08 -0500, Kevin Grittner wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > >> daveg <daveg@sonic.net> wrote: > >>> On Tue, Sep 22, 2009 at 10:27:19AM +0530, Jeevan Chalke wrote: > >>>> It seems that Oracle reads formatting string from right-to-left. > > > >>> It seems worse to to give a wrong answer silently then to throw an > >>> error. What we do now seems sort of MySqlish. > > > >> I agree with David on that. > > > > So who is volunteering to do the presumably rather major rewrite > > involved? > > We don't typically identify an author before putting something on the > TODO list. > > That said, this one seems like it would be within my skill set. I > seem to have trouble finding tasks that are for which we can get > consensus, so I just might pick this one up if we achieve such > consensus on it. The biggest problem, should I take it on, would be > that I don't currently have access to an Oracle database, so someone > would have to supply me with accurate specs for how it should behave, > or point to unambiguous documentation. Just download developer edition? > > -Kevin > -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering If the world pushes look it in the eye and GRR. Then push back harder. - Salamander
"Joshua D. Drake" <jd@commandprompt.com> wrote: > On Wed, 2009-09-30 at 19:08 -0500, Kevin Grittner wrote: >> I don't currently have access to an Oracle database > Just download developer edition? [quick google search] Looks like that would do it. Thanks. -Kevin
On Wed, 2009-09-30 at 19:08 -0500, Kevin Grittner wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > >> daveg <daveg@sonic.net> wrote: > >>> On Tue, Sep 22, 2009 at 10:27:19AM +0530, Jeevan Chalke wrote: > >>>> It seems that Oracle reads formatting string from right-to-left. > > > >>> It seems worse to to give a wrong answer silently then to throw an > >>> error. What we do now seems sort of MySqlish. > > > >> I agree with David on that. > > > > So who is volunteering to do the presumably rather major rewrite > > involved? > > We don't typically identify an author before putting something on the > TODO list. > > That said, this one seems like it would be within my skill set. I > seem to have trouble finding tasks that are for which we can get > consensus, so I just might pick this one up if we achieve such > consensus on it. The biggest problem, should I take it on, would be > that I don't currently have access to an Oracle database, so someone > would have to supply me with accurate specs for how it should behave, > or point to unambiguous documentation. Just download developer edition? > > -Kevin > -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering If the world pushes look it in the eye and GRR. Then push back harder. - Salamander
Added to TODO: |Fix to_number() handling for values not matching the format string --------------------------------------------------------------------------- Jeevan Chalke wrote: > Hi, > > On Mon, Sep 21, 2009 at 12:36 PM, Brendan Jurd <direvus@gmail.com> wrote: > > > 2009/9/21 Jeevan Chalke <jeevan.chalke@enterprisedb.com>: > > > Oracle returns "19-SEP-09" irrespective of the format. > > > Here in PG, we have getting the proper date irrespective of the format as > > > Oracle. But in the case to to_number the returned value is wrong. For > > > example following query returns '340' on PG where as it returns '3450' on > > > Oracle. > > > > > > select to_number('34,50','999,99') from dual; > > > > > > > Hi Jeevan, > > > > Thanks for checking up on the Oracle behaviour. It appears to > > silently disregard grouping characters in the format pattern, and also > > disregard them wherever they appear in the input string (or else it > > reads the string from right-to-left?). > > > > It seems that Oracle reads formatting string from right-to-left. Here are > few results: > ('number','format') ==> Oracle PG > -------------------------------------------- > ('34,50','999,99') ==> 3450 340 > ('34,50','99,99') ==> 3450 3450 > ('34,50','99,999') ==> Invalid Number 3450 > ('34,50','999,999') ==> Invalid Number 340 > > > > > > It seems that, to match Oracle, we'd need to teach the code that 'G' > > and ',' are no-ops for to_number(), and also that such characters > > should be ignored in the input. > > > > That means we cannot simply ignore such characters from the input. Rather we > can process the string R-L. But yes this will definitely going to break the > current applications running today. > > > > To be honest, though, I'm not sure it's worth pursuing. If you want > > to feed in numbers that have decorative characters all through them, > > it's far more predictable to just regex out the cruft and use ordinary > > numeric parsing than to use to_number(), which is infamous for its > > idiosyncrasies: > > > > # SELECT regexp_replace('34,50', E'[\\d.]', '', 'g')::numeric; > > 3450 > > > > This (with E'[^\\d.]') ignores/replaces all the characters except digits > from the input which we certainly not wishing to do. Instead we can continue > with the current implementation. But IMHO, somewhere in the time-line we > need to fix this. > > > > Cheers, > > BJ > > > > > Thanks > -- > Jeevan B Chalke > EnterpriseDB Software India Private Limited, Pune > Visit us at: www.enterprisedb.com > --- > If better is possible, then good is not enough -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.comPG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do + If your life is a hard drive,Christ can be your backup. +