Thread: casting Bangla characters to NUMERIC
Hi I have an interesting problem. I have a string field in a table which (sometimes) is expected to contain numeric values ie. a cast of the field to INTEGER is supposed to succeed. My issue is that the application is running in Bangladesh, and sometimes the users have entered values using Bangla characters. eg. "৮" rather than "8". (Note that this shouldnt have been possible to enter in the first place, but we can ignore that for now. They are there, I have to somehow deal with it.) Is there a way with LOCALE settings to get this cast to work correctly? Currently I get: postgres=# select cast('৮' as INTEGER); ERROR: invalid input syntax for integer: "৮" LINE 1: select cast('৮' as INTEGER); Regards Bob
Bob Jolliffe <bobjolliffe@gmail.com> writes: > I have an interesting problem. I have a string field in a table which > (sometimes) is expected to contain numeric values ie. a cast of the > field to INTEGER is supposed to succeed. > My issue is that the application is running in Bangladesh, and > sometimes the users have entered values using Bangla characters. eg. > "৮" rather than "8". (Note that this shouldnt have been possible to > enter in the first place, but we can ignore that for now. They are > there, I have to somehow deal with it.) > Is there a way with LOCALE settings to get this cast to work > correctly? Doubt it :-(. ISTM it's certainly outside the charter of int4in to do that. Ideally to_number() would handle it, but I don't think it does at the moment. Probably your best bet is to translate those characters to regular ASCII digits using replace(), then cast. It'd be a bit tedious, but fortunately there are only 10 cases to consider, and you could wrap that up in a function. regards, tom lane
Thanks Tom. That is what I expected to hear. Was being hopeful ... On Mon, 25 Nov 2019 at 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Bob Jolliffe <bobjolliffe@gmail.com> writes: > > I have an interesting problem. I have a string field in a table which > > (sometimes) is expected to contain numeric values ie. a cast of the > > field to INTEGER is supposed to succeed. > > > My issue is that the application is running in Bangladesh, and > > sometimes the users have entered values using Bangla characters. eg. > > "৮" rather than "8". (Note that this shouldnt have been possible to > > enter in the first place, but we can ignore that for now. They are > > there, I have to somehow deal with it.) > > > Is there a way with LOCALE settings to get this cast to work > > correctly? > > Doubt it :-(. ISTM it's certainly outside the charter of int4in to > do that. Ideally to_number() would handle it, but I don't think it > does at the moment. Probably your best bet is to translate those > characters to regular ASCII digits using replace(), then cast. > It'd be a bit tedious, but fortunately there are only 10 cases > to consider, and you could wrap that up in a function. > > regards, tom lane
select translate(string,'০১২৩৪৫৬৭৮৯','0123456789'); seems to do the trick. On Mon, 25 Nov 2019 at 16:38, Bob Jolliffe <bobjolliffe@gmail.com> wrote: > > Thanks Tom. That is what I expected to hear. Was being hopeful ... > > On Mon, 25 Nov 2019 at 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Bob Jolliffe <bobjolliffe@gmail.com> writes: > > > I have an interesting problem. I have a string field in a table which > > > (sometimes) is expected to contain numeric values ie. a cast of the > > > field to INTEGER is supposed to succeed. > > > > > My issue is that the application is running in Bangladesh, and > > > sometimes the users have entered values using Bangla characters. eg. > > > "৮" rather than "8". (Note that this shouldnt have been possible to > > > enter in the first place, but we can ignore that for now. They are > > > there, I have to somehow deal with it.) > > > > > Is there a way with LOCALE settings to get this cast to work > > > correctly? > > > > Doubt it :-(. ISTM it's certainly outside the charter of int4in to > > do that. Ideally to_number() would handle it, but I don't think it > > does at the moment. Probably your best bet is to translate those > > characters to regular ASCII digits using replace(), then cast. > > It'd be a bit tedious, but fortunately there are only 10 cases > > to consider, and you could wrap that up in a function. > > > > regards, tom lane