Thread: casting Bangla characters to NUMERIC

casting Bangla characters to NUMERIC

From
Bob Jolliffe
Date:
Hi

I have an interesting problem.  I have a string field in a table which
(sometimes) is expected to contain numeric values  ie. a cast of the
field to INTEGER is supposed to succeed.

My issue is that the application is running in Bangladesh, and
sometimes the users have entered values using Bangla characters.  eg.
"৮" rather than "8".  (Note that this shouldnt have been possible to
enter in the first place, but we can ignore that for now.  They are
there, I have to somehow deal with it.)

Is there a way with LOCALE settings to get this cast to work
correctly?  Currently I get:
postgres=# select cast('৮' as INTEGER);
ERROR:  invalid input syntax for integer: "৮"
LINE 1: select cast('৮' as INTEGER);

Regards
Bob



Re: casting Bangla characters to NUMERIC

From
Tom Lane
Date:
Bob Jolliffe <bobjolliffe@gmail.com> writes:
> I have an interesting problem.  I have a string field in a table which
> (sometimes) is expected to contain numeric values  ie. a cast of the
> field to INTEGER is supposed to succeed.

> My issue is that the application is running in Bangladesh, and
> sometimes the users have entered values using Bangla characters.  eg.
> "৮" rather than "8".  (Note that this shouldnt have been possible to
> enter in the first place, but we can ignore that for now.  They are
> there, I have to somehow deal with it.)

> Is there a way with LOCALE settings to get this cast to work
> correctly?

Doubt it :-(.  ISTM it's certainly outside the charter of int4in to
do that.  Ideally to_number() would handle it, but I don't think it
does at the moment.  Probably your best bet is to translate those
characters to regular ASCII digits using replace(), then cast.
It'd be a bit tedious, but fortunately there are only 10 cases
to consider, and you could wrap that up in a function.

            regards, tom lane



Re: casting Bangla characters to NUMERIC

From
Bob Jolliffe
Date:
Thanks Tom.  That is what I expected to hear.  Was being hopeful ...

On Mon, 25 Nov 2019 at 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Bob Jolliffe <bobjolliffe@gmail.com> writes:
> > I have an interesting problem.  I have a string field in a table which
> > (sometimes) is expected to contain numeric values  ie. a cast of the
> > field to INTEGER is supposed to succeed.
>
> > My issue is that the application is running in Bangladesh, and
> > sometimes the users have entered values using Bangla characters.  eg.
> > "৮" rather than "8".  (Note that this shouldnt have been possible to
> > enter in the first place, but we can ignore that for now.  They are
> > there, I have to somehow deal with it.)
>
> > Is there a way with LOCALE settings to get this cast to work
> > correctly?
>
> Doubt it :-(.  ISTM it's certainly outside the charter of int4in to
> do that.  Ideally to_number() would handle it, but I don't think it
> does at the moment.  Probably your best bet is to translate those
> characters to regular ASCII digits using replace(), then cast.
> It'd be a bit tedious, but fortunately there are only 10 cases
> to consider, and you could wrap that up in a function.
>
>                         regards, tom lane



Re: casting Bangla characters to NUMERIC

From
Bob Jolliffe
Date:
 select translate(string,'০১২৩৪৫৬৭৮৯','0123456789');

seems to do the trick.

On Mon, 25 Nov 2019 at 16:38, Bob Jolliffe <bobjolliffe@gmail.com> wrote:
>
> Thanks Tom.  That is what I expected to hear.  Was being hopeful ...
>
> On Mon, 25 Nov 2019 at 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Bob Jolliffe <bobjolliffe@gmail.com> writes:
> > > I have an interesting problem.  I have a string field in a table which
> > > (sometimes) is expected to contain numeric values  ie. a cast of the
> > > field to INTEGER is supposed to succeed.
> >
> > > My issue is that the application is running in Bangladesh, and
> > > sometimes the users have entered values using Bangla characters.  eg.
> > > "৮" rather than "8".  (Note that this shouldnt have been possible to
> > > enter in the first place, but we can ignore that for now.  They are
> > > there, I have to somehow deal with it.)
> >
> > > Is there a way with LOCALE settings to get this cast to work
> > > correctly?
> >
> > Doubt it :-(.  ISTM it's certainly outside the charter of int4in to
> > do that.  Ideally to_number() would handle it, but I don't think it
> > does at the moment.  Probably your best bet is to translate those
> > characters to regular ASCII digits using replace(), then cast.
> > It'd be a bit tedious, but fortunately there are only 10 cases
> > to consider, and you could wrap that up in a function.
> >
> >                         regards, tom lane