Re: [PATCH] Add roman support for to_number function - Mailing list pgsql-hackers

From Hunaid Sohail
Subject Re: [PATCH] Add roman support for to_number function
Date
Msg-id CAMWA6yZORgUAwCt=e9=A1W41naq-wrGcpXgXZmX2t_fdP1Z+PA@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Add roman support for to_number function  (Maciek Sakrejda <m.sakrejda@gmail.com>)
Responses Re: [PATCH] Add roman support for to_number function
List pgsql-hackers


On Mon, Sep 2, 2024 at 11:41 PM Maciek Sakrejda <m.sakrejda@gmail.com> wrote:
Thanks for the contribution.

I took a look at the patch, and it works as advertised. It's too late
for the September commitfest, but I took the liberty of registering
your patch for the November CF [1]. In the course of that, I found an
older thread proposing this feature seven years ago [2]. That patch
was returned with feedback and (as far as I can tell), was not
followed-up on by the author. You may want to review that thread for
feedback; I won't repeat it here.

I submitted the patch on Aug 30 because I read that new patches should be submitted in CF with "Open" status.


On Fri, Aug 30, 2024 at 12:22 AM Hunaid Sohail <hunaidpgml@gmail.com> wrote:
>While looking at formatting.c file, I noticed a TODO about "add support for roman number to standard number conversion"  (https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/formatting.c#L52)

Your patch should also remove the TODO =)

Noted. 

> - How should we handle Roman numerals with leading or trailing spaces, like ' XIV' or 'MC '? Should we trim the spaces, or would it be better to throw an error in such cases?

I thought we could reference existing to_number behavior here, but
after playing with it a bit, I'm not really sure what that is:

-- single leading space
maciek=# select to_number(' 1', '9');
 to_number
-----------
         1
(1 row)

-- two leading spaces
maciek=# select to_number('  1', '9');
ERROR:  invalid input syntax for type numeric: " "
-- two leading spaces and template pattern with a decimal
maciek=# select to_number('  1', '9D9');
 to_number
-----------
         1
(1 row)

Yes, you are right. I can't understand the behaviour of trailing spaces too. Trailing spaces are ignored (doesn't matter how many spaces) but leading spaces are ignored if there is 1 leading space. For more leading spaces, error is returned.
A few cases of trailing spaces.
postgres=# select to_number('  1', '9');
ERROR:  invalid input syntax for type numeric: " "
postgres=# select to_number('1 ', '9');
 to_number
-----------
         1
(1 row)

postgres=# select to_number('1  ', '9');
 to_number
-----------
         1
(1 row)

postgres=# select to_number('1       ', '9');
 to_number
-----------
         1
(1 row)
 

Separately, I also noticed some unusual Roman representations work
with your patch:

postgres=# select to_number('viv', 'RN');
 to_number
-----------
         9
(1 row)

Is this expected? In contrast, some somewhat common alternative
representations don't work:

postgres=# select to_number('iiii', 'RN');
ERROR:  invalid roman numeral

I know this is expected, but is this the behavior we want? If so, we
probably want to reject the former case, too. If not, maybe that one
is okay, too.
 
Yes, 'viv' is invalid. Thanks for pointing this out. Also, found a few other invalid cases like 'lxl' or 'dcd'. I will fix them in the next patch.
Thank you for your feedback.

Regards,
Hunaid Sohail

pgsql-hackers by date:

Previous
From: Andrei Lepikhov
Date:
Subject: Re: using extended statistics to improve join estimates
Next
From: Alexander Korotkov
Date:
Subject: Re: POC: make mxidoff 64 bits