Thread: Re: [PATCH] Add roman support for to_number function

Re: [PATCH] Add roman support for to_number function

From
Maciek Sakrejda
Date:
Thanks for the contribution.

I took a look at the patch, and it works as advertised. It's too late
for the September commitfest, but I took the liberty of registering
your patch for the November CF [1]. In the course of that, I found an
older thread proposing this feature seven years ago [2]. That patch
was returned with feedback and (as far as I can tell), was not
followed-up on by the author. You may want to review that thread for
feedback; I won't repeat it here.

On Fri, Aug 30, 2024 at 12:22 AM Hunaid Sohail <hunaidpgml@gmail.com> wrote:
>While looking at formatting.c file, I noticed a TODO about "add support for roman number to standard number
conversion" (https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/formatting.c#L52) 

Your patch should also remove the TODO =)

> - Is it okay for the function to handle Roman numerals in a case-insensitive way? (e.g., 'XIV', 'xiv', and 'Xiv' are
allseen as 14). 

The patch in the thread I linked also took a case-insensitive
approach. I did not see any objections to that, and it seems
reasonable to me as well.

> - How should we handle Roman numerals with leading or trailing spaces, like ' XIV' or 'MC '? Should we trim the
spaces,or would it be better to throw an error in such cases? 

I thought we could reference existing to_number behavior here, but
after playing with it a bit, I'm not really sure what that is:

-- single leading space
maciek=# select to_number(' 1', '9');
 to_number
-----------
         1
(1 row)

-- two leading spaces
maciek=# select to_number('  1', '9');
ERROR:  invalid input syntax for type numeric: " "
-- two leading spaces and template pattern with a decimal
maciek=# select to_number('  1', '9D9');
 to_number
-----------
         1
(1 row)

Separately, I also noticed some unusual Roman representations work
with your patch:

postgres=# select to_number('viv', 'RN');
 to_number
-----------
         9
(1 row)

Is this expected? In contrast, some somewhat common alternative
representations don't work:

postgres=# select to_number('iiii', 'RN');
ERROR:  invalid roman numeral

I know this is expected, but is this the behavior we want? If so, we
probably want to reject the former case, too. If not, maybe that one
is okay, too.

I know I've probably offered more questions than answers, but I hope
finding the old thread here is useful.

Thanks,
Maciek

[1]: https://commitfest.postgresql.org/50/5233/
[2]: https://www.postgresql.org/message-id/flat/CAGMVOduAJ9wKqJXBYnmFmEetKxapJxrG3afUwpbOZ6n_dWaUnA%40mail.gmail.com



Re: [PATCH] Add roman support for to_number function

From
Hunaid Sohail
Date:


On Mon, Sep 2, 2024 at 11:41 PM Maciek Sakrejda <m.sakrejda@gmail.com> wrote:
Thanks for the contribution.

I took a look at the patch, and it works as advertised. It's too late
for the September commitfest, but I took the liberty of registering
your patch for the November CF [1]. In the course of that, I found an
older thread proposing this feature seven years ago [2]. That patch
was returned with feedback and (as far as I can tell), was not
followed-up on by the author. You may want to review that thread for
feedback; I won't repeat it here.

I submitted the patch on Aug 30 because I read that new patches should be submitted in CF with "Open" status.


On Fri, Aug 30, 2024 at 12:22 AM Hunaid Sohail <hunaidpgml@gmail.com> wrote:
>While looking at formatting.c file, I noticed a TODO about "add support for roman number to standard number conversion"  (https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/formatting.c#L52)

Your patch should also remove the TODO =)

Noted. 

> - How should we handle Roman numerals with leading or trailing spaces, like ' XIV' or 'MC '? Should we trim the spaces, or would it be better to throw an error in such cases?

I thought we could reference existing to_number behavior here, but
after playing with it a bit, I'm not really sure what that is:

-- single leading space
maciek=# select to_number(' 1', '9');
 to_number
-----------
         1
(1 row)

-- two leading spaces
maciek=# select to_number('  1', '9');
ERROR:  invalid input syntax for type numeric: " "
-- two leading spaces and template pattern with a decimal
maciek=# select to_number('  1', '9D9');
 to_number
-----------
         1
(1 row)

Yes, you are right. I can't understand the behaviour of trailing spaces too. Trailing spaces are ignored (doesn't matter how many spaces) but leading spaces are ignored if there is 1 leading space. For more leading spaces, error is returned.
A few cases of trailing spaces.
postgres=# select to_number('  1', '9');
ERROR:  invalid input syntax for type numeric: " "
postgres=# select to_number('1 ', '9');
 to_number
-----------
         1
(1 row)

postgres=# select to_number('1  ', '9');
 to_number
-----------
         1
(1 row)

postgres=# select to_number('1       ', '9');
 to_number
-----------
         1
(1 row)
 

Separately, I also noticed some unusual Roman representations work
with your patch:

postgres=# select to_number('viv', 'RN');
 to_number
-----------
         9
(1 row)

Is this expected? In contrast, some somewhat common alternative
representations don't work:

postgres=# select to_number('iiii', 'RN');
ERROR:  invalid roman numeral

I know this is expected, but is this the behavior we want? If so, we
probably want to reject the former case, too. If not, maybe that one
is okay, too.
 
Yes, 'viv' is invalid. Thanks for pointing this out. Also, found a few other invalid cases like 'lxl' or 'dcd'. I will fix them in the next patch.
Thank you for your feedback.

Regards,
Hunaid Sohail

Re: [PATCH] Add roman support for to_number function

From
Maciek Sakrejda
Date:
On Tue, Sep 3, 2024 at 6:29 AM Hunaid Sohail <hunaidpgml@gmail.com> wrote:
> I submitted the patch on Aug 30 because I read that new patches should be submitted in CF with "Open" status.

Oh my bad! I missed that you had submitted it to the September CF:
https://commitfest.postgresql.org/49/5221/

I don't see a way to just delete CF entries, so I've marked the
November one that I created as Withdrawn.

I'll add myself as reviewer to the September CF entry.

Thanks,
Maciek



Re: [PATCH] Add roman support for to_number function

From
Aleksander Alekseev
Date:
Hi,

> I have attached a new patch v2 with following changes:
>
> - Handled invalid cases like 'viv', 'lxl', and 'dcd'.
> - Changed errcode to 22P07 because 22P06 was already taken.
> - Removed TODO.
> - Added a few positive & negative tests.
> - Updated documentation.
>
> Looking forward to your feedback.

While playing with the patch I noticed that to_char(..., 'RN') doesn't
seem to be test-covered. I suggest adding the following test:

```
WITH rows AS (
    SELECT i, to_char(i, 'FMRN') AS roman
    FROM generate_series(1, 3999) AS i
) SELECT bool_and(to_number(roman, 'RN') = i) FROM rows;

 bool_and
----------
 t
```

... in order to fix this while on it. The query takes ~12 ms on my laptop.

-- 
Best regards,
Aleksander Alekseev



Re: [PATCH] Add roman support for to_number function

From
Hunaid Sohail
Date:


On Thu, Sep 5, 2024 at 2:41 PM Aleksander Alekseev <aleksander@timescale.com> wrote:

While playing with the patch I noticed that to_char(..., 'RN') doesn't
seem to be test-covered. I suggest adding the following test:

```
WITH rows AS (
    SELECT i, to_char(i, 'FMRN') AS roman
    FROM generate_series(1, 3999) AS i
) SELECT bool_and(to_number(roman, 'RN') = i) FROM rows;

 bool_and
----------
 t
```

I also noticed there are no tests for to_char roman format. The test you provided covers roman format in both to_char and to_number. I will add it.
Thank you.

Regards,
Hunaid Sohail