Re: [HACKERS] Add Roman numeral conversion to to_number - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Add Roman numeral conversion to to_number
Date
Msg-id 13700.1511029950@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Add Roman numeral conversion to to_number  (Oliver Ford <ojford@gmail.com>)
Responses Re: [HACKERS] Add Roman numeral conversion to to_number  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
Oliver Ford <ojford@gmail.com> writes:
> Attached is v2 of src, tests and docs. Doc patch is unchanged from v1.

I took a quick look at this patch.  The roman_to_int function itself seems
fairly reasonable, except that I don't like anything about the error
reporting.  ERRCODE_INVALID_PARAMETER_VALUE doesn't really seem like the
right thing for bogus data; probably ERRCODE_INVALID_TEXT_REPRESENTATION
is the closest thing we've got.  More generally, reporting a character
position in a string you're not showing to the user is neither helpful nor
per project style.  It'd be much better to just report the whole input
string.  I'm tempted to suggest that all the error reports could beerrmsg("incorrect Roman numeral: \"%s\"", numerals)
I'm not sure it's worth breaking it down more finely than that, and
even if it is, several of these messages aren't too intelligible as-is.

An angle that might be worth considering is whether we really want to
reject non-canonical Roman input.  As is, the function rejects "IIII",
insisting that you must spell it "IV", but is that helpful or just
pedantic?  Likewise I'm not that excited about expending code and a
separate translatable message to insist that people have to write "X"
rather than "VV".

However, what I really don't like is the way that you plugged roman_to_int
into the existing to_number infrastructure, which is to say that you
didn't.  Putting in an "if roman then <x> else <all the existing code>"
test is not the way to make this work.  As an example, that fails to
handle literal text in the format.  If this works:

regression=# select to_char(123, 'foo FMRN'); to_char   
------------foo CXXIII
(1 row)

then this should, but it fails:

regression=# select to_number('foo CXXIII', 'foo FMRN');
ERROR:  invalid character " "

I think you need to be calling roman_to_int from the NUM_RN/NUM_rn
switch cases in NUM_processor, not trying to bypass NUM_processor
entirely.

Another issue is that on the input side, we really need to reject formats
like 'RN 999', since it's very unclear how we'd combine the input values.
There already is some logic that rejects 'RN EEEE', but it needs
extension.  On the other hand, I do not see any really good reason to
reject 'RN 999' for output; it would result in duplicative output, but
if the user asked for it why not do it?  (I see the existing code fails to
handle that case for some reason.)  So some refactoring of the existing
roman-numeral code seems needed anyway.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Consistently catch errors from Python _New() functions
Next
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] Consistently catch errors from Python _New() functions