On Mon, 2024-04-15 at 17:05 -0700, Andres Freund wrote:
> Can't we test this as part of the normal testsuite?
One thing that complicates things a bit is that the test compares the
results against ICU, so a mismatch in Unicode version between ICU and
Postgres can cause test failures. The test ignores unassigned code
points, so normally a mismatch just results in less-exhaustive test coverage.
But sometimes things really do change, and that would cause a failure.
I'm not quite sure how we should handle that -- maybe only run the test
when the ICU version is known to be in a range where that's not a
problem?
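
As a rough illustration of gating on the ICU version, the check could look
something like the sketch below. It is written in Python only for brevity.
The function names and the version bounds are hypothetical, not anything in
the Postgres tree, and a real range would have to come from knowing which
ICU releases match the Unicode version Postgres was built from.

```python
def parse_version(text):
    """Convert a dotted version string such as "74.2" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def icu_version_in_range(icu_version, lowest, highest):
    """Return True if lowest <= icu_version <= highest, compared numerically."""
    return parse_version(lowest) <= parse_version(icu_version) <= parse_version(highest)


# Hypothetical bounds, chosen only for the example.
LOWEST_OK = "72.1"
HIGHEST_OK = "74.2"

if __name__ == "__main__":
    for candidate in ("9.0", "73.2", "75.1"):
        in_range = icu_version_in_range(candidate, LOWEST_OK, HIGHEST_OK)
        print(candidate, "run" if in_range else "skip")
```

Comparing integer tuples rather than raw strings avoids treating "9.0" as
newer than "72.1".
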
Another option is to look for another way to test this code without
ICU. We could generate a list of known mappings and compare to that,
but we'd have to do it some way other than what the code is doing now,
otherwise we'd just be testing the code against itself. Maybe we can
load the Unicode data into a Postgres table and then test with a SELECT
statement or something?
I am worried that it will end up looking like an over-engineered way
to compare a text file to itself.
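
For concreteness, an ICU-free check along those lines might be sketched as
follows. It assumes, purely for illustration, that the table under test is
the simple lowercase mapping, and it re-parses UnicodeData.txt-style lines
(15 semicolon-separated fields, simple lowercase in the 14th) with code that
is deliberately separate from the generator. The function names and the
sample "generated" dictionaries are hypothetical.

```python
SIMPLE_LOWERCASE_FIELD = 13  # 0-based index of Simple_Lowercase_Mapping


def parse_simple_lowercase(lines):
    """Build {code point: simple lowercase} from UnicodeData.txt-style lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        if len(fields) != 15:
            raise ValueError(f"unexpected field count: {line!r}")
        if fields[SIMPLE_LOWERCASE_FIELD]:
            mapping[int(fields[0], 16)] = int(fields[SIMPLE_LOWERCASE_FIELD], 16)
    return mapping


def diff_mappings(expected, actual):
    """Return (code point, expected, actual) for each differing or missing entry."""
    problems = []
    for cp in sorted(expected.keys() | actual.keys()):
        if expected.get(cp) != actual.get(cp):
            problems.append((cp, expected.get(cp), actual.get(cp)))
    return problems


# Three lines in UnicodeData.txt layout, used as a tiny stand-in for the file.
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
]
```

Calling diff_mappings(parse_simple_lowercase(SAMPLE), generated) returns an
empty list when the generated table agrees with the file, and one tuple per
code point otherwise. The sketch also shows the concern above: the second
parser reads the same file, so it only catches bugs the two parsers do not
share.
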
Stepping back a moment, my top priority is really not testing those C
functions, but testing the Perl code that parses the text files and
generates those arrays. Imagine a future Unicode version does something
that the Perl scripts didn't anticipate, and they fail to add array
entries for half the code points, or something like that. By testing
the arrays generated from freshly-parsed files exhaustively against
ICU, we have a good defense against that. That situation really
only comes up when updating Unicode.
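
The shape of such an exhaustive comparison, including the skipping of
unassigned code points, can be sketched as below. This is a Python
illustration and not the actual test program. The three callables are
hypothetical stand-ins: lookup for the generated arrays, reference for the
ICU side, and is_assigned for whatever decides that a code point is assigned
in both Unicode versions.

```python
def find_mismatches(lookup, reference, is_assigned, code_points=range(0x110000)):
    """Compare lookup() to reference() for every assigned code point.

    Returns a list of (code point, got, wanted) tuples; an empty list means
    the two sides agree everywhere they were compared.
    """
    mismatches = []
    for cp in code_points:
        if not is_assigned(cp):
            continue  # unassigned on one side: skip rather than fail
        got, wanted = lookup(cp), reference(cp)
        if got != wanted:
            mismatches.append((cp, got, wanted))
    return mismatches


# Toy data: a complete reference and a "generated" table that lost an entry.
REFERENCE = {0x41: 0x61, 0x42: 0x62}
GENERATED = {0x41: 0x61}


def reference_lower(cp):
    return REFERENCE.get(cp, cp)


def generated_lower(cp):
    return GENERATED.get(cp, cp)
```

With these toy tables, find_mismatches(generated_lower, reference_lower,
lambda cp: True, range(0x40, 0x44)) reports the dropped entry for U+0042,
which is the kind of generator failure described above.
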
That's not to say that the C code shouldn't be tested, of course. Maybe
we can just do some spot checks for the functions that are reachable
via SQL and get rid of the functions that aren't yet reachable (and re-
add them when they are)?
> I don't at all like that the tests depend on downloading new unicode
> data. What if there was an update but I just want to test the current
> state?
I was mostly following the precedent for normalization. Should we
change that, also?
Regards,
Jeff Davis