On Tue, 2024-04-16 at 11:58 -0700, Andres Freund wrote:
>
> Hm, that seems annoying, even for update-unicode :/. But I guess it won't be
> very common to have such failures?

Things don't change a lot between Unicode versions (and are subject to
the stability policy), but the tests are exhaustive, so even a single
character's property being changed will cause a failure when compared
against an older version of ICU. The case mapping test succeeds back to
ICU 64 (based on Unicode 12.1), but the category/properties test
succeeds only back to ICU 72 (based on Unicode 15.0).

I agree this is annoying, and I briefly documented it in
src/common/unicode/README. It means whoever updates Unicode for a
Postgres version should probably know how to build ICU from source and
point the Postgres build process at it. Maybe I should add more details
in the README to make that easier for others.

But it's also a really good test. ICU's parsing, interpretation of the
data files, and lookup code are entirely independent of ours. Therefore,
if the results agree for all code points, we have a high degree of
confidence that the results are correct. That level of confidence seems
worth a bit of annoyance.

This kind of test is possible because the category/property and case
mapping functions accept a single code point, and there are only
0x110000 code points (U+0000 through U+10FFFF).
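
To illustrate the shape of such an exhaustive comparison, here is a
minimal sketch in Python. It is not the actual Postgres test and does
not call ICU: is_digit_table, is_digit_reference, and
exhaustive_mismatches are hypothetical names, and the ASCII-digit
property is only a toy stand-in for a real Unicode property. The point
is just that two independently written lookups are compared for every
code point.

```python
from bisect import bisect_right

MAX_CODEPOINT = 0x10FFFF  # highest Unicode code point

# Hypothetical implementation under test: binary search over a sorted
# table of inclusive (first, last) ranges.
DIGIT_RANGES = [(0x30, 0x39)]
RANGE_STARTS = [first for first, _ in DIGIT_RANGES]


def is_digit_table(cp: int) -> bool:
    """Range-table lookup for the toy 'ASCII digit' property."""
    i = bisect_right(RANGE_STARTS, cp) - 1
    return i >= 0 and DIGIT_RANGES[i][0] <= cp <= DIGIT_RANGES[i][1]


def is_digit_reference(cp: int) -> bool:
    """Independently written reference (stands in for the other library)."""
    return chr(cp) in "0123456789"


def exhaustive_mismatches(impl, reference) -> list[int]:
    """Return every code point for which the two implementations disagree."""
    return [cp for cp in range(MAX_CODEPOINT + 1)
            if impl(cp) != reference(cp)]


if __name__ == "__main__":
    bad = exhaustive_mismatches(is_digit_table, is_digit_reference)
    print(f"{len(bad)} mismatching code points")
```

Because the loop covers the whole code space, a single wrong table
entry shows up as a mismatch, which is the same reason a single changed
property in a different Unicode version makes the real test fail.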

> > That's not to say that the C code shouldn't be tested, of course. Maybe
> > we can just do some spot checks for the functions that are reachable
> > via SQL and get rid of the functions that aren't yet reachable (and
> > re-add them when they are)?
>
> Yes, I think that'd be a good start. I don't think we necessarily need
> exhaustive coverage, just a bit more coverage than we have.

OK, I'll submit a test module or something.

Regards,
Jeff Davis