Hi,
On 2024-04-15 18:23:21 -0700, Jeff Davis wrote:
> On Mon, 2024-04-15 at 17:05 -0700, Andres Freund wrote:
> > Can't we test this as part of the normal testsuite?
>
> One thing that complicates things a bit is that the test compares the
> results against ICU, so a mismatch in Unicode version between ICU and
> Postgres can cause test failures. The test ignores unassigned code
> points, so normally it just results in less-exhaustive test coverage.
> But sometimes things really do change, and that would cause a failure.
Hm, that seems annoying, even for update-unicode :/. But I guess such
failures won't be very common?
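For concreteness, the comparison boils down to roughly the following (a
sketch, not the actual test code; pg_unicode_toupper() is a stand-in for
whichever generated in-tree function is under test, while the ICU calls
are real):

/*
 * Exhaustively compare a generated case-mapping function against ICU.
 * Sketch only: pg_unicode_toupper() is assumed to be supplied by the
 * build as the function under test.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unicode/uchar.h>

extern UChar32 pg_unicode_toupper(UChar32 cp);

int
main(void)
{
	UChar32		cp;

	for (cp = 0; cp <= 0x10FFFF; cp++)
	{
		/*
		 * Skip code points unassigned in ICU's Unicode version: a version
		 * mismatch then normally just reduces coverage, but a code point
		 * whose mapping actually changed still fails the check below.
		 */
		if (!u_isdefined(cp))
			continue;

		if (pg_unicode_toupper(cp) != u_toupper(cp))
		{
			fprintf(stderr, "case mapping mismatch at U+%06X\n",
					(unsigned int) cp);
			exit(1);
		}
	}
	printf("all assigned code points match\n");
	return 0;
}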
> Stepping back a moment, my top worry isn't really testing those C
> functions, but testing the perl code that parses the text files and
> generates those arrays. Imagine a future Unicode version does something
> that the perl scripts didn't anticipate, and they fail to add array
> entries for half the code points, or something like that. By testing
> the arrays generated from freshly-parsed files exhaustively against
> ICU, we have a good defense against that. That situation really
> only comes up when updating Unicode.
That's a good point.
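To make the failure mode concrete: the generated tables have roughly the
shape below (illustrative layout only, the real ones in src/common/unicode
differ). A generator bug that drops entries degrades lookups to the
identity mapping instead of failing the build, which is exactly what an
exhaustive comparison catches.

#include <stddef.h>
#include <stdint.h>

typedef struct
{
	uint32_t	codepoint;
	uint32_t	simple_uppercase;
} pg_case_entry;

static const pg_case_entry case_table[] = {
	{0x0061, 0x0041},			/* LATIN SMALL LETTER A */
	{0x00E0, 0x00C0},			/* LATIN SMALL LETTER A WITH GRAVE */
	{0x03B1, 0x0391},			/* GREEK SMALL LETTER ALPHA */
};

static uint32_t
case_table_toupper(uint32_t cp)
{
	size_t		i;

	for (i = 0; i < sizeof(case_table) / sizeof(case_table[0]); i++)
	{
		if (case_table[i].codepoint == cp)
			return case_table[i].simple_uppercase;
	}
	return cp;					/* missing entry: silent identity mapping */
}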
> That's not to say that the C code shouldn't be tested, of course. Maybe
> we can just do some spot checks for the functions that are reachable
> via SQL and get rid of the functions that aren't yet reachable (and re-
> add them when they are)?
Yes, I think that'd be a good start. I don't think we necessarily need
exhaustive coverage, just a bit more coverage than we have.
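E.g. something along these lines (a sketch; assumes a UTF-8 database, the
builtin provider's pg_c_utf8 collation, and the unicode_assigned()
function on master):

-- spot checks of the SQL-reachable code paths
SELECT lower('DÉJÀ VU' COLLATE pg_c_utf8);	-- déjà vu
SELECT upper('déjà vu' COLLATE pg_c_utf8);	-- DÉJÀ VU
SELECT unicode_assigned(U&'\2713');		-- t, U+2713 is assigned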
> > I don't at all like that the tests depend on downloading new unicode
> > data. What if there was an update but I just want to test the current
> > state?
>
> I was mostly following the precedent for normalization. Should we
> change that, also?
Yea, I think we should. But I think it's less urgent if we end up testing more
of the code without those test binaries. I don't immediately know what
dependencies would be best, tbh.
Greetings,
Andres Freund