I wrote:
> I agree with your point that this is a shouldn't-happen corner case.
> The question boils down to, if it *does* happen, does that constitute
> a meaningful information leak? Up to now we've taken quite a hard
> line about what leakproofness means, so deciding that varstr_cmp
> is leakproof would constitute moving the goalposts a bit. They'd
> still be in the same stadium, though, IMO.

For most of us it might be more meaningful to look at the non-Windows
code paths, for which the question reduces to what we think of this:

    UErrorCode  status;

    status = U_ZERO_ERROR;
    result = ucol_strcollUTF8(mylocale->info.icu.ucol,
                              arg1, len1,
                              arg2, len2,
                              &status);
    if (U_FAILURE(status))
        ereport(ERROR,
                (errmsg("collation failed: %s", u_errorName(status))));

which, as it happens, is also a UTF8-encoding-only code path.
Can this throw an error in practice, and if so does that
constitute a meaningful information leak? (For bonus points:
is this error report up to project standards?)

Thumbing through the list of UErrorCode values, it seems like the only
ones that are applicable here and aren't internal-error cases are
U_INVALID_CHAR_FOUND and the like, so that this boils down to "one of
the strings contains a character that ICU can't cope with". Maybe that's
impossible except with a pre-existing encoding violation, or maybe not.
In any case, from a purely theoretical viewpoint, such an error message
*does* constitute a leak of information about the input strings. Whether
it's a usable leak is very debatable, but that's basically what we've
got to decide.
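
To make the theoretical leak concrete, here is a minimal sketch that does not use ICU at all. Everything in it is hypothetical: toy_status, toy_collate and probe_reveals_bad_input are invented names, and "contains byte 0xFF" is only a stand-in for "contains a character the collator can't cope with". The point is just that a caller who sees nothing but success or failure still learns one bit about a string it is not supposed to see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status type; a stand-in for, not a copy of, UErrorCode. */
typedef enum { TOY_OK = 0, TOY_INVALID_CHAR = 1 } toy_status;

/* Assumption: 0xFF models "a character the collator rejects".
 * (0xFF never appears in valid UTF-8.) */
static bool
has_bad_byte(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if ((unsigned char) s[i] == 0xFF)
            return true;
    return false;
}

/* Toy comparison: bytewise ordering, but reports failure through *status
 * if either input contains the rejected byte. */
static int
toy_collate(const char *a, size_t alen,
            const char *b, size_t blen,
            toy_status *status)
{
    size_t  n;
    int     c;

    assert(a != NULL && b != NULL && status != NULL);

    if (has_bad_byte(a, alen) || has_bad_byte(b, blen))
    {
        *status = TOY_INVALID_CHAR;
        return 0;
    }
    *status = TOY_OK;

    n = (alen < blen) ? alen : blen;
    c = memcmp(a, b, n);
    if (c != 0)
        return (c < 0) ? -1 : 1;
    return (alen > blen) - (alen < blen);
}

/* What an observer learns: comparing the hidden value against a known-clean
 * probe string fails if and only if the hidden value has the bad byte. */
static bool
probe_reveals_bad_input(const char *hidden, size_t hidden_len)
{
    toy_status  st;

    (void) toy_collate(hidden, hidden_len, "a", 1, &st);
    return st != TOY_OK;
}
```

In this toy model the leak is exactly one bit per hidden string ("does it contain the rejected byte or not"), and the probe string has no influence on which bit is revealed. Whether the real ICU path behaves this way for validly encoded input is the open question above, not something this sketch settles.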

			regards, tom lane