11.06.2024 02:10, Michael Paquier wrote:
> So, valgrind is noisy here because the incomplete byte sequence
> \x8b passed as input of the function after a valid byte sequence
> causes pg_encoding_mblen() to return 2, which is 1 byte passed the end
> of the actual string. report_invalid_encoding() is careful to not
> show any data passed in its input. I am not really sure that this is
> worth bothering with here, TBH. As far as I know this is related to
> the fact that we want test_enc_conversion() to be flexible enough to
> accept any kind of input, working as a cheap shortcut to call these
> internal encoding routines. What do you think?
I think that maybe this case illustrates an API shortcoming. The
report_invalid_encoding function takes len as an argument, but then passes
mbstr as a null-terminated string to pg_encoding_mblen():
/*
* report_invalid_encoding: complain about invalid multibyte character
*
* note: len is remaining length of string, not length of character;
* len must be greater than zero, as we always examine the first byte.
*/
void
report_invalid_encoding(int encoding, const char *mbstr, int len)
{
int l = pg_encoding_mblen(encoding, mbstr);
...
Maybe the comment should say something about null-termination expected?
I could not find other report_invalid_encoding() callers that pass not
null-terminated strings to it, so maybe leave test_enc_conversion() as-is,
just keep in mind that it is an imperfect example of
report_invalid_encoding() usage and it might require some modifications
for fuzz testing.
Best regards,
Alexander