Just wondering, do you have the code in a GitHub/GitLab branch?
>+ utf8_advance(s, state, len);
>+
>+ /*
>+ * If we saw an error during the loop, let the caller handle it. We treat
>+ * all other states as success.
>+ */
>+ if (state == ERR)
>+ return 0;
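Did you mean "state = utf8_advance(s, state, len);" here, i.e. assign the result back to state? Otherwise, as far as I can see, the "state == ERR" check below cannot see an error found by this call.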
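To illustrate the concern, here is a minimal sketch. It assumes the advance function takes the state by value and returns the updated state, which I have not confirmed for the patch. The toy_* names and the two-state ASCII-only machine are made up for illustration and are not the patch's DFA.

```c
#include <stddef.h>

/* Hypothetical two-state machine: BGN is the accepting state, ERR is sticky. */
enum { BGN = 0, ERR = 1 };

/*
 * Hypothetical stand-in for utf8_advance(): takes the state by value and
 * returns the updated state. Any byte with the high bit set is an error.
 */
static int
toy_advance(const unsigned char *s, int state, size_t len)
{
    while (len-- > 0)
    {
        if (*s++ >= 0x80)
            state = ERR;
    }
    return state;
}

/* Return value discarded: the caller's state never leaves BGN. */
static int
toy_verify_discarding(const unsigned char *s, size_t len)
{
    int state = BGN;

    (void) toy_advance(s, state, len);
    return state != ERR;
}

/* Return value assigned back: the error is visible to the check. */
static int
toy_verify_reassigning(const unsigned char *s, size_t len)
{
    int state = BGN;

    state = toy_advance(s, state, len);
    return state != ERR;
}
```

With the two-byte input "a\xff", toy_verify_discarding() reports success while toy_verify_reassigning() reports failure. If utf8_advance() instead updates the state through a pointer or is a macro, the concern does not apply.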
>I wanted to try different strides for the DFA
Does that (together with the "len >= 32" condition) mean the patch does not speed up validation of shorter strings (those under 32 bytes)?
It would be nice to cover those as well, e.g. with 4- or 8-byte strides.
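As a rough sketch of what I have in mind, here is an 8-byte stride loop that also applies to inputs shorter than 32 bytes. It only shows the loop structure, using an ASCII high-bit check per chunk. SHORT_STRIDE, chunk_is_ascii() and ascii_prefix_len() are hypothetical names, and this is not the patch's DFA or its fast path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stride for short inputs; the patch's loop requires len >= 32. */
#define SHORT_STRIDE 8

/* True if none of the 8 bytes starting at s has its high bit set. */
static int
chunk_is_ascii(const unsigned char *s)
{
    uint64_t chunk;

    /* memcpy avoids unaligned access; compilers typically emit one load. */
    memcpy(&chunk, s, sizeof(chunk));
    return (chunk & UINT64_C(0x8080808080808080)) == 0;
}

/*
 * Return the number of leading ASCII bytes in s, scanning SHORT_STRIDE
 * bytes at a time while possible and byte by byte for the remainder.
 */
static size_t
ascii_prefix_len(const unsigned char *s, size_t len)
{
    size_t i = 0;

    /* Strided part: skip whole chunks that are pure ASCII. */
    while (len - i >= SHORT_STRIDE && chunk_is_ascii(s + i))
        i += SHORT_STRIDE;

    /* Tail, or the chunk containing the first non-ASCII byte. */
    while (i < len && s[i] < 0x80)
        i++;

    return i;
}
```

A real version would hand the bytes after the ASCII prefix to the DFA rather than stop there. Whether the smaller stride actually pays off for short strings would need benchmarking.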