On Wed, May 29, 2019 at 03:03:13PM -0400, Tom Lane wrote:
> Pursuant to today's discussion at PGCon about code coverage, I went
> nosing into some of the particularly under-covered subdirectories
> in our tree, and immediately tripped over an interesting factoid:
> the ASCII<->MIC and ASCII<->UTF8 encoding conversion functions are
> untested ... not because the regression tests don't try, but because
> those conversions are unreachable. pg_do_encoding_conversion() and
> its sister functions have hard-wired fast paths for any conversion
> in which the source or target encoding is SQL_ASCII, so that an
> encoding conversion function declared for such a case will never
> be used.
> This situation seems kinda silly. My inclination is to delete
> these functions as useless, but I suppose another approach is
> to suppress the fast paths if there's a declared conversion function.
> (Doing so would likely require added catalog lookups in places we
> might not want them...)
Removing the fast paths to make ascii_to_utf8() reachable would cause ERROR
when server_encoding=SQL_ASCII, client_encoding=UTF8, and a query would
otherwise send the client any character outside 7-bit ASCII. That's fairly
defensible, but doing it for only UTF8 and MULE_INTERNAL is not. So if we
like the ascii_to_utf8() behavior, I think the action would be to replace the
fast path with an encoding-independent verification that all bytes are 7-bit
ASCII. (The check would not apply when both server_encoding and
client_encoding are SQL_ASCII, of course.) Alternately, one might prefer to
replace the fast path with an encoding verification; in the SQL_ASCII-to-UTF8
case, we'd allow byte sequences that are valid UTF8, even though the validity
may be a coincidence and mojibake may ensue. SQL_ASCII is for being casual
about encoding, so it's not clear to me whether or not either prospective
behavior change would be an improvement. However, I do find it clear to
delete ascii_to_utf8() and ascii_to_mic().
> If we do delete them as useless, it might also be advisable to change
> CreateConversionCommand() to refuse creation of conversions to/from
> SQL_ASCII, to prevent future confusion.
Sounds good.