Re: Dead encoding conversion functions - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Dead encoding conversion functions
Date
Msg-id 20190615180732.GA239428@rfd.leadboat.com
Whole thread Raw
In response to Dead encoding conversion functions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, May 29, 2019 at 03:03:13PM -0400, Tom Lane wrote:
> Pursuant to today's discussion at PGCon about code coverage, I went
> nosing into some of the particularly under-covered subdirectories
> in our tree, and immediately tripped over an interesting factoid:
> the ASCII<->MIC and ASCII<->UTF8 encoding conversion functions are
> untested ... not because the regression tests don't try, but because
> those conversions are unreachable.  pg_do_encoding_conversion() and
> its sister functions have hard-wired fast paths for any conversion
> in which the source or target encoding is SQL_ASCII, so that an
> encoding conversion function declared for such a case will never
> be used.

> This situation seems kinda silly.  My inclination is to delete
> these functions as useless, but I suppose another approach is
> to suppress the fast paths if there's a declared conversion function.
> (Doing so would likely require added catalog lookups in places we
> might not want them...)

Removing the fast paths to make ascii_to_utf8() reachable would cause ERROR
when server_encoding=SQL_ASCII, client_encoding=UTF8, and a query would
otherwise send the client any character outside 7-bit ASCII.  That's fairly
defensible, but doing it for only UTF8 and MULE_INTERNAL is not.  So if we
like the ascii_to_utf8() behavior, I think the action would be to replace the
fast path with an encoding-independent verification that all bytes are 7-bit
ASCII.  (The check would not apply when both server_encoding and
client_encoding are SQL_ASCII, of course.)  Alternately, one might prefer to
replace the fast path with an encoding verification; in the SQL_ASCII-to-UTF8
case, we'd allow byte sequences that are valid UTF8, even though the validity
may be a coincidence and mojibake may ensue.  SQL_ASCII is for being casual
about encoding, so it's not clear to me whether or not either prospective
behavior change would be an improvement.  However, I do find it clear to
delete ascii_to_utf8() and ascii_to_mic().

> If we do delete them as useless, it might also be advisable to change
> CreateConversionCommand() to refuse creation of conversions to/from
> SQL_ASCII, to prevent future confusion.

Sounds good.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Extracting only the columns needed for a query
Next
From: Noah Misch
Date:
Subject: Re: pg_upgrade can result in early wraparound on databases with hightransaction load