Hi,
On Wed, Nov 27, 2024 at 11:03:36AM -0500, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
> > That being said, I'm growing quite uneasy about the size of this hack, and
> > I'm wondering if it would be better to leave it alone (perhaps with an
> > update to the release notes) or just revert commit 562bee0 until we have a
> > better way of dealing with multibyte characters in identifiers (e.g.,
> > tracking their encoding). I suspect there are similar problems in other
> > places (e.g., pg_dumpall).
>
> Yeah, there is something to be said for reverting.
I agree that the growing size of the hack is worrying.
It is also not "perfect" (and currently just can't be): when there are multiple
matches, it picks the first one.
Producing multiple possible matches could be as simple as:
CREATE DATABASE "aäääääääääääääääääääääääääääääää";
CREATE DATABASE "aääääääääääääääääääääääääääääää";
and then:
psql -d "aääääääääääääääääääääääääääääääää"
I said "simple" because:
- both pg_database.datname values are in the same encoding.
- neither is truncated at creation time.
I think that could easily lead to bad surprises.
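To illustrate the byte arithmetic behind this, here is a minimal Python sketch.
It is not the patch's logic: the helper functions and the name used are made up
for illustration, and it assumes UTF-8 and the default NAMEDATALEN of 64 (so at
most 63 bytes per name). It only shows that clipping the same over-long name at
a byte boundary and at a character boundary gives two different results:

```python
NAMEDATALEN = 64  # PostgreSQL default; a name holds at most NAMEDATALEN - 1 bytes


def clip_bytes(name: str, limit: int = NAMEDATALEN - 1) -> bytes:
    """Clip the UTF-8 encoding at a fixed byte count, ignoring character boundaries."""
    return name.encode("utf-8")[:limit]


def clip_chars(name: str, limit: int = NAMEDATALEN - 1) -> bytes:
    """Clip at the last whole character that still fits within `limit` bytes."""
    out = b""
    for ch in name:
        enc = ch.encode("utf-8")
        if len(out) + len(enc) > limit:
            break
        out += enc
    return out


# Hypothetical name (not one of the names above): 2 ASCII bytes plus
# 31 two-byte characters, i.e. 64 bytes, one byte over the limit.
requested = "aa" + "ä" * 31

raw = clip_bytes(requested)    # 63 bytes, ends with the first half of an "ä"
whole = clip_chars(requested)  # 62 bytes, ends on a complete "ä"

print(len(requested.encode("utf-8")), len(raw), len(whole))  # 64 63 62
print(raw == whole)  # False
```

Without knowing the encoding, the server can't tell which of the two clipped
forms the user meant, so a lookup that tolerates both can find more than one
candidate.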
Leaving the current behavior (as in 17) alone has the advantage of being
consistent for both ASCII and non-ASCII characters (compared to reverting).
I'd vote for "leave it alone", or for waiting to see if we get more reports
before deciding.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com