On 2020-Jan-23, Robert Haas wrote:
> No, that's not it. Suppose that Álvaro Herrera has some custom
> settings he likes to put on all the PostgreSQL clusters that he uses,
> so he creates a file álvaro.conf and uses an "include" directive in
> postgresql.conf to suck in those settings. If he also likes UTF-8,
> then the file name will be stored in the file system as a 12-byte
> value of which the first two bytes will be 0xc3 0xa1. In that case,
> everything will be fine, because JSON is supposed to always be UTF-8,
> and the file name is UTF-8, and it's all good. But suppose he instead
> likes LATIN-1.

I do have files with Latin-1-encoded names in my filesystem, even though
my system is UTF-8, so I understand the problem. I was wondering if it
would work to encode any non-UTF8-valid name using something like
base64; the encoded name will be plain ASCII and can be put in the
manifest, probably using a different field of the JSON object -- so for
a normal file you'd have { path => '1234/2345' } but for a
Latin-1-encoded file you'd have { path_base64 => '4Wx2YXJvLmNvbmY=' }.
Then it's the job of the tool to ensure it decodes the name to its
original form when creating/querying for the file.
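
To illustrate the idea, here is a minimal Python sketch of what a tool
might do. The function names manifest_entry and entry_to_name are made
up for this example, and path_base64 is only the field name suggested
above, not part of any existing manifest format.

```python
import base64
import json


def manifest_entry(raw_name: bytes) -> dict:
    """Build a (hypothetical) manifest entry from raw file name bytes."""
    try:
        # Names that are valid UTF-8 are stored as-is.
        return {"path": raw_name.decode("utf-8")}
    except UnicodeDecodeError:
        # Anything else is base64-encoded, so the manifest stays plain ASCII.
        return {"path_base64": base64.b64encode(raw_name).decode("ascii")}


def entry_to_name(entry: dict) -> bytes:
    """Recover the original file name bytes from a manifest entry."""
    if "path_base64" in entry:
        return base64.b64decode(entry["path_base64"])
    return entry["path"].encode("utf-8")


# A normal path and a Latin-1-encoded name (0xe1 is "á" in Latin-1).
for name in (b"1234/2345", b"\xe1lvaro.conf"):
    entry = manifest_entry(name)
    print(json.dumps(entry))
    # The round trip must give back exactly the bytes found on disk.
    assert entry_to_name(entry) == name
```

Working on bytes rather than strings is deliberate here: the file system
hands back bytes, and the point is to reproduce them exactly.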

A problem I have with this idea is that this is very corner-casey, so
most tool implementors will never realize that there's a need to decode
certain file names.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services