Re: making the backend's json parser work in frontend code - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: making the backend's json parser work in frontend code
Msg-id 20200123174958.GA3138@momjian.us
In response to Re: making the backend's json parser work in frontend code  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: making the backend's json parser work in frontend code  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Jan 23, 2020 at 02:23:14PM -0300, Alvaro Herrera wrote:
> On 2020-Jan-23, Robert Haas wrote:
> 
> > No, that's not it. Suppose that Álvaro Herrera has some custom
> > settings he likes to put on all the PostgreSQL clusters that he uses,
> > so he creates a file álvaro.conf and uses an "include" directive in
> > postgresql.conf to suck in those settings. If he also likes UTF-8,
> > then the file name will be stored in the file system as a 12-byte
> > value of which the first two bytes will be 0xc3 0xa1. In that case,
> > everything will be fine, because JSON is supposed to always be UTF-8,
> > and the file name is UTF-8, and it's all good. But suppose he instead
> > likes LATIN-1.
> 
> I do have files with Latin-1-encoded names in my filesystem, even though
> my system is UTF-8, so I understand the problem.  I was wondering if it
> would work to encode any non-UTF8-valid name using something like
> base64; the encoded name will be plain ASCII and can be put in the
> manifest, probably using a different field of the JSON object -- so for
> a normal file you'd have { path => '1234/2345' } but for a
> Latin-1-encoded file you'd have { path_base64 => '4Wx2YXJvLmNvbmYK' }.
> Then it's the job of the tool to ensure it decodes the name to its
> original form when creating/querying for the file.
> 
> A problem I have with this idea is that this is very corner-casey, so
> most tool implementors will never realize that there's a need to decode
> certain file names.

Another idea is to use base64 for all non-ASCII file names, so we don't
need to check whether the file name is valid UTF-8 before outputting it;
we just need to check for non-ASCII bytes, which is much easier.  Another
problem, though, is how to _flag_ file names as being base64-encoded.
Use another JSON field to specify that?
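
To make the idea concrete, here is a rough Python sketch, not a proposal
for the actual manifest format.  It assumes the writer sees file names as
raw bytes and reuses the "path" / "path_base64" field names from Alvaro's
example as the flag; the function names manifest_entry and entry_to_name
are hypothetical and exist only for illustration.

```python
import base64
import json


def manifest_entry(raw_name: bytes) -> dict:
    """Build one manifest entry from a file name given as raw bytes.

    Pure-ASCII names are stored as-is under "path".  Anything else is
    base64-encoded under "path_base64", so the writer never has to
    decide whether the bytes are valid UTF-8.
    """
    if raw_name.isascii():
        return {"path": raw_name.decode("ascii")}
    return {"path_base64": base64.b64encode(raw_name).decode("ascii")}


def entry_to_name(entry: dict) -> bytes:
    """Recover the original file-name bytes from a manifest entry."""
    if "path_base64" in entry:
        return base64.b64decode(entry["path_base64"])
    return entry["path"].encode("ascii")


if __name__ == "__main__":
    names = [
        b"1234/2345",                         # plain ASCII
        "álvaro.conf".encode("latin-1"),      # 0xe1 is not valid UTF-8 here
        "álvaro.conf".encode("utf-8"),        # valid UTF-8, still non-ASCII
    ]
    for name in names:
        entry = manifest_entry(name)
        # Every entry is pure ASCII, so it is trivially valid JSON text.
        print(json.dumps(entry))
        assert entry_to_name(entry) == name
```

In this sketch the name of the field is itself the flag, so a reader that
only looks at "path" would skip the encoded entries rather than misread
them.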

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


