Thread: Re: [COMMITTERS] pgsql: Cause pg_proc.probin to be declared as text, not bytea.

Greg Stark <gsstark@mit.edu> writes:
> On Tue, Aug 4, 2009 at 5:04 AM, Tom Lane<tgl@postgresql.org> wrote:
>> Cause pg_proc.probin to be declared as text, not bytea.

> Doesn't this relate to the earlier discussion of whether to re-encode
> filenames and paths?

> What's going to happen if I have filenames which aren't valid encoded
> strings in the server encoding -- say UTF8 filenames but I'm using
> latin1 in the server or vice versa. Will my CREATE FUNCTION command
> end up storing an invalid encoded string? Or re-encode the filename
> and then fail to find the file?

Right at the moment we simply aren't considering any of those cases.
If you'd like to propose and implement a solution, feel free.  I think
the last proposal foundered on the fact that it had no idea what
encoding the filesystem was expecting anyway.

I'll point out though that having probin declared bytea would surely
be antithetical to any attempt to treat shlib filenames in an
encoding-aware fashion.  Declaring it that way implies that it is
*not* storing a character string that has any particular encoding.
        regards, tom lane


On Tue, Aug 4, 2009 at 2:46 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>
> I'll point out though that having probin declared bytea would surely
> be antithetical to any attempt to treat shlib filenames in an
> encoding-aware fashion.  Declaring it that way implies that it is
> *not* storing a character string that has any particular encoding.

Well that's kind of the point. Unix filesystems traditionally prohibit
'/' and '\0' but otherwise allowing any series of bytes without
requiring any particular encoding. If we used bytea to store
filesystem paths then you could specify any arbitrary series of bytes
without worrying that the server will re-encode it differently.

--
greg
http://mit.edu/~gsstark/resume.pdf
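
The mismatch discussed in this exchange can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file name "café.so" and the helper names below are hypothetical and appear nowhere in the thread, and nothing here touches a real filesystem or a PostgreSQL server. It shows that a name written by a Latin-1 client is an acceptable Unix file name, is not a valid UTF-8 string, and changes its bytes if it is re-encoded.

```python
# Hypothetical shared-library name, spelled with an escape so the
# example does not depend on the encoding of this source file.
NAME = "caf\u00e9.so"

latin1_name = NAME.encode("latin-1")  # bytes a Latin-1 client would write
utf8_name = NAME.encode("utf-8")      # bytes a UTF-8 client would write


def is_valid_unix_name(name: bytes) -> bool:
    """A path component may hold any bytes except '/' and NUL."""
    return len(name) > 0 and b"/" not in name and b"\0" not in name


def is_valid_in(name: bytes, encoding: str) -> bool:
    """Return True if the bytes decode cleanly in the given encoding."""
    try:
        name.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def reencode(name: bytes, src: str, dst: str) -> bytes:
    """Interpret the bytes as `src` text and write them back out as `dst`."""
    return name.decode(src).encode(dst)


if __name__ == "__main__":
    print(is_valid_unix_name(latin1_name))        # fine for the filesystem
    print(is_valid_in(latin1_name, "utf-8"))      # not valid UTF-8 text
    print(reencode(latin1_name, "latin-1", "utf-8") == latin1_name)
```

If the stored name is re-encoded, the resulting bytes no longer match the file on disk. If it is stored untouched, the string may be invalid in the server encoding. Those are the two outcomes the original question asks about.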



Greg Stark wrote:
> On Tue, Aug 4, 2009 at 2:46 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>   
>> I'll point out though that having probin declared bytea would surely
>> be antithetical to any attempt to treat shlib filenames in an
>> encoding-aware fashion.  Declaring it that way implies that it is
>> *not* storing a character string that has any particular encoding.
>>     
>
> Well that's kind of the point. Unix filesystems traditionally prohibit
>  '/' and '\0' but otherwise allowing any series of bytes without
> requiring any particular encoding. If we used bytea to store
> filesystem paths then you could specify any arbitrary series of bytes
> without worrying that the server will re-encode it differently.
>
>   

Is this any different from the path in "COPY foo to '/path/to/file'"?

I suspect the probin stuff is a solution in search of a problem.

cheers

andrew


Andrew Dunstan <andrew@dunslane.net> writes:
> Is this any different from the path in "COPY foo to '/path/to/file'"?
> I suspect the probin stuff is a solution in search of a problem.

Well, the previous probin behavior is demonstrably broken.  Make a shlib
with backslash or non-ASCII in the name, create a function referencing
it, dump and reload.  Whatever your opinions are about encodings, you
won't think pg_dump did the right thing.

I'm not sure whether the more general pathname encoding issue is worth
working on or not.  In general it's a non-problem if the paths in the
server filesystem are written in the database encoding.  If they are
not, then you have to figure out what they *are* written in, and that
seems a bit tough.  But anyway that problem is hardly restricted to
probin, and a solution that works only for probin doesn't seem terribly
interesting.
        regards, tom lane
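
As a rough illustration of the dump-and-reload breakage described above, the Python sketch below mimics the bytea "escape" text output format, in which a backslash is doubled and bytes outside the printable ASCII range are written as three-digit octal escapes. The function name and the sample file names are hypothetical, and the assumption that the dump carries this escaped text into the reloaded definition is an inference from the thread, not something it states.

```python
def bytea_escape_output(data: bytes) -> str:
    """Approximate the bytea 'escape' text representation of `data`."""
    out = []
    for b in data:
        if b == 0x5C:                # backslash is doubled
            out.append("\\\\")
        elif 32 <= b <= 126:         # printable ASCII passes through
            out.append(chr(b))
        else:                        # everything else becomes \ooo (octal)
            out.append("\\%03o" % b)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical library names: one with a backslash, one with non-ASCII.
    print(bytea_escape_output(b"my\\lib.so"))
    print(bytea_escape_output("caf\u00e9.so".encode("utf-8")))
```

Under that assumption, a name containing a backslash or a non-ASCII byte comes back from the reload as a different string from the one originally given to CREATE FUNCTION, whatever encoding the filesystem uses.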