Thread: Why are default encoding conversions namespace-specific?

Why are default encoding conversions namespace-specific?

From
Tom Lane
Date:
See $SUBJECT.  It seems to me this is a bad idea for much the same
reasons that we recently decided default index operator classes should
not be namespace-specific:
http://archives.postgresql.org/pgsql-hackers/2006-02/msg00284.php

I don't mind having encoding conversions be named within schemas,
but I propose that any given encoding pair be allowed to have only
one default conversion, period, and that when we are looking for
a default conversion we find it by a non-namespace-aware search.

With the existing definition, any change in search_path could
theoretically cause a change in client-to-server encoding conversion
behavior, and this just seems like a really bad idea.  (It's only
theoretical because we don't actually redo the conversion function
search on a search_path change ... but if you think the existing
definition is good then that's a bug.)
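A toy model may make the difference concrete. The sketch below is a simplified, hypothetical Python model, not PostgreSQL's catalog code. It represents conversion entries as tuples and treats the search path as a plain list, ignoring details such as the implicit searching of pg_catalog. It then compares the namespace-aware lookup described above with the proposed rule of one default per encoding pair. The schema vendor_ms and the conversion ms_sjis_to_utf8 are invented for illustration.

```python
from typing import Optional

# Hypothetical, simplified catalog entry:
# (schema, conversion name, source encoding, target encoding, is_default)
Entry = tuple[str, str, str, str, bool]


def find_default_by_path(catalog: list[Entry], search_path: list[str],
                         src: str, dst: str) -> Optional[str]:
    """Namespace-aware search: the first schema on the path that holds a
    default for (src, dst) wins, so the answer depends on search_path."""
    for schema in search_path:
        for nsp, name, s, d, is_default in catalog:
            if nsp == schema and (s, d) == (src, dst) and is_default:
                return f"{nsp}.{name}"
    return None


def add_conversion_global(catalog: list[Entry], entry: Entry) -> None:
    """Proposed rule: refuse a second default for the same encoding pair,
    no matter which schema it is created in."""
    _, _, src, dst, is_default = entry
    if is_default and any((s, d) == (src, dst) and dflt
                          for _, _, s, d, dflt in catalog):
        raise ValueError(f"default conversion for {src} to {dst} already exists")
    catalog.append(entry)


def find_default_global(catalog: list[Entry], src: str, dst: str) -> Optional[str]:
    """Non-namespace-aware search: search_path is not consulted at all."""
    for nsp, name, s, d, is_default in catalog:
        if (s, d) == (src, dst) and is_default:
            return f"{nsp}.{name}"
    return None


if __name__ == "__main__":
    # Two schemas each holding a "default" SJIS -> UTF8 conversion,
    # which the existing definition permits.
    catalog = [
        ("pg_catalog", "sjis_to_utf8", "SJIS", "UTF8", True),
        ("vendor_ms", "ms_sjis_to_utf8", "SJIS", "UTF8", True),
    ]
    print(find_default_by_path(catalog, ["vendor_ms", "pg_catalog"], "SJIS", "UTF8"))
    print(find_default_by_path(catalog, ["pg_catalog"], "SJIS", "UTF8"))
```

Under the path-based rule the same catalog yields two different answers depending only on search_path. Under the proposed rule the second default could not be created in the first place, so the lookup has exactly one possible answer.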

Comments?
        regards, tom lane


Re: Why are default encoding conversions namespace-specific?

From
Tatsuo Ishii
Date:
> See $SUBJECT.  It seems to me this is a bad idea for much the same
> reasons that we recently decided default index operator classes should
> not be namespace-specific:
> http://archives.postgresql.org/pgsql-hackers/2006-02/msg00284.php
> 
> I don't mind having encoding conversions be named within schemas,
> but I propose that any given encoding pair be allowed to have only
> one default conversion, period, and that when we are looking for
> a default conversion we find it by a non-namespace-aware search.

That doesn't sound like a good idea to me.

> With the existing definition, any change in search_path could
> theoretically cause a change in client-to-server encoding conversion
> behavior, and this just seems like a really bad idea.  (It's only
> theoretical because we don't actually redo the conversion function
> search on a search_path change ... but if you think the existing
> definition is good then that's a bug.)

Then why do we have the CREATE DEFAULT CONVERSION command at all?
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: Why are default encoding conversions namespace-specific?

From
Tom Lane
Date:
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
>> I don't mind having encoding conversions be named within schemas,
>> but I propose that any given encoding pair be allowed to have only
>> one default conversion, period, and that when we are looking for
>> a default conversion we find it by a non-namespace-aware search.

> That doesn't sound like a good idea to me.

What does it mean to have different "default" encoding conversions in
different schemas?  Even if this had a sensible interpretation, I don't
think the existing code implements it properly.

> Then why do we have the CREATE DEFAULT CONVERSION command at all?

So you can create the one you're allowed to have, of course ...
        regards, tom lane


Re: Why are default encoding conversions namespace-specific?

From
"Andrew Dunstan"
Date:
Tom Lane said:
> Tatsuo Ishii <ishii@sraoss.co.jp> writes:
>>> I don't mind having encoding conversions be named within schemas, but
>>> I propose that any given encoding pair be allowed to have only one
>>> default conversion, period, and that when we are looking for a
>>> default conversion we find it by a non-namespace-aware search.
>
>> That doesn't sound like a good idea to me.
>
> What does it mean to have different "default" encoding conversions in
> different schemas?  Even if this had a sensible interpretation, I don't
> think the existing code implements it properly.

Perhaps I'm misunderstanding, but why not just resolve the namespace at the
time the default conversion is created?

cheers

andrew

Re: Why are default encoding conversions namespace-specific?

From
Tom Lane
Date:
"Andrew Dunstan" <andrew@dunslane.net> writes:
> Tom Lane said:
>> What does it mean to have different "default" encoding conversions in
>> different schemas?  Even if this had a sensible interpretation, I don't
>> think the existing code implements it properly.

> Perhaps I'm misunderstanding, but why not just resolve the namespace at the
> time the default conversion is created?

Isn't that the same thing as saying that there can only be one default
conversion across all schemas?  ("Only one" for a given source and
target encoding pair, of course.)  If it isn't the same, please explain
more clearly.
        regards, tom lane


Re: Why are default encoding conversions namespace-specific?

From
"Andrew Dunstan"
Date:
Tom Lane said:
> "Andrew Dunstan" <andrew@dunslane.net> writes:
>> Tom Lane said:
>>> What does it mean to have different "default" encoding conversions in
>>> different schemas?  Even if this had a sensible interpretation, I
>>> don't think the existing code implements it properly.
>
>> Perhaps I'm misunderstanding, but why not just resolve the namespace
>> at the time the default conversion is created?
>
> Isn't that the same thing as saying that there can only be one default
> conversion across all schemas?  ("Only one" for a given source and
> target encoding pair, of course.)  If it isn't the same, please explain
> more clearly.

Yeah, I guess it is. I was thinking of it more as "namespace-specified" than
as "non-namespace-aware". I guess it's a matter of perspective.

cheers

andrew

Re: Why are default encoding conversions namespace-specific?

From
Tatsuo Ishii
Date:
> Tatsuo Ishii <ishii@sraoss.co.jp> writes:
> >> I don't mind having encoding conversions be named within schemas,
> >> but I propose that any given encoding pair be allowed to have only
> >> one default conversion, period, and that when we are looking for
> >> a default conversion we find it by a non-namespace-aware search.
> 
> > That doesn't sound like a good idea to me.
> 
> What does it mean to have different "default" encoding conversions in
> different schemas?  Even if this had a sensible interpretation, I don't
> think the existing code implements it properly.
>
> > Then why do we have the CREATE DEFAULT CONVERSION command at all?
> 
> So you can create the one you're allowed to have, of course ...

If you allow only one default conversion for encodings A and B
regardless of schema, then how can one have a different default
conversion for A and B?

I'm sure we need more than one default conversion for encodings A and
B. For example, different vendors provide different conversion maps
for SJIS and UTF-8: Microsoft has its own, Apple has another one, etc.
The differences are not huge, but some customers might consider them
critical. In that case they could create their own conversion in
their schema.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: Why are default encoding conversions namespace-specific?

From
Tom Lane
Date:
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
> I'm sure we need more than one default conversion for encodings A and
> B. For example, different vendors provide different conversion maps
> for SJIS and UTF-8: Microsoft has its own, Apple has another one, etc.
> The differences are not huge, but some customers might consider them
> critical. In that case they could create their own conversion in
> their schema.

Well, being able to switch to a different conversion is fine, but I don't
think that's a good argument for tying it to the schema search path.
What would make more sense to me is a command specifically setting the
conversion to use --- perhaps a GUC variable, since then ALTER USER SET
and ALTER DATABASE SET would provide convenient ways of controlling it.
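One way to picture the GUC alternative: the conversion is chosen from a setting resolved through the usual layers (an explicit SET in the session, then ALTER USER ... SET, then ALTER DATABASE ... SET, then a server-wide default), and search_path never enters into it. The sketch below is hypothetical. The setting name client_conversion is invented here because the thread does not name one, and the function models only the precedence order, not PostgreSQL's actual GUC machinery.

```python
from typing import Mapping

# Hypothetical setting name, used only for this illustration.
SETTING = "client_conversion"


def effective_conversion(session: Mapping[str, str],
                         user: Mapping[str, str],
                         database: Mapping[str, str],
                         server_default: str) -> str:
    """Return the conversion to use. The most specific layer that sets
    SETTING wins: session SET, then ALTER USER ... SET, then
    ALTER DATABASE ... SET, then the server-wide default.
    search_path is deliberately not an input."""
    for layer in (session, user, database):
        if SETTING in layer:
            return layer[SETTING]
    return server_default


if __name__ == "__main__":
    db = {SETTING: "vendor_apple.apple_sjis_to_utf8"}   # hypothetical names
    role = {SETTING: "vendor_ms.ms_sjis_to_utf8"}
    print(effective_conversion({}, {}, db, "pg_catalog.sjis_to_utf8"))
    print(effective_conversion({}, role, db, "pg_catalog.sjis_to_utf8"))
```

Because the choice depends only on who is connected to which database (plus any explicit SET), a function that changes search_path cannot alter the client conversion as a side effect.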
        regards, tom lane


Re: Why are default encoding conversions namespace-specific?

From
Tatsuo Ishii
Date:
> Tatsuo Ishii <ishii@sraoss.co.jp> writes:
> > I'm sure we need more than one default conversion for encodings A and
> > B. For example, different vendors provide different conversion maps
> > for SJIS and UTF-8: Microsoft has its own, Apple has another one, etc.
> > The differences are not huge, but some customers might consider them
> > critical. In that case they could create their own conversion in
> > their schema.
> 
> Well, being able to switch to a different conversion is fine, but I don't
> think that's a good argument for tying it to the schema search path.
> What would make more sense to me is a command specifically setting the
> conversion to use --- perhaps a GUC variable, since then ALTER USER SET
> and ALTER DATABASE SET would provide convenient ways of controlling it.

If that works, then it's OK. However, I'm still not sure why the
current method is evil.

BTW, what does the standard say about conversions vs. schemas? Don't
conversions belong to a schema? If so, then a schema-specific default
conversion seems the more standard-friendly way.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: Why are default encoding conversions namespace-specific?

From
Tom Lane
Date:
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
>> Well, being able to switch to a different conversion is fine, but I don't
>> think that's a good argument for tying it to the schema search path.

> If that works, then it's OK. However, I'm still not sure why the
> current method is evil.

Because with the current definition, any change in search_path really
ought to lead to repeating the lookup for the default conversion proc.
That's a bad idea from a performance point of view and I don't think
it's a particularly good idea from the definitional point of view
either --- do you really want the client conversion changing because
some function altered the search path?

> BTW, what does the standard say about conversions vs. schemas? Don't
> conversions belong to a schema? If so, then a schema-specific default
> conversion seems the more standard-friendly way.

AFAICT we invented the entire concept of conversions ourselves.  I see
nothing about CREATE CONVERSION in the SQL spec.  There is a CREATE
TRANSLATION in SQL2003, which we'd probably not seen when we invented
CREATE CONVERSION, but it does *not* have a DEFAULT clause.  I don't
think you can point to the spec to defend our current method of
selecting which conversion to use.
        regards, tom lane


Re: Why are default encoding conversions namespace-specific?

From
Martijn van Oosterhout
Date:
On Wed, Mar 29, 2006 at 01:09:08AM +0900, Tatsuo Ishii wrote:
> BTW, what does the standard say about conversions vs. schemas? Don't
> conversions belong to a schema? If so, then a schema-specific default
> conversion seems the more standard-friendly way.

The standard says nothing about conversions. They're only used when
communicating between the client and the server. By making them belong
to a schema, you suggest that your queries be interpreted differently,
character-set-wise, depending on the schema.

SELECT * FROM myschema.mytable;
SET search_path=otherschema;
SELECT * FROM myschema.mytable;

So the second query may produce different output because the search
path changed, and the data sent to the client will be converted with a
different conversion map. Of course, if the client and server use the
same encoding, the queries will produce the same result. That sounds
broken to me.

The reason it doesn't happen now is that (as Tom said) we only do
the lookup once. But you can trigger it if you're careful.
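The gap between the definition and what the code does can be modelled with a toy session object. This is a hypothetical sketch, not backend code. lookup_default stands in for the namespace-aware search, and the strict flag marks the behaviour the current definition would actually require (redo the search on every search_path change), as opposed to looking it up once at connection start. The schema names follow the example above; the conversion names are invented, and pg_catalog is listed explicitly in the path so that the model's search order matches the real one.

```python
from typing import Optional

# Hypothetical per-schema defaults for one encoding pair (say SJIS -> UTF8).
DEFAULTS = {
    "myschema": "myschema.my_sjis_to_utf8",
    "pg_catalog": "pg_catalog.sjis_to_utf8",
}


def lookup_default(search_path: list[str]) -> Optional[str]:
    """Namespace-aware search: the first schema on the path with a default wins."""
    for schema in search_path:
        if schema in DEFAULTS:
            return DEFAULTS[schema]
    return None


class Session:
    def __init__(self, search_path: list[str], strict: bool) -> None:
        self.search_path = search_path
        self.strict = strict      # True: re-run the lookup on every path change
        self.lookups = 0          # number of catalog searches performed
        self.conversion = self._lookup()  # always done once at startup

    def _lookup(self) -> Optional[str]:
        self.lookups += 1
        return lookup_default(self.search_path)

    def set_search_path(self, search_path: list[str]) -> None:
        self.search_path = search_path
        if self.strict:
            self.conversion = self._lookup()


if __name__ == "__main__":
    for strict in (False, True):
        s = Session(["myschema", "pg_catalog"], strict)
        s.set_search_path(["otherschema", "pg_catalog"])
        print(strict, s.conversion, s.lookups)
```

The non-strict session keeps using myschema's conversion after the SET, which matches today's lookup-once behaviour. The strict session switches conversions mid-session and pays one extra catalog search per change; if search_path were changed at every function call and return, it would pay that cost each time.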

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/

Re: Why are default encoding conversions namespace-specific?

From
Tatsuo Ishii
Date:
> > If that works, then it's OK. However, I'm still not sure why the
> > current method is evil.
> 
> Because with the current definition, any change in search_path really
> ought to lead to repeating the lookup for the default conversion proc.
> That's a bad idea from a performance point of view and I don't think
> it's a particularly good idea from the definitional point of view
> either --- do you really want the client conversion changing because
> some function altered the search path?

That argument does not strike me as very strong. I cannot imagine a
case where search_path changes that frequently.

> AFAICT we invented the entire concept of conversions ourselves.  I see
> nothing about CREATE CONVERSION in the SQL spec.  There is a CREATE
> TRANSLATION in SQL2003, which we'd probably not seen when we invented
> CREATE CONVERSION, but it does *not* have a DEFAULT clause.  I don't
> think you can point to the spec to defend our current method of
> selecting which conversion to use.

SQL's CONVERT and TRANSLATE are different things. CONVERT changes
encodings, while TRANSLATE changes character sets. However, for some
encodings such as LATIN* and SJIS, the distinction between encodings
and character sets is vague.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: Why are default encoding conversions namespace-specific?

From
Tom Lane
Date:
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
>> Because with the current definition, any change in search_path really
>> ought to lead to repeating the lookup for the default conversion proc.
>> That's a bad idea from a performance point of view and I don't think
>> it's a particularly good idea from the definitional point of view
>> either --- do you really want the client conversion changing because
>> some function altered the search path?

> That argument does not strike me as very strong. I cannot imagine a
> case where search_path changes that frequently.

I can.  There's been talk for example of having a search path associated
with every function definition, so that it might need to be changed at
every function call and return.  In any case I don't like the notion
that the client conversion is tied to search_path; they really should
be independent.
        regards, tom lane