Thread: Why are default encoding conversions namespace-specific?
See $SUBJECT. It seems to me this is a bad idea for much the same reasons that we recently decided default index operator classes should not be namespace-specific:
http://archives.postgresql.org/pgsql-hackers/2006-02/msg00284.php

I don't mind having encoding conversions be named within schemas, but I propose that any given encoding pair be allowed to have only one default conversion, period, and that when we are looking for a default conversion we find it by a non-namespace-aware search.

With the existing definition, any change in search_path could theoretically cause a change in client-to-server encoding conversion behavior, and this just seems like a really bad idea. (It's only theoretical because we don't actually redo the conversion function search on a search_path change ... but if you think the existing definition is good then that's a bug.)

Comments?

regards, tom lane

> See $SUBJECT. It seems to me this is a bad idea for much the same
> reasons that we recently decided default index operator classes should
> not be namespace-specific:
> http://archives.postgresql.org/pgsql-hackers/2006-02/msg00284.php
>
> I don't mind having encoding conversions be named within schemas,
> but I propose that any given encoding pair be allowed to have only
> one default conversion, period, and that when we are looking for
> a default conversion we find it by a non-namespace-aware search.

That doesn't sound like a good idea to me.

> With the existing definition, any change in search_path could
> theoretically cause a change in client-to-server encoding conversion
> behavior, and this just seems like a really bad idea. (It's only
> theoretical because we don't actually redo the conversion function
> search on a search_path change ... but if you think the existing
> definition is good then that's a bug.)

Then why do we have the CREATE DEFAULT CONVERSION command at all?

--
Tatsuo Ishii
SRA OSS, Inc. Japan

Tatsuo Ishii <ishii@sraoss.co.jp> writes:
>> I don't mind having encoding conversions be named within schemas,
>> but I propose that any given encoding pair be allowed to have only
>> one default conversion, period, and that when we are looking for
>> a default conversion we find it by a non-namespace-aware search.

> That doesn't sound like a good idea to me.

What does it mean to have different "default" encoding conversions in different schemas? Even if this had a sensible interpretation, I don't think the existing code implements it properly.

> Then why do we have the CREATE DEFAULT CONVERSION command at all?

So you can create the one you're allowed to have, of course ...

regards, tom lane

Tom Lane said:
> Tatsuo Ishii <ishii@sraoss.co.jp> writes:
>>> I don't mind having encoding conversions be named within schemas, but
>>> I propose that any given encoding pair be allowed to have only one
>>> default conversion, period, and that when we are looking for a
>>> default conversion we find it by a non-namespace-aware search.
>
>> That doesn't sound like a good idea to me.
>
> What does it mean to have different "default" encoding conversions in
> different schemas? Even if this had a sensible interpretation, I don't
> think the existing code implements it properly.

perhaps I'm misunderstanding, but why not just resolve the namespace at the time the default conversion is created?

cheers

andrew

"Andrew Dunstan" <andrew@dunslane.net> writes: > Tom Lane said: >> What does it mean to have different "default" encoding conversions in >> different schemas? Even if this had a sensible interpretation, I don't >> think the existing code implements it properly. > perhaps I'm misunderstanding, but why not just resolve the namespace at the > time the default conversion is created? Isn't that the same thing as saying that there can only be one default conversion across all schemas? ("Only one" for a given source and target encoding pair, of course.) If it isn't the same, please explain more clearly. regards, tom lane
Tom Lane said:
> "Andrew Dunstan" <andrew@dunslane.net> writes:
>> Tom Lane said:
>>> What does it mean to have different "default" encoding conversions in
>>> different schemas? Even if this had a sensible interpretation, I
>>> don't think the existing code implements it properly.
>
>> perhaps I'm misunderstanding, but why not just resolve the namespace
>> at the time the default conversion is created?
>
> Isn't that the same thing as saying that there can only be one default
> conversion across all schemas? ("Only one" for a given source and
> target encoding pair, of course.) If it isn't the same, please explain
> more clearly.

Yeah, I guess it is. I was thinking of it more as "namespace-specified" than as "non-namespace-aware". I guess it's a matter of perspective.

cheers

andrew

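The proposal agreed on so far (at most one default conversion per source/target encoding pair, found without consulting the schema search path) can be illustrated with a small Python toy model. This is only a sketch under stated assumptions: `ConversionCatalog`, its methods, and the conversion names are hypothetical and are not PostgreSQL's catalog code.

```python
from typing import Dict, Optional, Tuple


class ConversionCatalog:
    """Hypothetical toy catalog: conversions are named within schemas,
    but at most one may be the default for an encoding pair."""

    def __init__(self) -> None:
        # (source encoding, destination encoding) -> schema-qualified name
        self._defaults: Dict[Tuple[str, str], str] = {}

    def create_default(self, schema: str, name: str, src: str, dst: str) -> None:
        key = (src, dst)
        if key in self._defaults:
            # The uniqueness rule is global, not per schema.
            raise ValueError(
                f"default conversion for {src} to {dst} already exists: "
                f"{self._defaults[key]}"
            )
        self._defaults[key] = f"{schema}.{name}"

    def find_default(self, src: str, dst: str) -> Optional[str]:
        # Non-namespace-aware search: no search_path argument at all.
        return self._defaults.get((src, dst))


catalog = ConversionCatalog()
catalog.create_default("pg_catalog", "sjis_to_utf8_example", "SJIS", "UTF8")

try:
    catalog.create_default("myschema", "sjis_to_utf8_vendor", "SJIS", "UTF8")
    second_default_rejected = False
except ValueError:
    second_default_rejected = True
```

Because `find_default` takes no search path, nothing a session does to `search_path` can change which conversion it returns.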
> Tatsuo Ishii <ishii@sraoss.co.jp> writes:
> >> I don't mind having encoding conversions be named within schemas,
> >> but I propose that any given encoding pair be allowed to have only
> >> one default conversion, period, and that when we are looking for
> >> a default conversion we find it by a non-namespace-aware search.
> >
> > That doesn't sound like a good idea to me.
>
> What does it mean to have different "default" encoding conversions in
> different schemas? Even if this had a sensible interpretation, I don't
> think the existing code implements it properly.
>
> > Then why do we have the CREATE DEFAULT CONVERSION command at all?
>
> So you can create the one you're allowed to have, of course ...

If you allow only one default conversion for encodings A and B regardless of schema, then how can one have a different default conversion for A and B?

I'm sure we need more than one default conversion for encoding A and B. For example, different vendors provide different conversion maps for SJIS and UTF-8. M$ has its own and Apple has another one, etc. The differences are not huge but some customers might think the difference is critical. In this case they could create their own conversion in their schema.

--
Tatsuo Ishii
SRA OSS, Inc. Japan

Tatsuo Ishii <ishii@sraoss.co.jp> writes:
> I'm sure we need more than one default conversion for encoding A and
> B. For example, different vendors provide different conversion maps
> for SJIS and UTF-8. M$ has its own and Apple has another one, etc. The
> differences are not huge but some customers might think the difference
> is critical. In this case they could create their own conversion in
> their schema.

Well, being able to switch to a different conversion is fine, but I don't think that's a good argument for tying it to the schema search path. What would make more sense to me is a command specifically setting the conversion to use --- perhaps a GUC variable, since then ALTER USER SET and ALTER DATABASE SET would provide convenient ways of controlling it.

regards, tom lane

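As a rough illustration of the GUC idea, the following Python sketch resolves an explicitly chosen conversion from layered settings. Everything here is an assumption made for illustration: no such setting exists in the thread, the function and conversion names are hypothetical, and the precedence (session value, then per-user value, then per-database value, then a built-in default) simply mirrors how PostgreSQL layers other settings.

```python
from typing import Optional


def effective_conversion(
    session_value: Optional[str],
    user_value: Optional[str],
    database_value: Optional[str],
    builtin_default: str,
) -> str:
    """Return the conversion to use, taking the most specific setting.

    Hypothetical model: session SET beats ALTER USER SET, which beats
    ALTER DATABASE SET, which beats the built-in default.
    """
    for value in (session_value, user_value, database_value):
        if value is not None:
            return value
    return builtin_default


# A database-wide choice applies when nothing more specific is set.
db_only = effective_conversion(None, None, "vendor_a_map", "builtin_map")

# A per-user choice overrides the database-wide one.
user_over_db = effective_conversion(None, "vendor_b_map", "vendor_a_map", "builtin_map")

# With no settings at all, the built-in default is used.
nothing_set = effective_conversion(None, None, None, "builtin_map")
```

Note that the schema search path appears nowhere in this resolution, which is the point of the proposal.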
> Tatsuo Ishii <ishii@sraoss.co.jp> writes:
> > I'm sure we need more than one default conversion for encoding A and
> > B. For example, different vendors provide different conversion maps
> > for SJIS and UTF-8. M$ has its own and Apple has another one, etc. The
> > differences are not huge but some customers might think the difference
> > is critical. In this case they could create their own conversion in
> > their schema.
>
> Well, being able to switch to a different conversion is fine, but I don't
> think that's a good argument for tying it to the schema search path.
> What would make more sense to me is a command specifically setting the
> conversion to use --- perhaps a GUC variable, since then ALTER USER SET
> and ALTER DATABASE SET would provide convenient ways of controlling it.

If it does work, then it's ok. However, I'm still not sure why the current method is evil.

BTW, what does the standard say about conversion vs. schema? Doesn't a conversion belong to a schema? If so, then a schema-specific default conversion seems the more standard-friendly way.

--
Tatsuo Ishii
SRA OSS, Inc. Japan

Tatsuo Ishii <ishii@sraoss.co.jp> writes:
>> Well, being able to switch to a different conversion is fine, but I don't
>> think that's a good argument for tying it to the schema search path.

> If it does work, then it's ok. However, I'm still not sure why the current
> method is evil.

Because with the current definition, any change in search_path really ought to lead to repeating the lookup for the default conversion proc. That's a bad idea from a performance point of view and I don't think it's a particularly good idea from the definitional point of view either --- do you really want the client conversion changing because some function altered the search path?

> BTW, what does the standard say about conversion vs. schema? Doesn't
> a conversion belong to a schema? If so, then a schema-specific default
> conversion seems the more standard-friendly way.

AFAICT we invented the entire concept of conversions ourselves. I see nothing about CREATE CONVERSION in the SQL spec. There is a CREATE TRANSLATION in SQL2003, which we'd probably not seen when we invented CREATE CONVERSION, but it does *not* have a DEFAULT clause. I don't think you can point to the spec to defend our current method of selecting which conversion to use.

regards, tom lane

On Wed, Mar 29, 2006 at 01:09:08AM +0900, Tatsuo Ishii wrote:
> BTW, what does the standard say about conversion vs. schema? Doesn't
> a conversion belong to a schema? If so, then a schema-specific default
> conversion seems the more standard-friendly way.

The standard says nothing about conversions. They're only used when communicating between the client and the server. By having them belong to a schema you suggest that your queries be interpreted differently character set-wise depending on the schema.

  SELECT * FROM myschema.mytable;
  SET search_path=otherschema;
  SELECT * FROM myschema.mytable;

So the second SELECT may produce different output, because the search path changed and the data sent to the client will be encoded using a different conversion. Of course, if the client and server are using the same encoding then the queries will produce the same result.

That sounds broken to me. The reason it doesn't happen now is because (as Tom said) we only do the lookup once. But you can trigger it if you're careful.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

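The behavior described above can be sketched with a Python toy model of a search_path-driven lookup that is performed only once per session. This is a hedged illustration, not PostgreSQL internals: the `DEFAULTS` table, the `Session` class, and the conversion names are all hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical catalog: (schema, source, destination) -> default conversion.
DEFAULTS: Dict[Tuple[str, str, str], str] = {
    ("myschema", "UTF8", "SJIS"): "myschema.utf8_to_sjis_vendor_a",
    ("otherschema", "UTF8", "SJIS"): "otherschema.utf8_to_sjis_vendor_b",
}


def find_default(search_path: Sequence[str], src: str, dst: str) -> Optional[str]:
    """Namespace-aware search: the first schema on the path with a default wins."""
    for schema in search_path:
        name = DEFAULTS.get((schema, src, dst))
        if name is not None:
            return name
    return None


class Session:
    """Looks up the default conversion once and caches it."""

    def __init__(self, search_path: Sequence[str], server_enc: str, client_enc: str) -> None:
        self.search_path = list(search_path)
        self.server_enc = server_enc
        self.client_enc = client_enc
        self.cached = find_default(self.search_path, server_enc, client_enc)

    def set_search_path(self, search_path: Sequence[str]) -> None:
        # The cached conversion is deliberately not refreshed here.
        self.search_path = list(search_path)

    def fresh_lookup(self) -> Optional[str]:
        return find_default(self.search_path, self.server_enc, self.client_enc)


session = Session(["myschema"], "UTF8", "SJIS")
session.set_search_path(["otherschema"])

cached_after_change = session.cached          # still the conversion found at startup
fresh_after_change = session.fresh_lookup()   # what a repeated lookup would find
```

The gap between the cached and the fresh result is the inconsistency under discussion: either the cache is stale with respect to the definition, or refreshing it makes client output depend on `search_path`.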
> > If it does work, then it's ok. However, I'm still not sure why the current
> > method is evil.
>
> Because with the current definition, any change in search_path really
> ought to lead to repeating the lookup for the default conversion proc.
> That's a bad idea from a performance point of view and I don't think
> it's a particularly good idea from the definitional point of view
> either --- do you really want the client conversion changing because
> some function altered the search path?

That argument does not strike me as very strong. I cannot imagine a case where search_path changes so frequently.

> AFAICT we invented the entire concept of conversions ourselves. I see
> nothing about CREATE CONVERSION in the SQL spec. There is a CREATE
> TRANSLATION in SQL2003, which we'd probably not seen when we invented
> CREATE CONVERSION, but it does *not* have a DEFAULT clause. I don't
> think you can point to the spec to defend our current method of
> selecting which conversion to use.

SQL's CONVERT and TRANSLATE are different things. CONVERT changes encodings, while TRANSLATE changes character sets. However, the difference between an encoding and a character set is sometimes vague, for encodings such as LATIN* and SJIS.

--
Tatsuo Ishii
SRA OSS, Inc. Japan

Tatsuo Ishii <ishii@sraoss.co.jp> writes:
>> Because with the current definition, any change in search_path really
>> ought to lead to repeating the lookup for the default conversion proc.
>> That's a bad idea from a performance point of view and I don't think
>> it's a particularly good idea from the definitional point of view
>> either --- do you really want the client conversion changing because
>> some function altered the search path?

> That argument does not strike me as very strong. I cannot imagine a
> case where search_path changes so frequently.

I can. There's been talk for example of having a search path associated with every function definition, so that it might need to be changed at every function call and return. In any case I don't like the notion that the client conversion is tied to search_path; they really should be independent.

regards, tom lane