Thread: Facing issue in using special characters
Hi all,
We are facing an issue with special characters. We are trying to insert records into a remote Postgres server, and our application is not able to do this because of errors.
It seems that the issue is caused by special characters used in one of the fields of a row.
Regards
Tarkeshwar
This is not an issue for "hackers", nor for "performance"; in fact, even for "general" it isn't really an issue.
"Special characters" is actually nonsense.
When people complain about "special characters" they haven't thought things through.
If you are unwilling to think things through and go step by step to make sure you know what you are doing, then you will not get it and really nobody can help you.
In my professional experience, people who complain about "special characters" need to be cut loose, or at most given one more chance if they are established employees who carry some weight. If a contractor complains about "special characters", they need to be fired.
Understand charsets -- character set, code point, and encoding. Then understand how encodings, string literals, and "escape sequences" within string literals interact.
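A minimal sketch of the distinction in PostgreSQL's own SQL (this assumes a UTF-8 database, where ascii() and chr() work on Unicode code points):

    -- Code point: ascii() returns the code point of the first character;
    -- chr() is its inverse.
    SELECT ascii('é');   -- 233, i.e. U+00E9
    SELECT chr(233);     -- 'é'

    -- Escape sequences in string literals: two spellings of that same
    -- character without typing it directly.
    SELECT E'\u00e9';    -- escape-string literal
    SELECT U&'\00E9';    -- SQL-standard Unicode-escape literal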
Know that Unicode today is the one standard, and there is no more need for code-table switching. There is nothing special about a Hebrew alef, a Greek lower-case alpha, or a Latin A. Nor about a hyphen versus an en-dash or an em-dash. All of these characters are in Unicode. Yes, some Japanese users object that their versions of the Chinese characters are lumped together with the simplified, reformed Chinese forms, but that is a font issue, not a character code issue.
7-bit ASCII is the first page of Unicode: its 128 characters occupy the first 128 code points, and they are encoded as the same single bytes even in UTF-8.
ISO Latin 1 (and, give or take a few code points in the 0x80-0x9F range, its Windows variant, Windows-1252) has the same code points as the first 256 code points of Unicode, but its bytes above 127 are not compatible with the UTF-8 encoding, because of the way UTF-8 uses the 8th bit to mark multibyte sequences.
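This is easy to see by asking PostgreSQL for the actual bytes (a sketch; convert_to() returns the encoded value as bytea):

    -- ASCII characters: identical single bytes in Latin-1 and UTF-8.
    SELECT convert_to('A', 'LATIN1');  -- \x41
    SELECT convert_to('A', 'UTF8');    -- \x41

    -- Beyond ASCII: Latin-1 stores the code point as one byte with the
    -- 8th bit set; UTF-8 has to spread it over a two-byte sequence.
    SELECT convert_to('é', 'LATIN1');  -- \xe9   (same value as U+00E9)
    SELECT convert_to('é', 'UTF8');    -- \xc3a9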
But none of this is likely your problem.
Your problem is most likely about string literals in SQL, for example, and about the configuration of your database (I always run initdb with --locale C and --encoding UTF-8). Use UTF-8 in the database. Then all your remaining issues are about string literals in SQL, Java, JSON, XML, or whatever else you are using.
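For a new database on an existing cluster, the same choice looks like this (a sketch; the database name is a placeholder):

    -- Cluster created with: initdb --locale C --encoding UTF-8
    -- Equivalent per-database setup; TEMPLATE template0 is needed when
    -- the encoding/locale differ from the cluster's template1.
    CREATE DATABASE myapp
        ENCODING   'UTF8'
        LC_COLLATE 'C'
        LC_CTYPE   'C'
        TEMPLATE   template0;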
You have to do the right thing: whatever representation you produce, whether XML, JSON, SQL, URL query parameters, a CSV file, or anything else, you need to escape your string values properly for that representation.
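In SQL itself, for example, that means never splicing raw values into the statement text (a sketch; the table name is made up):

    -- Broken: the apostrophe in the data terminates the string literal.
    -- INSERT INTO people(name) VALUES ('O'Brien');

    -- Correct: quote characters doubled inside a standard literal.
    INSERT INTO people(name) VALUES ('O''Brien');

    -- When generating SQL dynamically, let the server do the quoting.
    SELECT format('INSERT INTO people(name) VALUES (%L)', 'O''Brien');

And in application code, the cleanest form of "escaping properly" is not escaping at all: pass values as parameters (e.g. a JDBC PreparedStatement) and let the driver handle them.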
This question with no detail didn't deserve such a thorough answer, but it's my soap box. I do not accept people complaining about "special characters". My own people get that same sermon from me when they make that mistake.
-Gunther
On 3/15/19 11:59 AM, Gunther wrote:
> This is not an issue for "hackers", nor for "performance"; in fact,
> even for "general" it isn't really an issue.

As long as it's already been posted, may as well make it something helpful to find in the archive.

> Understand charsets -- character set, code point, and encoding. Then
> understand how encodings, string literals, and "escape sequences"
> within string literals interact.

Good advice for sure.

> Know that Unicode today is the one standard, and there is no more need

I wasn't sure from the question whether the original poster was in a position to choose the encoding of the database. Lots of things are easier if it can be set to UTF-8 these days, but perhaps it's a legacy situation.

Maybe a good start would be to go do

    SHOW server_encoding;
    SHOW client_encoding;

and then hit the internet and look up what that encoding (or those encodings, if different) can and can't represent, and go from there.

It's worth knowing that, when the server encoding isn't UTF-8, PostgreSQL will have the obvious limitations entailed by that, but also some non-obvious ones that may be surprising, e.g. [1].

-Chap

[1] https://www.postgresql.org/message-id/CA%2BTgmobUp8Q-wcjaKvV%3DsbDcziJoUUvBCB8m%2B_xhgOV4DjiA1A%40mail.gmail.com
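A concrete way to watch one of those limitations happen, as a sketch (this assumes a UTF-8 server; the euro sign has no representation in Latin-1):

    SHOW server_encoding;            -- e.g. UTF8
    SET client_encoding = 'LATIN1';  -- pretend we are a Latin-1 client
    SELECT U&'\20AC';                -- the euro sign; output conversion fails:
    -- ERROR: character with byte sequence 0xe2 0x82 0xac in encoding "UTF8"
    --        has no equivalent in encoding "LATIN1"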
On 2019-03-17 15:01:40 +0000, Warner, Gary, Jr wrote:
> Many of us have faced character encoding issues because we are not in
> control of our input sources and made the common assumption that UTF-8
> covers everything.

UTF-8 covers "everything" in the sense that there is a round trip from each character in every commonly-used charset/encoding to Unicode and back. The actual code may of course be different. For example, the € sign is 0xA4 in iso-8859-15, but U+20AC in Unicode. So you need an encoding/decoding step.

And "commonly-used" means just that. Unicode covers a lot of character sets, but it can't cover every character set ever invented (I invented my own character sets when I was sixteen. Nobody except me ever used them, and they have long since succumbed to bit rot).

> In my lab, as an example, some of our social media posts have included
> ZawGyi Burmese character sets rather than Unicode Burmese. (Because
> Myanmar developed technology in a closed-to-the-world environment, they
> made up their own non-standard character set which is still very common
> in mobile phones.)

I'd be surprised if there was a character set which is "very common in mobile phones", even in a relatively poor country like Myanmar. Does ZawGyi actually include characters which aren't in Unicode, or are they just encoded differently?

        hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
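Peter's € example can be checked directly in PostgreSQL (a sketch, assuming a UTF-8 database; LATIN9 is PostgreSQL's name for iso-8859-15):

    SELECT ascii('€');                 -- 8364, i.e. U+20AC
    SELECT convert_to('€', 'UTF8');    -- \xe282ac
    SELECT convert_to('€', 'LATIN9');  -- \xa4, the iso-8859-15 byte
    SELECT convert_from('\xa4'::bytea, 'LATIN9');  -- '€' again: the round trip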