Thread: UTF-8 collation on Windows?
What is the equivalent for "en_US.UTF-8" collation in case of windows?
-E utf8 -l en_US.UTF-8 -T template0
As a result the sorting is not same in case of windows and linux database.
Regards...
On 02/19/2014 06:41 AM, Dev Kumkar wrote: > Am really going no where with this after so many searching over net or > am missing some basic things, not sure! > > What is the equivalent for "en_US.UTF-8" collation in case of windows? > > In Linux am creating database with following options, as follows: > -E utf8 -l en_US.UTF-8 -T template0 > > This creates utf8 encoding and also the collation is set as en_US.UTF-8. > > in case of windows utf8 encoding parameter works but not sure abou the > equivalent for en_US.UTF-8 collation. > Default database gets created with 'English_United States.1252' collation. > > As a result the sorting is not same in case of windows and linux database. > > Any suggestions/ pointers here? I found the below that might help. I do not use Windows much any more so I do not have a machine handy to confirm. http://www.g-loaded.eu/2011/02/27/locale-windows/ > > Regards... > > > -- Adrian Klaver adrian.klaver@aklaver.com
I found the below that might help. I do not use Windows much any more so I do not have a machine handy to confirm.
http://www.g-loaded.eu/2011/02/27/locale-windows/
Has anyone set the LC_COLLATE as utf-8 on windows?
On 02/19/2014 11:42 AM, Dev Kumkar wrote: > On Wed, Feb 19, 2014 at 10:16 PM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > I found the below that might help. I do not use Windows much any > more so I do not have a machine handy to confirm. > > http://www.g-loaded.eu/2011/02/27/locale-windows/ > > > Thanks for the pointer. "*american_usa*" works however it sets the > LC_COLLATE to 'English_United States.1252' which is basically "ANSI > Latin 1" and is not utf8. The third parameter as the link says is > codeset - "language_territory.codeset". So what is the exact command you are using? > > Here is the list of all codesets > http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx but > still no success. > > One of the link says codepage 65001 and utf-8 is same - > http://stackoverflow.com/questions/1629437/is-codepage-65001-and-utf-8-the-same-thing. > > Am not able to find windows codeset equivalent of utf-8 and set it as > LC_COLLATE while creating database. > > Has anyone set the LC_COLLATE as utf-8 on windows? > > Regards...
So what is the exact command you are using?
createdb -U postgres -E utf8 -l american_usa <DBNAME>
On 02/19/2014 12:03 PM, Dev Kumkar wrote: > On Thu, Feb 20, 2014 at 1:19 AM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > So what is the exact command you are using? > > > createdb -U postgres -E utf8 -l american_usa <DBNAME> > Above command fails to create utf-8 LC_COLLATE. What does it set LC_CTYPE to? So what happens if you do?: createdb -U postgres -E utf8 -l american_usa.65001 <DBNAME> or createdb -U postgres -E utf8 --lc-ctype=american_usa --lc-collate=american_usa <DBNAME> > > Regards...
What does it set LC_CTYPE to?
So what happens if you do?:
createdb -U postgres -E utf8 -l american_usa.65001 <DBNAME>
createdb: database creation failed: ERROR: invalid locale name: "american_usa.65001"
or
createdb -U postgres -E utf8 --lc-ctype=american_usa --lc-collate=american_usa <DBNAME>
On 02/19/2014 12:16 PM, Dev Kumkar wrote: > On Thu, Feb 20, 2014 at 1:41 AM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > What does it set LC_CTYPE to? > > So what happens if you do?: > > createdb -U postgres -E utf8 -l american_usa.65001 <DBNAME> > > > *createdb: database creation failed: ERROR: invalid locale name: > "american_usa.65001" * > > > or > > createdb -U postgres -E utf8 --lc-ctype=american_usa > --lc-collate=american_usa <DBNAME> > > Succeeds but as replied earlier it creates database with LC_COLLATE = > 'English_United States.1252' which corresponds to Latin1. Just noticed you are not specifying the template database. Try using template0: createdb -U postgres -E utf8 --lc-ctype=american_usa --lc-collate=american_usa -T template0 <DBNAME> > > Regards...
Just noticed you are not specifying the template database. Try using template0:
createdb -U postgres -E utf8 --lc-ctype=american_usa --lc-collate=american_usa -T template0 <DBNAME>
Had specified template option this earlier too.
Btw in case the specified collation is incompatible then "createdb" binary complains and instructs to use template0 as template, which isn't the case here.
On 02/19/2014 12:43 PM, Dev Kumkar wrote: > On Thu, Feb 20, 2014 at 2:01 AM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > Just noticed you are not specifying the template database. Try using > template0: > > createdb -U postgres -E utf8 --lc-ctype=american_usa > --lc-collate=american_usa -T template0 <DBNAME> > > > Same result i.e. LC_COLLATE and LC_CTYPE gets set as 'English_United > States.1252' > Had specified template option this earlier too. > > Btw in case the specified collation is incompatible then "createdb" > binary complains and instructs to use template0 as template, which isn't > the case here. > > The codeset parameter is missed here and hence it takes the collation as > the default one which is 'English_United States.1252'. Alright last shot:) Taking hint from here: http://msdn.microsoft.com/en-us/library/x99tb11d.aspx try: createdb -U postgres -E utf8 -l en-US <DBNAME> If that does not work, not sure where to go. > > Regards...
Alright last shot:)
Taking hint from here:
http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
try:
createdb -U postgres -E utf8 -l en-US <DBNAME>
If that does not work, not sure where to go.
language string, country/region code, and code page. The set of available locale names, languages, country/region codes, and
code pages includes all those supported by the Windows NLS API except code pages that require more than two bytes per
character, such as UTF-7 and UTF-8. If you provide a code page value of UTF-7 or UTF-8, setlocale will fail, returning NULL."
However am sure there would be some codepage which can be used in postgreSQL to set the collation to UTF8 equivalent of linux.
Please suggest? Am sure this not something new which am looking for.
Regards...
On 02/19/2014 01:09 PM, Dev Kumkar wrote: > On Thu, Feb 20, 2014 at 2:24 AM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > Alright last shot:) > > Taking hint from here: > > http://msdn.microsoft.com/en-__us/library/x99tb11d.aspx > <http://msdn.microsoft.com/en-us/library/x99tb11d.aspx> > > try: > > createdb -U postgres -E utf8 -l en-US <DBNAME> > > If that does not work, not sure where to go. > > > This won't work on Windows. Note that en-US collation name is specific > to linux and in case of Windows these names are different which is where > am facing issues to find the exact code page which corresponds to utf8. Have you tried it? Note that the locale name is different then the one Linux. On Linux it is en_US. What I suggested is en-US.
Have you tried it?
Note that the locale name is different then the one Linux.
On Linux it is en_US.
What I suggested is en-US.
Yes. Here is the output:
createdb -U postgres -E utf8 -l en-US -T template0 mynewdb
Password:
createdb: database creation failed: ERROR: invalid locale name: "en-US"
On 02/19/2014 01:21 PM, Dev Kumkar wrote: > On Thu, Feb 20, 2014 at 2:45 AM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > > Have you tried it? > > Note that the locale name is different then the one Linux. > > On Linux it is en_US. > > What I suggested is en-US. > > > Yes. Here is the output: > createdb -U postgres -E utf8 -l en-US -T template0 mynewdb > Password: > *createdb: database creation failed: ERROR: invalid locale name: "en-US"* Hmmm, well I am out of ideas:( > > Regards... -- Adrian Klaver adrian.klaver@aklaver.com
createdb -U postgres -E utf8 -l en-US -T template0 mynewdb
Password:
createdb: database creation failed: ERROR: invalid locale name: "en-US"
I believe its en_US ... _ not -
-- john r pierce 37N 122W somewhere on the middle of the left coast
On 20/02/14 10:28, Adrian Klaver wrote: > On 02/19/2014 01:21 PM, Dev Kumkar wrote: >> On Thu, Feb 20, 2014 at 2:45 AM, Adrian Klaver >> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: >> >> >> Have you tried it? >> >> Note that the locale name is different then the one Linux. >> >> On Linux it is en_US. >> >> What I suggested is en-US. >> >> >> Yes. Here is the output: >> createdb -U postgres -E utf8 -l en-US -T template0 mynewdb >> Password: >> *createdb: database creation failed: ERROR: invalid locale name: >> "en-US"* > > Hmmm, well I am out of ideas:( > >> >> Regards... > > Upgrade servers to Linux? :-P Cheers, Gavin
On 02/19/2014 01:30 PM, John R Pierce wrote: > On 2/19/2014 1:21 PM, Dev Kumkar wrote: >> createdb -U postgres -E utf8 -l en-US -T template0 mynewdb >> Password: >> *createdb: database creation failed: ERROR: invalid locale name: "en-US"* > > I believe its en_US ... _ not - Unfortunately this is a Windows install and that does not work either. > > > > > > > -- > john r pierce 37N 122W > somewhere on the middle of the left coast > -- Adrian Klaver adrian.klaver@aklaver.com
On 2/19/2014 1:35 PM, Adrian Klaver wrote: > > Unfortunately this is a Windows install and that does not work either. windows encodings are a pain. their Unicode is NOT utf8, its ucs2 aka utf16. I just checked my default install of potsgres 9.2, it appears its using WIN1252 encoding, another bastard, this is a modified ISO-8859-1 -- john r pierce 37N 122W somewhere on the middle of the left coast
On 2/19/2014 1:35 PM, Adrian Klaver wrote:windows encodings are a pain. their Unicode is NOT utf8, its ucs2 aka utf16. I just checked my default install of potsgres 9.2, it appears its using WIN1252 encoding, another bastard, this is a modified ISO-8859-1
Unfortunately this is a Windows install and that does not work either.
Dev Kumkar wrote: > Succeeds but as replied earlier it creates database with LC_COLLATE = > 'English_United States.1252' which corresponds to Latin1. Despite windows-1252 being a monobyte encoding sharing most of LATIN1 codes and character set, it does not mean that English_United States.1252 is limited to this character set. You may use UTF-8 databases with that locale. Consider the 2nd paragraph of "Character Set Support" in the doc: http://www.postgresql.org/docs/current/static/multibyte.html "For C or POSIX locale, any character set is allowed, but for other locales there is only one character set that will work correctly. (On Windows, however, UTF-8 encoding can be used with any locale.)" This is a key difference with Unix when choosing a locale. As for getting the exact same sort order than Linux, it's not possible but that's not a Windows-vs-Unix issue. If you used FreeBSD or MacOS X, some en_US.UTF-8 collation rules would differ from Linux's libc too, resulting in a different sort order for certain strings. Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
Despite windows-1252 being a monobyte encoding sharing most
of LATIN1 codes and character set, it does not mean that
English_United States.1252 is limited to this character set.
You may use UTF-8 databases with that locale.
Consider the 2nd paragraph of "Character Set Support"
in the doc:
http://www.postgresql.org/docs/current/static/multibyte.html
"For C or POSIX locale, any character set is allowed, but for other
locales there is only one character set that will work
correctly. (On Windows, however, UTF-8 encoding can be used with
any locale.)"
This is a key difference with Unix when choosing a locale.
As for getting the exact same sort order than Linux, it's not possible but
that's not a Windows-vs-Unix issue. If you used FreeBSD or MacOS X, some
en_US.UTF-8 collation rules would differ from Linux's libc too, resulting in
a different sort order for certain strings.
Well overall with the discussion so far and whatever search I could over net/community it looks like there is no code page on windows corresponding to what is utf8 of linux. If there is then please let me know?
Regards...
Upgrade servers to Linux? :-P
There is a heavy footprint of customers and huge market on windows too and so not that easy to migrate and convince in market.
Regards...
On Thu, Feb 20, 2014 at 3:04 AM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:Upgrade servers to Linux? :-PActually that's not the solution but running away from it.
There is a heavy footprint of customers and huge market on windows too and so not that easy to migrate and convince in market.
Regards...
I am aware of the heavy presence of Microsoft in the market place and the huge inertia of Microsoft dominated companies (even where management would like to change), hence I was not trying to push upgrading to Linux too strongly for this particular situation - more light hearted exasperation!
None-the-less there are more and more companies making that move - as there are a whole raft of very good reasons to do so.
If the sole reason for going to Linux was the collation problem, then it would probably be considered by most people to be a silly reason.
Cheers,
Gavin
P.S. Once a Senior Systems Analyst left the company I was working for to become the DP manager of an IT department. I spoke to him shortly after and he said it was an 'IBM shop' - about ten years later he was at the same place, but he now said it was a 'Microsoft shop'. The dominant technology that appears set in stone, does eventually change despite the market inertia – my first commercial languages were FORTRAN & COBOL on minicomputers & IBM style mainframes, now I use Java on Linux.
On 21/02/14 02:04, Dev Kumkar wrote:On Thu, Feb 20, 2014 at 3:04 AM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:Upgrade servers to Linux? :-PActually that's not the solution but running away from it.
There is a heavy footprint of customers and huge market on windows too and so not that easy to migrate and convince in market.
Regards...I am aware of the heavy presence of Microsoft in the market place and the huge inertia of Microsoft dominated companies (even where management would like to change), hence I was not trying to push upgrading to Linux too strongly for this particular situation - more light hearted exasperation!
None-the-less there are more and more companies making that move - as there are a whole raft of very good reasons to do so.
If the sole reason for going to Linux was the collation problem, then it would probably be considered by most people to be a silly reason.
Cheers,
Gavin
P.S. Once a Senior Systems Analyst left the company I was working for to become the DP manager of an IT department. I spoke to him shortly after and he said it was an 'IBM shop' - about ten years later he was at the same place, but he now said it was a 'Microsoft shop'. The dominant technology that appears set in stone, does eventually change despite the market inertia – my first commercial languages were FORTRAN & COBOL on minicomputers & IBM style mainframes, now I use Java on Linux.
Here would really appreciate if there are any suggestions for UTF-8 collation on Windows?
Regards...
On 02/20/2014 11:40 AM, Dev Kumkar wrote: > > > Hmm. Don't want to digress here and loose the topic context. > Here would really appreciate if there are any suggestions for UTF-8 > collation on Windows? Well I dug out a Windows machine and tried to get what you wanted, to no avail. As far as I know there is no UTF8 collation, it is an encoding. What you want if I am following, is the en_US locale (or equivalent for another language) on Windows. Anything I tried resolved back to a Windows code page. So the answer from my tests, is no you cannot match en_US on Windows. > > Regards...
Well I dug out a Windows machine and tried to get what you wanted, to no avail. As far as I know there is no UTF8 collation, it is an encoding. What you want if I am following, is the en_US locale (or equivalent for another language) on Windows. Anything I tried resolved back to a Windows code page. So the answer from my tests, is no you cannot match en_US on Windows.
On 02/20/2014 11:40 AM, Dev Kumkar wrote: > > > Hmm. Don't want to digress here and loose the topic context. > Here would really appreciate if there are any suggestions for UTF-8 > collation on Windows? Just had idea, not sure how feasible it is in your situation though. Run Postgres in a Linux VM on Windows machines. > > Regards...
On 02/20/2014 12:27 PM, Dev Kumkar wrote: > On Fri, Feb 21, 2014 at 1:26 AM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > Well I dug out a Windows machine and tried to get what you wanted, > to no avail. As far as I know there is no UTF8 collation, it is an > encoding. What you want if I am following, is the en_US locale (or > equivalent for another language) on Windows. Anything I tried > resolved back to a Windows code page. So the answer from my tests, > is no you cannot match en_US on Windows. > > > Thanks for taking time out and looking into it ! > Yes all the scenarios we tested didn't work for any of the utf8 code > pages specified on MSDN or may be I don't know the correct > representation of "language_territory.code". It seems to be more basic then that. Microsoft has its own locale mechanism and you will always be redirected back to it. > > Regards...