Codepage Win1252 - Mailing list pgsql-general

From Jörg Schulz
Subject Codepage Win1252
Date
Msg-id bjru5n$7uk$1@news.hub.org
List pgsql-general
I have been missing this codepage for quite some time, but I
was able to patch another, unneeded mapping to suit my needs.
Unfortunately I wasn't able to add a completely new mapping.
Maybe one of you can do this better...  :-)

I have included some tiny scripts that at least generate the
mappings needed for the src/backend/utils/mb/Unicode/*.map files.

I hope this helps PostgreSQL support more codepages.

Jörg



jschulz@opal:~/programme/postgresql/pgmaps> cat README

Copy and paste a codepage table from the reference at
http://www.microsoft.com/globaldev/reference/cphome.mspx
into a file named after the codepage.

For example, win1252 was copied from
http://www.microsoft.com/globaldev/reference/sbcs/1252.htm

Then type, e.g., make_pgmaps win1252 ...



jschulz@opal:~/programme/postgresql/pgmaps> cat make_pgmaps
#!/bin/bash
# Generate both map files for every codepage table named on the command line.

for f in "$@"; do
   echo -e "${f}: ${f}_to_utf8.map...\c"
   ./codepage_to_utf8 "${f}" > "${f}_to_utf8.map"
   echo -e "ok   utf8_to_${f}.map...\c"
   ./utf8_to_codepage "${f}" > "utf8_to_${f}.map"
   echo "ok"
done


jschulz@opal:~/programme/postgresql/pgmaps> cat codepage_to_utf8
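#!/bin/bash
# Read lines such as " 80 = U+20AC : EURO SIGN" from the file named in $1
# and print one {codepage byte, UTF-8 bytes} pair per line.
# read strips the leading blank, so the cut columns start at the byte value.

while read -r l;
do
   cp=$(echo "$l" | cut -c1-2)      # codepage byte, e.g. 80
   u16=$(echo "$l" | cut -c8-11)    # Unicode code point, e.g. 20AC
   u8=$(echo "0x$u16" | recode utf-16/x4..utf-8/x4)
   echo "  {0x00$cp, $u8},"
done < "$1" | awk '{print tolower($0)}'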
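
Judging from the generated maps, the recode step turns each code point into the hex of its UTF-8 bytes (20AC becomes 0xe282ac). Where recode is not installed, the same encoding can be sketched in plain shell arithmetic. ucs_to_utf8_hex is a hypothetical helper name, not part of the scripts in this mail, and the sketch assumes well-formed hex input and does not reject surrogate code points.

```bash
#!/bin/bash
# Hypothetical helper: encode one Unicode code point, given as hex digits
# (e.g. 20AC), as UTF-8 and print the bytes as one lower-case hex literal.
ucs_to_utf8_hex() {
   local cp=$(( 0x$1 ))
   if [ "$cp" -lt 128 ]; then
      # 1 byte: 0xxxxxxx
      printf '0x%02x\n' "$cp"
   elif [ "$cp" -lt 2048 ]; then
      # 2 bytes: 110xxxxx 10xxxxxx
      printf '0x%02x%02x\n' \
         $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
   elif [ "$cp" -lt 65536 ]; then
      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      printf '0x%02x%02x%02x\n' \
         $(( 0xE0 | (cp >> 12) )) \
         $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
   elif [ "$cp" -lt 1114112 ]; then
      # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      printf '0x%02x%02x%02x%02x\n' \
         $(( 0xF0 | (cp >> 18) )) $(( 0x80 | ((cp >> 12) & 0x3F) )) \
         $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
   else
      # Beyond U+10FFFF there is no valid UTF-8 encoding.
      return 1
   fi
}
```

For example, ucs_to_utf8_hex 20AC prints 0xe282ac and ucs_to_utf8_hex 00A0 prints 0xc2a0, matching the entries in the generated maps.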


jschulz@opal:~/programme/postgresql/pgmaps> cat utf8_to_codepage
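#!/bin/bash
# Same input as codepage_to_utf8, but print {UTF-8 bytes, codepage byte}
# pairs, sorted by the UTF-8 value.

while read -r l;
do
   cp=$(echo "$l" | cut -c1-2)      # codepage byte, e.g. 80
   u16=$(echo "$l" | cut -c8-11)    # Unicode code point, e.g. 20AC
   u8=$(echo "0x$u16" | recode utf-16/x4..utf-8/x4)
   echo "  {$u8, 0x00$cp},"
done < "$1" | awk '{print tolower($0)}' | LC_ALL=C sort   # byte order, locale-independent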
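
Since both map files come from the same table, one should be the exact inverse of the other. A quick consistency check could look like the sketch below. check_inverse is a hypothetical helper name, and it assumes the map files contain only lines of the form shown in this mail.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the two map files hold the same
# pairs with the columns swapped.
#   $1: <codepage>_to_utf8.map    $2: utf8_to_<codepage>.map
check_inverse() {
   local fwd rev
   # Strip braces and commas, then normalise both files to "codepage utf8".
   fwd=$(tr -d '{},' < "$1" | awk 'NF == 2 {print $1, $2}' | sort)
   rev=$(tr -d '{},' < "$2" | awk 'NF == 2 {print $2, $1}' | sort)
   [ "$fwd" = "$rev" ]
}
```

It could be run as: check_inverse win1252_to_utf8.map utf8_to_win1252.map && echo consistent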


jschulz@opal:~/programme/postgresql/pgmaps> cat win1252
 80 = U+20AC : EURO SIGN
 82 = U+201A : SINGLE LOW-9 QUOTATION MARK
 83 = U+0192 : LATIN SMALL LETTER F WITH HOOK
 84 = U+201E : DOUBLE LOW-9 QUOTATION MARK
 85 = U+2026 : HORIZONTAL ELLIPSIS
 86 = U+2020 : DAGGER
 87 = U+2021 : DOUBLE DAGGER
 88 = U+02C6 : MODIFIER LETTER CIRCUMFLEX ACCENT
 89 = U+2030 : PER MILLE SIGN
 8A = U+0160 : LATIN CAPITAL LETTER S WITH CARON
 8B = U+2039 : SINGLE LEFT-POINTING ANGLE QUOTATION MARK
 8C = U+0152 : LATIN CAPITAL LIGATURE OE
 8E = U+017D : LATIN CAPITAL LETTER Z WITH CARON
 91 = U+2018 : LEFT SINGLE QUOTATION MARK
 92 = U+2019 : RIGHT SINGLE QUOTATION MARK
 93 = U+201C : LEFT DOUBLE QUOTATION MARK
 94 = U+201D : RIGHT DOUBLE QUOTATION MARK
 95 = U+2022 : BULLET
 96 = U+2013 : EN DASH
 97 = U+2014 : EM DASH
 98 = U+02DC : SMALL TILDE
 99 = U+2122 : TRADE MARK SIGN
 9A = U+0161 : LATIN SMALL LETTER S WITH CARON
 9B = U+203A : SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
 9C = U+0153 : LATIN SMALL LIGATURE OE
 9E = U+017E : LATIN SMALL LETTER Z WITH CARON
 9F = U+0178 : LATIN CAPITAL LETTER Y WITH DIAERESIS
 A0 = U+00A0 : NO-BREAK SPACE
 A1 = U+00A1 : INVERTED EXCLAMATION MARK
 A2 = U+00A2 : CENT SIGN
 A3 = U+00A3 : POUND SIGN
 A4 = U+00A4 : CURRENCY SIGN
 A5 = U+00A5 : YEN SIGN
 A6 = U+00A6 : BROKEN BAR
 A7 = U+00A7 : SECTION SIGN
 A8 = U+00A8 : DIAERESIS
 A9 = U+00A9 : COPYRIGHT SIGN
 AA = U+00AA : FEMININE ORDINAL INDICATOR
 AB = U+00AB : LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
 AC = U+00AC : NOT SIGN
 AD = U+00AD : SOFT HYPHEN
 AE = U+00AE : REGISTERED SIGN
 AF = U+00AF : MACRON
 B0 = U+00B0 : DEGREE SIGN
 B1 = U+00B1 : PLUS-MINUS SIGN
 B2 = U+00B2 : SUPERSCRIPT TWO
 B3 = U+00B3 : SUPERSCRIPT THREE
 B4 = U+00B4 : ACUTE ACCENT
 B5 = U+00B5 : MICRO SIGN
 B6 = U+00B6 : PILCROW SIGN
 B7 = U+00B7 : MIDDLE DOT
 B8 = U+00B8 : CEDILLA
 B9 = U+00B9 : SUPERSCRIPT ONE
 BA = U+00BA : MASCULINE ORDINAL INDICATOR
 BB = U+00BB : RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
 BC = U+00BC : VULGAR FRACTION ONE QUARTER
 BD = U+00BD : VULGAR FRACTION ONE HALF
 BE = U+00BE : VULGAR FRACTION THREE QUARTERS
 BF = U+00BF : INVERTED QUESTION MARK
 C0 = U+00C0 : LATIN CAPITAL LETTER A WITH GRAVE
 C1 = U+00C1 : LATIN CAPITAL LETTER A WITH ACUTE
 C2 = U+00C2 : LATIN CAPITAL LETTER A WITH CIRCUMFLEX
 C3 = U+00C3 : LATIN CAPITAL LETTER A WITH TILDE
 C4 = U+00C4 : LATIN CAPITAL LETTER A WITH DIAERESIS
 C5 = U+00C5 : LATIN CAPITAL LETTER A WITH RING ABOVE
 C6 = U+00C6 : LATIN CAPITAL LETTER AE
 C7 = U+00C7 : LATIN CAPITAL LETTER C WITH CEDILLA
 C8 = U+00C8 : LATIN CAPITAL LETTER E WITH GRAVE
 C9 = U+00C9 : LATIN CAPITAL LETTER E WITH ACUTE
 CA = U+00CA : LATIN CAPITAL LETTER E WITH CIRCUMFLEX
 CB = U+00CB : LATIN CAPITAL LETTER E WITH DIAERESIS
 CC = U+00CC : LATIN CAPITAL LETTER I WITH GRAVE
 CD = U+00CD : LATIN CAPITAL LETTER I WITH ACUTE
 CE = U+00CE : LATIN CAPITAL LETTER I WITH CIRCUMFLEX
 CF = U+00CF : LATIN CAPITAL LETTER I WITH DIAERESIS
 D0 = U+00D0 : LATIN CAPITAL LETTER ETH
 D1 = U+00D1 : LATIN CAPITAL LETTER N WITH TILDE
 D2 = U+00D2 : LATIN CAPITAL LETTER O WITH GRAVE
 D3 = U+00D3 : LATIN CAPITAL LETTER O WITH ACUTE
 D4 = U+00D4 : LATIN CAPITAL LETTER O WITH CIRCUMFLEX
 D5 = U+00D5 : LATIN CAPITAL LETTER O WITH TILDE
 D6 = U+00D6 : LATIN CAPITAL LETTER O WITH DIAERESIS
 D7 = U+00D7 : MULTIPLICATION SIGN
 D8 = U+00D8 : LATIN CAPITAL LETTER O WITH STROKE
 D9 = U+00D9 : LATIN CAPITAL LETTER U WITH GRAVE
 DA = U+00DA : LATIN CAPITAL LETTER U WITH ACUTE
 DB = U+00DB : LATIN CAPITAL LETTER U WITH CIRCUMFLEX
 DC = U+00DC : LATIN CAPITAL LETTER U WITH DIAERESIS
 DD = U+00DD : LATIN CAPITAL LETTER Y WITH ACUTE
 DE = U+00DE : LATIN CAPITAL LETTER THORN
 DF = U+00DF : LATIN SMALL LETTER SHARP S
 E0 = U+00E0 : LATIN SMALL LETTER A WITH GRAVE
 E1 = U+00E1 : LATIN SMALL LETTER A WITH ACUTE
 E2 = U+00E2 : LATIN SMALL LETTER A WITH CIRCUMFLEX
 E3 = U+00E3 : LATIN SMALL LETTER A WITH TILDE
 E4 = U+00E4 : LATIN SMALL LETTER A WITH DIAERESIS
 E5 = U+00E5 : LATIN SMALL LETTER A WITH RING ABOVE
 E6 = U+00E6 : LATIN SMALL LETTER AE
 E7 = U+00E7 : LATIN SMALL LETTER C WITH CEDILLA
 E8 = U+00E8 : LATIN SMALL LETTER E WITH GRAVE
 E9 = U+00E9 : LATIN SMALL LETTER E WITH ACUTE
 EA = U+00EA : LATIN SMALL LETTER E WITH CIRCUMFLEX
 EB = U+00EB : LATIN SMALL LETTER E WITH DIAERESIS
 EC = U+00EC : LATIN SMALL LETTER I WITH GRAVE
 ED = U+00ED : LATIN SMALL LETTER I WITH ACUTE
 EE = U+00EE : LATIN SMALL LETTER I WITH CIRCUMFLEX
 EF = U+00EF : LATIN SMALL LETTER I WITH DIAERESIS
 F0 = U+00F0 : LATIN SMALL LETTER ETH
 F1 = U+00F1 : LATIN SMALL LETTER N WITH TILDE
 F2 = U+00F2 : LATIN SMALL LETTER O WITH GRAVE
 F3 = U+00F3 : LATIN SMALL LETTER O WITH ACUTE
 F4 = U+00F4 : LATIN SMALL LETTER O WITH CIRCUMFLEX
 F5 = U+00F5 : LATIN SMALL LETTER O WITH TILDE
 F6 = U+00F6 : LATIN SMALL LETTER O WITH DIAERESIS
 F7 = U+00F7 : DIVISION SIGN
 F8 = U+00F8 : LATIN SMALL LETTER O WITH STROKE
 F9 = U+00F9 : LATIN SMALL LETTER U WITH GRAVE
 FA = U+00FA : LATIN SMALL LETTER U WITH ACUTE
 FB = U+00FB : LATIN SMALL LETTER U WITH CIRCUMFLEX
 FC = U+00FC : LATIN SMALL LETTER U WITH DIAERESIS
 FD = U+00FD : LATIN SMALL LETTER Y WITH ACUTE
 FE = U+00FE : LATIN SMALL LETTER THORN
 FF = U+00FF : LATIN SMALL LETTER Y WITH DIAERESIS
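
The table above has no lines for 81, 8D, 8F, 90 and 9D, so neither generated map contains entries for those bytes. To spot such gaps in another pasted table, something like the following sketch may help. list_unmapped is a hypothetical helper name, and it assumes the "XX = U+XXXX : NAME" line format used here.

```bash
#!/bin/bash
# Hypothetical helper: read a codepage table on stdin and print every byte
# in the range 80..FF that has no line in it.
list_unmapped() {
   local table b byte
   # Keep only the two-digit hex byte that starts each table line.
   table=$(sed -n 's/^[[:space:]]*\([0-9A-Fa-f][0-9A-Fa-f]\) = U+.*/\1/p' |
           tr 'a-f' 'A-F')
   b=128
   while [ "$b" -le 255 ]; do
      byte=$(printf '%02X' "$b")
      # Print the byte if no table line starts with it.
      printf '%s\n' "$table" | grep -qx "$byte" || echo "$byte"
      b=$((b + 1))
   done
}
```

Usage would be: list_unmapped < win1252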


jschulz@opal:~/programme/postgresql/pgmaps> cat utf8_to_win1252.map
  {0xc2a0, 0x00a0},
  {0xc2a1, 0x00a1},
  {0xc2a2, 0x00a2},
  {0xc2a3, 0x00a3},
  {0xc2a4, 0x00a4},
  {0xc2a5, 0x00a5},
  {0xc2a6, 0x00a6},
  {0xc2a7, 0x00a7},
  {0xc2a8, 0x00a8},
  {0xc2a9, 0x00a9},
  {0xc2aa, 0x00aa},
  {0xc2ab, 0x00ab},
  {0xc2ac, 0x00ac},
  {0xc2ad, 0x00ad},
  {0xc2ae, 0x00ae},
  {0xc2af, 0x00af},
  {0xc2b0, 0x00b0},
  {0xc2b1, 0x00b1},
  {0xc2b2, 0x00b2},
  {0xc2b3, 0x00b3},
  {0xc2b4, 0x00b4},
  {0xc2b5, 0x00b5},
  {0xc2b6, 0x00b6},
  {0xc2b7, 0x00b7},
  {0xc2b8, 0x00b8},
  {0xc2b9, 0x00b9},
  {0xc2ba, 0x00ba},
  {0xc2bb, 0x00bb},
  {0xc2bc, 0x00bc},
  {0xc2bd, 0x00bd},
  {0xc2be, 0x00be},
  {0xc2bf, 0x00bf},
  {0xc380, 0x00c0},
  {0xc381, 0x00c1},
  {0xc382, 0x00c2},
  {0xc383, 0x00c3},
  {0xc384, 0x00c4},
  {0xc385, 0x00c5},
  {0xc386, 0x00c6},
  {0xc387, 0x00c7},
  {0xc388, 0x00c8},
  {0xc389, 0x00c9},
  {0xc38a, 0x00ca},
  {0xc38b, 0x00cb},
  {0xc38c, 0x00cc},
  {0xc38d, 0x00cd},
  {0xc38e, 0x00ce},
  {0xc38f, 0x00cf},
  {0xc390, 0x00d0},
  {0xc391, 0x00d1},
  {0xc392, 0x00d2},
  {0xc393, 0x00d3},
  {0xc394, 0x00d4},
  {0xc395, 0x00d5},
  {0xc396, 0x00d6},
  {0xc397, 0x00d7},
  {0xc398, 0x00d8},
  {0xc399, 0x00d9},
  {0xc39a, 0x00da},
  {0xc39b, 0x00db},
  {0xc39c, 0x00dc},
  {0xc39d, 0x00dd},
  {0xc39e, 0x00de},
  {0xc39f, 0x00df},
  {0xc3a0, 0x00e0},
  {0xc3a1, 0x00e1},
  {0xc3a2, 0x00e2},
  {0xc3a3, 0x00e3},
  {0xc3a4, 0x00e4},
  {0xc3a5, 0x00e5},
  {0xc3a6, 0x00e6},
  {0xc3a7, 0x00e7},
  {0xc3a8, 0x00e8},
  {0xc3a9, 0x00e9},
  {0xc3aa, 0x00ea},
  {0xc3ab, 0x00eb},
  {0xc3ac, 0x00ec},
  {0xc3ad, 0x00ed},
  {0xc3ae, 0x00ee},
  {0xc3af, 0x00ef},
  {0xc3b0, 0x00f0},
  {0xc3b1, 0x00f1},
  {0xc3b2, 0x00f2},
  {0xc3b3, 0x00f3},
  {0xc3b4, 0x00f4},
  {0xc3b5, 0x00f5},
  {0xc3b6, 0x00f6},
  {0xc3b7, 0x00f7},
  {0xc3b8, 0x00f8},
  {0xc3b9, 0x00f9},
  {0xc3ba, 0x00fa},
  {0xc3bb, 0x00fb},
  {0xc3bc, 0x00fc},
  {0xc3bd, 0x00fd},
  {0xc3be, 0x00fe},
  {0xc3bf, 0x00ff},
  {0xc592, 0x008c},
  {0xc593, 0x009c},
  {0xc5a0, 0x008a},
  {0xc5a1, 0x009a},
  {0xc5b8, 0x009f},
  {0xc5bd, 0x008e},
  {0xc5be, 0x009e},
  {0xc692, 0x0083},
  {0xcb86, 0x0088},
  {0xcb9c, 0x0098},
  {0xe28093, 0x0096},
  {0xe28094, 0x0097},
  {0xe28098, 0x0091},
  {0xe28099, 0x0092},
  {0xe2809a, 0x0082},
  {0xe2809c, 0x0093},
  {0xe2809d, 0x0094},
  {0xe2809e, 0x0084},
  {0xe280a0, 0x0086},
  {0xe280a1, 0x0087},
  {0xe280a2, 0x0095},
  {0xe280a6, 0x0085},
  {0xe280b0, 0x0089},
  {0xe280b9, 0x008b},
  {0xe280ba, 0x009b},
  {0xe282ac, 0x0080},
  {0xe284a2, 0x0099},


jschulz@opal:~/programme/postgresql/pgmaps> cat win1252_to_utf8.map
  {0x0080, 0xe282ac},
  {0x0082, 0xe2809a},
  {0x0083, 0xc692},
  {0x0084, 0xe2809e},
  {0x0085, 0xe280a6},
  {0x0086, 0xe280a0},
  {0x0087, 0xe280a1},
  {0x0088, 0xcb86},
  {0x0089, 0xe280b0},
  {0x008a, 0xc5a0},
  {0x008b, 0xe280b9},
  {0x008c, 0xc592},
  {0x008e, 0xc5bd},
  {0x0091, 0xe28098},
  {0x0092, 0xe28099},
  {0x0093, 0xe2809c},
  {0x0094, 0xe2809d},
  {0x0095, 0xe280a2},
  {0x0096, 0xe28093},
  {0x0097, 0xe28094},
  {0x0098, 0xcb9c},
  {0x0099, 0xe284a2},
  {0x009a, 0xc5a1},
  {0x009b, 0xe280ba},
  {0x009c, 0xc593},
  {0x009e, 0xc5be},
  {0x009f, 0xc5b8},
  {0x00a0, 0xc2a0},
  {0x00a1, 0xc2a1},
  {0x00a2, 0xc2a2},
  {0x00a3, 0xc2a3},
  {0x00a4, 0xc2a4},
  {0x00a5, 0xc2a5},
  {0x00a6, 0xc2a6},
  {0x00a7, 0xc2a7},
  {0x00a8, 0xc2a8},
  {0x00a9, 0xc2a9},
  {0x00aa, 0xc2aa},
  {0x00ab, 0xc2ab},
  {0x00ac, 0xc2ac},
  {0x00ad, 0xc2ad},
  {0x00ae, 0xc2ae},
  {0x00af, 0xc2af},
  {0x00b0, 0xc2b0},
  {0x00b1, 0xc2b1},
  {0x00b2, 0xc2b2},
  {0x00b3, 0xc2b3},
  {0x00b4, 0xc2b4},
  {0x00b5, 0xc2b5},
  {0x00b6, 0xc2b6},
  {0x00b7, 0xc2b7},
  {0x00b8, 0xc2b8},
  {0x00b9, 0xc2b9},
  {0x00ba, 0xc2ba},
  {0x00bb, 0xc2bb},
  {0x00bc, 0xc2bc},
  {0x00bd, 0xc2bd},
  {0x00be, 0xc2be},
  {0x00bf, 0xc2bf},
  {0x00c0, 0xc380},
  {0x00c1, 0xc381},
  {0x00c2, 0xc382},
  {0x00c3, 0xc383},
  {0x00c4, 0xc384},
  {0x00c5, 0xc385},
  {0x00c6, 0xc386},
  {0x00c7, 0xc387},
  {0x00c8, 0xc388},
  {0x00c9, 0xc389},
  {0x00ca, 0xc38a},
  {0x00cb, 0xc38b},
  {0x00cc, 0xc38c},
  {0x00cd, 0xc38d},
  {0x00ce, 0xc38e},
  {0x00cf, 0xc38f},
  {0x00d0, 0xc390},
  {0x00d1, 0xc391},
  {0x00d2, 0xc392},
  {0x00d3, 0xc393},
  {0x00d4, 0xc394},
  {0x00d5, 0xc395},
  {0x00d6, 0xc396},
  {0x00d7, 0xc397},
  {0x00d8, 0xc398},
  {0x00d9, 0xc399},
  {0x00da, 0xc39a},
  {0x00db, 0xc39b},
  {0x00dc, 0xc39c},
  {0x00dd, 0xc39d},
  {0x00de, 0xc39e},
  {0x00df, 0xc39f},
  {0x00e0, 0xc3a0},
  {0x00e1, 0xc3a1},
  {0x00e2, 0xc3a2},
  {0x00e3, 0xc3a3},
  {0x00e4, 0xc3a4},
  {0x00e5, 0xc3a5},
  {0x00e6, 0xc3a6},
  {0x00e7, 0xc3a7},
  {0x00e8, 0xc3a8},
  {0x00e9, 0xc3a9},
  {0x00ea, 0xc3aa},
  {0x00eb, 0xc3ab},
  {0x00ec, 0xc3ac},
  {0x00ed, 0xc3ad},
  {0x00ee, 0xc3ae},
  {0x00ef, 0xc3af},
  {0x00f0, 0xc3b0},
  {0x00f1, 0xc3b1},
  {0x00f2, 0xc3b2},
  {0x00f3, 0xc3b3},
  {0x00f4, 0xc3b4},
  {0x00f5, 0xc3b5},
  {0x00f6, 0xc3b6},
  {0x00f7, 0xc3b7},
  {0x00f8, 0xc3b8},
  {0x00f9, 0xc3b9},
  {0x00fa, 0xc3ba},
  {0x00fb, 0xc3bb},
  {0x00fc, 0xc3bc},
  {0x00fd, 0xc3bd},
  {0x00fe, 0xc3be},
  {0x00ff, 0xc3bf},
