Thread: support for DIN SPEC 91379 encoding
Hi,
Does anyone here know whether PostgreSQL supports the DIN SPEC 91379 encoding?
As far as I understand, it is a “new” encoding supporting all “EU characters” based on Unicode, but it is not compliant with UTF-8. As far as I have read, there are a few characters in DIN SPEC 91379 that are not within UTF-8. As DIN SPEC 91379 seems to be a national specification (DE) that is based on a European law, I guess other similar international, or at least European, compliant encodings should exist, or at least other national specs that are compliant with the German DIN SPEC 91379.
i.A. Dr. Marco Lechner
Leiter Fachgebiet RN 1 │ Head RN 1
--
Bundesamt für Strahlenschutz │ Federal Office for Radiation Protection
Koordination Notfallschutzsysteme │ Coordination Emergency Systems │ RN 1
Rosastr. 9
D-79098 Freiburg
Tel.: +49 30 18333-6724
E-Mail: mlechner@bfs.de
Hi Marco,

On 27 Mar 2022, at 12:54, Marco Lechner wrote:
> Hi,
>
> Does anyone here know, if postgresql supports DIN SPEC 91379 encoding?
>
> As far as I understand it is a “new” encoding supporting all “EU characters” based on Unicode, but is not compliant to UTF-8. As far as I did read, there are a few characters in DIN SPEC 91379 that are not within UTF-8.

Where did you read that DIN SPEC 91379 is incompatible with UTF-8?

The document „String.Latin+ 1.2: eine kommentierte und erweiterte Fassung der DIN SPEC 91379. Inklusive einer umfangreichen Liste häufig gestellter Fragen. Herausgegeben von der Fachgruppe String.Latin“ ("String.Latin+ 1.2: an annotated and extended version of DIN SPEC 91379, including an extensive list of frequently asked questions, published by the String.Latin working group"), linked at https://www.xoev.de/downloads-2316#StringLatin, says that the spec is a strict subset of Unicode (E.1.6). It also mentions in E.1.4 that all Unicode characters can be encoded in UTF-8. Therefore UTF-8 can be used to encode all DIN SPEC 91379 characters. On the other hand, UTF-8 strings may contain characters that are not included in the DIN SPEC.

Ralf

> As DIN SPEC 91379 seems to be a national specification (DE) it is based on a European law. I guess other similar international or at least European compliant encodings should exist or at least other national specs that are compliant with the German DIN SPEC 91379.
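Ralf's point is a subset relation: every character in the spec is a Unicode character, so it has a UTF-8 encoding, while the reverse does not hold. A minimal Python sketch of that relation follows. `SAMPLE_REPERTOIRE` is a tiny hypothetical stand-in, not the real character list from the spec, and the per-code-point check is a simplification because it ignores multi-code-point sequences.

```python
# Hypothetical stand-in for the DIN SPEC 91379 repertoire: a few Latin,
# Greek and Cyrillic letters plus some punctuation. The real list is
# much longer and is NOT reproduced here.
SAMPLE_REPERTOIRE = frozenset("ABCabcÄäßŁłΑαЖж -'")


def in_repertoire(text: str, repertoire: frozenset = SAMPLE_REPERTOIRE) -> bool:
    """Return True if every code point of text is in the given repertoire."""
    return all(ch in repertoire for ch in text)


# Every character of the (sample) subset round-trips through UTF-8 ...
for ch in SAMPLE_REPERTOIRE:
    assert ch.encode("utf-8").decode("utf-8") == ch

# ... but a valid UTF-8 string may contain characters outside the subset.
print(in_repertoire("Łaß"))   # True
print(in_repertoire("日本"))   # False
```

In other words, a UTF-8 database can store everything in the subset; rejecting characters outside it would be a separate validation step.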
On 2022-Mar-27, Ralf Schuchardt wrote:

> where did you read, that this DIN SPEC 91379 norm is incompatible with UTF-8?
>
> In the document „String.Latin+ 1.2: eine kommentierte und erweiterte
> Fassung der DIN SPEC 91379. Inklusive einer umfangreichen Liste häufig
> gestellter Fragen. Herausgegeben von der Fachgruppe String.Latin“
> linked here https://www.xoev.de/downloads-2316#StringLatin it is said,
> that the spec is a strict subset of unicode (E.1.6), and it is also
> mentioned in E.1.4, that in UTF-8 all unicode characters can be
> encoded. Therefore UTF-8 can be used to encode all DIN SPEC 91379
> characters.

So the remaining question is whether DIN SPEC 91379 requires an implementation to support character U+0000. If it does, then PostgreSQL is not conformant, because that character is the only one in Unicode that we don't support. If U+0000 is not required, then PostgreSQL is okay.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
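Since U+0000 is the one character mentioned above that PostgreSQL text values cannot hold, a client may want to check for it before sending a string to the server. The following is a minimal Python sketch under that assumption. Both helper names are hypothetical, and stripping (rather than rejecting) is only one possible policy.

```python
def contains_nul(text: str) -> bool:
    """Return True if text contains U+0000."""
    return "\x00" in text


def strip_nul(text: str) -> str:
    """Hypothetical clean-up policy: drop U+0000 instead of rejecting the input."""
    return text.replace("\x00", "")


name = "Lechner\x00"
if contains_nul(name):
    name = strip_nul(name)
print(repr(name))  # 'Lechner'
```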
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2022-Mar-27, Ralf Schuchardt wrote:
>> linked here https://www.xoev.de/downloads-2316#StringLatin it is said,
>> that the spec is a strict subset of unicode (E.1.6), and it is also
>> mentioned in E.1.4, that in UTF-8 all unicode characters can be
>> encoded. Therefore UTF-8 can be used to encode all DIN SPEC 91379
>> characters.

> So the remaining question is whether DIN SPEC 91379 requires an
> implementation to support character U+0000. If it does, then PostgreSQL
> is not conformant, because that character is the only one in Unicode
> that we don't support. If U+0000 is not required, then PostgreSQL is
> okay.

Hmm ... UTF8 as defined in RFC3629/STD63 [1] does not allow "all unicode characters to be encoded". It disallows surrogate pairs (U+D800--U+DFFF) and code points above U+10FFFF. We follow that spec, so depending on what DIN 91379 *actually* says, we might have additional reasons not to be in compliance. I don't read German unfortunately.

regards, tom lane

[1] http://www.faqs.org/rfcs/rfc3629.html
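The two restrictions named here can be observed with any strict UTF-8 encoder. A minimal sketch using Python's built-in codec follows. It illustrates the RFC 3629 rules only, not PostgreSQL's own validation code, and `utf8_encodable` is a hypothetical helper name.

```python
def utf8_encodable(code_point: int) -> bool:
    """Return True if Python's strict UTF-8 codec can encode this code point."""
    try:
        chr(code_point).encode("utf-8")
    except (ValueError, UnicodeEncodeError):
        # ValueError: chr() rejects values above 0x10FFFF.
        # UnicodeEncodeError: surrogates U+D800..U+DFFF are not allowed.
        return False
    return True


for cp in (0x0041, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000):
    print(f"U+{cp:04X}", utf8_encodable(cp))
```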
U+0000 is not part of DIN SPEC 91379.

--
Boris

> On 27.03.2022 at 19:47, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> So the remaining question is whether DIN SPEC 91379 requires an
> implementation to support character U+0000. If it does, then PostgreSQL
> is not conformant, because that character is the only one in Unicode
> that we don't support. If U+0000 is not required, then PostgreSQL is
> okay.
On 2022-03-27 14:06:25 -0400, Tom Lane wrote:
> Hmm ... UTF8 as defined in RFC3629/STD63 [1] does not allow "all unicode
> characters to be encoded". It disallows surrogate pairs (U+D800--U+DFFF)
> and code points above U+10FFFF.

From section 2.4, "Code Points and Characters", of the Unicode Standard, Version 14.0 - Core Specification:

| In the Unicode Standard, the codespace consists of the integers from 0
| to 10FFFF₁₆, comprising 1,114,112 code points available for
| assigning the repertoire of abstract characters.

So there are no characters above U+10FFFF. Also,

| Not all assigned code points represent abstract characters; only
| Graphic, Format, Control and Private-use do. Surrogates and
| Noncharacters are assigned code points but are not assigned to
| abstract characters.

So surrogates aren't characters either. UTF-8 can indeed be used to encode "all unicode characters".

> We follow that spec, so depending on what DIN 91379 *actually* says,
> we might have additional reasons not to be in compliance. I don't
> read German unfortunately.

It defines a minimal character set that IT systems which process personal and company names in the EU must accept. Basically Latin, Greek and Cyrillic letters, digits, and some symbols and punctuation.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
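The figures in this message can be checked mechanically. The sketch below assumes Python's `unicodedata` module as a stand-in for the Unicode character database. It counts the code points in the codespace and confirms that the only ones a strict UTF-8 encoder refuses are those with general category `Cs` (surrogates). The helper name `encodable` is hypothetical.

```python
import unicodedata

# The codespace runs from 0 to 0x10FFFF inclusive.
CODESPACE_SIZE = 0x10FFFF + 1
print(CODESPACE_SIZE)  # 1114112


def encodable(code_point: int) -> bool:
    """Return True if the strict UTF-8 codec accepts this code point."""
    try:
        chr(code_point).encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# Collect every code point in the codespace that UTF-8 cannot encode.
unencodable = [cp for cp in range(CODESPACE_SIZE) if not encodable(cp)]
print(len(unencodable))  # 2048

# All of them are surrogates (general category "Cs").
print(all(unicodedata.category(chr(cp)) == "Cs" for cp in unencodable))  # True
```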
On 2022-Mar-28, Peter J. Holzer wrote:

> On 2022-03-27 14:06:25 -0400, Tom Lane wrote:
> > We follow that spec, so depending on what DIN 91379 *actually* says,
> > we might have additional reasons not to be in compliance. I don't
> > read German unfortunately.
>
> It defines a minimal character set that IT systems which process personal
> and company names in the EU must accept. Basically Latin, Greek and
> Cyrillic letters, digits and some symbols and punctuation.

Yeah, I had a look at the list of allowed characters and it's a reasonably simple set. The most complex you can find is stuff like

LATIN CAPITAL LETTER R WITH COMBINING RING BELOW AND COMBINING MACRON
LATIN CAPITAL LETTER K WITH COMBINING DOUBLE MACRON BELOW AND LATIN SMALL LETTER H

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Tiene valor aquel que admite que es un cobarde" (Fernandel)
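The two entries quoted above are sequences of several code points rather than single precomposed characters. A minimal Python sketch follows. The code point sequences are an assumption derived from the character names in the message (U+0325 COMBINING RING BELOW, U+0304 COMBINING MACRON, U+035F COMBINING DOUBLE MACRON BELOW), not copied from the spec itself.

```python
import unicodedata

# Assumed code point sequences for the two entries named in the message.
SEQUENCES = {
    "R with ring below and macron": "R\u0325\u0304",
    "K with double macron below, then h": "K\u035fh",
}


def describe(text: str) -> list:
    """List each code point of text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, seq in SEQUENCES.items():
    # One visible unit, but several code points and even more UTF-8 bytes.
    print(f"{label}: {len(seq)} code points, {len(seq.encode('utf-8'))} bytes")
    for line in describe(seq):
        print("   ", line)
```

Each sequence is plain Unicode text, so it encodes in UTF-8 like any other string; the only practical consequence is that length limits counted in code points or bytes see more than one unit per visible letter.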