Re: UTF8 problem - Mailing list pgsql-general

From Stephane Bortzmeyer
Subject Re: UTF8 problem
Msg-id 20060615143430.GA17590@nic.fr
In response to Re: UTF8 problem  (Douglas McNaught <doug@mcnaught.org>)
List pgsql-general
On Thu, Jun 08, 2006 at 07:25:35AM -0400,
 Douglas McNaught <doug@mcnaught.org> wrote
 a message of 29 lines which said:

> I would think it would (at least potentially) vary with each
> message.  The dbmail software should really set client_encoding
> based on the Content-Transfer-Encoding header in the message (or
> whatever it's called).

A *big* warning from someone who stores email in PostgreSQL: many
email messages *lie*. They declare one charset (in the Content-Type
header) and then actually use another encoding.

If you blindly try to inject the body of the message into PostgreSQL
using the declared encoding, you will sometimes fail, for instance if
the message claims to be in UTF-8 but is not (PostgreSQL detects
invalid UTF-8 and rejects it).
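A minimal Python sketch of checking a body against its declared charset before handing it to PostgreSQL. It assumes the body has already been unpacked from its transfer encoding (base64 or quoted-printable) into raw bytes; the function name `matches_declared_charset` is hypothetical, and this is only a client-side pre-check, not PostgreSQL's own validation code.

```python
def matches_declared_charset(body: bytes, charset: str) -> bool:
    """Return True if `body` decodes cleanly with the charset the message declares."""
    try:
        body.decode(charset)
    except UnicodeDecodeError:
        # The bytes are not valid in that charset: the message "lies".
        return False
    except LookupError:
        # Python does not recognise the declared charset name at all.
        return False
    return True


# A Latin-1 body ("é" is the single byte 0xE9) mislabelled as UTF-8.
lying_body = "Déjà vu".encode("latin-1")
honest_body = "Déjà vu".encode("utf-8")

print(matches_declared_charset(lying_body, "utf-8"))   # False
print(matches_declared_charset(honest_body, "utf-8"))  # True
```

Note that this only catches lies for charsets such as UTF-8 that have invalid byte sequences: a single-byte charset like ISO-8859-1 accepts any byte string, so a mislabelled message claiming Latin-1 will pass this check and just come out as mojibake.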

Either you:

* "sanitize" all incoming data,
* or you accept that these invalid emails will be rejected,
* or you store them in an unstructured field (a blob, such as bytea).
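The three options can be sketched as one hypothetical helper, `prepare_body`, selected by a `policy` argument. The names, the `InvalidMessage` exception, and the choice to fall back to UTF-8 with U+FFFD replacement when sanitizing are all assumptions for illustration, not part of any existing mail or PostgreSQL API.

```python
from typing import Union


class InvalidMessage(Exception):
    """Hypothetical error for a body that does not match its declared charset."""


def prepare_body(body: bytes, declared_charset: str, policy: str) -> Union[str, bytes]:
    if policy not in ("sanitize", "reject", "blob"):
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "blob":
        # Option 3: keep the raw bytes untouched, e.g. for a bytea column.
        return body
    try:
        # Honest message: text decodes cleanly in its declared charset.
        return body.decode(declared_charset)
    except (UnicodeDecodeError, LookupError):
        if policy == "reject":
            # Option 2: refuse the message rather than store bad data.
            raise InvalidMessage(f"body is not valid {declared_charset}")
        # Option 1 (sanitize): fall back to UTF-8 and replace each invalid
        # byte sequence with U+FFFD, so the result is always valid text.
        return body.decode("utf-8", errors="replace")


lying = "Déjà vu".encode("latin-1")
print(repr(prepare_body(lying, "utf-8", "sanitize")))  # 'D\ufffdj\ufffd vu'
print(prepare_body(lying, "utf-8", "blob") == lying)   # True
```

Sanitizing loses the original characters for good, whereas the blob option preserves them at the cost of not being able to search or index the body as text.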
