On Thu, Jun 08, 2006 at 07:25:35AM -0400,
Douglas McNaught <doug@mcnaught.org> wrote
a message of 29 lines which said:
> I would think it would (at least potentially) vary with each
> message. The dbmail software should really set client_encoding
> based on the Content-Transfer-Encoding header in the message (or
> whatever it's called).
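A *big* warning from someone who stores email in PostgreSQL: many
email messages *lie*. They declare one character set (in the charset
parameter of the Content-Type header, which is the header that matters
here, not Content-Transfer-Encoding) and then actually use another.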
If you blindly try to insert the body of the message into PostgreSQL
using the declared encoding, the insert will sometimes fail, for
instance if the message claims to be in UTF-8 but is not (something
that PostgreSQL will detect).
Either you:
* "sanitize" all incoming data,
* or accept that these invalid emails will be rejected,
* or store them in an unstructured field (a blob).
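
For what it's worth, here is a minimal Python sketch of how the first
and third options could look before anything reaches the database. It
is not what dbmail does; the names StoredBody, classify_body and
sanitize_body are hypothetical, and it assumes you already have the raw
body bytes and the charset the message declared.

```python
from typing import NamedTuple, Optional


class StoredBody(NamedTuple):
    """Hypothetical result: exactly one of the two fields is set."""
    text: Optional[str]   # validated text, safe for a text column
    raw: Optional[bytes]  # untouched bytes, for a bytea (blob) column


def classify_body(payload: bytes, declared_charset: str) -> StoredBody:
    """Trust the declared charset only if the bytes really decode with it."""
    try:
        text = payload.decode(declared_charset, errors="strict")
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the message lied about its encoding.
        # LookupError: the declared charset name is unknown to Python.
        return StoredBody(text=None, raw=payload)
    if "\x00" in text:
        # PostgreSQL text values cannot contain NUL characters.
        return StoredBody(text=None, raw=payload)
    return StoredBody(text=text, raw=None)


def sanitize_body(payload: bytes, declared_charset: str) -> str:
    """'Sanitize' option: always return text, replacing undecodable bytes."""
    try:
        text = payload.decode(declared_charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back to UTF-8 (an assumption).
        text = payload.decode("utf-8", errors="replace")
    return text.replace("\x00", "")
```

Note that this check only catches lies that are detectable: a charset
such as ISO-8859-1 accepts any byte sequence, so a message wrongly
labelled that way will still decode without error.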