Re: Message-ID as unique key? - Mailing list pgsql-general

From Steve Atkins
Subject Re: Message-ID as unique key?
Msg-id 20041012155030.GA9764@gp.word-to-the-wise.com
In response to Message-ID as unique key?  (Jerry LeVan <jerry.levan@eku.edu>)
Responses Re: Message-ID as unique key?  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-general
On Tue, Oct 12, 2004 at 11:01:08AM -0400, Jerry LeVan wrote:
> Hi,
> I am futzing around with Andrew Stuart's "Catchmail" program
> that stores emails into a postgresql database.
>
> I want to avoid inserting the same email more than once...
> (pieces of the email actually get emplaced into several
>  tables).
>
> Is the "Message-ID" header field a globally unique identifier?

Not a PostgreSQL-related issue, but yes, Message-ID is, by
definition, a globally unique identifier. If two messages carry
the same Message-ID, the sender is asserting that those two
messages are identical. See RFC 2822 section 3.6.4.

You will sometimes see a message generated without a Message-ID at
all, but that will usually have had a Message-ID added by some MTA
along the delivery route. If your MX doesn't add Message-IDs when
missing then you may well see incoming email without Message-IDs
(mostly spam).

In practice there are varying levels of competence in implementation
of Message-ID generation, so, very rarely, you'll see syntactically
incorrect Message-IDs that may, in theory, clash.
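
A minimal sketch of how an importer might cope with missing or
malformed Message-IDs, using Python's standard email module. The
function name dedup_key and the content-hash fallback are assumptions
for illustration, not something described in this thread. Note that a
hash of the raw bytes will differ between two deliveries of the same
message if the Received headers differ.

```python
import hashlib
from email import message_from_bytes


def dedup_key(raw: bytes) -> str:
    """Return the Message-ID if it looks plausible, else a content hash."""
    msg = message_from_bytes(raw)
    mid = str(msg.get("Message-ID", "")).strip()
    # RFC 2822 form is "<id-left@id-right>"; this is only a rough check.
    if mid.startswith("<") and mid.endswith(">") and "@" in mid:
        return mid
    # Hypothetical fallback: hash the raw message bytes.
    return "sha256:" + hashlib.sha256(raw).hexdigest()
```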

> I eventually want to have a cron job process my inbox and don't
> want successive cron tasks to keep re-entering the same email :)

I wouldn't try to use Message-ID as a primary key, though. Give
yourself a serial field.
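
One way to read that advice: keep a serial surrogate key and, if you
still want to reject duplicates, put a separate unique constraint on
the Message-ID column. The sketch below uses Python's built-in sqlite3
as a stand-in so it runs anywhere; in PostgreSQL the id column would be
declared serial. The table layout, the unique constraint, and the names
open_store and store_message are all assumptions, not part of the
original post.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the (hypothetical) mail store and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS message (
               id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
               message_id TEXT UNIQUE,                -- NULLs may repeat
               raw BLOB NOT NULL
           )"""
    )
    return conn


def store_message(conn, message_id, raw):
    """Insert once; return True if a new row was added."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO message (message_id, raw) VALUES (?, ?)",
            (message_id, raw),
        )
    return cur.rowcount == 1
```

Messages with no Message-ID can be stored with message_id set to NULL,
since a unique constraint does not treat NULLs as equal to each other.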

I don't use Message-ID at all in my PostgreSQL-based
mailstore. Instead I use a maildir-style spool directory for incoming
mail. The processes that import those spooled messages into the
mailstore use standard maildir techniques: lock the message on
disk, write it to the DB, move it atomically from the new/ to the
cur/ directory, then commit the database write. I've pumped
millions of emails through this in production with no problems.
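
A rough sketch of that import loop, following the same order of steps:
write to the DB, rename from new/ to cur/, then commit. It uses sqlite3
as a stand-in for PostgreSQL so it is runnable as-is. The post does not
say how the on-disk locking is done, so this sketch assumes the atomic
rename alone is enough to claim a message. The function name
import_spool and the table layout are hypothetical.

```python
import os
import sqlite3


def import_spool(maildir, conn):
    """Import every message in maildir/new; return the number imported."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS message ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, raw BLOB NOT NULL)"
    )
    new_dir = os.path.join(maildir, "new")
    cur_dir = os.path.join(maildir, "cur")
    imported = 0
    for name in sorted(os.listdir(new_dir)):
        src = os.path.join(new_dir, name)
        dst = os.path.join(cur_dir, name + ":2,")  # maildir info suffix
        try:
            with open(src, "rb") as fh:
                raw = fh.read()
        except FileNotFoundError:
            continue  # another importer already took this message
        # The INSERT opens a transaction that stays uncommitted for now.
        conn.execute("INSERT INTO message (raw) VALUES (?)", (raw,))
        try:
            os.rename(src, dst)  # atomic within one filesystem
        except FileNotFoundError:
            conn.rollback()  # lost the race; discard our insert
            continue
        conn.commit()  # commit only after the move succeeded
        imported += 1
    return imported
```

Running it a second time over the same spool imports nothing, because
the messages are no longer in new/.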

Cheers,
  Steve

