Re: PGLister fails to de-dup messages addressed twice to same list - Mailing list pgsql-www

From Stephen Frost
Subject Re: PGLister fails to de-dup messages addressed twice to same list
Msg-id 20171121154338.GZ4628@tamriel.snowman.net
In response to Re: PGLister fails to de-dup messages addressed twice to same list  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: PGLister fails to de-dup messages addressed twice to same list  (Magnus Hagander <magnus@hagander.net>)
List pgsql-www
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> ... I have no doubt at all that that's
> >> going to happen a *lot* during the list domain changeover, so I'd
> >> strongly recommend putting something in place to de-dup.
>
> > Yeah, I'm already chatting w/ Magnus about this.
>
> Curiously, my replies to the same message seem to have been delivered
> only once, and that's not because I was awake enough to notice and
> remove the extra cc ;-).  So my guess at this point is that you do
> have some de-dup in there, but it ain't working for gmail-originated
> messages.

As near as I can tell, GMail delivered the message to us in two
independent runs with two connections to our mail server, while your
server only delivered one message in one run to our server.

I'm guessing that your server realized it was the same MX for both
postgresql.org and lists.postgresql.org and expected our server to
handle delivering to the multiple addresses, but PGLister, for a given
email that comes in, is only going to deliver once to each of the lists
that are listed in the inbound email.  On the other hand, GMail seems to
split the email on the source side for each domain/subdomain and
delivers them independently.
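
To make the difference concrete, here is a rough sketch of the two sender-side behaviours described above. It is only an illustration under stated assumptions: the MX host name, the recipient addresses, and the function names are hypothetical, and it is not a description of how GMail or any particular MTA is actually implemented.

```python
from collections import defaultdict

# Hypothetical MX table: both domains are served by the same mail host.
MX = {
    "postgresql.org": "mx.example.invalid",
    "lists.postgresql.org": "mx.example.invalid",
}


def domain_of(rcpt):
    """Return the lower-cased domain part of an address."""
    return rcpt.rsplit("@", 1)[1].lower()


def group_by_domain(recipients):
    """One delivery per recipient domain (the behaviour GMail seems to show)."""
    groups = defaultdict(list)
    for rcpt in recipients:
        groups[domain_of(rcpt)].append(rcpt)
    return list(groups.values())


def group_by_mx(recipients):
    """One delivery per destination MX host (what the other server seems to do)."""
    groups = defaultdict(list)
    for rcpt in recipients:
        groups[MX[domain_of(rcpt)]].append(rcpt)
    return list(groups.values())


# Hypothetical example: the same list addressed under both domains.
rcpts = ["pgsql-www@postgresql.org", "pgsql-www@lists.postgresql.org"]
per_domain = group_by_domain(rcpts)  # two separate deliveries
per_mx = group_by_mx(rcpts)          # one delivery carrying both recipients
```

With grouping by domain, the list server sees two independent inbound messages with the same Message-ID, so a per-message "deliver once to each list" rule cannot catch the duplicate.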

Unfortunately, we aren't going to be able to depend on the sender's MTA
always putting the message into one email to us, both because GMail makes
clear that not every sender does and because it's not really "correct."
We need to have a message-id cache in the PG database that will throw
away dups, on a per-list basis, when they come in.  I don't anticipate it
being too difficult to implement, really, but I think we'll need entries
to last at least a couple of days, which implies having a cleanup job for
it, et al.
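
A minimal sketch of what such a per-list message-id cache might look like. This is not PGLister's code: the table name, column names, function names, and the two-day retention value are all assumptions, and SQLite stands in for the PG database so the example is self-contained. In PostgreSQL the same idea could be expressed with `INSERT ... ON CONFLICT DO NOTHING`.

```python
import sqlite3

# Assumed retention: "a couple of days", taken here as two days in seconds.
RETENTION_SECONDS = 2 * 24 * 3600


def open_cache(path=":memory:"):
    """Open the cache; the (list_name, message_id) pair is the primary key."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_messages ("
        " list_name TEXT NOT NULL,"
        " message_id TEXT NOT NULL,"
        " seen_at REAL NOT NULL,"
        " PRIMARY KEY (list_name, message_id))"
    )
    return conn


def first_sighting(conn, list_name, message_id, now):
    """Record the pair; return True if it is new, False if it is a dup."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO seen_messages VALUES (?, ?, ?)",
        (list_name, message_id, now),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted, 0 when the insert was ignored.
    return cur.rowcount == 1


def cleanup(conn, now):
    """Cleanup job: delete entries older than the retention window."""
    cur = conn.execute(
        "DELETE FROM seen_messages WHERE seen_at < ?",
        (now - RETENTION_SECONDS,),
    )
    conn.commit()
    return cur.rowcount
```

Keying on the list as well as the message-id means a message cross-posted to two different lists still reaches both, while a second copy arriving for the same list is dropped.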

Thanks!

Stephen
