Thread: Bizarre delays in mailing list messages

Bizarre delays in mailing list messages

From: Tom Lane
Date: 17 October 2012, 18:05:33

Since the cutover in mailing list servers a few months ago, I've been
noticing that some messages suffer unexpected delivery delays,
particularly on the pgsql-committers list.  An example today was that
I pushed two patches to three different branches at approximately 16:39
UTC.  Of the resulting six -committers messages, the arrival times
looked like this:

Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.14.5/8.14.5) with ESMTP id q9HGdDxS022945
    for <tgl@sss.pgh.pa.us>; Wed, 17 Oct 2012 12:39:14 -0400 (EDT)

Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.14.5/8.14.5) with ESMTP id q9HGdf14022979
    for <tgl@sss.pgh.pa.us>; Wed, 17 Oct 2012 12:39:42 -0400 (EDT)

Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.14.5/8.14.5) with ESMTP id q9HGdvZm022994
    for <tgl@sss.pgh.pa.us>; Wed, 17 Oct 2012 12:39:58 -0400 (EDT)

Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.14.5/8.14.5) with ESMTP id q9HGkIU9023125
    for <tgl@sss.pgh.pa.us>; Wed, 17 Oct 2012 12:46:18 -0400 (EDT)

Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.14.5/8.14.5) with ESMTP id q9HH11SO025023
    for <tgl@sss.pgh.pa.us>; Wed, 17 Oct 2012 13:01:02 -0400 (EDT)

Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.14.5/8.14.5) with ESMTP id q9HHNFDL027425
    for <tgl@sss.pgh.pa.us>; Wed, 17 Oct 2012 13:23:16 -0400 (EDT)

The next line down in each message shows that it got to
malur.postgresql.org promptly enough, e.g. for the last one:

Received: from localhost ([127.0.0.1] helo=postgresql.org)
    by malur.postgresql.org with smtp (Exim 4.72)
    (envelope-from <pgsql-committers-owner+M49505=tgl=sss.pgh.pa.us@postgresql.org>)
    id 1TOWet-0004e8-5n
    for tgl@sss.pgh.pa.us; Wed, 17 Oct 2012 16:39:51 +0000

so it seems like there is something flaky in malur's mail queuing logic.
I don't mind delays of a few minutes, but when it takes most of an hour
it seems like something must be wrong.
        regards, tom lane
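The delays above can be computed directly from the date fields of the Received headers rather than eyeballed across time zones. A minimal Python sketch, assuming only the six sss.pgh.pa.us timestamps and malur's timestamp for the last message as quoted above; the helper name `seconds_between` is hypothetical.

```python
from email.utils import parsedate_to_datetime

# Date fields copied from the six Received headers added by sss.pgh.pa.us.
SSS_ARRIVALS = [
    "Wed, 17 Oct 2012 12:39:14 -0400",
    "Wed, 17 Oct 2012 12:39:42 -0400",
    "Wed, 17 Oct 2012 12:39:58 -0400",
    "Wed, 17 Oct 2012 12:46:18 -0400",
    "Wed, 17 Oct 2012 13:01:02 -0400",
    "Wed, 17 Oct 2012 13:23:16 -0400",
]

# Date field from malur's own Received header on the last message (UTC).
MALUR_ACCEPTED_LAST = "Wed, 17 Oct 2012 16:39:51 +0000"


def seconds_between(earlier: str, later: str) -> float:
    """Seconds from one RFC 2822 date to another; UTC offsets are honored."""
    start = parsedate_to_datetime(earlier)
    end = parsedate_to_datetime(later)
    return (end - start).total_seconds()


if __name__ == "__main__":
    first = SSS_ARRIVALS[0]
    for stamp in SSS_ARRIVALS:
        minutes = seconds_between(first, stamp) / 60
        print(f"{stamp}  +{minutes:5.1f} min after first arrival")
    # Time the last message sat on malur between acceptance and hand-off.
    held = seconds_between(MALUR_ACCEPTED_LAST, SSS_ARRIVALS[-1])
    print(f"last message held on malur for {held / 60:.1f} min")
```

Because both timestamps carry explicit offsets, the subtraction works across the UTC and EDT headers: the last message spent 2605 seconds (about 43 minutes) on malur before being handed to sss.pgh.pa.us.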

Re: Bizarre delays in mailing list messages

From: Alvaro Herrera
Date: 17 October 2012, 19:34:51

Tom Lane wrote:
> Since the cutover in mailing list servers a few months ago, I've been
> noticing that some messages suffer unexpected delivery delays,
> particularly on the pgsql-committers list.  An example today was that
> I pushed two patches to three different branches at approximately 16:39
> UTC.

Hmm, yeah, I see that malur wimped out of sending a couple of those
promptly due to high system load:

2012-10-17 16:39:48 1TOWeq-0004di-GZ <= pgsql-committers-owner+M49504=tgl=sss.pgh.pa.us@postgresql.org H=localhost (postgresql.org) [127.0.0.1] P=smtp S=4405 id=E1TOWeC-0007fb-Eu@gemulon.postgresql.org
2012-10-17 16:39:48 1TOWeq-0004di-GZ no immediate delivery: load average 10.54
2012-10-17 16:46:18 1TOWeq-0004di-GZ => tgl@sss.pgh.pa.us R=dnslookup T=remote_smtp H=sss.pgh.pa.us [66.207.139.130]
2012-10-17 16:46:18 1TOWeq-0004di-GZ Completed

2012-10-17 16:39:47 1TOWep-0004da-Ho <= pgsql-committers-owner+M49507=tgl=sss.pgh.pa.us@postgresql.org H=localhost (postgresql.org) [127.0.0.1] P=smtp S=3831 id=E1TOWeC-0007fV-Bi@gemulon.postgresql.org
2012-10-17 16:39:47 1TOWep-0004da-Ho no immediate delivery: load average 10.54
2012-10-17 17:01:02 1TOWep-0004da-Ho => tgl@sss.pgh.pa.us R=dnslookup T=remote_smtp H=sss.pgh.pa.us [66.207.139.130]
2012-10-17 17:01:02 1TOWep-0004da-Ho Completed

This is not specific to pgsql-committers in any way; I see a lot of
messages delayed like that at particular points in time.  When we did
the migration, we discussed the idea of having a secondary delivery
helper server, but in the interest of keeping things simple we stayed
away from it then.  We will need to discuss it to figure out the best
way to deal with it.

Thanks for pointing it out.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
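Finding every message that was deferred for load, and how long it then waited, can be done mechanically from log lines like the ones above. A minimal Python sketch, assuming the Exim main-log layout seen in the excerpt (date, time, message id, event text); trailing fields of the long lines are trimmed, and the function name `load_deferrals` is hypothetical, not part of Exim.

```python
from datetime import datetime

# Excerpt of malur's Exim main log as quoted above (long lines trimmed).
LOG = """\
2012-10-17 16:39:48 1TOWeq-0004di-GZ <= pgsql-committers-owner+M49504=tgl=sss.pgh.pa.us@postgresql.org
2012-10-17 16:39:48 1TOWeq-0004di-GZ no immediate delivery: load average 10.54
2012-10-17 16:46:18 1TOWeq-0004di-GZ => tgl@sss.pgh.pa.us R=dnslookup T=remote_smtp
2012-10-17 16:46:18 1TOWeq-0004di-GZ Completed
2012-10-17 16:39:47 1TOWep-0004da-Ho <= pgsql-committers-owner+M49507=tgl=sss.pgh.pa.us@postgresql.org
2012-10-17 16:39:47 1TOWep-0004da-Ho no immediate delivery: load average 10.54
2012-10-17 17:01:02 1TOWep-0004da-Ho => tgl@sss.pgh.pa.us R=dnslookup T=remote_smtp
2012-10-17 17:01:02 1TOWep-0004da-Ho Completed
"""

LOAD_PREFIX = "no immediate delivery: load average "


def load_deferrals(log_text: str) -> dict[str, tuple[float, float]]:
    """Map message id -> (load average logged, seconds until 'Completed')."""
    pending: dict[str, tuple[datetime, float]] = {}
    result: dict[str, tuple[float, float]] = {}
    for line in log_text.splitlines():
        parts = line.split(" ", 3)  # date, time, message id, event text
        if len(parts) < 4:
            continue
        date, time, msg_id, event = parts
        stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        if event.startswith(LOAD_PREFIX):
            # Remember when the message was queued and the load that caused it.
            pending[msg_id] = (stamp, float(event[len(LOAD_PREFIX):]))
        elif event == "Completed" and msg_id in pending:
            queued_at, load = pending.pop(msg_id)
            result[msg_id] = (load, (stamp - queued_at).total_seconds())
    return result


if __name__ == "__main__":
    for msg_id, (load, waited) in load_deferrals(LOG).items():
        print(f"{msg_id}: load {load}, waited {waited / 60:.1f} min")
```

On this excerpt the two messages wait 390 and 1275 seconds (6.5 and about 21 minutes), which lines up with the 12:46:18 and 13:01:02 EDT arrivals in Tom's headers. Run over a full day's log, the same pass would show whether the deferrals cluster at particular times.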

Re: Bizarre delays in mailing list messages

From: "Greg Sabino Mullane"
Date: 20 October 2012, 02:13:03

> 2012-10-17 16:39:48 1TOWeq-0004di-GZ no immediate delivery: load average 10.54
>
> This is not specific to pgsql-committers in any way; I see a lot of
> messages delayed like that at particular points in time.  When we did
> the migration, we discussed the idea of having a secondary delivery
> helper server, but in the interest of keeping things simple we stayed
> away from it then.  We will need to discuss it to figure out the best
> way to deal with it.

Can you simply bump that limit up for now? Unless this is a single-CPU 
box, 10 is really not so high of a load that I would think it needs 
to start delaying messages.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 201210192211
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
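Greg's point is that a fixed load-average cutoff means very different things on one core versus many. The "no immediate delivery: load average" log line appears to correspond to Exim's `queue_only_load` main option, which compares the raw load average against a single threshold. The Python sketch below illustrates the per-CPU alternative he is hinting at; the function `defer_immediate_delivery` and the 1.5-per-CPU limit are hypothetical, and malur's actual CPU count is not stated in the thread.

```python
import os


def defer_immediate_delivery(load_avg: float, per_cpu_limit: float, cpus: int) -> bool:
    """Hypothetical policy: queue new mail only when load per CPU exceeds a limit.

    An absolute threshold treats a load of 10 the same on 1 core as on 16.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load_avg / cpus > per_cpu_limit


if __name__ == "__main__":
    if hasattr(os, "getloadavg"):  # os.getloadavg() is Unix-only
        one_minute_load = os.getloadavg()[0]
        cores = os.cpu_count() or 1
        print(defer_immediate_delivery(one_minute_load, 1.5, cores))
```

With the load of 10.54 from Alvaro's log, this policy would defer on a 1- or 4-core box but not on an 8-core one (10.54 / 8 is about 1.32), which is the kind of distinction Greg is asking about.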

Re: Bizarre delays in mailing list messages

From: Stefan Kaltenbrunner
Date: 30 November 2012, 20:07:42

On 10/20/2012 04:12 AM, Greg Sabino Mullane wrote:
> 
>> 2012-10-17 16:39:48 1TOWeq-0004di-GZ no immediate delivery: load average 10.54
> 
>> This is not specific to pgsql-committers in any way; I see a lot of
>> messages delayed like that at particular points in time.  When we did
>> the migration, we discussed the idea of having a secondary delivery
>> helper server, but in the interest of keeping things simple we stayed
>> away from it then.  We will need to discuss it to figure out the best
>> way to deal with it.
> 
> Can you simply bump that limit up for now? Unless this is a single-CPU 
> box, 10 is really not so high of a load that I would think it needs 
> to start delaying messages.

As an update to this: the problem itself still exists (though, we
think, in a more limited form), but we are starting to get a handle on
what exactly triggers it, which might help in diagnosing and
understanding it fully.

Stefan