Thread: Bizarre delays in mailing list messages
Since the cutover in mailing list servers a few months ago, I've been noticing that some messages suffer unexpected delivery delays, particularly on the pgsql-committers list. An example today was that I pushed two patches to three different branches at approximately 16:39 UTC. Of the resulting six -committers messages, the arrival times looked like this:

Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.14.5/8.14.5) with ESMTP id q9HGdDxS022945
    for <tgl@sss.pgh.pa.us>; Wed, 17 Oct 2012 12:39:14 -0400 (EDT)
Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.14.5/8.14.5) with ESMTP id q9HGdf14022979
    for <tgl@sss.pgh.pa.us>; Wed, 17 Oct 2012 12:39:42 -0400 (EDT)
Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.14.5/8.14.5) with ESMTP id q9HGdvZm022994
    for <tgl@sss.pgh.pa.us>; Wed, 17 Oct 2012 12:39:58 -0400 (EDT)
Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.14.5/8.14.5) with ESMTP id q9HGkIU9023125
    for <tgl@sss.pgh.pa.us>; Wed, 17 Oct 2012 12:46:18 -0400 (EDT)
Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.14.5/8.14.5) with ESMTP id q9HH11SO025023
    for <tgl@sss.pgh.pa.us>; Wed, 17 Oct 2012 13:01:02 -0400 (EDT)
Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.14.5/8.14.5) with ESMTP id q9HHNFDL027425
    for <tgl@sss.pgh.pa.us>; Wed, 17 Oct 2012 13:23:16 -0400 (EDT)

The next line down in each message shows that it got to malur.postgresql.org promptly enough, e.g. for the last one:

Received: from localhost ([127.0.0.1] helo=postgresql.org)
    by malur.postgresql.org with smtp (Exim 4.72)
    (envelope-from <pgsql-committers-owner+M49505=tgl=sss.pgh.pa.us@postgresql.org>)
    id 1TOWet-0004e8-5n
    for tgl@sss.pgh.pa.us; Wed, 17 Oct 2012 16:39:51 +0000

so it seems like there is something flaky in malur's mail queuing logic.
I don't mind delays of a few minutes, but when it takes most of an hour it seems like something must be wrong.

regards, tom lane
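For readers who want to put numbers on the delays described in this message, here is a minimal Python sketch. It takes the six arrival timestamps from the Received: headers quoted in the message and measures each against a reference time. The reference of 16:39:00 UTC is an assumption based on the stated "approximately 16:39 UTC" push time, and the names `ARRIVALS`, `PUSH_TIME` and `delays_in_seconds` are made up for this illustration.

```python
from email.utils import parsedate_to_datetime

# Arrival timestamps copied from the six Received: headers in the message
# (the trailing "(EDT)" comment is dropped; the numeric offset is enough).
ARRIVALS = [
    "Wed, 17 Oct 2012 12:39:14 -0400",
    "Wed, 17 Oct 2012 12:39:42 -0400",
    "Wed, 17 Oct 2012 12:39:58 -0400",
    "Wed, 17 Oct 2012 12:46:18 -0400",
    "Wed, 17 Oct 2012 13:01:02 -0400",
    "Wed, 17 Oct 2012 13:23:16 -0400",
]

# ASSUMPTION: the pushes happened at exactly 16:39:00 UTC. The message only
# says "approximately 16:39 UTC", so every result is approximate as well.
PUSH_TIME = "Wed, 17 Oct 2012 16:39:00 +0000"


def delays_in_seconds(reference, arrivals):
    """Return the seconds elapsed between `reference` and each arrival time."""
    start = parsedate_to_datetime(reference)
    return [
        int((parsedate_to_datetime(stamp) - start).total_seconds())
        for stamp in arrivals
    ]


DELAYS = delays_in_seconds(PUSH_TIME, ARRIVALS)

for stamp, delay in zip(ARRIVALS, DELAYS):
    minutes, seconds = divmod(delay, 60)
    print(f"{stamp}  ->  {minutes:2d} min {seconds:02d} s after the push")
```

Passing malur's own timestamp for the last message (`"Wed, 17 Oct 2012 16:39:51 +0000"`) as the reference instead gives the time that one message spent between malur and the final hop, rather than an estimate from the push time.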
Tom Lane wrote:

> Since the cutover in mailing list servers a few months ago, I've been
> noticing that some messages suffer unexpected delivery delays,
> particularly on the pgsql-committers list. An example today was that
> I pushed two patches to three different branches at approximately 16:39
> UTC.

Hmm, yeah, I see that malur wimped out of sending a couple of those timely due to high system load:

2012-10-17 16:39:48 1TOWeq-0004di-GZ <= pgsql-committers-owner+M49504=tgl=sss.pgh.pa.us@postgresql.org H=localhost (postgresql.org)[127.0.0.1] P=smtp S=4405 id=E1TOWeC-0007fb-Eu@gemulon.postgresql.org
2012-10-17 16:39:48 1TOWeq-0004di-GZ no immediate delivery: load average 10.54
2012-10-17 16:46:18 1TOWeq-0004di-GZ => tgl@sss.pgh.pa.us R=dnslookup T=remote_smtp H=sss.pgh.pa.us [66.207.139.130]
2012-10-17 16:46:18 1TOWeq-0004di-GZ Completed

2012-10-17 16:39:47 1TOWep-0004da-Ho <= pgsql-committers-owner+M49507=tgl=sss.pgh.pa.us@postgresql.org H=localhost (postgresql.org)[127.0.0.1] P=smtp S=3831 id=E1TOWeC-0007fV-Bi@gemulon.postgresql.org
2012-10-17 16:39:47 1TOWep-0004da-Ho no immediate delivery: load average 10.54
2012-10-17 17:01:02 1TOWep-0004da-Ho => tgl@sss.pgh.pa.us R=dnslookup T=remote_smtp H=sss.pgh.pa.us [66.207.139.130]
2012-10-17 17:01:02 1TOWep-0004da-Ho Completed

This is not specific to pgsql-committers in any way; I see a lot of messages delayed like that at particular points in time. When we did the migration, we discussed the idea of having a secondary delivery helper server, but in the interest of keeping things simple we stayed away from it then. We will need to discuss it to figure out the best way to deal with it.

Thanks for pointing it out.

-- 
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
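The log excerpt in this message can be summarized mechanically. The sketch below, in Python, pairs each message's arrival line (`<=`) with its delivery line (`=>`) and reports how long the message sat in the queue, along with the load average recorded when immediate delivery was skipped. The log lines are copied from the excerpt; the regular expression, the `queue_times` helper and the result layout are illustrative assumptions and not part of Exim or of the list servers' tooling.

```python
import re
from datetime import datetime

# Log lines copied from the excerpt in the message ("Completed" lines omitted).
LOG = """\
2012-10-17 16:39:48 1TOWeq-0004di-GZ <= pgsql-committers-owner+M49504=tgl=sss.pgh.pa.us@postgresql.org H=localhost (postgresql.org)[127.0.0.1] P=smtp S=4405 id=E1TOWeC-0007fb-Eu@gemulon.postgresql.org
2012-10-17 16:39:48 1TOWeq-0004di-GZ no immediate delivery: load average 10.54
2012-10-17 16:46:18 1TOWeq-0004di-GZ => tgl@sss.pgh.pa.us R=dnslookup T=remote_smtp H=sss.pgh.pa.us [66.207.139.130]
2012-10-17 16:39:47 1TOWep-0004da-Ho <= pgsql-committers-owner+M49507=tgl=sss.pgh.pa.us@postgresql.org H=localhost (postgresql.org)[127.0.0.1] P=smtp S=3831 id=E1TOWeC-0007fV-Bi@gemulon.postgresql.org
2012-10-17 16:39:47 1TOWep-0004da-Ho no immediate delivery: load average 10.54
2012-10-17 17:01:02 1TOWep-0004da-Ho => tgl@sss.pgh.pa.us R=dnslookup T=remote_smtp H=sss.pgh.pa.us [66.207.139.130]
"""

# Timestamp, message id, event marker, then the first token after the marker.
# Lines with any other event (such as "Completed") do not match and are skipped.
LINE = re.compile(
    r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) (\S+) "
    r"(<=|=>|no immediate delivery: load average)\s*(\S*)"
)


def queue_times(log_text):
    """Map message id -> (seconds between arrival and delivery, load or None)."""
    received, load, result = {}, {}, {}
    for line in log_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        msg_id, event, arg = match.group(2), match.group(3), match.group(4)
        if event == "<=":
            received[msg_id] = stamp
        elif event == "=>":
            if msg_id in received:
                waited = int((stamp - received[msg_id]).total_seconds())
                result[msg_id] = (waited, load.get(msg_id))
        else:
            # "no immediate delivery" line: remember the logged load average.
            load[msg_id] = float(arg)
    return result


QUEUE_TIMES = queue_times(LOG)

for msg_id, (waited, load_avg) in QUEUE_TIMES.items():
    print(f"{msg_id}: queued {waited} s, load average at arrival {load_avg}")
```

Run over a full day's log rather than this excerpt, the same pairing would show whether long queue times cluster around particular load spikes, which is the pattern described in the message.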
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

> 2012-10-17 16:39:48 1TOWeq-0004di-GZ no immediate delivery: load average 10.54
>
> This is not specific to pgsql-committers in any way; I see a lot of
> messages delayed like that at particular points in time. When we did
> the migration, we discussed the idea of having a secondary delivery
> helper server, but in the interest of keeping things simple we stayed
> away from it then. We will need to discuss it to figure out the best
> way to deal with it.

Can you simply bump that limit up for now? Unless this is a single-CPU box, 10 is really not so high of a load that I would think it needs to start delaying messages.

- -- 
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 201210192211
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAlCCCHgACgkQvJuQZxSWSsgVPQCdFdg4NaOpVmWGyBI+IYDSQzz+
G7EAnj4xQmyAYXlSKzOEGjwK53LtDOX9
=dYQ9
-----END PGP SIGNATURE-----
On 10/20/2012 04:12 AM, Greg Sabino Mullane wrote:

>> 2012-10-17 16:39:48 1TOWeq-0004di-GZ no immediate delivery: load average 10.54
>>
>> This is not specific to pgsql-committers in any way; I see a lot of
>> messages delayed like that at particular points in time. When we did
>> the migration, we discussed the idea of having a secondary delivery
>> helper server, but in the interest of keeping things simple we stayed
>> away from it then. We will need to discuss it to figure out the best
>> way to deal with it.
>
> Can you simply bump that limit up for now? Unless this is a single-CPU
> box, 10 is really not so high of a load that I would think it needs
> to start delaying messages.

As an update to this: the problem itself still exists (though, we think, in a more limited form), but we are starting to get a handle on what exactly triggers it, which might help in diagnosing and understanding it fully.

Stefan