Thread: what is up with the PG mailing lists?
I've seen enormously erratic performance from the mailing lists all day today --- delivery time for individual messages is up to several hours, but completely inconsistent. Is it just me, or are other people seeing the same? regards, tom lane
Tom Lane wrote: > I've seen enormously erratic performance from the mailing lists all day > today --- delivery time for individual messages is up to several hours, > but completely inconsistent. Is it just me, or are other people seeing > the same? I see the same. My posting to core took +2 hours to arrive and there are only a few subscribers. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
I haven't noticed, but haven't been watching for it ... can you send me full headers of one such affected? This one here seemed to be 'within minutes' :( --On Wednesday, October 31, 2007 23:26:35 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > I've seen enormously erratic performance from the mailing lists all day > today --- delivery time for individual messages is up to several hours, > but completely inconsistent. Is it just me, or are other people seeing > the same? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . scrappy@hub.org MSN . scrappy@hub.org Yahoo . yscrappy Skype: hub.org ICQ . 7615664
--On Thursday, November 01, 2007 00:02:20 -0400 Bruce Momjian <bruce@momjian.us> wrote: > Tom Lane wrote: >> I've seen enormously erratic performance from the mailing lists all day >> today --- delivery time for individual messages is up to several hours, >> but completely inconsistent. Is it just me, or are other people seeing >> the same? > > I see the same. My posting to core took +2 hours to arrive and there > are only a few subscribers. 'k, what is useful is that *when* you see this, you send me the full headers so that I can look at, and hopefully figure out, where the delay is ... if it's something on our side, knowing when you see the problem, vs hours later really helps ... I can't find something I don't know is broke, and 2+ hours to deliver to -core is definitely something being broke ... Now that I know, I also think I know what the problem was, and have fixed it ... was doing some cleaning up of the virus/spam checker, and took out one too many IPs from DNS, so it was running at half-capacity ... there might still be some backlog to go through, so give it a few hours, but if you are still seeing issues tomorrow, send me full headers of one that is having the problem and I'll look into it further ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Ya, Bruce sent me one, and I think it's the problem that I just fixed, where the scanner was running half-throttle ... let me know if you see any of these tomorrow, but I think we should be good now ... Thanks ... --On Thursday, November 01, 2007 00:29:20 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Marc G. Fournier" <scrappy@hub.org> writes: >> I haven't noticed, but haven't been watching for it ... can you send me full >> headers of one such affected? This one here seemed to be 'within minutes' >> :( > > Here's headers from a message I sent to -core today, which had about an > hour turnaround ... > > [ btw, it's pretty funny that -core messages get List-Unsubscribe: and > suchlike decorations ... ya think any of us are about to unsubscribe? ] > > regards, tom lane > > ------- Forwarded Message > > Received: from mx1.hub.org (mx1.hub.org [200.46.208.251]) > by sss.pgh.pa.us (8.14.1/8.14.1) with ESMTP id l9VMaxDa022176 > for <tgl@sss.pgh.pa.us>; Wed, 31 Oct 2007 18:37:01 -0400 (EDT) > Received: from postgresql.org (postgresql.org [200.46.204.71]) > by mx1.hub.org (Postfix) with ESMTP id E2D04607491 > for <tgl@sss.pgh.pa.us>; Wed, 31 Oct 2007 19:36:59 -0300 (ADT) > Received: from localhost (unknown [200.46.204.184]) > by postgresql.org (Postfix) with ESMTP id 00DCE9FCEDF > for <pgsql-core-postgresql.org@postgresql.org>; Wed, 31 Oct 2007 19:35:50 > -0300 (ADT) Received: from postgresql.org ([200.46.204.71]) > by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024) > with ESMTP id 11420-01-10 for <pgsql-core-postgresql.org@postgresql.org>; > Wed, 31 Oct 2007 19:35:13 -0300 (ADT) > X-Greylist: from auto-whitelisted by SQLgrey-1.7.5 > Received: from sss.pgh.pa.us (sss.pgh.pa.us [66.207.139.130]) > by postgresql.org (Postfix) with ESMTP id 34FA79FCF4F > for <pgsql-core@postgresql.org>; Wed, 31 Oct 2007 18:33:32 -0300 (ADT) > Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) > by sss.pgh.pa.us
(8.14.1/8.14.1) with ESMTP id l9VLXUxR021063; > Wed, 31 Oct 2007 17:33:30 -0400 (EDT) > To: Devrim GÜNDÜZ <devrim@CommandPrompt.com> > cc: "pgsql-core@postgresql.org" <pgsql-core@postgresql.org> > Subject: Re: [CORE] [Fwd: [pgsql-packagers] 8.3 beta2, I am building Solaris > binaries] In-reply-to: <1193856881.10273.32.camel@laptop.gunduz.org> > References: <1193856881.10273.32.camel@laptop.gunduz.org> > Comments: In-reply-to Devrim GÜNDÜZ > <devrim@CommandPrompt.com> message dated "Wed, 31 Oct 2007 11:54:41 -0700" > Date: Wed, 31 Oct 2007 17:33:30 -0400 > Message-ID: <21062.1193866410@sss.pgh.pa.us> > From: Tom Lane <tgl@sss.pgh.pa.us> > X-Virus-Scanned: Maia Mailguard 1.0.1 > X-Mailing-List: pgsql-core > List-Archive: <http://archives.postgresql.org/pgsql-core> > List-Help: <mailto:majordomo@postgresql.org?body=help> > List-ID: <pgsql-core.postgresql.org> > List-Owner: <mailto:pgsql-core-owner@postgresql.org> > List-Post: <mailto:pgsql-core@postgresql.org> > List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-core> > List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-core> > Precedence: bulk > Sender: pgsql-core-owner@postgresql.org > > ------- End of Forwarded Message > ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Bruce Momjian wrote: > Tom Lane wrote: >> I've seen enormously erratic performance from the mailing lists all day >> today --- delivery time for individual messages is up to several hours, >> but completely inconsistent. Is it just me, or are other people seeing >> the same? > > I see the same. My posting to core took +2 hours to arrive and there > are only a few subscribers. > This is all fairly normal behavior for our lists :-( The mails are usually stuck in a queue somewhere at hub.org. Sometimes it's the antispam queue, sometimes it's just the slow network connection to the Panama server, sometimes somewhere else - I don't think it's been fully diagnosed since it keeps happening quite often. You can usually get a decent pointer as to where it's stuck by looking at the Received-headers in the message. //Magnus
"Marc G. Fournier" <scrappy@hub.org> writes: > I haven't noticed, but haven't been watchign for it ... can you send me full > headers of one such affected? This one here seemed to be 'within minutes' :( Here's headers from a message I sent to -core today, which had about an hour turnaround ... [ btw, it's pretty funny that -core messages get List-Unsubscribe: and suchlike decorations ... ya think any of us are about to unsubscribe? ] regards, tom lane ------- Forwarded Message Received: from mx1.hub.org (mx1.hub.org [200.46.208.251])by sss.pgh.pa.us (8.14.1/8.14.1) with ESMTP id l9VMaxDa022176for<tgl@sss.pgh.pa.us>; Wed, 31 Oct 2007 18:37:01 -0400 (EDT) Received: from postgresql.org (postgresql.org [200.46.204.71])by mx1.hub.org (Postfix) with ESMTP id E2D04607491for <tgl@sss.pgh.pa.us>;Wed, 31 Oct 2007 19:36:59 -0300 (ADT) Received: from localhost (unknown [200.46.204.184])by postgresql.org (Postfix) with ESMTP id 00DCE9FCEDFfor <pgsql-core-postgresql.org@postgresql.org>;Wed, 31 Oct 2007 19:35:50 -0300 (ADT) Received: from postgresql.org ([200.46.204.71])by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024)withESMTP id 11420-01-10 for <pgsql-core-postgresql.org@postgresql.org>;Wed, 31 Oct 2007 19:35:13 -0300 (ADT) X-Greylist: from auto-whitelisted by SQLgrey-1.7.5 Received: from sss.pgh.pa.us (sss.pgh.pa.us [66.207.139.130])by postgresql.org (Postfix) with ESMTP id 34FA79FCF4Ffor <pgsql-core@postgresql.org>;Wed, 31 Oct 2007 18:33:32 -0300 (ADT) Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])by sss.pgh.pa.us (8.14.1/8.14.1) with ESMTP id l9VLXUxR021063;Wed,31 Oct 2007 17:33:30 -0400 (EDT) To: Devrim GÜNDÜZ <devrim@CommandPrompt.com> cc: "pgsql-core@postgresql.org" <pgsql-core@postgresql.org> Subject: Re: [CORE] [Fwd: [pgsql-packagers] 8.3 beta2, I am building Solaris binaries] In-reply-to: <1193856881.10273.32.camel@laptop.gunduz.org> References: <1193856881.10273.32.camel@laptop.gunduz.org> Comments: In-reply-to Devrim GÜNDÜZ 
<devrim@CommandPrompt.com>message dated "Wed, 31 Oct 2007 11:54:41 -0700" Date: Wed, 31 Oct 2007 17:33:30 -0400 Message-ID: <21062.1193866410@sss.pgh.pa.us> From: Tom Lane <tgl@sss.pgh.pa.us> X-Virus-Scanned: Maia Mailguard 1.0.1 X-Mailing-List: pgsql-core List-Archive: <http://archives.postgresql.org/pgsql-core> List-Help: <mailto:majordomo@postgresql.org?body=help> List-ID: <pgsql-core.postgresql.org> List-Owner: <mailto:pgsql-core-owner@postgresql.org> List-Post: <mailto:pgsql-core@postgresql.org> List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-core> List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-core> Precedence: bulk Sender: pgsql-core-owner@postgresql.org ------- End of Forwarded Message
Tom Lane wrote: > "Marc G. Fournier" <scrappy@hub.org> writes: >> I haven't noticed, but haven't been watching for it ... can you send me full >> headers of one such affected? This one here seemed to be 'within minutes' :( > > Here's headers from a message I sent to -core today, which had about an > hour turnaround ... That one is clearly sitting waiting to go through the hub.org antispam for about an hour (between 18:33 ADT and 19:35 ADT). Oh, and the message you sent with this info just took about over 4 hours to get here, which happens fairly often. Headers from this one:
Received: from mx2.hub.org (mx2.hub.org [200.46.204.254]) by svr2.hagander.net (Postfix) with ESMTP id EC910DCC975 for <magnus@hagander.net>; Thu, 1 Nov 2007 09:47:48 +0100 (CET)
Received: from postgresql.org (postgresql.org [200.46.204.71]) by mx2.hub.org (Postfix) with ESMTP id 4040A8B2859 for <magnus@hagander.net>; Thu, 1 Nov 2007 05:47:46 -0300 (ADT)
Received: from localhost (unknown [200.46.204.184]) by postgresql.org (Postfix) with ESMTP id CA66D9FB96F for <pgsql-www-postgresql.org@postgresql.org>; Thu, 1 Nov 2007 01:29:30 -0300 (ADT)
Received: from postgresql.org ([200.46.204.71]) by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024) with ESMTP id 91423-02-4 for <pgsql-www-postgresql.org@postgresql.org>; Thu, 1 Nov 2007 01:29:27 -0300 (ADT)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.5
Received: from sss.pgh.pa.us (sss.pgh.pa.us [66.207.139.130]) by postgresql.org (Postfix) with ESMTP id 6F0E89FA40C for <pgsql-www@postgreSQL.org>; Thu, 1 Nov 2007 01:29:23 -0300 (ADT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) by sss.pgh.pa.us (8.14.1/8.14.1) with ESMTP id lA14TKZJ026670; Thu, 1 Nov 2007 00:29:20 -0400 (EDT)
In this case, it took 4 hours to get from postgresql.org to mx2.hub.org, whereas all other steps were quick. So something's broken in the mail delivery apart from the antispam. //Magnus
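The hop-by-hop reading of the Received: headers in this message can be reproduced mechanically. What follows is a minimal Python sketch, not something the list software provides: hop_delays is a hypothetical helper name, the header values are shortened to the receiving host plus the timestamps quoted above, and it assumes every Received: value ends in "; <date>".

```python
import re
from email.utils import parsedate_to_datetime


def hop_delays(received_values):
    """Return the delay in seconds between consecutive hops, oldest hop first.

    `received_values` are Received: header values in the order they appear
    in a message, i.e. newest hop first.
    """
    stamps = []
    for value in received_values:
        # The timestamp is whatever follows the last ';' in the header.
        date_part = value.rsplit(";", 1)[1]
        # Drop a trailing comment such as "(ADT)" before parsing.
        date_part = re.sub(r"\s*\([^)]*\)\s*$", "", date_part).strip()
        stamps.append(parsedate_to_datetime(date_part))
    stamps.reverse()  # oldest hop first
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Timestamps copied from the headers quoted in this message (newest first).
received = [
    "by svr2.hagander.net; Thu, 1 Nov 2007 09:47:48 +0100 (CET)",
    "by mx2.hub.org; Thu, 1 Nov 2007 05:47:46 -0300 (ADT)",
    "by postgresql.org; Thu, 1 Nov 2007 01:29:30 -0300 (ADT)",
    "by localhost (amavisd-maia); Thu, 1 Nov 2007 01:29:27 -0300 (ADT)",
    "by postgresql.org; Thu, 1 Nov 2007 01:29:23 -0300 (ADT)",
    "by sss.pgh.pa.us; Thu, 1 Nov 2007 00:29:20 -0400 (EDT)",
]
delays = hop_delays(received)
slowest = max(delays)
```

With these timestamps every hop takes a few seconds except the one from postgresql.org to mx2.hub.org, which accounts for 15496 seconds (about 4 hours 18 minutes). Parsing the numeric offsets rather than the zone names is what makes the EDT, ADT and CET stamps comparable.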
Magnus Hagander wrote: > Tom Lane wrote: >> "Marc G. Fournier" <scrappy@hub.org> writes: >>> I haven't noticed, but haven't been watching for it ... can you send me full >>> headers of one such affected? This one here seemed to be 'within minutes' :( >> Here's headers from a message I sent to -core today, which had about an >> hour turnaround ... > > That one is clearly sitting waiting to go through the hub.org antispam > for about an hour (between 18:33 ADT and 19:35 ADT). > > Oh, and the message you sent with this info just took about over 4 hours > to get here, which happens fairly often. Headers from this one: It sat in the mod queue until I approved it earlier. /D
Dave Page wrote: > Magnus Hagander wrote: >> Tom Lane wrote: >>> "Marc G. Fournier" <scrappy@hub.org> writes: >>>> I haven't noticed, but haven't been watching for it ... can you send me full >>>> headers of one such affected? This one here seemed to be 'within minutes' :( >>> Here's headers from a message I sent to -core today, which had about an >>> hour turnaround ... >> That one is clearly sitting waiting to go through the hub.org antispam >> for about an hour (between 18:33 ADT and 19:35 ADT). >> >> Oh, and the message you sent with this info just took about over 4 hours >> to get here, which happens fairly often. Headers from this one: > > It sat in the mod queue until I approved it earlier. Ah, that may explain some of the delays I've been seeing. But only those between those two hops (postgresql -> hub) I guess. //Magnus
--On Thursday, November 01, 2007 10:30:38 +0100 Magnus Hagander <magnus@hagander.net> wrote: > Dave Page wrote: >> Magnus Hagander wrote: >>> Tom Lane wrote: >>>> "Marc G. Fournier" <scrappy@hub.org> writes: >>>>> I haven't noticed, but haven't been watching for it ... can you send me >>>>> full headers of one such affected? This one here seemed to be 'within >>>>> minutes' :( >>>> Here's headers from a message I sent to -core today, which had about an >>>> hour turnaround ... >>> That one is clearly sitting waiting to go through the hub.org antispam >>> for about an hour (between 18:33 ADT and 19:35 ADT). >>> >>> Oh, and the message you sent with this info just took about over 4 hours >>> to get here, which happens fairly often. Headers from this one: >> >> It sat in the mod queue until I approved it earlier. > > Ah, that may explain some of the delays I've been seeing. But only those > between those two hops (postgresql -> hub) I guess. Probably a lot of the problems you've been seeing have to do with moderator approval times for some posts ... Tom doesn't usually let me go very long with big delays before letting me know there "might be a problem" ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
On Thu, Nov 01, 2007 at 08:19:27AM -0300, Marc G. Fournier wrote: > > Probably a lot of the problems you've been seeing have to do with moderator > approval times for some posts ... Tom doesn't usually let me go very long with > big delays before letting me know there "might be a problem" ... That could be -- I have been very busy lately, and having some trouble keeping up with moderation. Also, I stopped moderating -advocacy while Josh has his current rules in place, because I didn't feel comfortable moderating out a discussion I was part of. So some of this might just be moderation time. A -- Andrew Sullivan | ajs@crankycanuck.ca The whole tendency of modern prose is away from concreteness. --George Orwell
On Thu, Nov 01, 2007 at 12:29:20AM -0400, Tom Lane wrote: > > [ btw, it's pretty funny that -core messages get List-Unsubscribe: and > suchlike decorations ... ya think any of us are about to unsubscribe? ] Those headers are described in RFC 2369 <ftp://ftp.rfc-editor.org/in-notes/rfc2369.txt>, which says that if you have them, you SHOULD have all of them. I expect the ML manager just inserts them automatically for every list. A -- Andrew Sullivan | ajs@crankycanuck.ca I remember when computers were frustrating because they *did* exactly what you told them to. That actually seems sort of quaint now. --J.D. Baldwin
Dave Page <dpage@postgresql.org> writes: > Magnus Hagander wrote: >> Oh, and the message you sent with this info just took about over 4 hours >> to get here, which happens fairly often. Headers from this one: > It sat in the mod queue until I approved it earlier. Magnus, don't you get notices from the mail daemon when a post of yours is held for moderator approval? I do, so I know the difference between "slow" and "no moderator handy" ... (I think actually that this behavior isn't default, which strikes me as a pretty dang poorly chosen default.) regards, tom lane
Tom Lane wrote: > Dave Page <dpage@postgresql.org> writes: >> Magnus Hagander wrote: >>> Oh, and the message you sent with this info just took about over 4 hours >>> to get here, which happens fairly often. Headers from this one: > >> It sat in the mod queue until I approved it earlier. > > Magnus, don't you get notices from the mail daemon when a post of yours > is held for moderator approval? I do, so I know the difference between > "slow" and "no moderator handy" ... > > (I think actually that this behavior isn't default, which strikes me as > a pretty dang poorly chosen default.) Yeah. For my own posts it's generally not moderation. It happens with big attachments sometimes, and then I get a notice. So in the cases that my own mails are delayed to the lists, it's one of the other problems. Harder to tell about somebody else. Would be kinda handy if majordomo would put that into the headers somewhere - e.g. "X-Released-From-Moderation: <date>" or something like that. //Magnus
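To make the suggestion concrete, here is a small Python sketch of what stamping such a header could look like. It is purely illustrative: X-Released-From-Moderation is the name proposed in this message, not a header majordomo is known to emit, and stamp_release, the addresses and the release time are all hypothetical.

```python
from datetime import datetime, timezone
from email.message import EmailMessage
from email.utils import format_datetime


def stamp_release(msg, released_at):
    """Record when a moderator released the message (hypothetical header)."""
    msg["X-Released-From-Moderation"] = format_datetime(released_at)
    return msg


# Illustrative message and release time, not taken from the thread.
msg = EmailMessage()
msg["From"] = "poster@example.org"
msg["To"] = "some-list@example.org"
msg["Subject"] = "held for approval"
msg.set_content("body")

stamp_release(msg, datetime(2007, 11, 1, 8, 30, 0, tzinfo=timezone.utc))
released = msg["X-Released-From-Moderation"]
```

A reader comparing this value with the Received: timestamps on either side could then tell moderation time apart from queueing time.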
On Thu, Nov 01, 2007 at 07:36:50AM -0400, Andrew Sullivan wrote: > <ftp://ftp.rfc-editor.org/in-notes/rfc2369.txt>, which says that if > you have them, you SHOULD have all of them. Actually, I just re-read it, and it doesn't say that. It says that if you have anything, you should have -help (and at the URL, you SHOULD have complete information for everything else). I'd still bet the ML manager just does these automatically, and I think it'd be a bad idea to make this list different than the others, since most MUAs can be configured to suppress certain headers in the normal view. A -- Andrew Sullivan | ajs@crankycanuck.ca Everything that happens in the world happens at some place. --Jane Jacobs
On Thu, 01 Nov 2007 08:13:28 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Dave Page <dpage@postgresql.org> writes: > > Magnus Hagander wrote: > >> Oh, and the message you sent with this info just took about over 4 > >> hours to get here, which happens fairly often. Headers from this > >> one: > > > It sat in the mod queue until I approved it earlier. > > Magnus, don't you get notices from the mail daemon when a post of > yours is held for moderator approval? I do, so I know the difference > between "slow" and "no moderator handy" ... > > (I think actually that this behavior isn't default, which strikes me > as a pretty dang poorly chosen default.) I find the moderator argument holding zero water. We have a lot of moderators including myself (recently added) on a couple (if not all?) lists. Yesterday Devrim approved several messages that came through that I didn't see until hours later. Not because I wasn't checking email but because they never showed. Now, we know the current problem with that, which Marc just fixed. However the "mailing list" problem is a constant. Sometimes they work, sometimes I don't get messages for hours. This is not the first time I or others have brought up the mailing list issues. It would be great if the actual sysadmin team had management ability on the mail servers. I would actually argue that it should be a requirement and that the fact that the sysadmin team doesn't is a real problem. Note we still don't have documentation on this stuff, even though the request has come through well over a dozen times and been willingly ignored. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
On Thu, Nov 01, 2007 at 08:09:59AM -0700, Joshua D. Drake wrote: > However the "mailing list" problem is a constant. Sometimes they work, > sometimes I don't get messages for hours. This is not the first time I > or others have brought up the mailing list issues. There are indeed sometimes mailing list latency issues. But I caution everyone in being too glib about some of this: 1. All the mail RFCs are totally clear that latency is to be expected in the mail system. Every time I hear complaints about mail latency that entails delays of merely hours, I worry that people are treating SMTP as though it's XMPP. It ain't, and it's designed _not_ to be. 2. There are plenty of individual relays involved here, and saying "it's slow" without mail headers is no more helpful in demystifying mail issues than are posts to -performance without EXPLAIN ANALYSE output. 3. We know that sometimes, moderation _does_ cost. This is especially true because we've already cranked up a lot of rules to capture common abuses (spam, common admin keywords) that are far from free to run on lists with the volume of mail the postgres lists get. So we're really paying for two moderations: humans, and machines. > It would be great if the actual sysadmin team had management ability on > the mail servers. This seems true to me. More important, > Note we still don't have documentation on this stuff I think this is a very serious problem. Some of the issues have been perplexing to diagnose because of the poor documentation. We talked about this most recently with respect to MX records and higher-preference-number MXes having the user list from the final destination, so that we could generate rejects consistently, IIRC. A -- Andrew Sullivan | ajs@crankycanuck.ca
Andrew Sullivan wrote: > On Thu, Nov 01, 2007 at 08:09:59AM -0700, Joshua D. Drake wrote: >> However the "mailing list" problem is a constant. Sometimes they work, >> sometimes I don't get messages for hours. This is not the first time I >> or others have brought up the mailing list issues. > > There are indeed sometimes mailing list latency issues. But I > caution everyone in being too glib about some of this: > > 1. All the mail RFCs are totally clear that latency is to be > expected in the mail system. Every time I hear complaints about mail > latency that entails delays of merely hours, I worry that people are > treating SMTP as though it's XMPP. It ain't, and it's designed _not_ > to be. There's a difference between acceptable delay and what we're often getting. Sure, SMTP should have latency. But a modern SMTP system shouldn't take hours to deliver an email. > 2. There are plenty of individual relays involved here, and > saying "it's slow" without mail headers is no more helpful in > demystifying mail issues than are posts to -performance without > EXPLAIN ANALYSE output. Sure. But I can tell you that *every single time* I've looked at latencies, the problem has been at postgresql.org or hub.org. And in my own case, there is just one relay on the way, usually with a latency of <5 seconds. > 3. We know that sometimes, moderation _does_ cost. This is > especially true because we've already cranked up a lot of rules to > capture common abuses (spam, common admin keywords) that are far from > free to run on lists with the volume of mail the postgres lists get. > So we're really paying for two moderations: humans, and machines. That's very true. >> It would be great if the actual sysadmin team had management ability on >> the mail servers. > > This seems true to me. More important, > >> Note we still don't have documentation on this stuff > > I think this is a very serious problem. 
Some of the issues have been > perplexing to diagnose because of the poor documentation. We talked > about this most recently with respect to MX records and > higher-preference-number MXes having the user list from the final > destination, so that we could generate rejects consistently, IIRC. Can't agree more. //Magnus
On Thu, 01 Nov 2007 16:30:13 +0100 Magnus Hagander <magnus@hagander.net> wrote: > > 1. All the mail RFCs are totally clear that latency is to be > > expected in the mail system. Every time I hear complaints about > > mail latency that entails delays of merely hours, I worry that > > people are treating SMTP as though it's XMPP. It ain't, and it's > > designed _not_ to be. > > There's a difference between acceptable delay and what we're often > getting. Sure, SMTP should have latency. But a modern SMTP system > shouldn't take hours to deliver an email. Exactly. It is pretty silly to think that a modern, well engineered system will take hours to deliver mail. It is like people have just been browbeaten into accepting the poor performance of the lists. There are exceptions of course... I run into Exchange not really liking greylisting, for example. > > 2. There are plenty of individual relays involved here, and > > saying "it's slow" without mail headers is no more helpful in > > demystifying mail issues than are posts to -performance without > > EXPLAIN ANALYSE output. > > Sure. But I can tell you that *every single time* I've looked at > latencies, the problem has been at postgresql.org or hub.org. And in > my own case, there is just one relay on the way, usually with a > latency of <5 seconds. > This is also the case with me and I just gave up because nobody actually seems to care about how bad the performance really is. > >> It would be great if the actual sysadmin team had management > >> ability on the mail servers. > > > > This seems true to me. More important, > > > >> Note we still don't have documentation on this stuff > > > > I think this is a very serious problem. Some of the issues have > > been perplexing to diagnose because of the poor documentation. We > > talked about this most recently with respect to MX records and > > higher-preference-number MXes having the user list from the final > > destination, so that we could generate rejects consistently, IIRC. > > Can't agree more. > I wish -core actually realized how good it could be. Thousands of people rely on these lists. We advertise them as the form of community support. They are, outside of the code, our most important feature that we provide to our community. Yet... Sincerely, Joshua D. Drake
--On Thursday, November 01, 2007 16:30:13 +0100 Magnus Hagander <magnus@hagander.net> wrote: > There's a difference between acceptable delay and what we're often > getting. Sure, SMTP should have latency. But a modern SMTP system > shouldn't take hours to deliver an email. Well, when you can guarantee that the network connections between any relay server from the originator to the recipient is 100% up 100% of the time ... and that the mail server at each of the relays is processing properly 100% of the time ... and that there are no network delay issues at any one of the dozen or so routers, and ... then talk to me ... > Sure. But I can tell you that *every single time* I've looked at > latencies, the problem has been at postgresql.org or hub.org. And in my > own case, there is just one relay on the way, usually with a latency of > <5 seconds. Wow, I didn't know you were in Panama ... I have about 12 routers I go through from here, creating at least 12 failure points right there ... you being on the same network definitely should reduce your latency, we should look into that ... what is your IP, so that I can confirm that from our end, we are seeing you as being 1 router away also ... maybe bad routing? ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Marc G. Fournier wrote: > > > --On Thursday, November 01, 2007 16:30:13 +0100 Magnus Hagander > <magnus@hagander.net> wrote: > >> There's a difference between acceptable delay and what we're often >> getting. Sure, SMTP should have latency. But a modern SMTP system >> shouldn't take hours to deliver an email. > > Well, when you can guarantee that the network connections between any relay > server from the originator to the recipient is 100% up 100% of the time ... and > that the mail server at each of the relays is processing properly 100% of the > time ... and that there are no network delay issues at any one of the dozen or > so routers, and ... then talk to me ... No. All those cases are reasons for acceptable delays. But how often does say network connectivity go away for an hour? If it does, you need a better hosting provider. >> Sure. But I can tell you that *every single time* I've looked at >> latencies, the problem has been at postgresql.org or hub.org. And in my >> own case, there is just one relay on the way, usually with a latency of >> <5 seconds. > > Wow, I didn't know you were in Panama ... I have about 12 routers I go through > from here, creating at least 12 failure points right there ... you being on the > same network definitely should reduce your latency, we should look into that > ... what is your IP, so that I can confirm that from our end, we are seeing you > as being 1 router away also ... maybe bad routing? Notice, I say 5 *seconds*, not 5 *milliseconds*. And yes, the mx machines of hub.org *most of the time* deliver in less than 5 seconds, once it reaches the final hop inside hub.org. This mail for example hit hub.org 200.46.204.184 (with broken name lookup, it seems) at 12:48:12 ADT, and arrived on my server at 16:48:14 CET, which means it took *2* seconds. A couple of minutes delay is perfectly acceptable. A couple of hours is an indication that something is wrong. //Magnus
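The "2 seconds" figure is easy to misread because the two timestamps are in different zones. A short Python check of the arithmetic, assuming ADT is UTC-3 and CET is UTC+1 (the offsets shown in the Received: headers elsewhere in this thread) and that both stamps fall on 1 Nov 2007:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets taken from the headers quoted in the thread.
ADT = timezone(timedelta(hours=-3), "ADT")
CET = timezone(timedelta(hours=1), "CET")

hit_hub = datetime(2007, 11, 1, 12, 48, 12, tzinfo=ADT)    # seen at hub.org
delivered = datetime(2007, 11, 1, 16, 48, 14, tzinfo=CET)  # seen at the final server

delay_seconds = (delivered - hit_hub).total_seconds()
```

Both stamps normalise to 15:48 UTC, so the apparent four-hour gap is entirely the zone difference and the real transit time is 2 seconds.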
--On Thursday, November 01, 2007 16:59:14 +0100 Magnus Hagander <magnus@hagander.net> wrote: > No. All those cases are reasons for acceptable delays. But how often > does say network connectivity go away for an hour? If it does, you need > a better hosting provider. You really don't have a clue on how an SMTP server works, do you? If delivery fails, it backs up and tries again *later* ... if there is a high volume of email going through said server, *later* could very well be 1 hour ... and, in fact, it's an incremental backoff, so it actually works out to be something like: Try now, fail, try in 5 minutes, fail, try in 10 minutes, fail, try in 20 minutes, fail, etc ... I'm not sure if it's a simple '2x' algorithm, but the delay between attempts does get progressively greater, so if it fails after trying at '40 minutes', then it will be another hour and a half after *that* before it will try again, etc ... > A couple of minutes delay is perfectly acceptable. A couple of hours is > an indication that something is wrong. Well, when you see a couple of hours delay, then do something *useful* and let me know ... the only *useful* reports I've had in the past 24 hours dealt with a problem that Tom reported yesterday and that I fixed within minutes of him reporting ... the headers that you and Bruce sent me were *from that problem* ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
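The retry pattern described here can be sketched in a few lines of Python. This assumes a plain doubling schedule starting at 5 minutes, which the message itself is not sure about; retry_schedule and its parameters are hypothetical names, and real mail servers use their own configured limits.

```python
from itertools import accumulate


def retry_schedule(first_delay_min=5, factor=2, attempts=6, cap_min=None):
    """Return (delays, elapsed) in minutes for successive failed retries.

    delays[i] is the wait before retry i+1; elapsed[i] is the total time
    since the first failed attempt when that retry happens.
    """
    delays = []
    delay = first_delay_min
    for _ in range(attempts):
        # An optional cap keeps the wait from growing without bound.
        delays.append(delay if cap_min is None else min(delay, cap_min))
        delay *= factor
    return delays, list(accumulate(delays))


delays, elapsed = retry_schedule()
capped_delays, _ = retry_schedule(cap_min=60)
```

Under this assumption the fourth retry happens 75 minutes after the first failure and the next wait is 80 minutes, which is roughly the "hour and a half" mentioned above. A handful of soft failures is therefore enough to turn a message into a multi-hour delivery.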
On Thu, 01 Nov 2007 13:16:01 -0300 "Marc G. Fournier" <scrappy@hub.org> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > > - --On Thursday, November 01, 2007 16:59:14 +0100 Magnus Hagander > <magnus@hagander.net> wrote: > > > > No. All those cases are reasons for acceptable delays. But how often > > does say network connectivity go away for an hour? If they do, you > > need to better hosting provider. > > You really don't have a clue on how an SMTP server works, do you? Let's not turn this into an attack fest please. > If > delivery fails, it backs up and tries again *later* ... if there is a Why did delivery fail is the question. > high volume of email going through said server, *later* could very > well be 1 hour ... and, in fact, its an incremental backup, so it > actually works out to be something like: O.k. this is true but there is something wrong here. Why is the server so backed up that this is happening in the first place. If the server is actually loaded that much, we need to figure out how to get a better server for mail shipped to panama. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
On Thu, Nov 01, 2007 at 12:47:30PM -0300, Marc G. Fournier wrote: > > Wow, I didn't know you were in Panama ... I have about 12 routers I > go through from here, creating at least 12 failure points right > there Um, if that's the case, then I _strongly_ suggest you learn the meaning of "BGP" and implement it. No ISP should have 12 _actual_ failure points from 12 routers in this day and age. A -- Andrew Sullivan | ajs@crankycanuck.ca The very definition of "news" is "something that hardly ever happens." --Bruce Schneier
On Thu, Nov 01, 2007 at 08:47:52AM -0700, Joshua D. Drake wrote: > Exactly. It is pretty silly to think that a modern, well engineered > system will take hours to deliver mail. You have a strange idea of "well engineered" for very large mail systems. I have news for those on this list who do not operate systems with thousands of simultaneous users: your assumption that a couple hundred users scales to several thousands by adding hardware is as wrong in mailing lists and DNS systems as it would be for newbies implementing PostgreSQL systems. It don't work that way, folks. Keep in mind that the mail server relies on DNS servers that are out there in ISP-land, and not in your-control land. Since DNS lookups can result in soft failures, it's not at all impossible that mail to your specific mailbox will get delayed by annoying DNS holdups that have more to do with, for instance, random nasty people trying to prevent resolution of every site under .org. Or under .postgresql.org (I'd be interested to see graphs of query volumes against that domain, BTW. Because I bet they're all over the place in unpredictable ways). This means that, even though the mail has made it into the relay, it might not make it out anything like as fast as you think, and there's more than one system that can easily cause the problem. Now, multiply by, say, 1000 simultaneous delivery attempts, and you have significant load issues that no mail system can solve, because the fundamental Internet infrastructure is kinda broken that way. So, without clear outlines of _exactly_ where the problem is, which means logging of months of headers, putting aside only those messages that hung up, this is going to amount to nothing more than "you did x" " no I didn't lalalalala" discussion. You want this to get better? Capture your logs, and let's do some analysis. A -- Andrew Sullivan | ajs@crankycanuck.ca When my information changes, I alter my conclusions. What do you do sir? --attr. John Maynard Keynes
On Thu, Nov 01, 2007 at 04:30:13PM +0100, Magnus Hagander wrote: > getting. Sure, SMTP should have latency. But a modern SMTP system > shouldn't take hours to deliver an email. This isn't automatically true, and is explicitly contradicted by the relevant RFCs. I think it shouldn't be the _habit_ on such systems, but AFAICT it isn't. But "hours to deliver an email" is in fact totally reasonable on a busy system. I think good mail administrators aim for "in general, minutes". The problem here is the perception that it is too often outside the "in general" assumption. I think that perhaps it'd be more useful in this discussion to archive, over (say) six months, cases where you think the headers are showing unexplainable lag. I think there probably _is_ a problem, actually, but I haven't yet written a procmail recipe to catch all pg-[list] mail that has any header where the hop time was (say) over one hour. _That_ is the sort of catalogue we need. A -- Andrew Sullivan | ajs@crankycanuck.ca The fact that technology doesn't work is no bar to success in the marketplace. --Philip Greenspun
On Thu, Nov 01, 2007 at 01:16:01PM -0300, Marc G. Fournier wrote: > minutes, fail, etc ... I'm not sure if its a simple '2x' algorithm, but the No, it's a progressive backoff with some randomisation in every modern server I know of. That's because, if there's some periodic issue that causes DoS, then just backing off the same period would do it again. We already _had_ that nightmare on the Internet, and we hope not to experience it again ;-) A -- Andrew Sullivan | ajs@crankycanuck.ca I remember when computers were frustrating because they *did* exactly what you told them to. That actually seems sort of quaint now. --J.D. Baldwin
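The retry behaviour discussed above (a delay that grows after each failure, with some randomisation so that retries do not all land at the same moment) can be sketched roughly as follows. This is only an illustration: the function name, the 5-minute starting delay (borrowed from Marc's example), the 4-hour cap and the 25% jitter are assumptions, and it is not how Postfix or any other particular server implements its queue.

```python
import random


def retry_delays(attempts, base=300.0, factor=2.0, cap=14400.0, jitter=0.25, rng=None):
    """Return the wait, in seconds, before each retry attempt.

    The nominal wait for retry n is base * factor**n, clamped to cap.
    It is then scaled by a random factor in [1 - jitter, 1 + jitter] so that
    a queue full of messages that failed together does not retry in lockstep.
    All defaults here are illustrative assumptions.
    """
    rng = rng or random.Random()
    delays = []
    for n in range(attempts):
        nominal = min(base * factor ** n, cap)
        delays.append(nominal * rng.uniform(1.0 - jitter, 1.0 + jitter))
    return delays


if __name__ == "__main__":
    for n, delay in enumerate(retry_delays(6), start=1):
        print(f"retry {n}: wait about {delay / 60:.0f} min")
```

With the jitter set to zero the schedule is the plain doubling sequence (5, 10, 20, 40 minutes, and so on); the randomisation only spreads each retry around that nominal value.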
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - --On Thursday, November 01, 2007 09:28:06 -0700 "Joshua D. Drake" <jd@commandprompt.com> wrote: >> If >> delivery fails, it backs up and tries again *later* ... if there is a > > Why did delivery fail is the question. My logs go back 7 days for email ... if we had a message-id, we could look at that sort of thing relatively easily ... > O.k. this is true but there is something wrong here. Why is the server > so backed up that this is happening in the first place. I'm not so sure that it is 'so backed up' ... but, looking at mailq right now, there are only 3867 messages in the queue ... some of the error codes are stuff like:
"450 mailbox unavailable"
 - great, should be 550, but as it's 450, we'll keep retrying
"lost connection with mail.mindef.mil.gt"
"Host or domain name not found"
"451 SPAM not accepted"
 - again, if you aren't going to accept it, why temporary fail it so that I try again?
"Greylisting in action"
But, again, it would be interesting to look at a specific one, with message-id, and see what the logs show for reasons of delays ... - ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . scrappy@hub.org MSN . scrappy@hub.org Yahoo . yscrappy Skype: hub.org ICQ . 7615664 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFHKgH34QvfyHIvDvMRAtJBAKCnRMv+Lb31p6l+tBJEQLL85q4oCACg5xv7 wlUXWJcjMh37496mJbyU8nc= =LwYQ -----END PGP SIGNATURE-----
On Thu, Nov 01, 2007 at 01:42:31PM -0300, Marc G. Fournier wrote: > "450 mailbox unavailable" > - great, should be 550, but as its 450, we'll keep retrying Hehe. We just had this one, no? ;-) This is Yet Another Victim of the "safe" mail delivery rules in postfix. They're safe for the recipient, of course, but hazardous to the Net. > "lost connection with mail.mindef.mil.gt" This is exactly the sort of thing we have to worry about. On global lists with subscribers possibly on crappy connections, the cost to the relaying mail server goes up _exponentially_. Not everyone is in well-connected locales, and managing backoff queues is bloody expensive. Add a few thousand users to the mix, and things go south very quickly. > "Host or domain name not found" This should just bounce, no? > "451 SPAM not accepted" > - again, if you aren't going to accept it, why temporary fail it so that I > try again? Indeed. Some clever anti-spam people think that this is a clever mechanism -- they "force the spammers to pay extra". The problem is when their not-quite-as-clever-as-they-thought filters catch something the wrong way. The entire Internet pays. (This is also, by the way, the basis for the "no thanks" response we should and usually do give to people who show up with 80% solutions for "performance problems" in PostgreSQL.) A -- Andrew Sullivan | ajs@crankycanuck.ca The whole tendency of modern prose is away from concreteness. --George Orwell
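The complaint about 450 and 451 replies comes down to the first digit of the SMTP reply code: 2xx means accepted, 4xx is a temporary failure that the sender is expected to retry, and 5xx is a permanent failure that should bounce. A small sketch of that rule, where the function name and the returned labels are made up for illustration and the 550 reply is a hypothetical counterpart to the 450 one quoted above:

```python
def classify_reply(reply):
    """Map an SMTP reply line to what a sending server does next.

    Only the first digit of the reply code is examined: 2 = success,
    4 = transient failure (keep the message queued and retry),
    5 = permanent failure (give up and bounce).
    """
    first_digit = reply.strip()[:1]
    actions = {"2": "delivered", "4": "retry later", "5": "bounce"}
    return actions.get(first_digit, "other")


if __name__ == "__main__":
    for line in ("450 mailbox unavailable",
                 "451 SPAM not accepted",
                 "550 mailbox unavailable"):
        print(f"{line!r}: {classify_reply(line)}")
```

This is why a receiver that answers 450 or 451 for something it never intends to accept keeps the message sitting in the sender's retry queue instead of being dropped.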
On Thu, Nov 01, 2007 at 03:23:12PM -0400, Tom Lane wrote: > Well, the current problems seem to be entirely inside hub.org, and so > all this banter about DNS etc seems not relevant. A handy example is Could be. The main point is to get examples of the sort you just provided -- if we can get a nice collection, then it becomes reasonably likely we'll track the issue. That said, I'm not suggesting that DNS _is_ the problem, merely that it could be. Of course, if Marc is using the same DNS for internal and external DNS queries, and the externals are under attack, then his internal DNS is hosed. Or if there's a reverse tree somewhere in the path that isn't working 100% of the time, and the mail servers are all trying to do reverse lookups. (This is a remarkably common source of breakage, I have learned as a result of my work on a current internet draft.) Mail gremlins are notorious for their difficulty in debugging partly because of all the horrible interaction with DNS. Add a couple different name servers, 3 MX records, and the occasional bad reverse path, and you're into serious pain, especially after your spam filter proceeds to do much of it all over again. Even a single hop can be expensive then. But I do think there's something going on in the suspicious hop, and only Marc can diagnose it. A -- Andrew Sullivan | ajs@crankycanuck.ca "The year's penultimate month" is not in truth a good way of saying November. --H.W. Fowler
Andrew Sullivan wrote: > On Thu, Nov 01, 2007 at 04:30:13PM +0100, Magnus Hagander wrote: >> getting. Sure, SMTP should have latency. But a modern SMTP system >> shouldn't take hours to deliver an email. > > This isn't automatically true, and is explicitly contradicted by the > relevant RFCs. I think it shouldn't be the _habit_ on such systems, > but AFAICT it isn't. AFAICT, it is. And remember that we're talking about delivery *between two internal relay machines*. Not delivery to the end user. //Magnus
Marc G. Fournier wrote: >> No. All those cases are reasons for acceptable delays. But how often >> does say network connectivity go away for an hour? If they do, you need >> to better hosting provider. > > You really don't have a clue on how an SMTP server works, do you? If delivery Well, it's been a couple of years since I last wrote a code patch for a SMTP server, but yeah, I have a fair clue on how it works. And I do run servers that deliver some 100,000 mails a day. I know, it's not much, but I know enough to keep those working, and I've never seen internal delays like what we're seeing here. > fails, it backs up and tries again *later* ... if there is a high volume of > email going through said server, *later* could very well be 1 hour ... and, in > fact, its an incremental backup, so it actually works out to be something like: > > Try now, fail, try in 5 minutes, fail, try in 10 minutes, fail, try in 20 > minutes, fail, etc ... I'm not sure if its a simple '2x' algorithm, but the > delay between attempts does get progressively greater, so if it fails after > trying at '40 minutes', then it will be another hour and a half after *that* > beofre it will try again, etc ... That's an implementation detail, that differs wildly between different SMTP servers. But you already know that of course. Postfix, specifically, implements a '2x' algorithm. There's also a minimum backoff time (configurable in new versions, previously fixed at 1000 seconds) and a maximum backoff time (configurable). The main question remains. As Tom posted again in this thread, the delay happens *internally between hub.org machines*. By your reasoning, that means it's getting multiple failures to move mail internally. To me, that's a clear indication that something is wrong. I'm sorry to hear you don't agree. >> A couple of minutes delay is perfectly acceptable. A couple of hours is >> an indication that something is wrong. 
> > Well, when you see a couple of hours delay, then do something *useful* and let > me know ... the only *useful* reports I've had in the past 24 hours dealt with > a problem that Tom reported yesterday and that I fixed within minutes of him > reporting ... the headers that you and Bruce sent me were *from that problem* > ... I have given up. I used to send these, but nothing is fixed. Maybe I should set up a procmail script to capture them... Oh, and the headers I sent were because the email was stuck in the moderation queue. //Magnus
"Marc G. Fournier" <scrappy@hub.org> writes: > But, again, it would be interesting to look at a specific one, with message-id, > and see what the logs show for reasons of delays ... Well, the current problems seem to be entirely inside hub.org, and so all this banter about DNS etc seems not relevant. A handy example is this same message I'm replying to, which seems to have been hung up for 2.5 hours:

Received: from maia-2.hub.org (maia-2.hub.org [200.46.204.187] (may be forged))
 by sss.pgh.pa.us (8.14.1/8.14.1) with ESMTP id lA1JCPC2010552
 for <tgl@sss.pgh.pa.us>; Thu, 1 Nov 2007 15:12:26 -0400 (EDT)
Received: from postgresql.org (postgresql.org [200.46.204.71])
 by maia-2.hub.org (Postfix) with ESMTP id 912252C9544
 for <tgl@sss.pgh.pa.us>; Thu, 1 Nov 2007 16:12:22 -0300 (ADT)
Received: from localhost (unknown [200.46.204.184])
 by postgresql.org (Postfix) with ESMTP id 9E4B49FC802
 for <pgsql-www-postgresql.org@postgresql.org>; Thu, 1 Nov 2007 16:12:18 -0300 (ADT)
Received: from postgresql.org ([200.46.204.71])
 by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024)
 with ESMTP id 62395-04-7 for <pgsql-www-postgresql.org@postgresql.org>;
 Thu, 1 Nov 2007 16:12:11 -0300 (ADT)
Received: from hub.org (hub.org [200.46.204.220])
 by postgresql.org (Postfix) with ESMTP id E4E039FC8E6
 for <pgsql-www@postgresql.org>; Thu, 1 Nov 2007 14:22:40 -0300 (ADT)
Received: from localhost (unknown [200.46.204.184])
 by hub.org (Postfix) with ESMTP id A428DB475C8
 for <pgsql-www@postgresql.org>; Thu, 1 Nov 2007 13:43:35 -0300 (ADT)
Received: from hub.org ([200.46.204.220])
 by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024)
 with ESMTP id 56221-03; Thu, 1 Nov 2007 13:43:01 -0300 (ADT)
Received: from fserv.hub.org (blk-137-93-67.eastlink.ca [24.137.93.67])
 by hub.org (Postfix) with ESMTP id BB0BCB47581;
 Thu, 1 Nov 2007 13:43:02 -0300 (ADT)
Received: from [192.168.1.2] (unknown [192.168.1.2])
 by fserv.hub.org (Postfix) with ESMTP id AD2C34E165;
 Thu, 1 Nov 2007 13:43:02 -0300 (ADT)
Date: Thu, 01 Nov 2007 13:42:31 -0300
From: "Marc G. Fournier" <scrappy@hub.org>
To: "Joshua D. Drake" <jd@commandprompt.com>
cc: Magnus Hagander <magnus@hagander.net>,
 Andrew Sullivan <ajs@crankycanuck.ca>, pgsql-www@postgresql.org
Subject: Re: [pgsql-www] what is up with the PG mailing lists?
Message-ID: <A43323733D66A14A086CABF7@ganymede.hub.org>
In-Reply-To: <20071101092806.3b1fa452@scratch>
References: <25716.1193887595@sss.pgh.pa.us>
 <DADF296033D290EF9634F611@ganymede.hub.org> <26669.1193891360@sss.pgh.pa.us>
 <47299585.7030402@hagander.net> <47299957.5020605@postgresql.org>
 <2968.1193919208@sss.pgh.pa.us> <20071101080959.49f3087b@scratch>
 <20071101152333.GM27676@crankycanuck.ca> <4729F105.30704@hagander.net>
 <1127E6493CBA8A29F343C4D7@ganymede.hub.org> <4729F7D2.6050608@hagander.net>
 <AD9BF3BA60F6634EA7FCDB76@ganymede.hub.org>
 <20071101092806.3b1fa452@scratch>
X-Mailer: Mulberry/4.0.8 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Mailing-List: pgsql-www
List-Archive: <http://archives.postgresql.org/pgsql-www>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-ID: <pgsql-www.postgresql.org>
List-Owner: <mailto:pgsql-www-owner@postgresql.org>
List-Post: <mailto:pgsql-www@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-www>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-www>
Precedence: bulk
Sender: pgsql-www-owner@postgresql.org

regards, tom lane
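The per-hop delays in the headers Tom quotes can be worked out mechanically from the timestamp that follows the last ";" in each Received: line. The sketch below does that in Python. The Received: lines are abbreviated to the receiving host and the timestamp, and the function names are illustrative assumptions, not anything the list software provides.

```python
import re
from email.utils import parsedate_to_datetime

# Abbreviated from the Received: lines quoted above, newest first.
RECEIVED = [
    "by sss.pgh.pa.us; Thu, 1 Nov 2007 15:12:26 -0400 (EDT)",
    "by maia-2.hub.org; Thu, 1 Nov 2007 16:12:22 -0300 (ADT)",
    "by postgresql.org; Thu, 1 Nov 2007 16:12:18 -0300 (ADT)",
    "by localhost (mx1.hub.org); Thu, 1 Nov 2007 16:12:11 -0300 (ADT)",
    "by postgresql.org; Thu, 1 Nov 2007 14:22:40 -0300 (ADT)",
    "by hub.org; Thu, 1 Nov 2007 13:43:35 -0300 (ADT)",
    "by localhost (mx1.hub.org); Thu, 1 Nov 2007 13:43:01 -0300 (ADT)",
    "by hub.org; Thu, 1 Nov 2007 13:43:02 -0300 (ADT)",
    "by fserv.hub.org; Thu, 1 Nov 2007 13:43:02 -0300 (ADT)",
]


def received_time(header):
    """Parse the timestamp after the last ';' of a Received: header."""
    stamp = header.rsplit(";", 1)[1]
    stamp = re.sub(r"\([^)]*\)", "", stamp)  # drop comments such as "(ADT)"
    return parsedate_to_datetime(" ".join(stamp.split()))


def hop_delays(received_newest_first):
    """Return seconds elapsed between consecutive hops, oldest hop first."""
    times = [received_time(h) for h in reversed(received_newest_first)]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    hops = list(reversed(RECEIVED))
    for (src, dst), secs in zip(zip(hops, hops[1:]), hop_delays(RECEIVED)):
        print(f"{secs:8.0f} s  {src.split(';')[0]} -> {dst.split(';')[0]}")
```

On these timestamps the two large gaps are 2345 seconds (hub.org to postgresql.org) and 6571 seconds (postgresql.org to the scanner on mx1.hub.org); every other hop takes a few seconds, and one shows -1 second, which is just clock skew between machines.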
Marc G. Fournier wrote: > Well, when you see a couple of hours delay, then do something *useful* and let > me know ... the only *useful* reports I've had in the past 24 hours dealt with > a problem that Tom reported yesterday and that I fixed within minutes of him > reporting ... the headers that you and Bruce sent me were *from that problem* > ... OK, how about this one. This is from the first message I've received since about 4:45PM GMT (~4 hours ago) - that alone tells me something is wrong; it's not even that quiet on Christmas day!

Return-Path: <jd@commandprompt.com>
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on developer.pgadmin.org
X-Spam-Level:
X-Spam-Status: No, score=0.0 required=5.0 tests=UNPARSEABLE_RELAY autolearn=unavailable version=3.2.3
Received: from postgresql.org (postgresql.org [200.46.204.71])
 by developer.pgadmin.org (8.13.8/8.13.8) with ESMTP id lA1KXTXQ016661
 for <dpage@pgadmin.org>; Thu, 1 Nov 2007 20:33:29 GMT
Received: from localhost (unknown [200.46.204.184])
 by postgresql.org (Postfix) with ESMTP id 843079FCD57
 for <dpage@pgadmin.org>; Thu, 1 Nov 2007 17:33:29 -0300 (ADT)
Received: from postgresql.org ([200.46.204.71])
 by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024)
 with ESMTP id 04880-01-2 for <dpage@pgadmin.org>;
 Thu, 1 Nov 2007 17:33:24 -0300 (ADT)
Received: by postgresql.org (Postfix, from userid 60)
 id D07CE9FD9EE; Thu, 1 Nov 2007 15:47:00 -0300 (ADT)
Received: from postgresql.org ([unix socket])
 by postgresql.org (Cyrus v2.3.7) with LMTPA;
 Thu, 01 Nov 2007 15:46:59 -0300
X-Sieve: CMU Sieve 2.3
Received: from localhost (unknown [200.46.204.184])
 by postgresql.org (Postfix) with ESMTP id 231EA9FD9EB
 for <dpage@postgresql.org>; Thu, 1 Nov 2007 15:46:59 -0300 (ADT)
Received: from postgresql.org ([200.46.204.71])
 by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024)
 with ESMTP id 32167-01-2 for <dpage@postgresql.org>;
 Thu, 1 Nov 2007 15:46:49 -0300 (ADT)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.5
Received: from lists.commandprompt.com (host-254.commandprompt.net [207.173.203.254])
 by postgresql.org (Postfix) with ESMTP id D3AE19FDA94
 for <dpage@postgresql.org>; Thu, 1 Nov 2007 13:58:41 -0300 (ADT)
Received: from scratch (or-69-34-217-90.sta.embarqhsd.net [69.34.217.90])
 (authenticated bits=0)
 by lists.commandprompt.com (8.13.7/8.13.6) with ESMTP id lA1GvME5003246;
 Thu, 1 Nov 2007 09:57:22 -0700
Date: Thu, 1 Nov 2007 09:57:23 -0700
From: "Joshua D. Drake" <jd@commandprompt.com>
To: Dave Page <dpage@postgresql.org>
Cc: "Marc G. Fournier" <scrappy@hub.org>, Postgresql Funds Group <funds-group@pgfoundry.org>
Subject: Re: [PG-FG] Possible request for funds: PGCon Brazil
Message-ID: <20071101095723.28b61042@scratch>
In-Reply-To: <472917A2.1010007@postgresql.org>

/D
Dave Page wrote: > Marc G. Fournier wrote: >> Well, when you see a couple of hours delay, then do something *useful* and let >> me know ... the only *useful* reports I've had in the past 24 hours dealt with >> a problem that Tom reported yesterday and that I fixed within minutes of him >> reporting ... the headers that you and Bruce sent me were *from that problem* >> ... > > OK, how about this one. This is from the first message I've received > since about 4:45PM GMT (~4 hours ago) - that alone tells me something is > wrong; it's not even that quiet on Christmas day! > > > Return-Path: <jd@commandprompt.com> Here's another one that may be of interest as it's not a list message. I won't send any more unless you ask for them - suffice it to say I have now had two or three others, all equally late; I assume you fed something a suitable laxative.

Return-Path: <do_not_reply_con_en@euro.apple.com>
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on developer.pgadmin.org
X-Spam-Level: ****
X-Spam-Status: No, score=4.3 required=5.0 tests=HTML_MESSAGE,MIME_HEADER_CTYPE_ONLY,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,MPART_ALT_DIFF,SPF_SOFTFAIL,UNPARSEABLE_RELAY autolearn=no version=3.2.3
Received: from postgresql.org (postgresql.org [200.46.204.71])
 by developer.pgadmin.org (8.13.8/8.13.8) with ESMTP id lA1KnoXN016954
 for <dpage@pgadmin.org>; Thu, 1 Nov 2007 20:49:50 GMT
Received: from localhost (unknown [200.46.204.184])
 by postgresql.org (Postfix) with ESMTP id 36A949FC908
 for <dpage@pgadmin.org>; Thu, 1 Nov 2007 17:47:17 -0300 (ADT)
Received: from postgresql.org ([200.46.204.71])
 by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024)
 with ESMTP id 28152-02 for <dpage@pgadmin.org>;
 Thu, 1 Nov 2007 17:47:11 -0300 (ADT)
Received: by postgresql.org (Postfix, from userid 60)
 id BEB7B9FCC63; Thu, 1 Nov 2007 17:24:59 -0300 (ADT)
Received: from postgresql.org ([unix socket])
 by postgresql.org (Cyrus v2.3.7) with LMTPA;
 Thu, 01 Nov 2007 17:24:59 -0300
X-Sieve: CMU Sieve 2.3
Received: from localhost (unknown [200.46.204.184])
 by postgresql.org (Postfix) with ESMTP id 251529FCC5E
 for <dpage@postgresql.org>; Thu, 1 Nov 2007 17:24:59 -0300 (ADT)
Received: from postgresql.org ([200.46.204.71])
 by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024)
 with ESMTP id 85606-02-9 for <dpage@postgresql.org>;
 Thu, 1 Nov 2007 17:24:44 -0300 (ADT)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.5
Received: from bz3.apple.com (bz3.apple.com [17.254.13.38])
 by postgresql.org (Postfix) with ESMTP id 1AFA29FD783
 for <dpage@postgresql.org>; Thu, 1 Nov 2007 15:32:09 -0300 (ADT)
Received: from stratford.corp.apple.com (unknown [17.34.104.143])
 by bz3.apple.com (Postfix) with ESMTP id D4B974845906
 for <dpage@postgresql.org>; Thu, 1 Nov 2007 11:26:07 -0700 (PDT)
Received: by stratford.corp.apple.com (Postfix, from userid 2396)
 id CD25E6C6E57; Thu, 1 Nov 2007 18:26:07 +0000 (GMT)
To: dpage@postgresql.org
From: do_not_reply_con_en@euro.apple.com
Reply-To:
Subject: Your Apple Order WXXXXXXX Has Been Shipped
Content-type: multipart/alternative; boundary=Apple-Mail-1--981739137
Message-Id: <20071101182607.CD25E6C6E57@stratford.corp.apple.com>
Date: Thu, 1 Nov 2007 18:26:07 +0000 (GMT)
X-Virus-Scanned: Maia Mailguard 1.0.1

/D
Magnus Hagander <magnus@hagander.net> writes: > Oh, and the headers I sent were because the email was stuck in the > moderation queue. Yeah --- one of the worst problems in diagnosing this is that there's no way for anyone except the moderator to know if a delay was simply waiting-for-moderation or if it indicates an actual system problem. What are the chances of getting something into the headers to indicate whether a message was delayed by moderation? regards, tom lane
Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: >> Oh, and the headers I sent were because the email was stuck in the >> moderation queue. > > Yeah --- one of the worst problems in diagnosing this is that there's > no way for anyone except the moderator to know if a delay was simply > waiting-for-moderation or if it indicates an actual system problem. > > What are the chances of getting something into the headers to indicate > whether a message was delayed by moderation? Not sure - Marc? You're the one who knows majordomo... Anyway, I just ran a script on all the mail sitting in my mailbox from PostgreSQL mailing lists delivered in the past two hours. It's a total of 22 mails (should be more, I know - but I need to deal with differences in date before I get there, this is a first rough cut of the script). Here are the top lines from the query:

select path,count(*),avg(t)::int from data group by path order by 2 desc,3 desc;

on this data. Basically, it shows the paths that appear most often and the average delivery time across each of them (in seconds). I've only included paths that appear 7 or more times, plus the two rows for mails posted by Tom and me (there are others that appear twice as well, but I'm not including those). This is all delivery to me, which is why the paths mx2.hub.org->svr2.hagander.net and svr2.hagander.net->svr2.hagander.net (my antispam) are included.
                            path                            | count | avg
------------------------------------------------------------+-------+------
 postgresql.org -> mx2.hub.org                              |    22 |   57
 svr2.hagander.net -> svr2.hagander.net                     |    22 |    1
 mx2.hub.org -> svr2.hagander.net                           |    22 |    1
 postgresql.org -> localhost (mx1.hub.org [200.46.204.184]) |    15 | 3335
 localhost (mx1.hub.org [200.46.204.184]) -> postgresql.org |    15 |   23
 postgresql.org -> localhost (mx1.hub.org [200.46.204.183]) |     7 | 2710
 localhost (mx1.hub.org [200.46.204.183]) -> postgresql.org |     7 |   17
 sss.pgh.pa.us (8.14.1/8.14.1) -> postgresql.org            |     3 |    8
 svr2.hagander.net -> postgresql.org                        |     2 |    2

(sorry about the breakup of the table, it's the long lines for mx1.hub.org. Table attached as well for easier reading) To me this clearly shows that mail is backed up between postgresql.org (this is svr1, if I'm not mistaken) and mx1.hub.org - which has two different IPs, but claims to be localhost (that can't be right, can it?). FYI, the script measures the difference in time between consecutive Received: lines in the header. It's not fool-proof, but it works on all the paths inside hub.org and from there to my machine - it misses some originating MUAs (it's missed 2 out of the 22 emails parsed here) //Magnus
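For anyone who wants to reproduce this kind of summary without loading the data into a database, the grouping done by the query can be sketched in a few lines of Python. Magnus's actual script is not shown in the thread, so the function name, the input shape (one (path, seconds) pair per hop) and the sample rows below are all made up for illustration.

```python
from collections import defaultdict


def summarise(rows):
    """Group (path, seconds) pairs into (path, count, average) rows.

    Rows are sorted by count descending, then average descending, which
    mirrors "order by 2 desc, 3 desc". Averages are rounded to whole
    seconds; Python's round() may differ from a SQL integer cast on exact
    .5 values.
    """
    buckets = defaultdict(list)
    for path, seconds in rows:
        buckets[path].append(seconds)
    table = [(path, len(vals), round(sum(vals) / len(vals)))
             for path, vals in buckets.items()]
    table.sort(key=lambda row: (-row[1], -row[2]))
    return table


if __name__ == "__main__":
    # Toy input, not taken from the thread.
    sample = [
        ("relay-a -> relay-b", 10), ("relay-a -> relay-b", 20),
        ("relay-b -> scanner", 3000), ("relay-b -> scanner", 4000),
        ("scanner -> mailbox", 1),
    ]
    for path, count, avg in summarise(sample):
        print(f"{path:<24} | {count:>5} | {avg:>5}")
```

Sorting by count first keeps the hops that every message passes through at the top, so a slow hop that is also a common one stands out immediately.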
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - --On Thursday, November 01, 2007 21:05:50 +0000 Dave Page <dpage@postgresql.org> wrote: > Here's another one that may be of interest as it's not a list message. I > won't send any more unless you ask for them - suffice it to say I have > now had two or three others, all equally late; I assume you fed > something a suitable laxative. postfix stop; postfix start - ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . scrappy@hub.org MSN . scrappy@hub.org Yahoo . yscrappy Skype: hub.org ICQ . 7615664 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFHKk3L4QvfyHIvDvMRAt4oAKCzIP6qOEqQpb9ZmtS1JWrTa1IxdgCfUSwj CRNo0ToGuGYHSHUEjqBDciQ= =w3Ly -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - --On Thursday, November 01, 2007 13:36:32 -0400 Andrew Sullivan <ajs@crankycanuck.ca> wrote: >> "Host or domain name not found" > > This should just bounce, no? The full message looks like: Host or domain name not found. Name service error for name=sigma.fr type=MX: Host not found, try again so it looks like postfix is treating it as a soft failure, vs hard ... - ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . scrappy@hub.org MSN . scrappy@hub.org Yahoo . yscrappy Skype: hub.org ICQ . 7615664 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFHKj8a4QvfyHIvDvMRAv2aAKDX4bESMwksuzO25XzKNgLKrZLy/wCgq4J8 G3carTgbRkBkivyeAkMSXSU= =pWw6 -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - --On Thursday, November 01, 2007 17:31:15 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Magnus Hagander <magnus@hagander.net> writes: >> Oh, and the headers I sent were because the email was stuck in the >> moderation queue. > > Yeah --- one of the worst problems in diagnosing this is that there's > no way for anyone except the moderator to know if a delay was simply > waiting-for-moderation or if it indicates an actual system problem. > > What are the chances of getting something into the headers to indicate > whether a message was delayed by moderation? I've asked on the mj2 list about this when someone mentioned it earlier in the thread, and am just awaiting confirmation, but the initial answer seemed to indicate that someone would have to do some perl programming for this, as it isn't a pre-defined variable that we can access ... As I said, I'm waiting to hear back, but if my assessment is right, anyone interested in writing some perl code? Having both an 'X-Approved-Date:' and an 'X-Approved-Delay:' header would be cool ... - ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . scrappy@hub.org MSN . scrappy@hub.org Yahoo . yscrappy Skype: hub.org ICQ . 7615664 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFHKlBM4QvfyHIvDvMRAmQKAJ9BQiojK16/y8rbh6K12GYMarEkKQCfd590 wUJFiXruux4ZK+OJI6qlrt8= =O8yr -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 This is where I hate debugging email issues ... Bruce IMd me about this, I looked around and the only problem that I could see was that the caching DNS server within the VPS itself was still reporting one scanner, so I restarted named, and postfix after to clear out any caches it might have, and mail started flowing again ... The only thing I have in the log file is:

Nov 1 14:22:41 postgresql postfix/smtpd[26360]: E4E039FC8E6: client=hub.org[200.46.204.220]
Nov 1 14:22:41 postgresql postfix/cleanup[73513]: E4E039FC8E6: message-id=<A43323733D66A14A086CABF7@ganymede.hub.org>
Nov 1 14:23:58 postgresql postfix/qmgr[7655]: E4E039FC8E6: from=<scrappy@hub.org>, size=3553, nrcpt=1 (queue active)
Nov 1 16:12:19 postgresql postfix/smtp[51424]: E4E039FC8E6: to=<pgsql-www-postgresql.org@postgresql.org>, orig_to=<pgsql-www@postgresql.org>, relay=maia.hub.org[200.46.204.184]:10024, conn_use=7, delay=6578, delays=78/6496/0.01/3.9, dsn=2.6.0, status=sent (250 2.6.0 Ok, id=62395-04-7, from MTA: 250 2.0.0 Ok: queued as 9E4B49FC802)
Nov 1 16:12:19 postgresql postfix/qmgr[7655]: E4E039FC8E6: removed

Which shows the 'delay', but there are no errors on the scanner server, nor did I touch either of them to fix the problem ... so it looks like the scanner received it and hung trying to send it back, until I reset the receiving side, whereby it finished .. But there is nothing on either side to indicate a problem ... but restarting postfix 'unstuck' it ... - --On Thursday, November 01, 2007 15:23:12 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Marc G. Fournier" <scrappy@hub.org> writes: >> But, again, it would be interesting to look at a specific one, with >> message-id, and see what the logs show for reasons of delays ... > > Well, the current problems seem to be entirely inside hub.org, and so > all this banter about DNS etc seems not relevant.
A handy example is > this same message I'm replying to, which seems to have been hung up > for 2.5 hours: > > Received: from maia-2.hub.org (maia-2.hub.org [200.46.204.187] (may be > forged)) by sss.pgh.pa.us (8.14.1/8.14.1) with ESMTP id lA1JCPC2010552 > for <tgl@sss.pgh.pa.us>; Thu, 1 Nov 2007 15:12:26 -0400 (EDT) > Received: from postgresql.org (postgresql.org [200.46.204.71]) > by maia-2.hub.org (Postfix) with ESMTP id 912252C9544 > for <tgl@sss.pgh.pa.us>; Thu, 1 Nov 2007 16:12:22 -0300 (ADT) > Received: from localhost (unknown [200.46.204.184]) > by postgresql.org (Postfix) with ESMTP id 9E4B49FC802 > for <pgsql-www-postgresql.org@postgresql.org>; Thu, 1 Nov 2007 16:12:18 > -0300 (ADT) Received: from postgresql.org ([200.46.204.71]) > by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024) > with ESMTP id 62395-04-7 for <pgsql-www-postgresql.org@postgresql.org>; > Thu, 1 Nov 2007 16:12:11 -0300 (ADT) > Received: from hub.org (hub.org [200.46.204.220]) > by postgresql.org (Postfix) with ESMTP id E4E039FC8E6 > for <pgsql-www@postgresql.org>; Thu, 1 Nov 2007 14:22:40 -0300 (ADT) > Received: from localhost (unknown [200.46.204.184]) > by hub.org (Postfix) with ESMTP id A428DB475C8 > for <pgsql-www@postgresql.org>; Thu, 1 Nov 2007 13:43:35 -0300 (ADT) > Received: from hub.org ([200.46.204.220]) > by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024) > with ESMTP id 56221-03; Thu, 1 Nov 2007 13:43:01 -0300 (ADT) > Received: from fserv.hub.org (blk-137-93-67.eastlink.ca [24.137.93.67]) > by hub.org (Postfix) with ESMTP id BB0BCB47581; > Thu, 1 Nov 2007 13:43:02 -0300 (ADT) > Received: from [192.168.1.2] (unknown [192.168.1.2]) > by fserv.hub.org (Postfix) with ESMTP id AD2C34E165; > Thu, 1 Nov 2007 13:43:02 -0300 (ADT) > Date: Thu, 01 Nov 2007 13:42:31 -0300 > From: "Marc G. Fournier" <scrappy@hub.org> > To: "Joshua D. 
Drake" <jd@commandprompt.com> > cc: Magnus Hagander <magnus@hagander.net>, > Andrew Sullivan <ajs@crankycanuck.ca>, pgsql-www@postgresql.org > Subject: Re: [pgsql-www] what is up with the PG mailing lists? > Message-ID: <A43323733D66A14A086CABF7@ganymede.hub.org> > In-Reply-To: <20071101092806.3b1fa452@scratch> > References: <25716.1193887595@sss.pgh.pa.us> > <DADF296033D290EF9634F611@ganymede.hub.org> <26669.1193891360@sss.pgh.pa.us> > <47299585.7030402@hagander.net> <47299957.5020605@postgresql.org> > <2968.1193919208@sss.pgh.pa.us> <20071101080959.49f3087b@scratch> > <20071101152333.GM27676@crankycanuck.ca> <4729F105.30704@hagander.net> > <1127E6493CBA8A29F343C4D7@ganymede.hub.org> <4729F7D2.6050608@hagander.net> > <AD9BF3BA60F6634EA7FCDB76@ganymede.hub.org> > <20071101092806.3b1fa452@scratch> X-Mailer: Mulberry/4.0.8 (Linux/x86) > MIME-Version: 1.0 > Content-Type: text/plain; charset=us-ascii > Content-Transfer-Encoding: 7bit > Content-Disposition: inline > X-Virus-Scanned: Maia Mailguard 1.0.1 > X-Virus-Scanned: Maia Mailguard 1.0.1 > X-Mailing-List: pgsql-www > List-Archive: <http://archives.postgresql.org/pgsql-www> > List-Help: <mailto:majordomo@postgresql.org?body=help> > List-ID: <pgsql-www.postgresql.org> > List-Owner: <mailto:pgsql-www-owner@postgresql.org> > List-Post: <mailto:pgsql-www@postgresql.org> > List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-www> > List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-www> > Precedence: bulk > Sender: pgsql-www-owner@postgresql.org > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match - ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . scrappy@hub.org MSN . scrappy@hub.org Yahoo . yscrappy Skype: hub.org ICQ . 
7615664 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFHKk2T4QvfyHIvDvMRAsA/AKCFj3U3RNAIvehPULV8S8PfbgeLlwCcDZSp qCWZnSOOTTQydS1Tk+e1Vs8= =DFh4 -----END PGP SIGNATURE-----
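A note on reading the smtp log line in the message above: in Postfix 2.3 and later, the `delays=a/b/c/d` field splits the total `delay=` into time before the queue manager, time in the queue manager, connection setup, and message transmission. The following is only a minimal Python sketch for pulling those numbers out of a log line; the function name and the stage labels are made up for illustration and are not part of Postfix.

```python
import re

# Illustrative labels for the four a/b/c/d values (not Postfix names).
STAGES = ("before_qmgr", "in_qmgr", "conn_setup", "transmission")

def parse_delays(log_line):
    """Return (total_delay, {stage: seconds}) from a Postfix delivery log line."""
    total = re.search(r"\bdelay=([\d.]+)", log_line)
    parts = re.search(r"\bdelays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)", log_line)
    if not total or not parts:
        raise ValueError("no delay=/delays= fields found")
    stages = dict(zip(STAGES, (float(p) for p in parts.groups())))
    return float(total.group(1)), stages

# Shortened copy of the log line quoted in the thread.
line = ("Nov 1 16:12:19 postgresql postfix/smtp[51424]: E4E039FC8E6: "
        "relay=maia.hub.org[200.46.204.184]:10024, conn_use=7, "
        "delay=6578, delays=78/6496/0.01/3.9, dsn=2.6.0, status=sent")
total, stages = parse_delays(line)
slowest = max(stages, key=stages.get)
print(total, slowest, stages[slowest])
```

On this line nearly all of the 6578 seconds falls in the second value (6496), i.e. the message sat in the queue; connection setup and transmission to the scanner were quick once delivery actually ran.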
Marc G. Fournier wrote: > > > --On Thursday, November 01, 2007 21:05:50 +0000 Dave Page > <dpage@postgresql.org> wrote: > >> Here's another one that may be of interest as it's not a list message. I >> won't send any more unless you ask for them - suffice it to say I have >> now had two or three others, all equally late; I assume you fed >> something a suitable laxative. > > postfix stop; postfix start On developer.postgresql.org? Any idea what caused it to hang? Any ideas if it's a state we can monitor in nagios in any sensible way? /D
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

- --On Thursday, November 01, 2007 22:25:23 +0000 Dave Page <dpage@postgresql.org> wrote:

> Marc G. Fournier wrote:
>>
>>
>> --On Thursday, November 01, 2007 21:05:50 +0000 Dave Page
>> <dpage@postgresql.org> wrote:
>>
>>> Here's another one that may be of interest as it's not a list message. I
>>> won't send any more unless you ask for them - suffice it to say I have
>>> now had two or three others, all equally late; I assume you fed
>>> something a suitable laxative.
>>
>> postfix stop; postfix start
>
> On developer.postgresql.org?

yes ...

> Any idea what caused it to hang? Any ideas
> if it's a state we can monitor in nagios in any sensible way?

I'm finding absolutely zilch in the log files in the way of errors on either that server, or 200.46.204.184 ... it's as if port 10025 (the report port for amavis) just stopped responding ...

Please do look at the logs on the mail server though (and/or log in to 204.184) and let me know if you see something that I didn't ... note that on 204.184, there are some 10025-related errors at the time I restarted postfix, which do make sense, since 10025 would have 'disappeared' then ...

- ----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org MSN . scrappy@hub.org
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFHKlZ04QvfyHIvDvMRAtKpAJ9r2t35sKECVOldiQIwm+ECUr5IQACdGYYd
tYmKPQiRH9jsXwXbON/sT0o=
=nRk/
-----END PGP SIGNATURE-----
Marc G. Fournier wrote:
>
>
> --On Thursday, November 01, 2007 13:36:32 -0400 Andrew Sullivan
> <ajs@crankycanuck.ca> wrote:
>
>>> "Host or domain name not found"
>> This should just bounce, no?
>
> The full message looks like:
>
> Host or domain name not found. Name service error for name=sigma.fr type=MX:
> Host not found, try again
>
> so it looks like postfix is treating it as a soft failure, vs hard ...

A query for the MX of that domain results in SERVFAIL, which MTAs usually treat as a temporary error rather than a permanent one.

Stefan
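To make the soft-versus-hard distinction concrete, here is a minimal Python sketch of the usual rule of thumb: an authoritative "no such domain" answer is permanent, while a resolver failure is temporary. The function, constants, and mapping are illustrative assumptions and are not taken from Postfix's code.

```python
# Simplified rule of thumb for how an MTA reacts to a failed MX lookup.
# Names and mapping are illustrative, not Postfix internals.
TEMPORARY = "defer (4xx, retry later)"
PERMANENT = "bounce (5xx, give up)"

def classify_dns_failure(rcode):
    """Map a DNS response code name to a soft or hard delivery failure."""
    rcode = rcode.upper()
    if rcode == "NXDOMAIN":
        # The domain authoritatively does not exist: retrying will not help.
        return PERMANENT
    if rcode == "SERVFAIL":
        # The resolver could not get an answer: the domain may still exist.
        return TEMPORARY
    raise ValueError(f"unhandled rcode: {rcode}")

print(classify_dns_failure("SERVFAIL"))
print(classify_dns_failure("NXDOMAIN"))
```

Under this rule the sigma.fr lookup lands in the "defer" branch, which would explain why the message stays queued instead of bouncing.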
Marc G. Fournier wrote:
>
> This is where I hate debugging email issues ... Bruce IMd me about this, I
> looked around and the only problem that I could see was that the caching DNS
> server within the VPS itself was still reporting one scanner, so I restarted
> named, and postfix after to clear out any caches it might have, and mail
> started flowing again ...

You know, that DNS hangy thing reminds me a *lot* of the problems we had on wwwmaster prior to the move to tribble. Now, I don't think the move to tribble itself fixed it, but we did upgrade FreeBSD at that time, no? Does this machine by any chance have the same FreeBSD release as the *old* wwwmaster?

> But there is nothing on either side to indicate a problem ... but restarting
> postfix 'unstuck' it ...

We used to have monitoring on the mailqueue length on svr1, that could catch these things. Actually, I see that we still have monitoring, but we've disabled notifications for it. Perhaps we should re-enable that to keep track of things properly? (Won't work for the notifies sent through -slaves though, since they'll also get caught in the backlog if that happens)

BTW, I'm seeing much better times this morning. Just did another run, this is the data for the past 9 hours:

 path                                                       | count | avg
------------------------------------------------------------+-------+-----
 postgresql.org -> mx2.hub.org                              |    21 | 169
 mx2.hub.org -> svr2.hagander.net                           |    21 |   1
 svr2.hagander.net -> svr2.hagander.net                     |    21 |   0
 postgresql.org -> localhost (mx1.hub.org [200.46.204.184]) |    13 |  18
 localhost (mx1.hub.org [200.46.204.184]) -> postgresql.org |    13 |  13
 localhost (mx1.hub.org [200.46.204.183]) -> postgresql.org |     8 |   6
 postgresql.org -> localhost (mx1.hub.org [200.46.204.183]) |     8 |   5
 sss.pgh.pa.us (8.14.1/8.14.1) -> postgresql.org            |     4 |  15
 ug-out-1314.google.com -> postgresql.org                   |     3 |   1
 wdscexfe01.sc.wdc.com -> postgresql.org                    |     2 |  17

169 seconds is a lot more reasonable. It's still a bit of time, but I assume that's when it does the spamscanning? (since it delivers to my machine in no more than a second after that)

//Magnus
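Per-hop figures like the ones in the table can be derived from the timestamps on consecutive Received: headers. The script behind the table was not posted, so the following is only a minimal Python sketch of the idea; the function names are made up, and the two header strings are shortened copies of hops from the headers Tom quoted earlier in the thread.

```python
import re
from email.utils import parsedate_to_datetime

def received_time(header):
    """Parse the timestamp that follows the last ';' in a Received: header."""
    stamp = header.rsplit(";", 1)[1]
    # Drop a trailing zone comment such as "(ADT)" before parsing.
    stamp = re.sub(r"\([^)]*\)\s*$", "", stamp).strip()
    return parsedate_to_datetime(stamp)

def hop_delays(headers):
    """Return seconds between consecutive hops; headers are newest first."""
    times = [received_time(h) for h in reversed(headers)]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]

# Newest hop first, as the headers appear in a delivered message.
headers = [
    "from postgresql.org ([200.46.204.71]) by localhost (mx1.hub.org "
    "[200.46.204.184]) with ESMTP id 62395-04-7; "
    "Thu, 1 Nov 2007 16:12:11 -0300 (ADT)",
    "from hub.org (hub.org [200.46.204.220]) by postgresql.org (Postfix) "
    "with ESMTP id E4E039FC8E6; Thu, 1 Nov 2007 14:22:40 -0300 (ADT)",
]
print(hop_delays(headers))
```

For these two hops the gap is 6571 seconds (about 1 hour 50 minutes), which is the stall between postgresql.org accepting the message and the scanner receiving it. Grouping such gaps by host pair and averaging would give a table of the shape shown above.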
Marc G. Fournier wrote: > > > --On Thursday, November 01, 2007 17:31:15 -0400 Tom Lane <tgl@sss.pgh.pa.us> > wrote: > >> Magnus Hagander <magnus@hagander.net> writes: >>> Oh, and the headers I sent were because the email was stuck in the >>> moderation queue. >> Yeah --- one of the worst problems in diagnosing this is that there's >> no way for anyone except the moderator to know if a delay was simply >> waiting-for-moderation or if it indicates an actual system problem. > >> What are the chances of getting something into the headers to indicate >> whether a message was delayed by moderation? > > I've asked on the mj2 list about this when someone mentioned it earlier in the > thread, and am just awaiting confirmation, but the initial answer seemed to > indicate that someone would have to do some perl programming for this, as it > isn't a pre-defined variable that we can access ... > > As I said, I'm waiting to hear back, but if my assessment is right, anyone > interested in writing some perl code? Have both a 'X-Approved-Date:' and an > 'X-Approved-Delay:' header would be cool ... Do we have any pointers as to where the code goes? I don't know how to find our way around the code, but if we can get a pointer to where it needs to go, someone can perhaps do something about it. But it's probably as much work for the mj2 folks to tell us where to put it as it would be for them to put it there themselves :-) //Magnus
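For illustration only, this is roughly what the two proposed headers could carry. Majordomo2 is written in Perl and nothing below is its code or its API; the function name, the value format, and the timestamps are all assumptions, and only the header names come from Marc's suggestion.

```python
from datetime import datetime, timezone
from email.message import EmailMessage
from email.utils import format_datetime

def add_approval_headers(msg, received_at, approved_at):
    """Record when a moderator approved a message and how long it waited."""
    delay = int((approved_at - received_at).total_seconds())
    msg["X-Approved-Date"] = format_datetime(approved_at)
    msg["X-Approved-Delay"] = f"{delay} seconds"
    return msg

# Made-up timestamps: received at 17:00 UTC, approved at 19:30 UTC.
msg = EmailMessage()
msg["Subject"] = "held for moderation"
received = datetime(2007, 11, 1, 17, 0, 0, tzinfo=timezone.utc)
approved = datetime(2007, 11, 1, 19, 30, 0, tzinfo=timezone.utc)
add_approval_headers(msg, received, approved)
print(msg["X-Approved-Date"], "|", msg["X-Approved-Delay"])
```

With headers like these, a reader looking at a late message could subtract the moderation wait from the total transit time before suspecting the mail system.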
On Fri, Nov 02, 2007 at 08:32:38AM +0100, Magnus Hagander wrote:
> We used to have monitoring on the mailqueue length on svr1, that could
> catch these things. actually, I see that we still have monitoring, but
> we've disabled notifications for it. Perhaps we should re-enable that to
> keep track of things properly?

This seems like an obvious thing to do, from where I sit. It's going to be rather hard to debug problems that one doesn't notice.

> 169 seconds is a lot more reasonable. It's still a bit of time, but I
> assume that's when it does the spamscanning? (since it delivers to my
> machine in no more than a second after that)

I'd expect spamscanning to be a significant overhead, yes.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
The very definition of "news" is "something that hardly ever happens."
--Bruce Schneier
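A queue-length check of the kind discussed here is small. The sketch below is a minimal Python version, assuming the summary line that `postqueue -p` (or `mailq`) prints at the end of its output; the thresholds, function names, and status text are made up, and in practice the stock check_mailq plugin shipped with the Nagios plugins is the more likely choice.

```python
import re

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes

def queue_length(postqueue_output):
    """Extract the request count from the summary line of `postqueue -p`."""
    if "Mail queue is empty" in postqueue_output:
        return 0
    match = re.search(r"in (\d+) Requests?\.", postqueue_output)
    return int(match.group(1)) if match else None

def check_queue(postqueue_output, warn=200, crit=1000):
    """Return (exit_code, status_line); warn/crit are made-up thresholds."""
    count = queue_length(postqueue_output)
    if count is None:
        return UNKNOWN, "MAILQ UNKNOWN - could not parse queue summary"
    if count >= crit:
        return CRITICAL, f"MAILQ CRITICAL - {count} messages queued"
    if count >= warn:
        return WARNING, f"MAILQ WARNING - {count} messages queued"
    return OK, f"MAILQ OK - {count} messages queued"

# Made-up sample of the last line of `postqueue -p` during a backlog.
sample = "-- 5123 Kbytes in 1432 Requests.\n"
print(check_queue(sample))
```

As Magnus points out, a check like this only helps if its notifications do not travel through the same queue that is backed up.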
On Thu, 1 Nov 2007 13:14:52 -0400 Andrew Sullivan wrote: Hello all, > Keep in mind that the mail server relies on DNS servers that are out > there in ISP-land, and not in your-control land. speaking of DNS, i get a lot of errors the last days: Nov 6 20:20:41 base postfix/smtpd[15808]: warning: smtpd_peer_init: 200.46.204.187: address not listed for hostname maia-2.hub.org I tested from different hosts in different networks at different times and i can't resolve 'maia-2.hub.org.' to an ip-address. So this seems to be a constant problem. Maybe someone can fix this? > So, without clear outlines of _exactly_ where the problem is, which > means logging of months of headers, putting aside only those messages > that hung up, this is going to amount to nothing more than "you did > x" " no I didn't lalalalala" discussion. You want this to get > better? Capture your logs, and let's do some analysis. I just posted one of the problems, let's start here and move forward. Kind regards -- Andreas 'ads' Scherbaum Failure is not an option. It comes bundled with your Microsoft product.(Ferenc Mantfeld)
"Andreas 'ads' Scherbaum" <adsmail@wars-nicht.de> writes: > speaking of DNS, i get a lot of errors the last days: > Nov 6 20:20:41 base postfix/smtpd[15808]: warning: smtpd_peer_init: 200.46.204.187: address not listed for hostname maia-2.hub.org > I tested from different hosts in different networks at different times and > i can't resolve 'maia-2.hub.org.' to an ip-address. So this seems to be a > constant problem. Yeah, I see the same: 200.46.204.187 resolves as maia-2.hub.org but forward resolution of that name fails. So, inconsistency in the DNS tables someplace ... regards, tom lane
Tom Lane wrote:
> "Andreas 'ads' Scherbaum" <adsmail@wars-nicht.de> writes:
>> speaking of DNS, i get a lot of errors the last days:
>
>> Nov 6 20:20:41 base postfix/smtpd[15808]: warning: smtpd_peer_init: 200.46.204.187: address not listed for hostname maia-2.hub.org
>
>> I tested from different hosts in different networks at different times and
>> i can't resolve 'maia-2.hub.org.' to an ip-address. So this seems to be a
>> constant problem.
>
> Yeah, I see the same: 200.46.204.187 resolves as maia-2.hub.org
> but forward resolution of that name fails. So, inconsistency in
> the DNS tables someplace ...

Same issue exists for maia-1.hub.org (from the logs I checked yesterday, so I don't recall the IP, but it should be in Marc's inbox).

//Magnus
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 'k, I'm confused ... the only name that should be referenced in any config file is maia.hub.org, which breaks down as: Name: maia.hub.org Address: 200.46.204.184 Name: maia.hub.org Address: 200.46.204.183 So, how/where are ppl seeing 200.46.204.187 :( Unless postfix is caching it internally ... I just restarted postfix, in case that is what is happening, but I just checked the mailq also, and nothing is queue'd up with them ... most odd ... I've re-added them to DNS while I look into it, to at least quiet the error ... - --On Tuesday, November 06, 2007 14:50:09 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Andreas 'ads' Scherbaum" <adsmail@wars-nicht.de> writes: >> speaking of DNS, i get a lot of errors the last days: > >> Nov 6 20:20:41 base postfix/smtpd[15808]: warning: smtpd_peer_init: >> 200.46.204.187: address not listed for hostname maia-2.hub.org > >> I tested from different hosts in different networks at different times and >> i can't resolve 'maia-2.hub.org.' to an ip-address. So this seems to be a >> constant problem. > > Yeah, I see the same: 200.46.204.187 resolves as maia-2.hub.org > but forward resolution of that name fails. So, inconsistency in > the DNS tables someplace ... > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster - ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . scrappy@hub.org MSN . scrappy@hub.org Yahoo . yscrappy Skype: hub.org ICQ . 7615664 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFHMM8f4QvfyHIvDvMRAh7QAKCSMDavp1+zAJqy7K0VYV24tw7ItACfZmop Wj6rPYXRomHhKevN8p+D3Y4= =BK7d -----END PGP SIGNATURE-----
Marc G. Fournier wrote: > > 'k, I'm confused ... the only name that should be referenced in any config file > is maia.hub.org, which breaks down as: > > Name: maia.hub.org > Address: 200.46.204.184 > Name: maia.hub.org > Address: 200.46.204.183 > > So, how/where are ppl seeing 200.46.204.187 :( dig -x 200.46.204.187 returns 187.204.46.200.in-addr.arpa. 18000 IN PTR maia-2.hub.org. Similar for -1. //Magnus
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - --On Tuesday, November 06, 2007 21:33:08 +0100 Magnus Hagander <magnus@hagander.net> wrote: > Marc G. Fournier wrote: >> >> 'k, I'm confused ... the only name that should be referenced in any config >> file is maia.hub.org, which breaks down as: >> >> Name: maia.hub.org >> Address: 200.46.204.184 >> Name: maia.hub.org >> Address: 200.46.204.183 >> >> So, how/where are ppl seeing 200.46.204.187 :( > > dig -x 200.46.204.187 > > returns > 187.204.46.200.in-addr.arpa. 18000 IN PTR maia-2.hub.org. > > Similar for -1. Right, and? - ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . scrappy@hub.org MSN . scrappy@hub.org Yahoo . yscrappy Skype: hub.org ICQ . 7615664 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFHMN6O4QvfyHIvDvMRAmHmAJoC+11Ng+xbh+hgsucLt4Sz1xNGDgCgycmz 9KbkvxdGL5r/5Ilb5wR7x7M= =zki5 -----END PGP SIGNATURE-----
On Tue, 06 Nov 2007 17:37:18 -0400 Marc G. Fournier wrote:

> - --On Tuesday, November 06, 2007 21:33:08 +0100 Magnus Hagander
> <magnus@hagander.net> wrote:
>
> > Marc G. Fournier wrote:
> >>
> >> 'k, I'm confused ... the only name that should be referenced in any config
> >> file is maia.hub.org, which breaks down as:
> >>
> >> Name: maia.hub.org
> >> Address: 200.46.204.184
> >> Name: maia.hub.org
> >> Address: 200.46.204.183
> >>
> >> So, how/where are ppl seeing 200.46.204.187 :(
> >
> > dig -x 200.46.204.187
> >
> > returns
> > 187.204.46.200.in-addr.arpa. 18000 IN PTR maia-2.hub.org.
> >
> > Similar for -1.
>
> Right, and?

ads@iridium:/home/ads > dig -x maia-2.hub.org.
dig: '.org.hub.maia-2.in-addr.arpa.' is not a legal name (empty label)

And where did this come from?

Nov 1 20:30:53 base postfix/smtpd[11763]: warning: smtpd_peer_init: 200.46.204.187: address not listed for hostname maia-2.hub.org
Nov 1 20:30:54 base postfix/smtpd[11763]: NOQUEUE: reject: RCPT from unknown[200.46.204.187]: 450 <maia-2.hub.org>: Helo command rejected: Host not found; from=<pgsql-committers-owner+M30281@postgresql.org> to=<adsmail@wars-nicht.de> proto=ESMTP helo=<maia-2.hub.org>

a) the ip-address has an invalid (missing) ptr record
b) this server knows itself as maia-2.hub.org, which cannot be resolved
   (but matches the invalid ptr record)

Bye

--
Andreas 'ads' Scherbaum
Failure is not an option. It comes bundled with your Microsoft product.
(Ferenc Mantfeld)
Andreas 'ads' Scherbaum wrote:
> On Tue, 06 Nov 2007 17:37:18 -0400 Marc G. Fournier wrote:
>
>> - --On Tuesday, November 06, 2007 21:33:08 +0100 Magnus Hagander
>> <magnus@hagander.net> wrote:
>>
>>> Marc G. Fournier wrote:
>>>> 'k, I'm confused ... the only name that should be referenced in any config
>>>> file is maia.hub.org, which breaks down as:
>>>>
>>>> Name: maia.hub.org
>>>> Address: 200.46.204.184
>>>> Name: maia.hub.org
>>>> Address: 200.46.204.183
>>>>
>>>> So, how/where are ppl seeing 200.46.204.187 :(
>>> dig -x 200.46.204.187
>>>
>>> returns
>>> 187.204.46.200.in-addr.arpa. 18000 IN PTR maia-2.hub.org.
>>>
>>> Similar for -1.
>> Right, and?
>
> ads@iridium:/home/ads > dig -x maia-2.hub.org.
> dig: '.org.hub.maia-2.in-addr.arpa.' is not a legal name (empty label)

-x implies a reverse/PTR lookup, so you need to supply an IP address ...

>
>
> And where this came from?
>
> Nov 1 20:30:53 base postfix/smtpd[11763]: warning: smtpd_peer_init: 200.46.204.187: address not listed for hostname maia-2.hub.org
> Nov 1 20:30:54 base postfix/smtpd[11763]: NOQUEUE: reject: RCPT from unknown[200.46.204.187]: 450 <maia-2.hub.org>: Helo command rejected: Host not found; from=<pgsql-committers-owner+M30281@postgresql.org> to=<adsmail@wars-nicht.de> proto=ESMTP helo=<maia-2.hub.org>
>
> a) the ip-address has an invalid (missing) ptr record
> b) this server is knowing itself by maia-2.hub.org, which cannot resolved
> (but matches the invalid ptr record)

Well, this problem (200.46.204.187 having a PTR of maia-2.hub.org while forward resolution of that name is incorrect or missing) seems to be fixed now (and your log entry is now nearly a week old).

Stefan
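Stefan's point about `dig -x` can be seen without touching DNS at all: a reverse lookup queries a name built by reversing the octets of an address, so a hostname has nothing sensible to reverse. A minimal Python sketch using the standard `ipaddress` module (the helper name `ptr_name` is made up):

```python
import ipaddress

def ptr_name(address):
    """Return the in-addr.arpa (or ip6.arpa) name queried for a PTR lookup."""
    return ipaddress.ip_address(address).reverse_pointer

print(ptr_name("200.46.204.187"))

try:
    ptr_name("maia-2.hub.org.")
except ValueError as exc:
    # A hostname is not an address, so there is no reverse name to build.
    print("not an IP address:", exc)
```

The first call yields the same `187.204.46.200.in-addr.arpa` name that appears in Magnus's dig output; the second is rejected outright, whereas dig tried to reverse the hostname's labels and produced the "not a legal name" error.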
Marc G. Fournier wrote:

> - --On Tuesday, November 06, 2007 21:33:08 +0100 Magnus Hagander
> <magnus@hagander.net> wrote:
>
> > dig -x 200.46.204.187
> >
> > returns
> > 187.204.46.200.in-addr.arpa. 18000 IN PTR maia-2.hub.org.
> >
> > Similar for -1.
>
> Right, and?

It's an A RR for maia.hub.org.

$ dig maia.hub.org

; <<>> DiG 9.4.1-P1 <<>> maia.hub.org
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53021
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 3, ADDITIONAL: 3

;; QUESTION SECTION:
;maia.hub.org. IN A

;; ANSWER SECTION:
maia.hub.org. 86400 IN A 200.46.204.182
maia.hub.org. 86400 IN A 200.46.204.183
maia.hub.org. 86400 IN A 200.46.204.184
maia.hub.org. 86400 IN A 200.46.204.187
maia.hub.org. 86400 IN A 200.46.204.191

These numbers all resolve back to a different name:

$ for i in 182 183 184 187 191; do dig +short -x 200.46.204.$i; done
maia-5.hub.org.
maia-4.hub.org.
maia-3.hub.org.
maia-2.hub.org.
maia-1.hub.org.

The names resolve OK to the numbers:

$ for i in 182 183 184 187 191; do dig +short -x 200.46.204.$i; done | xargs -n 1 dig +short
200.46.204.182
;; Warning: ID mismatch: expected ID 15438, got 50558
200.46.204.183
200.46.204.184
200.46.204.187
200.46.204.191

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)
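The round trip Alvaro runs by hand with dig (address to PTR name, then name back to addresses) is the same consistency check that receiving mail servers were applying when they logged "address not listed for hostname". Below is a minimal Python sketch of it; the function name is made up, and the resolvers are passed in as parameters so the logic can be tried with stand-ins instead of live DNS.

```python
import socket

def forward_confirmed(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Check that ip -> PTR name -> A records leads back to ip.

    Returns (name, ok). The resolver functions are injectable so the
    logic can be exercised without touching the network.
    """
    try:
        name = reverse(ip)[0]
    except OSError:
        return None, False      # no PTR record at all
    try:
        addresses = forward(name)[2]
    except OSError:
        return name, False      # PTR exists, but the name does not resolve
    return name, ip in addresses

# Stand-in resolvers mimicking the broken state reported in the thread:
# the PTR for .187 exists, but maia-2.hub.org has no A record.
def fake_reverse(ip):
    return ("maia-2.hub.org", [], [ip])

def fake_forward(name):
    raise socket.gaierror("Name or service not known")

print(forward_confirmed("200.46.204.187", fake_reverse, fake_forward))
```

Called with the default resolvers, the same function performs real lookups through the system resolver, so it could be run from cron against each scanner address to catch a missing A record before remote servers start rejecting mail.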
On Wed, 07 Nov 2007 10:06:31 +0100 Stefan Kaltenbrunner wrote:

> Andreas 'ads' Scherbaum wrote:
> >
> > ads@iridium:/home/ads > dig -x maia-2.hub.org.
> > dig: '.org.hub.maia-2.in-addr.arpa.' is not a legal name (empty label)
>
> -x implies a reverse/PTR lookup so you need to supply an ip-address ...

Oops, my fault, thanks. Over the past few days I mostly used 'host' and got errors back.

> well this problem (200.46.204.187 having a PTR of maia-2.hub.org but
> forward resolution is not correct/missing) seems to be fixed now (and
> your log entry is now nearly a week old)

I just grepped the monthly log and copied the first message ;-)
The last of these error messages appeared yesterday evening at 10pm my time. Since then it seems to work.

Bye

--
Andreas 'ads' Scherbaum
Failure is not an option. It comes bundled with your Microsoft product.
(Ferenc Mantfeld)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Actually, what is above is from my fixing the issue ... to date, there has been no indication as to *why* the original problem occurred, as those hosts were taken out of the RR, since nothing from someone else's server should even be going through those hosts (with or without the RR in place) ...

- --On Wednesday, November 07, 2007 08:59:08 -0300 Alvaro Herrera <alvherre@commandprompt.com> wrote:

> Marc G. Fournier wrote:
>
>> - --On Tuesday, November 06, 2007 21:33:08 +0100 Magnus Hagander
>> <magnus@hagander.net> wrote:
>>
>> > dig -x 200.46.204.187
>> >
>> > returns
>> > 187.204.46.200.in-addr.arpa. 18000 IN PTR maia-2.hub.org.
>> >
>> > Similar for -1.
>>
>> Right, and?
>
> It's an A RR for maia.hub.org.
>
> $ dig maia.hub.org
>
> ; <<>> DiG 9.4.1-P1 <<>> maia.hub.org
> ;; global options: printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53021
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 3, ADDITIONAL: 3
>
> ;; QUESTION SECTION:
> ;maia.hub.org. IN A
>
> ;; ANSWER SECTION:
> maia.hub.org. 86400 IN A 200.46.204.182
> maia.hub.org. 86400 IN A 200.46.204.183
> maia.hub.org. 86400 IN A 200.46.204.184
> maia.hub.org. 86400 IN A 200.46.204.187
> maia.hub.org. 86400 IN A 200.46.204.191
>
>
> These numbers all resolve back to a different name:
> $ for i in 182 183 184 187 191; do dig +short -x 200.46.204.$i; done
> maia-5.hub.org.
> maia-4.hub.org.
> maia-3.hub.org.
> maia-2.hub.org.
> maia-1.hub.org.
> > The names resolve OK to the numbers: > > $ for i in 182 183 184 187 191; do dig +short -x 200.46.204.$i; done | > xargs -n 1 dig +short > 200.46.204.182 > ;; Warning: ID mismatch: expected ID 15438, got 50558 > 200.46.204.183 > 200.46.204.184 > 200.46.204.187 > 200.46.204.191 > > -- > Alvaro Herrera http://www.PlanetPostgreSQL.org/ > "Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias) > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate - ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . scrappy@hub.org MSN . scrappy@hub.org Yahoo . yscrappy Skype: hub.org ICQ . 7615664 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFHMfIy4QvfyHIvDvMRAty0AKDdStSAXpY5GZi+tAo6zYp7zynEvwCghn0i Mq9blo/pq0thJvsLdsYyNbY= =shiN -----END PGP SIGNATURE-----