Thread: Why so many buildfarm errors with contacting "git.postgresql.org"?
I've noticed that the frequency of nonrepeating fetch failures in the
buildfarm seems to be a lot higher with git than it ever was with cvs.

A typical example is today at:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sloth&dt=2010-12-07%2018%3A30%3A01

fatal: Unable to look up git.postgresql.org (port 9418) (Temporary failure in name resolution)

and similarly two days ago:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=colugos&dt=2010-12-05%2021%3A05%3A56

11 days ago:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypu&dt=2010-11-26%2021%3A05%3A02

28 days ago:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mongoose&dt=2010-11-09%2011%3A45%3A01

45 days ago:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=polecat&dt=2010-10-23%2018%3A49%3A59

That's just name resolution failures; there are a similar number of
Git-stage failures due to connection timeouts. The problem appears
to be getting worse with time :-(

Is there any difference between the network connectivity of
git.postgresql.org and the old anoncvs server?

regards, tom lane
On Wed, Dec 8, 2010 at 00:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've noticed that the frequency of nonrepeating fetch failures in the
> buildfarm seems to be a lot higher with git than it ever was with cvs.
>
> A typical example is today at:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sloth&dt=2010-12-07%2018%3A30%3A01
>
> fatal: Unable to look up git.postgresql.org (port 9418) (Temporary failure in name resolution)
>
> and similarly two days ago:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=colugos&dt=2010-12-05%2021%3A05%3A56
>
> 11 days ago:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypu&dt=2010-11-26%2021%3A05%3A02
>
> 28 days ago:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mongoose&dt=2010-11-09%2011%3A45%3A01
>
> 45 days ago:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=polecat&dt=2010-10-23%2018%3A49%3A59
>
> That's just name resolution failures; there are a similar number of
> Git-stage failures due to connection timeouts. The problem appears
> to be getting worse with time :-(
>
> Is there any difference between the network connectivity of
> git.postgresql.org and the old anoncvs server?

Yes, they are in completely different datacenters.

I had a discussion with Stefan a couple of days ago about this, and
the current estimate is that we're simply hitting the bandwidth limit
of where it is now, because it now takes so much more traffic. We have
some space on another machine that we can move the VM to, so we'll be
looking at doing that when things have calmed down a bit after getting
back from PGDay.EU. This will cause some short downtime as DNS
switches over, so we'll post a note on exactly when we plan to do it,
once it's planned.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes:
> On Wed, Dec 8, 2010 at 00:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Is there any difference between the network connectivity of
>> git.postgresql.org and the old anoncvs server?

> Yes, they are in completely different datacenters.

> I had a discussion with Stefan a couple of days ago about this, and
> the current estimate is that we're simply hitting the bandwidth limit
> of where it is now, because it now takes so much more traffic. We have
> some space on another machine that we can move the VM to, so we'll be
> looking at doing that when things have calmed down a bit after getting
> back from PGDay.EU. This will cause some short downtime as DNS
> switches over, so we'll post a note on exactly when we plan to do it,
> once it's planned.

Sounds like a plan. Thanks.

(BTW, since it's just a read-only clone of master, couldn't you avoid
downtime by duplicating the VM and running two in parallel until the DNS
change propagates fully? Or are you just thinking it's not worth the
trouble?)

regards, tom lane
On Thu, Dec 9, 2010 at 23:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Wed, Dec 8, 2010 at 00:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Is there any difference between the network connectivity of
>>> git.postgresql.org and the old anoncvs server?
>
>> Yes, they are in completely different datacenters.
>
>> I had a discussion with Stefan a couple of days ago about this, and
>> the current estimate is that we're simply hitting the bandwidth limit
>> of where it is now, because it now takes so much more traffic. We have
>> some space on another machine that we can move the VM to, so we'll be
>> looking at doing that when things have calmed down a bit after getting
>> back from PGDay.EU. This will cause some short downtime as DNS
>> switches over, so we'll post a note on exactly when we plan to do it,
>> once it's planned.
>
> Sounds like a plan. Thanks.
>
> (BTW, since it's just a read-only clone of master, couldn't you avoid
> downtime by duplicating the VM and running two in parallel until the DNS
> change propagates fully? Or are you just thinking it's not worth the
> trouble?)

It's not just that. It also runs git hosting for a bunch of projects,
including but certainly not limited to pgadmin and slony, where it is
the master.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/