Thread: Why so many buildfarm errors with contacting "git.postgresql.org"?

Why so many buildfarm errors with contacting "git.postgresql.org"?

From: Tom Lane
I've noticed that the frequency of nonrepeating fetch failures in the
buildfarm seems to be a lot higher with git than it ever was with cvs.

A typical example is today at:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sloth&dt=2010-12-07%2018%3A30%3A01

fatal: Unable to look up git.postgresql.org (port 9418) (Temporary failure in name resolution)

and similarly two days ago:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=colugos&dt=2010-12-05%2021%3A05%3A56

11 days ago:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypu&dt=2010-11-26%2021%3A05%3A02

28 days ago:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mongoose&dt=2010-11-09%2011%3A45%3A01

45 days ago:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=polecat&dt=2010-10-23%2018%3A49%3A59

That's just name resolution failures; there are a similar number of
Git-stage failures due to connection timeouts.  The problem appears
to be getting worse with time :-(

Is there any difference between the network connectivity of
git.postgresql.org and the old anoncvs server?
        regards, tom lane
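
A client can tolerate nonrepeating failures like the ones listed above by retrying the fetch a few times with a growing delay. The following is a minimal Python sketch of that idea, not the buildfarm client's actual code. The function names, the attempt count, the delays, and the "Connection timed out" marker are assumptions made for illustration; only the name-resolution message comes from the log quoted above.

```python
import subprocess
import time

# Substrings treated as transient. The first comes from the log quoted above;
# the second is an assumption about how a connection timeout is reported.
TRANSIENT_MARKERS = (
    "Temporary failure in name resolution",
    "Connection timed out",
)


def is_transient(stderr_text):
    """Return True if the error text looks like a transient network failure."""
    return any(marker in stderr_text for marker in TRANSIENT_MARKERS)


def run_with_retry(run_once, attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call run_once() until it succeeds, fails permanently, or attempts run out.

    run_once() must return (returncode, stderr_text).
    Returns (returncode, stderr_text, attempts_used).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        code, err = run_once()
        if code == 0:
            return code, err, attempt
        # Give up on errors that do not look transient, or on the last attempt.
        if not is_transient(err) or attempt == attempts:
            return code, err, attempt
        # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))


def git_fetch_once(repo_dir):
    """Run one 'git fetch origin' in repo_dir (hypothetical helper)."""
    proc = subprocess.run(
        ["git", "fetch", "origin"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr
```

It could be called as `run_with_retry(lambda: git_fetch_once("/path/to/repo"))`. Retrying only hides the symptom on the client side; it does nothing about the cause of the failures on the server.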


Re: Why so many buildfarm errors with contacting "git.postgresql.org"?

From: Magnus Hagander
On Wed, Dec 8, 2010 at 00:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've noticed that the frequency of nonrepeating fetch failures in the
> buildfarm seems to be a lot higher with git than it ever was with cvs.
>
> A typical example is today at:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sloth&dt=2010-12-07%2018%3A30%3A01
>
> fatal: Unable to look up git.postgresql.org (port 9418) (Temporary failure in name resolution)
>
> and similarly two days ago:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=colugos&dt=2010-12-05%2021%3A05%3A56
>
> 11 days ago:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypu&dt=2010-11-26%2021%3A05%3A02
>
> 28 days ago:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mongoose&dt=2010-11-09%2011%3A45%3A01
>
> 45 days ago:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=polecat&dt=2010-10-23%2018%3A49%3A59
>
> That's just name resolution failures; there are a similar number of
> Git-stage failures due to connection timeouts.  The problem appears
> to be getting worse with time :-(
>
> Is there any difference between the network connectivity of
> git.postgresql.org and the old anoncvs server?

Yes, they are in completely different datacenters.

I had a discussion with Stefan a couple of days ago about this, and
the current estimate is that we're simply hitting the bandwidth limit
of where it is now, because it now takes so much more traffic. We have
some space on another machine that we can move the VM to, so we'll be
looking at doing that when things have calmed down a bit after getting
back from PGDay.EU. This will cause some short downtime as DNS
switches over, so we'll post a note on exactly when we plan to do it,
once it's planned.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Why so many buildfarm errors with contacting "git.postgresql.org"?

From: Tom Lane
Magnus Hagander <magnus@hagander.net> writes:
> On Wed, Dec 8, 2010 at 00:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Is there any difference between the network connectivity of
>> git.postgresql.org and the old anoncvs server?

> Yes, they are in completely different datacenters.

> I had a discussion with Stefan a couple of days ago about this, and
> the current estimate is that we're simply hitting the bandwidth limit
> of where it is now, because it now takes so much more traffic. We have
> some space on another machine that we can move the VM to, so we'll be
> looking at doing that when things have calmed down a bit after getting
> back from PGDay.EU. This will cause some short downtime as DNS
> switches over, so we'll post a note on exactly when we plan to do it,
> once it's planned.

Sounds like a plan.  Thanks.

(BTW, since it's just a read-only clone of master, couldn't you avoid
downtime by duplicating the VM and running two in parallel until the DNS
change propagates fully?  Or are you just thinking it's not worth the
trouble?)
        regards, tom lane
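
Whether a DNS change has reached a given machine can be checked by resolving the name there and comparing the answer with the old and new addresses. The sketch below shows one way to do that in Python. The addresses in the usage note are placeholders from the documentation range, not the real servers, and the function names are hypothetical. It reports only what the local resolver returns, so it says nothing about propagation elsewhere.

```python
import socket


def resolved_addresses(host, port, resolver=socket.getaddrinfo):
    """Return the set of IP addresses the local resolver gives for host."""
    infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def classify(addresses, old, new):
    """Say whether the resolved addresses point at the old or new host."""
    if addresses and addresses <= new:
        return "new"
    if addresses and addresses <= old:
        return "old"
    return "mixed-or-unknown"
```

For example, `classify(resolved_addresses("git.postgresql.org", 9418), old={"192.0.2.10"}, new={"192.0.2.20"})` would report which of the two placeholder hosts this machine currently sees.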


Re: Why so many buildfarm errors with contacting "git.postgresql.org"?

From: Magnus Hagander
On Thu, Dec 9, 2010 at 23:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Wed, Dec 8, 2010 at 00:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Is there any difference between the network connectivity of
>>> git.postgresql.org and the old anoncvs server?
>
>> Yes, they are in completely different datacenters.
>
>> I had a discussion with Stefan a couple of days ago about this, and
>> the current estimate is that we're simply hitting the bandwidth limit
>> of where it is now, because it now takes so much more traffic. We have
>> some space on another machine that we can move the VM to, so we'll be
>> looking at doing that when things have calmed down a bit after getting
>> back from PGDay.EU. This will cause some short downtime as DNS
>> switches over, so we'll post a note on exactly when we plan to do it,
>> once it's planned.
>
> Sounds like a plan.  Thanks.
>
> (BTW, since it's just a read-only clone of master, couldn't you avoid
> downtime by duplicating the VM and running two in parallel until the DNS
> change propagates fully?  Or are you just thinking it's not worth the
> trouble?)

It's not just that. It also runs git hosting for a bunch of projects,
including but certainly not limited to pgadmin and slony, where it is
the master.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/