Re: robots.txt on git.postgresql.org - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: robots.txt on git.postgresql.org
Date
Msg-id CABUevEy9pS6ERtg3xqzo31wv_93br=AzEHfbeM5m4kidypgTRA@mail.gmail.com
In response to Re: robots.txt on git.postgresql.org  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On Thu, Jul 11, 2013 at 3:43 PM, Greg Stark <stark@mit.edu> wrote:
> On Wed, Jul 10, 2013 at 9:36 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> We already run this, that's what we did to make it survive at all. The
>> problem is there are so many thousands of different URLs you can get
>> to on that site, and google indexes them all by default.
>
> There's also https://support.google.com/webmasters/answer/48620?hl=en
> which lets us control how fast the Google crawler crawls. I think it's
> adaptive though so if the pages are slow it should be crawling slowly

Sure, but there are plenty of other search engines as well, not just
google... Google is actually "reasonably good" at scaling back its
own speed, in my experience. Which is not true of all the others. Of
course, it's also got the problem of it then taking a long time to
actually crawl the site, since there are so many different URLs...
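
[For context, a sketch of the kind of robots.txt directives under discussion
— this is a hypothetical example, not the actual file served by
git.postgresql.org. Crawl-delay asks a crawler to pace itself, though
Googlebot ignores it (Google's rate is set in Search Console instead), and
Disallow can fence off the expensive dynamic gitweb URLs; the path pattern
below is illustrative only:]

```
# Hypothetical robots.txt sketch, not the file actually deployed.
User-agent: *
# Seconds between requests; honored by e.g. Bing and Yandex,
# ignored by Googlebot.
Crawl-delay: 10
# Example pattern for expensive dynamic gitweb pages (illustrative).
Disallow: /gitweb/?a=blobdiff
```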

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


