Re: robots.txt on git.postgresql.org - Mailing list pgsql-hackers

From Dave Page
Subject Re: robots.txt on git.postgresql.org
Date
Msg-id CA+OCxoyOiOLbk8PM_HJCfnNj=uxgmOYz+cA4s40CUm9vWYSOeA@mail.gmail.com
In response to Re: robots.txt on git.postgresql.org  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On Wed, Jul 10, 2013 at 9:25 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 07/09/2013 11:30 PM, Andres Freund wrote:
>> On 2013-07-09 16:24:42 +0100, Greg Stark wrote:
>>> I note that git.postgresql.org's robots.txt refuses permission to crawl
>>> the git repository:
>>>
>>> http://git.postgresql.org/robots.txt
>>>
>>> User-agent: *
>>> Disallow: /
>>>
>>>
>>> I'm curious what motivates this. It's certainly useful to be able to
>>> search for commits.
>>
>> Gitweb is horribly slow. I don't think anybody with a bigger git repo
>> using gitweb can afford to let all the crawlers go through it.
>
> Wouldn't whacking a reverse proxy in front be a pretty reasonable
> option? There's a disk space cost, but using Apache's mod_proxy or
> similar would do quite nicely.

It's already sitting behind Varnish, but the vast majority of pages on
that site would only ever be hit by crawlers anyway, so I doubt caching
would help a great deal: those pages would likely expire from the cache
before they were ever requested a second time.
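One option sometimes used in this situation is to give the crawler-only pages a much longer TTL in Varnish, on the theory that gitweb commit pages are effectively immutable once published. A minimal VCL sketch of that idea (the URL pattern and TTL here are illustrative assumptions, not the actual git.postgresql.org configuration):

```vcl
vcl 4.0;

sub vcl_backend_response {
    # Hypothetical: match gitweb pages; the real path layout may differ.
    if (bereq.url ~ "^/gitweb") {
        # Commit pages rarely change once published, so cache them
        # far longer than the default TTL.
        set beresp.ttl = 7d;
    }
}
```

Whether this helps depends on the crawl pattern: if crawlers rarely revisit the same page within the TTL, even a long-lived cache saves little backend work.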

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


