Thread: Static mirror generation
Greetings.

I've committed a script for static mirror generation. Unlike previous such scripts, it is just a generic spider that follows links and has no knowledge of the site structure. This immediately helped to fix several problems.

The mirror of the website, not including docs, is generated in ~5 minutes. I don't yet know how long it will take with all the docs; I gave up after the first 1.5 hours.

I've set up a proof-of-concept static mirror at http://oc.cs.msu.su/pgorg/

The mirror uses Apache's content negotiation, so if Russian is set as the preferred language in your browser, it will come up in Russian, otherwise in English.
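For readers who have not seen the committed script: the following is only a minimal Python sketch of what "a generic spider that follows links" can look like, not the script from this thread. The names `crawl`, `http_fetch` and `LinkParser` are hypothetical, and the sketch assumes a breadth-first walk restricted to the starting host, with every non-200 response collected as a broken link. It treats every response body as HTML, which a real mirror script would not do for images and other binary files.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the values of href and src attributes found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def http_fetch(url):
    """Hypothetical default fetcher: return (status, body text) for a URL."""
    try:
        with urlopen(url, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as exc:
        return exc.code, ""
    except URLError:
        return 0, ""  # network-level failure, no HTTP status available


def crawl(start_url, fetch=http_fetch):
    """Breadth-first crawl of a single host.

    Returns (pages, broken): pages maps URL -> body for every 200 response,
    broken is a list of (status, url) for everything else.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages, broken = {}, []
    while queue:
        url = queue.popleft()
        status, body = fetch(url)
        if status != 200:
            broken.append((status, url))
            continue
        pages[url] = body
        parser = LinkParser()
        parser.feed(body)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages, broken
```

Because the spider knows nothing about the site structure, whatever the pages link to gets fetched, and whatever fails to load ends up in `broken`. Passing a different `fetch` callable makes it possible to try the logic against an in-memory site without touching the network.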
> -----Original Message-----
> From: pgsql-www-owner@postgresql.org
> [mailto:pgsql-www-owner@postgresql.org] On Behalf Of Alexey Borzov
> Sent: 17 June 2004 15:19
> To: pgsql-www@postgresql.org
> Subject: [pgsql-www] Static mirror generation
>
> Greetings.
>
> I've committed a script for static mirror generation. Unlike
> previous such scripts, it is just a generic spider that
> follows links and does not have any knowledge about the site
> structure. This immediately helped to fix several problems.
>
> The mirror of website not including docs is generated in ~5
> minutes. I don't yet know how long it will take with all the
> docs, got tired after the first 1.5 hours.

Meep, that's slow. The current build takes just a few minutes when the server is behaving. Still, the idea of using a crawler is a good one - at least that way nothing will get forgotten, and presumably it will create a report of any broken links?

> I've set up a proof-of-concept static mirror @
> http://oc.cs.msu.su/pgorg/
>
> The mirror uses Apache's content negotiation, so that if you
> have Russian set up as the preferred language in your
> browser, it'll come up in Russian, else in English.

OK, sounds good. Nice work :-)

Regards, Dave.
Hi,

Dave Page wrote:
>>The mirror of website not including docs is generated in ~5
>>minutes. I don't yet know how long it will take with all the
>>docs, got tired after the first 1.5 hours.
>
> Meep, that's slow. The current build takes just a few minutes when the
> server is behaving.

I suspect this is because of the limited resources allocated to the dev server. Marc may know better.

> Still, the idea of using a crawler is a good one -
> at least that way nothing will get forgotten, and presumably it will
> create a report of any broken links?

Yes, of course:

Jun 17 10:15:52 mirror [error] HTTP error 404 at page http://www.alexey.beta.postgresql.org/images/editorschoice2003.jpg
Jun 17 10:16:56 mirror [error] HTTP error 404 at page http://www.alexey.beta.postgresql.org/presskit/en/presskit74.html
Jun 17 10:17:31 mirror [error] HTTP error 404 at page http://www.alexey.beta.postgresql.org/pgsql-bugs@postgresql.org

These are in the news/events texts, I suppose. I couldn't find them in the files.
On Fri, 18 Jun 2004, Alexey Borzov wrote:

> Hi,
>
> Dave Page wrote:
>>> The mirror of website not including docs is generated in ~5 minutes. I
>>> don't yet know how long it will take with all the docs, got tired after
>>> the first 1.5 hours.
>>
>> Meep, that's slow. The current build takes just a few minutes when the
>> server is behaving.
>
> I suspect this is because of the limited resources allocated to the dev
> server. Marc may know better.

If it's the same server, and the current build takes minutes ... how could limited resources make the difference? It's the same resources whether using the current build or the spider ... :)

What I'm suspecting is that part of it is 'local machine' vs 'network lag' though ...

----
Marc G. Fournier  Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org  Yahoo!: yscrappy  ICQ: 7615664
Hi!

Marc G. Fournier wrote:
>> I suspect this is because of the limited resources allocated to the
>> dev server. Marc may know better.
>
> If it's the same server, and the current build takes minutes ... how
> could limited resources make the difference? it's the same resources
> whether using the current build, or the spider ... :)

I am confused here. You mean that www.postgresql.org and alexey.beta.postgresql.org are on the same machine?

> What I'm suspecting is that part of it is 'local machine' vs 'network
> lag' though ...

I suspect that has to do with the server load... I tried doing the same mirror feat now and it completed in less than a minute.

ab -n 1000 -c 10 http://alexey.beta.postgresql.org

gave me 10 requests per second, which is small for my tastes, but reasonable. Yesterday and earlier today, by contrast, I saw ridiculous ~1 second page generation times.
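To make the link between page generation time and total mirror time concrete, here is a back-of-the-envelope Python sketch. It assumes the spider fetches pages strictly one at a time, which the thread does not state, and the page count used below is purely illustrative, not a measured number for the site. The function name `sequential_crawl_seconds` is hypothetical.

```python
def sequential_crawl_seconds(page_count, seconds_per_page):
    """Estimated wall-clock time for a spider that fetches one page at a time.

    Assumes fetch time dominates and pages are requested strictly in sequence;
    both are simplifications made for this sketch.
    """
    return page_count * seconds_per_page


# Illustrative only: 5400 pages at ~1 second each is already 1.5 hours.
hours = sequential_crawl_seconds(5400, 1) / 3600
print(hours)  # 1.5
```

Under these assumptions the run time scales linearly with per-page latency, so a loaded server that takes ~1 second per page instead of a fraction of that could plausibly turn a run of minutes into one of hours.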
On Fri, 18 Jun 2004, Alexey Borzov wrote:

> Hi!
>
> Marc G. Fournier wrote:
>>> I suspect this is because of the limited resources allocated to the dev
>>> server. Marc may know better.
>>
>> If it's the same server, and the current build takes minutes ... how could
>> limited resources make the difference? it's the same resources whether
>> using the current build, or the spider ... :)
>
> I am confused here. You mean that www.postgresql.org and
> alexey.beta.postgresql.org are in the same machine?

Of course ...

> I suspect that has to do with the server load... I tried doing the same
> mirror feat now and it completed in less than a minute.

That could be ... it's why I'm ordering a Dual-Athlon ...

----
Marc G. Fournier  Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org  Yahoo!: yscrappy  ICQ: 7615664
On Fri, Jun 18, 2004 at 04:12:18PM -0300, Marc G. Fournier wrote:
> On Fri, 18 Jun 2004, Alexey Borzov wrote:
>
> >Hi!
> >
> >Marc G. Fournier wrote:
> >>>I suspect this is because of the limited resources allocated to
> >>>the dev server. Marc may know better.
> >>
> >>If it's the same server, and the current build takes minutes ...
> >>how could limited resources make the difference? it's the same
> >>resources whether using the current build, or the spider ... :)
> >
> >I am confused here. You mean that www.postgresql.org and
> >alexey.beta.postgresql.org are in the same machine?
>
> of course ...
>
> >I suspect that has to do with the server load... I tried doing the
> >same mirror feat now and it completed in less than a minute.
>
> that could be ... it's why I'm ordering a Dual-Athlon ...

BTW, I've got more interest from people to chip in personally. If this can wait 'til August, the PG Foundation should be able to cut you a check. If it can't, I can coordinate some donations.

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!
On Fri, 18 Jun 2004, David Fetter wrote:

> On Fri, Jun 18, 2004 at 04:12:18PM -0300, Marc G. Fournier wrote:
>> On Fri, 18 Jun 2004, Alexey Borzov wrote:
>>
>>> Hi!
>>>
>>> Marc G. Fournier wrote:
>>>>> I suspect this is because of the limited resources allocated to
>>>>> the dev server. Marc may know better.
>>>>
>>>> If it's the same server, and the current build takes minutes ...
>>>> how could limited resources make the difference? it's the same
>>>> resources whether using the current build, or the spider ... :)
>>>
>>> I am confused here. You mean that www.postgresql.org and
>>> alexey.beta.postgresql.org are in the same machine?
>>
>> of course ...
>>
>>> I suspect that has to do with the server load... I tried doing the
>>> same mirror feat now and it completed in less than a minute.
>>
>> that could be ... it's why I'm ordering a Dual-Athlon ...
>
> BTW, I've got more interest from people to chip in personally. If
> this can wait 'til August, the PG Foundation should be able to cut you
> a check. If it can't, I can coordinate some donations.

Once the money I'm expecting gets in (it's "in transit"), I'm going to be ordering both the new server and the new switch ... If the one quote I got today from a supplier in the US is any indication of what I can get, then as long as I can co-ordinate shipping from there to the co-lo facility, the savings over what it would cost me either here or in Panama would allow me to pick up servers a bit more often ...

The one quote I got was ~3/5ths the cost I was quoted in Panama, and about the same savings over what it would cost me here in Canada ... I always knew I lived in the wrong country for some things *sigh*

----
Marc G. Fournier  Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org  Yahoo!: yscrappy  ICQ: 7615664
> -----Original Message-----
> From: Alexey Borzov [mailto:borz_off@cs.msu.su]
> Sent: Fri 6/18/2004 7:56 PM
> To: Marc G. Fournier
> Cc: Dave Page; pgsql-www@postgresql.org
> Subject: Re: [pgsql-www] Static mirror generation
>
> ab -n 1000 -c 10 http://alexey.beta.postgresql.org
> gave me 10 requests per second, which is small for my tastes, but reasonable.
> While yesterday / earlier today I saw ridiculous ~1 second page generation times.

My guess is that you tested whilst the db backup was running. I've been caught that way before - serves us right for working in the middle of the night Canadian time :-)

Regards, Dave