Thread: New redirector
FYI - I've committed a new version of the URL redirector for downloads.

The old version was being used for linkfilter-breakthrough to distribute viruses :-(

Since I was hacking around that code anyway, I didn't just add a filter to it, but changed around how it works a bit. Apart from it no longer being possible to use it to break through stupid linkblockers, it has also made the URLs easier to read and copy/paste, and we're also storing the logging information in a way that's much easier to analyze than before.

Do keep your eyes open for bugs, of course :-)

//Magnus
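For context: a redirector that forwards to whatever URL it is handed can be used to disguise links, which is presumably what the link-filter bypass relied on. The following is a minimal sketch of the general fix, not the actual website code. It assumes a `/redir/<mirrorid>/<path>` URL shape (the path form mentioned later in this thread) and a fixed mirror table; the table contents, the hostnames, and the function name `resolve_redirect` are hypothetical.

```python
# Hypothetical mirror table. In a real deployment this would come from
# the website database; the IDs and hostnames here are made up.
MIRRORS = {
    1: "https://mirror-one.example.org/pub/postgresql/",
    2: "https://mirror-two.example.net/postgresql/",
}


def resolve_redirect(request_path):
    """Map /redir/<mirrorid>/<path> to a URL on a known mirror.

    Returns the target URL, or None if the request does not name a
    known mirror and a plain relative path.
    """
    parts = request_path.split("/", 3)
    # Expect ['', 'redir', '<mirrorid>', '<path>'].
    if len(parts) != 4 or parts[0] != "" or parts[1] != "redir":
        return None

    try:
        mirror_id = int(parts[2])
    except ValueError:
        return None

    base = MIRRORS.get(mirror_id)
    if base is None:
        return None

    rel = parts[3]
    # Refuse anything that could point at another host or climb out of
    # the mirror's base directory.
    if not rel or rel.startswith("/") or "://" in rel or ".." in rel.split("/"):
        return None

    return base + rel
```

Because the target host always comes from the table and never from the request, a link such as `/redir/1/http://evil.example/x` is refused instead of being forwarded.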
Magnus Hagander wrote:
> FYI - I've committed a new version of the URL redirector for downloads.
>
> The old version was being used for linkfilter-breakthrough to distribute
> viruses :-(
>
> Since I was hacking around that code anyway, I didn't just add a filter
> to it, but changed around how it works a bit. Apart from it no longer
> being possible to use it to break through stupid linkblockers, it has
> also made the URLs easier to read and copy/paste, and we're also storing
> the logging information in a way that's much easier to analyze than before.
>
> Do keep your eyes open for bugs, of course :-)

I have reverted the part of this that changes the format for logging, because it turned out to be impossible to wrestle the stackbuilder traffic logging onto that format: stackbuilder uses the redirector to log arbitrary downloads, not just things coming off our mirror network. Also, it seems that the mirror id primary key can change around and should not be used for logging.

I was not aware of these things, my apologies.

I think we're fine just losing the info on the roughly 500 downloads that were logged into the new logging table. We could reconstruct the old format from it, but I don't think it's worth it.

There should be no end-user visible changes in this revert, only the backend logging.

//Magnus
Magnus Hagander wrote:
> Magnus Hagander wrote:
>> FYI - I've committed a new version of the URL redirector for downloads.
>>
>> The old version was being used for linkfilter-breakthrough to distribute
>> viruses :-(
>>
>> Since I was hacking around that code anyway, I didn't just add a filter
>> to it, but changed around how it works a bit. Apart from it no longer
>> being possible to use it to break through stupid linkblockers, it has
>> also made the URLs easier to read and copy/paste, and we're also storing
>> the logging information in a way that's much easier to analyze than before.
>>
>> Do keep your eyes open for bugs, of course :-)
>
> I have reverted the part of this that changes the format for logging,
> because it turned out that it was impossible to wrestle the stackbuilder
> traffic logging onto that format - since stackbuilder uses the
> redirector to log arbitrary downloads, and not just things coming off
> our mirror network. Also it seems that the mirror id primary key can
> change around, and should not be used for logging.

Meh, another misunderstanding there. The primary key doesn't change, only the mirror index. I got them mixed up.

//Magnus
On Sat, Dec 20, 2008 at 5:05 PM, Magnus Hagander <magnus@hagander.net> wrote:
> FYI - I've committed a new version of the URL redirector for downloads.
>
> The old version was being used for linkfilter-breakthrough to distribute
> viruses :-(

FYI, I just removed 242769 log records from the clickthrus table. Some were created by this issue, a few were mis-parsed URLs and such.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
Magnus Hagander wrote:
> FYI - I've committed a new version of the URL redirector for downloads.
>
> The old version was being used for linkfilter-breakthrough to distribute
> viruses :-(
>
> Since I was hacking around that code anyway, I didn't just add a filter
> to it, but changed around how it works a bit. Apart from it no longer
> being possible to use it to break through stupid linkblockers, it has
> also made the URLs easier to read and copy/paste, and we're also storing
> the logging information in a way that's much easier to analyze than before.
>
> Do keep your eyes open for bugs, of course :-)

This change broke most of the website replication code and is close to running some of the website mirrors out of disk space. It seems that the mirror script is now copying tons of /redir/<mirrorid> directories to the slaves, and some of them contain individual copies of the full source tarball for all active releases. This causes both disk-usage issues and very long sync times between wwwmaster and the slaves...

I don't have time to look into that more closely now, so it would be good if somebody else could.

Stefan
On 24 dec 2008, at 10.24, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:
> Magnus Hagander wrote:
>> FYI - I've committed a new version of the URL redirector for downloads.
>> The old version was being used for linkfilter-breakthrough to distribute
>> viruses :-(
>> Since I was hacking around that code anyway, I didn't just add a filter
>> to it, but changed around how it works a bit. Apart from it no longer
>> being possible to use it to break through stupid linkblockers, it has
>> also made the URLs easier to read and copy/paste, and we're also storing
>> the logging information in a way that's much easier to analyze than before.
>> Do keep your eyes open for bugs, of course :-)
>
> this change broke most of the website replication code and is close
> to running out some of the website mirrors out of diskspace. It
> seems that the mirror script is now copying tons of /redir/<mirrorid>
> directories to the slaves and some of them contain individual copies
> of the full source tarball for all active releases.
> This causes both disk-usage related issues as well as very long
> sync-times between wwwmaster and the slaves...
> I don't have time to look into that more closely now so it would be
> good if somebody else could

Oh shit...

It shouldn't crawl explicit links to wwwmaster, I thought :( Perhaps some place is forgetting to make it explicit?

If not, then just making it exclude everything under /redir/ when mirroring should do the trick.

Unfortunately it'll be a while before I can look at it, so I'd appreciate it if someone else could!

/Magnus
On Wed, Dec 24, 2008 at 10:08 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Oh shit...
>
> It shouldn't crawl explicit links to wwwmaster, I thought :( perhaps some
> place is forgetting to make it explicit?

Nice work :-)

> If not, then just making it exclude everything under /redir/ when mirroring
> should do the trick.

Yeah - the exclude list had redir\? but not redir /. Added now.

Hmmm...

Update has been running since 2008-12-23T22:00:00+00:00 (12 hours 25 minutes 41 seconds).

Killed that as well, rm -rf'd the static/redir/ directory and requested a sync.

I've gotta leave to do Christmas now... have a good one :-)

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
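To illustrate why an exclude entry written for the old URL form would miss the new one, here is a rough sketch. It assumes the mirror script treats each exclude entry as a regular expression searched against the URL, which this thread does not confirm. The example URLs, the list names, and the `is_excluded` helper are hypothetical, and the second pattern is only a guess at what the added entry looks like.

```python
import re

# Assumption: exclude entries are regular expressions. `redir\?` matches
# "redir" followed by a literal "?", i.e. a query-string style URL.
OLD_EXCLUDES = [r"redir\?"]

# Hypothetical updated list that also covers the path-style form.
NEW_EXCLUDES = [r"redir\?", r"redir/"]


def is_excluded(url, patterns):
    """Return True if any exclude pattern is found anywhere in the URL."""
    return any(re.search(pattern, url) for pattern in patterns)


# Hypothetical URLs, for illustration only.
old_style = "/redir?target"
new_style = "/redir/12/source/file.tar.gz"

print(is_excluded(old_style, OLD_EXCLUDES))  # old form is skipped
print(is_excluded(new_style, OLD_EXCLUDES))  # new form slips through
print(is_excluded(new_style, NEW_EXCLUDES))  # new form is skipped
```

With only the first pattern, every path-style link is crawled and copied, which fits the symptoms described earlier in the thread.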