Re: Search (was: Web team meeting minutes) - Mailing list pgsql-www
From | Oleg Bartunov |
---|---|
Subject | Re: Search (was: Web team meeting minutes) |
Date | |
Msg-id | Pine.GSO.4.63.0607141710490.2921@ra.sai.msu.su |
In response to | Re: Search (was: Web team meeting minutes) ("Dave Page" <dpage@vale-housing.co.uk>) |
Responses | Re: Search (was: Web team meeting minutes); Re: Search (was: Web team meeting minutes); Re: Search (was: Web team meeting minutes) |
List | pgsql-www |
Dave,

I see the main problem is not in the search engine, but in the site engine! It's just not database driven. So, I withdraw my words :)

Does the web team consider changing the web site engine? I suggest not using home-made engines, since we have no manpower to support them: we do database development, and we don't want to depend on a specific person. There are big open-source projects with stable, mature communities, and we could just add the FTS capability we need to, for example, Drupal.

Oleg

On Fri, 14 Jul 2006, Dave Page wrote:

>> -----Original Message-----
>> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
>> Sent: 14 July 2006 13:48
>> To: Dave Page
>> Cc: Magnus Hagander; pgsql-www@postgresql.org
>> Subject: RE: [pgsql-www] Web team meeting minutes
>>
>> I just wanted to say that the current search is not designed for
>> Web site indexing.
>
> Err, from the site:
>
> ASPseek is an Internet search engine software developed by SWsoft and
> licensed as free software under GNU GPL.
>
> ASPseek consists of an indexing robot, a search daemon, and a CGI search
> frontend. It can index as many as a few million URLs and search for
> words and phrases, use wildcards, and do a Boolean search. Search
> results can be limited to time period given, site or Web space (set of
> sites) and sorted by relevance (PageRank is used) or date.
>
>> Search, for example, for the latest news title "Open Technology
>> Group, Inc. announces plPHP training" and you'll get nothing! And it
>> will not be found until a new index gets built. This is exactly why
>> we've developed tsearch2: online indexing. If the documents are in
>> the database, then the only requirement is to set up tsearch2; if
>> not, then you need something like OpenFTS.
>
> Actually, our port of ASPseek can do online indexing: John added an XML
> feed through which you can insert index data directly (he used to use it
> to accept catalogue feeds from online resellers, iirc). The problem is
> that we don't have any way to stream the data off the website in that
> way, so we still end up crawling anyway.
>
> I do appreciate your point though, and if anyone can come up with a way
> to stream data from the website (perhaps just as part of the static
> build process), then it might be worth looking at. Archives would have
> the same problem, I guess: whilst it would be easy enough to index mail
> messages online, you have no way of knowing what the URL on
> archives.postgresql.org would be at that point, unless we fundamentally
> redesigned the entire archives site to run from the database.
>
> Regards, Dave.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
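The difference Oleg draws between a crawler-built index and online indexing, and Dave's point that the missing piece is a way to push pages (with their URLs) into the index as they are produced, can be illustrated with a toy model. The Python sketch below is not tsearch2, OpenFTS or ASPseek code: the classes `OnlineIndex` and `BatchIndex`, the regex tokenizer and the page URL are all hypothetical, chosen only to show why a freshly published news item is invisible to a batch index until the next rebuild but searchable at once when it is indexed at insert time.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a hypothetical stand-in for a real parser/stemmer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class OnlineIndex:
    """Hypothetical inverted index updated at insert time (the 'online' model)."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add_document(self, url: str, text: str) -> None:
        # Index immediately, so the page is searchable as soon as it is stored.
        for word in tokenize(text):
            self.postings[word].add(url)

    def search(self, query: str) -> set[str]:
        # Boolean AND over all query words; .get() avoids creating empty entries.
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


class BatchIndex(OnlineIndex):
    """Hypothetical index that only changes when rebuild() runs (the crawler model)."""

    def __init__(self) -> None:
        super().__init__()
        self.pending: dict[str, str] = {}

    def add_document(self, url: str, text: str) -> None:
        # New pages are only queued; they stay invisible until the next rebuild.
        self.pending[url] = text

    def rebuild(self) -> None:
        # Fold queued pages into the index, as a periodic re-crawl would.
        for url, text in self.pending.items():
            super().add_document(url, text)
        self.pending.clear()


if __name__ == "__main__":
    title = "Open Technology Group, Inc. announces plPHP training"
    page = "news/plphp-training"  # hypothetical URL
    online, batch = OnlineIndex(), BatchIndex()
    online.add_document(page, title)
    batch.add_document(page, title)
    print(online.search("plPHP training"))  # {'news/plphp-training'}
    print(batch.search("plPHP training"))   # set()
    batch.rebuild()
    print(batch.search("plPHP training"))   # {'news/plphp-training'}
```

In a real deployment, the `add_document` call would correspond either to updating a document's index column whenever its row is stored in the database (the tsearch2 approach Oleg describes) or to a build-time hook that pushes each generated page, together with its known URL, into the indexer's feed (the option Dave raises). Both are outside the scope of this sketch.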