Re: Search (was: Web team meeting minutes) - Mailing list pgsql-www

From Joshua D. Drake
Subject Re: Search (was: Web team meeting minutes)
Date
Msg-id 44B7AFE4.8020808@commandprompt.com
In response to Re: Search (was: Web team meeting minutes)  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-www
> It's just not database driven. So, I withdraw my words :)
> Is the web team considering changing the web site engine? I suggest not
> using home-made engines, since we have no capacity to support them; we do
> database development, and we don't want to depend on a specific person.
> There are big open-source projects with stable, mature communities, and we
> could just add the fts capability we need to, for example, Drupal.

Oleg, this has been a huge undertaking to get the site where it is today.
I doubt we will be changing soon, and definitely not to Drupal of all
things :)

Drupal is nasty on all kinds of levels under the covers. If we switched
to Drupal we would have to (at a minimum):

1. Ensure that it runs with the latest PHP (looks like they support 5 but no
mention of 5.1)

2. Completely rework the database schema so it looks like someone who
understands databases actually worked on the product

3. Rip out their search and replace it.

And that is all before we get to work on getting content in there.

Also keep in mind that Drupal's support for PostgreSQL is very limited.

If I sound sour, I apologize... I have been working a lot lately on
getting Drupal to be as cool as the website says.

Sincerely,

Joshua D. Drake





>
> Oleg
> On Fri, 14 Jul 2006, Dave Page wrote:
>
>>
>>
>>> -----Original Message-----
>>> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
>>> Sent: 14 July 2006 13:48
>>> To: Dave Page
>>> Cc: Magnus Hagander; pgsql-www@postgresql.org
>>> Subject: RE: [pgsql-www] Web team meeting minutes
>>>
>>> I just wanted to say, that current search is not designed for
>>> Web site indexing.
>>
>> Err, from the site:
>>
>> ASPseek is an Internet search engine software developed by SWsoft and
>> licensed as free software under GNU GPL.
>>
>> ASPseek consists of an indexing robot, a search daemon, and a CGI search
>> frontend. It can index as many as a few million URLs and search for
>> words and phrases, use wildcards, and do a Boolean search. Search
>> results can be limited to time period given, site or Web space (set of
>> sites) and sorted by relevance (PageRank is used) or date.
>>
>>> Search for, for example, the latest news title "Open Technology Group,
>>> Inc. announces plPHP training" and you'll get nothing! And it will not
>>> be searchable until a new index gets built. This is exactly why we've
>>> developed tsearch2 - online indexing. If documents are in the database,
>>> then the only requirement is to set up tsearch2; if not, then you need
>>> something like openfts.
>>
>> Actually our port of Aspseek can do online indexing - John added an XML
>> feed in which you can directly insert index data (he used to use it to
>> accept catalogue feeds from online resellers iirc). The problem is that
>> we don't have any way to stream the data off the website in that way, so
>> we still end up crawling anyway.
>>
>> I do appreciate your point though, and if anyone can come up with a way
>> to stream data from the website (perhaps just as part of the static
>> build process) then it might be worth looking at. Archives would have
>> the same problems I guess - whilst it would be easy enough to index mail
>> messages online, you have no way of knowing what the URL on
>> archives.postgresql.org would be at that point, unless we fundamentally
>> redesigned the entire archives site to run from the database.
>>
>> Regards, Dave.
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83


--

    === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
    Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/


