Thread: Search engine

Search engine

From
Magnus Hagander
Date:
In case people didn't notice from the website, or from the commits going
 in, I have now finally activated the new tsearch2 based search engine
on search.postgresql.org.

All code is in cvs, so if you want to improve on it, go right ahead :-)


//Magnus

Re: Search engine

From
"Joshua D. Drake"
Date:
On Mon, 2006-12-18 at 21:19 +0100, Magnus Hagander wrote:
> In case people didn't notice from the website, or from the commits going
>  in, I have now finally activated the new tsearch2 based search engine
> on search.postgresql.org.
>
> All code is in cvs, so if you want to improve on it, go right ahead :-)

I would like to take a moment and thank Magnus for his hard work on
this. search.postgresql.org is now humming along quite nicely since we
are using a PostgreSQL native backend.

Joshua D. Drake


>
>
> //Magnus
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>
--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




Re: Search engine

From
Oleg Bartunov
Date:
On Mon, 18 Dec 2006, Magnus Hagander wrote:

> In case people didn't notice from the website, or from the commits going
> in, I have now finally activated the new tsearch2 based search engine
> on search.postgresql.org.
>
> All code is in cvs, so if you want to improve on it, go right ahead :-)

Just a thought.
I'd think about adding a title component to the rank when searching the
PostgreSQL documentation, so 'create table' would return the latest version (8.2) first.
Or just use creation date.

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Search engine

From
"John Hansen"
Date:
Two things:

Navigate to http://www.postgresql.org/docs/8.2/static/index.html and search for 'create table'

    Pages 1-20 of 247.

Click 'next'

    Pages 21-40 of more than 1000.

Obviously, the hyperlinks need to include the &u=/docs/8.2/static/

Archives search is slow. (5+ seconds to search all lists)

Apart from that, bloody good work. Thumbs up.


... John

> -----Original Message-----
> From: pgsql-www-owner@postgresql.org
> [mailto:pgsql-www-owner@postgresql.org] On Behalf Of Magnus Hagander
> Sent: Tuesday, December 19, 2006 7:20 AM
> To: pgsql-www@postgresql.org
> Subject: [pgsql-www] Search engine
>
> In case people didn't notice from the website, or from the
> commits going  in, I have now finally activated the new
> tsearch2 based search engine on search.postgresql.org.
>
> All code is in cvs, so if you want to improve on it, go right
> ahead :-)
>
>
> //Magnus

Re: Search engine

From
Dave Page
Date:
Magnus Hagander wrote:
> In case people didn't notice from the website, or from the commits going
>  in, I have now finally activated the new tsearch2 based search engine
> on search.postgresql.org.

Nice work - glad to see it's up and running now :-)

So, to keep you on your toes, here's a couple of bugs :-p

- On the archives search, the pgAdmin lists are under Community lists,
as are sfpug and sydpug. I would expect to see the latter two under
Regional, and the pgAdmin ones under Project lists. The list should
probably mirror the layout on the archives index page.

- Site restrictions are lost in the page links. For example, search
within a document set, and the first page returned is correctly limited
to that docset. Click on a page link, and the restriction is lost.

Regards, Dave.
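
The lost-restriction bug reported above can be illustrated with a minimal sketch of page-link generation that carries every active filter along. Only the `u` parameter comes from this thread; the `/search` path, the `q` and `p` parameter names, and the `page_link` helper are assumptions made for illustration, not the site's actual code.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def page_link(base, query, page, site_restriction=None):
    """Build a result-page link that keeps the site restriction, if any."""
    # "q" and "p" are hypothetical parameter names; "u" is the one
    # mentioned in the thread (e.g. u=/docs/8.2/static/).
    params = {"q": query, "p": page}
    if site_restriction:
        params["u"] = site_restriction
    return base + "?" + urlencode(params)


# Link to page 2 of a search restricted to the 8.2 docs.
link = page_link("/search", "create table", 2, "/docs/8.2/static/")

# Decoding the link shows that the restriction survived.
decoded = parse_qs(urlsplit(link).query)
print(decoded["u"])
```

If the `u` parameter is left out when the "next" link is built, the second page searches the whole site, which would match the jump from 247 hits to "more than 1000".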

Re: Search engine

From
Magnus Hagander
Date:
On Tue, Dec 19, 2006 at 08:53:45AM +0300, Oleg Bartunov wrote:
> On Mon, 18 Dec 2006, Magnus Hagander wrote:
>
> >In case people didn't notice from the website, or from the commits going
> >in, I have now finally activated the new tsearch2 based search engine
> >on search.postgresql.org.
> >
> >All code is in cvs, so if you want to improve on it, go right ahead :-)
>
> Just a thought.
> I'd think about adding a title component to the rank when searching the
> PostgreSQL documentation, so 'create table' would return the latest version (8.2) first.
> Or just use creation date.

It's on my TODO to be able to do "suburl weighting", which would then
mean that we could rank, say, /docs/current/ higher than anything else,
etc. It's just not done yet; I wanted to get the basics out there first.

//Magnus
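
One way the suburl weighting idea could work is sketched below: each hit's base rank is multiplied by a weight chosen from the longest matching URL prefix. The weight values, the `SUBURL_WEIGHTS` table, the helper names, and the example paths are all hypothetical; the thread only states the goal of ranking /docs/current/ higher.

```python
# Hypothetical weights; the longest matching URL prefix wins.
SUBURL_WEIGHTS = {
    "/docs/current/": 2.0,
    "/docs/": 1.0,
}
DEFAULT_WEIGHT = 1.0


def suburl_weight(path):
    """Return the weight of the longest prefix in SUBURL_WEIGHTS matching path."""
    best = ""
    for prefix in SUBURL_WEIGHTS:
        if path.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    return SUBURL_WEIGHTS.get(best, DEFAULT_WEIGHT)


def rerank(hits):
    """Sort (path, base_rank) pairs by weighted rank, highest first."""
    return sorted(hits, key=lambda h: h[1] * suburl_weight(h[0]), reverse=True)


hits = [
    ("/docs/8.1/static/sql-createtable.html", 0.5),
    ("/docs/current/static/sql-createtable.html", 0.4),
]
for path, base_rank in rerank(hits):
    print(path, base_rank * suburl_weight(path))
```

With these made-up numbers the /docs/current/ page moves ahead of the older one even though its base rank is lower.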

Re: Search engine

From
Magnus Hagander
Date:
On Tue, Dec 19, 2006 at 07:01:34PM +1100, John Hansen wrote:
> Two things:
>
> Navigate to http://www.postgresql.org/docs/8.2/static/index.html and search for 'create table'
>
>     Pages 1-20 of 247.
>
> Click 'next'
>
>     Pages 21-40 of more than 1000.
>
> Obviously, the hyperlinks need to include the &u=/docs/8.2/static/

Should be fixed now, thanks for pointing it out.


> Archives search is slow. (5+ seconds to search all lists)

Depends on what you search for ;-) What did you search for?

The problem is that the search is very fast (single-digit milliseconds
much of the time), but if you get several thousand hits it takes a while to
sort them.
It could also be that you search for something that had been forced out
of the cache - happens for example when the backups run. Then the first
search takes a while, but it goes faster with time.


> Apart from that, bloody good work. Thumbs up.

Thanks!

//Magnus

Re: Search engine

From
Magnus Hagander
Date:
On Tue, Dec 19, 2006 at 08:22:13AM +0000, Dave Page wrote:
> Magnus Hagander wrote:
> >In case people didn't notice from the website, or from the commits going
> > in, I have now finally activated the new tsearch2 based search engine
> >on search.postgresql.org.
>
> Nice work - glad to see it's up and running now :-)
>
> So, to keep you on your toes, here's a couple of bugs :-p

But of course ;-)

> - On the archives search, the pgAdmin lists are under Community lists,
> as are sfpug and sydpug. I would expect to see the latter two under
> Regional, and the pgAdmin ones under Project lists. The list should
> probably mirror the layout on the archives index page.

It all lives in the database, so that's an easy fix. Are you saying that
the "Community lists" header should go away completely?

I have moved the regional ones already, will do pgadmin when you confirm
that :-)


> - Site restrictions are lost in the page links. For example, search
> within a document set, and the first page returned is correctly limited
> to that docset. Click on a page link, and the restriction is lost.

Fixed now.

//Magnus

Re: Search engine

From
Dave Page
Date:
Magnus Hagander wrote:
> On Tue, Dec 19, 2006 at 08:22:13AM +0000, Dave Page wrote:
>> Magnus Hagander wrote:
>>> In case people didn't notice from the website, or from the commits going
>>> in, I have now finally activated the new tsearch2 based search engine
>>> on search.postgresql.org.
>> Nice work - glad to see it's up and running now :-)
>>
>> So, to keep you on your toes, here's a couple of bugs :-p
>
> But of course ;-)
>
>> - On the archives search, the pgAdmin lists are under Community lists,
>> as are sfpug and sydpug. I would expect to see the latter two under
>> Regional, and the pgAdmin ones under Project lists. The list should
>> probably mirror the layout on the archives index page.
>
> It all lives in the database, so that's an easy fix. Are you saying that
> the "Community lists" header should go away completely?
>
> I have moved the regional ones already, will do pgadmin when you confirm
> that :-)

I'm saying the structure should follow that of the side menu on
http://archives.postgresql.org/

# User lists

     * pgsql-admin
     * pgsql-advocacy
     * pgsql-announce
     * pgsql-bugs
     * pgsql-docs
     * pgsql-cygwin
     * pgsql-general
     * pgsql-interfaces
     * pgsql-jdbc
     * pgsql-jobs
     * pgsql-novice
     * pgsql-odbc
     * pgsql-performance
     * pgsql-php
     * pgsql-ports
     * pgsql-sql

# Developer lists

     * pgsql-committers
     * pgsql-hackers
     * pgsql-patches
     * pgsql-www

# Regional lists

     * pgsql-de-allgemein
     * pgsql-es-ayuda
     * pgsql-fr-generale
     * pgsql-ru-general
     * pgsql-tr-genel

# Project lists

     * pgadmin-hackers
     * pgadmin-support

# User groups

     * San Francisco
     * Sydney

# Inactive lists

     * pgsql-benchmarks
     * pgsql-chat
     * pgsql-hackers-win32

So I was wrong about the PUGs - they have their own section. Sorry :-)

/D

Re: Search engine

From
"John Hansen"
Date:
Magnus Hagander Wrote:

> > Archives search is slow. (5+ seconds to search all lists)
>
> Depends on what you search for ;-) What did you search for?

'create table'

> The problem is that the search is very fast (single-digit
> milliseconds much of the time), but if you get several thousand
> hits it takes a while to sort them.
> It could also be that you search for something that had been
> forced out of the cache - happens for example when the
> backups run. Then the first search takes a while, but it goes
> faster with time.
>
>
> > Apart from that, bloody good work. Thumbs up.
>
> Thanks!
>
> //Magnus
>

Re: Search engine

From
Magnus Hagander
Date:
John Hansen wrote:
> Magnus Hagander Wrote:
>
>>> Archives search is slow. (5+ seconds to search all lists)
>> Depends on what you search for ;-) What did you search for?
>
> 'create table'

Yeah, that one is definitely the sort. Just the search over all lists
for create table takes about 170ms and returns about 6000 hits.
Calculating rank value and sorting it takes the rest :(

//Magnus
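
The figures in this exchange allow a rough back-of-the-envelope estimate of the per-hit ranking cost. The sketch assumes the reported 5 seconds is the total for the same 'create table' query and that network and page rendering time are negligible; neither is confirmed in the thread, and "5+" makes this a lower bound.

```python
# Figures taken from the thread; treating 5 s as the whole request is an assumption.
total_ms = 5000.0   # "5+ seconds to search all lists"
search_ms = 170.0   # index search over all lists
hits = 6000         # approximate number of hits

# Time left for calculating rank values and sorting.
rank_sort_ms = total_ms - search_ms
per_hit_ms = rank_sort_ms / hits

print(f"rank + sort: {rank_sort_ms:.0f} ms, about {per_hit_ms:.3f} ms per hit")
```

Under these assumptions the ranking and sorting step accounts for nearly all of the elapsed time, and the cost grows with the number of hits rather than with the size of the index.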

Re: Search engine

From
Robert Treat
Date:
On Tuesday 19 December 2006 17:13, Magnus Hagander wrote:
> John Hansen wrote:
> > Magnus Hagander Wrote:
> >>> Archives search is slow. (5+ seconds to search all lists)
> >>
> >> Depends on what you search for ;-) What did you search for?
> >
> > 'create table'
>
> Yeah, that one is definitely the sort. Just the search over all lists
> for create table takes about 170ms and returns about 6000 hits.
> Calculating rank value and sorting it takes the rest :(
>

Maybe we could add a "click here to see the explain analyze of your search
query" button.  :-)

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

Re: Search engine

From
Oleg Bartunov
Date:
On Thu, 21 Dec 2006, Robert Treat wrote:

> On Tuesday 19 December 2006 17:13, Magnus Hagander wrote:
>> John Hansen wrote:
>>> Magnus Hagander Wrote:
>>>>> Archives search is slow. (5+ seconds to search all lists)
>>>>
>>>> Depends on what you search for ;-) What did you search for?
>>>
>>> 'create table'
>>
>> Yeah, that one is definitely the sort. Just the search over all lists
>> for create table takes about 170ms and returns about 6000 hits.
>> Calculating rank value and sorting it takes the rest :(
>>
>
> Maybe we could add a "click here to see the explain analyze of your search
> query" button.  :-)

Just to make the problem clear: an ordinary search engine doesn't have this
problem because all the information needed for ranking is available from the
index itself.

A database-based search engine like tsearch2 should be able to work even
without an index, so after the search is done, which is very fast, we need to
consult the heap to get positional information, weights, etc.
Also, since the GiST index is lossy, we must check the heap to exclude false hits.

I'm wondering if we could store positional information in a GIN index,
which is not lossy!

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
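
The recheck step for a lossy index can be illustrated with a toy signature index. This is only a sketch of the general idea, not tsearch2's actual GiST implementation: the 16-bit signature width, the CRC32 hash, and the function names are assumptions. Each document is reduced to a small bitmask, so different words can share a bit; the index never misses a real match, but it may return false hits that have to be excluded by looking at the stored text (the "heap").

```python
import zlib

SIG_BITS = 16  # deliberately tiny, so bit collisions (false hits) are possible


def signature(words):
    """Fold a collection of words into a SIG_BITS-wide bitmask."""
    sig = 0
    for word in words:
        sig |= 1 << (zlib.crc32(word.encode()) % SIG_BITS)
    return sig


def search(docs, query_words):
    """Return (candidates, confirmed) for documents containing all query words."""
    index = {doc_id: signature(text.split()) for doc_id, text in docs.items()}
    qsig = signature(query_words)
    # Index pass: keep documents whose signature covers every query bit.
    candidates = [d for d, sig in index.items() if sig & qsig == qsig]
    # Recheck pass: read the real text to exclude false hits.
    wanted = set(query_words)
    confirmed = [d for d in candidates if wanted <= set(docs[d].split())]
    return candidates, confirmed


docs = {
    1: "create table syntax",
    2: "drop index concurrently",
    3: "alter table add column",
}
candidates, confirmed = search(docs, ["create", "table"])
print(candidates, confirmed)
```

The recheck is the expensive part because it touches every candidate's stored text, which is the same place the positional information needed for ranking lives.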

Re: Search engine

From
Magnus Hagander
Date:
Dave Page wrote:
>>> - On the archives search, the pgAdmin lists are under Community
>>> lists, as are sfpug and sydpug. I would expect to see the latter two
>>> under Regional, and the pgAdmin ones under Project lists. The list
>>> should probably mirror the layout on the archives index page.
>>
>> It all lives in the database, so that's an easy fix. Are you saying that
>> the "Community lists" header should go away completely?
>>
>> I have moved the regional ones already, will do pgadmin when you confirm
>> that :-)
>
> I'm saying the structure should follow that of the side menu on
> http://archives.postgresql.org/

Updated per this structure.

//Magnus