Re: website doc search is extremely SLOW - Mailing list pgsql-general
From | John Sidney-Woollett |
---|---|
Subject | Re: website doc search is extremely SLOW |
Date | |
Msg-id | 3546.192.168.0.64.1072873007.squirrel@mercury.wardbrook.com |
In response to | Re: website doc search is extremely SLOW (Dave Cramer <pg@fastcrypt.com>) |
Responses | Re: website doc search is extremely SLOW |
List | pgsql-general |
I think that Oleg's new search offering looks really good and fast. (I
can't wait till I have some task that needs tsearch!).

I agree with Dave that searching the docs is more important for me than
the sites - but it would be really nice to have both, in one tool.

I built something similar for the Tate Gallery in the UK - here you can
select the type of content that you want returned, either static pages or
dynamic. You can see the idea at

http://www.tate.org.uk/search/default.jsp?terms=sunset%20oil&action=new

This is custom built (using java/Oracle), supports stemming, boolean
operators, exact phrase matching, relevancy and matched term highlighting.

You can switch on/off the types of documents that you are not interested
in. Using this analogy, a search facility that could offer you results
from i) the docs and/or ii) the postgres sites static pages would be very
useful.

John Sidney-Woollett

Dave Cramer said:
> Marc,
>
> No it doesn't spider, it is a specialized tool for searching documents.
>
> I'm curious, what value is there to being able to count the number of
> url's ?
>
> It does do things like query all documents where CREATE AND TABLE are n
> words apart, just as fast, I would think these are more valuable to
> document searching?
>
> I think the challenge here is what do we want to search. I am betting
> that folks use this page as they would man? ie. what is the command for
> create trigger?
>
> As I said my offer stands to help out, but I think if the goal is to
> search the entire website, then this particular tool is not useful.
>
> At this point I am working on indexing the sgml directly as it has less
> cruft in it. For instance all the links that appear in every summary are
> just noise.
>
>
> Dave
>
> On Wed, 2003-12-31 at 00:44, Marc G. Fournier wrote:
>> On Wed, 31 Dec 2003, Dave Cramer wrote:
>>
>> > I can modify mine to be client server if you want?
>> >
>> > It is a java app, so we need to be able to run jdk1.3 at least?
>>
>> jdk1.4 is available on the VMs ... does your spider? for instance, you
>> mention that you have the docs indexed right now, but we are currently
>> indexing:
>>
>> Server http://archives.postgresql.org/
>> Server http://advocacy.postgresql.org/
>> Server http://developer.postgresql.org/
>> Server http://gborg.postgresql.org/
>> Server http://pgadmin.postgresql.org/
>> Server http://techdocs.postgresql.org/
>> Server http://www.postgresql.org/
>>
>> will it be able to handle:
>>
>> 186_archives=# select count(*) from url;
>>  count
>> --------
>>  393551
>> (1 row)
>>
>> as fast as you are finding with just the docs?
>>
>> ----
>> Marc G. Fournier Hub.Org Networking Services
>> (http://www.hub.org)
>> Email: scrappy@hub.org Yahoo!: yscrappy ICQ:
>> 7615664
>>
> --
> Dave Cramer
> 519 939 0336
> ICQ # 1467551
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
>
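
For readers unfamiliar with the kind of proximity query Dave mentions (all documents where CREATE and TABLE are n words apart), here is a rough Python sketch of the idea. It is not Dave's Java tool and not tsearch; the function name `within_n_words`, the simple regex tokenizer, and the two sample documents are all assumptions made up for illustration, and a real search engine would answer this from a positional index rather than by scanning the text.

```python
import re


def within_n_words(text, first, second, n):
    """Return True if `first` and `second` occur at most n word positions apart.

    Hypothetical helper for illustration: case-insensitive, exact word match,
    no stemming.
    """
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    # Compare every pair of occurrences; fine for short docs, slow for big ones.
    return any(abs(i - j) <= n for i in first_positions for j in second_positions)


# Made-up sample documents standing in for doc page summaries.
docs = {
    "create_table": "CREATE TABLE will create a new, initially empty table",
    "create_trigger": "CREATE TRIGGER creates a new trigger on the specified table",
}

# Documents where "create" and "table" are adjacent (n = 1).
matches = [name for name, body in docs.items()
           if within_n_words(body, "create", "table", 1)]
```

With n = 1 only the first sample document qualifies; widening n lets the second one through as well, which is why a small n is a useful way to cut noise when looking up a specific command.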