Re: website doc search is extremely SLOW - Mailing list pgsql-general
From | Dave Cramer |
---|---|
Subject | Re: website doc search is extremely SLOW |
Date | |
Msg-id | 1072879377.2167.7.camel@localhost.localdomain |
In response to | Re: website doc search is extremely SLOW ("John Sidney-Woollett" <johnsw@wardbrook.com>) |
Responses |
Re: website doc search is extremely SLOW
Re: website doc search is extremely SLOW |
List | pgsql-general |
Well it appears there are quite a few solutions to use so the next
question should be what are we trying to accomplish here?

One thing that I think is that the documentation search should be
limited to the documentation.

Who is in a position to make the decision of which solution to use?

Dave

On Wed, 2003-12-31 at 08:44, John Sidney-Woollett wrote:
> Wow, you're right - I could have probably saved myself a load of time! :)
>
> Although you do learn a lot reinventing the wheel... ...or at least you
> hit the same issues and insights others did before...
>
> John
>
> Ericson Smith said:
> > You should probably take a look at the Swish project. For a certain
> > project, we tried Tsearch2/Tsearch, even (gasp) MySQL fulltext search,
> > but with over 600,000 documents to index, both took too long to conduct
> > searches, especially as the database was swapped in and out of memory
> > based on search segment. MySQL full text was the most unusable.
> >
> > Swish uses its own internal DB format, and comes with a simple spider as
> > well. You can make it search by category, date and other nifty criteria
> > also.
> > http://swish-e.org
> >
> > You can take a look over at the project and do some searches to see what
> > I mean:
> > http://cbd-net.com
> >
> > Warmest regards,
> > Ericson Smith
> > Tracking Specialist/DBA
> > +-----------------------+----------------------------+
> > | http://www.did-it.com | "When I'm paid, I always   |
> > | eric@did-it.com       | follow the job through.    |
> > | 516-255-0500          | You know that." -Angel Eyes|
> > +-----------------------+----------------------------+
> >
> > John Sidney-Woollett wrote:
> >
> >>I think that Oleg's new search offering looks really good and fast. (I
> >>can't wait till I have some task that needs tsearch!).
> >>
> >>I agree with Dave that searching the docs is more important for me than
> >>the sites - but it would be really nice to have both, in one tool.
> >>
> >>I built something similar for the Tate Gallery in the UK - here you can
> >>select the type of content that you want returned, either static pages or
> >>dynamic. You can see the idea at
> >>http://www.tate.org.uk/search/default.jsp?terms=sunset%20oil&action=new
> >>
> >>This is custom built (using java/Oracle), supports stemming, boolean
> >>operators, exact phrase matching, relevancy and matched term
> >> highlighting.
> >>
> >>You can switch on/off the types of documents that you are not interested
> >>in. Using this analogy, a search facility that could offer you results
> >>from i) the docs and/or ii) the postgres sites static pages would be very
> >>useful.
> >>
> >>John Sidney-Woollett
> >>
> >>Dave Cramer said:
> >>
> >>>Marc,
> >>>
> >>>No it doesn't spider, it is a specialized tool for searching documents.
> >>>
> >>>I'm curious, what value is there to being able to count the number of
> >>>url's ?
> >>>
> >>>It does do things like query all documents where CREATE AND TABLE are n
> >>>words apart, just as fast, I would think these are more valuable to
> >>>document searching?
> >>>
> >>>I think the challenge here is what do we want to search. I am betting
> >>>that folks use this page as they would man? ie. what is the command for
> >>>create trigger?
> >>>
> >>>As I said my offer stands to help out, but I think if the goal is to
> >>>search the entire website, then this particular tool is not useful.
> >>>
> >>>At this point I am working on indexing the sgml directly as it has less
> >>>cruft in it. For instance all the links that appear in every summary are
> >>>just noise.
> >>>
> >>>Dave
> >>>
> >>>On Wed, 2003-12-31 at 00:44, Marc G. Fournier wrote:
> >>>
> >>>>On Wed, 31 Dec 2003, Dave Cramer wrote:
> >>>>
> >>>>>I can modify mine to be client server if you want?
> >>>>>
> >>>>>It is a java app, so we need to be able to run jdk1.3 at least?
> >>>>>
> >>>>jdk1.4 is available on the VMs ... does your spider? for instance, you
> >>>>mention that you have the docs indexed right now, but we are currently
> >>>>indexing:
> >>>>
> >>>>Server http://archives.postgresql.org/
> >>>>Server http://advocacy.postgresql.org/
> >>>>Server http://developer.postgresql.org/
> >>>>Server http://gborg.postgresql.org/
> >>>>Server http://pgadmin.postgresql.org/
> >>>>Server http://techdocs.postgresql.org/
> >>>>Server http://www.postgresql.org/
> >>>>
> >>>>will it be able to handle:
> >>>>
> >>>>186_archives=# select count(*) from url;
> >>>> count
> >>>>--------
> >>>> 393551
> >>>>(1 row)
> >>>>
> >>>>as fast as you are finding with just the docs?
> >>>>
> >>>>----
> >>>>Marc G. Fournier Hub.Org Networking Services
> >>>>(http://www.hub.org)
> >>>>Email: scrappy@hub.org Yahoo!: yscrappy ICQ:
> >>>>7615664
> >>>>
> >>>--
> >>>Dave Cramer
> >>>519 939 0336
> >>>ICQ # 1467551
> >>>
> >>>---------------------------(end of broadcast)---------------------------
> >>>TIP 9: the planner will ignore your desire to choose an index scan if your
> >>> joining column's datatypes do not match
> >>>
> >>
> >>---------------------------(end of broadcast)---------------------------
> >>TIP 2: you can get off all lists at once with the unregister command
> >> (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> >>
> >
>
--
Dave Cramer
519 939 0336
ICQ # 1467551