Thread: Complicated query... is there a simpler way?
I've got a nasty looking search query, and I'm afraid with a large table it's going to be horribly inefficient, and I was wondering if anybody could think of a way to slim this one down. I'll simplify my situation as much as I can. I'm working with two tables here. The first is a user table: create table users( userid integer not null primary key, firstnames text not null, lastname text not null ); The second is a message board table: create table posts( postid integer not null primary key, reply_to integer references posts(postid), author integer not null references users(userid), subject text not null, body text not null ); I think the only columns which need explaining are postid and reply_to. When a new thread is created, a row is inserted into posts with a unique postid and a null reply_to. Any subsequent responses to this original post are also given unique postid's but their reply_to field references the start of the thread. I'm writing a search function. It would be useful to search by thread, and not by individual post. So I've created an aggregate function concat(text). I've also created a scoring function that takes in three text variables. The first should be the query (which I'll represent as $query, since it's provided by the webserver). The second should be the subject to search (subject is scored differently than the body) and the third should be the body to search. Here's the query: select lastname, firstnames, subject, threadid from posts, users, (select threadid, concat(body) as thread_body from (select postid as threadid, body from posts where reply_to is null union select reply_to as threadid, body from posts where reply_to is not null) as inner_subquery group by threadid) as outer_subquery where users.userid = posts.author and threadid = postid and score_search('$query',subject,thread_body) > 0 order by score_search('$query',subject,thread_body); Forgive me if I have any syntactic errors. I'm translating from my real query. My real query's a little larger since my actualy scenario's a little more complicated. But that's four actual select calls to produce this. Is there a prettier way anybody can think of? Thanks! -jag
*Disclaimer* - me sleepy, no guarantee of intelligebility or relevence ;) First thing that strikes me is that you should use varchar datatype for almost all of your fields with the exception of the body within posts. If I were working on this project, I would make a table called keywords where I would store all of the unique words (minus stop words) that are stored in either the body, author, and first name last name. Then I'd create a mapping table to map keyword -> body, keyword -> subject, and keyword -> author. break the user's search criteria into an array of words. Get rid of the stop words. For each word, lookup the keyword id. For each matching keyword_id find the body, subject, and authors that were hit (from mapping table). Count your hits. Sort your result set by the author, then subject, then body. Have the message with the documents (general) with the most keyword hits come up above those documents with fewer. This type of a search could serve you well depending on the diversity of your subject matter, specificity of queries, and size of the document corpus. You can get really complex if you like (this search doesn't take into account idioms or synonyms or soundex....) - but I have used a similar search method with much success and little time investment. If you are interested in this approach, I can send you some PHP files (for indexing documents and performing searches) and a stored procedure that performs this type of search. The biggest downside of this method is that maintaining the keyword and relationships can become a serious management nightmare, especially if users can revise their posted messages. Good Luck! -r At 12:49 AM 6/27/01 -0500, Joshua Adam Ginsberg wrote: >I've got a nasty looking search query, and I'm afraid with a large table >it's going to be horribly inefficient, and I was wondering if anybody >could think of a way to slim this one down. > >I'll simplify my situation as much as I can. > >I'm working with two tables here. The first is a user table: > >create table users( >userid integer not null primary key, >firstnames text not null, >lastname text not null >); > >The second is a message board table: > >create table posts( >postid integer not null primary key, >reply_to integer references posts(postid), >author integer not null references users(userid), >subject text not null, >body text not null >); > >I think the only columns which need explaining are postid and reply_to. >When a new thread is created, a row is inserted into posts with a unique >postid and a null reply_to. Any subsequent responses to this original post >are also given unique postid's but their reply_to field references the >start of the thread. > >I'm writing a search function. It would be useful to search by thread, and >not by individual post. So I've created an aggregate function >concat(text). I've also created a scoring function that takes in three >text variables. The first should be the query (which I'll represent as >$query, since it's provided by the webserver). The second should be the >subject to search (subject is scored differently than the body) and the >third should be the body to search. > >Here's the query: > >select lastname, firstnames, subject, threadid from posts, users, (select >threadid, concat(body) as thread_body from (select postid as threadid, >body from posts where reply_to is null union select reply_to as threadid, >body from posts where reply_to is not null) as inner_subquery group by >threadid) as outer_subquery where users.userid = posts.author and threadid >= postid and score_search('$query',subject,thread_body) > 0 order by >score_search('$query',subject,thread_body); > >Forgive me if I have any syntactic errors. I'm translating from my real >query. My real query's a little larger since my actualy scenario's a >little more complicated. But that's four actual select calls to produce >this. Is there a prettier way anybody can think of? > >Thanks! > >-jag --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.251 / Virus Database: 124 - Release Date: 4/26/01
Joshua Adam Ginsberg wrote: > > I've got a nasty looking search query, and I'm afraid with a large table > it's going to be horribly inefficient, and I was wondering if anybody > could think of a way to slim this one down. > Here's the query: > > select lastname, firstnames, subject, threadid from posts, users, > (select threadid, concat(body) as thread_body from (select postid as > threadid, body from posts where reply_to is null union select reply_to > as threadid, body from posts where reply_to is not null) as > inner_subquery group by threadid) as outer_subquery where users.userid = > posts.author and threadid = postid and > score_search('$query',subject,thread_body) > 0 order by > score_search('$query',subject,thread_body); Just thinking out loud, and it depends on how you are doing your scoring, but what about scoring each message individually and the grouping by threadid summing the scores? Should bring it down to two levels plus a join with users. - Richard Huxton
Hello! What SQL query will help me to concatenate two different tables with different number of rows? For example , i have first table with column1 and column2 , having 3 rows , and second table with column3, column4 , having 5 rows. How to make third table with column1,column2,column3,column4 and 5 rows in it (and last two rows in column1 and column2 are empty) Thanks for sugestions Igor.
Re: How to join tables with different columns and different number of rows?
From
"Richard Huxton"
Date:
From: "Igor" <dbmanager@osb368.nnov.ru> > Hello! > > What SQL query will help me to concatenate two different tables > with different number of rows? For example , i have first table > with column1 and column2 , having 3 rows , and second table > with column3, column4 , having 5 rows. How to make > third table with column1,column2,column3,column4 and 5 rows in it > (and last two rows in column1 and column2 are empty) Something along the lines of: SELECT col1,col2,'' as dummy3, '' as dummy4 FROM table1 UNION SELECT '' as dummy1, '' as dummy2, col3, col4 FROM table2 should do it for you (not tested) - Richard Huxton
The problem, of course, is your database schema. SQL really sucks for processing of recursive queries, you should reference beginning of the thread in your posts table, and search by that. On Wed, 27 Jun 2001, Joshua Adam Ginsberg wrote: > I've got a nasty looking search query, and I'm afraid with a large table > it's going to be horribly inefficient, and I was wondering if anybody > could think of a way to slim this one down. > > I'll simplify my situation as much as I can. > > I'm working with two tables here. The first is a user table: > > create table users( > userid integer not null primary key, > firstnames text not null, > lastname text not null > ); > > The second is a message board table: > > create table posts( > postid integer not null primary key, > reply_to integer references posts(postid), > author integer not null references users(userid), > subject text not null, > body text not null > ); > > I think the only columns which need explaining are postid and reply_to. > When a new thread is created, a row is inserted into posts with a unique > postid and a null reply_to. Any subsequent responses to this original > post are also given unique postid's but their reply_to field references > the start of the thread. > > I'm writing a search function. It would be useful to search by thread, > and not by individual post. So I've created an aggregate function > concat(text). I've also created a scoring function that takes in three > text variables. The first should be the query (which I'll represent as > $query, since it's provided by the webserver). The second should be the > subject to search (subject is scored differently than the body) and the > third should be the body to search. > > Here's the query: > > select lastname, firstnames, subject, threadid from posts, users, > (select threadid, concat(body) as thread_body from (select postid as > threadid, body from posts where reply_to is null union select reply_to > as threadid, body from posts where reply_to is not null) as > inner_subquery group by threadid) as outer_subquery where users.userid = > posts.author and threadid = postid and > score_search('$query',subject,thread_body) > 0 order by > score_search('$query',subject,thread_body); > > Forgive me if I have any syntactic errors. I'm translating from my real > query. My real query's a little larger since my actualy scenario's a > little more complicated. But that's four actual select calls to produce > this. Is there a prettier way anybody can think of? > > Thanks! > > -jag > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > >
Thank you for replay, it is working, but i would like to get result table with 5 rows. In your example we willl get 8 rows RH> From: "Igor" <dbmanager@osb368.nnov.ru> >> Hello! >> >> What SQL query will help me to concatenate two different tables >> with different number of rows? For example , i have first table >> with column1 and column2 , having 3 rows , and second table >> with column3, column4 , having 5 rows. How to make >> third table with column1,column2,column3,column4 and 5 rows in it >> (and last two rows in column1 and column2 are empty) RH> Something along the lines of: RH> SELECT col1,col2,'' as dummy3, '' as dummy4 RH> FROM table1 RH> UNION RH> SELECT '' as dummy1, '' as dummy2, col3, col4 RH> FROM table2
Re: How to join tables with different columns and different number of rows?
From
Martijn van Oosterhout
Date:
On Wed, Jun 27, 2001 at 02:25:23PM +0400, Igor wrote: > Thank you for replay, it is working, but > i would like to get result table with 5 rows. > In your example we willl get 8 rows I think it's time to reexamine what you're trying to acheive. What you are asking is not possible with SQL, at least not easily. Do it in your client code. > RH> From: "Igor" <dbmanager@osb368.nnov.ru> > > >> Hello! > >> > >> What SQL query will help me to concatenate two different tables > >> with different number of rows? For example , i have first table > >> with column1 and column2 , having 3 rows , and second table > >> with column3, column4 , having 5 rows. How to make > >> third table with column1,column2,column3,column4 and 5 rows in it > >> (and last two rows in column1 and column2 are empty) > > RH> Something along the lines of: > > RH> SELECT col1,col2,'' as dummy3, '' as dummy4 > RH> FROM table1 > RH> UNION > RH> SELECT '' as dummy1, '' as dummy2, col3, col4 > RH> FROM table2 -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ - Artificial Intelligence is the science of making computers that behave - like the ones in the movies.
On Wed, Jun 27, 2001 at 02:24:24AM +0100, Ryan Mahoney wrote: > *Disclaimer* - me sleepy, no guarantee of intelligebility or relevence ;) > > First thing that strikes me is that you should use varchar datatype for > almost all of your fields with the exception of the body within posts. > > If I were working on this project, I would... </lurk> slaver slaver > If you are interested in this approach, I can send you some PHP files (for > indexing documents and performing searches) and a stored procedure that > performs this type of search. The biggest downside of this method is that > maintaining the keyword and relationships can become a serious management > nightmare, especially if users can revise their posted messages. i'm game, drool drool. sign me up! -- I'd concentrate on "living in the now" because it is fun and on building a better world because it is possible. - Tod Steward will@serensoft.com http://sourceforge.net/projects/newbiedoc -- we need your brain! http://www.dontUthink.com/ -- your brain needs us!