Thread: Complicated query... is there a simpler way?

Complicated query... is there a simpler way?

From
Joshua Adam Ginsberg
Date:
I've got a nasty looking search query, and I'm afraid with a large table
it's going to be horribly inefficient, and I was wondering if anybody
could think of a way to slim this one down.

I'll simplify my situation as much as I can.

I'm working with two tables here. The first is a user table:

create table users(
userid integer not null primary key,
firstnames text not null,
lastname text not null
);

The second is a message board table:

create table posts(
postid integer not null primary key,
reply_to integer references posts(postid),
author integer not null references users(userid),
subject text not null,
body text not null
);

I think the only columns which need explaining are postid and reply_to.
When a new thread is created, a row is inserted into posts with a unique
postid and a null reply_to. Any subsequent responses to this original
post are also given unique postid's but their reply_to field references
the start of the thread.

I'm writing a search function. It would be useful to search by thread,
and not by individual post. So I've created an aggregate function
concat(text). I've also created a scoring function that takes in three
text variables. The first should be the query (which I'll represent as
$query, since it's provided by the webserver). The second should be the
subject to search (subject is scored differently than the body) and the
third should be the body to search.

Here's the query:

select lastname, firstnames, subject, threadid from posts, users,
(select threadid, concat(body) as thread_body from (select postid as
threadid, body from posts where reply_to is null union select reply_to
as threadid, body from posts where reply_to is not null) as
inner_subquery group by threadid) as outer_subquery where users.userid =
posts.author and threadid = postid and
score_search('$query',subject,thread_body) > 0 order by
score_search('$query',subject,thread_body);

Forgive me if I have any syntactic errors. I'm translating from my real
query. My real query's a little larger since my actualy scenario's a
little more complicated. But that's  four actual select calls to produce
this. Is there a prettier way anybody can think of?

Thanks!

-jag


Re: Complicated query... is there a simpler way?

From
Ryan Mahoney
Date:
*Disclaimer* - me sleepy, no guarantee of intelligebility or relevence ;)

First thing that strikes me is that you should use varchar datatype for
almost all of your fields with the exception of the body within posts.

If I were working on this project, I would make a table called keywords
where I would store all of the unique words (minus stop words) that are
stored in either the body, author, and first name last name.  Then I'd
create a mapping table to map keyword -> body, keyword -> subject, and
keyword -> author.

break the user's search criteria into an array of words.  Get rid of the
stop words.  For each word, lookup the keyword id.  For each matching
keyword_id find the body, subject, and authors that were hit (from mapping
table).  Count your hits.  Sort your result set by the author, then
subject, then body.  Have the message with the documents (general) with the
most keyword hits come up above those documents with fewer.

This type of a search could serve you well depending on the diversity of
your subject matter, specificity of queries, and size of the document
corpus.  You can get really complex if you like (this search doesn't take
into account idioms or synonyms or soundex....) - but I have used a similar
search method with much success and little time investment.

If you are interested in this approach, I can send you some PHP files (for
indexing documents and performing searches) and a stored procedure that
performs this type of search.  The biggest downside of this method is that
maintaining the keyword and relationships can become a serious management
nightmare, especially if users can revise their posted messages.

Good Luck!

-r


At 12:49 AM 6/27/01 -0500, Joshua Adam Ginsberg wrote:

>I've got a nasty looking search query, and I'm afraid with a large table
>it's going to be horribly inefficient, and I was wondering if anybody
>could think of a way to slim this one down.
>
>I'll simplify my situation as much as I can.
>
>I'm working with two tables here. The first is a user table:
>
>create table users(
>userid integer not null primary key,
>firstnames text not null,
>lastname text not null
>);
>
>The second is a message board table:
>
>create table posts(
>postid integer not null primary key,
>reply_to integer references posts(postid),
>author integer not null references users(userid),
>subject text not null,
>body text not null
>);
>
>I think the only columns which need explaining are postid and reply_to.
>When a new thread is created, a row is inserted into posts with a unique
>postid and a null reply_to. Any subsequent responses to this original post
>are also given unique postid's but their reply_to field references the
>start of the thread.
>
>I'm writing a search function. It would be useful to search by thread, and
>not by individual post. So I've created an aggregate function
>concat(text). I've also created a scoring function that takes in three
>text variables. The first should be the query (which I'll represent as
>$query, since it's provided by the webserver). The second should be the
>subject to search (subject is scored differently than the body) and the
>third should be the body to search.
>
>Here's the query:
>
>select lastname, firstnames, subject, threadid from posts, users, (select
>threadid, concat(body) as thread_body from (select postid as threadid,
>body from posts where reply_to is null union select reply_to as threadid,
>body from posts where reply_to is not null) as inner_subquery group by
>threadid) as outer_subquery where users.userid = posts.author and threadid
>= postid and score_search('$query',subject,thread_body) > 0 order by
>score_search('$query',subject,thread_body);
>
>Forgive me if I have any syntactic errors. I'm translating from my real
>query. My real query's a little larger since my actualy scenario's a
>little more complicated. But that's  four actual select calls to produce
>this. Is there a prettier way anybody can think of?
>
>Thanks!
>
>-jag

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.251 / Virus Database: 124 - Release Date: 4/26/01

Re: Complicated query... is there a simpler way?

From
Richard Huxton
Date:
Joshua Adam Ginsberg wrote:
>
> I've got a nasty looking search query, and I'm afraid with a large table
> it's going to be horribly inefficient, and I was wondering if anybody
> could think of a way to slim this one down.

> Here's the query:
>
> select lastname, firstnames, subject, threadid from posts, users,
> (select threadid, concat(body) as thread_body from (select postid as
> threadid, body from posts where reply_to is null union select reply_to
> as threadid, body from posts where reply_to is not null) as
> inner_subquery group by threadid) as outer_subquery where users.userid =
> posts.author and threadid = postid and
> score_search('$query',subject,thread_body) > 0 order by
> score_search('$query',subject,thread_body);

Just thinking out loud, and it depends on how you are doing your
scoring, but what about scoring each message individually and the
grouping by threadid summing the scores? Should bring it down to two
levels plus a join with users.

- Richard Huxton

Hello!

What SQL query will help me to concatenate two different tables
with different number of rows? For example , i have first table
with column1 and column2 , having 3 rows , and second table
with column3, column4 , having 5 rows. How to make
third table with column1,column2,column3,column4 and 5 rows in it
(and last two rows in column1 and column2 are empty)

Thanks for sugestions

Igor.



Re: How to join tables with different columns and different number of rows?

From
"Richard Huxton"
Date:
From: "Igor" <dbmanager@osb368.nnov.ru>

> Hello!
>
> What SQL query will help me to concatenate two different tables
> with different number of rows? For example , i have first table
> with column1 and column2 , having 3 rows , and second table
> with column3, column4 , having 5 rows. How to make
> third table with column1,column2,column3,column4 and 5 rows in it
> (and last two rows in column1 and column2 are empty)

Something along the lines of:

SELECT col1,col2,'' as dummy3, '' as dummy4
FROM table1
UNION
SELECT '' as dummy1, '' as dummy2, col3, col4
FROM table2

should do it for you (not tested)

- Richard Huxton


Re: Complicated query... is there a simpler way?

From
Alex Pilosov
Date:
The problem, of course, is your database schema.

SQL really sucks for processing of recursive queries, you should reference
beginning of the thread in your posts table, and search by that.

On Wed, 27 Jun 2001, Joshua Adam Ginsberg wrote:

> I've got a nasty looking search query, and I'm afraid with a large table
> it's going to be horribly inefficient, and I was wondering if anybody
> could think of a way to slim this one down.
>
> I'll simplify my situation as much as I can.
>
> I'm working with two tables here. The first is a user table:
>
> create table users(
> userid integer not null primary key,
> firstnames text not null,
> lastname text not null
> );
>
> The second is a message board table:
>
> create table posts(
> postid integer not null primary key,
> reply_to integer references posts(postid),
> author integer not null references users(userid),
> subject text not null,
> body text not null
> );
>
> I think the only columns which need explaining are postid and reply_to.
> When a new thread is created, a row is inserted into posts with a unique
> postid and a null reply_to. Any subsequent responses to this original
> post are also given unique postid's but their reply_to field references
> the start of the thread.
>
> I'm writing a search function. It would be useful to search by thread,
> and not by individual post. So I've created an aggregate function
> concat(text). I've also created a scoring function that takes in three
> text variables. The first should be the query (which I'll represent as
> $query, since it's provided by the webserver). The second should be the
> subject to search (subject is scored differently than the body) and the
> third should be the body to search.
>
> Here's the query:
>
> select lastname, firstnames, subject, threadid from posts, users,
> (select threadid, concat(body) as thread_body from (select postid as
> threadid, body from posts where reply_to is null union select reply_to
> as threadid, body from posts where reply_to is not null) as
> inner_subquery group by threadid) as outer_subquery where users.userid =
> posts.author and threadid = postid and
> score_search('$query',subject,thread_body) > 0 order by
> score_search('$query',subject,thread_body);
>
> Forgive me if I have any syntactic errors. I'm translating from my real
> query. My real query's a little larger since my actualy scenario's a
> little more complicated. But that's  four actual select calls to produce
> this. Is there a prettier way anybody can think of?
>
> Thanks!
>
> -jag
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>
>


Thank you for replay, it is working, but
i would like to get result table with 5 rows.
In your example we willl get 8 rows

RH> From: "Igor" <dbmanager@osb368.nnov.ru>

>> Hello!
>>
>> What SQL query will help me to concatenate two different tables
>> with different number of rows? For example , i have first table
>> with column1 and column2 , having 3 rows , and second table
>> with column3, column4 , having 5 rows. How to make
>> third table with column1,column2,column3,column4 and 5 rows in it
>> (and last two rows in column1 and column2 are empty)

RH> Something along the lines of:

RH> SELECT col1,col2,'' as dummy3, '' as dummy4
RH> FROM table1
RH> UNION
RH> SELECT '' as dummy1, '' as dummy2, col3, col4
RH> FROM table2



Re: How to join tables with different columns and different number of rows?

From
Martijn van Oosterhout
Date:
On Wed, Jun 27, 2001 at 02:25:23PM +0400, Igor wrote:
> Thank you for replay, it is working, but
> i would like to get result table with 5 rows.
> In your example we willl get 8 rows

I think it's time to reexamine what you're trying to acheive. What you are
asking is not possible with SQL, at least not easily. Do it in your client
code.

> RH> From: "Igor" <dbmanager@osb368.nnov.ru>
>
> >> Hello!
> >>
> >> What SQL query will help me to concatenate two different tables
> >> with different number of rows? For example , i have first table
> >> with column1 and column2 , having 3 rows , and second table
> >> with column3, column4 , having 5 rows. How to make
> >> third table with column1,column2,column3,column4 and 5 rows in it
> >> (and last two rows in column1 and column2 are empty)
>
> RH> Something along the lines of:
>
> RH> SELECT col1,col2,'' as dummy3, '' as dummy4
> RH> FROM table1
> RH> UNION
> RH> SELECT '' as dummy1, '' as dummy2, col3, col4
> RH> FROM table2

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
- Artificial Intelligence is the science of making computers that behave
- like the ones in the movies.

Re: Complicated query... is there a simpler way?

From
will trillich
Date:
On Wed, Jun 27, 2001 at 02:24:24AM +0100, Ryan Mahoney wrote:
> *Disclaimer* - me sleepy, no guarantee of intelligebility or relevence ;)
>
> First thing that strikes me is that you should use varchar datatype for
> almost all of your fields with the exception of the body within posts.
>
> If I were working on this project, I would...

</lurk>
slaver slaver

> If you are interested in this approach, I can send you some PHP files (for
> indexing documents and performing searches) and a stored procedure that
> performs this type of search.  The biggest downside of this method is that
> maintaining the keyword and relationships can become a serious management
> nightmare, especially if users can revise their posted messages.

i'm game, drool drool. sign me up!

--
I'd concentrate on "living in the now" because it is fun
and on building a better world because it is possible.
    - Tod Steward

will@serensoft.com
http://sourceforge.net/projects/newbiedoc -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!