Re: Help with Query Tuning - Mailing list pgsql-performance

From Adarsh Sharma
Subject Re: Help with Query Tuning
Date
Msg-id 4D82DCE2.5020902@orkash.com
In response to Help with Query Tuning  (Adarsh Sharma <adarsh.sharma@orkash.com>)
Responses Re: Help with Query Tuning  (Reid Thompson <Reid.Thompson@ateb.com>)
Re: Help with Query Tuning  (tv@fuzzy.cz)
List pgsql-performance
Thanks, it works now. :-)

Here is the output:

pdc_uima=# SELECT count(*)  from page_content WHERE publishing_date like '%2010%' and
pdc_uima-# content_language='en' and content is not null and isprocessable = 1 and
pdc_uima-# to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | '
pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | '
pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' || ' | '
pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || 'crpf' || ' | '
pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );

 count 
--------
 137193
(1 row)

Time: 195441.894 ms


But my original query also uses an AND condition, i.e.:

select  count(*)  from page_content where publishing_date like '%2010%' and content_language='en'  and content is not null and isprocessable = 1 and (content like '%Militant%'
OR content like '%jihad%' OR  content like '%Mujahid%'  OR
 content like '%fedayeen%' OR content like '%insurgent%'  OR content like '%terrORist%' OR
  content like '%cadre%'  OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' OR content like '%injure%');

 count
-------
 57061
(1 row)

Time: 19423.087 ms


Now I also have to add the AND condition (  AND (content like '%kill%' OR content like '%injure%')  ) to the full-text query.


Thanks & Regards,
Adarsh Sharma



tv@fuzzy.cz wrote:
Yes, I think we caught the problem, but it results in the error below:

SELECT count(*)  from page_content
WHERE publishing_date like '%2010%' and content_language='en' and
content is not null and isprocessable = 1 and
to_tsvector('english',content) @@ to_tsquery('english','Mujahid ' ||
'jihad ' || 'Militant ' || 'fedayeen ' || 'insurgent ' || 'terrORist '
|| 'cadre ' || 'civilians ' || 'police ' || 'defence ' || 'cops ' ||
'crpf ' || 'dsf ' || 'ssb');

ERROR:  syntax error in tsquery: "Mujahid jihad Militant fedayeen
insurgent terrORist cadre civilians police defence cops crpf dsf ssb"
       
The text passed to to_tsquery has to be a proper query, i.e. single tokens
separated by boolean operators. In your case, you should put '|' (which
means OR) between the tokens to get something like this
 'Mujahid | jihad | Militant | ...'

or you can use plainto_tsquery(), as that accepts simple text, but it puts
'&' (AND) between the tokens and I guess that's not what you want.
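
As an illustration of "tokens separated by boolean operators": to_tsquery also accepts parentheses, so an OR group can be combined with a second OR group using '&'. The sketch below only builds the query text; the helper names or_group and and_groups are hypothetical, and it assumes every word is a plain token with no spaces or tsquery operator characters. Note that a full-text match works on stemmed words, so it is not equivalent to a substring test such as LIKE '%kill%'.

```python
def or_group(words):
    """Join plain words with ' | ' (OR) and wrap the group in parentheses."""
    return "(" + " | ".join(words) + ")"


def and_groups(*groups):
    """Combine several OR groups with ' & ' (AND)."""
    return " & ".join(or_group(group) for group in groups)


# Word lists shortened from the queries in this thread.
actors = ["Mujahid", "jihad", "Militant", "fedayeen"]
events = ["kill", "injure"]

tsquery_text = and_groups(actors, events)
print(tsquery_text)
# prints: (Mujahid | jihad | Militant | fedayeen) & (kill | injure)
```

The resulting string would be passed as the second argument of to_tsquery('english', ...).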

Tomas

     
What should I do to make it satisfy the OR condition, i.e. match any of
the to_tsquery values, as we did with like '%Mujahid%' or ... or ...?
You can't force plainto_tsquery to use OR instead of AND. You need to
modify the piece of code that produces the search text so that it puts
'|' characters between the words. So do something like this

SELECT count(*)  from page_content WHERE publishing_date like '%2010%' and
content_language='en' and content is not null and isprocessable = 1 and
to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | '
|| 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen');

Not sure where this text comes from, but you can do this in a higher-level
language, e.g. in PHP. Something like this

$words = implode(' | ', explode(' ',$text));

and then pass the $words into the query. Or something like that.
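
The same idea can be sketched in Python. One caveat with splitting on a single space is that two consecutive spaces in the input produce an empty token, which ends up as ' |  | ' in the query text and is rejected by to_tsquery; splitting on any run of whitespace avoids that. The function name to_or_query is hypothetical, and the sketch assumes the input contains only plain words.

```python
def to_or_query(text):
    """Turn 'a b  c' into 'a | b | c' for use with to_tsquery."""
    # str.split() with no argument splits on any run of whitespace
    # and never returns empty strings.
    return " | ".join(text.split())


words = to_or_query("Mujahid jihad  Militant fedayeen")
print(words)
# prints: Mujahid | jihad | Militant | fedayeen

# The string is then best passed as a bound parameter rather than pasted
# into the SQL text, e.g. (assuming a driver with %s placeholders):
#   to_tsvector('english', content) @@ to_tsquery('english', %s)
```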

Tomas
 
