Re: Help with Query Tuning - Mailing list pgsql-performance
From | Adarsh Sharma |
---|---|
Subject | Re: Help with Query Tuning |
Date | |
Msg-id | 4D81A951.7030609@orkash.com Whole thread Raw |
In response to | Re: Help with Query Tuning (Kenneth Marshall <ktm@rice.edu>) |
Responses |
Re: Help with Query Tuning
|
List | pgsql-performance |
Thanks, I understand it know :-
But My one doubt which isn't clear :
Original Query :-
select count(*) from page_content where (content like '%Militant%'
OR content like '%jihad%' OR content like '%Mujahid%' OR
content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrORist%' OR
content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' OR content like '%injure%');
Output :-
count
-------
57061
(1 row)
Time: 19726.555 ms
I need to tune it , use full-text searching as :
Modified Query :-
SELECT count(*) from page_content
WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvectOR('english',content) @@ to_tsquery('english','Mujahid' || 'jihad' || 'Militant' || 'fedayeen' || 'insurgent' || 'terrORist' || 'cadre' || 'civilians' || 'police' || 'defence' || 'cops' || 'crpf' || 'dsf' || 'ssb');
Output :-
count
-------
0
(1 row)
Time: 194685.125 ms
I try, SELECT count(*) from page_content
WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvectOR('english',content) @@ to_tsquery('english','%Mujahid%' || '%jihad%' || '%Militant%' || '%fedayeen%' || '%insurgent%' || '%terrORist%' || '%cadre%' || '%civilians%' || '%police%' || '%defence%' || '%cops%' || '%crpf%' || '%dsf%' || '%ssb%');
count
-------
0
(1 row)
Time: 194722.468 ms
I know I have to create index but index is the next step, first you have to get the correct result .
CREATE INDEX pgweb_idx ON page_content USING gin(to_tsvector('english', content));
Please guide me where I am going wrong.
Thanks & best Regards,
Adarsh Sharma
Kenneth Marshall wrote:
But My one doubt which isn't clear :
Original Query :-
select count(*) from page_content where (content like '%Militant%'
OR content like '%jihad%' OR content like '%Mujahid%' OR
content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrORist%' OR
content like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' OR content like '%injure%');
Output :-
count
-------
57061
(1 row)
Time: 19726.555 ms
I need to tune it , use full-text searching as :
Modified Query :-
SELECT count(*) from page_content
WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvectOR('english',content) @@ to_tsquery('english','Mujahid' || 'jihad' || 'Militant' || 'fedayeen' || 'insurgent' || 'terrORist' || 'cadre' || 'civilians' || 'police' || 'defence' || 'cops' || 'crpf' || 'dsf' || 'ssb');
Output :-
count
-------
0
(1 row)
Time: 194685.125 ms
I try, SELECT count(*) from page_content
WHERE publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and to_tsvectOR('english',content) @@ to_tsquery('english','%Mujahid%' || '%jihad%' || '%Militant%' || '%fedayeen%' || '%insurgent%' || '%terrORist%' || '%cadre%' || '%civilians%' || '%police%' || '%defence%' || '%cops%' || '%crpf%' || '%dsf%' || '%ssb%');
count
-------
0
(1 row)
Time: 194722.468 ms
I know I have to create index but index is the next step, first you have to get the correct result .
CREATE INDEX pgweb_idx ON page_content USING gin(to_tsvector('english', content));
Please guide me where I am going wrong.
Thanks & best Regards,
Adarsh Sharma
Kenneth Marshall wrote:
On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote:Dear all, I am facing a problem while creating the index to make the below query run faster. My table size is near about 1065 MB and 428467 rows. explain analyze select count(*) from page_content where publishing_date like '%2010%' and content_language='en' and content is not null and isprocessable = 1 and (content like '%Militant%' OR content like '%jihad%' OR content like '%Mujahid%' OR content like '%fedayeen%' OR content like '%insurgent%' OR content like '%terrorist%' ORcontent like '%cadre%' OR content like '%civilians%' OR content like '%police%' OR content like '%defence%' OR content like '%cops%' OR content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content like '%kill%' or content like '%injure%'); *Output: * Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual time=18564.631..18564.631 rows=1 loops=1) -> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0) (actual time=0.146..18529.371 rows=59918 loops=1) Filter: ((content IS NOT NULL) AND (publishing_date ~~ '%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable = 1) AND (((content) ::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND (((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~ '%jihad%'::text) OR ( (content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~ '%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR ((content)::text ~~ '%terrori st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~ '%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR ((content)::text ~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR ((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text) OR ((content)::text ~~ '%ssb%'::text))) Total runtime: 18564.673 ms *Index on that Table : *CREATE INDEX idx_page_idON page_contentUSING btree(crawled_page_id); *Index I create :* CREATE INDEX idx_page_id_contentON page_contentUSING btree(crawled_page_id,content_language,publishing_date,isprocessable); *Index that fail to create: *CREATE INDEX idx_page_id_content1ON page_contentUSING btree(crawled_page_id,content); Error :-ERROR: index row requires 13240 bytes, maximum size is 8191 ********** Error ********** ERROR: index row requires 13240 bytes, maximum size is 8191 SQL state: 54000 How to resolve this error Please give any suggestion to tune the query. Thanks & best Regards, Adarsh SharmaYou should probably be looking at using full-text indexing: http://www.postgresql.org/docs/9.0/static/textsearch.html or limit the size of content for the index. Cheers, Ken
pgsql-performance by date: