Need to improve performance - Mailing list pgsql-sql
From | Vassili A Akimov |
---|---|
Subject | Need to improve performance |
Date | |
Msg-id | 395373C5.B1C2A566@erols.com Whole thread Raw |
List | pgsql-sql |
Hallo, My name is Vassili Akimov. I work with postgres, we maintain the main database for our sponsors on it. One of thier requirements,- perform the search through the long text field. Sometimes this field even exceeds 32k so we cut off the rest. But search trough this long field is too slow. (it takes approximately 4 minutes of proccessor time on Intel pentium II 266MHz). So we were advised to use "fulltextindex".- chunk this fields on single words and make new table with words and oids in it. After we made this table its size was 2940360 records. And I tried to measure the time: Create table app_fti(string varchar(32),id oid); fill this table with words and oid.(as it explained in fulltextindex(update main table)) create index app_fti_idx on app_fti(string,id); cluster app_fti_idx on app_fti; vacuum; vacuum analyze; select f1.id from app_fti f1, app_fti f2 where f1.string~*'visual' and f2.string~*'basic' and f1.id=f2.id; this select takes about the same time that the select we used for searching through whole original text field. So we tried slightly different approach: Create table app_fti(string varchar(32),id oid); fill this table with words and oid. (as it explained in fulltextindex) create index app_fti_idx on app_fti(string); create index app_fti_id_idx on app_fti(id); cluster app_fti_idx on app_fti; vacuum; vacuum analyze; select f1.id from app_fti f1, app_fti f2 where f1.string~*'visual' and f2.string~*'basic' and f1.id=f2.id; this select takes slightly less time but not significant. And if we would add one more word in search criteria, it would add 1 more minute to the search time. So we can't use this for our database. the question is can we do something to make this type of search faster? we need to make performace of our search at least twice faster than it is now. original table and search we do now are: Table "applicant" Attribute | Type | Modifier -----------------------------+-----------+------------------------------------ hours/shift | char(70) | position | text | type of employment | integer | date of applying | date | default date(now()) time of applying | time | default "time"(now()) salary | integer | salary type | char(25) | not null default 'anually' last name | char(20) | first name | char(20) | mid name | char(20) | file reference | char(100) | street address | text | city | char(25) | state | char(2) | zip code | char(5) | country | char(2) | default 'US' permanent street address | text | permanent city | char(25) | permanent state | char(2) | default 'VA' permanent zip code | char(5) | permanent country | char(2) | default 'US' home phone area code | smallint | home phone number | integer | home phone extension | char(10) | alternative phone area code | smallint | alternative phone number | integer | alternative phone extension | char(10) | work phone area code | smallint | work phone number | integer | work phone extension | char(10) | fax area code | smallint | fax number | bigint | pager | bigint | pager extension | char(10) | e-mail | text | url | text | permition | boolean | relocation | boolean | travel | boolean | highest grade | char(10) | major_minor | text | other skills | text | reference number | bigint | default nextval('reference'::text) password | char(15) | other | text | supervisor | char(15) | level | integer | default 7 aux1 | integer | aux2 | integer | aux3 | integer | aux4 | integer | aux5 | integer | aux6 | date | aux7 | time | it also has index on oid; the average select is: SELECT CASE WHEN "other skills"~*'[^a-zA-Z]visual[^a-zA-Z]' then 1 else 0 end + CASE WHEN "other skills"~*'[^a-zA-Z]basic[^a-zA-Z]' then 1 else 0 end + CASE WHEN "other skills"~*'[^a-zA-Z]web[^a-zA-Z]' then 1 else 0 end as "count", "first name", "last name", "reference number", "date of applying", "city", "state", "e-mail", "home phone area code", "home phone number", "home phone extension" from "applicant" WHERE "level">=2 AND ("other skills"~*'(^|[^a-zA-Z])unix($|[^a-zA-Z])' and "other skills"~*'(^|[^a-zA-Z])oracle($|[^a-zA-Z])' and "other skills"~*'(^|[^a-zA-Z])database($|[^a-zA-Z])') order by "count" DESC LIMIT 100; this is the samle of our original query. We want to make it faster. Number of criteria could be more than in this one. Our long field is "other skills". So can anybody advise us how we can improve performance of the search? Thank you vassili.