Need to improve performance - Mailing list pgsql-sql

From Vassili A Akimov
Subject Need to improve performance
Date
Msg-id 395373C5.B1C2A566@erols.com
Whole thread Raw
List pgsql-sql
Hallo,
 My name is Vassili Akimov. I work with postgres, we maintain the main
database for our sponsors on it. One of thier requirements,- perform the

search through the long text field. Sometimes this field even exceeds
32k so we cut off the rest. But search trough this long field is too
slow. (it takes approximately 4 minutes of proccessor time on Intel
pentium II 266MHz). So we were advised to use "fulltextindex".- chunk
this fields on single words and make new table with words and oids in
it. After we made this table its size was 2940360 records. And I tried
to measure the time:

Create table app_fti(string varchar(32),id oid);

fill this table with words and oid.(as it explained in
fulltextindex(update main table))

create index app_fti_idx on app_fti(string,id);
cluster app_fti_idx on app_fti;
vacuum;
vacuum analyze;
select f1.id from app_fti f1, app_fti f2 where f1.string~*'visual' and
f2.string~*'basic' and f1.id=f2.id;

this select takes about the same time that the select we used for
searching through whole original text field. So we tried slightly
different  approach:

Create table app_fti(string varchar(32),id oid);
fill this table with words and oid. (as it explained in fulltextindex)
create index app_fti_idx on app_fti(string);
create index app_fti_id_idx on app_fti(id);
cluster app_fti_idx on app_fti;
vacuum;
vacuum analyze;
select f1.id from app_fti f1, app_fti f2 where f1.string~*'visual' and
f2.string~*'basic' and f1.id=f2.id;

this select takes slightly less time but not significant. And if we
would add one more word in search criteria, it would add 1 more minute
to the search time. So we can't use this for our database.
the question is can we do something to make this type of search faster?
we need to make performace of our search at least twice faster than it
is now.
original table and search we do now are:

                              Table "applicant"
          Attribute          |   Type    |            Modifier

-----------------------------+-----------+------------------------------------

 hours/shift                 | char(70)  |
 position                    | text      |
 type of employment          | integer   |
 date of applying            | date      | default date(now())
 time of applying            | time      | default "time"(now())
 salary                      | integer   |
 salary type                 | char(25)  | not null default 'anually'
 last name                   | char(20)  |
 first name                  | char(20)  |
 mid name                    | char(20)  |
 file reference              | char(100) |
 street address              | text      |
 city                        | char(25)  |
 state                       | char(2)   |
 zip code                    | char(5)   |
 country                     | char(2)   | default 'US'
 permanent street address    | text      |
 permanent city              | char(25)  |
 permanent state             | char(2)   | default 'VA'
 permanent zip code          | char(5)   |
 permanent country           | char(2)   | default 'US'
 home phone area code        | smallint  |
 home phone number           | integer   |
 home phone extension        | char(10)  |
 alternative phone area code | smallint  |
 alternative phone number    | integer   |
 alternative phone extension | char(10)  |
 work phone area code        | smallint  |
 work phone number           | integer   |
 work phone extension        | char(10)  |
 fax area code               | smallint  |
 fax number                  | bigint    |
 pager                       | bigint    |
 pager extension             | char(10)  |
 e-mail                      | text      |
 url                         | text      |
 permition                   | boolean   |
 relocation                  | boolean   |
 travel                      | boolean   |
 highest grade               | char(10)  |
 major_minor                 | text      |
 other skills                | text      |
 reference number            | bigint    | default
nextval('reference'::text)
 password                    | char(15)  |
 other                       | text      |
 supervisor                  | char(15)  |
 level                       | integer   | default 7
 aux1                        | integer   |
 aux2                        | integer   |
 aux3                        | integer   |
 aux4                        | integer   |
 aux5                        | integer   |
 aux6                        | date      |
 aux7                        | time      |

it also has index on oid;

the average select is:

SELECT CASE
WHEN "other skills"~*'[^a-zA-Z]visual[^a-zA-Z]'
then 1 else 0 end +
CASE WHEN "other skills"~*'[^a-zA-Z]basic[^a-zA-Z]'
then 1 else 0 end +
CASE WHEN "other skills"~*'[^a-zA-Z]web[^a-zA-Z]'
then 1 else 0
end as "count",
"first name", "last name",
"reference number",
"date of applying", "city", "state",
"e-mail", "home phone area code",
"home phone number",
 "home phone extension" from "applicant"
WHERE "level">=2 AND
("other skills"~*'(^|[^a-zA-Z])unix($|[^a-zA-Z])'
and "other skills"~*'(^|[^a-zA-Z])oracle($|[^a-zA-Z])'
and "other skills"~*'(^|[^a-zA-Z])database($|[^a-zA-Z])')
order by "count" DESC
LIMIT 100;

this is the samle of our original query. We want to make it faster.
Number of criteria could be more than in this one. Our long field is
"other skills".

So can anybody advise us how we can improve performance of the search?

 Thank you
        vassili.




pgsql-sql by date:

Previous
From: "Mitch Vincent"
Date:
Subject: Orderby two different columns
Next
From: "Mitch Vincent"
Date:
Subject: Re: Need to improve performance