Thread: Need to improve performance

Need to improve performance

From
Vassili A Akimov
Date:
Hallo,
 My name is Vassili Akimov. I work with postgres, we maintain the main
database for our sponsors on it. One of thier requirements,- perform the

search through the long text field. Sometimes this field even exceeds
32k so we cut off the rest. But search trough this long field is too
slow. (it takes approximately 4 minutes of proccessor time on Intel
pentium II 266MHz). So we were advised to use "fulltextindex".- chunk
this fields on single words and make new table with words and oids in
it. After we made this table its size was 2940360 records. And I tried
to measure the time:

Create table app_fti(string varchar(32),id oid);

fill this table with words and oid.(as it explained in
fulltextindex(update main table))

create index app_fti_idx on app_fti(string,id);
cluster app_fti_idx on app_fti;
vacuum;
vacuum analyze;
select f1.id from app_fti f1, app_fti f2 where f1.string~*'visual' and
f2.string~*'basic' and f1.id=f2.id;

this select takes about the same time that the select we used for
searching through whole original text field. So we tried slightly
different  approach:

Create table app_fti(string varchar(32),id oid);
fill this table with words and oid. (as it explained in fulltextindex)
create index app_fti_idx on app_fti(string);
create index app_fti_id_idx on app_fti(id);
cluster app_fti_idx on app_fti;
vacuum;
vacuum analyze;
select f1.id from app_fti f1, app_fti f2 where f1.string~*'visual' and
f2.string~*'basic' and f1.id=f2.id;

this select takes slightly less time but not significant. And if we
would add one more word in search criteria, it would add 1 more minute
to the search time. So we can't use this for our database.
the question is can we do something to make this type of search faster?
we need to make performace of our search at least twice faster than it
is now.
original table and search we do now are:

                              Table "applicant"
          Attribute          |   Type    |            Modifier

-----------------------------+-----------+------------------------------------

 hours/shift                 | char(70)  |
 position                    | text      |
 type of employment          | integer   |
 date of applying            | date      | default date(now())
 time of applying            | time      | default "time"(now())
 salary                      | integer   |
 salary type                 | char(25)  | not null default 'anually'
 last name                   | char(20)  |
 first name                  | char(20)  |
 mid name                    | char(20)  |
 file reference              | char(100) |
 street address              | text      |
 city                        | char(25)  |
 state                       | char(2)   |
 zip code                    | char(5)   |
 country                     | char(2)   | default 'US'
 permanent street address    | text      |
 permanent city              | char(25)  |
 permanent state             | char(2)   | default 'VA'
 permanent zip code          | char(5)   |
 permanent country           | char(2)   | default 'US'
 home phone area code        | smallint  |
 home phone number           | integer   |
 home phone extension        | char(10)  |
 alternative phone area code | smallint  |
 alternative phone number    | integer   |
 alternative phone extension | char(10)  |
 work phone area code        | smallint  |
 work phone number           | integer   |
 work phone extension        | char(10)  |
 fax area code               | smallint  |
 fax number                  | bigint    |
 pager                       | bigint    |
 pager extension             | char(10)  |
 e-mail                      | text      |
 url                         | text      |
 permition                   | boolean   |
 relocation                  | boolean   |
 travel                      | boolean   |
 highest grade               | char(10)  |
 major_minor                 | text      |
 other skills                | text      |
 reference number            | bigint    | default
nextval('reference'::text)
 password                    | char(15)  |
 other                       | text      |
 supervisor                  | char(15)  |
 level                       | integer   | default 7
 aux1                        | integer   |
 aux2                        | integer   |
 aux3                        | integer   |
 aux4                        | integer   |
 aux5                        | integer   |
 aux6                        | date      |
 aux7                        | time      |

it also has index on oid;

the average select is:

SELECT CASE
WHEN "other skills"~*'[^a-zA-Z]visual[^a-zA-Z]'
then 1 else 0 end +
CASE WHEN "other skills"~*'[^a-zA-Z]basic[^a-zA-Z]'
then 1 else 0 end +
CASE WHEN "other skills"~*'[^a-zA-Z]web[^a-zA-Z]'
then 1 else 0
end as "count",
"first name", "last name",
"reference number",
"date of applying", "city", "state",
"e-mail", "home phone area code",
"home phone number",
 "home phone extension" from "applicant"
WHERE "level">=2 AND
("other skills"~*'(^|[^a-zA-Z])unix($|[^a-zA-Z])'
and "other skills"~*'(^|[^a-zA-Z])oracle($|[^a-zA-Z])'
and "other skills"~*'(^|[^a-zA-Z])database($|[^a-zA-Z])')
order by "count" DESC
LIMIT 100;

this is the samle of our original query. We want to make it faster.
Number of criteria could be more than in this one. Our long field is
"other skills".

So can anybody advise us how we can improve performance of the search?

 Thank you
        vassili.




RE: Need to improve performance

From
"Andrew Snow"
Date:

> So we were advised to use "fulltextindex".- chunk
> this fields on single words and make new table with words and oids in
> it. After we made this table its size was 2940360 records. And I tried
> to measure the time:
>
>  ....
>
> select f1.id from app_fti f1, app_fti f2 where f1.string~*'visual' and
> f2.string~*'basic' and f1.id=f2.id;

What is causing it to be so slow is the use of regular expression pattern
matching.  For reasons I don't know, regular expression evaluation in a
query is fairly slow in PostgreSQL.  But since you have (correctly, I think)
made a full text index, why do you still need to use regexp's ?

Replace the ~* operator in the query above with 'LIKE', and you should find
it several times faster.

Please let us know the results of your testing.



- Andrew




Re: Need to improve performance

From
Vassili A Akimov
Date:
Andrew Snow wrote:

> > So we were advised to use "fulltextindex".- chunk
> > this fields on single words and make new table with words and oids in
> > it. After we made this table its size was 2940360 records. And I tried
> > to measure the time:
> >
> >  ....
> >
> > select f1.id from app_fti f1, app_fti f2 where f1.string~*'visual' and
> > f2.string~*'basic' and f1.id=f2.id;
>
> What is causing it to be so slow is the use of regular expression pattern
> matching.  For reasons I don't know, regular expression evaluation in a
> query is fairly slow in PostgreSQL.  But since you have (correctly, I think)
> made a full text index, why do you still need to use regexp's ?
>
> Replace the ~* operator in the query above with 'LIKE', and you should find
> it several times faster.
>
> Please let us know the results of your testing.
>
> - Andrew

        Thank you gentelmen! It seems that regexp was the reason! I enabled
likeplanning and it i=didn't improve the performance until I replaced '~*' with
'LIKE'.  Thank you again!
    V.


Re: Need to improve performance

From
Bruce Momjian
Date:
>
>         Thank you gentelmen! It seems that regexp was the reason! I enabled
> likeplanning and it i=didn't improve the performance until I replaced '~*' with
> 'LIKE'.  Thank you again!

The issue is that ~ and LIKE have similar performance.  ~* is
case-insensitive, and can not used indexes.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026