Re: Full text indexing preformance! (long) - Mailing list pgsql-hackers

From Mitch Vincent
Subject Re: Full text indexing preformance! (long)
Date
Msg-id 005e01bfc9a8$1a3f2da0$0300000a@doot.org
Whole thread Raw
In response to Full text indexing preformance! (long)  ("Mitch Vincent" <mitch@venux.net>)
Responses Re: Full text indexing preformance! (long)
List pgsql-hackers
Hi Tom, thanks for your reply..

I increased BLKSZ to 32k and re-compiled, then I imported all the resumes
(some of which I couldn't get before) and this problem completly
disappeared. The query is very fast now (.0.039792 seconds to be exact)..

One thing I did run into was this...
In my paging system I only have a need for 10 records at a time so I LIMIT
the query. The problem comes when I need to get a total of all the records
that matched the query (as a good search engine, I must tell people how many
records were found).. I can't count() and LIMIT in the same query, so I'm
forced to do 2 queries, one with count() and one without.

An example :

select a.appcode, a.firstname, a.middlename, a.lastname, a.state, a.degree1,
a.d1date, a.degree2, a.d2date, a.salary, a.skill1, a.skill2, a.skill3,
a.objective, a.employer, a.sic1, a.sic2, a.sic3, a.prefs1, a.prefs2, a.sells
from applicants as a,applicants_states as s, applicants_resumes as
ar,resumes_fti as rf where a.status = 'A' and lower(a.firstname) ~
lower('^a') and s.rstate='AL' and rf.string ~'^engineer' and
a.app_id=s.app_id and ar.app_id=a.app_id and rf.id=ar.oid limit 10 offset 0

Vs.

select count (a.app_id) as total from applicants as a,applicants_states as
s, applicants_resumes as ar,resumes_fti as rf where a.status = 'A' and
lower(a.firstname) ~ lower('^a') and s.rstate='AL' and rf.string
~'^engineer' and a.app_id=s.app_id and ar.app_id=a.app_id and rf.id=ar.oid

However the count() query has to go through the entire record set (which
makes sense) but it takes about 4 or 5  seconds.

The plan for the count() query.

NOTICE:  QUERY PLAN:

Aggregate  (cost=56.61..56.61 rows=1 width=20) ->  Nested Loop  (cost=0.00..56.61 rows=1 width=20)       ->  Nested
Loop (cost=0.00..10.74 rows=1 width=16)             ->  Nested Loop  (cost=0.00..8.59 rows=1 width=12)
->  Index Scan using resumes_fti_index on resumes_fti rf
 
(cost=0.00..4.97 rows=1 width=4)                   ->  Index Scan using applicants_resumes_index on
applicants_resumes ar  (cost=0.00..3.61 rows=1 width=8)             ->  Index Scan using applicants_app_id on
applicantsa
 
(cost=0.00..2.14 rows=1 width=4)       ->  Index Scan using applicants_states_app_id on applicants_states s
(cost=0.00..45.86 rows=1 width=4)

And the stats :

ProcessQuery
! system usage stats:
!       5.088647 elapsed 4.954981 user 0.125561 system sec
!       [4.976752 user 0.132817 sys total]
!       0/0 [0/0] filesystem blocks in/out
!       0/4607 [0/4846] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [3/3] messages rcvd/sent
!       0/52 [3/57] voluntary/involuntary context switches
! postgres usage stats:
!       Shared blocks:          0 read,          0 written, buffer hit rate
= 100.00%
!       Local  blocks:          0 read,          0 written, buffer hit rate
= 0.00%
!       Direct blocks:          0 read,          0 written
CommitTransactionCommand
proc_exit(0)


The "0/4607 [0/4846] page faults/reclaims" area is greatly increased in this
query that from the other. Is that to be expected? Is there anything else I
can do to get the total number of records matched by the query and still use
LIMIT (I doubt it)?

If there isn't anything I can do, which looks to be the case here, I still
appreciate all the help you've given me..

I look forward to your response. Thanks!

-Mitch








pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Timestamp data type problems
Next
From: Lamar Owen
Date:
Subject: Header File cleanup.