Full text indexing (and errors!) - Mailing list pgsql-sql

From Mitch Vincent
Subject Full text indexing (and errors!)
Date
Msg-id 00db01bfc345$e5400100$0300000a@doot.org
Whole thread Raw
Responses Re: Full text indexing (and errors!)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
I posted this to -hackers but thought it might be of some use to people here
as well. I look forward to any and all comments.. Thanks!

----------------------------------

I finally finished the fulltextindex ( in contrib/fulltextindex) and am
wondering if I might be able to do something that I haven't already to make
these queries a bit faster.. There is a lot of data here, I know, but as it
is right now it is faster for me to use grep (or agrep) to search the files
and put the matching files into a temporay table and qualify results on
that... I had hoped this would be the case :-)

Lets see.. (Most of this is in contrib/fulltextindex but I'll repeat it
here)

I have 2 tables.
   Table "applicants_resumes" Attribute  |  Type   | Modifier
-------------+---------+----------app_id      | integer |resume_text | text    |
Index: resumes_oid_index

And
   Table "resumes_fti"Attribute |     Type     | Modifier
-----------+--------------+----------string    | varchar(255) |id        | oid          |
Index: resume_fti_index

The resumes_fti table holds the broken up chunks of text (duh) ..

I also did the  CLUSTER resumes_fti_index on resumes_fti

When I run a query I get :


ipa=# explain select c.* from applicants c, resumes_fti f1 where f1.string ~
'^engineer' and f1.id = c.oid;
NOTICE:  QUERY PLAN:

Hash Join  (cost=918.11..7969.43 rows=56443 width=607) ->  Seq Scan on applicants c  (cost=0.00..1331.56 rows=9856
width=603)->  Hash  (cost=5.00..5.00 rows=168041 width=4)       ->  Index Scan using resume_fti_index on resumes_fti
f1
(cost=0.00..5.00 rows=168041 width=4)

EXPLAIN
ipa=#


query: select c.* from applicants c, resumes_fti f1 where f1.string ~
'^engineer' and f1.id = c.oid;
ProcessQuery
! system usage stats:
!       304.304711 elapsed 9.111341 user 7.992034 system sec
!       [9.172874 user 8.012545 sys total]
!       53309/1599 [53411/1599] filesystem blocks in/out
!       0/295 [0/605] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [4/5] messages rcvd/sent
!       53168/226 [53255/232] voluntary/involuntary context switches
! postgres usage stats:
!       Shared blocks:      58171 read,          0 written, buffer hit rate
= 14.19%
!       Local  blocks:          0 read,          0 written, buffer hit rate
= 0.00%
!       Direct blocks:          0 read,          0 written
CommitTransactionCommand
proc_exit(0)
shmem_exit(0)
exit(0)


... As you can see, really really slow. So I thought baout creating an index
on 'string' in resumes_fti (makes since as this is suppose to be a full text
index) -- Note that's not listed in the README which was very odd to me. I
don't understand the point in breaking all the text out into chunks if you
don't then index it. The way it is after you get done with the README is
just a broken up version of what you already had (a bunch of text fields).
It was my understanding that the fulltextindex was meant to get around the
inability to index a full text field by breaking the full text field up into
smaller varchar() fields which could them be indexed...

Anyway, I tried --

ipa=# CREATE INDEX "rstring_fti_index" on resumes_fti(lower(string));
FATAL 1:  Memory exhausted in AllocSetAlloc()
pqReadData() -- backend closed the channel unexpectedly.       This probably means the backend terminated abnormally
  before or while processing the request.
 
The connection to the server was lost. Attempting reset: Succeeded.
ipa=#

... and as you can see it messed up pretty badly after about 15 minutes.
This machine is a PII400 with 256 Megs of RAM and 520 megs of swap running
PostgreSQL 7.0..

Sorry about the length of this post and thanks for any
pointers/comments/ideas..

-Mitch Vincent











pgsql-sql by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Foreign keys breaks tables permissions
Next
From: Tom Lane
Date:
Subject: Re: Foreign keys breaks tables permissions