So, I'm trying to create a full text index as described here:
http://techdocs.postgresql.org/techdocs/fulltextindexing.php
Everything was going mostly okay...
I had to hack a quick PHP script instead of using the Perl one, since I
didn't have a working Pg.pm, but that was a minor speed bump.
Then I hit a real road-block...
\copy article_fti from fulltext.sorted
\.
ERROR: copy: line 34635390, cannot extend article_fti: No space left on
device.
Check free disk space.
PQendcopy: resetting connection
archive=> \q
[root@rm-004-24 utilities]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 15G 15G 0 100% /
/dev/sda1 48M 6.1M 39M 14% /boot
none 439M 0 439M 0% /dev/shm
Oh. Yeah. I guess that *IS* going to be kind of big...
Any SWAGs on how much disk space is required for a 500 MB fulltext.sorted
file? I.e., the ASCII file of string/OID pairs, in tab-delimited form, is
500 MB -- how much PostgreSQL space does that turn into with the
tables/indices as described at the URL above?
When I deleted all the fti rows, and did a VACUUM, there was almost 2G
available...
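For whoever ends up measuring this: once some of the data is loaded, you can get a rough on-disk figure from the system catalogs instead of guessing from df. This is just a sketch -- the index name here is an assumption, so adjust it to whatever the techdocs recipe actually created:

```sql
-- Approximate on-disk size of the FTI table and its index.
-- relpages is counted in 8 KB pages (the default block size);
-- 'article_fti_string_idx' is an assumed index name -- adjust as needed.
SELECT relname,
       relpages,
       relpages * 8 / 1024 AS approx_mb
FROM pg_class
WHERE relname IN ('article_fti', 'article_fti_string_idx');
```

Note that relpages is only refreshed by VACUUM/ANALYZE, so run one of those first if you want current numbers.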
ALSO:
Wouldn't using f1.string = 'perth' be faster than f1.string ~ '^perth', and
equally useful? Or is ~ with ^ somehow actually faster than the seemingly
simpler = comparison?
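In case it helps frame the question, the comparison I have in mind looks roughly like this -- column and table names are assumed from the fti-style layout (string/id pairs), not copied from the article:

```sql
-- Equality match: can use a plain btree index on article_fti.string.
EXPLAIN
SELECT f1.id
FROM article_fti f1
WHERE f1.string = 'perth';

-- Anchored regex: under the C locale the planner can also turn this into
-- an index range scan, but it additionally matches 'perthshire' etc.,
-- so the two are not quite equivalent.
EXPLAIN
SELECT f1.id
FROM article_fti f1
WHERE f1.string ~ '^perth';
```

So part of the answer may be that ~ '^perth' is really a prefix search, not just a slower spelling of =.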
AND:
Would using OR for the individual word comparisons be a big drag on speed?
I'd kind of like to give ranked results based on how many of the terms
were present rather than a complete filter.
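The ranked-results idea could be sketched something like the following -- again, the table layout (string/id) is assumed from the fti recipe, and the word list is just an example:

```sql
-- Rank articles by how many of the search terms they contain,
-- instead of requiring all of them: count distinct matched words
-- per id and sort by that count.
SELECT f.id,
       COUNT(DISTINCT f.string) AS terms_matched
FROM article_fti f
WHERE f.string = 'perth'
   OR f.string = 'australia'
   OR f.string = 'harbour'
GROUP BY f.id
ORDER BY terms_matched DESC;
```

Whether the OR plus GROUP BY is a big drag presumably depends on how many rows each word matches, which is exactly what I can't EXPLAIN meaningfully until the table is populated.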
I'd be happy to try the EXPLAIN queries, but I don't think they're going
to be accurate without the data in the FTI table...
I got some weird results when I did a test run with a very small dataset
in the FTI table -- But I also think I was doing it in the middle of a
train-wreck between daily VACUUM and pg_dump, which were thrashing each
other with all the FTI data I had imported just for the small test...
I've altered the cron jobs to have more time in between.
THANKS IN ADVANCE!