So, I'm trying to create a full text index as described here:
http://techdocs.postgresql.org/techdocs/fulltextindexing.php
Everything was going mostly okay...
I had to hack a quick PHP script instead of using the Perl one, since I
didn't have a working Pg.pm, but that was a minor speed bump.
Then I hit a real road-block...
\copy article_fti from fulltext.sorted
\.
ERROR: copy: line 34635390, cannot extend article_fti: No space left on
device.
Check free disk space.
PQendcopy: resetting connection
archive=> \q
[root@rm-004-24 utilities]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 15G 15G 0 100% /
/dev/sda1 48M 6.1M 39M 14% /boot
none 439M 0 439M 0% /dev/shm
Oh. Yeah. I guess that *IS* going to be kind of big...
Any SWAGs on how much disk space is required for a 500 MB fulltext.sorted
file? I.e., the ASCII file of string/OID pairs, in tab-delimited form, is
500 MB -- how much PostgreSQL space does that turn into with the
tables/indices as described at the URL above?
When I deleted all the fti rows, and did a VACUUM, there was almost 2G
available...
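For whoever ends up measuring this: once some of the data is loaded, you can get a rough on-disk figure from the system catalogs instead of guessing from df. This is just a sketch -- the index name here is an assumption, so adjust it to whatever the techdocs recipe actually created:

```sql
-- Approximate on-disk size of the FTI table and its index.
-- relpages is counted in 8 KB pages (the default block size);
-- 'article_fti_string_idx' is an assumed index name -- adjust as needed.
SELECT relname,
       relpages,
       relpages * 8 / 1024 AS approx_mb
FROM pg_class
WHERE relname IN ('article_fti', 'article_fti_string_idx');
```

Note that relpages is only refreshed by VACUUM/ANALYZE, so run one of those first if you want current numbers.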
ALSO:
Wouldn't using f1.string = 'perth' be faster than f1.string ~ '^perth', and
equally useful? Or is ~ with ^ somehow actually faster than the seemingly
simpler = comparison?
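In case it helps frame the question, the comparison I have in mind looks roughly like this -- column and table names are assumed from the fti-style layout (string/id pairs), not copied from the article:

```sql
-- Equality match: can use a plain btree index on article_fti.string.
EXPLAIN
SELECT f1.id
FROM article_fti f1
WHERE f1.string = 'perth';

-- Anchored regex: under the C locale the planner can also turn this into
-- an index range scan, but it additionally matches 'perthshire' etc.,
-- so the two are not quite equivalent.
EXPLAIN
SELECT f1.id
FROM article_fti f1
WHERE f1.string ~ '^perth';
```

So part of the answer may be that ~ '^perth' is really a prefix search, not just a slower spelling of =.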
AND:
Would using OR for the individual word comparisons be a big drag on speed?
I'd kind of like to give ranked results based on how many of the terms
were present rather than a complete filter.
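The ranked-results idea could be sketched something like the following -- again, the table layout (string/id) is assumed from the fti recipe, and the word list is just an example:

```sql
-- Rank articles by how many of the search terms they contain,
-- instead of requiring all of them: count distinct matched words
-- per id and sort by that count.
SELECT f.id,
       COUNT(DISTINCT f.string) AS terms_matched
FROM article_fti f
WHERE f.string = 'perth'
   OR f.string = 'australia'
   OR f.string = 'harbour'
GROUP BY f.id
ORDER BY terms_matched DESC;
```

Whether the OR plus GROUP BY is a big drag presumably depends on how many rows each word matches, which is exactly what I can't EXPLAIN meaningfully until the table is populated.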
I'd be happy to try the EXPLAIN queries, but I don't think they're going
to be accurate without the data in the FTI table...
I got some weird results when I did a test run with a very small dataset
in the FTI table -- But I also think I was doing it in the middle of a
train-wreck between daily VACUUM and pg_dump, which were thrashing each
other with all the FTI data I had imported just for the small test...
I've altered the cron jobs to have more time in between.
THANKS IN ADVANCE!