Thread: INSERTs becoming slower and slower
Hi,
I am breaking up huge texts (between 25K and 250K words) into single words using PL/pgSQL.
For this I am using a temp table in the first step:
LOOP
    vLeft := vRight;
    vTmp  := vLeft;

    -- scan forward to the next blank (or end of text)
    LOOP
        vChr := SUBSTRING(pText FROM vTmp FOR 1);
        vTmp := vTmp + 1;
        EXIT WHEN (vChr = ' ' OR vChr IS NULL OR vTmp = cBorder);
    END LOOP;

    vRight := vTmp;
    vLit   := SUBSTRING(pText FROM vLeft FOR (vRight - vLeft - 1));

    -- store the extracted word
    IF (LENGTH(vLit) > 0) THEN
        WRDCNT := WRDCNT + 1;
        INSERT INTO DEX_TEMPDOC (TMP_DOO_ID, TMP_SEQ_ID, TMP_RAWTEXT)
        VALUES (pDOO_ID, I, vLit);
    END IF;

    I    := I + 1;
    vTmp := LENGTH(vLit);    -- reuse vTmp to report the word length below

    -- progress notice every 100 words
    IF ((WRDCNT % 100) = 0) THEN
        PROGRESS := ROUND((100 * I) / DOCLEN, 0);
        RAISE NOTICE '[PROC] % WORDS -- LAST LIT % (Len %) [% PCT / % of %]',
            WRDCNT, vLit, vTmp, PROGRESS, I, DOCLEN;
    END IF;

    EXIT WHEN vRight >= cBorder;
END LOOP;
The doc is preprocessed; between any two words there is only a single blank.
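(As an aside: since the words are separated by exactly one blank, the character-by-character scan could presumably be replaced by a single array split. A sketch, where vWords is a new TEXT[] variable that is not in my current function; string_to_array and array_upper are both available in 8.0:

    -- split the whole document in one pass instead of per character
    vWords := string_to_array(pText, ' ');
    FOR I IN 1 .. array_upper(vWords, 1) LOOP
        IF LENGTH(vWords[I]) > 0 THEN
            INSERT INTO DEX_TEMPDOC (TMP_DOO_ID, TMP_SEQ_ID, TMP_RAWTEXT)
            VALUES (pDOO_ID, I, vWords[I]);
        END IF;
    END LOOP;
)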
My problem is: the first 25K words are quite quick, but the inserts become slower and slower. Starting at 1K words per second, I end up with 100 words per 10 seconds (when I reach 80K-100K words).
The only index on tempdoc (non-unique) is on RAWTEXT.
What can I do? Should I drop the index?
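If dropping it is the answer, I assume the usual bulk-load pattern applies; a sketch (the index name below is made up, I would look up the real one in pg_indexes):

    DROP INDEX dex_tempdoc_rawtext_idx;   -- hypothetical name
    -- ... run the word-splitting load ...
    CREATE INDEX dex_tempdoc_rawtext_idx ON DEX_TEMPDOC (TMP_RAWTEXT);
    ANALYZE DEX_TEMPDOC;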
Here is my config:
shared_buffers = 2000 # min 16, at least max_connections*2, 8KB each
work_mem = 32768 # min 64, size in KB
maintenance_work_mem = 16384 # min 1024, size in KB
max_stack_depth = 8192 # min 100, size in KB
enable_hashagg = true
enable_hashjoin = true
enable_indexscan = true
enable_mergejoin = true
enable_nestloop = true
enable_seqscan = false
The machine is a Xeon 3 GHz, 1 GB RAM, SATA RAID 1 array, running 8.0.4 (i686, precompiled).
Thanks!
Kind regards
Dipl.-Inform. Marcus Noerder-Tuitje
Developer
software technology AG
Kortumstraße 16
44787 Bochum
Tel: 0234 / 52 99 6 26
Fax: 0234 / 52 99 6 22
E-Mail: noerder-tuitje@technology.de
Internet: www.technology.de
You might find it faster to install contrib/tsearch2 for text indexing sort of purposes...

Nörder-Tuitje wrote:
> I am breaking up huge texts (between 25K and 250K words) into single
> words using PL/pgSQL. [...]
> My problem is: the first 25K words are quite quick, but the inserts
> become slower and slower. [...] Should I drop the index?
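A minimal sketch of what that might look like (the table and column names here are placeholders, not from the original post; the 'default' configuration ships with the module):

    -- load contrib/tsearch2 into the database first (tsearch2.sql from contrib)
    ALTER TABLE dex_doc ADD COLUMN fti tsvector;                -- dex_doc is hypothetical
    UPDATE dex_doc SET fti = to_tsvector('default', rawtext);
    CREATE INDEX dex_doc_fti_idx ON dex_doc USING gist (fti);

    -- word lookups then go through the GiST index instead of a hand-built word table:
    SELECT doo_id FROM dex_doc WHERE fti @@ to_tsquery('default', 'someword');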
Christopher Kings-Lynne wrote:
> You might find it faster to install contrib/tsearch2 for text indexing
> sort of purposes...
>
> Nörder-Tuitje wrote:
>> Here is my config:
>>
>> shared_buffers = 2000 # min 16, at least max_connections*2, 8KB each
>> work_mem = 32768 # min 64, size in KB
>> [...]
>> enable_seqscan = false
>>
>> The machine is a Xeon 3 GHz, 1 GB RAM, SATA RAID 1 array, running
>> 8.0.4 (i686, precompiled)

Also, shared_buffers (server-wide) is low compared to a rather high work_mem (32 MB for each sort operation, though the right value also depends on your concurrency level). And disabling sequential scans in your postgresql.conf will probably lead to sub-optimal plans for many queries.

Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com
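For illustration, a more balanced starting point on a 1 GB machine might look like this (rough guesses only, not tuned values; shared_buffers is counted in 8 KB pages in 8.0):

    shared_buffers = 25000   # ~200 MB instead of ~16 MB
    work_mem = 8192          # 8 MB per sort/hash; scale by expected concurrency
    enable_seqscan = true    # let the planner choose the scan type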