Performance question - Mailing list pgsql-general

From Gordan Bobic
Subject Performance question
Date
Msg-id 200111071235.fA7CZoM04154@sentinel.bobich.net
Whole thread Raw
List pgsql-general
Hi.

I've got two tables - a master table and an FTI table. The general structure
is as follows:

CREATE TABLE Master
(
    ID        serial,
    TextField    text,
    EntryTime    timestamp,
    PRIMARY KEY
    (
        ID
    )
);

BTREE index on EntryTime exists.

CREATE TABLE MasterFTI
(
    ID        serial    REFERENCES Master(ID) ON DELETE CASCADE,
    string        text,
    PRIMARY KEY
    (
        ID,
        String
    )
);

MasterFTI table is the full text index look-up table. The ratio of Master/FTI
records is about 1:35.

The Master table hovers around 50K records. The FTI table hovers around 1.7M
records. The records get pruned once per day, so that only records with
EntryTime that is less than a week old remain.

When I do:

DELETE
FROM    Master
WHERE    EntryTime < '01-Nov-2001';

This should delete between 5K and 10K records in the Master table, and it
should also delete the corresponding records in the FTI table (200K-400K
records).

This seems to take a VERY long time.

explain delete from Master where EntryTime < '31-Oct-2001';
NOTICE:  QUERY PLAN:

Seq Scan on master  (cost=0.00..5188.86 rows=16263 width=6)

EXPLAIN

I have tried with
SET ENABLE_SEQSCAN = OFF;
and the index on EntryTime gets used, but there doesn't seem to be a great
improvement in speed. I have moved the database from my Pentium 100MHz/128MB
RAMdevelopment server to a dual P3/1GHz 1 GB RAM, and the task still takes a
VERY long time. It has been going for nearly half an hour now (CPU time spent
by process, 99.8% CPU consumption). There is no other server load worth
mentioning, no grinding on the disk, it all seems to have been cached in the
main memory, and it is showing no signs or finishing. Doing a complete
database dump/restore to get the database to the other machine took about 10
minutes. I would have thought that a triggered cascade delete shouldn't take
longer than a dump/restore of the entire database, even when it is done on
about 10-15% of the whole database. Am I wrong here?

Is this normal? I am guessing here that the delay is caused by the triggers
that execute the delete on the FTI table, but this sort of timing still seems
excessive.

What am I doing wrong? Should I be doing a two-stage delete, one for the FTI
table, and the the one for the Master table? That sort of defeats the point
of foreign keys, references and triggers, doesn't it?

But then again, I am prepared to believe that is the problem because last
time I used the fulltextindex module, the performance actually DECREASED from
using a non-indexed ILIKE search on the text fields, due to the subword
searching and duplicate entried allowed by the FTI module. Now I am handling

Currently the only feasible explanation is that I am doing something wrong -
I just cannot see what.

Cheers.

Gordan

pgsql-general by date:

Previous
From: Peter Pilsl
Date:
Subject: lower does not handle german umlaut
Next
From: Gordan Bobic
Date:
Subject: Performance Question Followup