Thread: Performance Question Followup No.2

Performance Question Followup No.2

From
Gordan Bobic
Date:
After just having split the action into two parts (FTI delete + Master
delete), it would appear that most of the delay does come from the triggers
executing.

delete from masterfti where masterfti.id = master.id and master.entrytime <
'31-Oct-2001';
DELETE 172609

Came back in about 10 seconds.

delete from master where entrytime < '31-Oct-2001';

is still going after 10 minutes even though there are no records in the FTI
table that are being deleted.

Is there an explanation for this? Anything that I should check? Any
performance tweaks that would improve this situation?

Sorry to go on about this, but I'm totally puzzled by this.

Regards.

Gordan

Re: Performance Question Followup No.2

From
Tom Lane
Date:
Gordan Bobic <gordan@bobich.net> writes:
> After just having split the action into two parts (FTI delete + Master
> delete), it would appear that most of the delay does come from the triggers
> executing.

I imagine that the problem is that the triggers have to delete the FTI
records retail --- one master record's worth at a time.  That's
inherently far less efficient than getting rid of all of them in a
single query, as your comparison case is doing.  I see no easy way
to get around that in the context of the existing FTI design.

There is a new "tsearch" contrib module in 7.2 that might be worth your
time to look at instead.  I'm not sure whether it's any better on this
measure, but at least it's a fresh implementation...

            regards, tom lane

Re: Performance Question Followup No.2

From
Stephan Szabo
Date:
On Wed, 7 Nov 2001, Gordan Bobic wrote:

> After just having split the action into two parts (FTI delete + Master
> delete), it would appear that most of the delay does come from the triggers
> executing.
>
> delete from masterfti where masterfti.id = master.id and master.entrytime <
> '31-Oct-2001';
> DELETE 172609
>
> Came back in about 10 seconds.
>
> delete from master where entrytime < '31-Oct-2001';
>
> is still going after 10 minutes even though there are no records in the FTI
> table that are being deleted.
>
> Is there an explanation for this? Anything that I should check? Any
> performance tweaks that would improve this situation?
>
> Sorry to go on about this, but I'm totally puzzled by this.

Have you tried restarting your connection and doing the set for the
sequence scan and then doing the first delete?  The foreign key
stuff caches a query plan the first time IIRC, so if you do the
set after it's run once, it isn't going to change its plan.




Re: Performance Question Followup No.2

From
Gordan Bobic
Date:
On Wednesday 07 Nov 2001 17:30, Stephan Szabo wrote:
> On Wed, 7 Nov 2001, Gordan Bobic wrote:
> > After just having split the action into two parts (FTI delete + Master
> > delete), it would appear that most of the delay does come from the
> > triggers executing.
> >
> > delete from masterfti where masterfti.id = master.id and master.entrytime
> > < '31-Oct-2001';
> > DELETE 172609
> >
> > Came back in about 10 seconds.
> >
> > delete from master where entrytime < '31-Oct-2001';
> >
> > is still going after 10 minutes even though there are no records in the
> > FTI table that are being deleted.
> >
> > Is there an explanation for this? Anything that I should check? Any
> > performance tweaks that would improve this situation?
> >
> > Sorry to go on about this, but I'm totally puzzled by this.
>
> Have you tried restarting your connection and doing the set for the
> sequence scan and then doing the first delete?  The foreign key
> stuff caches a query plan the first time IIRC, so if you do the
> set after it's run once, it isn't going to change its plan.

Yes, I did, but with triggers enabled, doing bulk cascaded deletes just takes
too long. I have now removed all the triggers and am handling things at the
application level. It is by no means as neat, tidy and elegant as triggers,
but it is worth it for a several hundred fold performance improvement.

Thanks for telling me about the foreign key caching, though. I can see how
that could cause interesting performance tuning situations. :-)

Gordan

Re: Performance Question Followup No.2

From
Gordan Bobic
Date:
On Wednesday 07 Nov 2001 17:25, Tom Lane wrote:
> Gordan Bobic <gordan@bobich.net> writes:

Thanks for the reply.

> > After just having split the action into two parts (FTI delete + Master
> > delete), it would appear that most of the delay does come from the
> > triggers executing.
>
> I imagine that the problem is that the triggers have to delete the FTI
> records retail --- one master record's worth at a time.  That's
> inherently far less efficient than getting rid of all of them in a
> single query, as your comparison case is doing.  I see no easy way
> to get around that in the context of the existing FTI design.

Would that really explain such a HUGE difference in performance? Even without
any corresponding FTI records (if they are deleted first - I tried it)?

I am not talking about a few percent, or even factor 2 difference. I am
talking about a difference between 10 seconds to completion and aborting
after 45 minutes - on a 1 GHz machine.

> There is a new "tsearch" contrib module in 7.2 that might be worth your
> time to look at instead.  I'm not sure whether it's any better on this
> measure, but at least it's a fresh implementation...

I didn't use the FTI module implementation because again, it uses triggers -
this, yet again proved to be too slow. The query performance wasn't improved,
though, even with properly set up indices. In order to get it to be of
benefit I
1) Implemented it "in software" in the application layer.
2) Made it not insert duplicates.
3) Made it not do word-stemming/subwords.
4) Made the stop-word table separate (for ease of use - application reads
this).
5) Inserted in excess of 200 stop words (finding them all wasy hard work, and
it is a rather application specific thing to do) to get the Master/FTI ratio
to under 35 unique words/record.

Now the performance is slightly improved, although with enough memory and a
fast processor, the difference isn't all that great when compared to an ILIKE
search on the text fields. It's a few times faster, but I guess I was
expecting more...

Regards.

Gordan

Re: Performance Question Followup No.2

From
Antoine Reid
Date:
[comments inline...]
On Wed, Nov 07, 2001 at 01:16:57PM +0000, Gordan Bobic wrote:
> After just having split the action into two parts (FTI delete + Master
> delete), it would appear that most of the delay does come from the triggers
> executing.

If I understand correctly, you experience delays when doing UPDATEs or
DELETEs on the master table? (Due to triggers updating the fti table)

I am still using 7.1.3 but am using contrib/fulltextindex from CVS as of a
couple days ago.  One thing worth noting is a change in the documentation that
comes with it. There are also some functionality changes.

In 7.1.3 (release) they suggest ONE index on ("string", "id") while in CVS
they suggest TWO separate indexes.  After doing this change on my database,
I do see a big difference in the time it takes to make an UPDATE.
My understanding is that it is now able to use the 2nd index (on "id") to
remove the old entries in the fti table, while it previously couldn't use
an index. I may be wrong, but I do see a good speedup.

OTOH, I'm also using the newest fti.c (which supports more than 2 arguments,
to index multiple fields into the same fti table), with the old Makefile,
so I'm not sure what makes the biggest difference.

My biggest consumer of fti is a master table with 10963140 records, with a
corresponding fti table of ~350000000 records. (yes, 350 M records) :)

My next step is probably to add an extra argument to fti() before the
list of fields to include, to control whether substrings are included or not.
In my application, complete words would be sufficient..

Has anybody got a patch to fti.c to disable that? My C pointers logic is
rusty...

> Is there an explanation for this? Anything that I should check? Any
> performance tweaks that would improve this situation?

In my case, doing UPDATEs one record at a time on the master table, in
separate transactions, seems to work very well now that I have separate
indexes on the fti table. Perhaps you could try it in a test database,
do some measurements (one index spanning 2 fields vs. 2 separate indexes)
and post your results here?

I'm really looking forward to contrib/tsearch in 7.2. However ISTR the docs
clearly saying it will only work with >= 7.2, but I think it can solve many
of our problems.. I haven't tried beta2 yet, but will setup a test machine
probably next week. I could probably get to a point where I can do some
measurements. Anybody interested in seeing my numbers?

> Sorry to go on about this, but I'm totally puzzled by this.
>
> Regards.
>
> Gordan

HTH
Antoine