Thread: Performance Question Followup No.2
After just having split the action into two parts (FTI delete + Master delete), it would appear that most of the delay does come from the triggers executing. delete from masterfti where masterfti.id = master.id and master.entrytime < '31-Oct-2001'; DELETE 172609 Came back in about 10 seconds. delete from master where entrytime < '31-Oct-2001'; is still going after 10 minutes even though there are no records in the FTI table that are being deleted. Is there an explanation for this? Anything that I should check? Any performance tweaks that would improve this situation? Sorry to go on about this, but I'm totally puzzled by this. Regards. Gordan
Gordan Bobic <gordan@bobich.net> writes: > After just having split the action into two parts (FTI delete + Master > delete), it would appear that most of the delay does come from the triggers > executing. I imagine that the problem is that the triggers have to delete the FTI records retail --- one master record's worth at a time. That's inherently far less efficient than getting rid of all of them in a single query, as your comparison case is doing. I see no easy way to get around that in the context of the existing FTI design. There is a new "tsearch" contrib module in 7.2 that might be worth your time to look at instead. I'm not sure whether it's any better on this measure, but at least it's a fresh implementation... regards, tom lane
On Wed, 7 Nov 2001, Gordan Bobic wrote: > After just having split the action into two parts (FTI delete + Master > delete), it would appear that most of the delay does come from the triggers > executing. > > delete from masterfti where masterfti.id = master.id and master.entrytime < > '31-Oct-2001'; > DELETE 172609 > > Came back in about 10 seconds. > > delete from master where entrytime < '31-Oct-2001'; > > is still going after 10 minutes even though there are no records in the FTI > table that are being deleted. > > Is there an explanation for this? Anything that I should check? Any > performance tweaks that would improve this situation? > > Sorry to go on about this, but I'm totally puzzled by this. Have you tried restarting your connection and doing the set for the sequence scan and then doing the first delete? The foreign key stuff caches a query plan the first time IIRC, so if you do the set after it's run once, it isn't going to change its plan.
On Wednesday 07 Nov 2001 17:30, Stephan Szabo wrote: > On Wed, 7 Nov 2001, Gordan Bobic wrote: > > After just having split the action into two parts (FTI delete + Master > > delete), it would appear that most of the delay does come from the > > triggers executing. > > > > delete from masterfti where masterfti.id = master.id and master.entrytime > > < '31-Oct-2001'; > > DELETE 172609 > > > > Came back in about 10 seconds. > > > > delete from master where entrytime < '31-Oct-2001'; > > > > is still going after 10 minutes even though there are no records in the > > FTI table that are being deleted. > > > > Is there an explanation for this? Anything that I should check? Any > > performance tweaks that would improve this situation? > > > > Sorry to go on about this, but I'm totally puzzled by this. > > Have you tried restarting your connection and doing the set for the > sequence scan and then doing the first delete? The foreign key > stuff caches a query plan the first time IIRC, so if you do the > set after it's run once, it isn't going to change its plan. Yes, I did, but with triggers enabled, doing bulk cascaded deletes just takes too long. I have now removed all the triggers and am handling things at the application level. It is by no means as neat, tidy and elegant as triggers, but it is worth it for a several hundred fold performance improvement. Thanks for telling me about the foreign key caching, though. I can see how that could cause interesting performance tuning situations. :-) Gordan
On Wednesday 07 Nov 2001 17:25, Tom Lane wrote: > Gordan Bobic <gordan@bobich.net> writes: Thanks for the reply. > > After just having split the action into two parts (FTI delete + Master > > delete), it would appear that most of the delay does come from the > > triggers executing. > > I imagine that the problem is that the triggers have to delete the FTI > records retail --- one master record's worth at a time. That's > inherently far less efficient than getting rid of all of them in a > single query, as your comparison case is doing. I see no easy way > to get around that in the context of the existing FTI design. Would that really explain such a HUGE difference in performance? Even without any corresponding FTI records (if they are deleted first - I tried it)? I am not talking about a few percent, or even factor 2 difference. I am talking about a difference between 10 seconds to completion and aborting after 45 minutes - on a 1 GHz machine. > There is a new "tsearch" contrib module in 7.2 that might be worth your > time to look at instead. I'm not sure whether it's any better on this > measure, but at least it's a fresh implementation... I didn't use the FTI module implementation because again, it uses triggers - this, yet again proved to be too slow. The query performance wasn't improved, though, even with properly set up indices. In order to get it to be of benefit I 1) Implemented it "in software" in the application layer. 2) Made it not insert duplicates. 3) Made it not do word-stemming/subwords. 4) Made the stop-word table separate (for ease of use - application reads this). 5) Inserted in excess of 200 stop words (finding them all wasy hard work, and it is a rather application specific thing to do) to get the Master/FTI ratio to under 35 unique words/record. Now the performance is slightly improved, although with enough memory and a fast processor, the difference isn't all that great when compared to an ILIKE search on the text fields. It's a few times faster, but I guess I was expecting more... Regards. Gordan
[comments inline...] On Wed, Nov 07, 2001 at 01:16:57PM +0000, Gordan Bobic wrote: > After just having split the action into two parts (FTI delete + Master > delete), it would appear that most of the delay does come from the triggers > executing. If I understand correctly, you experience delays when doing UPDATEs or DELETEs on the master table? (Due to triggers updating the fti table) I am still using 7.1.3 but am using contrib/fulltextindex from CVS as of a couple days ago. One thing worth noting is a change in the documentation that comes with it. There are also some functionality changes. In 7.1.3 (release) they suggest ONE index on ("string", "id") while in CVS they suggest TWO separate indexes. After doing this change on my database, I do see a big difference in the time it takes to make an UPDATE. My understanding is that it is now able to use the 2nd index (on "id") to remove the old entries in the fti table, while it previously couldn't use an index. I may be wrong, but I do see a good speedup. OTOH, I'm also using the newest fti.c (which supports more than 2 arguments, to index multiple fields into the same fti table), with the old Makefile, so I'm not sure what makes the biggest difference. My biggest consumer of fti is a master table with 10963140 records, with a corresponding fti table of ~350000000 records. (yes, 350 M records) :) My next step is probably to add an extra argument to fti() before the list of fields to include, to control whether substrings are included or not. In my application, complete words would be sufficient.. Has anybody got a patch to fti.c to disable that? My C pointers logic is rusty... > Is there an explanation for this? Anything that I should check? Any > performance tweaks that would improve this situation? In my case, doing UPDATEs one record at a time on the master table, in separate transactions, seems to work very well now that I have separate indexes on the fti table. Perhaps you could try it in a test database, do some measurements (one index spanning 2 fields vs. 2 separate indexes) and post your results here? I'm really looking forward to contrib/tsearch in 7.2. However ISTR the docs clearly saying it will only work with >= 7.2, but I think it can solve many of our problems.. I haven't tried beta2 yet, but will setup a test machine probably next week. I could probably get to a point where I can do some measurements. Anybody interested in seeing my numbers? > Sorry to go on about this, but I'm totally puzzled by this. > > Regards. > > Gordan HTH Antoine