Thread: Re: Performance problems testing with Spamassassin
work_mem = 131072              # min 64, size in KB
shared_buffers = 16000         # min 16, at least max_connections*2, 8KB each
checkpoint_segments = 128      # in logfile segments, min 1, 16MB each
effective_cache_size = 750000  # typically 8KB each
fsync = false                  # turns forced synchronization on or off

------------------------------------------
On Bizgres (0_7_2) running on a 2GHz Opteron:
------------------------------------------
[llonergan@stinger4 bayesBenchmark]$ ./test.sh

real    0m38.348s
user    0m1.422s
sys     0m1.870s

------------------------------------------
On a 2.4GHz AMD64:
------------------------------------------
[llonergan@kite15 bayesBenchmark]$ ./test.sh

real    0m35.497s
user    0m2.250s
sys     0m0.470s

Now we turn fsync=true:

------------------------------------------
On a 2.4GHz AMD64:
------------------------------------------
[llonergan@kite15 bayesBenchmark]$ ./test.sh

real    2m7.368s
user    0m2.560s
sys     0m0.750s

I guess we see the real culprit here.  Anyone surprised it's the WAL?

- Luke

________________________________

From: pgsql-performance-owner@postgresql.org on behalf of Andrew McMillan
Sent: Thu 7/28/2005 10:50 PM
To: Matthew Schumacher
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

On Thu, 2005-07-28 at 16:13 -0800, Matthew Schumacher wrote:
>
> Ok, I finally got some test data together so that others can test
> without installing SA.
>
> The schema and test dataset is over at
> http://www.aptalaska.net/~matt.s/bayes/bayesBenchmark.tar.gz
>
> I have a pretty fast machine with a tuned postgres and it takes it about
> 2 minutes 30 seconds to load the test data.  Since the test data is the
> bayes information on 616 spam messages, that comes out to about 250ms
> per message.  While that is doable, it does add quite a bit of overhead
> to the email system.

On my laptop this takes:

real    1m33.758s
user    0m4.285s
sys     0m1.181s

One interesting effect is that the data in bayes_vars gets a huge number of
updates and needs to be vacuumed _frequently_.  After the run, a VACUUM FULL
compacts it down from 461 pages to 1 page.

Regards,
                                        Andrew.

-------------------------------------------------------------------------
Andrew @ Catalyst .Net .NZ Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/            PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201      MOB: +64(272)DEBIAN      OFFICE: +64(4)499-2267
       I don't do it for the money.
               -- Donald Trump, Art of the Deal
-------------------------------------------------------------------------
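For anyone reproducing these runs, the fsync line above is toggled in
postgresql.conf between runs; the value the server is actually using can be
confirmed from psql before timing anything:

    -- check the active fsync setting
    SHOW fsync;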
On Fri, Jul 29, 2005 at 03:01:07AM -0400, Luke Lonergan wrote:

> I guess we see the real culprit here.  Anyone surprised it's the WAL?

So what?  Are you planning to suggest that people turn fsync=false?

I just had a person lose 3 days of data on some tables because of that,
even when checkpoints were 5 minutes apart.  With fsync off, there's no
sync work _at all_ going on, not just for the WAL -- the heap/index file
fsync at checkpoint is also skipped.  This is no good.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"In a specialized industrial society, it would be a disaster
to have kids running around loose."  (Paul Graham)

"Luke Lonergan" <LLonergan@greenplum.com> writes:
> I guess we see the real culprit here.  Anyone surprised it's the WAL?

You have not proved that at all.  I haven't had time to look at Matthew's
problem, but someone upthread implied that it was doing a separate
transaction for each word.  If so, collapsing that to something more
reasonable (say, one xact per message) would probably help a great deal.

			regards, tom lane
Luke,

> work_mem = 131072  # min 64, size in KB

Incidentally, this is much too high for an OLTP application, although I
don't think this would have affected the test.

> shared_buffers = 16000  # min 16, at least max_connections*2, 8KB each
> checkpoint_segments = 128  # in logfile segments, min 1, 16MB each
> effective_cache_size = 750000  # typically 8KB each
> fsync = false  # turns forced synchronization on or off

Try changing:

wal_buffers = 256

and try Bruce's stop full_page_writes patch.

> I guess we see the real culprit here.  Anyone surprised it's the WAL?

Nope.  On high-end OLTP stuff, it's crucial that the WAL have its own
dedicated disk resource.

Also, running a complex stored procedure for each and every word in each
e-mail is rather deadly ... with the e-mail traffic our server at Globix
receives, for example, that would amount to running it about 1,000 times a
minute.  It would be far better to batch this somehow, maybe using temp
tables.

--
Josh Berkus
Aglio Database Solutions
San Francisco
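A rough sketch of the temp-table batching Josh suggests, written against
assumed column names (token, spam_count, ham_count, atime) rather than the
real SpamAssassin schema: stage a batch of tokens in a temp table, then fold
it into bayes_token with a single UPDATE for the tokens that already exist
and a single INSERT for the rest.

    -- assumed/simplified columns; adjust to the actual bayes_token definition
    CREATE TEMP TABLE bayes_token_tmp (
        token       bytea,
        spam_count  integer,
        ham_count   integer,
        atime       integer
    );

    -- ... stage one batch of tokens in bayes_token_tmp ...

    BEGIN;

    -- bump counts for tokens we have already seen
    UPDATE bayes_token b
       SET spam_count = b.spam_count + t.spam_count,
           ham_count  = b.ham_count  + t.ham_count,
           atime      = t.atime
      FROM bayes_token_tmp t
     WHERE b.token = t.token;

    -- insert the tokens we have never seen before
    INSERT INTO bayes_token (token, spam_count, ham_count, atime)
    SELECT t.token, t.spam_count, t.ham_count, t.atime
      FROM bayes_token_tmp t
     WHERE t.token NOT IN (SELECT token FROM bayes_token);

    COMMIT;

    TRUNCATE bayes_token_tmp;

The same shape works per message at training time, just with a few hundred
rows in the temp table instead of the whole benchmark's worth.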
Alvaro,

On 7/29/05 6:23 AM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:

> On Fri, Jul 29, 2005 at 03:01:07AM -0400, Luke Lonergan wrote:
>
>> I guess we see the real culprit here.  Anyone surprised it's the WAL?
>
> So what?  Are you planning to suggest that people turn fsync=false?

That's not the conclusion I drew, no.  I was pointing out that fsync has a
HUGE impact on his problem, which implies something to do with the I/O sync
operations.  Black box bottleneck hunt, approach #12.

> With fsync off, there's no sync work _at all_ going on, not just for the
> WAL -- the heap/index file fsync at checkpoint is also skipped.  This is
> no good.

OK - so that's what Tom is pointing out, that fsync impacts more than the
WAL.  However, finding out that fsync on/off makes a nearly 4x difference
in runtime for this problem is interesting and relevant, no?

- Luke
Tom,

On 7/29/05 7:12 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> "Luke Lonergan" <LLonergan@greenplum.com> writes:
>> I guess we see the real culprit here.  Anyone surprised it's the WAL?
>
> You have not proved that at all.

As Alvaro pointed out, fsync has an impact on more than just the WAL, so
good point.  Interesting that fsync has such a huge impact on this
situation, though.

- Luke
Ok,

Here is something new: when I take my data.sql file and add a begin and
commit at the top and bottom, the benchmark is a LOT slower.  My
understanding is that it should be much faster because fsync isn't called
until the commit instead of on every sql command.

I must be missing something here.

schu
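For comparison, Tom's suggestion upthread was per-message batching rather
than one transaction around the entire data.sql load.  A minimal sketch of
what that looks like, with put_token as a hypothetical stand-in for whatever
per-token procedure the benchmark actually calls:

    -- one transaction per message, not one around the whole load
    BEGIN;
    SELECT put_token('greetings', 1, 0, 1122608000);  -- hypothetical per-token call
    SELECT put_token('viagra',    0, 1, 1122608000);
    -- ... the rest of this message's tokens ...
    COMMIT;
    -- the next message starts its own BEGIN ... COMMIT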
On Fri, 2005-07-29 at 09:47 -0700, Josh Berkus wrote:
> Try changing:
>
> wal_buffers = 256
>
> and try Bruce's stop full_page_writes patch.
>
> > I guess we see the real culprit here.  Anyone surprised it's the WAL?
>
> Nope.  On high-end OLTP stuff, it's crucial that the WAL have its own
> dedicated disk resource.
>
> Also, running a complex stored procedure for each and every word in each
> e-mail is rather deadly ... with the e-mail traffic our server at Globix
> receives, for example, that would amount to running it about 1,000 times
> a minute.

Is this a real-world fix?  It seems to me that SpamAssassin runs on a
plethora of mail servers, and optimizing his/her/my/your pg config doesn't
solve the root problem: there are thousands of (seemingly) high-overhead
function calls being executed.

> It would be far better to batch this somehow, maybe using temp tables.

Agreed.

On my G4 laptop running the default-configured Ubuntu Linux postgresql
7.4.7 package, it took 43 minutes for Matthew's script to run (I ran it
twice just to be sure).  In my spare time over the last day, I created a
brute-force Perl script that took under 6 minutes.  Am I on to something,
or did I just optimize for *my* system?

http://ccl.cens.nau.edu/~kan4/files/k-bayesBenchmark.tar.gz

kan4@slap-happy:~/k-bayesBenchmark$ time ./test.pl
<-- snip db creation stuff -->
  17:18:44 -- START
  17:19:37 -- AFTER TEMP LOAD : loaded 120596 records
  17:19:46 -- AFTER bayes_token INSERT : inserted 49359 new records into bayes_token
  17:19:50 -- AFTER bayes_vars UPDATE : updated 1 records
  17:23:37 -- AFTER bayes_token UPDATE : updated 47537 records
DONE

real    5m4.551s
user    0m29.442s
sys     0m3.925s

I am sure someone smarter could optimize further.

Anyone with a super-spiffy machine wanna see if there is an improvement
here?

--
Karim Nassar <karim.nassar@acm.org>
Karim Nassar wrote:
>
> kan4@slap-happy:~/k-bayesBenchmark$ time ./test.pl
> <-- snip db creation stuff -->
>   17:18:44 -- START
>   17:19:37 -- AFTER TEMP LOAD : loaded 120596 records
>   17:19:46 -- AFTER bayes_token INSERT : inserted 49359 new records into bayes_token
>   17:19:50 -- AFTER bayes_vars UPDATE : updated 1 records
>   17:23:37 -- AFTER bayes_token UPDATE : updated 47537 records
> DONE
>
> real    5m4.551s
> user    0m29.442s
> sys     0m3.925s
>
> I am sure someone smarter could optimize further.
>
> Anyone with a super-spiffy machine wanna see if there is an improvement
> here?

There is a great improvement in loading the data.  While I didn't load it
on my server, my test box shows significant gains.

It seems that the only thing your script does differently is separate the
updates from the inserts, so that an expensive update isn't attempted when
what we really want is an insert.  The other major difference is the 'IN'
and 'NOT IN' syntax, which looks to be much faster than trying everything
as an update before inserting.

While these optimizations seem to make a huge difference in loading the
token data, the real-life scenario is a little different.  You see, the
database keeps track of the number of times each token was found in ham or
spam, so that when we see a new message we can parse it into tokens and
compare them with the database to see how likely the message is spam, based
on the statistics for the tokens we have already learned.  Since we would
want to commit this data after each message, the number of tokens processed
at one time would probably only be a few hundred, most of which are probably
updates after we have trained on a few thousand emails.

I apologize if my crude benchmark was misleading; it was meant to simulate
the sheer number of inserts/updates the database may go through in an
environment that didn't require people to load spamassassin and start
training on spam.

I'll do some more testing on Monday; perhaps grouping even 200 tokens at a
time using your method will yield significant gains, but probably not as
dramatic as it does with my loading benchmark.

I'll post more when I have a chance to look at this in more depth.

Thanks,

schu
On Sat, 2005-07-30 at 00:46 -0800, Matthew Schumacher wrote:
> I'll do some more testing on Monday; perhaps grouping even 200 tokens at
> a time using your method will yield significant gains, but probably not
> as dramatic as it does with my loading benchmark.

In that case, some of the clauses could be simplified further, since we
know that we are dealing with only one user.  I don't know what that will
get us, since postgres is so damn clever.

I suspect that the aggregate functions will be more efficient when you do
this, since the temp table will be much smaller, but I am only guessing at
this point.

If you need to support a massive initial data load, further time savings
are to be had by doing COPY instead of 126,000 inserts.

Please do keep us updated.

Thanking all the gods and/or developers for spamassassin,
--
Karim Nassar <karim.nassar@acm.org>
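To illustrate the COPY idea for the initial load, a sketch under the
assumption that the token rows sit in a tab-separated file and are headed
for a staging table such as the bayes_token_tmp used in the earlier sketch
(the table and file names are placeholders):

    -- one bulk load instead of ~126,000 individual INSERTs
    COPY bayes_token_tmp FROM '/tmp/bayes_tokens.tab';

    -- or, client-side from psql without server file access:
    -- \copy bayes_token_tmp FROM 'bayes_tokens.tab'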