Re: Performance problems testing with Spamassassin - Mailing list pgsql-performance

From Karim Nassar
Subject Re: Performance problems testing with Spamassassin
Date
Msg-id 1122683983.11869.150.camel@localhost.localdomain
In response to Re: Performance problems testing with Spamassassin  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Performance problems testing with Spamassassin  (Matthew Schumacher <matt.s@aptalaska.net>)
List pgsql-performance
On Fri, 2005-07-29 at 09:47 -0700, Josh Berkus wrote:
> Try changing:
> wal_buffers = 256
>
> and try Bruce's stop full_page_writes patch.
>
> > I guess we see the real culprit here.  Anyone surprised it's the WAL?
>
> Nope.  On high-end OLTP stuff, it's crucial that the WAL have its own
> dedicated disk resource.
>
> Also, running a complex stored procedure for each and every word in each
> e-mail is rather deadly ... with the e-mail traffic our server at Globix
> receives, for example, that would amount to running it about 1,000 times a
> minute.

Is this a real-world fix? Seems to me that SpamAssassin runs on a
plethora of mail servers, and optimizing his/her/my/your pg config
doesn't solve the root problem: there are thousands of (seemingly)
high-overhead function calls being executed.


> It would be far better to batch this, somehow, maybe using temp
> tables.

Agreed. On my G4 laptop running the default-configured PostgreSQL 7.4.7
package from Ubuntu Linux, it took 43 minutes for Matthew's script to
run (I ran it twice just to be sure). In my spare time over the last
day, I created a brute-force Perl script that took under 6 minutes. Am
I on to something, or did I just optimize for *my* system?

http://ccl.cens.nau.edu/~kan4/files/k-bayesBenchmark.tar.gz

kan4@slap-happy:~/k-bayesBenchmark$ time ./test.pl
<-- snip db creation stuff -->
17:18:44 -- START
17:19:37 -- AFTER TEMP LOAD : loaded 120596 records
17:19:46 -- AFTER bayes_token INSERT : inserted 49359 new records into bayes_token
17:19:50 -- AFTER bayes_vars UPDATE : updated 1 records
17:23:37 -- AFTER bayes_token UPDATE : updated 47537 records
DONE

real    5m4.551s
user    0m29.442s
sys     0m3.925s
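
The stages in the log above (temp load, then a set-based INSERT of new
tokens, then a set-based UPDATE of existing ones) can be sketched
roughly as follows. This is only an illustration of the batching idea,
not the contents of the tarball: it uses Python's built-in sqlite3
module as a stand-in for PostgreSQL so that it runs anywhere, and the
simplified bayes_token columns, the token_batch temp table, and the
function names are all assumptions made for the example.

```python
import sqlite3


def make_schema(conn):
    # Simplified, hypothetical stand-in for the bayes_token table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bayes_token ("
        " id INTEGER NOT NULL,"
        " token TEXT NOT NULL,"
        " spam_count INTEGER NOT NULL DEFAULT 0,"
        " ham_count INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (id, token))"
    )


def batch_learn(conn, user_id, tokens, is_spam):
    """Apply one message's tokens with a few set-based statements
    instead of one function call per token.
    Returns (rows_inserted, rows_updated)."""
    spam_inc, ham_inc = (1, 0) if is_spam else (0, 1)
    cur = conn.cursor()

    # 1. Load the batch into a temp table; duplicate tokens collapse.
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS token_batch (token TEXT PRIMARY KEY)"
    )
    cur.execute("DELETE FROM token_batch")
    cur.executemany(
        "INSERT OR IGNORE INTO token_batch (token) VALUES (?)",
        [(t,) for t in tokens],
    )

    # 2. Bump counters for tokens that already exist. Doing this before
    #    the insert keeps new rows from being counted twice.
    cur.execute(
        "UPDATE bayes_token"
        " SET spam_count = spam_count + ?, ham_count = ham_count + ?"
        " WHERE id = ? AND token IN (SELECT token FROM token_batch)",
        (spam_inc, ham_inc, user_id),
    )
    updated = cur.rowcount

    # 3. Insert tokens that were not there yet, in a single statement.
    cur.execute(
        "INSERT INTO bayes_token (id, token, spam_count, ham_count)"
        " SELECT ?, b.token, ?, ? FROM token_batch b"
        " WHERE NOT EXISTS (SELECT 1 FROM bayes_token t"
        "                   WHERE t.id = ? AND t.token = b.token)",
        (user_id, spam_inc, ham_inc, user_id),
    )
    inserted = cur.rowcount

    conn.commit()
    return inserted, updated


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_schema(conn)
    print(batch_learn(conn, 1, ["viagra", "free", "free"], is_spam=True))
    print(batch_learn(conn, 1, ["free", "meeting"], is_spam=False))
```

The point of the sketch is that the number of statements stays constant
per message no matter how many tokens it contains. Against a real
PostgreSQL server the temp load would more likely use COPY, and the
statement syntax may need adjusting.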


I am sure someone smarter could optimize further.

Anyone with a super-spifty machine wanna see if there is an improvement
here?

--
Karim Nassar <karim.nassar@acm.org>

