Re: Performance problems testing with Spamassassin 3.1.0 - Mailing list pgsql-performance

From Gavin Sherry
Subject Re: Performance problems testing with Spamassassin 3.1.0
Date
Msg-id Pine.LNX.4.58.0507291352460.10626@linuxworld.com.au
In response to Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher <matt.s@aptalaska.net>)
Responses Re: Performance problems testing with Spamassassin 3.1.0
List pgsql-performance
On Thu, 28 Jul 2005, Matthew Schumacher wrote:

> Karim Nassar wrote:
> > On Wed, 2005-07-27 at 14:35 -0800, Matthew Schumacher wrote:
> >
> >
> >>I put the rest of the schema up at
> >>http://www.aptalaska.net/~matt.s/bayes/bayes_pg.sql in case someone
> >>needs to see it too.
> >
> >
> > Do you have sample data too?
> >
>
> Ok, I finally got some test data together so that others can test
> without installing SA.
>
> The schema and test dataset are over at
> http://www.aptalaska.net/~matt.s/bayes/bayesBenchmark.tar.gz
>
> I have a pretty fast machine with a tuned postgres and it takes it about
> 2 minutes 30 seconds to load the test data.  Since the test data is the
> bayes information on 616 spam messages, that comes out to be about 250ms
> per message.  While that is doable, it does add quite a bit of overhead
> to the email system.

I had a look at your data -- thanks.

I have a question though: put_token() is invoked 120596 times in your
benchmark... for 616 messages. That's nearly 200 calls per message, not
even counting the 1-8 (??) queries inside the function itself. Something
doesn't seem right there.
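
For anyone who wants to check the arithmetic, here is a rough sketch using
only the figures quoted in this thread. The variable names are mine, and the
per-call figure assumes the whole 2 minutes 30 seconds is spent in
put_token() calls, which is a simplification rather than a measurement.

```python
# Figures quoted in the thread; nothing here is measured independently.
put_token_calls = 120596        # put_token() invocations in the benchmark
messages = 616                  # spam messages in the test data
load_seconds = 2 * 60 + 30      # reported load time: 2 minutes 30 seconds

# Average number of put_token() calls issued per message.
calls_per_message = put_token_calls / messages

# Average wall-clock time per message, in milliseconds.
ms_per_message = load_seconds * 1000 / messages

# Assumption: all of the load time is attributed to put_token() calls.
ms_per_call = load_seconds * 1000 / put_token_calls

print(f"{calls_per_message:.1f} put_token() calls per message")  # 195.8
print(f"{ms_per_message:.1f} ms per message")                    # 243.5
print(f"{ms_per_call:.2f} ms per put_token() call")              # 1.24
```

So each call is already cheap on average (a little over 1 ms under that
assumption); it is the number of calls per message that dominates.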

Gavin
