On Thu, 28 Jul 2005, Matthew Schumacher wrote:
> Karim Nassar wrote:
> > On Wed, 2005-07-27 at 14:35 -0800, Matthew Schumacher wrote:
> >
> >
> >>I put the rest of the schema up at
> >>http://www.aptalaska.net/~matt.s/bayes/bayes_pg.sql in case someone
> >>needs to see it too.
> >
> >
> > Do you have sample data too?
> >
>
> Ok, I finally got some test data together so that others can test
> without installing SA.
>
> The schema and test dataset is over at
> http://www.aptalaska.net/~matt.s/bayes/bayesBenchmark.tar.gz
>
> I have a pretty fast machine with a tuned postgres and it takes it about
> 2 minutes 30 seconds to load the test data. Since the test data is the
> bayes information on 616 spam messages than comes out to be about 250ms
> per message. While that is doable, it does add quite a bit of overhead
> to the email system.
I had a look at your data -- thanks.
I have a question though: put_token() is invoked 120596 times in your
benchmark... for 616 messages. That's nearly 200 queries (not even
counting the 1-8 (??) inside the function itself) per message. Something
doesn't seem right there....
Gavin