Re: Performance problems testing with Spamassassin 3.1.0

From: Matthew Schumacher
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: ,
Msg-id: 42ED017A.2050207@aptalaska.net
(view: Whole thread, Raw)
In response to: Re: Performance problems testing with Spamassassin 3.1.0  (John Arbash Meinel)
Responses: Re: Performance problems testing with Spamassassin 3.1.0  ("Jim C. Nasby")
List: pgsql-performance

Tree view

Performance problems testing with Spamassassin 3.1.0 Bayes module.  (Matthew Schumacher, )
 Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.  (Josh Berkus, )
  Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
   Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.  (Josh Berkus, )
    Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
   Re: Performance problems testing with Spamassassin 3.1.0  (Dennis Bjorklund, )
    Re: Performance problems testing with Spamassassin 3.1.0  (Josh Berkus, )
     Re: Performance problems testing with Spamassassin 3.1.0  (John Arbash Meinel, )
 Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.  (Tom Lane, )
  Re: Performance problems testing with Spamassassin  ("Luke Lonergan", )
 Re: Performance problems testing with Spamassassin 3.1.0  (Karim Nassar, )
  Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
   Re: Performance problems testing with Spamassassin 3.1.0  (Gavin Sherry, )
    Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
     Re: Performance problems testing with Spamassassin 3.1.0  (Gavin Sherry, )
   Re: Performance problems testing with Spamassassin 3.1.0  (Andrew McMillan, )
    Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
     Re: Performance problems testing with Spamassassin 3.1.0  (PFC, )
     Re: Performance problems testing with Spamassassin 3.1.0  (Andrew McMillan, )
      Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
      Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
       Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
        Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
         Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
          Re: Performance problems testing with Spamassassin 3.1.0  (John Arbash Meinel, )
         Re: Performance problems testing with Spamassassin 3.1.0  (Michael Parker, )
          Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
          Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
           Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
         Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
          Re: Performance problems testing with Spamassassin 3.1.0  (PFC, )
           Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
            Re: Performance problems testing with Spamassassin 3.1.0  ("Jim C. Nasby", )
             Re: Performance problems testing with Spamassassin 3.1.0  (Michael Parker, )
              Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
               Re: Performance problems testing with Spamassassin 3.1.0  (PFC, )
               Re: Performance problems testing with Spamassassin 3.1.0  (John A Meinel, )
                Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
               Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
                Re: Performance problems testing with Spamassassin 3.1.0  (John A Meinel, )
                 Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
                Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
                 Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
                  Re: Performance problems testing with Spamassassin 3.1.0  (John A Meinel, )
                   Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
                    Re: Performance problems testing with Spamassassin 3.1.0  (John A Meinel, )
        Re: Performance problems testing with Spamassassin 3.1.0  (John Arbash Meinel, )
         Re: Performance problems testing with Spamassassin 3.1.0  (John Arbash Meinel, )
          Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
           Re: Performance problems testing with Spamassassin 3.1.0  ("Jim C. Nasby", )
            Re: Performance problems testing with Spamassassin 3.1.0  (Andreas Pflug, )
 Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.  ("Merlin Moncure", )

Ok, here is the current plan.

Change the spamassassin API to pass a hash of tokens into the storage
module, pass the tokens to the proc as an array, start a transaction,
load the tokens into a temp table using copy, select the tokens distinct
into the token table for new tokens, update the token table for known
tokens, then commit.

This solves the following problems:

1.  Each email is a transaction instead of each token.
2.  The update statement is only called when we really need an update
which avoids all of those searches.
3.  The looping work is done inside the proc instead of perl calling a
method a zillion times per email.

I'm not sure how vacuuming will be done yet, if we vacuum once per email
that may be too often, so I may do that every 5 mins in cron.

schu


pgsql-performance by date:

From: William Yu
Date:
Subject: Re: Performance problems on 4/8way Opteron (dualcore)
From: Michael Parker
Date:
Subject: Re: Performance problems testing with Spamassassin 3.1.0