Re: Performance problems testing with Spamassassin 3.1.0

From: Matthew Schumacher
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: ,
Msg-id: 42E83C09.2040508@aptalaska.net
(view: Whole thread, Raw)
In response to: Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.  (Josh Berkus)
Responses: Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.  (Josh Berkus)
Re: Performance problems testing with Spamassassin 3.1.0  (Dennis Bjorklund)
List: pgsql-performance

Tree view

Performance problems testing with Spamassassin 3.1.0 Bayes module.  (Matthew Schumacher, )
 Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.  (Josh Berkus, )
  Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
   Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.  (Josh Berkus, )
    Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
   Re: Performance problems testing with Spamassassin 3.1.0  (Dennis Bjorklund, )
    Re: Performance problems testing with Spamassassin 3.1.0  (Josh Berkus, )
     Re: Performance problems testing with Spamassassin 3.1.0  (John Arbash Meinel, )
 Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.  (Tom Lane, )
  Re: Performance problems testing with Spamassassin  ("Luke Lonergan", )
 Re: Performance problems testing with Spamassassin 3.1.0  (Karim Nassar, )
  Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
   Re: Performance problems testing with Spamassassin 3.1.0  (Gavin Sherry, )
    Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
     Re: Performance problems testing with Spamassassin 3.1.0  (Gavin Sherry, )
   Re: Performance problems testing with Spamassassin 3.1.0  (Andrew McMillan, )
    Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
     Re: Performance problems testing with Spamassassin 3.1.0  (PFC, )
     Re: Performance problems testing with Spamassassin 3.1.0  (Andrew McMillan, )
      Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
      Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
       Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
        Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
         Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
          Re: Performance problems testing with Spamassassin 3.1.0  (John Arbash Meinel, )
         Re: Performance problems testing with Spamassassin 3.1.0  (Michael Parker, )
          Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
          Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
           Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
         Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
          Re: Performance problems testing with Spamassassin 3.1.0  (PFC, )
           Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
            Re: Performance problems testing with Spamassassin 3.1.0  ("Jim C. Nasby", )
             Re: Performance problems testing with Spamassassin 3.1.0  (Michael Parker, )
              Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
               Re: Performance problems testing with Spamassassin 3.1.0  (PFC, )
               Re: Performance problems testing with Spamassassin 3.1.0  (John A Meinel, )
                Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
               Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
                Re: Performance problems testing with Spamassassin 3.1.0  (John A Meinel, )
                 Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
                Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
                 Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
                  Re: Performance problems testing with Spamassassin 3.1.0  (John A Meinel, )
                   Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
                    Re: Performance problems testing with Spamassassin 3.1.0  (John A Meinel, )
        Re: Performance problems testing with Spamassassin 3.1.0  (John Arbash Meinel, )
         Re: Performance problems testing with Spamassassin 3.1.0  (John Arbash Meinel, )
          Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
           Re: Performance problems testing with Spamassassin 3.1.0  ("Jim C. Nasby", )
            Re: Performance problems testing with Spamassassin 3.1.0  (Andreas Pflug, )
 Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.  ("Merlin Moncure", )

Josh Berkus wrote:
> Matt,
>
>
>>After playing with various indexes and what not I simply am unable to
>>make this procedure perform any better.  Perhaps someone on the list can
>>spot the bottleneck and reveal why this procedure isn't performing that
>>well or ways to make it better.
>
>
> Well, my first thought is that this is a pretty complicated procedure for
> something you want to peform well.    Is all this logic really necessary?
> How does it get done for MySQL?
>

I'm not sure if it's all needed, in mysql they have this simple schema:

===============================================
CREATE TABLE bayes_expire (
  id int(11) NOT NULL default '0',
  runtime int(11) NOT NULL default '0',
  KEY bayes_expire_idx1 (id)
) TYPE=MyISAM;

CREATE TABLE bayes_global_vars (
  variable varchar(30) NOT NULL default '',
  value varchar(200) NOT NULL default '',
  PRIMARY KEY  (variable)
) TYPE=MyISAM;

INSERT INTO bayes_global_vars VALUES ('VERSION','3');

CREATE TABLE bayes_seen (
  id int(11) NOT NULL default '0',
  msgid varchar(200) binary NOT NULL default '',
  flag char(1) NOT NULL default '',
  PRIMARY KEY  (id,msgid)
) TYPE=MyISAM;

CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY  (id, token),
  INDEX bayes_token_idx1 (token),
  INDEX bayes_token_idx2 (id, atime)
) TYPE=MyISAM;

CREATE TABLE bayes_vars (
  id int(11) NOT NULL AUTO_INCREMENT,
  username varchar(200) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  token_count int(11) NOT NULL default '0',
  last_expire int(11) NOT NULL default '0',
  last_atime_delta int(11) NOT NULL default '0',
  last_expire_reduce int(11) NOT NULL default '0',
  oldest_token_age int(11) NOT NULL default '2147483647',
  newest_token_age int(11) NOT NULL default '0',
  PRIMARY KEY  (id),
  UNIQUE bayes_vars_idx1 (username)
) TYPE=MyISAM;
===============================================

Then they do this to insert the token:

INSERT INTO bayes_token (
  id,
  token,
  spam_count,
  ham_count,
  atime
) VALUES (
  ?,
  ?,
  ?,
  ?,
  ?
) ON DUPLICATE KEY
  UPDATE
    spam_count = GREATEST(spam_count + ?, 0),
    ham_count = GREATEST(ham_count + ?, 0),
    atime = GREATEST(atime, ?)

Or update the token:

UPDATE bayes_vars SET
  $token_count_update
  newest_token_age = GREATEST(newest_token_age, ?),
  oldest_token_age = LEAST(oldest_token_age, ?)
  WHERE id = ?


I think the reason why the procedure was written for postgres was
because of the greatest and least statements performing poorly.

Honestly, I'm not real up on writing procs, I was hoping the problem
would be obvious to someone.

schu


pgsql-performance by date:

From: Karim Nassar
Date:
Subject: Re: Performance problems testing with Spamassassin 3.1.0
From: Kari Lavikka
Date:
Subject: Finding bottleneck