Re: Performance problems testing with Spamassassin 3.1.0 - Mailing list pgsql-performance
| From | Matthew Schumacher |
|---|---|
| Subject | Re: Performance problems testing with Spamassassin 3.1.0 |
| Date | |
| Msg-id | 42E83C09.2040508@aptalaska.net Whole thread Raw |
| In response to | Re: Performance problems testing with Spamassassin 3.1.0 Bayes module. (Josh Berkus <josh@agliodbs.com>) |
| Responses |
Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.
Re: Performance problems testing with Spamassassin 3.1.0 |
| List | pgsql-performance |
Josh Berkus wrote:
> Matt,
>
>
>>After playing with various indexes and what not I simply am unable to
>>make this procedure perform any better. Perhaps someone on the list can
>>spot the bottleneck and reveal why this procedure isn't performing that
>>well or ways to make it better.
>
>
> Well, my first thought is that this is a pretty complicated procedure for
> something you want to peform well. Is all this logic really necessary?
> How does it get done for MySQL?
>
I'm not sure if it's all needed, in mysql they have this simple schema:
===============================================
CREATE TABLE bayes_expire (
id int(11) NOT NULL default '0',
runtime int(11) NOT NULL default '0',
KEY bayes_expire_idx1 (id)
) TYPE=MyISAM;
CREATE TABLE bayes_global_vars (
variable varchar(30) NOT NULL default '',
value varchar(200) NOT NULL default '',
PRIMARY KEY (variable)
) TYPE=MyISAM;
INSERT INTO bayes_global_vars VALUES ('VERSION','3');
CREATE TABLE bayes_seen (
id int(11) NOT NULL default '0',
msgid varchar(200) binary NOT NULL default '',
flag char(1) NOT NULL default '',
PRIMARY KEY (id,msgid)
) TYPE=MyISAM;
CREATE TABLE bayes_token (
id int(11) NOT NULL default '0',
token char(5) NOT NULL default '',
spam_count int(11) NOT NULL default '0',
ham_count int(11) NOT NULL default '0',
atime int(11) NOT NULL default '0',
PRIMARY KEY (id, token),
INDEX bayes_token_idx1 (token),
INDEX bayes_token_idx2 (id, atime)
) TYPE=MyISAM;
CREATE TABLE bayes_vars (
id int(11) NOT NULL AUTO_INCREMENT,
username varchar(200) NOT NULL default '',
spam_count int(11) NOT NULL default '0',
ham_count int(11) NOT NULL default '0',
token_count int(11) NOT NULL default '0',
last_expire int(11) NOT NULL default '0',
last_atime_delta int(11) NOT NULL default '0',
last_expire_reduce int(11) NOT NULL default '0',
oldest_token_age int(11) NOT NULL default '2147483647',
newest_token_age int(11) NOT NULL default '0',
PRIMARY KEY (id),
UNIQUE bayes_vars_idx1 (username)
) TYPE=MyISAM;
===============================================
Then they do this to insert the token:
INSERT INTO bayes_token (
id,
token,
spam_count,
ham_count,
atime
) VALUES (
?,
?,
?,
?,
?
) ON DUPLICATE KEY
UPDATE
spam_count = GREATEST(spam_count + ?, 0),
ham_count = GREATEST(ham_count + ?, 0),
atime = GREATEST(atime, ?)
Or update the token:
UPDATE bayes_vars SET
$token_count_update
newest_token_age = GREATEST(newest_token_age, ?),
oldest_token_age = LEAST(oldest_token_age, ?)
WHERE id = ?
I think the reason why the procedure was written for postgres was
because of the greatest and least statements performing poorly.
Honestly, I'm not real up on writing procs, I was hoping the problem
would be obvious to someone.
schu
pgsql-performance by date: