Re: Performance problems testing with Spamassassin 3.1.0 - Mailing list pgsql-performance
From | Matthew Schumacher |
---|---|
Subject | Re: Performance problems testing with Spamassassin 3.1.0 |
Date | |
Msg-id | 42E83C09.2040508@aptalaska.net Whole thread Raw |
In response to | Re: Performance problems testing with Spamassassin 3.1.0 Bayes module. (Josh Berkus <josh@agliodbs.com>) |
Responses |
Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.
Re: Performance problems testing with Spamassassin 3.1.0 |
List | pgsql-performance |
Josh Berkus wrote: > Matt, > > >>After playing with various indexes and what not I simply am unable to >>make this procedure perform any better. Perhaps someone on the list can >>spot the bottleneck and reveal why this procedure isn't performing that >>well or ways to make it better. > > > Well, my first thought is that this is a pretty complicated procedure for > something you want to peform well. Is all this logic really necessary? > How does it get done for MySQL? > I'm not sure if it's all needed, in mysql they have this simple schema: =============================================== CREATE TABLE bayes_expire ( id int(11) NOT NULL default '0', runtime int(11) NOT NULL default '0', KEY bayes_expire_idx1 (id) ) TYPE=MyISAM; CREATE TABLE bayes_global_vars ( variable varchar(30) NOT NULL default '', value varchar(200) NOT NULL default '', PRIMARY KEY (variable) ) TYPE=MyISAM; INSERT INTO bayes_global_vars VALUES ('VERSION','3'); CREATE TABLE bayes_seen ( id int(11) NOT NULL default '0', msgid varchar(200) binary NOT NULL default '', flag char(1) NOT NULL default '', PRIMARY KEY (id,msgid) ) TYPE=MyISAM; CREATE TABLE bayes_token ( id int(11) NOT NULL default '0', token char(5) NOT NULL default '', spam_count int(11) NOT NULL default '0', ham_count int(11) NOT NULL default '0', atime int(11) NOT NULL default '0', PRIMARY KEY (id, token), INDEX bayes_token_idx1 (token), INDEX bayes_token_idx2 (id, atime) ) TYPE=MyISAM; CREATE TABLE bayes_vars ( id int(11) NOT NULL AUTO_INCREMENT, username varchar(200) NOT NULL default '', spam_count int(11) NOT NULL default '0', ham_count int(11) NOT NULL default '0', token_count int(11) NOT NULL default '0', last_expire int(11) NOT NULL default '0', last_atime_delta int(11) NOT NULL default '0', last_expire_reduce int(11) NOT NULL default '0', oldest_token_age int(11) NOT NULL default '2147483647', newest_token_age int(11) NOT NULL default '0', PRIMARY KEY (id), UNIQUE bayes_vars_idx1 (username) ) TYPE=MyISAM; =============================================== Then they do this to insert the token: INSERT INTO bayes_token ( id, token, spam_count, ham_count, atime ) VALUES ( ?, ?, ?, ?, ? ) ON DUPLICATE KEY UPDATE spam_count = GREATEST(spam_count + ?, 0), ham_count = GREATEST(ham_count + ?, 0), atime = GREATEST(atime, ?) Or update the token: UPDATE bayes_vars SET $token_count_update newest_token_age = GREATEST(newest_token_age, ?), oldest_token_age = LEAST(oldest_token_age, ?) WHERE id = ? I think the reason why the procedure was written for postgres was because of the greatest and least statements performing poorly. Honestly, I'm not real up on writing procs, I was hoping the problem would be obvious to someone. schu
pgsql-performance by date: