Re: Performance problems testing with Spamassassin 3.1.0 - Mailing list pgsql-performance

From Tom Lane
Subject Re: Performance problems testing with Spamassassin 3.1.0
Date
Msg-id 27897.1122748133@sss.pgh.pa.us
Whole thread Raw
In response to Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher <matt.s@aptalaska.net>)
Responses Re: Performance problems testing with Spamassassin 3.1.0
Re: Performance problems testing with Spamassassin 3.1.0
Re: Performance problems testing with Spamassassin 3.1.0
List pgsql-performance
Matthew Schumacher <matt.s@aptalaska.net> writes:
> One thing that is still very puzzling to me is why this runs so much
> slower when I put the data.sql in a transaction.  Obviously transactions
> are acting different when you call a proc a zillion times vs an insert
> query.

I looked into this a bit.  It seems that the problem when you wrap the
entire insertion series into one transaction is associated with the fact
that the test does so many successive updates of the single row in
bayes_vars.  (VACUUM VERBOSE at the end of the test shows it cleaning up
49383 dead versions of the one row.)  This is bad enough when it's in
separate transactions, but when it's in one transaction, none of those
dead row versions can be marked "fully dead" yet --- so for every update
of the row, the unique-key check has to visit every dead version to make
sure it's dead in the context of the current transaction.  This makes
the process O(N^2) in the number of updates per transaction.  Which is
bad enough if you just want to do one transaction per message, but it's
intolerable if you try to wrap the whole bulk-load scenario into one
transaction.

I'm not sure that we can do anything to make this a lot smarter, but
in any case, the real problem is to not do quite so many updates of
bayes_vars.

How constrained are you as to the format of the SQL generated by
SpamAssassin?  In particular, could you convert the commands generated
for a single message into a single statement?  I experimented with
passing all the tokens for a given message as a single bytea array,
as in the attached, and got almost a factor of 4 runtime reduction
on your test case.

BTW, it's possible that this is all just a startup-transient problem:
once the database has been reasonably well populated, one would expect
new tokens to be added infrequently, and so the number of updates to
bayes_vars ought to drop off.

            regards, tom lane


Revised insertion procedure:


CREATE or replace FUNCTION put_tokens (_id INTEGER,
                              _tokens BYTEA[],
                              _spam_count INTEGER,
                              _ham_count INTEGER,
                              _atime INTEGER)
RETURNS VOID AS
$$
declare _token bytea;
        new_tokens integer := 0;
BEGIN
  for i in array_lower(_tokens,1) .. array_upper(_tokens,1)
  LOOP
    _token := _tokens[i];
    UPDATE bayes_token
      SET spam_count = spam_count + _spam_count,
          ham_count  = ham_count + _ham_count,
          atime      = _atime
      WHERE id = _id
      AND token = _token;

    IF not found THEN
      INSERT INTO bayes_token VALUES (_id, _token, _spam_count,
                                      _ham_count, _atime);
      new_tokens := new_tokens + 1;
    END IF;
  END LOOP;
  if new_tokens > 0 THEN
      UPDATE bayes_vars SET token_count = token_count + new_tokens
      WHERE id = _id;
      IF NOT FOUND THEN
        RAISE EXCEPTION 'unable to update token_count in bayes_vars';
      END IF;
  END IF;
  RETURN;
END;
$$
LANGUAGE plpgsql;


Typical input:


select
put_tokens(1,'{"\\125\\42\\80\\167\\166","\\38\\153\\220\\93\\190","\\68\\7\\112\\52\\224","\\51\\14\\78\\155\\49","\\73\\245\\15\\221\\43","\\96\\179\\108\\197\\121","\\123\\97\\220\\173\\247","\\55\\132\\243\\51\\65","\\238\\36\\129\\75\\181","\\145\\253\\196\\106\\90","\\119\\0\\51\\127\\236","\\229\\35\\181\\222\\3","\\163\\1\\191\\220\\79","\\232\\97\\152\\207\\26","\\111\\146\\81\\182\\250","\\47\\141\\12\\76\\45","\\252\\97\\168\\243\\222","\\24\\157\\202\\45\\24","\\230\\207\\30\\46\\115","\\106\\45\\182\\94\\136","\\45\\66\\245\\41\\103","\\108\\126\\171\\154\\210","\\64\\90\\1\\184\\145","\\242\\78\\150\\104\\213","\\214\\\\134\\7\\179\\150","\\249\\12\\247\\164\\74","\\234\\35\\93\\118\\102","\\5\\152\\152\\219\\188","\\99\\186\\172\\56\\241","\\99\\220\\62\\240\\148","\\106\\12\\199\\33\\177","\\34\\74\\190\\192\\186","\\219\\127\\145\\132\\203","\\240\\113\\128\\160\\46","\\83\\5\\239\\206\\221","\\245\\253\\219\\83\\250","\\1\\53\\126\\56\\129","\\206\\1!

30\\97\\246\\47","\\217\\57\\185\\37\\202","\\235\\10\\74\\224\\150","\\80\\151\\70\\52\\96","\\126\\49\\156\\162\\93","\\243\\120\\218\\226\\49","\\251\\132\\118\\47\\221","\\241\\160\\120\\146\\198","\\183\\32\\161\\223\\178","\\80\\205\\77\\57\\2","\\121\\231\\13\\71\\218","\\71\\143\\184\\88\\185","\\163\\96\\119\\211\\142","\\20\\143\\90\\91\\211","\\179\\228\\212\\15\\22","\\243\\35\\149\\9\\55","\\140\\149\\99\\233\\241","\\164\\246\\101\\147\\107","\\202\\70\\218\\40\\114","\\39\\36\\186\\46\\84","\\58\\116\\44\\237\\2","\\80\\204\\185\\47\\105","\\64\\227\\29\\108\\222","\\173\\115\\56\\91\\52","\\102\\39\\157\\252\\64","\\133\\9\\89\\207\\62","\\27\\2\\230\\227\\201","\\163\\45\\123\\160\\129","\\170\\131\\168\\107\\198","\\236\\253\\0\\43\\228","\\44\\255\\93\\197\\136","\\64\\122\\42\\230\\126","\\207\\222\\104\\27\\239","\\26\\240\\78\\73\\45","\\225\\107\\181\\246\\160","\\231\\72\\243\\36\\159","\\248\\60\\14\\67\\145","\\21\\161\\247\\43\\198","\\81\\243\\19!
 1\\168\\18","\\237\\227\\23\\40\\140","\\60\\90\\96\\168\\201","\\211\

\107\\181\\46\\38","\\178\\129\\212\\16\\254","\\85\\177\\246\\29\\221","\\182\\123\\178\\157\\9","\\154\\159\\180\\116\\89","\\80\\136\\196\\242\\161","\\185\\110\\90\\163\\157","\\163\\191\\229\\13\\42","\\11\\119\\205\\160\\223","\\75\\216\\70\\223\\6","\\130\\48\\154\\145\\51","\\62\\104\\212\\72\\3","\\247\\105\\51\\64\\136","\\17\\96\\45\\40\\77","\\52\\1\\252\\53\\121","\\68\\195\\58\\103\\91","\\135\\131\\100\\4\\0","\\131\\129\\44\\193\\194","\\47\\234\\101\\143\\26","\\206\\\\134\\32\\154\\0","\\17\\41\\177\\34\\178","\\145\\127\\114\\231\\216","\\19\\172\\6\\39\\126","\\237\\233\\121\\43\\119","\\201\\167\\167\\67\\233","\\88\\159\\102\\50\\117","\\100\\133\\107\\190\\133","\\169\\146\\178\\120\\106"}',1,0,1088628232);
select
put_tokens(1,'{"\\196\\75\\30\\153\\73","\\73\\245\\15\\221\\43","\\14\\7\\116\\254\\162","\\244\\161\\139\\59\\16","\\214\\226\\238\\196\\30","\\209\\14\\131\\231\\30","\\41\\\\134\\176\\195\\166","\\70\\206\\48\\38\\33","\\247\\131\\136\\80\\31","\\4\\85\\5\\167\\214","\\246\\106\\225\\106\\242","\\28\\0\\229\\160\\90","\\127\\209\\58\\120\\83","\\12\\52\\52\\147\\95","\\255\\115\\21\\5\\68","\\244\\152\\121\\76\\20","\\19\\128\\183\\248\\181","\\140\\91\\18\\127\\208","\\93\\9\\62\\196\\247","\\248\\200\\31\\207\\108","\\44\\216\\247\\15\\195","\\59\\189\\9\\237\\142","\\1\\14\\10\\221\\68","\\163\\155\\122\\223\\104","\\97\\5\\105\\55\\137","\\184\\211\\162\\23\\247","\\239\\249\\83\\68\\54","\\67\\207\\180\\186\\234","\\99\\78\\237\\211\\180","\\200\\11\\32\\179\\50","\\95\\105\\18\\60\\253","\\207\\102\\227\\94\\84","\\71\\143\\184\\88\\185","\\13\\181\\75\\24\\192","\\188\\241\\141\\99\\242","\\139\\124\\248\\130\\4","\\25\\110\\149\\63\\114","\\21\\162\\199\\1!

29\\199","\\164\\246\\101\\147\\107","\\198\\202\\223\\58\\197","\\181\\10\\41\\25\\130","\\71\\163\\116\\239\\170","\\46\\170\\238\\142\\89","\\176\\120\\106\\103\\228","\\39\\228\\25\\38\\170","\\114\\79\\121\\18\\222","\\178\\105\\98\\61\\39","\\90\\61\\12\\23\\135","\\176\\118\\81\\65\\66","\\55\\104\\57\\198\\150","\\206\\251\\224\\128\\41","\\29\\158\\68\\146\\164","\\248\\60\\14\\67\\145","\\210\\220\\161\\10\\254","\\72\\81\\151\\213\\68","\\25\\236\\210\\197\\128","\\72\\37\\208\\227\\54","\\242\\24\\6\\88\\26","\\128\\197\\20\\5\\211","\\98\\105\\71\\42\\180","\\91\\43\\72\\84\\104","\\205\\254\\174\\65\\141","\\222\\194\\126\\204\\164","\\233\\153\\37\\148\\226","\\32\\195\\22\\153\\87","\\194\\97\\220\\251\\18","\\151\\201\\148\\52\\147","\\205\\55\\0\\226\\58","\\172\\12\\50\\0\\140","\\56\\32\\43\\9\\45","\\18\\174\\50\\162\\126","\\138\\150\\12\\72\\189","\\49\\230\\150\\210\\48","\\2\\140\\64\\104\\32","\\14\\174\\41\\196\\121","\\100\\195\\116\\130\\101","\!
 \222\\45\\94\\39\\64","\\178\\203\\221\\63\\94","\\26\\188\\157\\\\134

\\52","\\119\\0\\51\\127\\236","\\88\\32\\224\\142\\164","\\111\\146\\81\\182\\250","\\12\\177\\151\\83\\13","\\113\\27\\173\\162\\19","\\158\\216\\41\\236\\226","\\16\\88\\\\134\\180\\112","\\43\\32\\16\\77\\238","\\136\\93\\210\\172\\63","\\251\\214\\30\\40\\146","\\156\\27\\198\\60\\170","\\185\\29\\172\\30\\68","\\202\\83\\59\\228\\252","\\219\\127\\145\\132\\203","\\1\\223\\97\\229\\127","\\113\\83\\123\\167\\140","\\99\\1\\116\\56\\165","\\143\\224\\239\\1\\173","\\49\\186\\156\\51\\92","\\246\\224\\70\\245\\137","\\235\\10\\74\\224\\150","\\43\\88\\245\\14\\103","\\88\\128\\232\\142\\254","\\251\\132\\118\\47\\221","\\36\\7\\142\\234\\98","\\130\\126\\199\\170\\126","\\133\\23\\51\\253\\234","\\249\\89\\242\\87\\86","\\102\\243\\47\\193\\211","\\140\\18\\140\\164\\248","\\179\\228\\212\\15\\22","\\168\\155\\243\\169\\191","\\117\\37\\139\\241\\230","\\155\\11\\254\\171\\200","\\196\\159\\253\\223\\15","\\93\\207\\154\\106\\135","\\11\\255\\28\\123\\125","\\239\\9\\226!

\\59\\198","\\191\\204\\230\\61\\39","\\175\\204\\181\\113\\\\134","\\64\\227\\29\\108\\222","\\169\\173\\194\\83\\40","\\212\\93\\170\\169\\12","\\249\\55\\232\\182\\208","\\75\\175\\181\\248\\246","\\108\\95\\114\\215\\138","\\220\\37\\59\\207\\197","\\45\\146\\43\\76\\81","\\166\\231\\20\\9\\189","\\27\\126\\81\\92\\75","\\66\\168\\119\\100\\196","\\229\\9\\196\\165\\250","\\83\\186\\103\\184\\46","\\85\\177\\246\\29\\221","\\140\\159\\53\\211\\157","\\214\\193\\192\\217\\109","\\10\\5\\64\\97\\157","\\92\\137\\120\\70\\55","\\235\\45\\181\\44\\98","\\150\\56\\132\\207\\19","\\67\\95\\161\\39\\122","\\109\\65\\145\\170\\79","\\\\134\\28\\90\\39\\33","\\226\\177\\240\\202\\157","\\1\\57\\50\\6\\240","\\249\\240\\222\\56\\161","\\110\\136\\88\\85\\249","\\82\\27\\239\\51\\211","\\114\\223\\252\\83\\189","\\129\\216\\251\\218\\80","\\247\\36\\101\\90\\229","\\209\\73\\221\\46\\11","\\242\\12\\120\\117\\\\134","\\146\\198\\57\\177\\49","\\212\\57\\9\\240\\216","\\215\\151\\2!
 16\\59\\75","\\47\\132\\161\\165\\54","\\113\\4\\77\\241\\150","\\217\

\184\\149\\53\\124","\\152\\111\\25\\231\\104","\\42\\185\\112\\250\\156","\\39\\131\\14\\140\\189","\\148\\169\\158\\251\\150","\\184\\142\\204\\122\\179","\\19\\189\\181\\105\\116","\\116\\77\\22\\135\\50","\\236\\231\\60\\132\\229","\\200\\63\\76\\232\\9","\\32\\20\\168\\87\\45","\\99\\129\\99\\165\\29","\\2\\208\\66\\228\\105","\\99\\194\\194\\229\\17","\\85\\250\\55\\51\\114","\\200\\165\\249\\77\\72","\\5\\91\\178\\157\\24","\\245\\253\\219\\83\\250","\\166\\103\\181\\196\\34","\\227\\149\\148\\105\\157","\\95\\44\\15\\251\\98","\\183\\32\\161\\223\\178","\\120\\236\\145\\158\\78","\\244\\4\\92\\233\\112","\\189\\231\\124\\92\\19","\\112\\132\\8\\49\\157","\\160\\243\\244\\94\\104","\\150\\176\\139\\251\\157","\\176\\193\\155\\175\\144","\\161\\208\\145\\92\\92","\\77\\122\\94\\69\\182","\\77\\13\\131\\29\\27","\\92\\9\\178\\204\\254","\\177\\4\\154\\211\\63","\\62\\4\\242\\1\\78","\\4\\129\\113\\205\\164","\\168\\95\\68\\89\\38","\\173\\115\\56\\91\\52","\\212\\161\\1!

59\\148\\179","\\133\\9\\89\\207\\62","\\242\\51\\168\\130\\86","\\154\\199\\208\\84\\2","\\160\\215\\250\\104\\22","\\45\\252\\143\\149\\204","\\48\\50\\91\\39\\243","\\94\\168\\48\\202\\122","\\238\\38\\180\\135\\142","\\234\\59\\24\\148\\2","\\237\\227\\23\\40\\140","\\7\\114\\176\\80\\123","\\204\\170\\0\\60\\65","\\217\\202\\249\\158\\182","\\82\\170\\45\\96\\86","\\118\\11\\123\\51\\216","\\192\\130\\153\\88\\59","\\219\\53\\146\\88\\198","\\203\\114\\182\\147\\145","\\158\\140\\239\\104\\247","\\179\\86\\111\\146\\65","\\192\\168\\51\\125\\183","\\8\\251\\77\\143\\231","\\237\\229\\173\\221\\29","\\69\\178\\247\\196\\175","\\114\\33\\237\\189\\119","\\220\\44\\144\\93\\98","\\241\\38\\138\\127\\252","\\66\\218\\237\\199\\157","\\132\\240\\212\\221\\48","\\180\\41\\157\\84\\37","\\203\\180\\58\\113\\136","\\156\\39\\111\\181\\34","\\16\\202\\216\\183\\55","\\154\\51\\122\\201\\45","\\218\\112\\47\\206\\142","\\189\\141\\110\\230\\132","\\80\\167\\61\\103\\247","\\186\!
 \15\\121\\27\\167","\\103\\163\\217\\19\\220","\\173\\116\\86\\7\\249"

,"\\25\\37\\98\\35\\127","\\44\\92\\200\\89\\84","\\171\\129\\106\\249\\38","\\24\\147\\77\\\\134\\62","\\254\\184\\72\\159\\91","\\221\\13\\18\\153\\154","\\109\\232\\79\\169\\176","\\152\\103\\190\\50\\18","\\51\\71\\217\\22\\76","\\105\\109\\7\\77\\198","\\250\\121\\163\\49\\73","\\138\\204\\\\134\\247\\116","\\130\\38\\156\\36\\27","\\20\\83\\86\\113\\124","\\40\\63\\161\\157\\76","\\205\\99\\150\\109\\249","\\111\\174\\57\\169\\238","\\106\\169\\245\\170\\240","\\32\\10\\53\\160\\76","\\226\\0\\58\\9\\22","\\63\\83\\21\\3\\205","\\212\\141\\249\\177\\102","\\197\\226\\42\\202\\130","\\70\\40\\85\\176\\2","\\3\\16\\133\\118\\91","\\232\\48\\176\\209\\77","\\20\\149\\0\\2\\144","\\50\\87\\138\\108\\149","\\13\\78\\64\\211\\245","\\15\\158\\123\\62\\103","\\239\\68\\210\\175\\197","\\247\\216\\7\\211\\5","\\112\\100\\135\\210\\101","\\47\\26\\118\\254\\62","\\123\\7\\143\\206\\114","\\184\\43\\252\\56\\194","\\55\\16\\219\\\\134\\201","\\170\\128\\224\\160\\251","\\180\\10!

8\\182\\255\\118","\\164\\155\\151\\195\\67","\\116\\56\\163\\249\\92","\\250\\207\\75\\244\\104","\\122\\219\\25\\49\\17","\\16\\61\\66\\50\\32","\\15\\223\\166\\\\134\\188","\\16\\221\\48\\159\\124","\\163\\66\\245\\19\\190","\\52\\177\\137\\57\\104","\\137\\158\\143\\12\\73","\\175\\156\\252\\243\\165","\\18\\119\\\\134\\198\\209","\\179\\60\\37\\63\\136","\\68\\117\\75\\163\\27","\\234\\108\\150\\93\\15","\\209\\159\\154\\221\\138","\\70\\215\\50\\36\\255","\\237\\64\\91\\125\\54","\\17\\41\\177\\34\\178","\\19\\241\\29\\12\\34","\\\\134\\151\\112\\6\\214","\\63\\61\\146\\243\\60"}',1,0,1088649636);


For testing purposes, here is the Perl script I used to make the revised
input from the original test script:


#!/usr/bin/perl -w

$lastat = 0;

while (<>) {
    if (m/^select put_token\((\d+),'(.*)',(\d+),(\d+),(\d+)\);$/) {
    $d1 = $1;
    $data = $2;
    $d3 = $3;
    $d4 = $4;
    $d5 = $5;
    $data =~ s/\\/\\\\/g;
    if ($d5 == $lastat) {
        $last2 = $last2 . ',"' . $data . '"';
    } else {
        if ($lastat) {
        print "select put_tokens($last1,'{$last2}',$last3,$last4,$lastat);\n";
        }
        $last1 = $d1;
        $last2 = '"' . $data . '"';
        $last3 = $d3;
        $last4 = $d4;
        $lastat = $d5;
    }
    } else {
    print ;
    }
}

print "select put_tokens($last1,'{$last2}',$last3,$last4,$lastat);\n";


BTW, the data quoting is probably wrong here, but I suppose that can be
fixed easily.

pgsql-performance by date:

Previous
From: Karim Nassar
Date:
Subject: Re: Performance problems testing with Spamassassin
Next
From: Matthew Schumacher
Date:
Subject: Re: Performance problems testing with Spamassassin 3.1.0