Re: Performance problems testing with Spamassassin 3.1.0 - Mailing list pgsql-performance
From | Tom Lane |
---|---|
Subject | Re: Performance problems testing with Spamassassin 3.1.0 |
Date | |
Msg-id | 27897.1122748133@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Performance problems testing with Spamassassin 3.1.0 (Matthew Schumacher <matt.s@aptalaska.net>) |
Responses |
Re: Performance problems testing with Spamassassin 3.1.0
Re: Performance problems testing with Spamassassin 3.1.0 Re: Performance problems testing with Spamassassin 3.1.0 |
List | pgsql-performance |
Matthew Schumacher <matt.s@aptalaska.net> writes: > One thing that is still very puzzling to me is why this runs so much > slower when I put the data.sql in a transaction. Obviously transactions > are acting different when you call a proc a zillion times vs an insert > query. I looked into this a bit. It seems that the problem when you wrap the entire insertion series into one transaction is associated with the fact that the test does so many successive updates of the single row in bayes_vars. (VACUUM VERBOSE at the end of the test shows it cleaning up 49383 dead versions of the one row.) This is bad enough when it's in separate transactions, but when it's in one transaction, none of those dead row versions can be marked "fully dead" yet --- so for every update of the row, the unique-key check has to visit every dead version to make sure it's dead in the context of the current transaction. This makes the process O(N^2) in the number of updates per transaction. Which is bad enough if you just want to do one transaction per message, but it's intolerable if you try to wrap the whole bulk-load scenario into one transaction. I'm not sure that we can do anything to make this a lot smarter, but in any case, the real problem is to not do quite so many updates of bayes_vars. How constrained are you as to the format of the SQL generated by SpamAssassin? In particular, could you convert the commands generated for a single message into a single statement? I experimented with passing all the tokens for a given message as a single bytea array, as in the attached, and got almost a factor of 4 runtime reduction on your test case. BTW, it's possible that this is all just a startup-transient problem: once the database has been reasonably well populated, one would expect new tokens to be added infrequently, and so the number of updates to bayes_vars ought to drop off. regards, tom lane Revised insertion procedure: CREATE or replace FUNCTION put_tokens (_id INTEGER, _tokens BYTEA[], _spam_count INTEGER, _ham_count INTEGER, _atime INTEGER) RETURNS VOID AS $$ declare _token bytea; new_tokens integer := 0; BEGIN for i in array_lower(_tokens,1) .. array_upper(_tokens,1) LOOP _token := _tokens[i]; UPDATE bayes_token SET spam_count = spam_count + _spam_count, ham_count = ham_count + _ham_count, atime = _atime WHERE id = _id AND token = _token; IF not found THEN INSERT INTO bayes_token VALUES (_id, _token, _spam_count, _ham_count, _atime); new_tokens := new_tokens + 1; END IF; END LOOP; if new_tokens > 0 THEN UPDATE bayes_vars SET token_count = token_count + new_tokens WHERE id = _id; IF NOT FOUND THEN RAISE EXCEPTION 'unable to update token_count in bayes_vars'; END IF; END IF; RETURN; END; $$ LANGUAGE plpgsql; Typical input: select put_tokens(1,'{"\\125\\42\\80\\167\\166","\\38\\153\\220\\93\\190","\\68\\7\\112\\52\\224","\\51\\14\\78\\155\\49","\\73\\245\\15\\221\\43","\\96\\179\\108\\197\\121","\\123\\97\\220\\173\\247","\\55\\132\\243\\51\\65","\\238\\36\\129\\75\\181","\\145\\253\\196\\106\\90","\\119\\0\\51\\127\\236","\\229\\35\\181\\222\\3","\\163\\1\\191\\220\\79","\\232\\97\\152\\207\\26","\\111\\146\\81\\182\\250","\\47\\141\\12\\76\\45","\\252\\97\\168\\243\\222","\\24\\157\\202\\45\\24","\\230\\207\\30\\46\\115","\\106\\45\\182\\94\\136","\\45\\66\\245\\41\\103","\\108\\126\\171\\154\\210","\\64\\90\\1\\184\\145","\\242\\78\\150\\104\\213","\\214\\\\134\\7\\179\\150","\\249\\12\\247\\164\\74","\\234\\35\\93\\118\\102","\\5\\152\\152\\219\\188","\\99\\186\\172\\56\\241","\\99\\220\\62\\240\\148","\\106\\12\\199\\33\\177","\\34\\74\\190\\192\\186","\\219\\127\\145\\132\\203","\\240\\113\\128\\160\\46","\\83\\5\\239\\206\\221","\\245\\253\\219\\83\\250","\\1\\53\\126\\56\\129","\\206\\1! 30\\97\\246\\47","\\217\\57\\185\\37\\202","\\235\\10\\74\\224\\150","\\80\\151\\70\\52\\96","\\126\\49\\156\\162\\93","\\243\\120\\218\\226\\49","\\251\\132\\118\\47\\221","\\241\\160\\120\\146\\198","\\183\\32\\161\\223\\178","\\80\\205\\77\\57\\2","\\121\\231\\13\\71\\218","\\71\\143\\184\\88\\185","\\163\\96\\119\\211\\142","\\20\\143\\90\\91\\211","\\179\\228\\212\\15\\22","\\243\\35\\149\\9\\55","\\140\\149\\99\\233\\241","\\164\\246\\101\\147\\107","\\202\\70\\218\\40\\114","\\39\\36\\186\\46\\84","\\58\\116\\44\\237\\2","\\80\\204\\185\\47\\105","\\64\\227\\29\\108\\222","\\173\\115\\56\\91\\52","\\102\\39\\157\\252\\64","\\133\\9\\89\\207\\62","\\27\\2\\230\\227\\201","\\163\\45\\123\\160\\129","\\170\\131\\168\\107\\198","\\236\\253\\0\\43\\228","\\44\\255\\93\\197\\136","\\64\\122\\42\\230\\126","\\207\\222\\104\\27\\239","\\26\\240\\78\\73\\45","\\225\\107\\181\\246\\160","\\231\\72\\243\\36\\159","\\248\\60\\14\\67\\145","\\21\\161\\247\\43\\198","\\81\\243\\19! 1\\168\\18","\\237\\227\\23\\40\\140","\\60\\90\\96\\168\\201","\\211\ \107\\181\\46\\38","\\178\\129\\212\\16\\254","\\85\\177\\246\\29\\221","\\182\\123\\178\\157\\9","\\154\\159\\180\\116\\89","\\80\\136\\196\\242\\161","\\185\\110\\90\\163\\157","\\163\\191\\229\\13\\42","\\11\\119\\205\\160\\223","\\75\\216\\70\\223\\6","\\130\\48\\154\\145\\51","\\62\\104\\212\\72\\3","\\247\\105\\51\\64\\136","\\17\\96\\45\\40\\77","\\52\\1\\252\\53\\121","\\68\\195\\58\\103\\91","\\135\\131\\100\\4\\0","\\131\\129\\44\\193\\194","\\47\\234\\101\\143\\26","\\206\\\\134\\32\\154\\0","\\17\\41\\177\\34\\178","\\145\\127\\114\\231\\216","\\19\\172\\6\\39\\126","\\237\\233\\121\\43\\119","\\201\\167\\167\\67\\233","\\88\\159\\102\\50\\117","\\100\\133\\107\\190\\133","\\169\\146\\178\\120\\106"}',1,0,1088628232); select put_tokens(1,'{"\\196\\75\\30\\153\\73","\\73\\245\\15\\221\\43","\\14\\7\\116\\254\\162","\\244\\161\\139\\59\\16","\\214\\226\\238\\196\\30","\\209\\14\\131\\231\\30","\\41\\\\134\\176\\195\\166","\\70\\206\\48\\38\\33","\\247\\131\\136\\80\\31","\\4\\85\\5\\167\\214","\\246\\106\\225\\106\\242","\\28\\0\\229\\160\\90","\\127\\209\\58\\120\\83","\\12\\52\\52\\147\\95","\\255\\115\\21\\5\\68","\\244\\152\\121\\76\\20","\\19\\128\\183\\248\\181","\\140\\91\\18\\127\\208","\\93\\9\\62\\196\\247","\\248\\200\\31\\207\\108","\\44\\216\\247\\15\\195","\\59\\189\\9\\237\\142","\\1\\14\\10\\221\\68","\\163\\155\\122\\223\\104","\\97\\5\\105\\55\\137","\\184\\211\\162\\23\\247","\\239\\249\\83\\68\\54","\\67\\207\\180\\186\\234","\\99\\78\\237\\211\\180","\\200\\11\\32\\179\\50","\\95\\105\\18\\60\\253","\\207\\102\\227\\94\\84","\\71\\143\\184\\88\\185","\\13\\181\\75\\24\\192","\\188\\241\\141\\99\\242","\\139\\124\\248\\130\\4","\\25\\110\\149\\63\\114","\\21\\162\\199\\1! 29\\199","\\164\\246\\101\\147\\107","\\198\\202\\223\\58\\197","\\181\\10\\41\\25\\130","\\71\\163\\116\\239\\170","\\46\\170\\238\\142\\89","\\176\\120\\106\\103\\228","\\39\\228\\25\\38\\170","\\114\\79\\121\\18\\222","\\178\\105\\98\\61\\39","\\90\\61\\12\\23\\135","\\176\\118\\81\\65\\66","\\55\\104\\57\\198\\150","\\206\\251\\224\\128\\41","\\29\\158\\68\\146\\164","\\248\\60\\14\\67\\145","\\210\\220\\161\\10\\254","\\72\\81\\151\\213\\68","\\25\\236\\210\\197\\128","\\72\\37\\208\\227\\54","\\242\\24\\6\\88\\26","\\128\\197\\20\\5\\211","\\98\\105\\71\\42\\180","\\91\\43\\72\\84\\104","\\205\\254\\174\\65\\141","\\222\\194\\126\\204\\164","\\233\\153\\37\\148\\226","\\32\\195\\22\\153\\87","\\194\\97\\220\\251\\18","\\151\\201\\148\\52\\147","\\205\\55\\0\\226\\58","\\172\\12\\50\\0\\140","\\56\\32\\43\\9\\45","\\18\\174\\50\\162\\126","\\138\\150\\12\\72\\189","\\49\\230\\150\\210\\48","\\2\\140\\64\\104\\32","\\14\\174\\41\\196\\121","\\100\\195\\116\\130\\101","\! \222\\45\\94\\39\\64","\\178\\203\\221\\63\\94","\\26\\188\\157\\\\134 \\52","\\119\\0\\51\\127\\236","\\88\\32\\224\\142\\164","\\111\\146\\81\\182\\250","\\12\\177\\151\\83\\13","\\113\\27\\173\\162\\19","\\158\\216\\41\\236\\226","\\16\\88\\\\134\\180\\112","\\43\\32\\16\\77\\238","\\136\\93\\210\\172\\63","\\251\\214\\30\\40\\146","\\156\\27\\198\\60\\170","\\185\\29\\172\\30\\68","\\202\\83\\59\\228\\252","\\219\\127\\145\\132\\203","\\1\\223\\97\\229\\127","\\113\\83\\123\\167\\140","\\99\\1\\116\\56\\165","\\143\\224\\239\\1\\173","\\49\\186\\156\\51\\92","\\246\\224\\70\\245\\137","\\235\\10\\74\\224\\150","\\43\\88\\245\\14\\103","\\88\\128\\232\\142\\254","\\251\\132\\118\\47\\221","\\36\\7\\142\\234\\98","\\130\\126\\199\\170\\126","\\133\\23\\51\\253\\234","\\249\\89\\242\\87\\86","\\102\\243\\47\\193\\211","\\140\\18\\140\\164\\248","\\179\\228\\212\\15\\22","\\168\\155\\243\\169\\191","\\117\\37\\139\\241\\230","\\155\\11\\254\\171\\200","\\196\\159\\253\\223\\15","\\93\\207\\154\\106\\135","\\11\\255\\28\\123\\125","\\239\\9\\226! \\59\\198","\\191\\204\\230\\61\\39","\\175\\204\\181\\113\\\\134","\\64\\227\\29\\108\\222","\\169\\173\\194\\83\\40","\\212\\93\\170\\169\\12","\\249\\55\\232\\182\\208","\\75\\175\\181\\248\\246","\\108\\95\\114\\215\\138","\\220\\37\\59\\207\\197","\\45\\146\\43\\76\\81","\\166\\231\\20\\9\\189","\\27\\126\\81\\92\\75","\\66\\168\\119\\100\\196","\\229\\9\\196\\165\\250","\\83\\186\\103\\184\\46","\\85\\177\\246\\29\\221","\\140\\159\\53\\211\\157","\\214\\193\\192\\217\\109","\\10\\5\\64\\97\\157","\\92\\137\\120\\70\\55","\\235\\45\\181\\44\\98","\\150\\56\\132\\207\\19","\\67\\95\\161\\39\\122","\\109\\65\\145\\170\\79","\\\\134\\28\\90\\39\\33","\\226\\177\\240\\202\\157","\\1\\57\\50\\6\\240","\\249\\240\\222\\56\\161","\\110\\136\\88\\85\\249","\\82\\27\\239\\51\\211","\\114\\223\\252\\83\\189","\\129\\216\\251\\218\\80","\\247\\36\\101\\90\\229","\\209\\73\\221\\46\\11","\\242\\12\\120\\117\\\\134","\\146\\198\\57\\177\\49","\\212\\57\\9\\240\\216","\\215\\151\\2! 16\\59\\75","\\47\\132\\161\\165\\54","\\113\\4\\77\\241\\150","\\217\ \184\\149\\53\\124","\\152\\111\\25\\231\\104","\\42\\185\\112\\250\\156","\\39\\131\\14\\140\\189","\\148\\169\\158\\251\\150","\\184\\142\\204\\122\\179","\\19\\189\\181\\105\\116","\\116\\77\\22\\135\\50","\\236\\231\\60\\132\\229","\\200\\63\\76\\232\\9","\\32\\20\\168\\87\\45","\\99\\129\\99\\165\\29","\\2\\208\\66\\228\\105","\\99\\194\\194\\229\\17","\\85\\250\\55\\51\\114","\\200\\165\\249\\77\\72","\\5\\91\\178\\157\\24","\\245\\253\\219\\83\\250","\\166\\103\\181\\196\\34","\\227\\149\\148\\105\\157","\\95\\44\\15\\251\\98","\\183\\32\\161\\223\\178","\\120\\236\\145\\158\\78","\\244\\4\\92\\233\\112","\\189\\231\\124\\92\\19","\\112\\132\\8\\49\\157","\\160\\243\\244\\94\\104","\\150\\176\\139\\251\\157","\\176\\193\\155\\175\\144","\\161\\208\\145\\92\\92","\\77\\122\\94\\69\\182","\\77\\13\\131\\29\\27","\\92\\9\\178\\204\\254","\\177\\4\\154\\211\\63","\\62\\4\\242\\1\\78","\\4\\129\\113\\205\\164","\\168\\95\\68\\89\\38","\\173\\115\\56\\91\\52","\\212\\161\\1! 59\\148\\179","\\133\\9\\89\\207\\62","\\242\\51\\168\\130\\86","\\154\\199\\208\\84\\2","\\160\\215\\250\\104\\22","\\45\\252\\143\\149\\204","\\48\\50\\91\\39\\243","\\94\\168\\48\\202\\122","\\238\\38\\180\\135\\142","\\234\\59\\24\\148\\2","\\237\\227\\23\\40\\140","\\7\\114\\176\\80\\123","\\204\\170\\0\\60\\65","\\217\\202\\249\\158\\182","\\82\\170\\45\\96\\86","\\118\\11\\123\\51\\216","\\192\\130\\153\\88\\59","\\219\\53\\146\\88\\198","\\203\\114\\182\\147\\145","\\158\\140\\239\\104\\247","\\179\\86\\111\\146\\65","\\192\\168\\51\\125\\183","\\8\\251\\77\\143\\231","\\237\\229\\173\\221\\29","\\69\\178\\247\\196\\175","\\114\\33\\237\\189\\119","\\220\\44\\144\\93\\98","\\241\\38\\138\\127\\252","\\66\\218\\237\\199\\157","\\132\\240\\212\\221\\48","\\180\\41\\157\\84\\37","\\203\\180\\58\\113\\136","\\156\\39\\111\\181\\34","\\16\\202\\216\\183\\55","\\154\\51\\122\\201\\45","\\218\\112\\47\\206\\142","\\189\\141\\110\\230\\132","\\80\\167\\61\\103\\247","\\186\! \15\\121\\27\\167","\\103\\163\\217\\19\\220","\\173\\116\\86\\7\\249" ,"\\25\\37\\98\\35\\127","\\44\\92\\200\\89\\84","\\171\\129\\106\\249\\38","\\24\\147\\77\\\\134\\62","\\254\\184\\72\\159\\91","\\221\\13\\18\\153\\154","\\109\\232\\79\\169\\176","\\152\\103\\190\\50\\18","\\51\\71\\217\\22\\76","\\105\\109\\7\\77\\198","\\250\\121\\163\\49\\73","\\138\\204\\\\134\\247\\116","\\130\\38\\156\\36\\27","\\20\\83\\86\\113\\124","\\40\\63\\161\\157\\76","\\205\\99\\150\\109\\249","\\111\\174\\57\\169\\238","\\106\\169\\245\\170\\240","\\32\\10\\53\\160\\76","\\226\\0\\58\\9\\22","\\63\\83\\21\\3\\205","\\212\\141\\249\\177\\102","\\197\\226\\42\\202\\130","\\70\\40\\85\\176\\2","\\3\\16\\133\\118\\91","\\232\\48\\176\\209\\77","\\20\\149\\0\\2\\144","\\50\\87\\138\\108\\149","\\13\\78\\64\\211\\245","\\15\\158\\123\\62\\103","\\239\\68\\210\\175\\197","\\247\\216\\7\\211\\5","\\112\\100\\135\\210\\101","\\47\\26\\118\\254\\62","\\123\\7\\143\\206\\114","\\184\\43\\252\\56\\194","\\55\\16\\219\\\\134\\201","\\170\\128\\224\\160\\251","\\180\\10! 8\\182\\255\\118","\\164\\155\\151\\195\\67","\\116\\56\\163\\249\\92","\\250\\207\\75\\244\\104","\\122\\219\\25\\49\\17","\\16\\61\\66\\50\\32","\\15\\223\\166\\\\134\\188","\\16\\221\\48\\159\\124","\\163\\66\\245\\19\\190","\\52\\177\\137\\57\\104","\\137\\158\\143\\12\\73","\\175\\156\\252\\243\\165","\\18\\119\\\\134\\198\\209","\\179\\60\\37\\63\\136","\\68\\117\\75\\163\\27","\\234\\108\\150\\93\\15","\\209\\159\\154\\221\\138","\\70\\215\\50\\36\\255","\\237\\64\\91\\125\\54","\\17\\41\\177\\34\\178","\\19\\241\\29\\12\\34","\\\\134\\151\\112\\6\\214","\\63\\61\\146\\243\\60"}',1,0,1088649636); For testing purposes, here is the Perl script I used to make the revised input from the original test script: #!/usr/bin/perl -w $lastat = 0; while (<>) { if (m/^select put_token\((\d+),'(.*)',(\d+),(\d+),(\d+)\);$/) { $d1 = $1; $data = $2; $d3 = $3; $d4 = $4; $d5 = $5; $data =~ s/\\/\\\\/g; if ($d5 == $lastat) { $last2 = $last2 . ',"' . $data . '"'; } else { if ($lastat) { print "select put_tokens($last1,'{$last2}',$last3,$last4,$lastat);\n"; } $last1 = $d1; $last2 = '"' . $data . '"'; $last3 = $d3; $last4 = $d4; $lastat = $d5; } } else { print ; } } print "select put_tokens($last1,'{$last2}',$last3,$last4,$lastat);\n"; BTW, the data quoting is probably wrong here, but I suppose that can be fixed easily.
pgsql-performance by date: