Re: Performance problems testing with Spamassassin 3.1.0 - Mailing list pgsql-performance

From John Arbash Meinel
Subject Re: Performance problems testing with Spamassassin 3.1.0
Date
Msg-id 42ECF3AE.6060809@arbash-meinel.com
Whole thread Raw
In response to Re: Performance problems testing with Spamassassin 3.1.0  (John Arbash Meinel <john@arbash-meinel.com>)
Responses Re: Performance problems testing with Spamassassin 3.1.0
List pgsql-performance
John Arbash Meinel wrote:

>Matthew Schumacher wrote:
>
>
>
>>All it's doing is trying the update before the insert to get around the
>>problem of not knowing which is needed.  With only 2-3 of the queries
>>implemented I'm already back to running about the same speed as the
>>original SA proc that is going to ship with SA 3.1.0.
>>
>>All of the queries are using indexes so at this point I'm pretty
>>convinced that the biggest problem is the sheer number of queries
>>required to run this proc 200 times for each email (once for each token).
>>
>>I don't see anything that could be done to make this much faster on the
>>postgres end, it's looking like the solution is going to involve cutting
>>down the number of queries some how.
>>
>>One thing that is still very puzzling to me is why this runs so much
>>slower when I put the data.sql in a transaction.  Obviously transactions
>>are acting different when you call a proc a zillion times vs an insert
>>query.
>>
>>
>>
>>
>Well, I played with adding a COMMIT;BEGIN; statement to your exact test
>every 1000 lines. And this is what I got:
>
>
Just for reference, I also tested this on my old server, which is a dual
Celeron 450 with 256M ram. FC4 and Postgres 8.0.3
Unmodified:
real    54m15.557s
user    0m24.328s
sys     0m14.200s

With Transactions every 1000 selects, and vacuum every 5000:
real    8m36.528s
user    0m16.585s
sys     0m12.569s

With Transactions every 1000 selects, and vacuum every 10000:
real    7m50.748s
user    0m16.183s
sys     0m12.489s

On this machine vacuum is more expensive, since it doesn't have as much ram.

Anyway, on this machine, I see approx 7x improvement. Which I think is
probably going to satisfy your spamassassin needs.
John
=:->

PS> Looking forward to having a spamassassin that can utilize my
favorite db. Right now, I'm not using a db backend because it wasn't
worth setting up mysql.

>Unmodified:
>real    17m53.587s
>user    0m6.204s
>sys     0m3.556s
>
>With BEGIN/COMMIT:
>real    1m53.466s
>user    0m5.203s
>sys     0m3.211s
>
>So I see the potential for improvement almost 10 fold by switching to
>transactions. I played with the perl script (and re-implemented it in
>python), and for the same data as the perl script, using COPY instead of
>INSERT INTO means 5s instead of 33s.
>
>I also played around with adding VACUUM ANALYZE every 10 COMMITS, which
>brings the speed to:
>
>real    1m41.258s
>user    0m5.394s
>sys     0m3.212s
>
>And doing VACUUM ANALYZE every 5 COMMITS makes it:
>real    1m46.403s
>user    0m5.597s
>sys     0m3.244s
>
>I'm assuming the slowdown is because of the extra time spent vacuuming.
>Overall performance might still be improving, since you wouldn't
>actually be inserting all 100k rows at once.
>
>
...

>This is all run on Ubuntu, with postgres 7.4.7, and a completely
>unchanged postgresql.conf. (But the machine is a dual P4 2.4GHz, with
>3GB of RAM).
>
>John
>=:->
>
>


Attachment

pgsql-performance by date:

Previous
From: "J. Andrew Rogers"
Date:
Subject: Re: Performance problems on 4/8way Opteron (dualcore)
Next
From: Matthew Schumacher
Date:
Subject: Re: Performance problems testing with Spamassassin 3.1.0