Re: Postgres vs. DSpam - Mailing list pgsql-performance

From Andrew McMillan
Subject Re: Postgres vs. DSpam
Date
Msg-id 1101433032.16058.57.camel@lamb.mcmillan.net.nz
Whole thread Raw
In response to Postgres vs. MySQL  (Evilio del Rio <edelrio@cmima.csic.es>)
Responses Re: Postgres vs. DSpam  (Neil Conway <neilc@samurai.com>)
Re: Postgres vs. DSpam  ("Jim C. Nasby" <decibel@decibel.org>)
List pgsql-performance
On Wed, 2004-11-24 at 14:14 +0100, Evilio del Rio wrote:
> Hi,
>
> I have installed the dspam filter
> (http://www.nuclearelephant.com/projects/dspam) on our mail server
> (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users
> with a quite low traffic of 4000 messages/day. So it's a quite common
> platform/environment, nothing spectacular.

I am using DSpam with PostgreSQL here.  I have a daily job that cleans
the DSpam database up, as follows:

DELETE FROM dspam_token_data
  WHERE (innocent_hits*2) + spam_hits < 5
  AND CURRENT_DATE - last_hit > 60;

DELETE FROM dspam_token_data
  WHERE innocent_hits = 1
  AND CURRENT_DATE - last_hit > 30;

DELETE FROM dspam_token_data
  WHERE CURRENT_DATE - last_hit > 180;

DELETE FROM dspam_signature_data
  WHERE CURRENT_DATE - created_on > 14;

VACUUM dspam_token_data;

VACUUM dspam_signature_data;



I also occasionally do a "VACUUM FULL ANALYZE;" on the database as well.


In all honesty though, I think that MySQL is better suited to DSpam than
PostgreSQL is.


> Please, could anyone explain me this difference?
> Is Postgres that bad?
> Is MySQL that good?
> Am I the only one to have observed this behavior?

I believe that what DSpam does that is not well-catered for in the way
PostgreSQL operates, is that it does very frequent updates to rows in
(eventually) quite large tables.  In PostgreSQL the UPDATE will result
internally in a new record being written, with the old record being
marked as deleted.  That old record won't be re-used until after a
VACUUM has run, and this means that the on-disk tables will have a lot
of dead rows in them quite quickly.

The reason that PostgreSQL operates this way, is a direct result of the
way transactional support is implemented, and it may well change in a
version or two.  It's got better over the last few versions, with things
like pg_autovacuum, but that approach still doesn't suit some types of
database updating.

Cheers,
                    Andrew.
-------------------------------------------------------------------------
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/            PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201      MOB: +64(272)DEBIAN      OFFICE: +64(4)499-2267
       These PRESERVES should be FORCE-FED to PENTAGON OFFICIALS!!
-------------------------------------------------------------------------


Attachment

pgsql-performance by date:

Previous
From: Gary Doades
Date:
Subject: Postgres backend using huge amounts of ram
Next
From: Neil Conway
Date:
Subject: Re: Postgres vs. DSpam