Re: Postgres vs. DSpam - Mailing list pgsql-performance

From Jim C. Nasby
Subject Re: Postgres vs. DSpam
Date
Msg-id 20041129215056.GZ41545@decibel.org
Whole thread Raw
In response to Re: Postgres vs. DSpam  (Andrew McMillan <andrew@catalyst.net.nz>)
List pgsql-performance
FWIW, those queries won't be able to use an index. A better WHERE clause
would be:

AND last_hit < CURRENT_DATE - 60

On Fri, Nov 26, 2004 at 02:37:12PM +1300, Andrew McMillan wrote:
> On Wed, 2004-11-24 at 14:14 +0100, Evilio del Rio wrote:
> > Hi,
> >
> > I have installed the dspam filter
> > (http://www.nuclearelephant.com/projects/dspam) on our mail server
> > (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users
> > with a quite low traffic of 4000 messages/day. So it's a quite common
> > platform/environment, nothing spectacular.
>
> I am using DSpam with PostgreSQL here.  I have a daily job that cleans
> the DSpam database up, as follows:
>
> DELETE FROM dspam_token_data
>   WHERE (innocent_hits*2) + spam_hits < 5
>   AND CURRENT_DATE - last_hit > 60;
>
> DELETE FROM dspam_token_data
>   WHERE innocent_hits = 1
>   AND CURRENT_DATE - last_hit > 30;
>
> DELETE FROM dspam_token_data
>   WHERE CURRENT_DATE - last_hit > 180;
>
> DELETE FROM dspam_signature_data
>   WHERE CURRENT_DATE - created_on > 14;
>
> VACUUM dspam_token_data;
>
> VACUUM dspam_signature_data;
>
>
>
> I also occasionally do a "VACUUM FULL ANALYZE;" on the database as well.
>
>
> In all honesty though, I think that MySQL is better suited to DSpam than
> PostgreSQL is.
>
>
> > Please, could anyone explain me this difference?
> > Is Postgres that bad?
> > Is MySQL that good?
> > Am I the only one to have observed this behavior?
>
> I believe that what DSpam does that is not well-catered for in the way
> PostgreSQL operates, is that it does very frequent updates to rows in
> (eventually) quite large tables.  In PostgreSQL the UPDATE will result
> internally in a new record being written, with the old record being
> marked as deleted.  That old record won't be re-used until after a
> VACUUM has run, and this means that the on-disk tables will have a lot
> of dead rows in them quite quickly.
>
> The reason that PostgreSQL operates this way, is a direct result of the
> way transactional support is implemented, and it may well change in a
> version or two.  It's got better over the last few versions, with things
> like pg_autovacuum, but that approach still doesn't suit some types of
> database updating.
>
> Cheers,
>                     Andrew.
> -------------------------------------------------------------------------
> Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
> WEB: http://catalyst.net.nz/            PHYS: Level 2, 150-154 Willis St
> DDI: +64(4)803-2201      MOB: +64(272)DEBIAN      OFFICE: +64(4)499-2267
>        These PRESERVES should be FORCE-FED to PENTAGON OFFICIALS!!
> -------------------------------------------------------------------------
>



--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

pgsql-performance by date:

Previous
From: Christopher Browne
Date:
Subject: Re: [dspam-users] Postgres vs. MySQL
Next
From: Dmitry Karasik
Date:
Subject: VACUUM ANALYZE downgrades performance