Re: Performance query about large tables, lots of concurrent access - Mailing list pgsql-performance

From Bill Moran
Subject Re: Performance query about large tables, lots of concurrent access
Date
Msg-id 20070619101504.e456ebef.wmoran@collaborativefusion.com
Whole thread Raw
In response to Re: Performance query about large tables, lots of concurrent access  (Karl Wright <kwright@metacarta.com>)
Responses Re: Performance query about large tables, lots of concurrent access
List pgsql-performance
In response to Karl Wright <kwright@metacarta.com>:

> Alvaro Herrera wrote:
> > Karl Wright wrote:
> >> Alvaro Herrera wrote:
> >>> Karl Wright wrote:
> >>>
> >>>> This particular run lasted four days before a VACUUM became essential.
> >>>> The symptom that indicates that VACUUM is needed seems to be that the
> >>>> CPU usage of any given postgresql query skyrockets.  Is this essentially
> >>>> correct?
> >>> Are you saying you weren't used to run VACUUM all the time?  If so,
> >>> that's where the problem lies.
> >> Postgresql 7.4 VACUUM runs for so long that starting it with a cron job
> >> even every 24 hours caused multiple instances of VACUUM to eventually be
> >> running in my case.  So I tried to find a VACUUM schedule that permitted
> >> each individual vacuum to finish before the next one started.  A vacuum
> >> seemed to require 4-5 days with this particular database - or at least
> >> it did for 7.4.  So I had the VACUUM schedule set to run every six days.
> >
> > How large is the database?  I must admit I have never seen a database
> > that took 4 days to vacuum.  This could mean that your database is
> > humongous, or that the vacuum strategy is wrong for some reason.
>
> The database is humongus, and the machine is under intense load.  On the
> instance where this long vacuum occurred, there were several large
> tables - one with 7,000,000 rows, one with 14,000,000, one with
> 140,000,000, and one with 250,000,000.

Don't rule out the possibility that the only way to fix this _might_ be to
throw more hardware at it.  Proper configuration can buy you a lot, but if
your usage is exceeding the available bandwidth of the IO subsystem, the
only way you're going to get better performance is to put in a faster IO
subsystem.

> > You know that you can run vacuum on particular tables, right?  It would
> > be probably a good idea to run vacuum on the most updated tables, and
> > leave alone those that are not or little updated (hopefully the biggest;
> > this would mean that an almost-complete vacuum run would take much less
> > than a whole day).
>
> Yeah, sorry, that doesn't apply here.

Why not?  I see no reason why an appropriate autovaccum schedule would not
apply to your scenario.  I'm not saying it does, only that your response
does not indicate that it doesn't, and thus I'm concerned that you're
writing autovacuum off without proper research.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

pgsql-performance by date:

Previous
From: Richard Huxton
Date:
Subject: Re: Maintenance question / DB size anomaly...
Next
From: Karl Wright
Date:
Subject: Re: Performance query about large tables, lots of concurrent access