Re: Problem after VACUUM ANALYZE - Mailing list pgsql-general

From: Richard Huxton
Subject: Re: Problem after VACUUM ANALYZE
Msg-id: 47FB91EB.1030004@archonet.com
In response to: Problem after VACUUM ANALYZE (mljv@planwerk6.de)
Responses: Re: Problem after VACUUM ANALYZE (mljv@planwerk6.de)
           Re: Problem after VACUUM ANALYZE (mljv@planwerk6.de)
List: pgsql-general
mljv@planwerk6.de wrote:
> We looked at our CPU monitoring and saw that we have huge iowait while VACUUM
> is running, which is not unusual. But just after VACUUM finished, the user
> CPU load rose to 200% (dual CPU). At this time in the early morning there
> were only very few users using our app. We analyzed the situation further
> and saw that most queries had very long durations (minutes instead of
> milliseconds). BTW: the VACUUM process was not running at this moment; it
> had already been logged as done.

But WHAT was using all of your CPU? Was it PostgreSQL, and if so, just
one backend? If something else was using all your CPU, then it might
just be that the PostgreSQL server wasn't getting a chance to run your
queries.
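
Something along these lines should tell you (assuming a Linux box - the
exact ps flags and the process name may differ on your system):

  # watch overall CPU use and the busiest processes
  top

  # list just the PostgreSQL backends, busiest first
  ps -o pid,pcpu,pmem,cmd -C postgres --sort=-pcpu

If one backend sits at the top of the list, note its pid for later.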

> * Maybe the Vacuum analyze process has not enough memory and therefore could
> not ANALYZE the tables correctly. It then writes wrong statistics to the
> database which results in wrong execution plans using sequence scans instead
> of index scans. This only happens if the vacuum analyze process runs
> concurrently with user requests. If it runs on its own, the vacuum process
> has enough memory and writes correct statistics to the database.

Doesn't sound likely to me. You've got 8GB of RAM, and if you were going
into swap you'd have noticed the disk I/O.
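
Easy enough to check, e.g.:

  free -m       # the swap row shows how much swap is in use
  vmstat 1 5    # nonzero si/so columns mean you are actively swapping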

> Here are some of our configuration parameters. We never really tweaked it as
> it ran fine. We just raised some parameters. The following list should show
> all parameters changed from the default:

> max_connections = 300
> shared_buffers = 30000
> work_mem = 10240

OK, so that's 30,000 * 8kB = 240MB of shared_buffers.
You have 10MB of work_mem, and if all 300 connections were using that
much you'd have committed 3GB of your RAM for that. Of course they'll
want more than just that - a single query can use several work_mem
allocations, one per sort or hash.

Do you really have 300 concurrent connections?
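
You can see for yourself from psql (this works on 8.1):

  SELECT count(*) FROM pg_stat_activity;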

> maintenance_work_mem = 163840

160MB for vacuums - should be OK given how much memory you have and the
fact that it's quiet when you vacuum.

> vacuum_mem = 32000

This is what maintenance_work_mem used to be called. You can delete this
entry.

> max_fsm_pages = 500000

You can track at most 500,000 pages with free space on them. In 8.2+
versions VACUUM VERBOSE will show you how many are currently being used.
I'm not sure about 8.1.
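
On 8.2 the tail of the output looks something like this (from memory,
and the numbers here are invented):

  VACUUM VERBOSE;
  ...
  INFO:  free space map contains 114000 pages in 90 relations
  DETAIL:  A total of 150000 page slots are in use (including overhead).

If the slots in use get close to max_fsm_pages, raise it.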

> bgwriter_lru_percent = 10.0
> bgwriter_lru_maxpages = 100
> bgwriter_all_percent = 5
> bgwriter_all_maxpages = 200
> wal_buffers = 16

> checkpoint_segments = 10

If you have bursts of write activity you might want to increase this.
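
For instance (just a suggestion - test against your own workload):

  checkpoint_segments = 30   # 16MB each, so a checkpoint roughly every 480MB of WAL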

> checkpoint_warning = 3600

> effective_cache_size = 180000

That's 180,000 * 8kB = 1,440MB, i.e. about 1.4GB.
If that's really all you're using as cache, I'd reduce the number of
concurrent connections. Check free/top and see how much RAM is really
being used as disk cache.
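
On Linux the "cached" column from free is the figure you want; divide
by 8kB to get a value for effective_cache_size. For example (numbers
invented):

  free -m    # suppose it reports 5500MB cached
  # 5500MB / 8kB per page => effective_cache_size = 5500 * 128 = 704000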

> random_page_cost = 3

Might be too high - you don't mention what disks you have.

> stats_command_string = off

If you turn this one on, you'll be able to see the queries each backend
is executing as they happen. Might be useful, but does have some cost.
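
Once it's on, you can watch what each backend is running from psql
(these are the 8.1 column names):

  # postgresql.conf
  stats_command_string = on

  -- then, in psql:
  SELECT procpid, usename, current_query FROM pg_stat_activity;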


The crucial thing is to find out exactly what is happening when things
get very slow. Check vmstat and top, look in the pg_locks system table,
and if need be we can see what strace says a particular backend is doing.
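
For the locks, something like this should show any backend waiting on a
lock it can't get (untested, but roughly right for 8.1):

  SELECT l.pid, l.mode, c.relname
  FROM pg_locks l
  LEFT JOIN pg_class c ON c.oid = l.relation
  WHERE NOT l.granted;

And for a single slow backend (using the pid you found with ps/top):

  strace -p <pid>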

--
   Richard Huxton
   Archonet Ltd
