Re: PostgreSQL 8.4.8 bringing my website down every evening - Mailing list pgsql-general

From Tomas Vondra
Subject Re: PostgreSQL 8.4.8 bringing my website down every evening
Date
Msg-id 4DDE5D34.2000808@fuzzy.cz
In response to Re: PostgreSQL 8.4.8 bringing my website down every evening  (Alexander Farber <alexander.farber@gmail.com>)
Responses Re: PostgreSQL 8.4.8 bringing my website down every evening  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-general
On 26.5.2011 11:41, Alexander Farber wrote:
> Thank you, I'll try your suggestions.
>
> I'm just slow in doing so, because it's just a
> (sometimes pretty time consuming) hobby-project.
>
> I'm missing knowledge on how to monitor my DB status,
> i.e. how to check some of the things you've asked.

OK, let me explain in a bit more detail. Merlin recommended those five
things to find out where the real bottleneck is (CPU, I/O, ...), because
that's the necessary first step to fixing it.

1. cpu bound? check top cpu usage during

   Just run "top" and see what's going on when there are problems. If
   the is 100% busy then the DB is CPU bound and you have to optimize
   it so that it uses less CPU (or add faster/more CPUs).

   It might be that most of the CPU is consumed by other processes
   (e.g. Java doing GC), but you still need to find out whether that's
   the case.
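
   A simple way to do that (a rough sketch, assuming the usual Linux
   procps "top") is to capture batch-mode snapshots while the problem
   is happening, so you can compare them afterwards:

      # one batch-mode iteration of top, saved with a timestamp
      top -b -n 1 > top-$(date +%H%M%S).log

   The header of each snapshot shows the overall CPU split (us/sy/wa)
   and the process list shows which processes are eating the CPU.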

2. i/o bound? check top wait%

   Run "top" and see what is the wait time. If you have more drives,
   you can run "dstat" or "iostat -x" to see "per disk" stats. If the
   wait/util values grow too much (beyond 50%), you're probably I/O
   bound and you need to fix this.
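
   For example (a minimal sketch, assuming the sysstat package that
   provides "iostat" is installed):

      # extended per-device stats, one sample per second, 10 samples
      iostat -x 1 10

   and watch the "await" and "%util" columns - consistently high values
   there mean the drives can't keep up with the requests.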

3. scaling issues? # active connections over 20 or so can be
dangerous.  consider installing a pooler (my favorite is pgbouncer).
also monitor vmstat for context switches

   Run "vmstat 1" and see the "cs" (context switch) column. The more
   context switches happen, the more overhead that makes and the less
   actual work can be done. So if you have too many active processes
   (and each connection is a separate postgres backend process), this
   may be a serious problem (unless the connections are idle).

   The state of the connection can be seen from "ps ax" output - there
   will be something like this:

      5257 ?        Ss     0:00 postgres: pguser pgdb [local] idle

   which means the connection is idle, or this

      5257 ?        Rs     0:02 postgres: vampire pgmap [local] SELECT

   when there's a query running.

   Or you can use the pg_stat_activity system view - the idle
   connections will have "<IDLE>" in the "current_query" column.
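
   For example, to see how many connections are actually doing work
   versus sitting idle (this works on 8.4, where idle backends report
   "<IDLE>" as their query):

      select (current_query = '<IDLE>') as idle, count(*)
        from pg_stat_activity
       group by 1;

   If the number of non-idle connections regularly climbs over ~20,
   that's when a pooler like pgbouncer starts to make sense.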

4. lousy queries? enable min_duration_statement in logs and take note of
queries running over 20-50ms

   Poor SQL queries are often the real cause - you have to find out
   which queries are slow (and then you can analyze why). The queries
   can be obtained in two ways.

   First, you can set "log_min_duration_statement" in the config file,
   and queries exceeding this number of milliseconds will be written
   to the PostgreSQL log. For example this

      log_min_duration_statement = 250

   will log all queries that take more than 250ms. Be careful not to
   set it too low (I really wouldn't set it to 20ms right now), because
   it means more I/O and it might make the problem even worse: queries
   start to slow down, more and more of them exceed the threshold and
   have to be logged, that means more I/O, and that makes even more
   queries run slowly - you get the idea.

   Or you could use the pg_stat_activity view again. Once the problems
   happen, log into psql and run this

      select * from pg_stat_activity where current_query != '<IDLE>'
                                  order by (now() - query_start) desc;

   and you'll get a list of the currently running queries, sorted by
   how long they have been running.
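
   Once you have the text of a slow query (from the log or from the
   view above), you can run it through EXPLAIN ANALYZE in psql to see
   where the time actually goes - the table and condition here are just
   placeholders, use your own query:

      explain analyze
      select * from some_table where some_column = 123;

   That prints the execution plan with actual row counts and timings,
   which usually points at a missing index or a bad join.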

5. something else? when are your backups running?  what else is
happening at that time?

   This just means the actual problem might be outside PostgreSQL, e.g.
   an intensive batch process / backup / ... consuming a lot of CPU,
   I/O or other resources. That's it.
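
   A quick way to check what's scheduled on the box (just a sketch -
   your jobs may live elsewhere, e.g. in a separate backup tool):

      # jobs scheduled for your user, for root, and the system-wide ones
      crontab -l
      sudo crontab -l
      ls /etc/cron.d /etc/cron.daily /etc/cron.hourly

   and compare the schedules with the time of day when the problems
   start.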

> Also I wonder, how's shared memory used by PostgreSQL.
> I'm irritated - how it could work with 32MB,
> but now I've got suggestion to increase it
> to 512MB (and it seems to work too...)

Shared buffers are a 'database cache'. When the DB needs a block from a
file (because that's where the data are stored), it reads the data into
the cache. When the same block is needed again, it can be read from the
cache, which is much faster - unless there's not enough space to hold
all the blocks, in which case some blocks get evicted from the cache
and have to be read from the disk again.

So it's about the size of this cache - when you increase it, more reads
will be resolved without actual I/O. But as Merlin noted, this may not
help when there are a lot of writes in your workload; it may actually
make things worse during checkpoints. I'd recommend enabling
"log_checkpoints" to see whether this is a problem.

regards
Tomas
