Re: limiting resources to users - Mailing list pgsql-general

From Craig Ringer
Subject Re: limiting resources to users
Date
Msg-id 4B0E2170.2050903@postnewspapers.com.au
Whole thread Raw
In response to Re: limiting resources to users  ("Gauthier, Dave" <dave.gauthier@intel.com>)
List pgsql-general
Gauthier, Dave wrote:
> Actually, limiting resources on a per DB basis would work for me too.
>
> I thin kOracle had a thing where you could limit resources, or at least prioritize users.

Pg isn't Oracle - for which we can be alternately extremely thankful and
somewhat frustrated. Lots of shiny features vs hellish admin, freaky SQL
syntax, and $myeyesarebleeding.

> Someone else was reporting this to me, and again, it was a MySQL DB
> that I'll eventually be replacing with PG.  I'll inherit this problem
> too, so I'm being proactive in looking for a fix.  He said that the
> cpu was pegged (no mention of I/O). He started killing processes left
> and write until performace was back (it was either that or a DB
> restart to clear the slate).

Argh.  It could just as easily be disk I/O.

  for i in `seq 1 20`; do
  ( dd if=/dev/md0 bs=1M seek=$(($i * 1000)) of=/dev/null &)
  done

results in:

top - 14:26:20 up 3 days, 1:43,  3 users, load average: 4.70, 3.02, 1.41
Tasks: 255 total,  15 running, 240 sleeping,   0 stopped,   0 zombie
Cpu(s): 30.1%us, 65.0%sy, 0.0%ni, 0.0%id, 2.5%wa, 0.5%hi, 2.0%si, 0.0%st
Mem:   4055728k total,  3776220k used,   279508k free,   750588k buffers
Swap:  2120544k total,     4572k used,  2115972k free,  2245336k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

 1926 syslog    20   0 34552 1216  956 S   81  0.0   1:41.32 rsyslogd

32619 root      20   0  4152 1772  632 R    7  0.0   0:00.46 dd

32631 root      20   0  4152 1772  632 D    6  0.0   0:00.54 dd

32623 root      20   0  4152 1776  632 D    6  0.0   0:00.51 dd


... which could be confused for a busy CPU, but is really load caused by
disk I/O. Most of the `dd' processes are in 'D' state - ie
uninterruptable sleep in a system call - and if you check "wchan" with
"ps" you'll see that they're sleeping waiting for disk I/O.

Randomly killing processes is *not* a good way to tackle this - as I
think you already knew. Tools like `ps' (especially using custom formats
with wchan), `vmstat', `top', `iotop', `blktrace' + `blkparse',
`btrace', `top', `sar', alt-sysrq-t, etc can help you learn more about
what is going on without having to employ such brute-force tactics.

You might find that the worst performance issues are caused by terribly
written queries, poorly structured tables or views, lack of basic DB
maintenance, bad DB tuning, etc. It need not be the users' eagerness
alone - and you may find that a properly set up database can take the
reporting load and the critical load effortlessly, or close enough as
not to bother with extreme measures like separating the DBs into
multiple clusters.

--
Craig Ringer

pgsql-general by date:

Previous
From: "Gauthier, Dave"
Date:
Subject: Re: limiting resources to users
Next
From: Craig Ringer
Date:
Subject: Re: Wiki page on vacuum full