Thread: Need to find out which process is hitting hda

Need to find out which process is hitting hda

From
Ow Mun Heng
Date:
I'm using CentOS 5 as the OS, so there's no fancy dtrace to look at
which process is causing my disks to thrash.

I have 4 disks in the box. (all ide, 7200rpm)

1 OS disk [hda]
2 raided (1) disks [hdb/hdc]
1 pg_xlog disk [hdd] (also used as an alternate tablespace for
temp/in-transit files via select, insert into tmp table, delete from tmp
table, insert into footable select * from tmp table)

The problem I now see, from both atop and iostat (iostat -dx 10):

Device:         rrqm/s   wrqm/s    r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
hda              98.60    14.69 121.98 15.08  1775.02  2908.29    34.17    47.53  551.67   7.29  99.95
hdb               0.70     4.20 16.48  2.30   304.50    51.95    18.98     0.21   10.94   8.45  15.86
hdc               0.00     3.40 12.49  2.00   223.78    43.16    18.43     0.07    5.04   4.42   6.40
hdd               0.00    56.94  0.50  3.70    53.55   485.91   128.57     0.02    5.48   3.95   1.66
md0               0.00     0.00 29.57 11.89   526.67    95.10    15.00     0.00    0.00   0.00   0.00

the number of writes and reads on hda is much greater than expected and I'm not sure who/what is causing it.

Thanks for any clues.


Re: Need to find out which process is hitting hda

From
"Scott Marlowe"
Date:
On Dec 13, 2007 5:06 AM, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> I'm using CentOS 5 as the OS, so there's no fancy dtrace to look at
> which process is causing my disks to thrash.
>
> I have 4 disks in the box. (all ide, 7200rpm)
>
> 1 OS disk [hda]
> 2 raided (1) disks [hdb/hdc]
> 1 pg_xlog disk [hdd] (also used as an alternate tablespace for
> temp/in-transit files via select, insert into tmp table, delete from tmp
> table, insert into footable select * from tmp table)
>
> the number of writes and reads on hda is much greater than expected and I'm not sure who/what is causing it.

Logging?  Just guessing.  Or swapping.  What does free say?

Re: Need to find out which process is hitting hda

From
"Merlin Moncure"
Date:
On Dec 13, 2007 6:06 AM, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> I'm using CentOS 5 as the OS, so there's no fancy dtrace to look at
> which process is causing my disks to thrash.
>
> I have 4 disks in the box. (all ide, 7200rpm)
>
> 1 OS disk [hda]
> 2 raided (1) disks [hdb/hdc]
> 1 pg_xlog disk [hdd] (also used as an alternate tablespace for
> temp/in-transit files via select, insert into tmp table, delete from tmp
> table, insert into footable select * from tmp table)
>
> Problem now I see from both atop and iostat, the Device: (iostat -dx 10)
>
>                  rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> hda              98.60    14.69 121.98 15.08  1775.02  2908.29    34.17    47.53  551.67   7.29  99.95
> hdb               0.70     4.20 16.48  2.30   304.50    51.95    18.98     0.21   10.94   8.45  15.86
> hdc               0.00     3.40 12.49  2.00   223.78    43.16    18.43     0.07    5.04   4.42   6.40
> hdd               0.00    56.94  0.50  3.70    53.55   485.91   128.57     0.02    5.48   3.95   1.66
> md0               0.00     0.00 29.57 11.89   526.67    95.10    15.00     0.00    0.00   0.00   0.00
>
> the number of writes and reads on hda is much greater than expected and I'm not sure who/what is causing it.

there are a few things that I can think of that can cause postgres
to cause i/o on a drive other than the data drive:
* logging (eliminate this by moving logs temporarily)
* swapping (swap is high and changing, other ways)
* dumps, copy statement (check cron)
* procedures, especially the external ones (perl, etc) that write to disk

my seat-of-the-pants guess is that you are looking at swap.

of course, a runaway program other than postgres can be the cause

merlin

Re: Need to find out which process is hitting hda

From
Tom Lane
Date:
"Merlin Moncure" <mmoncure@gmail.com> writes:
> there are a few things that I can think of that can cause postgres
> to cause i/o on a drive other than the data drive:
> * logging (eliminate this by moving logs temporarily)
> * swapping (swap is high and changing, other ways)
> * dumps, copy statement (check cron)
> * procedures, especially the external ones (perl, etc) that write to disk

> my seat-of-the-pants guess is that you are looking at swap.

vmstat would confirm or disprove that particular guess, since it tracks
swap I/O separately.

            regards, tom lane

Re: Need to find out which process is hitting hda

From
Greg Smith
Date:
On Thu, 13 Dec 2007, Ow Mun Heng wrote:

> I'm using CentOS 5 as the OS, so there's no fancy dtrace to look at
> which process is causing my disks to thrash.

Does plain old top show you anything interesting?  If you hit 'c' after
starting it you'll get more information about the postgres processes in
particular.

> 1 OS disk [hda]
>              rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> hda              98.60    14.69 121.98 15.08  1775.02  2908.29    34.17    47.53  551.67   7.29  99.95

The funny thing here is that both the writes and reads are very high
compared to the other disks.  That rules out most of what I go looking for
when there's runaway activity.  Many common causes do almost all reads
(e.g. some filesystem crawler like updatedb running) or almost all writes
(loggers gone wild!).  Swapping might do both, so consider mine a second
vote to correlate this with vmstat output.
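
One rough way to see which processes are paging in from disk, without
dtrace, is to compare each process's major-fault counter over an interval.
The sketch below is only an illustration and nobody in this thread ran it.
It assumes a Linux /proc filesystem where field 12 of /proc/<pid>/stat is
majflt (as documented in proc(5)). The helper names parse_majflt, snapshot,
top_faulters and report are made up for this example. Major faults also
count ordinary file-backed page reads, not just swap-ins, so treat the
result as a hint.

```python
import os
import time


def parse_majflt(stat_line):
    """Return (comm, majflt) from the text of one /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, rest = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = rest.split()
    # 'rest' starts at field 3 (state); majflt is field 12, so index 9 here.
    return comm, int(fields[9])


def snapshot():
    """Map pid -> (comm, majflt) for every process that can be read."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % entry) as f:
                result[int(entry)] = parse_majflt(f.read())
        except (IOError, OSError, IndexError, ValueError):
            continue  # process exited, or the line could not be parsed
    return result


def top_faulters(before, after, limit=10):
    """Return [(delta, pid, comm)], largest major-fault increase first."""
    deltas = []
    for pid, (comm, faults) in after.items():
        old = before.get(pid)
        if old is not None and faults > old[1]:
            deltas.append((faults - old[1], pid, comm))
    return sorted(deltas, reverse=True)[:limit]


def report(interval=10):
    """Print the processes that gained major faults during 'interval' seconds."""
    first = snapshot()
    time.sleep(interval)
    for delta, pid, comm in top_faulters(first, snapshot()):
        print("%8d major faults  pid %-6d %s" % (delta, pid, comm))
```

Calling report(10) while the disk is busy would use the same 10-second
window as the iostat run above.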

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Need to find out which process is hitting hda

From
Ow Mun Heng
Date:
On Fri, 2007-12-14 at 01:54 -0500, Tom Lane wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > there are a few things that I can think of that can cause postgres
> > to cause i/o on a drive other than the data drive:
> > * logging (eliminate this by moving logs temporarily)
I'll have to try this

> > * swapping (swap is high and changing, other ways)
> > * dumps, copy statement (check cron)
Not doing any of these

> > * procedures, especially the external ones (perl, etc) that write to disk
Nope. The only Perl running is just pulling data from the master DB into
this little box.


>
> > my seat-of-the-pants guess is that you are looking at swap.
>
> vmstat would confirm or disprove that particular guess, since it tracks
> swap I/O separately.

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  6 300132   5684   4324 315888  420   32  1024   644 1309  485 35 11  0 54  0
 0  6 299820   6768   4328 313004  588   76  3048   576 1263  588 36 12  0 52  0
 0  6 299428   5424   4340 313700  480   36  2376   104 1291  438 24  9  0 67  0
 2  6 298836   5108   4268 313788  800    0  2312   216 1428  625 30 10  0 60  0
 2  6 298316   5692   4192 313044  876    0  1652  1608 1488  656 33 11  0 56  0
 2  6 298004   6256   4140 312184  560    4  1740  1572 1445  601 42 11  0 47  0

I kept looking at the io columns and didn't even think of the swap
partition. It's true that it's moving quite erratically but I won't say
that it's really thrashing.

             total       used       free     shared    buffers     cached
Mem:           503        498          4          0          3        287
-/+ buffers/cache:        207        295
Swap:         2527        328       2199

(YEP, I know I'm RAM starved on this machine)


Re: Need to find out which process is hitting hda

From
Tom Lane
Date:
Ow Mun Heng <Ow.Mun.Heng@wdc.com> writes:
>> vmstat would confirm or disprove that particular guess, since it tracks
>> swap I/O separately.

> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  2  6 300132   5684   4324 315888  420   32  1024   644 1309  485 35 11  0 54  0
>  0  6 299820   6768   4328 313004  588   76  3048   576 1263  588 36 12  0 52  0
>  0  6 299428   5424   4340 313700  480   36  2376   104 1291  438 24  9  0 67  0
>  2  6 298836   5108   4268 313788  800    0  2312   216 1428  625 30 10  0 60  0
>  2  6 298316   5692   4192 313044  876    0  1652  1608 1488  656 33 11  0 56  0
>  2  6 298004   6256   4140 312184  560    4  1740  1572 1445  601 42 11  0 47  0

> I kept looking at the io columns and didn't even think of the swap
> partition. It's true that it's moving quite erratically but I won't say
> that it's really thrashing.

Hmmm ... my experience is that the si/so columns should show *zero* under
normal load.  What you're showing here is swap as a sizable percentage
of total I/O load, and with the CPU spending the majority of its time
in I/O wait, that's clearly where you need to focus your attention.
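
To put rough numbers on that, the six vmstat samples quoted above can be
averaged. This is only a sketch: the column names come from the header line
in the post, the function name column_means is made up here, and the swap
share assumes si/so and bi/bo are in comparable units (KiB/s versus 1 KiB
blocks/s), which may not hold for every vmstat version.

```python
# vmstat samples copied from the quoted post above.
VMSTAT = """\
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  6 300132   5684   4324 315888  420   32  1024   644 1309  485 35 11  0 54  0
 0  6 299820   6768   4328 313004  588   76  3048   576 1263  588 36 12  0 52  0
 0  6 299428   5424   4340 313700  480   36  2376   104 1291  438 24  9  0 67  0
 2  6 298836   5108   4268 313788  800    0  2312   216 1428  625 30 10  0 60  0
 2  6 298316   5692   4192 313044  876    0  1652  1608 1488  656 33 11  0 56  0
 2  6 298004   6256   4140 312184  560    4  1740  1572 1445  601 42 11  0 47  0
"""


def column_means(text):
    """Return {column name: mean value} for whitespace-separated vmstat rows."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    means = {}
    for i, name in enumerate(header):
        values = [int(row[i]) for row in rows]
        means[name] = sum(values) / float(len(values))
    return means


means = column_means(VMSTAT)
swap_io = means["si"] + means["so"]    # swap-in plus swap-out
total_io = means["bi"] + means["bo"]   # all block-device reads plus writes
swap_share = swap_io / total_io

print("avg si=%.1f so=%.1f bi=%.1f bo=%.1f wa=%.1f%%"
      % (means["si"], means["so"], means["bi"], means["bo"], means["wa"]))
print("swap share of block I/O: %.0f%%" % (100 * swap_share))
```

Under those assumptions the samples average 56% I/O wait, with swap traffic
at a bit under a quarter of the block I/O.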

> (YEP, I know I'm RAM starved on this machine)

Yeah, that's what it looks like.  Head down to your local CompUSA and
get some RAM at fire-sale prices ...

            regards, tom lane

Re: Need to find out which process is hitting hda

From
"Scott Marlowe"
Date:
On Dec 14, 2007 1:33 AM, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> I kept looking at the io columns and didn't even think of the swap
> partition. It's true that it's moving quite erratically but I won't say
> that it's really thrashing.
>
>              total       used       free     shared    buffers     cached
> Mem:           503        498          4          0          3        287
> -/+ buffers/cache:        207        295
> Swap:         2527        328       2199
>
> (YEP, I know I'm RAM starved on this machine)

Good lord, my laptop has more memory than that. :)

What Tom said: buy some more RAM.  Also, look at turning down the
swappiness setting.
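
For reference, on Linux 2.6 kernels the swappiness setting is exposed as
/proc/sys/vm/swappiness (default 60), and lower values make the kernel less
eager to swap out process memory. A minimal sketch for checking the current
value follows. The helper name read_swappiness and its path parameter are
just for illustration. Lowering the value (for example with
sysctl -w vm.swappiness=10 as root) is a tuning choice and does not replace
adding RAM.

```python
def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness value as an int, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (IOError, OSError, ValueError):
        return None


value = read_swappiness()
if value is None:
    print("vm.swappiness is not readable on this system")
else:
    print("vm.swappiness = %d" % value)
```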

Re: Need to find out which process is hitting hda

From
"Joshua D. Drake"
Date:
On Sun, 16 Dec 2007 17:55:55 -0600
"Scott Marlowe" <scott.marlowe@gmail.com> wrote:

> On Dec 14, 2007 1:33 AM, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> > I kept looking at the io columns and didn't even think of the swap
> > partition. It's true that it's moving quite erratically but I won't
> > say that it's really thrashing.
> >
> >              total       used       free     shared    buffers     cached
> > Mem:           503        498          4          0          3        287
> > -/+ buffers/cache:        207        295
> > Swap:         2527        328       2199
> >
> > (YEP, I know I'm RAM starved on this machine)
> 
> Good lord, my laptop has more memory than that. :)

My phone has more memory than that :P

Sincerely,

Joshua D. Drake



-- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'



Re: Need to find out which process is hitting hda

From
"Scott Marlowe"
Date:
On Dec 16, 2007 6:11 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> On Sun, 16 Dec 2007 17:55:55 -0600
> "Scott Marlowe" <scott.marlowe@gmail.com> wrote:
>
> > On Dec 14, 2007 1:33 AM, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> > > I kept looking at the io columns and didn't even think of the swap
> > > partition. It's true that it's moving quite erratically but I won't
> > > say that it's really thrashing.
> > >
> > >              total       used       free     shared    buffers     cached
> > > Mem:           503        498          4          0          3        287
> > > -/+ buffers/cache:        207        295
> > > Swap:         2527        328       2199
> > >
> > > (YEP, I know I'm RAM starved on this machine)
> >
> > Good lord, my laptop has more memory than that. :)
>
> My phone has more memory than that :P

Now that you mention it, my phone does indeed have more memory than my
laptop as well.  Sheesh.  Technology doesn't march forward, it drag
races forward.

Re: Need to find out which process is hitting hda

From
Ow Mun Heng
Date:
On Sun, 2007-12-16 at 16:11 -0800, Joshua D. Drake wrote:
> On Sun, 16 Dec 2007 17:55:55 -0600
> "Scott Marlowe" <scott.marlowe@gmail.com> wrote:
>
> > On Dec 14, 2007 1:33 AM, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> > > I kept looking at the io columns and didn't even think of the swap
> > > partition. It's true that it's moving quite erratically but I won't
> > > say that it's really thrashing.
> > >
> > >              total       used       free     shared    buffers     cached
> > > Mem:           503        498          4          0          3        287
> > > -/+ buffers/cache:        207        295
> > > Swap:         2527        328       2199
> > >
> > > (YEP, I know I'm RAM starved on this machine)
> >
> > Good lord, my laptop has more memory than that. :)
>
> My phone has more memory than that :P

What can I say :-p
Budgets are tight.