Thread: pgpgout/s without swapping--what does it mean?

pgpgout/s without swapping--what does it mean?

From
Kevin Goess
Date:
We had a big increase in load, iowait, and disk i/o on a dedicated database host the other night. 

Looking at the sar logs, the problem shows itself in a big increase in pgpgout/s, which I believe is postgres paging out parts of itself to disk?

02:15:01 AM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
02:25:01 AM     49.63    163.83    836.80      0.00    467.69      4.80      0.00      4.80    100.00
02:35:01 AM     44.85    230.29   1248.68      0.00    677.18     16.21      0.00     16.21    100.00
02:45:01 AM     47.67    190.82   1059.58      0.00    517.72      5.33      0.00      5.33    100.00
02:55:01 AM     59.00    175.58    986.87      0.00    514.08     18.13      0.00     18.13    100.00
03:05:01 AM     24.67    849.77   1382.60      0.00   1267.94     37.35      0.00     37.35    100.00
03:15:01 AM     28.67   1701.67    717.88      0.00   1231.48      0.00      0.00      0.00      0.00
03:25:02 AM     42.64  21342.02   4086.19      0.04   9701.70    415.92      0.00    414.44     99.64
03:35:01 AM     35.60  28290.69   4305.38      0.10  12906.73    623.89      0.00    615.85     98.71
03:45:01 AM     36.94  31119.30   3675.34      0.01  12456.54    527.55      0.00    521.61     98.87
03:55:01 AM     42.77  29020.72   3458.96      0.01  12165.57    557.57      0.00    553.10     99.20
Average:        41.25  11306.39   2175.59      0.02   5189.70    220.62      0.00    218.63     99.10

However, there isn't a corresponding increase in pages *in*, so if postgres is writing portions of itself out to disk, they can't be very important.  And there's no swapping going on:

02:15:01 AM  pswpin/s pswpout/s
02:25:01 AM      0.00      0.00
02:35:01 AM      0.00      0.00
02:45:01 AM      0.00      0.00
02:55:01 AM      0.00      0.00
03:05:01 AM      0.00      0.00
03:15:01 AM      0.00      0.00
03:25:02 AM      0.00      0.00
03:35:01 AM      0.00      0.00
03:45:01 AM      0.00      0.00
03:55:01 AM      0.00      0.00
   Average:         0.00      0.00

Can anybody help me understand what these statistics are suggesting, what's actually going on on this box/in postgresql?  What is it writing to disk, and why?  Is it just writing out new/changed rows, or what?

--
Kevin M. Goess
Software Engineer
Berkeley Electronic Press
kgoess@bepress.com

510-665-1200 x179
www.bepress.com

bepress: sustainable scholarly publishing  

Re: pgpgout/s without swapping--what does it mean?

From
David Johnston
Date:
Kevin Goess wrote
> Can anybody help me understand what these statistics are suggesting,
> what's
> actually going on on this box/in postgresql?  What is it writing to disk,
> and why?  Is it just writing out new/changed rows, or what?

Not a clue on the statistics but most likely you are seeing checkpoint
activity.

At a simplified level all changes to the database are first persisted to
disk using WAL (write-ahead-logs).  The changes are written to disk, into
WAL files, during commit via fsync.  The original data files are not
affected immediately thus improving performance at the time of commit by
instead risking a prolonged delay in situations where an unclean shutdown
occurs.  However, at some point the WAL files need to be removed and the
physical table files updated.  This occurs during a checkpoint.  A
checkpoint basically causes the on-disk files to become baselined to the
current reality so that only subsequent WAL files need be applied.

There is considerably more to this whole concept than I can go into off the
top of my head but in addition to looking at just I/O it would help to look,
simultaneously, at what processes are active.

HTH

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/pgpgout-s-without-swapping-what-does-it-mean-tp5796346p5796355.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: pgpgout/s without swapping--what does it mean?

From
Jeff Janes
Date:
On Monday, March 17, 2014, Kevin Goess <kgoess@bepress.com> wrote:
We had a big increase in load, iowait, and disk i/o on a dedicated database host the other night. 

Looking at the sar logs, the problem shows itself in a big increase in pgpgout/s, which I believe is postgres paging out parts of itself to disk?

02:15:01 AM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
 
...
 
However, there isn't a corresponding increase in pages *in*, so if postgres is writing portions of itself out to disk, they can't be very important.


As far as I can tell, pgpgout/s includes all data written to disk, not just process memory being paged.  So it includes WAL and data files being written, for example due to bulk loads.  Seems like a odd name for that parameter, and I don't know how it differs from bwrtn/s, other than the different units.

If it is a bulk load, that would explain why it is not being read back in.  Also, it could be that the data is needed, but when it is needed it is still in cache and so doesn't lead to disk reads.  But it still needs to be written for durability reasons.

Cheers,

Jeff

Re: pgpgout/s without swapping--what does it mean?

From
Kevin Goess
Date:
That makes sense, so updates to rows that are already in memory, either in blocks in the kernel page cache or in blocks in the postgres cache, would trigger writes but not reads.  Thanks for the sanity check.


On Mon, Mar 17, 2014 at 11:04 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Monday, March 17, 2014, Kevin Goess <kgoess@bepress.com> wrote:
We had a big increase in load, iowait, and disk i/o on a dedicated database host the other night. 

Looking at the sar logs, the problem shows itself in a big increase in pgpgout/s, which I believe is postgres paging out parts of itself to disk?

02:15:01 AM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
 
...
 
However, there isn't a corresponding increase in pages *in*, so if postgres is writing portions of itself out to disk, they can't be very important.


As far as I can tell, pgpgout/s includes all data written to disk, not just process memory being paged.  So it includes WAL and data files being written, for example due to bulk loads.  Seems like a odd name for that parameter, and I don't know how it differs from bwrtn/s, other than the different units.

If it is a bulk load, that would explain why it is not being read back in.  Also, it could be that the data is needed, but when it is needed it is still in cache and so doesn't lead to disk reads.  But it still needs to be written for durability reasons.

Cheers,

Jeff



--
Kevin M. Goess
Software Engineer
Berkeley Electronic Press
kgoess@bepress.com

510-665-1200 x179
www.bepress.com

bepress: sustainable scholarly publishing  

Re: pgpgout/s without swapping--what does it mean?

From
rahulraveendran
Date:
Hi ,

Me too have the same issue on PosgreSQL server. pgpgout/s is going high and
the same time didn't observe any swapping or memory pressure on the host.
Please refer below sar O/P.

12:00:01 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s
pgscand/s pgsteal/s    %vmeff
12:10:01 PM      0.00      4.16    114.55      0.00    106.62      0.00
0.00      0.00      0.00
12:20:01 PM      0.00      3.94     38.37      0.00     57.85      0.00
0.00      0.00      0.00
12:30:01 PM      7.63      4.65    206.51      0.07    108.31      0.00
0.00      0.00      0.00
12:40:01 PM      3.08      4.99    110.57      0.02     93.15      0.00
0.00      0.00      0.00
12:50:01 PM      0.39      4.43    133.92      0.00     88.53      0.00
0.00      0.00      0.00
01:00:01 PM      0.00      4.93    134.93      0.00     88.07      0.00
0.00      0.00      0.00
01:10:01 PM     17.10  20652.46    233.39      0.18  14035.87   4165.03
0.00   4164.51     99.99
01:20:01 PM      9.37  15831.25    140.93      0.13  11912.22   4116.36
0.00   4116.36    100.00
01:30:01 PM     15.40  19688.21    164.96      0.18  14261.36   4938.36
0.00   4938.32    100.00
01:40:01 PM     11.52  11529.86    149.57      0.12   8133.76   2784.77
0.00   2784.77    100.00
01:50:01 PM      9.16  12416.42    140.14      0.13   8890.65   3057.49
0.00   3057.49    100.00
02:00:01 PM     12.19   9076.26    142.05      0.11   6756.87   2320.05
1.88   2321.92    100.00
02:10:01 PM     14.29  12839.23    219.15      0.17   9501.58   3257.99
0.00   3257.99    100.00
02:20:01 PM     12.78  10694.65    145.12      0.11   8268.03   2465.23
0.00   2465.23    100.00
02:30:01 PM      5.79      4.93    140.92      0.03     94.61      0.00
0.00      0.00      0.00
02:40:01 PM      0.06      4.57    147.94      0.00    109.82      0.00
0.00      0.00      0.00
02:50:01 PM      4.94      5.11    140.54      0.03     94.58      0.00
0.00      0.00      0.00
03:00:01 PM      0.00      4.92    138.82      0.00     94.25      0.00
0.00      0.00      0.00
03:10:01 PM      7.66      7.84    267.43      0.12    177.26      0.00
0.00      0.00      0.00
03:20:01 PM      0.00      4.57    138.89      0.00     94.06      0.00
0.00      0.00      0.00
03:30:01 PM      0.00      4.85    140.98      0.00     95.61      0.00
0.00      0.00      0.00
03:40:01 PM      0.00      4.54    148.36      0.00    110.78      0.00
0.00      0.00      0.00
03:50:01 PM      0.00      4.51    137.96      0.00     93.90      0.00
0.00      0.00      0.00
04:00:01 PM      0.00      5.08    143.63      0.00     95.18      0.00
0.00      0.00      0.00
04:10:01 PM     22.53  50746.42    230.90      0.22  35820.37  18562.07
0.00  11605.90     62.52
04:20:01 PM     23.93  60429.47    136.53      0.19  43166.45  22012.66
0.00  14525.16     65.99
04:30:01 PM     17.97   3110.80    140.26      0.10   2822.69   1505.53
0.00    601.17     39.93

And also observed that during this time . there some rsync jobs started by
pgsql and consuming CPU resources.

% CPU   PID
30.8      10452 sshd
24.9      10454 rsync
 3.0       10455 flush-253:2

postgres 10452 30.8  0.1 104372  6256 ?        S    16:00   6:02 sshd:
postgres@notty
postgres 10453  0.0  0.0 110512  1496 ?        Ss   16:00   0:00 rsync
--server -logDtpre.iLs . /backup/ARCHIVELOGS/CLUSTER1
postgres 10454 24.9  0.0 110772  1056 ?        S    16:00   4:53 rsync
--server -logDtpre.iLs . /backup/ARCHIVELOGS/CLUSTER1
root     10455  3.0  0.0      0     0 ?        S    16:00   0:35
[flush-253:2]

VM stat .
procs -----------memory---------- ---swap-- -----io---- --system--
-----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id
wa st
 2  1  15704 173136 234692 5197108    0    0   167  4141    1    1  3  2 93
2  0
 0  1  15704 120684 234692 5250312    0    0     0 242356 4374 3450 18 17 25
39  0
 1  0  15704 131224 234688 5247532    0    0     0 102396 5525 2630 22 17 45
16  0
 2  0  15704 159744 234688 5219992    0    0     0    56 3809 2555 17 10 74
0  0
 1  0  15704 157636 234688 5221132    0    0     0   172 6719 4522 26 16 58
0  0

Can anyone suggest how to get rid of this pageout alert.






--
View this message in context:
http://postgresql.nabble.com/pgpgout-s-without-swapping-what-does-it-mean-tp5796346p5867809.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.