Re: BUG #7902: lazy cleanup of extraneous WAL files can cause out of disk issues - Mailing list pgsql-bugs

From Jeff Janes
Subject Re: BUG #7902: lazy cleanup of extraneous WAL files can cause out of disk issues
Date
Msg-id CAMkU=1xnyJwrg4E7ia6JAnsA1QUkwSxL1upY_auoecM4b3Ubiw@mail.gmail.com
Whole thread Raw
In response to Re: BUG #7902: lazy cleanup of extraneous WAL files can cause out of disk issues  (Jeff Frost <jeff@pgexperts.com>)
Responses Re: BUG #7902: lazy cleanup of extraneous WAL files can cause out of disk issues  (Jeff Frost <jeff@pgexperts.com>)
List pgsql-bugs
On Sunday, February 24, 2013, Jeff Frost wrote:

>
> This is how the log_checkpoint output looks during the run:
>
> 2013-02-24 21:06:31.156 UTC,,,1624,,51282598.658,114,,2013-02-23 02:12:40
> UTC,,0,LOG,00000,"checkpoint starting: immediate force wait",,,,,,,,,""
> 2013-02-24 21:06:31.216 UTC,,,1624,,51282598.658,115,,2013-02-23 02:12:40
> UTC,,0,LOG,00000,"checkpoint complete: wrote 108 buffers (0.0%); 0
> transaction log file(s) added, 0 removed, 4 recycled; write=0.054 s,
> sync=0.002 s, total=0.059 s; sync files=10, longest=0.000 s, average=0.000
> s",,,,,,,,,""
>
> So, this ^^^^^ is the manual checkpoint prior to the pgbench run and
> everything else is part of the pgbench run.
>
> 2013-02-24 21:06:45.227 UTC,,,1624,,51282598.658,116,,2013-02-23 02:12:40
> UTC,,0,LOG,00000,"checkpoints are occurring too frequently (14 seconds
> apart)",,"Consider increasing the configuration parameter
> ""checkpoint_segments"".",,,,,,,""
>

It took 14 seconds to go fill up 128 files.


> 2013-02-24 21:06:45.227 UTC,,,1624,,51282598.658,117,,2013-02-23 02:12:40
> UTC,,0,LOG,00000,"checkpoint starting: xlog",,,,,,,,,""
> 2013-02-24 21:08:35.239 UTC,,,1624,,51282598.658,118,,2013-02-23 02:12:40
> UTC,,0,LOG,00000,"checkpoint complete: wrote 266935 buffers (12.7%); 0
> transaction log file(s) added, 0 removed, 1 recycled; write=10.889 s,
> sync=99.079 s, total=110.011 s; sync files=29, longest=12.888 s,
> average=3.416 s",,,,,,,,,""
>

It spent 10.9 seconds writing buffers, either because it knew it was behind
time and was writing as fast as it could, or because it figured it needed
to be done with everything in 10.9 / 0.9 = 12.1 seconds, as that was when
the next checkpoint was due to start (i.e. when the next 128 was going be
filled).  But it took 99 seconds to accomplish the part of the checkpoint
that is supposed to fit in 1.2 seconds or less.  Assuming it was still
generating log files at the same rate of 128 every 12 to 14 seconds, there
would be an extra 1000 files by the time it is done.  That is almost enough
there to fill up your hard drive.  Now it probably won't manage to do that,
as I imagine the long fsync times would also lead to stalls in throughput.


> 2013-02-24 21:08:35.241 UTC,,,1624,,51282598.658,119,,2013-02-23 02:12:40
> UTC,,0,LOG,00000,"checkpoint starting: xlog",,,,,,,,,""
> 2013-02-24 21:11:19.506 UTC,,,1624,,51282598.658,120,,2013-02-23 02:12:40
> UTC,,0,LOG,00000,"checkpoint complete: wrote 801517 buffers (38.2%); 0
> transaction log file(s) added, 0 removed, 128 recycled; write=15.484 s,
> sync=148.739 s, total=164.266 s; sync files=32, longest=11.773 s,
> average=4.648 s",,,,,,,,,""
>

Here again, either the writes are being metered so they finish after 15.484
s when PG estimates that 0.9 * 128 log files would be filled, or they were
going as fast as possible because it thought they would be filled even
earlier.  It then took nearly ten times that long to finish the checkpoint
(which according to completion_target, is supposed to be take 1/10 the time
of the write, not 10 times longer.  So of course the actual number of log
files is going to greatly exceed the planned 128.


> 2013-02-24 21:11:19.506 UTC,,,1624,,51282598.658,121,,2013-02-23 02:12:40
> UTC,,0,LOG,00000,"checkpoint starting: xlog",,,,,,,,,""
> 2013-02-24 21:14:20.486 UTC,,,1624,,51282598.658,122,,2013-02-23 02:12:40
> UTC,,0,LOG,00000,"checkpoint complete: wrote 1009466 buffers (48.1%); 0
> transaction log file(s) added, 140 removed, 257 recycled; write=13.690 s,
> sync=167.245 s, total=180.980 s; sync files=33, longest=14.442 s,
> average=5.067 s",,,,,,,,,""
> 2013-02-24 21:14:20.487 UTC,,,1624,,51282598.658,123,,2013-02-23 02:12:40
> UTC,,0,LOG,00000,"checkpoint starting: xlog",,,,,,,,,""
> 2013-02-24 21:17:31.507 UTC,,,1624,,51282598.658,124,,2013-02-23 02:12:40
> UTC,,0,LOG,00000,"checkpoint complete: wrote 1055059 buffers (50.3%); 0
> transaction log file(s) added, 259 removed, 257 recycled; write=22.731 s,
> sync=168.239 s, total=191.020 s; sync files=32, longest=13.650 s,
> average=5.257 s",,,,,,,,,""
> 2013-02-24 21:17:31.507 UTC,,,1624,,51282598.658,125,,2013-02-23 02:12:40
> UTC,,0,LOG,00000,"checkpoint starting: xlog",,,,,,,,,""
> 2013-02-24 21:23:36.201 UTC,,,1624,,51282598.658,126,,2013-02-23 02:12:40
> UTC,,0,LOG,00000,"checkpoint complete: wrote 814767 buffers (38.9%); 0
> transaction log file(s) added, 546 removed, 0 recycled; write=269.553 s,
> sync=95.042 s, total=364.693 s; sync files=31, longest=11.211 s,
> average=3.065 s",,,,,,,,,""
>
> They're all number of xlog based and the first one doesn't remove any,
> that's not surprising,
>

What was the state of the pg_xlog directory before you started the test?
 If it were not full up to check_point segments * 2, (which is likely was
not, since populating pgbench_accounts avoids WAL when not archiving), this
I agree it is not surprising.

> but I'm surprised that the second one doesn't remove any.  It does
recycle quite a few.

PG keep two checkpoints worth or WAL files (that doesn't really seem to be
documented, and I'm not sure why it is done, other than an abundance of
caution).  So after the 2nd checkpoint is done, it keeps the ~500 or so log
files that were filled since the 2nd one started, and the ~400 or so filled
since the first one started.  The 128 that were filled to trigger the first
checkpoint can now be recycled, though.  Then after the next checkpoint,
those 400 are freed up and 257 are recycled and 140 are removed.  And so on


>  I guess the issue is that it can generate WAL files way faster than it
> can checkpoint and cleanup.
>

Yep, I don't have any doubt about that.


>  I think I need to write a script to output the number of WAL files each
> minute so we can better correlate what's going on.
>

I think we can already be confident about what is going on.   But perhaps
the log_checkpoint should record the number of log files retained, as well
as the numbers recycled and removed.  And also it would nice to know the
difference between time spend actually writing buffers, and time spent
napping during the buffer-write phase.

Cheers,

Jeff

pgsql-bugs by date:

Previous
From: Jeff Janes
Date:
Subject: Re: BUG #7902: lazy cleanup of extraneous WAL files can cause out of disk issues
Next
From: adam.tomjack@zuerchertech.com
Date:
Subject: BUG #7906: pg_dump exits successfully after an error