standby not cleaning up WAL files often enough - Mailing list pgsql-admin

From Jeff Frost
Subject standby not cleaning up WAL files often enough
Date
Msg-id EE4EF5D6-EF29-41F6-9A0C-C861D95E669A@pgexperts.com
Whole thread Raw
Responses Re: standby not cleaning up WAL files often enough
List pgsql-admin
I’ve got an interesting situation on a new 9.4.1 streaming replica.

Occasionally it will get disconnected and need to catch up by pulling files from the wal-archive via wal-e.  This works fine and most of the time it will catch up and reconnect.

Ocasionally though, it will run the 50G pg_xlog partition out of disk space because it doesn’t seem to clean up the WAL files nearly fast enough.  For those who haven’t done the math in their head, that’s a bit over 3,000 WAL files.

checkpoint_segments = 33
checkpoint_completion_target = 0.9
checkpoint_timeout = 10min

Here’s how the logs look:

2015-04-22 20:48:34.331 GMT,,,32645,,553298b3.7f85,2666,,2015-04-18 17:47:31 GMT,,0,LOG,00000,"restartpoint complete: wrote 382002 buffers (36.4%); 0 transaction log file(s) added, 461 removed, 67 recycled; write=197.586 s, sync=0.253 s, total=199.001 s; sync files=397, longest=0.146 s, average=0.000 s",,,,,,,,,""
2015-04-22 20:52:01.176 GMT,,,32645,,553298b3.7f85,2669,,2015-04-18 17:47:31 GMT,,0,LOG,00000,"restartpoint complete: wrote 395868 buffers (37.8%); 0 transaction log file(s) added, 428 removed, 67 recycled; write=205.487 s, sync=0.281 s, total=206.843 s; sync files=510, longest=0.241 s, average=0.000 s",,,,,,,,,""
2015-04-22 20:55:13.082 GMT,,,32645,,553298b3.7f85,2672,,2015-04-18 17:47:31 GMT,,0,LOG,00000,"restartpoint complete: wrote 369073 buffers (35.2%); 0 transaction log file(s) added, 461 removed, 67 recycled; write=190.356 s, sync=0.469 s, total=191.903 s; sync files=481, longest=0.363 s, average=0.000 s",,,,,,,,,""
2015-04-22 20:58:30.610 GMT,,,32645,,553298b3.7f85,2675,,2015-04-18 17:47:31 GMT,,0,LOG,00000,"restartpoint complete: wrote 380614 buffers (36.3%); 0 transaction log file(s) added, 461 removed, 67 recycled; write=196.295 s, sync=0.150 s, total=197.526 s; sync files=478, longest=0.062 s, average=0.000 s",,,,,,,,,""
2015-04-22 21:02:01.334 GMT,,,32645,,553298b3.7f85,2678,,2015-04-18 17:47:31 GMT,,0,LOG,00000,"restartpoint complete: wrote 396343 buffers (37.8%); 0 transaction log file(s) added, 461 removed, 67 recycled; write=209.380 s, sync=0.192 s, total=210.722 s; sync files=444, longest=0.161 s, average=0.000 s",,,,,,,,,""
2015-04-22 21:05:33.490 GMT,,,32645,,553298b3.7f85,2681,,2015-04-18 17:47:31 GMT,,0,LOG,00000,"restartpoint complete: wrote 404200 buffers (38.5%); 0 transaction log file(s) added, 428 removed, 67 recycled; write=210.967 s, sync=0.182 s, total=212.154 s; sync files=505, longest=0.142 s, average=0.000 s",,,,,,,,,""
2015-04-22 21:08:37.649 GMT,,,32645,,553298b3.7f85,2684,,2015-04-18 17:47:31 GMT,,0,LOG,00000,"restartpoint complete: wrote 417375 buffers (39.8%); 0 transaction log file(s) added, 461 removed, 67 recycled; write=182.664 s, sync=0.391 s, total=184.157 s; sync files=448, longest=0.198 s, average=0.000 s",,,,,,,,,""
2015-04-22 21:08:47.816 GMT,,,32645,,553298b3.7f85,2687,,2015-04-18 17:47:31 GMT,,0,LOG,00000,"restartpoint complete: wrote 914929 buffers (87.3%); 0 transaction log file(s) added, 485 removed, 10 recycled; write=5.511 s, sync=3.434 s, total=10.165 s; sync files=459, longest=0.484 s, average=0.007 s",,,,,,,,,""
2015-04-22 21:10:07.464 GMT,,,32645,,553298b3.7f85,2690,,2015-04-18 17:47:31 GMT,,0,LOG,00000,"restartpoint complete: wrote 8421 buffers (0.8%); 0 transaction log file(s) added, 395 removed, 67 recycled; write=70.043 s, sync=0.278 s, total=71.243 s; sync files=182, longest=0.234 s, average=0.001 s",,,,,,,,,”"

All of those are started by "xlog" and not by "time'.

At first I was thinking perhaps wal-e was prefetching way too many WAL files, but it appears to only pre-fetch 8 by default:


Also, I verified that there were almost 3k WAL files in the pg_xlog directory itself.

I guess my question is: how can I avoid this behavior other than just continuing to grow that pg_xlog partition?


pgsql-admin by date:

Previous
From: jaime soler
Date:
Subject: Re: pgdump with batch insert
Next
From: Jorge Torralba
Date:
Subject: Re: standby not cleaning up WAL files often enough