Thread: pg_wal lifecycle
Hi all, I'd like to see if I get it right about pg_wal: it grows up to pretty much max_wal_size, at which point a checkpoint is triggered. If the server triggers a timed checkpoint before the pg_wal is at max_wal_size, the system recycles the wals thus keeping the pg_wal size lower than max_wal_size. Is this correct? In particular, what is in simple words, the discrimination between recycling a segment and creating a new one? % sudo du -hs $PGDATA/pg_wal/ 273M /postgres/13/data/pg_wal/ % psql -U postgres -c "show max_wal_size;" -c "show checkpoint_timeout;" max_wal_size -------------- 1GB (1 riga) checkpoint_timeout -------------------- 5min (1 riga) and from the last log checkpoint entry: 2021-07-13 02:48:10.492 EDT [2424] LOG: checkpoint starting: time 2021-07-13 02:50:44.515 EDT [2424] LOG: checkpoint complete: wrote 9170 buffers (28.0%); 0 WAL file(s) added, 0 removed, 13 recycled; write=148.878 s, sync=1.889 s, total=154.023 s; sync files=42, longest=0.109 s, average=0.045 s; distance=217376 kB, estimate=217376 kB Thanks, Luca
On 13.07.21 09:07, Luca Ferrari wrote: > I'd like to see if I get it right about pg_wal: it grows up to pretty > much max_wal_size, at which point a checkpoint is triggered. > If the server triggers a timed checkpoint before the pg_wal is at > max_wal_size, the system recycles the wals thus keeping the pg_wal > size lower than max_wal_size. > Is this correct? In particular, what is in simple words, the > discrimination between recycling a segment and creating a new one? Recycling in this context just means that instead of creating a new WAL file for new WAL traffic, it reuses an old file. So if you have WAL files 5, 6, 7, 8, 9, and you know that you don't need 5 and 6 anymore, when you need to start WAL file 10, instead of creating a new file "10", the system just renames "5" to "10" and starts overwriting what was in there. This is just an optimization to use the file system better; it doesn't affect the logical principles of what is going on.
On Tue, Jul 13, 2021 at 3:43 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > Recycling in this context just means that instead of creating a new WAL > file for new WAL traffic, it reuses an old file. So if you have WAL > files 5, 6, 7, 8, 9, and you know that you don't need 5 and 6 anymore, > when you need to start WAL file 10, instead of creating a new file "10", > the system just renames "5" to "10" and starts overwriting what was in > there. This is just an optimization to use the file system better; it > doesn't affect the logical principles of what is going on. Yes, I know about recycling and the problem it can cause (or solve), like the setting wal_recycle. However it is still not clear to me when the database triggers a wal deletion or a wal recycling. I suspect the recycling could be driven by checkpoint_completion_target: if the wal is still in the window of the writing checkpoint it cannot be recycled. But what about deletion? % sudo du -hs $PGDATA/pg_wal 977M /postgres/13/data/pg_wal ... some work % psql -U postgres -c 'checkpoint;' testdb % sudo du -hs $PGDATA/pg_wal 929M /postgres/13/data/pg_wal and in the logs I see: LOG: checkpoint complete: wrote 4425 buffers (13.5%); 0 WAL file(s) added, 3 removed, 0 recycled; write=0.263 s, sync=0.297 s, total=1.617 s; sync files=2, longest=0.255 s, average=0.149 s; distance=62683 kB, estimate=62683 kB So the system was still under the max_wal_size (1 GB), it did not write any new WAL file but decided to remove three of them ( 977 - 3 * 16 = 929 MB). I do agree that being near max_wal_size, deleting wal files could be good to avoid growing pg_wal too much due to long transactions, but still I cannot predict the behavior. Luca