Thread: pg_wal lifecycle

pg_wal lifecycle

From
Luca Ferrari
Date:
Hi all,
I'd like to see if I get it right about pg_wal: it grows up to pretty
much max_wal_size, at which point a checkpoint is triggered.
If the server triggers a timed checkpoint before the pg_wal is at
max_wal_size, the system recycles the wals thus keeping the pg_wal
size lower than max_wal_size.
Is this correct? In particular, what is in simple words, the
discrimination between recycling a segment and creating a new one?


% sudo du -hs $PGDATA/pg_wal/
273M    /postgres/13/data/pg_wal/
% psql -U postgres -c "show max_wal_size;" -c "show checkpoint_timeout;"
 max_wal_size
--------------
 1GB
(1 riga)

 checkpoint_timeout
--------------------
 5min
(1 riga)

and from the last log checkpoint entry:

2021-07-13 02:48:10.492 EDT [2424] LOG:  checkpoint starting: time
2021-07-13 02:50:44.515 EDT [2424] LOG:  checkpoint complete: wrote
9170 buffers (28.0%); 0 WAL file(s) added, 0 removed, 13 recycled;
write=148.878 s, sync=1.889 s, total=154.023 s; sync files=42,
longest=0.109 s, average=0.045 s; distance=217376 kB, estimate=217376
kB

Thanks,
Luca



Re: pg_wal lifecycle

From
Peter Eisentraut
Date:
On 13.07.21 09:07, Luca Ferrari wrote:
> I'd like to see if I get it right about pg_wal: it grows up to pretty
> much max_wal_size, at which point a checkpoint is triggered.
> If the server triggers a timed checkpoint before the pg_wal is at
> max_wal_size, the system recycles the wals thus keeping the pg_wal
> size lower than max_wal_size.
> Is this correct? In particular, what is in simple words, the
> discrimination between recycling a segment and creating a new one?

Recycling in this context just means that instead of creating a new WAL 
file for new WAL traffic, it reuses an old file.  So if you have WAL 
files 5, 6, 7, 8, 9, and you know that you don't need 5 and 6 anymore, 
when you need to start WAL file 10, instead of creating a new file "10", 
the system just renames "5" to "10" and starts overwriting what was in 
there.  This is just an optimization to use the file system better; it 
doesn't affect the logical principles of what is going on.



Re: pg_wal lifecycle

From
Luca Ferrari
Date:
On Tue, Jul 13, 2021 at 3:43 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> Recycling in this context just means that instead of creating a new WAL
> file for new WAL traffic, it reuses an old file.  So if you have WAL
> files 5, 6, 7, 8, 9, and you know that you don't need 5 and 6 anymore,
> when you need to start WAL file 10, instead of creating a new file "10",
> the system just renames "5" to "10" and starts overwriting what was in
> there.  This is just an optimization to use the file system better; it
> doesn't affect the logical principles of what is going on.

Yes, I know about recycling and the problem it can cause (or solve),
like the setting wal_recycle.
However it is still not clear to me when the database triggers a wal
deletion or a wal recycling. I suspect the recycling could be driven
by checkpoint_completion_target: if the wal is still in the window of
the writing checkpoint it cannot be recycled. But what about deletion?


% sudo du -hs $PGDATA/pg_wal
977M    /postgres/13/data/pg_wal

... some work
% psql -U postgres -c 'checkpoint;' testdb
% sudo du -hs $PGDATA/pg_wal
929M    /postgres/13/data/pg_wal

and in the logs I see:

LOG:  checkpoint complete: wrote 4425 buffers (13.5%); 0 WAL file(s)
added, 3 removed, 0 recycled; write=0.263 s, sync=0.297 s, total=1.617
s; sync files=2, longest=0.255 s, average=0.149 s; distance=62683 kB,
estimate=62683 kB

So the system was still under the max_wal_size (1 GB), it did not
write any new WAL file but decided to remove three of them ( 977 - 3 *
16 = 929 MB).
I do agree that being near max_wal_size, deleting wal files could be
good to avoid growing pg_wal too much due to long transactions, but
still I cannot predict the behavior.

Luca