On Fri, 2023-11-24 at 16:59 +0100, Les wrote:
> Laurenz Albe <laurenz.albe@cybertec.at> wrote on Fri, 2023-11-24 at 16:00:
> > On Fri, 2023-11-24 at 12:39 +0100, Les wrote:
> > > Under normal circumstances, the number of write operations is relatively low, with an
> > > average of 4-5 MB/sec total write speed on the disk associated with the data directory.
> > > Yesterday, the primary server suddenly started writing to the pg_wal directory at a
> > > crazy pace, 1.5GB/sec, but sometimes it went up to over 3GB/sec.
> > > [...]
> > > Upon further analysis of the database, we found that we did not see any mass data
> > > changes in any of the tables. The only exception is a sequence value that was moved
> > > millions of steps within a single minute.
> >
> > That looks like some application went crazy and inserted millions of rows, but the
> > inserts were rolled back. But it is hard to be certain with the clues given.
>
> WAL files kept being written after we shut down all clients and restarted the primary PostgreSQL server.
>
> How can the primary server generate more and more WAL files (writes) after all clients have
> been shut down and the server was restarted? My only bet was the autovacuum. But I ruled
> that out, because removing a replication slot has no effect on the autovacuum (am I wrong?).

It must have been autovacuum. Removing a replication slot does have an influence: a slot
can hold back the xmin horizon, and once it is gone, autovacuum can remove more dead rows
and so does more work. If the problem stopped when you dropped the replication slot,
it could be a coincidence.

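If you want to verify whether a slot was holding back autovacuum, the `xmin` and `catalog_xmin` columns of `pg_replication_slots` show that. Here is a minimal Python sketch, assuming psycopg 3 is installed; `fetch_slot_rows`, `horizon_holders` and the DSN are made-up names for illustration. A physical slot only reports an `xmin` if the standby uses `hot_standby_feedback`, and a logical slot reports a `catalog_xmin`.

```python
from typing import Iterable, List, Optional, Tuple

# age() reports how many transactions old each slot's horizon is (NULL if unset).
SLOT_QUERY = """
SELECT slot_name, age(xmin), age(catalog_xmin)
FROM pg_replication_slots
"""


def horizon_holders(
    rows: Iterable[Tuple[str, Optional[int], Optional[int]]]
) -> List[Tuple[str, int]]:
    """Return (slot_name, age) for slots that pin an xmin, oldest horizon first."""
    holders = []
    for name, xmin_age, catalog_xmin_age in rows:
        ages = [a for a in (xmin_age, catalog_xmin_age) if a is not None]
        if ages:
            holders.append((name, max(ages)))
    return sorted(holders, key=lambda h: h[1], reverse=True)


def fetch_slot_rows(dsn: str):
    """Run SLOT_QUERY against a live server (hypothetical helper, assumes psycopg 3)."""
    import psycopg  # imported here so horizon_holders() works without the driver

    with psycopg.connect(dsn) as conn:
        return conn.execute(SLOT_QUERY).fetchall()
```

For example, `horizon_holders(fetch_slot_rows("dbname=postgres"))` lists the slots with the oldest horizon first; a large age means autovacuum had to keep dead rows that it could otherwise have removed.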
> Now you are saying that this looks like a huge rollback.

It could have been many small rollbacks.

> Does rolling back changes require even more data to be written to the WAL after server
> restart?

No. My assumption would be that something generated lots of INSERTs that were all
rolled back. That creates WAL, even though you see no change in the table data.

> Does removing a replication slot lessen the amount of data needed to be written for
> a rollback (or for anything else)?

No: the WAL is generated by whatever precedes the ROLLBACK, and the ROLLBACK itself does
not create a lot of WAL.

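To illustrate the point, here is a deliberately simplified toy model in Python. It is not PostgreSQL's real WAL format, and the record sizes are invented numbers: each INSERT is WAL-logged as it happens, the end of the transaction adds one small commit or abort record, and the sequence counter is never rolled back (`nextval()` is not transactional), which would also fit the sequence that moved millions of steps while no table changed.

```python
from dataclasses import dataclass, field
from typing import List

# Invented record sizes in bytes; they only illustrate proportions.
INSERT_RECORD = 100
END_RECORD = 30  # one commit or abort record per transaction


@dataclass
class ToyServer:
    table: List[int] = field(default_factory=list)  # rows visible after commit
    sequence: int = 0  # like a sequence's last value
    wal_bytes: int = 0  # total toy WAL written so far

    def run_transaction(self, n_inserts: int, commit: bool) -> None:
        pending = []
        for _ in range(n_inserts):
            self.sequence += 1  # nextval() is not undone by ROLLBACK
            pending.append(self.sequence)
            self.wal_bytes += INSERT_RECORD  # each insert is WAL-logged immediately
        self.wal_bytes += END_RECORD  # COMMIT or ROLLBACK adds only a small record
        if commit:
            self.table.extend(pending)  # rolled-back rows never become visible


server = ToyServer()
for _ in range(1000):
    server.run_transaction(1000, commit=False)
print(len(server.table), server.sequence, server.wal_bytes)  # → 0 1000000 100030000
```

A thousand rolled-back transactions of a thousand inserts each leave the table empty and the sequence at 1,000,000, and almost all of the toy WAL comes from the inserts, not from the rollbacks.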
> It is a fact that the primary stopped writing at 1.5GB/sec the moment we removed the slot.

I have no explanation for that, except a coincidence.
Replication slots don't generate WAL.

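To check exactly when WAL generation stops (for example, around the moment a slot is dropped), you could sample `pg_current_wal_lsn()` twice and compute the rate. A minimal Python sketch, assuming PostgreSQL 10 or later and psycopg 3; the function names and the DSN are hypothetical. An LSN like `16/B374D848` is the high and low 32 bits of a WAL byte position, written in hex.

```python
import time


def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' into an absolute WAL byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def wal_rate(lsn_start: str, lsn_end: str, seconds: float) -> float:
    """Average WAL generation rate in bytes per second between two LSN samples."""
    return (lsn_to_bytes(lsn_end) - lsn_to_bytes(lsn_start)) / seconds


def sample_wal_rate(dsn: str, interval: float = 10.0) -> float:
    """Sample pg_current_wal_lsn() twice on a live primary (assumes psycopg 3)."""
    import psycopg  # imported here so the pure helpers above work without the driver

    query = "SELECT pg_current_wal_lsn()::text"
    with psycopg.connect(dsn, autocommit=True) as conn:
        start = conn.execute(query).fetchone()[0]
        t0 = time.monotonic()
        time.sleep(interval)
        end = conn.execute(query).fetchone()[0]
        elapsed = time.monotonic() - t0  # measured, not assumed, sampling interval
    return wal_rate(start, end, elapsed)
```

For instance, `wal_rate("0/0", "0/60000000", 1.0)` gives 1610612736 bytes per second, i.e. 1.5 GiB/s, roughly the rate reported in the original mail.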
Yours,
Laurenz Albe