Odd OS, SAN or related throughput issue affecting large streamer - Mailing list pgsql-admin

From Jerry Sievers
Subject Odd OS, SAN or related throughput issue affecting large streamer
Msg-id 87o8f9n5o0.fsf@jsievers.enova.com
List pgsql-admin
Greetings!

I do not know if $issue has anything to do w/Pg directly but would be
very grateful for any insights...

We've got ~20 servers on a beefy physical host that's using a
fibre-channel storage array backend.

Runtime and OS software was updated recently, and about 2 days ago our
big monster system began lagging in replication.  It is generally able
to stream WALs from the primary, but is no longer able to apply them
fast enough, and apply throughput is orders of magnitude below what it
was before.

Full reboot of the host restores adequate throughput for several hours,
after which the backlog starts growing again.  We have witnessed the
reboot as a temp fix, followed by the same falloff, twice now on
consecutive days.

System of interest is a churny, ~50TB warehouse w/4 tablespaces.  I'm
unclear on whether all or just some of them are sluggish on writes.

It is "replay" lag, not lag in streaming that is evident.
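
For anyone wanting to confirm the same split, here is a minimal Python sketch of how the two kinds of lag can be told apart from three LSNs.  It assumes the values were collected with pg_current_wal_lsn() on the primary and pg_last_wal_receive_lsn() / pg_last_wal_replay_lsn() on the standby; the helper names and the sample LSNs are made up for illustration and are not taken from the affected system.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN string such as '16/B374D848' to an absolute byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def lag_breakdown(primary_lsn: str, receive_lsn: str, replay_lsn: str) -> dict:
    """Split total standby lag into a streaming part and a replay part.

    primary_lsn : pg_current_wal_lsn() on the primary
    receive_lsn : pg_last_wal_receive_lsn() on the standby
    replay_lsn  : pg_last_wal_replay_lsn() on the standby
    """
    primary = lsn_to_bytes(primary_lsn)
    received = lsn_to_bytes(receive_lsn)
    replayed = lsn_to_bytes(replay_lsn)
    return {
        # WAL generated on the primary but not yet received by the standby
        "streaming_lag_bytes": primary - received,
        # WAL received by the standby but not yet applied
        "replay_lag_bytes": received - replayed,
    }


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(lag_breakdown("A0/10000000", "A0/0FF00000", "9F/00000000"))
```

A small streaming figure next to a large and growing replay figure is the pattern described here: WAL arrives on the standby fine, but applying it is the slow step.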

My SysEng team is so far unable to spot what's causing the issue.

Suggestions re where to look next?

Thx!


postgres=# select version();
                                                                    version
-----------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.11 (Ubuntu 11.11-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 64-bit

$ uname -a
Linux foobox.foocorp.com 4.15.0-139-generic #143~16.04.1-Ubuntu SMP Wed Mar 17 08:10:33 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

-- 
Jerry Sievers
Postgres DBA/Development Consulting


