Home > mailing lists

Re: [PERFORM] [GENERAL] Questionaire: Common WAL write rates on busy servers. - Mailing list pgsql-performance

From	bricklen
Subject	Re: [PERFORM] [GENERAL] Questionaire: Common WAL write rates on busy servers.
Date	April 25, 2017 17:19:31
Msg-id	CAGrpgQ-kdUawMakxoe-yd=3YKTGBgNhkd7DLYTUG4oMZevWD7g@mail.gmail.com Whole thread
In response to	[PERFORM] Questionaire: Common WAL write rates on busy servers. (Andres Freund <andres@anarazel.de>)
List	pgsql-performance

Tree view

On Mon, Apr 24, 2017 at 9:17 PM, Andres Freund <andres@anarazel.de> wrote:

Questions (answer as many you can comfortably answer):
- How many MB/s, segments/s do you see on busier servers?
- What generates the bulk of WAL on your servers (9.5+ can use
pg_xlogdump --stats to compute that)?
- Are you seeing WAL writes being a bottleneck?OA
- What kind of backup methods are you using and is the WAL volume a
problem?
- What kind of replication are you using and is the WAL volume a
problem?
- What are your settings for wal_compression, max_wal_size (9.5+) /
checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers?
- Could you quickly describe your workload?

* Postgresql 9.3

* 1500+ db servers

* Daily peak for busy databases: 75 WALs switched per second (less than 10% of the servers experience this)

* Avg per db: 2 WALs/s
* Mainly generated by large batch sync processes that occur throughout the day, and by a legacy archiving process to purge older data (potentially many millions of cascading deletes).
*Half the servers have (encrypted) pg_dump backups, WAL volume hasn't proved to be a problem there, though dump size is a problem for a few of the larger databases (less than 1TB).

* Inter-data-centre replication is all streaming, across DC's (over the WAN) WAL shipping is over compressed SSH tunnels.

Occasionally the streaming replication falls behind, but more commonly it is the cross-DC log shipping that becomes a problem. Some of the servers will generate 50+ GBs of WAL in a matter of minutes and that backs up immediately on the masters. Occasionally this has a knock-on effect for other servers and slows down their log shipping due to network saturation.

* checkpoint_segments: 64, checkpoint_timeout: 5 mins, wal_buffers: 16MB

Workload:

70% of servers are generally quiet, with occasional bursty reads and writes.

20% are medium use, avg a few hundred transactions/second

10% average around 5k txns/s, with bursts up to 25k txns/s for several minutes.

All servers have about 80% reads / 20% writes, though those numbers flip during big sync jobs and when the purging maintenance kicks off.

pgsql-performance by date:

From: Vladimir Borodin
Date: 25 April 2017, 10:56:14
Subject: Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.

From: Claudio Freire
Date: 25 April 2017, 19:07:41
Subject: Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.

Re: [PERFORM] [GENERAL] Questionaire: Common WAL write rates on busy servers. - Mailing list pgsql-performance

Previous

Next