Re: [GENERAL] Questionaire: Common WAL write rates on busy servers. - Mailing list pgsql-general

From Jerry Sievers
Subject Re: [GENERAL] Questionaire: Common WAL write rates on busy servers.
Date
Msg-id 87inlsco4j.fsf@jsievers.enova.com
In response to [GENERAL] Questionaire: Common WAL write rates on busy servers.  (Andres Freund <andres@anarazel.de>)
List pgsql-general
Andres Freund <andres@anarazel.de> writes:

> Hi,
>
> I've lately seen more and more installations where the generation of
> write-ahead-log (WAL) is one of the primary bottlenecks.  I'm curious
> whether that's primarily a "sampling error" of mine, or whether that's
> indeed more common.
>
> The primary reason I'm curious is that I'm pondering a few potential
> optimizations, and would like to have some guidance which are more and
> which are less important.

I have a busy warehouse spitting out about 400k
segments/week... ~10MB/second :-)

We have resorted to a rather complex batch/parallel compressor/shipper
to keep up with the volume.
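
(Our shipper is custom, but for a sense of scale: a naive
single-threaded setup would be just a postgresql.conf archive_command
along these lines, with /archive as a hypothetical mount point:

    # compress and ship each 16MB segment as it fills
    archive_mode    = on
    archive_command = 'gzip < %p > /archive/%f.gz'

At ~400k segments/week that's roughly 40 segments/minute, and a single
gzip stream becomes the serialization point; hence the batch/parallel
approach.)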

>
> Questions (answer as many you can comfortably answer):
> - How many MB/s, segments/s do you see on busier servers?

Our busiest system averages 10MB/second, but it's very bursty.  I
assume it's many times that during high-churn periods.
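
(For anyone wanting the same measurement on 9.x: sample
pg_current_xlog_location() twice and diff.  A sketch below; the
'2B5/FA000000' value and the 60-second wait are placeholders.)

    -- first sample; suppose it returns '2B5/FA000000'
    SELECT pg_current_xlog_location();
    -- run ~60 seconds later: average WAL bytes/second over the interval
    SELECT pg_xlog_location_diff(pg_current_xlog_location(),
                                 '2B5/FA000000') / 60 AS bytes_per_sec;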

> - What generates the bulk of WAL on your servers (9.5+ can use
>   pg_xlogdump --stats to compute that)?

Simply warehouse incremental loading and/or full table delete/truncate
and reload, plus derived data being created.  Many of the transient
tables are on NVMe and unlogged.
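
(Re the pg_xlogdump suggestion above: on 9.5+ the per-record breakdown
is something like the run below; the segment names are placeholders
from our pg_xlog directory.)

    # summarize WAL volume by rmgr/record type across a segment range
    pg_xlogdump --stats=record \
        $PGDATA/pg_xlog/00000001000002B500000010 \
        $PGDATA/pg_xlog/00000001000002B500000020

(The transient tables generate essentially no WAL, which is the point
of making them unlogged.)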

> - Are you seeing WAL writes being a bottleneck?
> - What kind of backup methods are you using and is the WAL volume a
>   problem?

I do not know whether basic local WAL writing itself is a problem or
not, but as mentioned, we are scarcely able to handle the necessary
archiving to make backups and PITR possible.

> - What kind of replication are you using and is the WAL volume a
>   problem?

There are 2 streamers, both feeding directly from the master.  We use a
fairly large 30k wal_keep_segments value to help avoid streamers
falling behind and then having to resort to remote archive fetching.
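
(Concretely, one postgresql.conf line, and note the disk math: 30000
segments x 16MB is roughly 480GB of pg_xlog kept around.)

    # retain ~480GB of WAL so streamers can catch up without archive fetches
    wal_keep_segments = 30000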

It does appear that, since streaming WAL reception and application (as
well as, of course, remote fetching) are single-threaded, this is a
bottleneck as well.  That is, a totally unloaded and well-outfitted
(hardware-wise) streamer can barely keep up with the master.
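
(That gap is easy to watch on the standby itself; a sketch using the
9.x function names:)

    -- on the standby: WAL bytes received but not yet applied
    SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(),
                                 pg_last_xlog_replay_location())
           AS apply_lag_bytes;

(If that number grows while the startup/recovery process sits pegged on
one core, you're looking at the single-threaded apply ceiling.)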

> - What are your settings for wal_compression, max_wal_size (9.5+) /
>   checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers?

 checkpoint_timeout                  | 5min
 max_wal_size                        | 4GB
 wal_buffers                         | 16MB
 wal_compression                     | off
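
(Pulled with something like:)

    SELECT name, current_setting(name)
      FROM pg_settings
     WHERE name IN ('checkpoint_timeout', 'max_wal_size',
                    'wal_buffers', 'wal_compression');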

> - Could you quickly describe your workload?

Warehouse, with user self-service report creation/storage allowed in
the same system.

>
> Feel free to add any information you think is pertinent ;)

Great idea!!  Thanks

>
> Greetings,
>
> Andres Freund

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800

