Thread: Calculating how much redo log space has been used

Calculating how much redo log space has been used

From

Reza Taheri

Date:

14 August 2018, 21:51:34

Hello PostgreSQL community,

I am helping with a benchmarking exercise using PGSQL (I chair the TPC subcommittee that has released a benchmark using PGSQL). A requirement of the benchmark is having enough log space allocated for 8 hours of running without needing to archive, back up, etc. I am trying to a) figure out how I can establish the exact space usage for the auditor; and b) how I can reduce the log space usage. Looking at iostat and pgstatspack, it looks like we will need to allocate something like 1.5TB of log space for a 5TB database, which is a huge ratio. (Yes, in the real world, we’d probably archive or ship the logs; but for benchmarking, that doesn’t work)

pgstatspack gives me something like below:

background writer stats

-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------

22 | 0 | 6416768 | 2252636 | 0 | 280211 | 9786558

(1 row)

background writer relative stats

-------------------+----------------------------+--------------------+---------------+-----------------+--------------+----------------------

100% | 6 | 71% | 25% | 3% | 8.659 MB/s | 2278.000 MB

I can calculate how many checkpoint segments I have used from the MB/s. But is there a more direct way of seeing how/when a checkpoint segment is filled up and we move on to the next one?

Also, it looks like the full_page_writes parameter is the only thing that can help reduce the log usage size, but that I have to set it to 1 to avoid corruption after a system crash, which is a requirement. Another requirement is a very short, 6-minute checkpoint time, which means we will likely write the full page very often. Yes, my hands are tied!

Here are the relevant non-default settings:

shared_buffers = 18000MB # min 128kB

temp_buffers = 2MB # min 800kB

maintenance_work_mem = 5MB # min 1MB

bgwriter_delay = 10ms # 10-10000ms between rounds

bgwriter_lru_maxpages = 200 # 0-1000 max buffers written/round

effective_io_concurrency = 10 # 1-1000; 0 disables prefetching

wal_sync_method = open_datasync # the default is the first option

wal_buffers = 16MB # min 32kB, -1 sets based on shared_buffers

wal_writer_delay = 10ms # 1-10000 milliseconds

checkpoint_segments = 750 # in logfile segments, min 1, 16MB each

checkpoint_timeout = 6min # range 30s-1h

checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0

effective_cache_size = 512MB

default_statistics_target = 10000 # range 1-10000

log_destination = 'stderr' # Valid values are combinations of

logging_collector = on # Enable capturing of stderr and csvlog

log_directory = 'pg_log' # directory where log files are written,

log_filename = 'postgresql-%a.log' # log file name pattern,

log_truncate_on_rotation = on # If on, an existing log file with the

log_rotation_age = 1d # Automatic rotation of logfiles will

log_rotation_size = 0 # Automatic rotation of logfiles will

log_checkpoints = on

Re: Calculating how much redo log space has been used

From

Andres Freund

Date:

14 August 2018, 22:31:18

Hi,

On 2018-08-14 18:51:34 +0000, Reza Taheri wrote:
> Also, it looks like the full_page_writes parameter is the only thing
> that can help reduce the log usage size

There's also wal_compression.


> Another requirement is a very short, 6-minute checkpoint time, which
> means we will likely write the full page very often. Yes, my hands are
> tied!

Why is that a requirement / how is specifically phrased? Is it a bounded
recovery time?

Greetings,

Andres Freund

RE: Calculating how much redo log space has been used

From

Reza Taheri

Date:

15 August 2018, 23:03:00

> -----Original Message-----
> From: Andres Freund [mailto:andres@anarazel.de]
> Sent: Tuesday, August 14, 2018 12:31 PM
> To: Reza Taheri <rtaheri@vmware.com>
> Cc: pgsql-performance@postgresql.org
> Subject: Re: Calculating how much redo log space has been used
>
> Hi,
>
> On 2018-08-14 18:51:34 +0000, Reza Taheri wrote:
> > Also, it looks like the full_page_writes parameter is the only thing
> > that can help reduce the log usage size
>
> There's also wal_compression.
>
>
> > Another requirement is a very short, 6-minute checkpoint time, which
> > means we will likely write the full page very often. Yes, my hands are
> > tied!
>
> Why is that a requirement / how is specifically phrased? Is it a bounded
> recovery time?
>
> Greetings,
>
> Andres Freund

Hi Andres,
Good to know about wal_compression. It gives us a good reason to upgrade to 9.5 to get that feature.

The need for a 6-minute checkpoint came from this requirement in the benchmark specification:

the database contents (excluding the transaction log) stored on Durable Media cannot be more than 12 minutes older than
anyCommitted state of the database. 
Comment: This may mean that Database Management Systems implementing traditional checkpoint algorithms may need to
performcheckpoints twice as frequently (i.e. every 6 minutes) in order to guarantee that the 12-minute requirement is
met.

But in any case, I now realize that I was going into the weeds, looking at the wrong thing. My original issue was
figuringout how quickly we churn through checkpoint segment files, and had been looking at the checkpoint stats in
pgstatspackto figure that out. But that's the wrong place to look. I don't think there is anything in the pgstatspack
outputthat can give me that information. I can tell by looking at the timestamps of the checkpoint segment files, but I
washoping to find something that gets logged in pg_log/postgresql-*log and tells me when we switch to a new log 

Thanks,
Reza