Re: [HACKERS] WAL logging problem in 9.4.3? - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: [HACKERS] WAL logging problem in 9.4.3?
Date
Msg-id 20191120.150546.1050157217400213784.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: [HACKERS] WAL logging problem in 9.4.3?  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
I'm in the benchmarking week..

Thanks for reviewing!.

At Sun, 17 Nov 2019 20:54:34 -0800, Noah Misch <noah@leadboat.com> wrote in 
> On Tue, Nov 05, 2019 at 02:53:35PM -0800, Noah Misch wrote:
> > I started pre-commit editing on 2019-10-28, and comment+README updates have
> > been the largest part of that.  I'll check my edits against the things you
> > list here, and I'll share on-list before committing.  I've now marked the CF
> > entry Ready for Committer.

I'll look into that soon.

By the way, before finalize this, I'd like to share the result of a
brief benchmarking.

First, I measured the direct effect of WAL skipping.
I measured the time required to do the following sequence for the
COMMIT-FPW-WAL case and COMMIT-fsync case. WAL and heap files are on
non-server spec HDD.

  BEGIN;
  TRUNCATE t;
  INSERT INTO t (SELECT a FROM generate_series(1, n) a);
  COMMIT;

REPLICA means the time with wal_level = replica
SYNC    means the time with wal_level = minimal and force file sync.
WAL     means the time with wal_level = minimal and force commit-WAL.
pages is the number of pages of the table.
(REPLICA comes from run.sh 1, SYNC/WAL comes from run.sh 2)

pages REPLICA    SYNC      WAL
    1:   144 ms   683 ms   217 ms
    3:   303 ms   995 ms   385 ms
    5:   271 ms  1007 ms   217 ms
   10:   157 ms  1043 ms   224 ms
   17:   189 ms  1007 ms   193 ms
   31:   202 ms  1091 ms   230 ms
   56:   265 ms  1175 ms   226 ms
  100:   510 ms  1307 ms   270 ms
  177:   790 ms  1523 ms   524 ms
  316:  1827 ms  1643 ms   719 ms
  562:  1904 ms  2109 ms  1148 ms
 1000:  3060 ms  2979 ms  2113 ms
 1778:  6077 ms  3945 ms  3618 ms
 3162: 13038 ms  7078 ms  6734 ms

There was a crossing point around 3000 pages. (bench1() finds that by
bisecting, run.sh 3).


With multiple sessions, the crossing point  but does not go so
small.

10 processes (run.pl 4 10) The numbers in parentheses are WAL[n]/WAL[n-1].
pages    SYNC     WAL
  316:  8436 ms  4694 ms
  562: 12067 ms  9627 ms (x2.1) # WAL wins
 1000: 19154 ms 43262 ms (x4.5) # SYNC wins. WAL's slope becomes steep.
 1778: 32495 ms 63863 ms (x1.4)

100 processes (run.pl 4 100)
pages    SYNC     WAL
   10: 13275 ms  1868 ms 
   17: 15919 ms  4438 ms (x2.3)
   31: 17063 ms  6431 ms (x1.5)
   56: 23193 ms 14276 ms (x2.2)  # WAL wins
  100: 35220 ms 67843 ms (x4.8)  # SYNC wins. WAL's slope becomes steep.

With 10 pgbench sessions.
pages   SYNC     WAL     
    1:   915 ms   301 ms
    3:  1634 ms   508 ms
    5:  1634 ms   293ms
   10:  1671 ms  1043 ms
   17:  1600 ms   333 ms
   31:  1864 ms   314 ms
   56:  1562 ms   448 ms
  100:  1538 ms   394 ms
  177:  1697 ms  1047 ms
  316:  3074 ms  1788 ms
  562:  3306 ms  1245 ms
 1000:  3440 ms  2182 ms
 1778:  5064 ms  6464 ms  # WAL's slope becomes steep
 3162:  8675 ms  8165 ms


I don't think the result of 100 processes is meaningful, so excluding
the result a candidate for wal_skip_threshold can be 1000.

Thoughts? The attached is the benchmark script.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
#! /usr/bin/perl

use strict;
use IPC::Open2;
use Time::HiRes qw (gettimeofday tv_interval);

my $tupperpage = 226;

my @time = ();

sub bench {
    my ($header, $nprocs, $ntups, $threshold) = @_;
    my @result = ();
    my @rds = ();
    
    for (my $ip = 0 ; $ip < $nprocs ; $ip++)
    {
        pipe(my $rd, my $wr);
        $rds[$ip] = $rd;
        
        my $pid = fork();

        die "fork failed: $!\n" if ($pid < 0);
        if ($pid == 0)
        {
            close($rd);
            
            my $pid = open2(my $psqlrd, my $psqlwr, "psql postgres");
            print $psqlwr "SET wal_skip_threshold to $threshold;\n";
            print $psqlwr "DROP TABLE IF EXISTS t$ip;";
            print $psqlwr "CREATE TABLE t$ip (a int);\n";

            my @st = gettimeofday();
            for (my $i = 0 ; $i < 10 ; $i++)
            {
                print $psqlwr "BEGIN;";
                print $psqlwr "TRUNCATE t$ip;";
                print $psqlwr "INSERT INTO t$ip (SELECT a FROM generate_series(1, $ntups) a);";
                print $psqlwr "COMMIT;";
            }
            close($psqlwr);
            waitpid($pid, 0);

            print $wr $ip, " ", 1000 * tv_interval(\@st, [gettimeofday]), "\n";
            exit;
        }
        close($wr);
    }

    my $rpid;
    while (($rpid = wait()) == 0) {}

    my $sum = 0;
    for (my $ip = 0 ; $ip < $nprocs ; $ip++)
    {
        my $ret = readline($rds[$ip]);
        die "format? $ret\n" if ($ret !~ /^([0-9]+) ([0-9.]+)$/);

        $sum += $2;
    }

    printf "$header: procs $nprocs: time %.0f\n", $sum / $nprocs;
}

sub log10 { return log($_[0]) / log(10); }

# benchmark for wal_level = replica, the third parameter of bench
# doesn't affect
sub bench1
{
    print "benchmark for wal_level = replica\n";
    for (my $s = 0 ; $s <= 4 ; $s += 0.25)
    {
        my $ss = int(10 ** $s);
        bench("size $ss", 1, $ss * $tupperpage, $ss * 8);
    }
}

# benchmark for wal_level = minimal.
sub bench2
{
    print "benchmark for wal_level = minimal\n";
    for (my $s = 0 ; $s <= 3.5 ; $s += 0.25)
    {
        my $ss = int(10 ** $s);
        bench("size $ss: SYNC ", 1, $ss * $tupperpage, $ss * 8);
        bench("size $ss: WAL  ", 1, $ss * $tupperpage, ($ss + 1) * 8);
    }
}

# find crossing point of WAL and SYNC by bisecting
sub bench3
{
    print "find crossing point of WAL and SYNC by bisecting\n";
    bench("SYNC: size 0", 1, 1, 8);
    bench("WAL : size 0", 1, 1, 16);
    my $s = 1;
    my $st = 10000;
    while (1)
    {
        my $ts = bench("SYNC: size $s", $tupperpage * $s, $s * 8);
        my $tw = bench("WAL : size $s", $tupperpage * $s, ($s + 1) * 8);

        if ($st < 1.0){
            print "DONE\n";
            exit(0);
        }
        if ($ts > $tw)
        {
            $s += $st; $st /= 2;
        }
        else
        {
            $s -= $st; $st /= 2;
        }
    }
}

# benchmark with multiple processes
sub bench4
{
    my $nprocs = $ARGV[1];
    
    print "benchmark for wal_level = minimal, $nprocs processes\n";
    
    for (my $s = 1.0 ; $s <= 3.5 ; $s += 0.25)
    {
        my $ss = int(10 ** $s);
        bench("size $ss: SYNC ", $nprocs, $ss * $tupperpage, $ss * 8);
        bench("size $ss: WAL  ", $nprocs, $ss * $tupperpage, ($ss + 1) * 8);
    }
}


bench1() if ($ARGV[0] == 1);
bench2() if ($ARGV[0] == 2);
bench3() if ($ARGV[0] == 3);
bench4() if ($ARGV[0] == 4);




pgsql-hackers by date:

Previous
From: Paul A Jungwirth
Date:
Subject: Re: range_agg
Next
From: btkimurayuzk
Date:
Subject: Re: pg_waldump and PREPARE