Re: [HACKERS] WAL logging problem in 9.4.3? - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: [HACKERS] WAL logging problem in 9.4.3? |
Date | |
Msg-id | 20191120.150546.1050157217400213784.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: [HACKERS] WAL logging problem in 9.4.3? (Noah Misch <noah@leadboat.com>) |
List | pgsql-hackers |
I'm in the benchmarking week.. Thanks for reviewing!. At Sun, 17 Nov 2019 20:54:34 -0800, Noah Misch <noah@leadboat.com> wrote in > On Tue, Nov 05, 2019 at 02:53:35PM -0800, Noah Misch wrote: > > I started pre-commit editing on 2019-10-28, and comment+README updates have > > been the largest part of that. I'll check my edits against the things you > > list here, and I'll share on-list before committing. I've now marked the CF > > entry Ready for Committer. I'll look into that soon. By the way, before finalize this, I'd like to share the result of a brief benchmarking. First, I measured the direct effect of WAL skipping. I measured the time required to do the following sequence for the COMMIT-FPW-WAL case and COMMIT-fsync case. WAL and heap files are on non-server spec HDD. BEGIN; TRUNCATE t; INSERT INTO t (SELECT a FROM generate_series(1, n) a); COMMIT; REPLICA means the time with wal_level = replica SYNC means the time with wal_level = minimal and force file sync. WAL means the time with wal_level = minimal and force commit-WAL. pages is the number of pages of the table. (REPLICA comes from run.sh 1, SYNC/WAL comes from run.sh 2) pages REPLICA SYNC WAL 1: 144 ms 683 ms 217 ms 3: 303 ms 995 ms 385 ms 5: 271 ms 1007 ms 217 ms 10: 157 ms 1043 ms 224 ms 17: 189 ms 1007 ms 193 ms 31: 202 ms 1091 ms 230 ms 56: 265 ms 1175 ms 226 ms 100: 510 ms 1307 ms 270 ms 177: 790 ms 1523 ms 524 ms 316: 1827 ms 1643 ms 719 ms 562: 1904 ms 2109 ms 1148 ms 1000: 3060 ms 2979 ms 2113 ms 1778: 6077 ms 3945 ms 3618 ms 3162: 13038 ms 7078 ms 6734 ms There was a crossing point around 3000 pages. (bench1() finds that by bisecting, run.sh 3). With multiple sessions, the crossing point but does not go so small. 10 processes (run.pl 4 10) The numbers in parentheses are WAL[n]/WAL[n-1]. pages SYNC WAL 316: 8436 ms 4694 ms 562: 12067 ms 9627 ms (x2.1) # WAL wins 1000: 19154 ms 43262 ms (x4.5) # SYNC wins. WAL's slope becomes steep. 1778: 32495 ms 63863 ms (x1.4) 100 processes (run.pl 4 100) pages SYNC WAL 10: 13275 ms 1868 ms 17: 15919 ms 4438 ms (x2.3) 31: 17063 ms 6431 ms (x1.5) 56: 23193 ms 14276 ms (x2.2) # WAL wins 100: 35220 ms 67843 ms (x4.8) # SYNC wins. WAL's slope becomes steep. With 10 pgbench sessions. pages SYNC WAL 1: 915 ms 301 ms 3: 1634 ms 508 ms 5: 1634 ms 293ms 10: 1671 ms 1043 ms 17: 1600 ms 333 ms 31: 1864 ms 314 ms 56: 1562 ms 448 ms 100: 1538 ms 394 ms 177: 1697 ms 1047 ms 316: 3074 ms 1788 ms 562: 3306 ms 1245 ms 1000: 3440 ms 2182 ms 1778: 5064 ms 6464 ms # WAL's slope becomes steep 3162: 8675 ms 8165 ms I don't think the result of 100 processes is meaningful, so excluding the result a candidate for wal_skip_threshold can be 1000. Thoughts? The attached is the benchmark script. regards. -- Kyotaro Horiguchi NTT Open Source Software Center #! /usr/bin/perl use strict; use IPC::Open2; use Time::HiRes qw (gettimeofday tv_interval); my $tupperpage = 226; my @time = (); sub bench { my ($header, $nprocs, $ntups, $threshold) = @_; my @result = (); my @rds = (); for (my $ip = 0 ; $ip < $nprocs ; $ip++) { pipe(my $rd, my $wr); $rds[$ip] = $rd; my $pid = fork(); die "fork failed: $!\n" if ($pid < 0); if ($pid == 0) { close($rd); my $pid = open2(my $psqlrd, my $psqlwr, "psql postgres"); print $psqlwr "SET wal_skip_threshold to $threshold;\n"; print $psqlwr "DROP TABLE IF EXISTS t$ip;"; print $psqlwr "CREATE TABLE t$ip (a int);\n"; my @st = gettimeofday(); for (my $i = 0 ; $i < 10 ; $i++) { print $psqlwr "BEGIN;"; print $psqlwr "TRUNCATE t$ip;"; print $psqlwr "INSERT INTO t$ip (SELECT a FROM generate_series(1, $ntups) a);"; print $psqlwr "COMMIT;"; } close($psqlwr); waitpid($pid, 0); print $wr $ip, " ", 1000 * tv_interval(\@st, [gettimeofday]), "\n"; exit; } close($wr); } my $rpid; while (($rpid = wait()) == 0) {} my $sum = 0; for (my $ip = 0 ; $ip < $nprocs ; $ip++) { my $ret = readline($rds[$ip]); die "format? $ret\n" if ($ret !~ /^([0-9]+) ([0-9.]+)$/); $sum += $2; } printf "$header: procs $nprocs: time %.0f\n", $sum / $nprocs; } sub log10 { return log($_[0]) / log(10); } # benchmark for wal_level = replica, the third parameter of bench # doesn't affect sub bench1 { print "benchmark for wal_level = replica\n"; for (my $s = 0 ; $s <= 4 ; $s += 0.25) { my $ss = int(10 ** $s); bench("size $ss", 1, $ss * $tupperpage, $ss * 8); } } # benchmark for wal_level = minimal. sub bench2 { print "benchmark for wal_level = minimal\n"; for (my $s = 0 ; $s <= 3.5 ; $s += 0.25) { my $ss = int(10 ** $s); bench("size $ss: SYNC ", 1, $ss * $tupperpage, $ss * 8); bench("size $ss: WAL ", 1, $ss * $tupperpage, ($ss + 1) * 8); } } # find crossing point of WAL and SYNC by bisecting sub bench3 { print "find crossing point of WAL and SYNC by bisecting\n"; bench("SYNC: size 0", 1, 1, 8); bench("WAL : size 0", 1, 1, 16); my $s = 1; my $st = 10000; while (1) { my $ts = bench("SYNC: size $s", $tupperpage * $s, $s * 8); my $tw = bench("WAL : size $s", $tupperpage * $s, ($s + 1) * 8); if ($st < 1.0){ print "DONE\n"; exit(0); } if ($ts > $tw) { $s += $st; $st /= 2; } else { $s -= $st; $st /= 2; } } } # benchmark with multiple processes sub bench4 { my $nprocs = $ARGV[1]; print "benchmark for wal_level = minimal, $nprocs processes\n"; for (my $s = 1.0 ; $s <= 3.5 ; $s += 0.25) { my $ss = int(10 ** $s); bench("size $ss: SYNC ", $nprocs, $ss * $tupperpage, $ss * 8); bench("size $ss: WAL ", $nprocs, $ss * $tupperpage, ($ss + 1) * 8); } } bench1() if ($ARGV[0] == 1); bench2() if ($ARGV[0] == 2); bench3() if ($ARGV[0] == 3); bench4() if ($ARGV[0] == 4);
pgsql-hackers by date: