Re: [HACKERS] WAL logging problem in 9.4.3? - Mailing list pgsql-hackers
| From | Kyotaro Horiguchi |
|---|---|
| Subject | Re: [HACKERS] WAL logging problem in 9.4.3? |
| Date | |
| Msg-id | 20191120.150546.1050157217400213784.horikyota.ntt@gmail.com Whole thread Raw |
| In response to | Re: [HACKERS] WAL logging problem in 9.4.3? (Noah Misch <noah@leadboat.com>) |
| List | pgsql-hackers |
I'm in the benchmarking week..
Thanks for reviewing!.
At Sun, 17 Nov 2019 20:54:34 -0800, Noah Misch <noah@leadboat.com> wrote in
> On Tue, Nov 05, 2019 at 02:53:35PM -0800, Noah Misch wrote:
> > I started pre-commit editing on 2019-10-28, and comment+README updates have
> > been the largest part of that. I'll check my edits against the things you
> > list here, and I'll share on-list before committing. I've now marked the CF
> > entry Ready for Committer.
I'll look into that soon.
By the way, before finalize this, I'd like to share the result of a
brief benchmarking.
First, I measured the direct effect of WAL skipping.
I measured the time required to do the following sequence for the
COMMIT-FPW-WAL case and COMMIT-fsync case. WAL and heap files are on
non-server spec HDD.
BEGIN;
TRUNCATE t;
INSERT INTO t (SELECT a FROM generate_series(1, n) a);
COMMIT;
REPLICA means the time with wal_level = replica
SYNC means the time with wal_level = minimal and force file sync.
WAL means the time with wal_level = minimal and force commit-WAL.
pages is the number of pages of the table.
(REPLICA comes from run.sh 1, SYNC/WAL comes from run.sh 2)
pages REPLICA SYNC WAL
1: 144 ms 683 ms 217 ms
3: 303 ms 995 ms 385 ms
5: 271 ms 1007 ms 217 ms
10: 157 ms 1043 ms 224 ms
17: 189 ms 1007 ms 193 ms
31: 202 ms 1091 ms 230 ms
56: 265 ms 1175 ms 226 ms
100: 510 ms 1307 ms 270 ms
177: 790 ms 1523 ms 524 ms
316: 1827 ms 1643 ms 719 ms
562: 1904 ms 2109 ms 1148 ms
1000: 3060 ms 2979 ms 2113 ms
1778: 6077 ms 3945 ms 3618 ms
3162: 13038 ms 7078 ms 6734 ms
There was a crossing point around 3000 pages. (bench1() finds that by
bisecting, run.sh 3).
With multiple sessions, the crossing point but does not go so
small.
10 processes (run.pl 4 10) The numbers in parentheses are WAL[n]/WAL[n-1].
pages SYNC WAL
316: 8436 ms 4694 ms
562: 12067 ms 9627 ms (x2.1) # WAL wins
1000: 19154 ms 43262 ms (x4.5) # SYNC wins. WAL's slope becomes steep.
1778: 32495 ms 63863 ms (x1.4)
100 processes (run.pl 4 100)
pages SYNC WAL
10: 13275 ms 1868 ms
17: 15919 ms 4438 ms (x2.3)
31: 17063 ms 6431 ms (x1.5)
56: 23193 ms 14276 ms (x2.2) # WAL wins
100: 35220 ms 67843 ms (x4.8) # SYNC wins. WAL's slope becomes steep.
With 10 pgbench sessions.
pages SYNC WAL
1: 915 ms 301 ms
3: 1634 ms 508 ms
5: 1634 ms 293ms
10: 1671 ms 1043 ms
17: 1600 ms 333 ms
31: 1864 ms 314 ms
56: 1562 ms 448 ms
100: 1538 ms 394 ms
177: 1697 ms 1047 ms
316: 3074 ms 1788 ms
562: 3306 ms 1245 ms
1000: 3440 ms 2182 ms
1778: 5064 ms 6464 ms # WAL's slope becomes steep
3162: 8675 ms 8165 ms
I don't think the result of 100 processes is meaningful, so excluding
the result a candidate for wal_skip_threshold can be 1000.
Thoughts? The attached is the benchmark script.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
#! /usr/bin/perl
use strict;
use IPC::Open2;
use Time::HiRes qw (gettimeofday tv_interval);
my $tupperpage = 226;
my @time = ();
sub bench {
my ($header, $nprocs, $ntups, $threshold) = @_;
my @result = ();
my @rds = ();
for (my $ip = 0 ; $ip < $nprocs ; $ip++)
{
pipe(my $rd, my $wr);
$rds[$ip] = $rd;
my $pid = fork();
die "fork failed: $!\n" if ($pid < 0);
if ($pid == 0)
{
close($rd);
my $pid = open2(my $psqlrd, my $psqlwr, "psql postgres");
print $psqlwr "SET wal_skip_threshold to $threshold;\n";
print $psqlwr "DROP TABLE IF EXISTS t$ip;";
print $psqlwr "CREATE TABLE t$ip (a int);\n";
my @st = gettimeofday();
for (my $i = 0 ; $i < 10 ; $i++)
{
print $psqlwr "BEGIN;";
print $psqlwr "TRUNCATE t$ip;";
print $psqlwr "INSERT INTO t$ip (SELECT a FROM generate_series(1, $ntups) a);";
print $psqlwr "COMMIT;";
}
close($psqlwr);
waitpid($pid, 0);
print $wr $ip, " ", 1000 * tv_interval(\@st, [gettimeofday]), "\n";
exit;
}
close($wr);
}
my $rpid;
while (($rpid = wait()) == 0) {}
my $sum = 0;
for (my $ip = 0 ; $ip < $nprocs ; $ip++)
{
my $ret = readline($rds[$ip]);
die "format? $ret\n" if ($ret !~ /^([0-9]+) ([0-9.]+)$/);
$sum += $2;
}
printf "$header: procs $nprocs: time %.0f\n", $sum / $nprocs;
}
sub log10 { return log($_[0]) / log(10); }
# benchmark for wal_level = replica, the third parameter of bench
# doesn't affect
sub bench1
{
print "benchmark for wal_level = replica\n";
for (my $s = 0 ; $s <= 4 ; $s += 0.25)
{
my $ss = int(10 ** $s);
bench("size $ss", 1, $ss * $tupperpage, $ss * 8);
}
}
# benchmark for wal_level = minimal.
sub bench2
{
print "benchmark for wal_level = minimal\n";
for (my $s = 0 ; $s <= 3.5 ; $s += 0.25)
{
my $ss = int(10 ** $s);
bench("size $ss: SYNC ", 1, $ss * $tupperpage, $ss * 8);
bench("size $ss: WAL ", 1, $ss * $tupperpage, ($ss + 1) * 8);
}
}
# find crossing point of WAL and SYNC by bisecting
sub bench3
{
print "find crossing point of WAL and SYNC by bisecting\n";
bench("SYNC: size 0", 1, 1, 8);
bench("WAL : size 0", 1, 1, 16);
my $s = 1;
my $st = 10000;
while (1)
{
my $ts = bench("SYNC: size $s", $tupperpage * $s, $s * 8);
my $tw = bench("WAL : size $s", $tupperpage * $s, ($s + 1) * 8);
if ($st < 1.0){
print "DONE\n";
exit(0);
}
if ($ts > $tw)
{
$s += $st; $st /= 2;
}
else
{
$s -= $st; $st /= 2;
}
}
}
# benchmark with multiple processes
sub bench4
{
my $nprocs = $ARGV[1];
print "benchmark for wal_level = minimal, $nprocs processes\n";
for (my $s = 1.0 ; $s <= 3.5 ; $s += 0.25)
{
my $ss = int(10 ** $s);
bench("size $ss: SYNC ", $nprocs, $ss * $tupperpage, $ss * 8);
bench("size $ss: WAL ", $nprocs, $ss * $tupperpage, ($ss + 1) * 8);
}
}
bench1() if ($ARGV[0] == 1);
bench2() if ($ARGV[0] == 2);
bench3() if ($ARGV[0] == 3);
bench4() if ($ARGV[0] == 4);
pgsql-hackers by date: