Thread: pgbench client-side performance issue on large scripts
Hi,

On large scripts, pgbench happens to consume a lot of CPU time.
For instance, with a script consisting of 50000 "SELECT 1;"
I see "pgbench -f 50k-select.sql" taking about 5.8 secs of CPU time,
out of a total time of 6.7 secs. When run with perf, this profile shows up:

  81,10%  pgbench  pgbench        [.] expr_scanner_get_lineno
   0,36%  pgbench  [unknown]      [k] 0xffffffffac90410b
   0,33%  pgbench  [unknown]      [k] 0xffffffffac904109
   0,23%  pgbench  libpq.so.5.18  [.] pqParseInput3
   0,21%  pgbench  [unknown]      [k] 0xffffffffac800080
   0,17%  pgbench  pgbench        [.] advanceConnectionState
   0,15%  pgbench  [unknown]      [k] 0xffffffffac904104
   0,14%  pgbench  libpq.so.5.18  [.] PQmakeEmptyPGresult

In ParseScript(), expr_scanner_get_lineno() is called for each line
with its current offset, and it scans the script from the beginning
up to the current line. I think that on the whole, parsing this script
ends up looking at (N*(N+1))/2 lines, which is about 1.25 billion lines
if N=50000.

Since it only needs the current line number in case of certain errors
with \gset, I've made the trivial attached fix, which calls
expr_scanner_get_lineno() only in these cases.
This moves the CPU consumption to a more reasonable 0.1s in the
above test case (with the drawback of the reported line number
pointing one line too far).

However, there is another caller, process_backslash_command(),
which is not amenable to the same kind of easy fix. A large script
having backslash commands near the end is also penalized, and it
would be nice to fix that as well.

I wonder whether pgbench should materialize the current line number
in a variable, as psql does in pset.lineno. But given that there are
two different parsers in pgbench, maybe it's not the simplest.
Flex has yylineno but neither pgbench nor psql makes use of it.
I thought I would ask before going further into this, as there might
be a better method that I don't see, being unfamiliar with that code
and with flex/bison.

WDYT?

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
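To make the quadratic cost described above concrete, here is a minimal C sketch. It is not pgbench's code: lineno_by_rescan() and total_rescan_work() are hypothetical names, and the sketch counts bytes examined rather than lines. For a script of N lines of equal width w, the total comes to w * N*(N+1)/2, and N*(N+1)/2 is 1,250,025,000 for N=50000.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical rescan-style lookup (not pgbench's actual code): returns the
 * 1-based line number of buf[offset] by counting newlines from the start of
 * the buffer, and adds the number of bytes examined to *visited.
 */
static int
lineno_by_rescan(const char *buf, size_t offset, unsigned long long *visited)
{
    int lineno = 1;

    assert(buf != NULL && visited != NULL);
    for (size_t i = 0; i < offset; i++)
    {
        if (buf[i] == '\n')
            lineno++;
    }
    *visited += offset;
    return lineno;
}

/*
 * Performs one lookup just past the end of every line, the way a parser
 * asking for the line number once per command would, and returns the total
 * number of bytes examined across all lookups.
 */
static unsigned long long
total_rescan_work(const char *buf)
{
    unsigned long long visited = 0;

    for (size_t i = 0; buf[i] != '\0'; i++)
    {
        if (buf[i] == '\n')
            (void) lineno_by_rescan(buf, i + 1, &visited);
    }
    return visited;
}
```

Each lookup is cheap on its own. It is the repetition from offset zero that makes the total grow with the square of the script length.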
Attachment
"Daniel Verite" <daniel@manitou-mail.org> writes: > On large scripts, pgbench happens to consume a lot of CPU time. > For instance, with a script consisting of 50000 "SELECT 1;" > I see "pgbench -f 50k-select.sql" taking about 5.8 secs of CPU time, > out of a total time of 6.7 secs. When run with perf, this profile shows up: You ran only a single execution of a 50K-line script? This test case feels a little bit artificial. Having said that ... > In ParseScript(), expr_scanner_get_lineno() is called for each line > with its current offset, and it scans the script from the beginning > up to the current line. I think that on the whole, parsing this script > ends up looking at (N*(N+1))/2 lines, which is 1.275 billion lines > if N=50000. ... yes, O(N^2) is not nice. It has to be possible to do better. > I wonder whether pgbench should materialize the current line number > in a variable, as psql does in pset.lineno. But given that there are > two different parsers in pgbench, maybe it's not the simplest. > Flex has yylineno but neither pgbench nor psql make use of it. Yeah, we do rely on yylineno in bootscanner.l and ecpg, but not elsewhere; not sure if there's a performance reason for that. I see that plpgsql has a hand-rolled version (look for cur_line_num) that perhaps could be stolen. regards, tom lane
I wrote:
> Yeah, we do rely on yylineno in bootscanner.l and ecpg, but not
> elsewhere; not sure if there's a performance reason for that.

Ah: the flex manual specifically calls out "%option yylineno" as
something that has a moderate performance cost. So that's why
we don't use it except in non-performance-critical scanners.

Now, it could be argued that pgbench's script scanner doesn't
rise to that level of performance-criticalness, especially not
if we're paying the cost of counting newlines some other way.
I'm not excited about doing a lot of performance analysis here
to decide that. I think we could steal plpgsql's idea to
keep the code structure basically the same while avoiding the
O(N^2) repeat scans, and that should be enough.

regards, tom lane
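For illustration, the general idea of remembering the last line reached, instead of rescanning from the top, might look like the following C sketch. It is only loosely modeled on the approach mentioned above and is not plpgsql's or pgbench's actual code: the LineTracker struct and both function names are hypothetical, and offsets are assumed to lie within the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cached position; all names here are illustrative only. */
typedef struct LineTracker
{
    const char *buf;            /* start of the whole script text */
    const char *cur_line_start; /* start of the line last reported */
    const char *cur_line_end;   /* its terminating '\n', or NULL if none */
    int         cur_line_num;   /* 1-based number of that line */
} LineTracker;

static void
line_tracker_init(LineTracker *lt, const char *buf)
{
    assert(lt != NULL && buf != NULL);
    lt->buf = buf;
    lt->cur_line_start = buf;
    lt->cur_line_end = strchr(buf, '\n');
    lt->cur_line_num = 1;
}

/* Returns the 1-based line number containing buf[offset]. */
static int
line_tracker_lineno(LineTracker *lt, size_t offset)
{
    const char *loc = lt->buf + offset;

    /* Going backwards should be rare; restart from the top in that case. */
    if (loc < lt->cur_line_start)
        line_tracker_init(lt, lt->buf);

    /* Step forward one line at a time until loc falls inside the line. */
    while (lt->cur_line_end != NULL && loc > lt->cur_line_end)
    {
        lt->cur_line_start = lt->cur_line_end + 1;
        lt->cur_line_num++;
        lt->cur_line_end = strchr(lt->cur_line_start, '\n');
    }
    return lt->cur_line_num;
}
```

When lookups arrive in increasing offset order, as they would while parsing a script front to back, each byte is examined at most once overall, so the total work is linear in the script size. A backward lookup still works but pays for a fresh scan from the start.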