[HACKERS] Reducing runtime of stats regression test - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | [HACKERS] Reducing runtime of stats regression test |
Date | |
Msg-id | 17795.1493869423@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: [HACKERS] Reducing runtime of stats regression test
(Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: [HACKERS] Reducing runtime of stats regression test (Noah Misch <noah@leadboat.com>) |
List | pgsql-hackers |
On a reasonably fast development machine, one of the biggest time sinks while running the core regression tests is the long "sleep" calls in the stats.sql regression test. I took a closer look at these, and I think we could basically get rid of them. First up is this bit at the beginning of that test script: -- wait to let any prior tests finish dumping out stats; -- else our messages might get lost due to contention SELECT pg_sleep_for('2 seconds'); The stated concern isn't really all that plausible, since even if we launch a batch of test scripts at once, they don't all finish at once, so there's unlikely to be a big pileup of traffic to the stats collector. But we don't have to take that on faith: in assert-enabled builds, pgstat_send() logs any failure to send a stats message. I have grepped the buildfarm logs for "could not send to statistics collector" log messages during the "make check" stage (a total of 754313 runs dating back ten years). What I find is that members mereswine and gull occasionally report "Network is down", and a few times currawong and thrips have complained "Invalid argument", and there are exactly no other such messages. In particular there are no EAGAIN or EWOULDBLOCK failures that would suggest congestion on the stats collector's input. So this is basically not something that happens at all in the regression tests, let alone during startup of the stats test in particular. Now, another failure mechanism that could conceivably be ameliorated by this initial wait is if one of the immediately preceding runs has performed a scan on tenk2 but doesn't manage to transmit that info before stats.sql creates its initial copy of the stats counts. Then when that count does get sent, it could look like one triggered by stats.sql itself, fooling the test. But that seems rather unlikely, because tenk2 isn't touched by very many test scripts. And even if it did happen that would not cause an observed test failure; at worst it would obscure a failure that we should have detected. I'm doubtful that this is worth losing any sleep over. In short, I think we could just drop the above wait altogether, and be no worse off. The only useful thing it's doing for us is exercising pg_sleep_for(), which is otherwise untested ... but we can transfer that responsibility into wait_for_stats(). The other significant delay in stats.sql is -- force the rate-limiting logic in pgstat_report_stat() to time out -- and send a message SELECT pg_sleep(1.0); Now, we do seem to need a delay there, because the rate-limiting logic is unlikely to have permitted the count from the immediately preceding statement to have gotten sent right then, and the count won't get sent at all while we're inside wait_for_stats (since backends only send stats just before going idle). But there's more than one way to skin this cat. We can just start a new connection with \c, and let wait_for_stats wait for the old one to send its stats before quitting. Even on my oldest and slowest buildfarm machines, starting a new session takes well under one second. In short then, I propose the attached patch, which reduces the runtime of stats.sql by a shade under 3 seconds. Given that the runtime of "make installcheck-parallel" is circa 15-17 seconds on typical current hardware, that's a nice gain. I'm not sure about backpatching. I think developers mostly only care about regression tests on HEAD, and the savings is relatively less exciting on the buildfarm, since it doesn't get any bigger on slower machines. Thoughts? regards, tom lane diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out index a811265..91c061b 100644 *** a/src/test/regress/expected/stats.out --- b/src/test/regress/expected/stats.out *************** SET enable_seqscan TO on; *** 16,31 **** SET enable_indexscan TO on; -- for the moment, we don't want index-only scans here SET enable_indexonlyscan TO off; - -- wait to let any prior tests finish dumping out stats; - -- else our messages might get lost due to contention - SELECT pg_sleep_for('2 seconds'); - pg_sleep_for - -------------- - - (1 row) - -- save counters ! CREATE TEMP TABLE prevstats AS SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch, (b.heap_blks_read + b.heap_blks_hit) AS heap_blks, (b.idx_blks_read + b.idx_blks_hit) AS idx_blks, --- 16,23 ---- SET enable_indexscan TO on; -- for the moment, we don't want index-only scans here SET enable_indexonlyscan TO off; -- save counters ! CREATE TABLE prevstats AS SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch, (b.heap_blks_read + b.heap_blks_hit) AS heap_blks, (b.idx_blks_read + b.idx_blks_hit) AS idx_blks, *************** begin *** 66,72 **** exit when updated1 and updated2 and updated3; -- wait a little ! perform pg_sleep(0.1); -- reset stats snapshot so we can test again perform pg_stat_clear_snapshot(); --- 58,64 ---- exit when updated1 and updated2 and updated3; -- wait a little ! perform pg_sleep_for('100 milliseconds'); -- reset stats snapshot so we can test again perform pg_stat_clear_snapshot(); *************** SELECT count(*) FROM tenk2 WHERE unique1 *** 143,156 **** 1 (1 row) ! -- force the rate-limiting logic in pgstat_report_stat() to time out ! -- and send a message ! SELECT pg_sleep(1.0); ! pg_sleep ! ---------- ! ! (1 row) ! -- wait for stats collector to update SELECT wait_for_stats(); wait_for_stats --- 135,147 ---- 1 (1 row) ! -- We can't just call wait_for_stats() at this point, because we only ! -- transmit stats when the session goes idle, and we probably didn't ! -- transmit the last couple of counts yet thanks to the rate-limiting logic ! -- in pgstat_report_stat(). But instead of waiting for the rate limiter's ! -- timeout to elapse, let's just start a new session. The old one will ! -- then send its stats before dying. ! \c - -- wait for stats collector to update SELECT wait_for_stats(); wait_for_stats *************** FROM prevstats AS pr; *** 199,202 **** --- 190,194 ---- (1 row) DROP TABLE trunc_stats_test, trunc_stats_test1, trunc_stats_test2, trunc_stats_test3, trunc_stats_test4; + DROP TABLE prevstats; -- End of Stats Test diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql index b3e2efa..1c21f38 100644 *** a/src/test/regress/sql/stats.sql --- b/src/test/regress/sql/stats.sql *************** SET enable_indexscan TO on; *** 14,25 **** -- for the moment, we don't want index-only scans here SET enable_indexonlyscan TO off; - -- wait to let any prior tests finish dumping out stats; - -- else our messages might get lost due to contention - SELECT pg_sleep_for('2 seconds'); - -- save counters ! CREATE TEMP TABLE prevstats AS SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch, (b.heap_blks_read + b.heap_blks_hit) AS heap_blks, (b.idx_blks_read + b.idx_blks_hit) AS idx_blks, --- 14,21 ---- -- for the moment, we don't want index-only scans here SET enable_indexonlyscan TO off; -- save counters ! CREATE TABLE prevstats AS SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch, (b.heap_blks_read + b.heap_blks_hit) AS heap_blks, (b.idx_blks_read + b.idx_blks_hit) AS idx_blks, *************** begin *** 61,67 **** exit when updated1 and updated2 and updated3; -- wait a little ! perform pg_sleep(0.1); -- reset stats snapshot so we can test again perform pg_stat_clear_snapshot(); --- 57,63 ---- exit when updated1 and updated2 and updated3; -- wait a little ! perform pg_sleep_for('100 milliseconds'); -- reset stats snapshot so we can test again perform pg_stat_clear_snapshot(); *************** SELECT count(*) FROM tenk2; *** 137,145 **** -- do an indexscan SELECT count(*) FROM tenk2 WHERE unique1 = 1; ! -- force the rate-limiting logic in pgstat_report_stat() to time out ! -- and send a message ! SELECT pg_sleep(1.0); -- wait for stats collector to update SELECT wait_for_stats(); --- 133,145 ---- -- do an indexscan SELECT count(*) FROM tenk2 WHERE unique1 = 1; ! -- We can't just call wait_for_stats() at this point, because we only ! -- transmit stats when the session goes idle, and we probably didn't ! -- transmit the last couple of counts yet thanks to the rate-limiting logic ! -- in pgstat_report_stat(). But instead of waiting for the rate limiter's ! -- timeout to elapse, let's just start a new session. The old one will ! -- then send its stats before dying. ! \c - -- wait for stats collector to update SELECT wait_for_stats(); *************** SELECT pr.snap_ts < pg_stat_get_snapshot *** 165,168 **** --- 165,169 ---- FROM prevstats AS pr; DROP TABLE trunc_stats_test, trunc_stats_test1, trunc_stats_test2, trunc_stats_test3, trunc_stats_test4; + DROP TABLE prevstats; -- End of Stats Test -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
pgsql-hackers by date: