RE: [PATCH] Speedup truncates of relation forks - Mailing list pgsql-hackers
From | Jamison, Kirk |
---|---|
Subject | RE: [PATCH] Speedup truncates of relation forks |
Date | |
Msg-id | D09B13F772D2274BB348A310EE3027C6502672@g01jpexmbkw24 Whole thread Raw |
In response to | Re: [PATCH] Speedup truncates of relation forks (Adrien Nayrat <adrien.nayrat@anayrat.info>) |
Responses |
Re: [PATCH] Speedup truncates of relation forks
|
List | pgsql-hackers |
On Wednesday, June 26, 2019 6:10 PM(GMT+9), Adrien Nayrat wrote: > As far as I remember, you should see "relation" wait events (type lock) on > standby server. This is due to startup process acquiring AccessExclusiveLock > for the truncation and other backend waiting to acquire a lock to read the > table. Hi Adrien, thank you for taking time to reply. I understand that RelationTruncate() can block read-only queries on standby during redo. However, it's difficult for me to reproduce the test case where I need to catch that wait for relation lock, because one has to execute SELECT within the few milliseconds of redoing the truncation of one table. Instead, I just measured the whole recovery time, smgr_redo(), to show the recovery improvement compared to head. Please refer below. [Recovery Test] I used the same stored functions and configurations in the previous email & created "test" db. $ createdb test $ psql -d test 1. [Primary] Create 10,000 relations. test=# SELECT create_tables(10000); 2. [P] Insert one row in each table. test=# SELECT insert_tables(10000); 3. [P] Delete row of each table. test=# SELECT delfrom_tables(10000); 4. [Standby] WAL application is stopped at Standby server. test=# SELECT pg_wal_replay_pause(); 5. [P] VACUUM is executed at Primary side, and measure its execution time. test=# \timing on test=# VACUUM; Alternatively, you may use: $ time psql -d test -c 'VACUUM;' (Note: WAL has not replayed on standby because it's been paused.) 6. [P] Wait until VACUUM has finished execution. Then, stop primary server. test=# pg_ctl stop -w 7. [S] Resume WAL replay, then promote standby (failover). I used a shell script to execute recovery & promote standby server because it's kinda difficult to measure recovery time. Please refer to the script below. - "SELECT pg_wal_replay_resume();" is executed and the WAL application is resumed. - "pg_ctl promote" to promote standby. - The time difference of "select pg_is_in_recovery();" from "t" to "f" is measured. shell script: PGDT=/path_to_storage_directory/ if [ "$1" = "resume" ]; then psql -c "SELECT pg_wal_replay_resume();" test date +%Y/%m/%d_%H:%M:%S.%3N pg_ctl promote -D ${PGDT} set +x date +%Y/%m/%d_%H:%M:%S.%3N while [ 1 ] do RS=`psql -Atc "select pg_is_in_recovery();" test` if [ ${RS} = "f" ]; then break fi done date +%Y/%m/%d_%H:%M:%S.%3N set -x exit 0 fi [Test Results] shared_buffers = 24GB 1. HEAD (wal replay resumed) 2019/07/01_08:48:50.326 server promoted 2019/07/01_08:49:50.482 2019/07/01_09:02:41.051 Recovery Time: 13 min 50.725 s -> Time difference from WAL replay to complete recovery 12 min 50.569 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f" 2. PATCH (wal replay resumed) 2019/07/01_07:34:26.766 server promoted 2019/07/01_07:34:57.790 2019/07/01_07:34:57.809 Recovery Time: 31.043 s -> Time difference from WAL replay to complete recovery 00.019 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f" [Conclusion] The recovery time significantly improved compared to head from 13 minutes to 30 seconds. Any thoughts? I'd really appreciate your comments/feedback about the patch and/or test. Regards, Kirk Jamison
pgsql-hackers by date: