Re: [PATCH] Speedup truncates of relation forks - Mailing list pgsql-hackers

From Adrien Nayrat
Subject Re: [PATCH] Speedup truncates of relation forks
Date
Msg-id 2950c507-7487-55e4-574a-02b215a1abf2@anayrat.info
Whole thread Raw
In response to RE: [PATCH] Speedup truncates of relation forks  ("Jamison, Kirk" <k.jamison@jp.fujitsu.com>)
List pgsql-hackers
On 7/1/19 12:55 PM, Jamison, Kirk wrote:
> On Wednesday, June 26, 2019 6:10 PM(GMT+9), Adrien Nayrat wrote:
>> As far as I remember, you should see "relation" wait events (type lock) on
>> standby server. This is due to startup process acquiring AccessExclusiveLock
>> for the truncation and other backend waiting to acquire a lock to read the
>> table.
>
> Hi Adrien, thank you for taking time to reply.
>
> I understand that RelationTruncate() can block read-only queries on
> standby during redo. However, it's difficult for me to reproduce the
> test case where I need to catch that wait for relation lock, because
> one has to execute SELECT within the few milliseconds of redoing the
> truncation of one table.

Yes, that why your test by measuring vacuum execution time is better as it is
more reproductible.

>
> Instead, I just measured the whole recovery time, smgr_redo(),
> to show the recovery improvement compared to head. Please refer below.
>
> [Recovery Test]
> I used the same stored functions and configurations in the previous email
> & created "test" db.
>
> $ createdb test
> $ psql -d test
>
> 1. [Primary] Create 10,000 relations.
>     test=# SELECT create_tables(10000);
>
> 2. [P] Insert one row in each table.
>     test=# SELECT insert_tables(10000);
>
> 3. [P] Delete row of each table.
>     test=# SELECT delfrom_tables(10000);
>
> 4. [Standby] WAL application is stopped at Standby server.
>     test=# SELECT pg_wal_replay_pause();
>
> 5. [P] VACUUM is executed at Primary side, and measure its execution time.
>     test=# \timing on
>     test=# VACUUM;
>
>     Alternatively, you may use:
>     $ time psql -d test -c 'VACUUM;'
>     (Note: WAL has not replayed on standby because it's been paused.)
>
> 6. [P] Wait until VACUUM has finished execution. Then, stop primary server.
>     test=# pg_ctl stop -w
>
> 7. [S] Resume WAL replay, then promote standby (failover).
> I used a shell script to execute recovery & promote standby server
> because it's kinda difficult to measure recovery time. Please refer to the script below.
> - "SELECT pg_wal_replay_resume();" is executed and the WAL application is resumed.
> - "pg_ctl promote" to promote standby.
> - The time difference of "select pg_is_in_recovery();" from "t" to "f" is measured.
>
> shell script:
>
> PGDT=/path_to_storage_directory/
>
> if [ "$1" = "resume" ]; then
>     psql -c "SELECT pg_wal_replay_resume();" test
>     date +%Y/%m/%d_%H:%M:%S.%3N
>     pg_ctl promote -D ${PGDT}
>     set +x
>     date +%Y/%m/%d_%H:%M:%S.%3N
>     while [ 1 ]
>     do
>         RS=`psql -Atc "select pg_is_in_recovery();" test`
>         if [ ${RS} = "f" ]; then
>             break
>         fi
>     done
>     date +%Y/%m/%d_%H:%M:%S.%3N
>     set -x
>     exit 0
> fi
>
>
> [Test Results]
> shared_buffers = 24GB
>
> 1. HEAD
> (wal replay resumed)
> 2019/07/01_08:48:50.326
> server promoted
> 2019/07/01_08:49:50.482
> 2019/07/01_09:02:41.051
>
>  Recovery Time:
>  13 min 50.725 s -> Time difference from WAL replay to complete recovery
>  12 min 50.569 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f"
>
> 2. PATCH
> (wal replay resumed)
> 2019/07/01_07:34:26.766
> server promoted
> 2019/07/01_07:34:57.790
> 2019/07/01_07:34:57.809
>
>  Recovery Time:
>  31.043 s -> Time difference from WAL replay to complete recovery
>  00.019 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f"
>
> [Conclusion]
> The recovery time significantly improved compared to head
> from 13 minutes to 30 seconds.
>
> Any thoughts?
> I'd really appreciate your comments/feedback about the patch and/or test.
>
>

Thanks for the time you spend on this test, it is a huge win!
Although creating 10k tables and deleting tuples is not a common use case, it is
still good to know how your patch performs.
I will try to look deeper in your patch, but my knowledge on postgres internal
are limited :)

--
Adrien



Attachment

pgsql-hackers by date:

Previous
From: Nikolay Shaplov
Date:
Subject: Re: [PATCH][PROPOSAL] Add enum releation option type
Next
From: Prabhat Sahu
Date:
Subject: Re: Attached partition not considering altered column properties ofroot partition.