RE: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers
From: k.jamison@fujitsu.com
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist
Msg-id: OSBPR01MB2341B75F355B285199AD1BD3EFD00@OSBPR01MB2341.jpnprd01.prod.outlook.com
In response to: RE: [Patch] Optimize dropping of relation buffers using dlist ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
Responses: RE: [Patch] Optimize dropping of relation buffers using dlist
           Re: [Patch] Optimize dropping of relation buffers using dlist
List: pgsql-hackers
On Wed, January 6, 2021 7:04 PM (JST), I wrote:
> I will resume the test similar to Tang, because she also executed the original
> failover test which I have been doing before.
> To avoid confusion and to check if the results from mine and Tang are
> consistent, I also did the recovery/failover test for VACUUM on single relation,
> which I will send in a separate email after this.

A. Test to find the right THRESHOLD

Below are the procedure and results of the VACUUM recovery performance test on a
single relation. I followed the advice below and applied the supplementary patch
on top of V39: Test-for-threshold.patch
It ensures that we always enter the optimized path, so the relation size itself
serves as the threshold under test.

> > One idea could be to remove "nBlocksToInvalidate <
> > BUF_DROP_FULL_SCAN_THRESHOLD" part of check "if (cached &&
> > nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)" so that it
> > always use optimized path for the tests. Then use the relation size
> > as NBuffers/128, NBuffers/256, NBuffers/512 for different values of
> > shared buffers as 128MB, 1GB, 20GB, 100GB.

Each relation size is NBuffers/XXX, so I used the attached "rel.sh" script to test
relation sizes from NBuffers/512 up to NBuffers/8 for each shared_buffers setting.
I did not go beyond NBuffers/8 because it took too much time, and the results up
to that point were already significant.

[Vacuum Recovery Performance on Single Relation]
1. Set up synchronous streaming replication. I used the configuration written at
   the bottom of this email.
2. [Primary] Create 1 table. (rel.sh create)
3. [Primary] Insert data of NBuffers/XXX size. Make sure to use the correct size
   for the configured shared_buffers by commenting out the right size in the
   "insert" part of the rel.sh script. (rel.sh insert)
4. [Primary] Delete the table contents. (rel.sh delete)
5. [Standby] Optional: to double-check that the DELETE is reflected on the standby,
   run SELECT count(*) FROM tableXXX; and make sure it returns 0.
6. [Standby] Pause WAL replay. (rel.sh pause)
   (This script executes SELECT pg_wal_replay_pause();)
7. [Primary] VACUUM the single relation. (rel.sh vacuum)
8. [Primary] After the vacuum finishes, stop the server. (rel.sh stop)
   (The script executes pg_ctl stop -D $PGDATA -w -mi)
9. [Standby] Resume WAL replay and promote the standby. (rel.sh resume)
   The script prints a timestamp when resuming WAL replay and another timestamp
   when the promotion is done. Compute the time difference.
   A rough sketch of steps 6-9 is shown right after this list.
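For illustration only, here is a minimal sketch of what steps 6-9 boil down to.
The paths, port, and table name are placeholders of mine, not the real values;
the attached rel.sh is the script I actually ran.

#!/bin/bash
# Rough sketch of steps 6-9: pause replay, vacuum on primary, stop primary,
# then time how long resume + promotion takes on the standby.
# All paths, the port, and the table name below are placeholders.
STANDBY_PORT=5433
PRIMARY_PGDATA=/path/to/primary/data
STANDBY_PGDATA=/path/to/standby/data
TABLE=table512

# 6. [Standby] Pause WAL replay before the primary runs VACUUM.
psql -p $STANDBY_PORT -c "SELECT pg_wal_replay_pause();"

# 7. [Primary] VACUUM the single relation.
psql -c "VACUUM $TABLE;"

# 8. [Primary] Stop the primary once the VACUUM has finished.
pg_ctl stop -D $PRIMARY_PGDATA -w -m immediate

# 9. [Standby] Resume WAL replay, promote, and measure the elapsed time.
start=$(date +%s.%N)
psql -p $STANDBY_PORT -c "SELECT pg_wal_replay_resume();"
pg_ctl promote -D $STANDBY_PGDATA -w
end=$(date +%s.%N)
echo "WAL replay + promotion took $(echo "$end - $start" | bc) seconds"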
[Results for VACUUM on single relation]
Average of 5 runs.

1. % REGRESSION
% regression = (patched - master)/master

| rel_size | 128MB  | 1GB    | 20GB   | 100GB    |
|----------|--------|--------|--------|----------|
| NB/512   | 0.000% | 0.000% | 0.000% | -32.680% |
| NB/256   | 0.000% | 0.000% | 0.000% |   0.000% |
| NB/128   | 0.000% | 0.000% | 0.000% | -16.502% |
| NB/64    | 0.000% | 0.000% | 0.000% |  -9.841% |
| NB/32    | 0.000% | 0.000% | 0.000% |  -6.219% |
| NB/16    | 0.000% | 0.000% | 0.000% |   3.323% |
| NB/8     | 0.000% | 0.000% | 0.000% |   8.178% |

For 100GB shared_buffers, we can observe regression beyond NBuffers/32, so we can
conclude that NBuffers/32 is the right threshold. At NBuffers/16 and beyond, the
patched version performs worse than master; in other words, the cost of finding
the buffers to be invalidated becomes higher in the optimized path than in the
traditional full scan. So in the attached V39 patches, I have updated the
threshold BUF_DROP_FULL_SCAN_THRESHOLD to NBuffers/32.

2. [PATCHED]
Units: Seconds

| rel_size | 128MB | 1GB   | 20GB  | 100GB |
|----------|-------|-------|-------|-------|
| NB/512   | 0.106 | 0.106 | 0.106 | 0.206 |
| NB/256   | 0.106 | 0.106 | 0.106 | 0.306 |
| NB/128   | 0.106 | 0.106 | 0.206 | 0.506 |
| NB/64    | 0.106 | 0.106 | 0.306 | 0.907 |
| NB/32    | 0.106 | 0.106 | 0.406 | 1.508 |
| NB/16    | 0.106 | 0.106 | 0.706 | 3.109 |
| NB/8     | 0.106 | 0.106 | 1.307 | 6.614 |

3. MASTER
Units: Seconds

| rel_size | 128MB | 1GB   | 20GB  | 100GB |
|----------|-------|-------|-------|-------|
| NB/512   | 0.106 | 0.106 | 0.106 | 0.306 |
| NB/256   | 0.106 | 0.106 | 0.106 | 0.306 |
| NB/128   | 0.106 | 0.106 | 0.206 | 0.606 |
| NB/64    | 0.106 | 0.106 | 0.306 | 1.006 |
| NB/32    | 0.106 | 0.106 | 0.406 | 1.608 |
| NB/16    | 0.106 | 0.106 | 0.706 | 3.009 |
| NB/8     | 0.106 | 0.106 | 1.307 | 6.114 |

I used the following configuration:

[postgresql.conf]
shared_buffers = 100GB    # also tested with 20GB, 1GB, 128MB
autovacuum = off
full_page_writes = off
checkpoint_timeout = 30min
max_locks_per_transaction = 10000
max_wal_size = 20GB
# For streaming replication from primary. Don't uncomment on Standby.
synchronous_commit = remote_write
synchronous_standby_names = 'walreceiver'
# For Standby. Don't uncomment on Primary.
#hot_standby = on
#primary_conninfo = 'host=... user=... port=... application_name=walreceiver'

----------

B. Regression Test using the NBuffers/32 Threshold (V39 Patches)

For this one, we do NOT need the supplementary Test-for-threshold.patch; apply
only the V39 patches. Instead of the "rel.sh" test script, please use the
attached "test.sh". Similar to the tests I did before for 1000 relations, I
executed the recovery performance test, now with the NBuffers/32 threshold. The
postgresql.conf settings are the same as in the test above. Each relation has
1 block (8kB); 1000 relations in total.

The test procedure is almost the same as in A, so I'll just summarize it
(a rough sketch of the per-relation commands follows the result tables below):
1. Set up synchronous streaming replication and the config settings.
2. [Primary] test.sh create (the test.sh script creates 1000 tables)
3. [Primary] test.sh insert
4. [Primary] test.sh delete (skip steps 4-5 for the TRUNCATE test)
5. [Standby] Optional for the VACUUM test: to double-check that the DELETE is
   reflected on the standby, run SELECT count(*) FROM tableXXX; and make sure
   it returns 0.
6. [Standby] test.sh pause
7. [Primary] "test.sh vacuum" for the VACUUM test,
   "test.sh truncate" for the TRUNCATE test
8. [Primary] When #7 is done, test.sh stop
9. [Standby] When the primary is fully stopped, run "test.sh resume".
   Compute the time difference.

[Results for VACUUM Recovery Performance for 1000 relations]
Unit: seconds. Average of 5 executions.
% regression = (patched - master)/master

| s_b    | Master | Patched | %reg    |
|--------|--------|---------|---------|
| 128 MB |  0.306 |   0.306 |   0.00% |
| 1 GB   |  0.506 |   0.306 | -39.53% |
| 20 GB  | 14.522 |   0.306 | -97.89% |
| 100 GB | 66.564 |   0.306 | -99.54% |

[Results for TRUNCATE Recovery Performance for 1000 relations]
Unit: seconds. Average of 5 executions.
% regression = (patched - master)/master

| s_b    | Master | Patched | %reg    |
|--------|--------|---------|---------|
| 128 MB |  0.206 |   0.206 |   0.00% |
| 1 GB   |  0.506 |   0.206 | -59.29% |
| 20 GB  | 16.476 |   0.206 | -98.75% |
| 100 GB | 88.261 |   0.206 | -99.77% |

The patched results are constant across all shared_buffers settings for both
TRUNCATE and VACUUM, which means we gain huge performance benefits with the
patch. The performance benefits have been tested extensively, so there is no
question about that.
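To make the 1000-relation workload concrete, here is a rough sketch of what a
test.sh-style driver could look like. The table names, columns, and psql
defaults are placeholders of mine; the attached test.sh is what I actually ran,
and its pause/stop/resume steps work like the rel.sh sketch shown earlier.

#!/bin/bash
# Sketch of the per-relation workload in B: 1000 tables, each kept at a single
# 8kB block. Everything here (table names, columns, connection defaults) is a
# placeholder; see the attached test.sh for the real script.
NUM_TABLES=1000

case "$1" in
  create)
    for i in $(seq 1 $NUM_TABLES); do
      psql -c "CREATE TABLE table$i (id int);"
    done ;;
  insert)
    for i in $(seq 1 $NUM_TABLES); do
      psql -c "INSERT INTO table$i VALUES (1);"   # one row keeps the table at one block
    done ;;
  delete)
    for i in $(seq 1 $NUM_TABLES); do
      psql -c "DELETE FROM table$i;"
    done ;;
  vacuum)
    for i in $(seq 1 $NUM_TABLES); do
      psql -c "VACUUM table$i;"
    done ;;
  truncate)
    for i in $(seq 1 $NUM_TABLES); do
      psql -c "TRUNCATE table$i;"
    done ;;
esac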
So I think the final decision on the threshold value should come once the
results are confirmed to be consistent with other people's tests. For now, based
on my test results, NBuffers/32 is the threshold I concluded on, and it is
already set in the attached V39 patch set.

[Specs Used]
Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz, 8 CPUs
256GB memory
XFS, RHEL 7.2
Latest PostgreSQL (HEAD)

Feedback is definitely welcome. If you want to test, I have already described
the detailed steps above, including the scripts I used. Have fun testing!

Regards,
Kirk Jamison