Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables - Mailing list pgsql-hackers

From feichanghong
Subject Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables
Date
Msg-id tencent_427567C207B10908071B828A92D8FB638606@qq.com
In response to Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables  (wenhui qiu <qiuwenhuifx@gmail.com>)
List pgsql-hackers

Hi wenhui,


I carefully analyzed the reason for the performance regression with fewer
temporary tables in the previous patch (v1-0002-): the k_hash_funcs chosen by
the bloom_create function was 10 (MAX_HASH_FUNCS), which led to excessive
computation overhead for the bloom filter.


Based on the bloom filter's false positive formula, for a 1KB bloom filter
with k_hash_funcs = 2, the false positive rate is 0.0006096 when the number
of items is 100, and 0.048929094 when the number of items is 1000. Therefore,
a k_hash_funcs of 2 already achieves a decent false positive rate while
effectively reducing the computational overhead of the bloom filter.
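
For reference, these figures follow from the standard bloom filter
approximation p = (1 - e^(-k*n/m))^k, where m is the filter size in bits,
k the number of hash functions, and n the number of items. The standalone
snippet below is only my own sketch of that calculation, not code from the
patch; small differences from the exact numbers quoted above are expected,
depending on how the filter size is derived and rounded.

/*
 * Sketch only (not from the patch): estimate a bloom filter's false
 * positive rate with p = (1 - e^(-k*n/m))^k.
 * Build with: cc bloom_fp.c -o bloom_fp -lm
 */
#include <math.h>
#include <stdio.h>

static double
bloom_fp_rate(double m_bits, double k_hash_funcs, double n_items)
{
    return pow(1.0 - exp(-k_hash_funcs * n_items / m_bits), k_hash_funcs);
}

int
main(void)
{
    double      m_bits = 1024 * 8;      /* 1KB filter */

    printf("n=100,  k=2: %.7f\n", bloom_fp_rate(m_bits, 2, 100));
    printf("n=1000, k=2: %.7f\n", bloom_fp_rate(m_bits, 2, 1000));
    return 0;
}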


I have re-implemented a bloom_create_v2 function to create a bloom filter
with a specified number of hash functions and a specified memory size.
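
Since the patch itself is in the attachment, the declaration below is only a
hypothetical illustration of such an interface. The existing bloom_create()
in src/backend/lib/bloomfilter.c takes total_elems, bloom_work_mem (in KB)
and a seed, and derives k_hash_funcs internally (capped at MAX_HASH_FUNCS,
i.e. 10); here the caller would also fix the number of hash functions:

/*
 * Hypothetical signature, for illustration only -- see the attached
 * v1-0003 patch for the real definition.  The caller specifies both the
 * filter size and the number of hash functions instead of having
 * k_hash_funcs derived from total_elems.
 */
extern bloom_filter *bloom_create_v2(int64 total_elems, int bloom_work_mem,
                                     uint64 seed, int k_hash_funcs);

/* e.g. a 1KB filter with two hash functions (illustrative call only): */
/* filter = bloom_create_v2(nitems, 1, 0, 2); */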


From the test data below, it can be seen that the new patch in the attachment
(v1-0003-) does not lead to a performance regression in any scenario.
Furthermore, the default threshold can be lowered to 2.


Here is the TPS data for different numbers of temporary tables under
different thresholds, compared with head (commit 98347b5a). The testing tool
is pgbench, and the workload is an insert into one temporary table (when the
number of temporary tables is 0, the workload is SELECT 1):


|table num     |0           |1           |2          |5          |10         |100        |1000       |
|--------------|------------|------------|-----------|-----------|-----------|-----------|-----------|
|head(98347b5a)|39912.722209|10064.306268|9183.871298|7452.071689|5641.487369|1073.203851|114.530958 |
|threshold-2   |40097.047974|10009.598155|9982.172866|9955.651235|9999.338901|9785.626296|8278.828828|


Here is the TPS data for different numbers of temporary tables at a threshold
of 2, compared with head (commit 98347b5a). The testing tool is pgbench, and
the workload is an insert into each of the temporary tables:


|table num     |1          |2          |5          |10         |100       |1000     |
|--------------|-----------|-----------|-----------|-----------|----------|---------|
|head(98347b5a)|7243.945042|5734.545012|3627.290594|2262.594766|297.856756|27.745808|
|threshold-2   |7289.171381|5740.849676|3626.135510|2207.439931|293.145036|27.020953|



I have previously researched the implementation of the Global Temp Table
(GTT) you mentioned, and it has been used in Alibaba Cloud's PolarDB (link
[1]). GTT can prevent truncation of temporary tables that have not been
accessed by the current session (those not in the OnCommitItem list), but
GTTs that have been accessed by the current session still need to be
truncated at commit time. Therefore, GTTs also require the optimizations
mentioned in the above patch.



[1] https://www.alibabacloud.com/help/en/polardb/polardb-for-oracle/using-global-temporary-tables?spm=a3c0i.23458820.2359477120.1.66e16e9bUpV7cK


Best Regards,
Fei Changhong
