Hi hackers,
When running tpcc on sysbench with high concurrency (96 threads, scale factor 5), we found that a fix for the visibility check (introduced in PG 14.5) causes sysbench to fail in 1 out of 70 runs.
The error is the following:
SQL error, errno = 0, state = 'XX000': new multixact has more than one updating member
And it is caused by the following statement:
UPDATE warehouse1
SET w_ytd = w_ytd + 234
WHERE w_id = 3;
The commit that fixes the visibility check is the following:
https://github.com/postgres/postgres/commit/e24615a0057a9932904317576cf5c4d42349b363
After reverting this commit, tpcc no longer fails, which strongly suggests that this change introduces the problem.
Steps to reproduce:
1. Install sysbench
https://github.com/akopytov/sysbench
2. Install Percona sysbench-tpcc
https://github.com/Percona-Lab/sysbench-tpcc
3. Run the Percona sysbench-tpcc prepare step
# sysbench-tpcc/tpcc.lua --pgsql-host=localhost --pgsql-port=5432 --pgsql-user={USER} --pgsql-password={PASSWORD} --pgsql-db=test_database --db-driver=pgsql --tables=1 --threads=96 --scale=5 --time=60 prepare
4. Run the Percona sysbench-tpcc run step
# sysbench-tpcc/tpcc.lua --pgsql-host=localhost --pgsql-port=5432 --pgsql-user={USER} --pgsql-password={PASSWORD} --pgsql-db=test_database --db-driver=pgsql --tables=1 --report-interval=1 --rand-seed=1 --threads=96 --scale=5 --time=60 run
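Since the error only shows up in about 1 out of 70 runs, step 4 usually has to be repeated many times before it appears. The following is a minimal sketch of a wrapper that repeats a command until the multixact error shows up in its output. The function name run_until_error, the LOG_DIR variable, and the log file names are hypothetical and only for illustration; this is not necessarily how the runs above were driven.

```shell
#!/bin/sh
# Sketch only: repeat a benchmark command until the multixact error appears.
# run_until_error, LOG_DIR and the log file names are illustrative assumptions.

PATTERN='new multixact has more than one updating member'
LOG_DIR=${LOG_DIR:-.}

# Usage: run_until_error MAX_RUNS COMMAND [ARGS...]
# Returns 0 as soon as one run prints the error, 1 if no run did.
run_until_error() {
    max_runs=$1
    shift
    i=1
    while [ "$i" -le "$max_runs" ]; do
        log="$LOG_DIR/tpcc_run_$i.log"
        # Capture stdout and stderr of this run; the exit code is ignored
        # because only the error text is checked below.
        "$@" > "$log" 2>&1
        if grep -q "$PATTERN" "$log"; then
            echo "error reproduced in run $i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "error not seen in $max_runs runs"
    return 1
}

# Example (commented out so the sketch runs without sysbench installed);
# append the remaining options from step 4 before "run":
# run_until_error 200 sysbench-tpcc/tpcc.lua --db-driver=pgsql --threads=96 --scale=5 --time=60 run
```

Each run's output is kept in its own log file, so the failing run can be inspected afterwards. Whether the prepare step has to be repeated between runs is left open here.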
We tested on a machine with 2 NUMA nodes, 16 physical cores per node, and 2 threads per core, for a total of 64 hardware threads. The total memory is 376GB.
Attached please find the configuration file we used (postgresql.conf).
This commit was supposed to fix a race condition during the visibility check. Please let us know whether you are aware of this issue and if there is a quick fix.
Any input is highly appreciated.
Thanks,
Dimos