Fix for visibility check on 14.5 fails on tpcc with high concurrency - Mailing list pgsql-hackers

From Dimos Stamatakis
Subject Fix for visibility check on 14.5 fails on tpcc with high concurrency
Msg-id CO2PR0801MB2310579F65529380A4E5EDC0E20A9@CO2PR0801MB2310.namprd08.prod.outlook.com
Responses Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency
List pgsql-hackers

Hi hackers,

When running tpcc on sysbench with high concurrency (96 threads, scale factor 5), we found that a fix for the visibility check (introduced in PostgreSQL 14.5) causes sysbench to fail in 1 out of 70 runs.

The error is the following:

SQL error, errno = 0, state = 'XX000': new multixact has more than one updating member

It is raised by the following statement:

UPDATE warehouse1
          SET w_ytd = w_ytd + 234
          WHERE w_id = 3;
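For context on the error: PostgreSQL raises it from a sanity check in its multixact code (multixact.c) when a new MultiXactId is created. A multixact records all transactions that hold a lock on the same tuple, and at most one of those members may be an actual updater (a no-key update, or a full update/delete); pure row lockers can be any number. The Python sketch below is only a simplified model of that invariant, not PostgreSQL code: the class and function names are hypothetical, and only the ordering of the lock modes follows the C enum MultiXactStatus. Since the UPDATE above changes only w_ytd, a non-key column, it would presumably be recorded as a no-key update.

```python
from dataclasses import dataclass
from enum import IntEnum


class MultiXactStatus(IntEnum):
    """Member lock modes, ordered like PostgreSQL's MultiXactStatus enum."""
    FOR_KEY_SHARE = 0
    FOR_SHARE = 1
    FOR_NO_KEY_UPDATE = 2
    FOR_UPDATE = 3
    NO_KEY_UPDATE = 4  # an UPDATE that does not modify key columns
    UPDATE = 5         # other UPDATEs, and DELETE


def is_update(status: MultiXactStatus) -> bool:
    # Modeled on ISUPDATE_from_mxstatus(): only the two modifying modes count.
    return status > MultiXactStatus.FOR_UPDATE


@dataclass(frozen=True)
class Member:
    xid: int
    status: MultiXactStatus


def check_new_multixact(members: list[Member]) -> None:
    """Hypothetical model of the check: allow at most one updating member."""
    has_update = False
    for m in members:
        if is_update(m.status):
            if has_update:
                raise RuntimeError("new multixact has more than one updating member")
            has_update = True
```

One updater plus any number of lockers passes the check; two NO_KEY_UPDATE members (what two concurrent executions of the UPDATE above would look like if both ended up recorded as updaters of the same tuple) make it fail.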

The commit that fixes the visibility check is the following:

https://github.com/postgres/postgres/commit/e24615a0057a9932904317576cf5c4d42349b363

After reverting this commit, tpcc no longer fails, which indicates that this change introduces the problem.

Steps to reproduce:

1. Install sysbench

  https://github.com/akopytov/sysbench

2. Install Percona's sysbench-tpcc scripts

  https://github.com/Percona-Lab/sysbench-tpcc

3. Load the data with sysbench-tpcc (prepare)

  # sysbench-tpcc/tpcc.lua --pgsql-host=localhost --pgsql-port=5432 --pgsql-user={USER} --pgsql-password={PASSWORD} --pgsql-db=test_database --db-driver=pgsql --tables=1 --threads=96 --scale=5 --time=60 prepare

4. Run the benchmark with sysbench-tpcc (run)

  # sysbench-tpcc/tpcc.lua --pgsql-host=localhost --pgsql-port=5432 --pgsql-user={USER} --pgsql-password={PASSWORD} --pgsql-db=test_database --db-driver=pgsql --tables=1 --report-interval=1 --rand-seed=1 --threads=96 --scale=5 --time=60 run
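Because the failure shows up in only 1 out of 70 runs, it can help to repeat step 4 in a loop and count the runs that hit the error. Below is a minimal Python sketch, assuming the data has already been loaded with step 3, that sysbench-tpcc/tpcc.lua is executable from the current directory as in the commands above, and that {USER} and {PASSWORD} are replaced with real credentials. The names RUN_CMD, run_failed and repeat_runs are hypothetical helpers, and whether to repeat the prepare step between runs is left open. A failed run is detected by searching the output for the error text rather than by relying on the exit status.

```python
import subprocess

# Error text reported by sysbench when the UPDATE fails (taken from the report).
ERROR_MARKER = "new multixact has more than one updating member"

# Step 4 as an argument list; {USER} and {PASSWORD} are placeholders to replace.
RUN_CMD = [
    "sysbench-tpcc/tpcc.lua",
    "--pgsql-host=localhost",
    "--pgsql-port=5432",
    "--pgsql-user={USER}",
    "--pgsql-password={PASSWORD}",
    "--pgsql-db=test_database",
    "--db-driver=pgsql",
    "--tables=1",
    "--report-interval=1",
    "--rand-seed=1",
    "--threads=96",
    "--scale=5",
    "--time=60",
    "run",
]


def run_failed(output: str) -> bool:
    """Return True if the output of one benchmark run contains the multixact error."""
    return ERROR_MARKER in output


def repeat_runs(cmd: list[str], runs: int) -> int:
    """Execute cmd `runs` times and return how many runs hit the error."""
    failures = 0
    for i in range(runs):
        # Capture both streams, since the error may be printed to either one.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if run_failed(result.stdout + result.stderr):
            failures += 1
            print(f"run {i + 1}/{runs}: hit the multixact error")
    return failures
```

For example, repeat_runs(RUN_CMD, 70) performs 70 runs and returns the number of failed ones.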

We tested on a machine with 2 NUMA nodes, 16 physical cores per node and 2 hardware threads per core, i.e. 64 hardware threads in total, and 376 GB of memory.

Attached please find the configuration file we used (postgresql.conf).

This commit was supposed to fix a race condition in the visibility check. Please let us know whether you are aware of this issue and whether there is a quick fix.

Any input is highly appreciated.

Thanks,

Dimos

[ServiceNow]
