Thread: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
Hello, hackers.
While testing my work on (1), I ran into a strange issue involving ON CONFLICT UPDATE and REINDEX CONCURRENTLY.
After some time, I realized the same issue exists on the master branch as well :)
I have prepared two TAP tests to reproduce the issues (2); they are also attached.
The first one does the following:
CREATE UNLOGGED TABLE tbl(i int primary key, updated_at timestamp);
CREATE INDEX idx ON tbl(i, updated_at); -- not required to reproduce the issue, but makes it happen faster
Then it runs the following scripts concurrently with pgbench:
1) INSERT INTO tbl VALUES(13,now()) on conflict(i) do update set updated_at = now();
2) INSERT INTO tbl VALUES(42,now()) on conflict(i) do update set updated_at = now();
3) INSERT INTO tbl VALUES(69,now()) on conflict(i) do update set updated_at = now();
Also, while pgbench is running, the following command runs in a loop:
REINDEX INDEX CONCURRENTLY tbl_pkey;
For some time, everything looks more or less fine (except for livelocks, but those are the subject of the second test).
But after about a minute or so (around the ~3000th REINDEX), it just fails like this:
make -C src/test/modules/test_misc/ check PROVE_TESTS='t/006_*'
# waiting for an about 3000, now is 2174, seconds passed : 84
# waiting for an about 3000, now is 2175, seconds passed : 84
# waiting for an about 3000, now is 2176, seconds passed : 84
# waiting for an about 3000, now is 2177, seconds passed : 84
# waiting for an about 3000, now is 2178, seconds passed : 84
# waiting for an about 3000, now is 2179, seconds passed : 84
# waiting for an about 3000, now is 2180, seconds passed : 84
# waiting for an about 3000, now is 2181, seconds passed : 84
# waiting for an about 3000, now is 2182, seconds passed : 84
# waiting for an about 3000, now is 2183, seconds passed : 84
# waiting for an about 3000, now is 2184, seconds passed : 84
# Failed test 'concurrent INSERTs, UPDATES and RC status (got 2 vs expected 0)'
# at t/006_concurrently_unique_fail.pl line 69.
# Failed test 'concurrent INSERTs, UPDATES and RC stderr /(?^:^$)/'
# at t/006_concurrently_unique_fail.pl line 69.
# 'pgbench: error: pgbench: error: client 4 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# client 15 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 9 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 8 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 3 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 2 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 12 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 18 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: pgbench:client 14 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(13) already exists.
# error: client 1 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 13 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 5 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: Run was aborted; the above results are incomplete.
# '
Probably something is wrong with arbiter index selection for different backends. I am afraid it could be a symptom of a more serious issue.
-------------------------------------
The second test shows an interesting livelock state in a similar situation.
CREATE UNLOGGED TABLE tbl(i int primary key, n int);
CREATE INDEX idx ON tbl(i, n);
INSERT INTO tbl VALUES(13,1);
pgbench concurrently runs a single command:
INSERT INTO tbl VALUES(13,1) on conflict(i) do update set n = tbl.n + EXCLUDED.n;
And reindexing also runs in a loop:
REINDEX INDEX CONCURRENTLY tbl_pkey;
After the start, a somewhat strange issue appears:
make -C src/test/modules/test_misc/ check PROVE_TESTS='t/007_*'
# going to start reindex, num tuples in table is 1
# reindex 0 done in 0.00704598426818848 seconds, num inserted during reindex tuples is 0 speed is 0 per second
# going to start reindex, num tuples in table is 7
# reindex 1 done in 0.453176021575928 seconds, num inserted during reindex tuples is 632 speed is 1394.60158947115 per second
# going to start reindex, num tuples in table is 647
# current n is 808, 808 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 811, 3 per one second
# current n is 917, 106 per one second
# current n is 1024, 107 per one second
# reindex 2 done in 8.4104950428009 seconds, num inserted during reindex tuples is 467 speed is 55.5258635340064 per second
# going to start reindex, num tuples in table is 1136
# current n is 1257, 233 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1490, 233 per one second
# reindex 3 done in 5.21368479728699 seconds, num inserted during reindex tuples is 411 speed is 78.8310026363446 per second
# going to start reindex, num tuples in table is 1566
At some moments, all CPUs are hot, but 30 connections are unable to perform any upsert. I think it may somehow be caused by the two arbiter indexes (the old one and the newly reindexed one).
Best regards,
Mikhail.
[1]: https://www.postgresql.org/message-id/flat/CANtu0ogBOtd9ravu1CUbuZWgq6qvn1rny38PGKDPk9zzQPH8_A%40mail.gmail.com#d4be02ff70f3002522f9fadbd165d631
[2]: https://github.com/michail-nikolaev/postgres/commit/9446f944b415306d9e5d5ab98f69938d8f5ee87f
On Tue, Jun 11, 2024 at 01:00:00PM +0200, Michail Nikolaev wrote:
> Probably something wrong with arbiter index selection for different
> backends. I am afraid it could be a symptom of a more serious issue.

ON CONFLICT selects an index that may be rebuilt in parallel of the
REINDEX happening, and its contents may be incomplete. Isn't the
issue that we may select as arbiter indexes stuff that's !indisvalid?
Using the ccnew or ccold indexes would not be correct for the
conflict resolutions.
--
Michael
Hello, Michael.
> Isn't the issue that we may select as arbiter indexes stuff that's !indisvalid?
As far as I can see (1), !indisvalid indexes are filtered out.
But... It looks like this choice is not locked in any way (2), so index_concurrently_swap or index_concurrently_set_dead can change the index after the decision is made, even despite WaitForLockersMultiple (3). In some cases, it may cause data loss...
However, I was unable to reproduce that using some random usleep() calls - maybe it is a wrong assumption.
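For reference (supporting (1)), here is a condensed paraphrase of the candidate loop in infer_arbiter_indexes() (src/backend/optimizer/util/plancat.c); simplified, not verbatim:

    foreach(l, indexList)
    {
        Oid         indexoid = lfirst_oid(l);
        Relation    idxRel;
        Form_pg_index idxForm;

        /*
         * The index is opened with the query's rellockmode
         * (RowExclusiveLock for an INSERT), which does not conflict
         * with the ShareUpdateExclusiveLock taken by
         * index_concurrently_swap.
         */
        idxRel = index_open(indexoid, rte->rellockmode);
        idxForm = idxRel->rd_index;

        if (!idxForm->indisvalid)
            goto next;          /* invalid indexes are never arbiters */

        /* ... uniqueness / immediacy / inference clause matching ... */
    next:
        index_close(idxRel, NoLock);    /* lock kept until end of xact */
    }

So the filtering itself happens, but nothing pins the validity of the chosen arbiters for the rest of the query.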
Hello.
> But I was unable to reproduce that using some random usleep(), however - maybe it is a wrong assumption.
It seems like the assumption is correct - we may use an invalid index as an arbiter due to a race condition.
The attached patch adds a check for that case, and now the test fails like this:
# pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 9 script 1 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters
# pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 7 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters
# pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters
# pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters
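For context, a minimal sketch of the kind of check the patch adds (the exact placement is my assumption - a natural spot is where the executor examines the arbiter indexes, e.g. in src/backend/executor/execIndexing.c):

    /*
     * Sketch only: reject arbiter indexes that are no longer valid.
     * The error text matches the test output above; the surrounding
     * code is assumed, not quoted from the attached patch.
     */
    if (!indexRelation->rd_index->indisvalid)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("ON CONFLICT does not support invalid indexes as arbiters")));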
I think it is even possible to see a !alive index in the same situation (which is a worse case), but I was unable to reproduce it so far.
Best regards,
Mikhail.
Attachment
Hello, everyone.
> I think it is even possible to see a !alive index in the same situation (which is a worse case), but I was unable to reproduce it so far.
Fortunately, it is not possible.
So, it seems like I have found the source of the problem:
1) infer_arbiter_indexes calls RelationGetIndexList to get the list of candidates.
It does not lock the selected indexes in any additional way that would prevent index_concurrently_swap from changing them (setting and clearing validity).
RelationGetIndexList relcache.c:4857
infer_arbiter_indexes plancat.c:780
make_modifytable createplan.c:7097 ---------- node->arbiterIndexes = infer_arbiter_indexes(root);
create_modifytable_plan createplan.c:2826
create_plan_recurse createplan.c:532
create_plan createplan.c:349
standard_planner planner.c:421
planner planner.c:282
pg_plan_query postgres.c:904
pg_plan_queries postgres.c:996
exec_simple_query postgres.c:1193
2) another backend marks the index as invalid and commits
index_concurrently_swap index.c:1600
ReindexRelationConcurrently indexcmds.c:4115
ReindexIndex indexcmds.c:2814
ExecReindex indexcmds.c:2743
ProcessUtilitySlow utility.c:1567
standard_ProcessUtility utility.c:1067
ProcessUtility utility.c:523
PortalRunUtility pquery.c:1158
PortalRunMulti pquery.c:1315
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
3) the first backend invalidates the catalog snapshot because of the transaction snapshot
InvalidateCatalogSnapshot snapmgr.c:426
GetTransactionSnapshot snapmgr.c:278
PortalRunMulti pquery.c:1244
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
4) the first backend copies the indexes selected using the previous catalog snapshot
ExecInitModifyTable nodeModifyTable.c:4499 -------- resultRelInfo->ri_onConflictArbiterIndexes = node->arbiterIndexes;
ExecInitNode execProcnode.c:177
InitPlan execMain.c:966
standard_ExecutorStart execMain.c:261
ExecutorStart execMain.c:137
ProcessQuery pquery.c:155
PortalRunMulti pquery.c:1277
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
5) then it reads the indexes using the new, fresh snapshot
RelationGetIndexList relcache.c:4816
ExecOpenIndices execIndexing.c:175
ExecInsert nodeModifyTable.c:792 ------------- ExecOpenIndices(resultRelInfo, onconflict != ONCONFLICT_NONE);
ExecModifyTable nodeModifyTable.c:4059
ExecProcNodeFirst execProcnode.c:464
ExecProcNode executor.h:274
ExecutePlan execMain.c:1646
standard_ExecutorRun execMain.c:363
ExecutorRun execMain.c:304
ProcessQuery pquery.c:160
PortalRunMulti pquery.c:1277
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
6) and uses the arbiter selected with the stale snapshot together with the new view of the index (marked as invalid)
ExecInsert nodeModifyTable.c:1016 -------------- arbiterIndexes = resultRelInfo->ri_onConflictArbiterIndexes;
............
ExecInsert nodeModifyTable.c:1048 ---------------if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate, conflictTid, arbiterIndexes))
ExecModifyTable nodeModifyTable.c:4059
ExecProcNodeFirst execProcnode.c:464
ExecProcNode executor.h:274
ExecutePlan execMain.c:1646
standard_ExecutorRun execMain.c:363
ExecutorRun execMain.c:304
ProcessQuery pquery.c:160
PortalRunMulti pquery.c:1277
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
I have attached an updated test for the issue (it fails on assert quickly and uses only 2 backends).
The same issue may happen in case of CREATE/DROP INDEX CONCURRENTLY as well.
The simplest possible fix is to use ShareLock instead of ShareUpdateExclusiveLock in index_concurrently_swap:
oldClassRel = relation_open(oldIndexId, ShareLock);
newClassRel = relation_open(newIndexId, ShareLock);
But this is not a "concurrent" way, although such an update should be fast enough, as far as I understand.
Best regards,
Mikhail.
On Mon, Jun 17, 2024 at 07:00:51PM +0200, Michail Nikolaev wrote:
> The simplest possible fix is to use ShareLock instead of
> ShareUpdateExclusiveLock in index_concurrently_swap:
>
> oldClassRel = relation_open(oldIndexId, ShareLock);
> newClassRel = relation_open(newIndexId, ShareLock);
>
> But this is not a "concurrent" way, although such an update should be
> fast enough, as far as I understand.

Nope, that won't fly far. We should not use a ShareLock in this step
or we are going to conflict with row exclusive locks, impacting all
workloads when doing a REINDEX CONCURRENTLY.

That may be a long shot, but the issue is that we do the swap of all
the indexes in a single transaction, but do not wait for them to
complete when committing the swap's transaction in phase 4. Your
report is telling us that we really have a good reason to wait for
all the transactions that may use these indexes to finish. One thing
coming on top of my mind to keep things concurrent-safe while
allowing a clean use of the arbiter indexes would be to stick a
WaitForLockersMultiple() on AccessExclusiveLock just *before* the
transaction commit of phase 4, say, lacking the progress report part:

--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4131,6 +4131,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		CommandCounterIncrement();
 	}
 
+	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
+
 	/* Commit this transaction and make index swaps visible */
 	CommitTransactionCommand();
 	StartTransactionCommand();

This is a non-fresh Friday-afternoon idea, but it would make sure that
we don't have any transactions using the indexes switched to _ccold
with indisvalid that are waiting for a drop in phase 5. Your tests
seem to pass with that, and that keeps the operation intact
concurrent-wise (I'm really wishing for isolation tests with injection
points just now, because I could use them here).

> +    Assert(indexRelation->rd_index->indislive);
> +    Assert(indexRelation->rd_index->indisvalid);
> +
>      if (!indexRelation->rd_index->indimmediate)
>          ereport(ERROR,
>              (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

This kind of validation check may be a good idea in the long term.
That seems incredibly useful to me if we were to add more code paths
that do concurrent index rebuilds, to make sure that we don't rely on
an index we should not use at all. That's a HEAD-only thing IMO,
though.
--
Michael
Hello, Michael!
> This is a non-fresh Friday-afternoon idea, but it would make sure that
> we don't have any transactions using the indexes switched to _ccold
> with indisvalid that are waiting for a drop in phase 5. Your tests
> seem to pass with that, and that keeps the operation intact
> concurrent-wise (I'm really wishing for isolation tests with injection
> points just now, because I could use them here).
Yes, I have also tried that approach, but unfortunately it doesn't work.
You can make the test fail by increasing the number of connections:
'--no-vacuum --client=10 -j 2 --transactions=1000',
The source of the issue is not the swap of the indexes (and it is not related only to REINDEX CONCURRENTLY), but the fact that the indexes are fetched once during planning (to find the arbiter) and then reread with a new catalog snapshot for the actual execution.
So, other possible fixes I see:
* fall back to replanning in case we see something has changed during execution
* select arbiter indexes during actual execution
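To illustrate the second option, a purely hypothetical sketch (the function name is invented; ri_NumIndices and ri_IndexRelationDescs are the executor's existing per-result-relation index bookkeeping):

    /*
     * Hypothetical: re-resolve arbiter indexes at executor startup
     * from the indexes actually opened for the result relation,
     * instead of copying the planner's (possibly stale) list.
     */
    static List *
    select_arbiter_indexes_at_exec(ResultRelInfo *rri)
    {
        List   *arbiters = NIL;

        for (int i = 0; i < rri->ri_NumIndices; i++)
        {
            Relation    idx = rri->ri_IndexRelationDescs[i];

            /* only valid unique indexes can arbitrate ON CONFLICT */
            if (idx->rd_index->indisvalid && idx->rd_index->indisunique)
                arbiters = lappend_oid(arbiters, RelationGetRelid(idx));
        }
        return arbiters;
    }

(A real implementation would still have to match the conflict target's columns, of course; this only shows the idea of deciding at execution time.)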
> That's a HEAD-only thing IMO,
> though.
Do you mean that it needs to be moved to a separate patch?
Best regards,
Mikhail.
On Mon, Jun 17, 2024 at 07:00:51PM +0200, Michail Nikolaev wrote:
> The same issue may happen in case of CREATE/DROP INDEX CONCURRENTLY as well.

While looking at all that, I've been also curious about this specific
point, and it is indeed possible to finish in a state where a
duplicate key would be found in one of the indexes selected by the
executor during an INSERT ON CONFLICT while a concurrent set of CICs
and DICs are run, so you don't really need a REINDEX. See for example
the attached test.
--
Michael
On Fri, Jun 21, 2024 at 11:31:21AM +0200, Michail Nikolaev wrote:
> Yes, I have also tried that approach, but unfortunately it doesn't work.
> You can make the test fail by increasing the number of connections:
>
> '--no-vacuum --client=10 -j 2 --transactions=1000',
>
> The source of the issue is not the swap of the indexes (and it is not
> related only to REINDEX CONCURRENTLY), but the fact that the indexes
> are fetched once during planning (to find the arbiter) and then reread
> with a new catalog snapshot for the actual execution.

When I first saw this report, my main worry was that I have somewhat
managed to break the state of the indexes leading to data corruption
because of an incorrect step in the concurrent operations. However,
as far as I can see this is not the case, as an effect of two
properties we rely on for concurrent index operations, that hold in
the executor and the planner. Simply put:
- The planner ignores indexes with !indisvalid.
- The executor ignores indexes with !indislive.

The critical point is that we wait in DROP INDEX CONC and REINDEX
CONC for any transactions still using an index that's waiting to be
marked as !indislive, because such indexes *must* not be used in the
executor.

> So, other possible fixes I see:
> * fall back to replanning in case we see something has changed during
> execution
> * select arbiter indexes during actual execution

These two properties make ON CONFLICT react the way it should
depending on the state of the indexes selected by the planner based
on the query clauses, with changes reflecting when executing, with
two patterns involved:
- An index may be created in a concurrent operation after the planner
has selected the arbiter indexes (the index may be defined, still not
valid yet, or just created after), then the query execution would
need to handle the extra index created available at execution, with a
failure on a ccnew index.
- An index may be selected at planning phase, then a different index
could be used by a constraint once both indexes swap, with a failure
on a ccold index.

As far as I can see, it depends on what kind of query semantics and
the amount of transparency you are looking for here in your
application. An error in the query itself can also be defined as
useful so as your application is aware of what happens as an effect
of the concurrent index build (reindex or CIC/DIC), and it is not
really clear to me why silently falling back to a re-selection of the
arbiter indexes would be always better. Replanning could be actually
dangerous if a workload is heavily concurrently REINDEX'd, as we
could fall into a trap where a query can never decide which index to
use. I'm not saying that it cannot be improved, but it's not
completely clear to me what query semantics are the best for all
users because the behavior of HEAD and your suggestions have merits
and demerits. Anything we could come up with would be an improvement
anyway, AFAIU.

>> That's a HEAD-only thing IMO,
>> though.
>
> Do you mean that it needs to be moved to a separate patch?

It should, but I'm wondering if that's necessary for two reasons.

First, a check on indisvalid would be incorrect, because indexes
marked as !indisvalid && indislive mean that there is a concurrent
operation happening, and that this concurrent operation is waiting
for all transactions working with a lock on this index to finish
before flipping the live flag and make this index invalid for
decisions taken in the executor, like HOT updates, etc.

A check on indislive may be an idea, still I'm slightly biased
regarding its additional value because any indexes opened for a
relation are fetched from the relcache with RelationGetIndexList(),
explaining why !indislive indexes cannot be fetched, and we rely on
that in the executor for the indexes opened by a relation.
--
Michael
Hello, Michael!
> As far as I can see, it depends on what kind of query semantics and
> the amount of transparency you are looking for here in your
> application. An error in the query itself can also be defined as
> useful so as your application is aware of what happens as an effect of
> the concurrent index build (reindex or CIC/DIC), and it is not really
> clear to me why silently falling back to a re-selection of the arbiter
> indexes would be always better.
From my point of view, INSERT ON CONFLICT UPDATE should never fail with "ERROR: duplicate key value violates unique constraint", because the whole point of upsert is to avoid such situations.
So, this is what the majority expects and, probably, it is even documented.
On the other hand, REINDEX CONCURRENTLY should not cause any queries to fail accidentally, without any clear reason.
Also, as you can see from the first message in this thread, we could see errors like these:
* ERROR: duplicate key value violates unique constraint "tbl_pkey"
* ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
* ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
So, the first error message does not provide any clue for the developer to understand what happened.
> - The planner ignores indexes with !indisvalid.
> - The executor ignores indexes with !indislive.
Yes, and it feels like we need one more flag here to distinguish !indisvalid indexes that are going to become valid from those that are going to become !indislive.
For example, let's name it indiscorrect (meaning the index contains all the data). In that case, we may use the following logic (a small sketch follows the protocol below):
1) !indisvalid && !indiscorrect - the index is probably in the validation phase; do not use it as an arbiter, because it does not contain all the data yet
2) !indisvalid && indiscorrect - the index will most likely be dropped. Do not plan new queries with it, but it may still be used by other queries (including upserts). So, we still need to include it in the arbiters.
And, during REINDEX CONCURRENTLY:
1) begin; mark the new index as indisvalid and indiscorrect; mark the old one as !indisvalid but still indiscorrect; invalidate the relcache; commit;
At this point, some queries are still using only the old one as the arbiter, while some use both.
2) WaitForLockersMultiple
Now all queries use both indexes as arbiters.
3) begin; mark the old index as !indiscorrect, in addition to !indisvalid; invalidate the cache; commit;
Now, some queries use only the new index, but some still use both.
4) WaitForLockersMultiple;
Now, all queries use only the new index - we are safe to mark the old one as !indislive.
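A condensed sketch of the arbiter filtering that would result, assuming the proposed (currently non-existent) pg_index flag indiscorrect, in infer_arbiter_indexes-style code:

    if (!idxForm->indislive)
        goto next;          /* being dropped: never usable */
    if (!idxForm->indisvalid && !idxForm->indiscorrect)
        goto next;          /* still being built/validated: may lack
                             * tuples, so it cannot arbitrate */

    /*
     * Otherwise the index is usable as an arbiter: either fully valid,
     * or invalid-but-complete (about to be dropped), which must remain
     * an arbiter until no in-flight query can still insert through it.
     */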
> It should, but I'm wondering if that's necessary for two reasons.
In that case, it becomes:
Assert(indexRelation->rd_index->indiscorrect);
Assert(indexRelation->rd_index->indislive);
and it is always a valid check.
Best regards,
Mikhail.
Hello, Noah!
> On your other thread, it would be useful to see stack traces from the high-CPU
> processes once the live lock has ended all query completion.
I was wrong: it is not a livelock, it is actually a deadlock. I missed it because pgbench retries deadlocks automatically.
It looks like this:
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl ERROR: deadlock detected
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl DETAIL: Process 711743 waits for ShareLock on transaction 3633; blocked by process 711749.
Process 711749 waits for ShareLock on speculative token 2 of transaction 3622; blocked by process 711743.
Process 711743: INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n
Process 711749: INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl HINT: See server log for query details.
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl CONTEXT: while inserting index tuple (15,145) in relation "tbl_pkey_ccnew"
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl STATEMENT: INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n
Stacktraces:
-------------------------
INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n
#0 in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38e30, nevents=1) at latch.c:1570
#2 in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38e30, nevents=1, wait_event_info=50331655) at latch.c:1516
#3 in WaitLatch (latch=0x7acb2a2f5f14, wakeEvents=33, timeout=0, wait_event_info=50331655) at latch.c:538
#4 in ProcSleep (locallock=0x122f778, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5 in WaitOnLock (locallock=0x122f778, owner=0x1247408, dontWait=false) at lock.c:1833
#6 in LockAcquireExtended (locktag=0x7ffcc4e39220, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7 in LockAcquire (locktag=0x7ffcc4e39220, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8 in SpeculativeInsertionWait (xid=3622, token=2) at lmgr.c:833
#9 in _bt_doinsert (rel=0x7acb2dbb12e8, itup=0x12f1308, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, heapRel=0x7acb2dbb0f08) at nbtinsert.c:225
#10 in btinsert (rel=0x7acb2dbb12e8, values=0x7ffcc4e39440, isnull=0x7ffcc4e39420, ht_ctid=0x12ebe20, heapRel=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, indexInfo=0x12f08a8) at nbtree.c:195
#11 in index_insert (indexRelation=0x7acb2dbb12e8, values=0x7ffcc4e39440, isnull=0x7ffcc4e39420, heap_t_ctid=0x12ebe20, heapRelation=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, indexInfo=0x12f08a8) at indexam.c:230
#12 in ExecInsertIndexTuples (resultRelInfo=0x12eaa00, slot=0x12ebdf0, estate=0x12ea560, update=true, noDupErr=false, specConflict=0x0, arbiterIndexes=0x0, onlySummarizing=false) at execIndexing.c:438
#13 in ExecUpdateEpilogue (context=0x7ffcc4e39870, updateCxt=0x7ffcc4e3962c, resultRelInfo=0x12eaa00, tupleid=0x7ffcc4e39732, oldtuple=0x0, slot=0x12ebdf0) at nodeModifyTable.c:2130
#14 in ExecUpdate (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, tupleid=0x7ffcc4e39732, oldtuple=0x0, slot=0x12ebdf0, canSetTag=true) at nodeModifyTable.c:2478
#15 in ExecOnConflictUpdate (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, conflictTid=0x7ffcc4e39732, excludedSlot=0x12f05b8, canSetTag=true, returning=0x7ffcc4e39738) at nodeModifyTable.c:2694
#16 in ExecInsert (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, slot=0x12f05b8, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1048
#17 in ExecModifyTable (pstate=0x12ea7f0) at nodeModifyTable.c:4059
#18 in ExecProcNodeFirst (node=0x12ea7f0) at execProcnode.c:464
#19 in ExecProcNode (node=0x12ea7f0) at ../../../src/include/executor/executor.h:274
#20 in ExecutePlan (estate=0x12ea560, planstate=0x12ea7f0, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x12daac8, execute_once=true) at execMain.c:1646
#21 in standard_ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363
#22 in ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:304
#23 in ProcessQuery (plan=0x12e1360, sourceText=0x12083b0 "INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n ", params=0x0, queryEnv=0x0, dest=0x12daac8, qc=0x7ffcc4e39ae0) at pquery.c:160
#24 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=true, dest=0x12daac8, altdest=0x10382a0 <donothingDR>, qc=0x7ffcc4e39ae0) at pquery.c:1277
#25 in FillPortalStore (portal=0x1289c90, isTopLevel=true) at pquery.c:1026
#26 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x12e14c0, altdest=0x12e14c0, qc=0x7ffcc4e39d30) at pquery.c:763
#27 in exec_simple_query (query_string=0x12083b0 "INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n ") at postgres.c:1274
-------------------------
INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n
#0 in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38f60, nevents=1) at latch.c:1570
#2 in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38f60, nevents=1, wait_event_info=50331653) at latch.c:1516
#3 in WaitLatch (latch=0x7acb2a2f4dbc, wakeEvents=33, timeout=0, wait_event_info=50331653) at latch.c:538
#4 in ProcSleep (locallock=0x122f670, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5 in WaitOnLock (locallock=0x122f670, owner=0x1247408, dontWait=false) at lock.c:1833
#6 in LockAcquireExtended (locktag=0x7ffcc4e39370, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7 in LockAcquire (locktag=0x7ffcc4e39370, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8 in XactLockTableWait (xid=3633, rel=0x7acb2dba66d8, ctid=0x1240a68, oper=XLTW_InsertIndex) at lmgr.c:701
#9 in _bt_doinsert (rel=0x7acb2dba66d8, itup=0x1240a68, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, heapRel=0x7acb2dbb0f08) at nbtinsert.c:227
#10 in btinsert (rel=0x7acb2dba66d8, values=0x7ffcc4e395c0, isnull=0x7ffcc4e395a0, ht_ctid=0x12400e8, heapRel=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, indexInfo=0x1240500) at nbtree.c:195
#11 in index_insert (indexRelation=0x7acb2dba66d8, values=0x7ffcc4e395c0, isnull=0x7ffcc4e395a0, heap_t_ctid=0x12400e8, heapRelation=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, indexInfo=0x1240500) at indexam.c:230
#12 in ExecInsertIndexTuples (resultRelInfo=0x12eaa00, slot=0x12400b8, estate=0x12ea560, update=false, noDupErr=true, specConflict=0x7ffcc4e39722, arbiterIndexes=0x12e0998, onlySummarizing=false) at execIndexing.c:438
#13 in ExecInsert (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, slot=0x12400b8, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1095
#14 in ExecModifyTable (pstate=0x12ea7f0) at nodeModifyTable.c:4059
#15 in ExecProcNodeFirst (node=0x12ea7f0) at execProcnode.c:464
#16 in ExecProcNode (node=0x12ea7f0) at ../../../src/include/executor/executor.h:274
#17 in ExecutePlan (estate=0x12ea560, planstate=0x12ea7f0, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x12daac8, execute_once=true) at execMain.c:1646
#18 in standard_ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363
#19 in ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:304
#20 in ProcessQuery (plan=0x12e1360, sourceText=0x12083b0 "INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n ", params=0x0, queryEnv=0x0, dest=0x12daac8, qc=0x7ffcc4e39ae0) at pquery.c:160
#21 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=true, dest=0x12daac8, altdest=0x10382a0 <donothingDR>, qc=0x7ffcc4e39ae0) at pquery.c:1277
#22 in FillPortalStore (portal=0x1289c90, isTopLevel=true) at pquery.c:1026
#23 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x12e14c0, altdest=0x12e14c0, qc=0x7ffcc4e39d30) at pquery.c:763
#24 in exec_simple_query (query_string=0x12083b0 "INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n ") at postgres.c:1274
-------------------------
Also, at that time (though not reported in the deadlock) a reindex is happening. Without the reindex, I am unable to reproduce the deadlock.
#0 in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38cd0, nevents=1) at latch.c:1570
#2 in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38cd0, nevents=1, wait_event_info=50331654) at latch.c:1516
#3 in WaitLatch (latch=0x7acb2a2ff0c4, wakeEvents=33, timeout=0, wait_event_info=50331654) at latch.c:538
#4 in ProcSleep (locallock=0x122f358, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5 in WaitOnLock (locallock=0x122f358, owner=0x12459f0, dontWait=false) at lock.c:1833
#6 in LockAcquireExtended (locktag=0x7ffcc4e390e0, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7 in LockAcquire (locktag=0x7ffcc4e390e0, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8 in VirtualXactLock (vxid=..., wait=true) at lock.c:4627
#9 in WaitForLockersMultiple (locktags=0x12327a8, lockmode=8, progress=true) at lmgr.c:955
#10 in ReindexRelationConcurrently (stmt=0x1208e08, relationOid=16401, params=0x7ffcc4e39528) at indexcmds.c:4154
#11 in ReindexIndex (stmt=0x1208e08, params=0x7ffcc4e39528, isTopLevel=true) at indexcmds.c:2814
#12 in ExecReindex (pstate=0x12329f0, stmt=0x1208e08, isTopLevel=true) at indexcmds.c:2743
#13 in ProcessUtilitySlow (pstate=0x12329f0, pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:1567
#14 in standard_ProcessUtility (pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:1067
#15 in ProcessUtility (pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:523
#16 in PortalRunUtility (portal=0x1289c90, pstmt=0x1208f58, isTopLevel=true, setHoldSnapshot=false, dest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:1158
#17 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=false, dest=0x1209318, altdest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:1315
#18 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1209318, altdest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:791
#19 in exec_simple_query (query_string=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;") at postgres.c:1274
It looks like a deadlock caused by different sets of indexes being used as arbiter indexes (or by a different order of them).
Best regards,
Mikhail.
Hello, everyone!
Using the brand-new injection points support in specs, I created a spec to reproduce the issue.
It fails like this currently:
make -C src/test/modules/injection_points/ check
@@ -64,6 +64,7 @@
step s3_s1: <... completed>
step s2_s1: <... completed>
+ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
starting permutation: s3_s1 s2_s1 s4_s1 s1_s1 s4_s2 s4_s3
injection_points_attach
@@ -129,3 +130,4 @@
step s3_s1: <... completed>
step s2_s1: <... completed>
+ERROR: duplicate key value violates unique constraint "tbl_pkey"
Best regards,
Mikhail.
Hello!
Just rebased.