Thread: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
Hello, hackers.
While testing my work on (1), I ran into a strange issue involving ON CONFLICT UPDATE and REINDEX CONCURRENTLY.
After some time, I realized the same issue exists on the master branch as well :)
I have prepared two TAP tests to reproduce the issues (2); they are also attached.
The first one does the following:
CREATE UNLOGGED TABLE tbl(i int primary key, updated_at timestamp);
CREATE INDEX idx ON tbl(i, updated_at); -- not required to reproduce the issue, but makes it happen faster
Then it runs the following scripts concurrently with pgbench:
1) INSERT INTO tbl VALUES(13,now()) on conflict(i) do update set updated_at = now();
2) INSERT INTO tbl VALUES(42,now()) on conflict(i) do update set updated_at = now();
3) INSERT INTO tbl VALUES(69,now()) on conflict(i) do update set updated_at = now();
Also, while pgbench is running, the following command runs in a loop:
REINDEX INDEX CONCURRENTLY tbl_pkey;
For some time, everything looks more or less fine (except for livelocks, but those are the subject of the second test).
But after about a minute or so (around the ~3000th REINDEX), it just fails like this:
make -C src/test/modules/test_misc/ check PROVE_TESTS='t/006_*'
# waiting for an about 3000, now is 2174, seconds passed : 84
# waiting for an about 3000, now is 2175, seconds passed : 84
# waiting for an about 3000, now is 2176, seconds passed : 84
# waiting for an about 3000, now is 2177, seconds passed : 84
# waiting for an about 3000, now is 2178, seconds passed : 84
# waiting for an about 3000, now is 2179, seconds passed : 84
# waiting for an about 3000, now is 2180, seconds passed : 84
# waiting for an about 3000, now is 2181, seconds passed : 84
# waiting for an about 3000, now is 2182, seconds passed : 84
# waiting for an about 3000, now is 2183, seconds passed : 84
# waiting for an about 3000, now is 2184, seconds passed : 84
# Failed test 'concurrent INSERTs, UPDATES and RC status (got 2 vs expected 0)'
# at t/006_concurrently_unique_fail.pl line 69.
# Failed test 'concurrent INSERTs, UPDATES and RC stderr /(?^:^$)/'
# at t/006_concurrently_unique_fail.pl line 69.
# 'pgbench: error: pgbench: error: client 4 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# client 15 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 9 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 8 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 3 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 2 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 12 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 18 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: pgbench:client 14 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(13) already exists.
# error: client 1 script 0 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(13) already exists.
# pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 13 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 5 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: Run was aborted; the above results are incomplete.
# '
Probably something is wrong with arbiter index selection for different backends. I am afraid it could be a symptom of a more serious issue.
-------------------------------------
The second test shows an interesting livelock state in a similar situation.
CREATE UNLOGGED TABLE tbl(i int primary key, n int);
CREATE INDEX idx ON tbl(i, n);
INSERT INTO tbl VALUES(13,1);
pgbench concurrently runs a single command:
INSERT INTO tbl VALUES(13,1) on conflict(i) do update set n = tbl.n + EXCLUDED.n;
And reindexing also runs in a loop:
REINDEX INDEX CONCURRENTLY tbl_pkey;
After the start, a somewhat strange issue appears:
make -C src/test/modules/test_misc/ check PROVE_TESTS='t/007_*'
# going to start reindex, num tuples in table is 1
# reindex 0 done in 0.00704598426818848 seconds, num inserted during reindex tuples is 0 speed is 0 per second
# going to start reindex, num tuples in table is 7
# reindex 1 done in 0.453176021575928 seconds, num inserted during reindex tuples is 632 speed is 1394.60158947115 per second
# going to start reindex, num tuples in table is 647
# current n is 808, 808 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 808, 0 per one second
# current n is 811, 3 per one second
# current n is 917, 106 per one second
# current n is 1024, 107 per one second
# reindex 2 done in 8.4104950428009 seconds, num inserted during reindex tuples is 467 speed is 55.5258635340064 per second
# going to start reindex, num tuples in table is 1136
# current n is 1257, 233 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1257, 0 per one second
# current n is 1490, 233 per one second
# reindex 3 done in 5.21368479728699 seconds, num inserted during reindex tuples is 411 speed is 78.8310026363446 per second
# going to start reindex, num tuples in table is 1566
At some moments, all CPUs are hot, but 30 connections are unable to perform any upsert. I think it may somehow be caused by the two arbiter indexes (the old one and the newly reindexed one).
Best regards,
Mikhail.
[1]: https://www.postgresql.org/message-id/flat/CANtu0ogBOtd9ravu1CUbuZWgq6qvn1rny38PGKDPk9zzQPH8_A%40mail.gmail.com#d4be02ff70f3002522f9fadbd165d631
[2]: https://github.com/michail-nikolaev/postgres/commit/9446f944b415306d9e5d5ab98f69938d8f5ee87f
On Tue, Jun 11, 2024 at 01:00:00PM +0200, Michail Nikolaev wrote:
> Probably something wrong with arbiter index selection for different
> backends. I am afraid it could be a symptom of a more serious issue.

ON CONFLICT selects an index that may be rebuilt in parallel of the
REINDEX happening, and its contents may be incomplete. Isn't the
issue that we may select as arbiter indexes stuff that's !indisvalid?
Using the ccnew or ccold indexes would not be correct for the
conflict resolutions.
--
Michael
Hello, Michael.
> Isn't the issue that we may select as arbiter indexes stuff that's !indisvalid?
As far as I can see (1), !indisvalid indexes are filtered out.
But... It looks like this choice is not locked in any way (2), so index_concurrently_swap or index_concurrently_set_dead can change the index after the decision is made, even despite WaitForLockersMultiple (3). In some cases, it may cause data loss...
However, I was unable to reproduce that using some random usleep() calls - maybe it is a wrong assumption.
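For reference (supporting (1)), here is a condensed paraphrase of the candidate loop in infer_arbiter_indexes() (src/backend/optimizer/util/plancat.c); simplified, not verbatim:

    foreach(l, indexList)
    {
        Oid         indexoid = lfirst_oid(l);
        Relation    idxRel;
        Form_pg_index idxForm;

        /*
         * The index is opened with the query's rellockmode
         * (RowExclusiveLock for an INSERT), which does not conflict
         * with the ShareUpdateExclusiveLock taken by
         * index_concurrently_swap.
         */
        idxRel = index_open(indexoid, rte->rellockmode);
        idxForm = idxRel->rd_index;

        if (!idxForm->indisvalid)
            goto next;          /* invalid indexes are never arbiters */

        /* ... uniqueness / immediacy / inference clause matching ... */
    next:
        index_close(idxRel, NoLock);    /* lock kept until end of xact */
    }

So the filtering itself happens, but nothing pins the validity of the chosen arbiters for the rest of the query.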
Hello.
> But I was unable to reproduce that using some random usleep(), however - maybe it is a wrong assumption.
It seems like the assumption is correct - we may use an invalid index as an arbiter due to a race condition.
The attached patch adds a check for that case, and now the test fails like this:
# pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
# DETAIL: Key (i)=(42) already exists.
# pgbench: error: client 9 script 1 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters
# pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey"
# DETAIL: Key (i)=(69) already exists.
# pgbench: error: client 7 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters
# pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters
# pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters
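For context, a minimal sketch of the kind of check the patch adds (the exact placement is my assumption - a natural spot is where the executor examines the arbiter indexes, e.g. in src/backend/executor/execIndexing.c):

    /*
     * Sketch only: reject arbiter indexes that are no longer valid.
     * The error text matches the test output above; the surrounding
     * code is assumed, not quoted from the attached patch.
     */
    if (!indexRelation->rd_index->indisvalid)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("ON CONFLICT does not support invalid indexes as arbiters")));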
I think it is even possible to see a !alive index in the same situation (which is a worse case), but I was unable to reproduce it so far.
Best regards,
Mikhail.
Attachment
Hello, everyone.
> I think it is even possible to see a !alive index in the same situation (which is a worse case), but I was unable to reproduce it so far.
Fortunately, it is not possible.
So, it seems like I have found the source of the problem:
1) infer_arbiter_indexes calls RelationGetIndexList to get the list of candidates.
It does not lock the selected indexes in any additional way that would prevent index_concurrently_swap from changing them (setting and clearing validity).
RelationGetIndexList relcache.c:4857
infer_arbiter_indexes plancat.c:780
make_modifytable createplan.c:7097 ---------- node->arbiterIndexes = infer_arbiter_indexes(root);
create_modifytable_plan createplan.c:2826
create_plan_recurse createplan.c:532
create_plan createplan.c:349
standard_planner planner.c:421
planner planner.c:282
pg_plan_query postgres.c:904
pg_plan_queries postgres.c:996
exec_simple_query postgres.c:1193
2) another backend marks the index as invalid and commits
index_concurrently_swap index.c:1600
ReindexRelationConcurrently indexcmds.c:4115
ReindexIndex indexcmds.c:2814
ExecReindex indexcmds.c:2743
ProcessUtilitySlow utility.c:1567
standard_ProcessUtility utility.c:1067
ProcessUtility utility.c:523
PortalRunUtility pquery.c:1158
PortalRunMulti pquery.c:1315
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
3) the first backend invalidates the catalog snapshot because of the transaction snapshot
InvalidateCatalogSnapshot snapmgr.c:426
GetTransactionSnapshot snapmgr.c:278
PortalRunMulti pquery.c:1244
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
4) the first backend copies the indexes selected using the previous catalog snapshot
ExecInitModifyTable nodeModifyTable.c:4499 -------- resultRelInfo->ri_onConflictArbiterIndexes = node->arbiterIndexes;
ExecInitNode execProcnode.c:177
InitPlan execMain.c:966
standard_ExecutorStart execMain.c:261
ExecutorStart execMain.c:137
ProcessQuery pquery.c:155
PortalRunMulti pquery.c:1277
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
5) then it reads the indexes using the new, fresh snapshot
RelationGetIndexList relcache.c:4816
ExecOpenIndices execIndexing.c:175
ExecInsert nodeModifyTable.c:792 ------------- ExecOpenIndices(resultRelInfo, onconflict != ONCONFLICT_NONE);
ExecModifyTable nodeModifyTable.c:4059
ExecProcNodeFirst execProcnode.c:464
ExecProcNode executor.h:274
ExecutePlan execMain.c:1646
standard_ExecutorRun execMain.c:363
ExecutorRun execMain.c:304
ProcessQuery pquery.c:160
PortalRunMulti pquery.c:1277
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
6) and uses the arbiter selected with the stale snapshot together with the new view of the index (marked as invalid)
ExecInsert nodeModifyTable.c:1016 -------------- arbiterIndexes = resultRelInfo->ri_onConflictArbiterIndexes;
............
ExecInsert nodeModifyTable.c:1048 ---------------if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate, conflictTid, arbiterIndexes))
ExecModifyTable nodeModifyTable.c:4059
ExecProcNodeFirst execProcnode.c:464
ExecProcNode executor.h:274
ExecutePlan execMain.c:1646
standard_ExecutorRun execMain.c:363
ExecutorRun execMain.c:304
ProcessQuery pquery.c:160
PortalRunMulti pquery.c:1277
PortalRun pquery.c:791
exec_simple_query postgres.c:1274
I have attached an updated test for the issue (it fails on assert quickly and uses only 2 backends).
The same issue may happen in case of CREATE/DROP INDEX CONCURRENTLY as well.
The simplest possible fix is to use ShareLock instead of ShareUpdateExclusiveLock in index_concurrently_swap:
oldClassRel = relation_open(oldIndexId, ShareLock);
newClassRel = relation_open(newIndexId, ShareLock);
But this is not a "concurrent" way, although such an update should be fast enough, as far as I understand.
Best regards,
Mikhail.
On Mon, Jun 17, 2024 at 07:00:51PM +0200, Michail Nikolaev wrote:
> The simplest possible fix is to use ShareLock instead of
> ShareUpdateExclusiveLock in index_concurrently_swap:
>
> oldClassRel = relation_open(oldIndexId, ShareLock);
> newClassRel = relation_open(newIndexId, ShareLock);
>
> But this is not a "concurrent" way, although such an update should be
> fast enough, as far as I understand.

Nope, that won't fly far. We should not use a ShareLock in this step
or we are going to conflict with row exclusive locks, impacting all
workloads when doing a REINDEX CONCURRENTLY.

That may be a long shot, but the issue is that we do the swap of all
the indexes in a single transaction, but do not wait for them to
complete when committing the swap's transaction in phase 4. Your
report is telling us that we really have a good reason to wait for
all the transactions that may use these indexes to finish. One thing
coming on top of my mind to keep things concurrent-safe while
allowing a clean use of the arbiter indexes would be to stick a
WaitForLockersMultiple() on AccessExclusiveLock just *before* the
transaction commit of phase 4, say, lacking the progress report part:

--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4131,6 +4131,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		CommandCounterIncrement();
 	}
 
+	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
+
 	/* Commit this transaction and make index swaps visible */
 	CommitTransactionCommand();
 	StartTransactionCommand();

This is a non-fresh Friday-afternoon idea, but it would make sure that
we don't have any transactions using the indexes switched to _ccold
with indisvalid that are waiting for a drop in phase 5. Your tests
seem to pass with that, and that keeps the operation intact
concurrent-wise (I'm really wishing for isolation tests with injection
points just now, because I could use them here).

> +    Assert(indexRelation->rd_index->indislive);
> +    Assert(indexRelation->rd_index->indisvalid);
> +
>      if (!indexRelation->rd_index->indimmediate)
>          ereport(ERROR,
>              (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

This kind of validation check may be a good idea in the long term.
That seems incredibly useful to me if we were to add more code paths
that do concurrent index rebuilds, to make sure that we don't rely on
an index we should not use at all. That's a HEAD-only thing IMO,
though.
--
Michael
Hello, Michael!
> This is a non-fresh Friday-afternoon idea, but it would make sure that
> we don't have any transactions using the indexes switched to _ccold
> with indisvalid that are waiting for a drop in phase 5. Your tests
> seem to pass with that, and that keeps the operation intact
> concurrent-wise (I'm really wishing for isolation tests with injection
> points just now, because I could use them here).
Yes, I have also tried that approach, but unfortunately it doesn't work.
You can make the test fail by increasing the number of connections:
'--no-vacuum --client=10 -j 2 --transactions=1000',
The source of the issue is not the swap of the indexes (and it is not related only to REINDEX CONCURRENTLY), but the fact that the indexes are fetched once during planning (to find the arbiter) and then reread with a new catalog snapshot for the actual execution.
So, other possible fixes I see:
* fall back to replanning in case we see something has changed during execution
* select arbiter indexes during actual execution
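To illustrate the second option, a purely hypothetical sketch (the function name is invented; ri_NumIndices and ri_IndexRelationDescs are the executor's existing per-result-relation index bookkeeping):

    /*
     * Hypothetical: re-resolve arbiter indexes at executor startup
     * from the indexes actually opened for the result relation,
     * instead of copying the planner's (possibly stale) list.
     */
    static List *
    select_arbiter_indexes_at_exec(ResultRelInfo *rri)
    {
        List   *arbiters = NIL;

        for (int i = 0; i < rri->ri_NumIndices; i++)
        {
            Relation    idx = rri->ri_IndexRelationDescs[i];

            /* only valid unique indexes can arbitrate ON CONFLICT */
            if (idx->rd_index->indisvalid && idx->rd_index->indisunique)
                arbiters = lappend_oid(arbiters, RelationGetRelid(idx));
        }
        return arbiters;
    }

(A real implementation would still have to match the conflict target's columns, of course; this only shows the idea of deciding at execution time.)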
> That's a HEAD-only thing IMO,
> though.
Do you mean that it needs to be moved to a separate patch?
Best regards,
Mikhail.
On Mon, Jun 17, 2024 at 07:00:51PM +0200, Michail Nikolaev wrote:
> The same issue may happen in case of CREATE/DROP INDEX CONCURRENTLY as well.

While looking at all that, I've been also curious about this specific
point, and it is indeed possible to finish in a state where a
duplicate key would be found in one of the indexes selected by the
executor during an INSERT ON CONFLICT while a concurrent set of CICs
and DICs are run, so you don't really need a REINDEX. See for example
the attached test.
--
Michael
On Fri, Jun 21, 2024 at 11:31:21AM +0200, Michail Nikolaev wrote:
> Yes, I have also tried that approach, but unfortunately it doesn't work.
> You can make the test fail by increasing the number of connections:
>
> '--no-vacuum --client=10 -j 2 --transactions=1000',
>
> The source of the issue is not the swap of the indexes (and it is not
> related only to REINDEX CONCURRENTLY), but the fact that the indexes
> are fetched once during planning (to find the arbiter) and then reread
> with a new catalog snapshot for the actual execution.

When I first saw this report, my main worry was that I have somewhat
managed to break the state of the indexes leading to data corruption
because of an incorrect step in the concurrent operations. However,
as far as I can see this is not the case, as an effect of two
properties we rely on for concurrent index operations, that hold in
the executor and the planner. Simply put:
- The planner ignores indexes with !indisvalid.
- The executor ignores indexes with !indislive.

The critical point is that we wait in DROP INDEX CONC and REINDEX
CONC for any transactions still using an index that's waiting to be
marked as !indislive, because such indexes *must* not be used in the
executor.

> So, other possible fixes I see:
> * fall back to replanning in case we see something has changed during
> execution
> * select arbiter indexes during actual execution

These two properties make ON CONFLICT react the way it should
depending on the state of the indexes selected by the planner based
on the query clauses, with changes reflecting when executing, with
two patterns involved:
- An index may be created in a concurrent operation after the planner
has selected the arbiter indexes (the index may be defined, still not
valid yet, or just created after), then the query execution would
need to handle the extra index created available at execution, with a
failure on a ccnew index.
- An index may be selected at planning phase, then a different index
could be used by a constraint once both indexes swap, with a failure
on a ccold index.

As far as I can see, it depends on what kind of query semantics and
the amount of transparency you are looking for here in your
application. An error in the query itself can also be defined as
useful so as your application is aware of what happens as an effect
of the concurrent index build (reindex or CIC/DIC), and it is not
really clear to me why silently falling back to a re-selection of the
arbiter indexes would be always better. Replanning could be actually
dangerous if a workload is heavily concurrently REINDEX'd, as we
could fall into a trap where a query can never decide which index to
use. I'm not saying that it cannot be improved, but it's not
completely clear to me what query semantics are the best for all
users because the behavior of HEAD and your suggestions have merits
and demerits. Anything we could come up with would be an improvement
anyway, AFAIU.

>> That's a HEAD-only thing IMO,
>> though.
>
> Do you mean that it needs to be moved to a separate patch?

It should, but I'm wondering if that's necessary for two reasons.

First, a check on indisvalid would be incorrect, because indexes
marked as !indisvalid && indislive mean that there is a concurrent
operation happening, and that this concurrent operation is waiting
for all transactions working with a lock on this index to finish
before flipping the live flag and make this index invalid for
decisions taken in the executor, like HOT updates, etc.

A check on indislive may be an idea, still I'm slightly biased
regarding its additional value because any indexes opened for a
relation are fetched from the relcache with RelationGetIndexList(),
explaining why !indislive indexes cannot be fetched, and we rely on
that in the executor for the indexes opened by a relation.
--
Michael
Hello, Michael!
> As far as I can see, it depends on what kind of query semantics and
> the amount of transparency you are looking for here in your
> application. An error in the query itself can also be defined as
> useful so as your application is aware of what happens as an effect of
> the concurrent index build (reindex or CIC/DIC), and it is not really
> clear to me why silently falling back to a re-selection of the arbiter
> indexes would be always better.
From my point of view, INSERT ON CONFLICT UPDATE should never fail with "ERROR: duplicate key value violates unique constraint", because the whole point of upsert is to avoid such situations.
So, this is what the majority expects and, probably, it is even documented.
On the other hand, REINDEX CONCURRENTLY should not cause any queries to fail accidentally, without any clear reason.
Also, as you can see from the first message in this thread, we could see errors like these:
* ERROR: duplicate key value violates unique constraint "tbl_pkey"
* ERROR: duplicate key value violates unique constraint "tbl_pkey_ccnew"
* ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
So, the first error message does not provide any clue for the developer to understand what happened.
> - The planner ignores indexes with !indisvalid.
> - The executor ignores indexes with !indislive.
Yes, and it feels like we need one more flag here to distinguish !indisvalid indexes that are going to become valid from those that are going to become !indislive.
For example, let's name it indiscorrect (meaning the index contains all the data). In that case, we may use the following logic (a small sketch follows the protocol below):
1) !indisvalid && !indiscorrect - the index is probably in the validation phase; do not use it as an arbiter, because it does not contain all the data yet
2) !indisvalid && indiscorrect - the index will most likely be dropped. Do not plan new queries with it, but it may still be used by other queries (including upserts). So, we still need to include it in the arbiters.
And, during REINDEX CONCURRENTLY:
1) begin; mark the new index as indisvalid and indiscorrect; mark the old one as !indisvalid but still indiscorrect; invalidate the relcache; commit;
At this point, some queries are still using only the old one as the arbiter, while some use both.
2) WaitForLockersMultiple
Now all queries use both indexes as arbiters.
3) begin; mark the old index as !indiscorrect, in addition to !indisvalid; invalidate the cache; commit;
Now, some queries use only the new index, but some still use both.
4) WaitForLockersMultiple;
Now, all queries use only the new index - we are safe to mark the old one as !indislive.
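A condensed sketch of the arbiter filtering that would result, assuming the proposed (currently non-existent) pg_index flag indiscorrect, in infer_arbiter_indexes-style code:

    if (!idxForm->indislive)
        goto next;          /* being dropped: never usable */
    if (!idxForm->indisvalid && !idxForm->indiscorrect)
        goto next;          /* still being built/validated: may lack
                             * tuples, so it cannot arbitrate */

    /*
     * Otherwise the index is usable as an arbiter: either fully valid,
     * or invalid-but-complete (about to be dropped), which must remain
     * an arbiter until no in-flight query can still insert through it.
     */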
> It should, but I'm wondering if that's necessary for two reasons.
In that case, it becomes:
Assert(indexRelation->rd_index->indiscorrect);
Assert(indexRelation->rd_index->indislive);
and it is always a valid check.
Best regards,
Mikhail.
Hello, Noah!
> On your other thread, it would be useful to see stack traces from the high-CPU
> processes once the live lock has ended all query completion.
I was wrong: it is not a livelock, it is actually a deadlock. I missed it because pgbench retries deadlocks automatically.
It looks like this:
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl ERROR: deadlock detected
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl DETAIL: Process 711743 waits for ShareLock on transaction 3633; blocked by process 711749.
Process 711749 waits for ShareLock on speculative token 2 of transaction 3622; blocked by process 711743.
Process 711743: INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n
Process 711749: INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl HINT: See server log for query details.
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl CONTEXT: while inserting index tuple (15,145) in relation "tbl_pkey_ccnew"
2024-06-25 17:16:17.447 CEST [711743] 007_concurrently_unique_stuck.pl STATEMENT: INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n
Stacktraces:
-------------------------
INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n
#0 in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38e30, nevents=1) at latch.c:1570
#2 in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38e30, nevents=1, wait_event_info=50331655) at latch.c:1516
#3 in WaitLatch (latch=0x7acb2a2f5f14, wakeEvents=33, timeout=0, wait_event_info=50331655) at latch.c:538
#4 in ProcSleep (locallock=0x122f778, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5 in WaitOnLock (locallock=0x122f778, owner=0x1247408, dontWait=false) at lock.c:1833
#6 in LockAcquireExtended (locktag=0x7ffcc4e39220, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7 in LockAcquire (locktag=0x7ffcc4e39220, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8 in SpeculativeInsertionWait (xid=3622, token=2) at lmgr.c:833
#9 in _bt_doinsert (rel=0x7acb2dbb12e8, itup=0x12f1308, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, heapRel=0x7acb2dbb0f08) at nbtinsert.c:225
#10 in btinsert (rel=0x7acb2dbb12e8, values=0x7ffcc4e39440, isnull=0x7ffcc4e39420, ht_ctid=0x12ebe20, heapRel=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, indexInfo=0x12f08a8) at nbtree.c:195
#11 in index_insert (indexRelation=0x7acb2dbb12e8, values=0x7ffcc4e39440, isnull=0x7ffcc4e39420, heap_t_ctid=0x12ebe20, heapRelation=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, indexInfo=0x12f08a8) at indexam.c:230
#12 in ExecInsertIndexTuples (resultRelInfo=0x12eaa00, slot=0x12ebdf0, estate=0x12ea560, update=true, noDupErr=false, specConflict=0x0, arbiterIndexes=0x0, onlySummarizing=false) at execIndexing.c:438
#13 in ExecUpdateEpilogue (context=0x7ffcc4e39870, updateCxt=0x7ffcc4e3962c, resultRelInfo=0x12eaa00, tupleid=0x7ffcc4e39732, oldtuple=0x0, slot=0x12ebdf0) at nodeModifyTable.c:2130
#14 in ExecUpdate (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, tupleid=0x7ffcc4e39732, oldtuple=0x0, slot=0x12ebdf0, canSetTag=true) at nodeModifyTable.c:2478
#15 in ExecOnConflictUpdate (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, conflictTid=0x7ffcc4e39732, excludedSlot=0x12f05b8, canSetTag=true, returning=0x7ffcc4e39738) at nodeModifyTable.c:2694
#16 in ExecInsert (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, slot=0x12f05b8, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1048
#17 in ExecModifyTable (pstate=0x12ea7f0) at nodeModifyTable.c:4059
#18 in ExecProcNodeFirst (node=0x12ea7f0) at execProcnode.c:464
#19 in ExecProcNode (node=0x12ea7f0) at ../../../src/include/executor/executor.h:274
#20 in ExecutePlan (estate=0x12ea560, planstate=0x12ea7f0, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x12daac8, execute_once=true) at execMain.c:1646
#21 in standard_ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363
#22 in ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:304
#23 in ProcessQuery (plan=0x12e1360, sourceText=0x12083b0 "INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n ", params=0x0, queryEnv=0x0, dest=0x12daac8, qc=0x7ffcc4e39ae0) at pquery.c:160
#24 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=true, dest=0x12daac8, altdest=0x10382a0 <donothingDR>, qc=0x7ffcc4e39ae0) at pquery.c:1277
#25 in FillPortalStore (portal=0x1289c90, isTopLevel=true) at pquery.c:1026
#26 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x12e14c0, altdest=0x12e14c0, qc=0x7ffcc4e39d30) at pquery.c:763
#27 in exec_simple_query (query_string=0x12083b0 "INSERT INTO tbl VALUES(13,41011) on conflict(i) do update set n = tbl.n + 1 RETURNING n ") at postgres.c:1274
-------------------------
INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n
#0 in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38f60, nevents=1) at latch.c:1570
#2 in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38f60, nevents=1, wait_event_info=50331653) at latch.c:1516
#3 in WaitLatch (latch=0x7acb2a2f4dbc, wakeEvents=33, timeout=0, wait_event_info=50331653) at latch.c:538
#4 in ProcSleep (locallock=0x122f670, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5 in WaitOnLock (locallock=0x122f670, owner=0x1247408, dontWait=false) at lock.c:1833
#6 in LockAcquireExtended (locktag=0x7ffcc4e39370, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7 in LockAcquire (locktag=0x7ffcc4e39370, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8 in XactLockTableWait (xid=3633, rel=0x7acb2dba66d8, ctid=0x1240a68, oper=XLTW_InsertIndex) at lmgr.c:701
#9 in _bt_doinsert (rel=0x7acb2dba66d8, itup=0x1240a68, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, heapRel=0x7acb2dbb0f08) at nbtinsert.c:227
#10 in btinsert (rel=0x7acb2dba66d8, values=0x7ffcc4e395c0, isnull=0x7ffcc4e395a0, ht_ctid=0x12400e8, heapRel=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, indexInfo=0x1240500) at nbtree.c:195
#11 in index_insert (indexRelation=0x7acb2dba66d8, values=0x7ffcc4e395c0, isnull=0x7ffcc4e395a0, heap_t_ctid=0x12400e8, heapRelation=0x7acb2dbb0f08, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, indexInfo=0x1240500) at indexam.c:230
#12 in ExecInsertIndexTuples (resultRelInfo=0x12eaa00, slot=0x12400b8, estate=0x12ea560, update=false, noDupErr=true, specConflict=0x7ffcc4e39722, arbiterIndexes=0x12e0998, onlySummarizing=false) at execIndexing.c:438
#13 in ExecInsert (context=0x7ffcc4e39870, resultRelInfo=0x12eaa00, slot=0x12400b8, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1095
#14 in ExecModifyTable (pstate=0x12ea7f0) at nodeModifyTable.c:4059
#15 in ExecProcNodeFirst (node=0x12ea7f0) at execProcnode.c:464
#16 in ExecProcNode (node=0x12ea7f0) at ../../../src/include/executor/executor.h:274
#17 in ExecutePlan (estate=0x12ea560, planstate=0x12ea7f0, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x12daac8, execute_once=true) at execMain.c:1646
#18 in standard_ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363
#19 in ExecutorRun (queryDesc=0x12dab58, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:304
#20 in ProcessQuery (plan=0x12e1360, sourceText=0x12083b0 "INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n ", params=0x0, queryEnv=0x0, dest=0x12daac8, qc=0x7ffcc4e39ae0) at pquery.c:160
#21 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=true, dest=0x12daac8, altdest=0x10382a0 <donothingDR>, qc=0x7ffcc4e39ae0) at pquery.c:1277
#22 in FillPortalStore (portal=0x1289c90, isTopLevel=true) at pquery.c:1026
#23 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x12e14c0, altdest=0x12e14c0, qc=0x7ffcc4e39d30) at pquery.c:763
#24 in exec_simple_query (query_string=0x12083b0 "INSERT INTO tbl VALUES(13,89318) on conflict(i) do update set n = tbl.n + 1 RETURNING n ") at postgres.c:1274
-------------------------
Also, at that time (though not reported in the deadlock) a reindex is happening. Without the reindex, I am unable to reproduce the deadlock.
#0 in epoll_wait (epfd=5, events=0x1203328, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 in WaitEventSetWaitBlock (set=0x12032c0, cur_timeout=-1, occurred_events=0x7ffcc4e38cd0, nevents=1) at latch.c:1570
#2 in WaitEventSetWait (set=0x12032c0, timeout=-1, occurred_events=0x7ffcc4e38cd0, nevents=1, wait_event_info=50331654) at latch.c:1516
#3 in WaitLatch (latch=0x7acb2a2ff0c4, wakeEvents=33, timeout=0, wait_event_info=50331654) at latch.c:538
#4 in ProcSleep (locallock=0x122f358, lockMethodTable=0x1037340 <default_lockmethod>, dontWait=false) at proc.c:1355
#5 in WaitOnLock (locallock=0x122f358, owner=0x12459f0, dontWait=false) at lock.c:1833
#6 in LockAcquireExtended (locktag=0x7ffcc4e390e0, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1046
#7 in LockAcquire (locktag=0x7ffcc4e390e0, lockmode=5, sessionLock=false, dontWait=false) at lock.c:739
#8 in VirtualXactLock (vxid=..., wait=true) at lock.c:4627
#9 in WaitForLockersMultiple (locktags=0x12327a8, lockmode=8, progress=true) at lmgr.c:955
#10 in ReindexRelationConcurrently (stmt=0x1208e08, relationOid=16401, params=0x7ffcc4e39528) at indexcmds.c:4154
#11 in ReindexIndex (stmt=0x1208e08, params=0x7ffcc4e39528, isTopLevel=true) at indexcmds.c:2814
#12 in ExecReindex (pstate=0x12329f0, stmt=0x1208e08, isTopLevel=true) at indexcmds.c:2743
#13 in ProcessUtilitySlow (pstate=0x12329f0, pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:1567
#14 in standard_ProcessUtility (pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:1067
#15 in ProcessUtility (pstmt=0x1208f58, queryString=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1209318, qc=0x7ffcc4e39d30) at utility.c:523
#16 in PortalRunUtility (portal=0x1289c90, pstmt=0x1208f58, isTopLevel=true, setHoldSnapshot=false, dest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:1158
#17 in PortalRunMulti (portal=0x1289c90, isTopLevel=true, setHoldSnapshot=false, dest=0x1209318, altdest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:1315
#18 in PortalRun (portal=0x1289c90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1209318, altdest=0x1209318, qc=0x7ffcc4e39d30) at pquery.c:791
#19 in exec_simple_query (query_string=0x12083b0 "REINDEX INDEX CONCURRENTLY tbl_pkey;") at postgres.c:1274
It looks like a deadlock caused by different sets of indexes being used as arbiter indexes (or by a different order of them).
Best regards,
Mikhail.
Hello, everyone!
Using the brand-new injection points support in specs, I created a spec to reproduce the issue.
It fails like this currently:
make -C src/test/modules/injection_points/ check
@@ -64,6 +64,7 @@
step s3_s1: <... completed>
step s2_s1: <... completed>
+ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold"
starting permutation: s3_s1 s2_s1 s4_s1 s1_s1 s4_s2 s4_s3
injection_points_attach
@@ -129,3 +130,4 @@
step s3_s1: <... completed>
step s2_s1: <... completed>
+ERROR: duplicate key value violates unique constraint "tbl_pkey"
Best regards,
Mikhail.
Hello!
Just rebased.