Re: [HACKERS] Restrict concurrent update/delete with UPDATE ofpartition key - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] Restrict concurrent update/delete with UPDATE ofpartition key
Date
Msg-id 20180305232353.gpue7jldnm4bjf4i@alap3.anarazel.de
Whole thread Raw
In response to Re: [HACKERS] Restrict concurrent update/delete with UPDATE ofpartition key  (amul sul <sulamul@gmail.com>)
Responses Re: [HACKERS] Restrict concurrent update/delete with UPDATE ofpartition key  (Amit Kapila <amit.kapila16@gmail.com>)
Re: [HACKERS] Restrict concurrent update/delete with UPDATE ofpartition key  (amul sul <sulamul@gmail.com>)
List pgsql-hackers
Hi,

On 2018-02-13 12:41:26 +0530, amul sul wrote:
> From 08c8c7ece7d9411e70a780dbeed89d81419db6b6 Mon Sep 17 00:00:00 2001
> From: Amul Sul <sulamul@gmail.com>
> Date: Tue, 13 Feb 2018 12:37:52 +0530
> Subject: [PATCH 1/2] Invalidate ip_blkid v5
> 
> v5:
>  - Added code in heap_mask to skip wal_consistency_checking[7]
>  - Fixed previous todos.
> 
> v5-wip2:
>  - Minor changes in RelationFindReplTupleByIndex() and
>    RelationFindReplTupleSeq()
> 
>  - TODO;
>    Same as the privious
> 
> v5-wip: Update w.r.t Amit Kapila's comments[6].
>  - Reverted error message in nodeModifyTable.c from 'tuple to be locked'
>    to 'tuple to be updated'.
> 
>  - TODO:
>  1. Yet to made a decision of having LOG/ELOG/ASSERT in the
>     RelationFindReplTupleByIndex() and RelationFindReplTupleSeq().
> 
> v4: Rebased on "UPDATE of partition key v35" patch[5].
> 
> v3: Update w.r.t Amit Kapila's[3] & Alvaro Herrera[4] comments
>  - typo in all error message and comment : "to an another" -> "to another"
>  - error message change : "tuple to be updated" -> "tuple to be locked"
>  - In ExecOnConflictUpdate(), error report converted into assert &
>   comments added.
> 
> v2: Updated w.r.t Robert review comments[2]
>  - Updated couple of comment of heap_delete argument and ItemPointerData
>  - Added same concurrent update error logic in ExecOnConflictUpdate,
>    RelationFindReplTupleByIndex and RelationFindReplTupleSeq
> 
> v1: Initial version -- as per Amit Kapila's suggestions[1]
>  - When tuple is being moved to another partition then ip_blkid in the
>    tuple header mark to InvalidBlockNumber.

Very nice and instructive to keep this in a submission's commit message.



> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> index 8a846e7dba..f4560ee9cb 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -3037,6 +3037,8 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
>   *    crosscheck - if not InvalidSnapshot, also check tuple against this
>   *    wait - true if should wait for any conflicting update to commit/abort
>   *    hufd - output parameter, filled in failure cases (see below)
> + *    row_moved - true iff the tuple is being moved to another partition
> + *                table due to an update of partition key. Otherwise, false.
>   *

I don't think 'row_moved' is a good variable name for this. Moving a row
in our heap format can mean a lot of things. Maybe 'to_other_part' or
'changing_part'?


> +    /*
> +     * Sets a block identifier to the InvalidBlockNumber to indicate such an
> +     * update being moved tuple to another partition.
> +     */
> +    if (row_moved)
> +        BlockIdSet(&((tp.t_data->t_ctid).ip_blkid), InvalidBlockNumber);

The parens here are set in a bit werid way. I assume that's from copying
it out of ItemPointerSet()?  Why aren't you just using ItemPointerSetBlockNumber()?


I think it'd be better if we followed the example of specultive inserts
and created an equivalent of HeapTupleHeaderSetSpeculativeToken. That'd
be a heck of a lot easier to grep for...


> @@ -9314,6 +9323,18 @@ heap_mask(char *pagedata, BlockNumber blkno)
>               */
>              if (HeapTupleHeaderIsSpeculative(page_htup))
>                  ItemPointerSet(&page_htup->t_ctid, blkno, off);
> +
> +            /*
> +             * For a deleted tuple, a block identifier is set to the

I think this 'the' is superflous.


> +             * InvalidBlockNumber to indicate that the tuple has been moved to
> +             * another partition due to an update of partition key.

But I think it should be 'the partition key'.


> +             * Like speculative tuple, to ignore any inconsistency set block
> +             * identifier to current block number.

This doesn't quite parse.


> +             */
> +            if (!BlockNumberIsValid(
> +                    BlockIdGetBlockNumber((&((page_htup->t_ctid).ip_blkid)))))
> +                BlockIdSet(&((page_htup->t_ctid).ip_blkid), blkno);
>          }

That formatting looks wrong. I think it should be replaced by a macro
like mentioned above.


>          /*
> diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
> index 160d941c00..a770531e14 100644
> --- a/src/backend/commands/trigger.c
> +++ b/src/backend/commands/trigger.c
> @@ -3071,6 +3071,11 @@ ltrmark:;
>                      ereport(ERROR,
>                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>                               errmsg("could not serialize access due to concurrent update")));
> +                if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
> +                    ereport(ERROR,
> +                            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                             errmsg("tuple to be locked was already moved to another partition due to concurrent
update")));

Yes, given that we repeat this in multiple places, I *definitely* want
to see this wrapped in a macro with a descriptive name.


> diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
> index 7961b4be6a..b07b7092de 100644
> --- a/src/backend/executor/nodeLockRows.c
> +++ b/src/backend/executor/nodeLockRows.c
> @@ -218,6 +218,11 @@ lnext:
>                      ereport(ERROR,
>                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>                               errmsg("could not serialize access due to concurrent update")));
> +                if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
> +                    ereport(ERROR,
> +                            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                             errmsg("tuple to be locked was already moved to another partition due to concurrent
update")));
> +

Why are we using ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE rather than
ERRCODE_T_R_SERIALIZATION_FAILURE?  A lot of frameworks have builtin
logic to retry serialization failures, and this kind of thing is going
to resolved by retrying, no?


> diff --git a/src/test/isolation/expected/partition-key-update-1.out
b/src/test/isolation/expected/partition-key-update-1.out
> new file mode 100644

I'd like to see tests that show various interactions with ON CONFLICT.

Greetings,

Andres Freund


pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Weird failures on lorikeet
Next
From: Michael Paquier
Date:
Subject: Re: [HACKERS] Creating backup history files for backups taken fromstandbys