Re: [HACKERS] UPDATE of partition key - Mailing list pgsql-hackers

From Amit Langote
Subject Re: [HACKERS] UPDATE of partition key
Date
Msg-id 8c7bdfd1-685d-a0e9-56c2-278f00124d12@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] UPDATE of partition key  (Amit Khandekar <amitdkhan.pg@gmail.com>)
Responses Re: [HACKERS] UPDATE of partition key  (Amit Khandekar <amitdkhan.pg@gmail.com>)
List pgsql-hackers
Hi,

On 2017/03/02 15:23, Amit Khandekar wrote:
> On 23 February 2017 at 16:02, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>
>>> 2. In the patch, as part of the row movement, ExecDelete() is called
>>> followed by ExecInsert(). This is done that way, because we want to
>>> have the ROW triggers on that (sub)partition executed. If a user has
>>> explicitly created DELETE and INSERT BR triggers for this partition, I
>>> think we should run those. While at the same time, another question
>>> is, what about UPDATE trigger on the same table ? Here again, one can
>>> argue that because this UPDATE has been transformed into a
>>> DELETE-INSERT, we should not run UPDATE trigger for row-movement. But
>>> there can be a counter-argument. For e.g. if a user needs to make sure
>>> about logging updates of particular columns of a row, he will expect
>>> the logging to happen even when that row was transparently moved. In
>>> the patch, I have retained the firing of UPDATE BR trigger.
>>
>> What of UPDATE AR triggers?
> 
> I think it does not make sense running after row triggers in case of
> row-movement. There is no update happened on that leaf partition. This
> reasoning can also apply to BR update triggers. But the reasons for
> having a BR trigger and AR triggers are quite different. Generally, a
> user needs to do some modifications to the row before getting the
> final NEW row into the database, and hence [s]he defines a BR trigger
> for that. And we can't just silently skip this step only because the
> final row went into some other partition; in fact the row-movement
> itself might depend on what the BR trigger did with the row. Whereas,
> AR triggers are typically written for doing some other operation once
> it is made sure the row is actually updated. In case of row-movement,
> it is not actually updated.

OK, so it'd be better to clarify in the documentation that that's the case.

>> As a comment on how row-movement is being handled in code, I wonder if it
>> could be be made to look similar structurally to the code in ExecInsert()
>> that handles ON CONFLICT DO UPDATE.  That is,
>>
>> if (partition constraint fails)
>> {
>>     /* row movement */
>> }
>> else
>> {
>>     /* ExecConstraints() */
>>     /* heap_update(), EvalPlanQual(), and ExecInsertIndexTuples() */
>> }
> 
> I guess this is what has been effectively done for row movement, no ?

Yes, although it seems nice how the formatting of the code in ExecInsert()
makes it apparent that they are distinct code paths.  OTOH, the additional
diffs caused by the suggested formatting might confuse other reviewers.

> Looking at that, I found that in the current patch, if there is no
> row-movement happening, ExecPartitionCheck() effectively gets called
> twice : First time when ExecPartitionCheck() is explicitly called for
> row-movement-required check, and second time in ExecConstraints()
> call. May be there should be 2 separate functions
> ExecCheckConstraints() and ExecPartitionConstraints(), and also
> ExecCheckConstraints() that just calls both. This way we can call the
> appropriate functions() accordingly in row-movement case, and the
> other callers would continue to call ExecConstraints().

One random idea: we could add a bool ri_PartitionCheckOK which is set to
true after it is checked in ExecConstraints().  And modify the condition
in ExecConstraints() as follows:
   if (resultRelInfo->ri_PartitionCheck &&
+       !resultRelInfo->ri_PartitionCheckOK &&       !ExecPartitionCheck(resultRelInfo, slot, estate))

>>> 3. In case of a concurrent update/delete, suppose session A has locked
>>> the row for deleting it. Now a session B has decided to update this
>>> row and that is going to cause row movement, which means it will
>>> delete it first. But when session A is finished deleting it, session B
>>> finds that it is already deleted. In such case, it should not go ahead
>>> with inserting a new row as part of the row movement. For that, I have
>>> added a new parameter 'already_delete' for ExecDelete().
>>
>> Makes sense.  Maybe: already_deleted -> concurrently_deleted.
> 
> Right, concurrently_deleted sounds more accurate. In the next patch, I
> will change that.

Okay, thanks.

>>> Of course, this still won't completely solve the concurrency anomaly.
>>> In the above case, the UPDATE of Session B gets lost. May be, for a
>>> user that does not tolerate this, we can have a table-level option
>>> that disallows row movement, or will cause an error to be thrown for
>>> one of the concurrent session.
>>
>> Will this table-level option be specified for a partitioned table once or
>> for individual partitions?
> 
> My opinion is, if decide to have table-level option, it should be on
> the root partition, to keep it simple.

I see.

>> But that starts to sound less attractive when one realizes that
>> that will occur for every row that wants to move.
> 
> If we manage to call ExecSetupPartitionTupleRouting() during execution
> phase only once for the very first time we find the update requires
> row movement, then we can re-use the info.

That might work, too.  But I guess we're going with initialization in
ExecInitModifyTable().

> One more thing I noticed is that, in case of update-returning, the
> ExecDelete() will also generate result of RETURNING, which we are
> discarding. So this is a waste. We should not even process RETURNING
> in ExecDelete() called for row-movement. The RETURNING should be
> processed only for ExecInsert().

I wonder if it makes sense to have ExecDeleteInternal() and
ExecInsertInternal(), which perform the core function of DELETE and
INSERT, respectively.  Such as running triggers, checking constraints,
etc.  The RETURNING part is controllable by the statement, so it will be
handled by the ExecDelete() and ExecInsert(), like it is now.

When called from ExecUpdate() as part of row-movement, they perform just
the core part and leave the rest to be done by ExecUpdate() itself.

Thanks,
Amit





pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Print correct startup cost for the group aggregate.
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: [HACKERS] Logical Replication and Character encoding