Re: [HACKERS] UPDATE of partition key - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: [HACKERS] UPDATE of partition key
Date
Msg-id CAJ3gD9cpgQiNS2vigq48QK53mzjPQwO-zEKu-CyUt4ujiRrtzA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] UPDATE of partition key  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses Re: [HACKERS] UPDATE of partition key  (Amit Khandekar <amitdkhan.pg@gmail.com>)
Re: [HACKERS] UPDATE of partition key  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
I haven't yet handled all points, but meanwhile, some of the important
points are discussed below ...

On 6 March 2017 at 15:11, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
>>> But that starts to sound less attractive when one realizes that
>>> that will occur for every row that wants to move.
>>
>> If we manage to call ExecSetupPartitionTupleRouting() during execution
>> phase only once for the very first time we find the update requires
>> row movement, then we can re-use the info.
>
> That might work, too.  But I guess we're going with initialization in
> ExecInitModifyTable().

I am more worried about this: even the UPDATEs that do not involve row
movement would do the expensive setup. So do it only once when we find
that we need to move the row. Something like this :
ExecUpdate()
{
....   if (resultRelInfo->ri_PartitionCheck &&     !ExecPartitionCheck(resultRelInfo, slot, estate))   {     bool
already_deleted;
     ExecDelete(tupleid, oldtuple, planSlot, epqstate, estate,            &already_deleted, canSetTag);
     if (already_deleted)       return NULL;     else     {       /* If we haven't already built the state for INSERT
    * tuple routing, build it now */       if (!mtstate->mt_partition_dispatch_info)       {
ExecSetupPartitionTupleRouting(                  mtstate->resultRelInfo->ri_RelationDesc,
&mtstate->mt_partition_dispatch_info,                  &mtstate->mt_partitions,
&mtstate->mt_partition_tupconv_maps,                  &mtstate->mt_partition_tuple_slot,
&mtstate->mt_num_dispatch,                  &mtstate->mt_num_partitions);       }
 
       return ExecInsert(mtstate, slot, planSlot, NULL,                 ONCONFLICT_NONE, estate, false);     }   }
...
}


>
>> One more thing I noticed is that, in case of update-returning, the
>> ExecDelete() will also generate result of RETURNING, which we are
>> discarding. So this is a waste. We should not even process RETURNING
>> in ExecDelete() called for row-movement. The RETURNING should be
>> processed only for ExecInsert().
>
> I wonder if it makes sense to have ExecDeleteInternal() and
> ExecInsertInternal(), which perform the core function of DELETE and
> INSERT, respectively.  Such as running triggers, checking constraints,
> etc.  The RETURNING part is controllable by the statement, so it will be
> handled by the ExecDelete() and ExecInsert(), like it is now.
>
> When called from ExecUpdate() as part of row-movement, they perform just
> the core part and leave the rest to be done by ExecUpdate() itself.

Yes, if we decide to execute only the core insert/delete operations
and skip the triggers, then there is a compelling reason to have
something like ExecDeleteInternal() and ExecInsertInternal(). In fact,
I was about to start doing the same, except for the below discussion
...

On 4 March 2017 at 12:49, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Mar 2, 2017 at 11:53 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> I think it does not make sense running after row triggers in case of
>> row-movement. There is no update happened on that leaf partition. This
>> reasoning can also apply to BR update triggers. But the reasons for
>> having a BR trigger and AR triggers are quite different. Generally, a
>> user needs to do some modifications to the row before getting the
>> final NEW row into the database, and hence [s]he defines a BR trigger
>> for that. And we can't just silently skip this step only because the
>> final row went into some other partition; in fact the row-movement
>> itself might depend on what the BR trigger did with the row. Whereas,
>> AR triggers are typically written for doing some other operation once
>> it is made sure the row is actually updated. In case of row-movement,
>> it is not actually updated.
>
> How about running the BR update triggers for the old partition and the
> AR update triggers for the new partition?  It seems weird to run BR
> update triggers but not AR update triggers.  Another option would be
> to run BR and AR delete triggers and then BR and AR insert triggers,
> emphasizing the choice to treat this update as a delete + insert, but
> (as Amit Kh. pointed out to me when we were in a room together this
> week) that precludes using the BEFORE trigger to modify the row.

I checked the trigger behaviour in case of UPSERT. Here, when there is
conflict found, ExecOnConflictUpdate() is called, and then the
function returns immediately, which means AR INSERT trigger will not
fire. And ExecOnConflictUpdate() calls ExecUpdate(), which means BR
and AR UPDATE triggers will be fired. So in short, when an INSERT
becomes an UPDATE, BR INSERT triggers do fire, but then the BR UPDATE
and AR UPDATE also get fired. On the same lines, it makes sense in
case of UPDATE=>DELETE+INSERT operation to fire a BR UPDATE trigger on
the original table, and then the BR and AR DELETE/INSERT triggers on
the respective tables.

So the common policy can be :
Fire the BR trigger. It can be INESRT/UPDATE/DELETE trigger depending
upon what the statement is.
If there is a change in the operation, according to what the operation
is converted to (UPDATE->DELETE+INSERT or INSERT->UPDATE), all the
respective triggers would be fired.



pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Next
From: Emre Hasegeli
Date:
Subject: Re: [HACKERS] BRIN cost estimate