Home > mailing lists

Re: [HACKERS] UPDATE of partition key - Mailing list pgsql-hackers

From	Amit Khandekar
Subject	Re: [HACKERS] UPDATE of partition key
Date	March 2, 2017 12:23:24
Msg-id	CAJ3gD9cfcQL9cAPQrEhvLjWtdzATSefOpdZ0S4-ONcUTC8ocYA@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] UPDATE of partition key (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses	Re: [HACKERS] UPDATE of partition key (Robert Haas <robertmhaas@gmail.com>) Re: [HACKERS] UPDATE of partition key (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
List	pgsql-hackers

Tree view

On 23 February 2017 at 16:02, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
>> 2. In the patch, as part of the row movement, ExecDelete() is called
>> followed by ExecInsert(). This is done that way, because we want to
>> have the ROW triggers on that (sub)partition executed. If a user has
>> explicitly created DELETE and INSERT BR triggers for this partition, I
>> think we should run those. While at the same time, another question
>> is, what about UPDATE trigger on the same table ? Here again, one can
>> argue that because this UPDATE has been transformed into a
>> DELETE-INSERT, we should not run UPDATE trigger for row-movement. But
>> there can be a counter-argument. For e.g. if a user needs to make sure
>> about logging updates of particular columns of a row, he will expect
>> the logging to happen even when that row was transparently moved. In
>> the patch, I have retained the firing of UPDATE BR trigger.
>
> What of UPDATE AR triggers?

I think it does not make sense running after row triggers in case of
row-movement. There is no update happened on that leaf partition. This
reasoning can also apply to BR update triggers. But the reasons for
having a BR trigger and AR triggers are quite different. Generally, a
user needs to do some modifications to the row before getting the
final NEW row into the database, and hence [s]he defines a BR trigger
for that. And we can't just silently skip this step only because the
final row went into some other partition; in fact the row-movement
itself might depend on what the BR trigger did with the row. Whereas,
AR triggers are typically written for doing some other operation once
it is made sure the row is actually updated. In case of row-movement,
it is not actually updated.

>
> As a comment on how row-movement is being handled in code, I wonder if it
> could be be made to look similar structurally to the code in ExecInsert()
> that handles ON CONFLICT DO UPDATE.  That is,
>
> if (partition constraint fails)
> {
>     /* row movement */
> }
> else
> {
>     /* ExecConstraints() */
>     /* heap_update(), EvalPlanQual(), and ExecInsertIndexTuples() */
> }

I guess this is what has been effectively done for row movement, no ?

Looking at that, I found that in the current patch, if there is no
row-movement happening, ExecPartitionCheck() effectively gets called
twice : First time when ExecPartitionCheck() is explicitly called for
row-movement-required check, and second time in ExecConstraints()
call. May be there should be 2 separate functions
ExecCheckConstraints() and ExecPartitionConstraints(), and also
ExecCheckConstraints() that just calls both. This way we can call the
appropriate functions() accordingly in row-movement case, and the
other callers would continue to call ExecConstraints().

>
> I see that ExecConstraint() won't get called on the source partition's
> constraints if row movement occurs.  Maybe, that's unnecessary because the
> new row won't be inserted into that partition anyway.

Yes I agree.

>
> ExecWithCheckOptions() for RLS update check does happen *before* row
> movement though.

Yes. I think we should do it anyways.

>
>> 3. In case of a concurrent update/delete, suppose session A has locked
>> the row for deleting it. Now a session B has decided to update this
>> row and that is going to cause row movement, which means it will
>> delete it first. But when session A is finished deleting it, session B
>> finds that it is already deleted. In such case, it should not go ahead
>> with inserting a new row as part of the row movement. For that, I have
>> added a new parameter 'already_delete' for ExecDelete().
>
> Makes sense.  Maybe: already_deleted -> concurrently_deleted.

Right, concurrently_deleted sounds more accurate. In the next patch, I
will change that.

>
>> Of course, this still won't completely solve the concurrency anomaly.
>> In the above case, the UPDATE of Session B gets lost. May be, for a
>> user that does not tolerate this, we can have a table-level option
>> that disallows row movement, or will cause an error to be thrown for
>> one of the concurrent session.
>
> Will this table-level option be specified for a partitioned table once or
> for individual partitions?

My opinion is, if decide to have table-level option, it should be on
the root partition, to keep it simple.

>
>> 4. The ExecSetupPartitionTupleRouting() is re-used for routing the row
>> that is to be moved. So in ExecInitModifyTable(), we call
>> ExecSetupPartitionTupleRouting() even for UPDATE. We can also do this
>> only during execution time for the very first time we find that we
>> need to do a row movement. I will think over that, but I am thinking
>> it might complicate things, as compared to always doing the setup for
>> UPDATE. WIll check on that.
>
> Hmm.  ExecSetupPartitionTupleRouting(), which does significant amount of
> setup work, is fine being called in ExecInitModifyTable() in the insert
> case because there are often cases where that's a bulk-insert and hence
> cost of the setup work is amortized.  Updates, OTOH, are seldom done in a
> bulk manner.  So that might be an argument for doing it late only when
> needed.

Yes, agreed.

> But that starts to sound less attractive when one realizes that
> that will occur for every row that wants to move.

If we manage to call ExecSetupPartitionTupleRouting() during execution
phase only once for the very first time we find the update requires
row movement, then we can re-use the info.

One more thing I noticed is that, in case of update-returning, the
ExecDelete() will also generate result of RETURNING, which we are
discarding. So this is a waste. We should not even process RETURNING
in ExecDelete() called for row-movement. The RETURNING should be
processed only for ExecInsert().

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

pgsql-hackers by date:

From: Ashutosh Bapat
Date: 02 March 2017, 12:21:13
Subject: Re: [HACKERS] dropping partitioned tables without CASCADE

From: Fabien COELHO
Date: 02 March 2017, 12:23:40
Subject: Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands:\quit_if, \quit_unless)

Re: [HACKERS] UPDATE of partition key - Mailing list pgsql-hackers

Previous

Next