Re: [HACKERS] UPDATE of partition key - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] UPDATE of partition key
Date
Msg-id CAA4eK1JgPjKxfgw9GdJUowDA617KyCEj_VPhZTrCUxdWnwGBwg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] UPDATE of partition key  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] UPDATE of partition key
List pgsql-hackers
On Thu, Jun 1, 2017 at 3:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, May 29, 2017 at 5:26 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> But I think, we can also take step-by-step approach even for v11. If
>>> we agree that it is ok to silently do the updates as long as we
>>> document the behaviour, we can go ahead and do this, and then as a
>>> second step, implement error handling as a separate patch. If that
>>> patch does not materialize, we at least have the current behaviour
>>> documented.
>>
>> I think that is sensible approach if we find the second step involves
>> big or complicated changes.
>
> I think it is definitely a good idea to separate the two patches.
> UPDATE tuple routing without any special handling for the EPQ issue is
> just a partitioning feature.  The proposed handling for the EPQ issue
> is an *on-disk format change*.  That turns a patch which is subject
> only to routine bugs into one which can eat your data permanently --
> so having the "can eat your data permanently" separated out for both
> review and commit seems only prudent.  For me, it's not a matter of
> which patch is big or complicated, but rather a matter of one of them
> being a whole lot riskier than the other.  Even UPDATE tuple routing
> could mess things up pretty seriously if we end up with tuples in the
> wrong partition, of course, but the other thing is still worse.
>
> In terms of a development plan, I think we would need to have both
> patches before either could be committed.  I believe that everyone
> other than me who has expressed an opinion on this issue has said that
> it's unacceptable to just ignore the issue, so it doesn't sound like
> there will be much appetite for having #1 go into the tree without #2.
> I'm still really concerned about that approach because we do not have
> very much bit space left and WARM wants to use quite a bit of it.  I
> think it's quite possible that we'll be sad in the future if we find
> that we can't implement feature XYZ because of the bit-space consumed
> by this feature.  However, I don't have the only vote here and I'm not
> going to try to shove this into the tree over multiple objections
> (unless there are a lot more votes the other way, but so far there's
> no sign of that).
>
> Greg/Amit's idea of using the CTID field rather than an infomask bit
> seems like a possibly promising approach.  Not everything that needs
> bit-space can use the CTID field, so using it is a little less likely
> to conflict with something else we want to do in the future than using
> a precious infomask bit.  However, I'm worried about this:
>
>     /* Make sure there is no forward chain link in t_ctid */
>     tp.t_data->t_ctid = tp.t_self;
>
> The comment does not say *why* we need to make sure that there is no
> forward chain link, but it implies that some code somewhere in the
> system does or at one time did depend on no forward link existing.
>

I think it is to ensure that EvalPlanQual mechanism gets invoked in
the right case.   The visibility routine will return HeapTupleUpdated
both when the tuple is deleted or updated (updated - has a newer
version of the tuple), so we use ctid to decide if we need to follow
the tuple chain for a newer version of the tuple.

> Any such code that still exists will need to be updated.
>

Yeah.

> The other potential issue I see here is that I know the WARM code also
> tries to use the bit-space in the CTID field; in particular, it uses
> the CTID field of the last tuple in a HOT chain to point back to the
> root of the chain.  That seems like it could conflict with the usage
> proposed here, but I'm not totally sure.
>

The proposed change in WARM tuple patch uses ip_posid field of CTID
and we are planning to use ip_blkid field.  Here is the relevant text
and code from WARM tuple patch:

"Store the root line pointer of the WARM chain in the t_ctid.ip_posid
field of the last tuple in the chain and mark the tuple header with
HEAP_TUPLE_LATEST flag to record that fact."

+#define HeapTupleHeaderSetHeapLatest(tup, offnum) \
+do { \
+ AssertMacro(OffsetNumberIsValid(offnum)); \
+ (tup)->t_infomask2 |= HEAP_LATEST_TUPLE; \
+ ItemPointerSetOffsetNumber(&(tup)->t_ctid, (offnum)); \
+} while (0)

For further details, refer patch 0001-Track-root-line-pointer-v23_v26
in the below e-mail:
https://www.postgresql.org/message-id/CABOikdOTstHK2y0rDk%2BY3Wx9HRe%2BbZtj3zuYGU%3DVngneiHo5KQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [HACKERS] Server ignores contents of SASLInitialResponse
Next
From: Amit Khandekar
Date:
Subject: Re: [HACKERS] UPDATE of partition key