Re: [HACKERS] UPDATE of partition key - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: [HACKERS] UPDATE of partition key |
Date | |
Msg-id | CAA4eK1JgPjKxfgw9GdJUowDA617KyCEj_VPhZTrCUxdWnwGBwg@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] UPDATE of partition key (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] UPDATE of partition key
|
List | pgsql-hackers |
On Thu, Jun 1, 2017 at 3:25 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, May 29, 2017 at 5:26 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> But I think, we can also take step-by-step approach even for v11. If >>> we agree that it is ok to silently do the updates as long as we >>> document the behaviour, we can go ahead and do this, and then as a >>> second step, implement error handling as a separate patch. If that >>> patch does not materialize, we at least have the current behaviour >>> documented. >> >> I think that is sensible approach if we find the second step involves >> big or complicated changes. > > I think it is definitely a good idea to separate the two patches. > UPDATE tuple routing without any special handling for the EPQ issue is > just a partitioning feature. The proposed handling for the EPQ issue > is an *on-disk format change*. That turns a patch which is subject > only to routine bugs into one which can eat your data permanently -- > so having the "can eat your data permanently" separated out for both > review and commit seems only prudent. For me, it's not a matter of > which patch is big or complicated, but rather a matter of one of them > being a whole lot riskier than the other. Even UPDATE tuple routing > could mess things up pretty seriously if we end up with tuples in the > wrong partition, of course, but the other thing is still worse. > > In terms of a development plan, I think we would need to have both > patches before either could be committed. I believe that everyone > other than me who has expressed an opinion on this issue has said that > it's unacceptable to just ignore the issue, so it doesn't sound like > there will be much appetite for having #1 go into the tree without #2. > I'm still really concerned about that approach because we do not have > very much bit space left and WARM wants to use quite a bit of it. I > think it's quite possible that we'll be sad in the future if we find > that we can't implement feature XYZ because of the bit-space consumed > by this feature. However, I don't have the only vote here and I'm not > going to try to shove this into the tree over multiple objections > (unless there are a lot more votes the other way, but so far there's > no sign of that). > > Greg/Amit's idea of using the CTID field rather than an infomask bit > seems like a possibly promising approach. Not everything that needs > bit-space can use the CTID field, so using it is a little less likely > to conflict with something else we want to do in the future than using > a precious infomask bit. However, I'm worried about this: > > /* Make sure there is no forward chain link in t_ctid */ > tp.t_data->t_ctid = tp.t_self; > > The comment does not say *why* we need to make sure that there is no > forward chain link, but it implies that some code somewhere in the > system does or at one time did depend on no forward link existing. > I think it is to ensure that EvalPlanQual mechanism gets invoked in the right case. The visibility routine will return HeapTupleUpdated both when the tuple is deleted or updated (updated - has a newer version of the tuple), so we use ctid to decide if we need to follow the tuple chain for a newer version of the tuple. > Any such code that still exists will need to be updated. > Yeah. > The other potential issue I see here is that I know the WARM code also > tries to use the bit-space in the CTID field; in particular, it uses > the CTID field of the last tuple in a HOT chain to point back to the > root of the chain. That seems like it could conflict with the usage > proposed here, but I'm not totally sure. > The proposed change in WARM tuple patch uses ip_posid field of CTID and we are planning to use ip_blkid field. Here is the relevant text and code from WARM tuple patch: "Store the root line pointer of the WARM chain in the t_ctid.ip_posid field of the last tuple in the chain and mark the tuple header with HEAP_TUPLE_LATEST flag to record that fact." +#define HeapTupleHeaderSetHeapLatest(tup, offnum) \ +do { \ + AssertMacro(OffsetNumberIsValid(offnum)); \ + (tup)->t_infomask2 |= HEAP_LATEST_TUPLE; \ + ItemPointerSetOffsetNumber(&(tup)->t_ctid, (offnum)); \ +} while (0) For further details, refer patch 0001-Track-root-line-pointer-v23_v26 in the below e-mail: https://www.postgresql.org/message-id/CABOikdOTstHK2y0rDk%2BY3Wx9HRe%2BbZtj3zuYGU%3DVngneiHo5KQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: