On 4/24/15 5:30 PM, Alvaro Herrera wrote:
> Jim Nasby wrote:
>
>> It looks like the biggest complaint (aside from allowing a limited number of
>> tuples to be moved) is in [1] and [2], where Tom is saying that you can't
>> simply call heap_update() like this without holding an exclusive lock on the
>> table. Is that because we're not actually changing the tuple?
>
> That's nonsense -- obviously UPDATE can do heap_update without an
> exclusive lock on the table, so the explanation must be something else.
> I think his actual complaint was that you can't remove the old tuple
> until concurrent readers of the table have already finished scanning it,
> or you get into a situation where they might need to read the page in
> which the original version resided, but your mini-vacuum already removed
> it. So before removing it you need to wait until they are all finished.
> This is the reason I mentioned CREATE INDEX CONCURRENTLY: if you wait
> until those transactions are all gone (like CIC does), you are then free
> to remove the old versions of the tuple, because you know that all
> readers have a snapshot new enough to see the new version of the tuple.
Except I don't see anywhere in the patch that's actually removing the
old tuple...
>> Another issue is both HOT and KeyUpdate; I think we need to completely
>> ignore/over-ride that stuff for this.
>
> You don't need anything for HOT, because cross-page updates are never
> HOT. Not sure what you mean about KeyUpdate, but yeah you might need
> something there -- obviously, you don't want to create multixacts
> unnecessarily.
If I'm not mistaken, if there's enough room left on the page then
HeapSatisfiesHOTandKeyUpdate() will say this tuple satisfies HOT. So
we'd have to do something to over-ride that, and I don't think the
current patch does that. (It might force it to a new page anyway, but it
does nothing with satisfies_hot, which I suspect isn't safe.)
>> Instead of adding forcefsm, I think it would be more useful to accept a
>> target block number. That way we can actually control where the new tuple
>> goes.
>
> Whatever makes the most sense, I suppose. (Maybe we shouldn't consider
> this a tweaked heap_update -- which is already complex enough -- but a
> separate heapam entry point.)
Yeah, I thought about creating heap_move, but I suspect that would still
have to worry about a lot of this other stuff anyway. Far more likely
for a change to be missed in heap_move than heap_update too.
I am tempted to add a SQL heap_move function though, assuming it's not
much extra work.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com