Re: POC: Lock updated tuples in tuple_update() and tuple_delete() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
Date
Msg-id 20230323165020.nhdpwue2uezmbmws@awork3.anarazel.de
In response to Re: POC: Lock updated tuples in tuple_update() and tuple_delete()  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: POC: Lock updated tuples in tuple_update() and tuple_delete()  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

On 2023-03-23 18:08:36 +0300, Alexander Korotkov wrote:
> On Thu, Mar 23, 2023 at 3:30 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2023-03-21 01:25:11 +0300, Alexander Korotkov wrote:
> > > I'm going to push patchset v15 if no objections.
> >
> > Just saw that this went in - didn't catch up with the thread before,
> > unfortunately. At the very least I'd like to see some more work on cleaning up
> > the lazy tuple slot stuff. It's replete with unnecessary multiple-evaluation
> > hazards - I realize that there's some of those already, but I don't think we
> > should go further down that route. As far as I can tell there's no need for
> > any of this to be macros.
> 
> Thank you for taking a look at this, even post-commit.  Regarding
> macros, do you think inline functions would be good instead?

Yes.
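
For illustration, a minimal sketch of the inline-function direction (the
LazyTupleTableSlot fields and the exact semantics of LAZY_TTS_EVAL are assumed
here for the example, not copied from the committed patch):

typedef struct LazyTupleTableSlot
{
	TupleTableSlot *slot;		/* already-created slot, or NULL */
	TupleTableSlot *(*getSlot) (void *arg); /* callback that creates the slot */
	void	   *arg;			/* argument passed through to getSlot */
} LazyTupleTableSlot;

/*
 * Inline-function replacement for a LAZY_TTS_EVAL-style macro: the argument
 * is evaluated exactly once, so there is no multiple-evaluation hazard, and
 * the compiler can still inline the whole thing.
 */
static inline TupleTableSlot *
lazy_tts_eval(LazyTupleTableSlot *lazySlot)
{
	if (lazySlot == NULL)
		return NULL;
	if (lazySlot->slot == NULL)
		lazySlot->slot = lazySlot->getSlot(lazySlot->arg);
	return lazySlot->slot;
}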


> > I think it's entirely sensible to avoid the tuple fetching in ExecDelete(),
> > but it needs a bit less localized work. Instead of using the presence of a
> > tuple in the returning slot, ExecDelete() should track whether it already has
> > fetched the deleted tuple.
> >
> > Or alternatively, do the work to avoid refetching the tuple for the much more
> > common case of not needing EPQ at all.
> >
> > I guess this really is part of my issue with this change - it optimizes the
> > rare case, while not addressing the same inefficiency in the common case.
> 
> I'm going to fix this for ExecDelete().  Avoiding refetching the tuple
> in the more common case is something I'm definitely very interested in.
> But I would leave it for the future.

It doesn't seem like a good plan to start with the rare case and then address
the common one. The solution for the common case might solve the rare case as
well. One way to fix the common case would be to return a tuple suitable for
RETURNING computation as part of the input plan - which would
also fix the EPQ case, since we could just use the EPQ output. Of course there
are complications like triggers, but they seem like they could be dealt with.
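
To make that concrete on the ExecDelete() side, a rough sketch (the flag and
slot names here are hypothetical and the call sites are simplified; this is
not the committed code):

/*
 * Track explicitly whether the deleted tuple is already at hand, instead of
 * inferring that from whether the RETURNING slot happens to contain
 * something.
 */
bool		old_tuple_fetched = false;

/*
 * table_tuple_delete() has run; on TM_Updated the AM may already have locked
 * and returned the latest tuple version into evalSlot.
 */
if (result == TM_Updated && !TupIsNull(evalSlot))
	old_tuple_fetched = true;

/* later, when the old tuple is actually needed (EPQ, RETURNING, ...) */
if (!old_tuple_fetched &&
	!table_tuple_fetch_row_version(resultRelationDesc, tupleid,
								   SnapshotAny, oldSlot))
	elog(ERROR, "failed to fetch deleted tuple");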


> > > @@ -299,14 +305,46 @@ heapam_tuple_complete_speculative(Relation relation, TupleTableSlot *slot,
> > >  static TM_Result
> > >  heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
> > >                                       Snapshot snapshot, Snapshot crosscheck, bool wait,
> > > -                                     TM_FailureData *tmfd, bool changingPart)
> > > +                                     TM_FailureData *tmfd, bool changingPart,
> > > +                                     LazyTupleTableSlot *lockedSlot)
> > >  {
> > > +     TM_Result       result;
> > > +
> > >       /*
> > >        * Currently Deleting of index tuples are handled at vacuum, in case if
> > >        * the storage itself is cleaning the dead tuples by itself, it is the
> > >        * time to call the index tuple deletion also.
> > >        */
> > > -     return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> > > +     result = heap_delete(relation, tid, cid, crosscheck, wait,
> > > +                                              tmfd, changingPart);
> > > +
> > > +     /*
> > > +      * If the tuple has been concurrently updated, then get the lock on it.
> > > +      * (Do this if caller asked for that by providing a 'lockedSlot'.) With the
> > > +      * lock held retry of delete should succeed even if there are more
> > > +      * concurrent update attempts.
> > > +      */
> > > +     if (result == TM_Updated && lockedSlot)
> > > +     {
> > > +             TupleTableSlot *evalSlot;
> > > +
> > > +             Assert(wait);
> > > +
> > > +             evalSlot = LAZY_TTS_EVAL(lockedSlot);
> > > +             result = heapam_tuple_lock_internal(relation, tid, snapshot,
> > > +                                                 evalSlot, cid, LockTupleExclusive,
> > > +                                                 LockWaitBlock,
> > > +                                                 TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
> > > +                                                 tmfd, true);
> >
> > Oh, huh? As I mentioned before, the unconditional use of LockWaitBlock means
> > the wait parameter is ignored.
> >
> > I'm frankly getting annoyed here.
> 
> lockedSlot shouldn't be provided when wait == false.  The assertion
> above expresses this intention.  However, the code lacks a comment
> directly expressing this idea.

I don't think a comment here is going to fix things. You can't expect somebody
trying to use tableam to look into the guts of a heapam function to understand
the API constraints to this degree. And there are, afaict, no comments in tableam
that indicate any of this.

I also just don't see why this is a sensible constraint. Why shouldn't this
also work when wait == false?
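
If the restriction isn't actually needed, the obvious direction would be to
pass the caller's choice through rather than hard-coding LockWaitBlock -
roughly like below (whether LockWaitError is the right non-blocking policy for
this path is an assumption made for the example, not something settled here):

	result = heapam_tuple_lock_internal(relation, tid, snapshot,
										evalSlot, cid, LockTupleExclusive,
										wait ? LockWaitBlock : LockWaitError,
										TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
										tmfd, true);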


> > > +/*
> > > + * This routine does the work for heapam_tuple_lock(), but also supports an
> > > + * `updated` argument to re-use the work done by heapam_tuple_update() or
> > > + * heapam_tuple_delete() in figuring out that the tuple was concurrently updated.
> > > + */
> > > +static TM_Result
> > > +heapam_tuple_lock_internal(Relation relation, ItemPointer tid,
> > > +                                                Snapshot snapshot, TupleTableSlot *slot,
> > > +                                                CommandId cid, LockTupleMode mode,
> > > +                                                LockWaitPolicy wait_policy, uint8 flags,
> > > +                                                TM_FailureData *tmfd, bool updated)
> >
> > Why is the new parameter named 'updated'?
> 
> To indicate that we know that we're locking the updated tuple.
> Probably not descriptive enough.

Given it's used for deletions, I'd say so.


> > > -             ReleaseBuffer(buffer);
> > > +                     ReleaseBuffer(buffer);
> > > +             }
> > > +             else
> > > +             {
> > > +                     updated = false;
> > > +             }
> > >
> > >               if (!ItemPointerEquals(&tmfd->ctid, &tuple->t_self))
> > >               {
> >
> > Which means this is completely bogus now?
> >
> >         HeapTuple       tuple = &bslot->base.tupdata;
> >
> > In the first iteration this just points to the newly created slot, which
> > doesn't have a tuple stored in it. So the above checks some uninitialized
> > memory.
> 
> No, this is not so.  tuple->t_self is unconditionally assigned a few
> lines before.  So it can't be uninitialized memory given that *tid is
> initialized.

Ugh. This means you're basically leaving uninitialized state in the other
portions of the tuple/slot, without even documenting that. The
code was ugly starting out, but this certainly makes it worse.

There's also no comment explaining that tmfd is suddenly a load-bearing *input*
into heapam_tuple_lock_internal(), whereas previously it was purely an output
parameter - and is documented as such:
 * Output parameters:
 *    *slot: contains the target tuple
 *    *tmfd: filled in failure cases (see below)

This is an *awful* API.
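
At the very least the function's header comment would have to spell that out -
something like the following wording, which is purely hypothetical and only
illustrates the contract the code now silently relies on:

 * Input parameters (only on the update/delete retry path, i.e. updated == true):
 *    *tmfd: failure data left behind by heap_update()/heap_delete(),
 *           consulted instead of re-reading the already-known outdated version
 * Output parameters:
 *    *slot: contains the target tuple
 *    *tmfd: filled in failure cases (see below)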


> I seriously doubt this patch could pass the tests if that comparison
> used uninitialized memory.

IDK about that - it's hard to exercise this code in the regression tests, and
plenty of things are zero-initialized, which often makes things appear to work in
the first iteration.


> > Giving up at this point.
> >
> >
> > This doesn't seem ready to have been committed.
> 
> Yep, that could be better.  Given that the item pointer comparison
> doesn't really use uninitialized memory, it's probably not as bad as you
> thought at first glance.

The details of how it's bad may differ slightly, but looking at it a bit
longer I also found new problems. So I don't really think it's better than what
I thought it was.

Greetings,

Andres Freund


