
From Andres Freund
Subject ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set
Date
Msg-id 20190724232439.lpxzjw2jg3ukgcqn@alap3.anarazel.de
Responses Re: ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Hi,


Scenario is a very plain upsert:

CREATE TABLE upsert(key int primary key);
INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
INSERT 0 1
INSERT 0 1

postgres[8755][1]=# SELECT page_items.*
FROM generate_series(0, pg_relation_size('upsert'::regclass::text) / 8192 - 1) blkno,
     get_raw_page('upsert'::regclass::text, blkno::int4) AS raw_page,
     heap_page_items(raw_page) AS page_items;

┌────┬────────┬──────────┬────────┬──────────┬──────────┬──────────┬────────┬─────────────┬────────────┬────────┬────────┬────────┬────────────┐
│ lp │ lp_off │ lp_flags │ lp_len │  t_xmin  │  t_xmax  │ t_field3 │ t_ctid │ t_infomask2 │ t_infomask │ t_hoff │ t_bits │ t_oid  │   t_data   │
├────┼────────┼──────────┼────────┼──────────┼──────────┼──────────┼────────┼─────────────┼────────────┼────────┼────────┼────────┼────────────┤
│  1 │   8160 │        1 │     28 │ 19742591 │ 19742592 │        0 │ (0,2)  │       24577 │        256 │     24 │ (null) │ (null) │ \x01000000 │
│  2 │   8128 │        1 │     28 │ 19742592 │ 19742592 │        0 │ (0,2)  │       32769 │       8336 │     24 │ (null) │ (null) │ \x01000000 │
└────┴────────┴──────────┴────────┴──────────┴──────────┴──────────┴────────┴─────────────┴────────────┴────────┴────────┴────────┴────────────┘
(2 rows)

As you can see, the same xmax is set for both row versions, with the new
tuple's infomask being HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY | HEAP_UPDATED.
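
To spell out that infomask value, here's a standalone toy program (the
flag values are copied from src/include/access/htup_details.h):

    #include <stdio.h>
    #include <stdint.h>

    /* flag values as in src/include/access/htup_details.h */
    #define HEAP_XMAX_KEYSHR_LOCK  0x0010
    #define HEAP_XMAX_LOCK_ONLY    0x0080
    #define HEAP_UPDATED           0x2000

    int
    main(void)
    {
        /* t_infomask of the second tuple above; 8336 == 0x2090 */
        uint16_t infomask = 8336;

        printf("KEYSHR_LOCK: %d\n", (infomask & HEAP_XMAX_KEYSHR_LOCK) != 0);
        printf("LOCK_ONLY:   %d\n", (infomask & HEAP_XMAX_LOCK_ONLY) != 0);
        printf("UPDATED:     %d\n", (infomask & HEAP_UPDATED) != 0);
        /* 0x2090 == 0x2000 | 0x0080 | 0x0010 - exactly these three flags */
        return 0;
    }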


The reason that happens is that ExecOnConflictUpdate() needs to lock the
row to be able to reliably compute a new tuple version based on that
row. heap_update() then decides it needs to carry that xmax forward, as
it's a valid lock:


        /*
         * If the tuple we're updating is locked, we need to preserve the locking
         * info in the old tuple's Xmax.  Prepare a new Xmax value for this.
         */
        compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
                                  oldtup.t_data->t_infomask,
                                  oldtup.t_data->t_infomask2,
                                  xid, *lockmode, true,
                                  &xmax_old_tuple, &infomask_old_tuple,
                                  &infomask2_old_tuple);

        /*
         * And also prepare an Xmax value for the new copy of the tuple.  If there
         * was no xmax previously, or there was one but all lockers are now gone,
         * then use InvalidXid; otherwise, get the xmax from the old tuple.  (In
         * rare cases that might also be InvalidXid and yet not have the
         * HEAP_XMAX_INVALID bit set; that's fine.)
         */
        if ((oldtup.t_data->t_infomask & HEAP_XMAX_INVALID) ||
                HEAP_LOCKED_UPGRADED(oldtup.t_data->t_infomask) ||
                (checked_lockers && !locker_remains))
                xmax_new_tuple = InvalidTransactionId;
        else
                xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup.t_data);

but we really don't need to do any of that in this case - the only
locker is the current backend, after all.
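
A minimal sketch of the kind of short-circuit I'm thinking of
(hypothetical and untested, and only handling the plain single-locker
case, not multis), somewhere before heap_update() computes
xmax_new_tuple:

        /*
         * Hypothetical, untested: if the old tuple is merely locked, and the
         * sole locker is us, there's no lock that needs to be preserved.
         */
        if (HEAP_XMAX_IS_LOCKED_ONLY(oldtup.t_data->t_infomask) &&
            !(oldtup.t_data->t_infomask & HEAP_XMAX_IS_MULTI) &&
            TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmax(oldtup.t_data)))
            xmax_new_tuple = InvalidTransactionId;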

I think this isn't great, because it'll later cause unnecessary hint
bit writes (although ones somewhat likely to be combined with setting
XMIN_COMMITTED), and even extra work for freezing.

Based on a quick look, this wasn't the case before the finer-grained
tuple locking - which makes sense, as there were no cases where locks
would need to be carried forward.


It's worthwhile to note that this *nearly* already works, because of the
following codepath in compute_new_xmax_infomask():

        /*
         * If the lock to be acquired is for the same TransactionId as the
         * existing lock, there's an optimization possible: consider only the
         * strongest of both locks as the only one present, and restart.
         */
        if (xmax == add_to_xmax)
        {
            /*
             * Note that it's not possible for the original tuple to be updated:
             * we wouldn't be here because the tuple would have been invisible and
             * we wouldn't try to update it.  As a subtlety, this code can also
             * run when traversing an update chain to lock future versions of a
             * tuple.  But we wouldn't be here either, because the add_to_xmax
             * would be different from the original updater.
             */
            Assert(HEAP_XMAX_IS_LOCKED_ONLY(old_infomask));

            /* acquire the strongest of both */
            if (mode < old_mode)
                mode = old_mode;
            /* mustn't touch is_update */

            old_infomask |= HEAP_XMAX_INVALID;
            goto l5;
        }

which sets HEAP_XMAX_INVALID in old_infomask, which would then trigger
the code above into not carrying forward xmax to the new tuple - but
compute_new_xmax_infomask() operates on a copy of old_infomask.
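
For reference, the function's signature (modulo whitespace) -
old_infomask is a plain by-value parameter, so that |= is invisible to
the caller:

        static void
        compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
                                  uint16 old_infomask2, TransactionId add_to_xmax,
                                  LockTupleMode mode, bool is_update,
                                  TransactionId *result_xmax, uint16 *result_infomask,
                                  uint16 *result_infomask2)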


Note that this contradicts comments in heap_update() itself:

        /*
         * If we found a valid Xmax for the new tuple, then the infomask bits
         * to use on the new tuple depend on what was there on the old one.
         * Note that since we're doing an update, the only possibility is that
         * the lockers had FOR KEY SHARE lock.
         */

which seems to indicate that this behaviour wasn't foreseen.


I find the separation of concerns (and the variable naming) between the
computations happening in heap_update() itself and in
compute_new_xmax_infomask() fairly confusing and redundant.

I mean, compute_new_xmax_infomask() expands multis and builds a new one
with the updater added. Then it re-expands that via
GetMultiXactIdHintBits() to compute infomask bits for the old tuple.
Then it returns - just for heap_update() to re-re-expand the multi to
compute the infomask bits for the new tuple, again with
GetMultiXactIdHintBits()?  I know that there's a cache, but still.
That's some seriously repetitive work.
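
For concreteness, the consuming side in heap_update() looks
approximately like this (paraphrased from memory, not copied verbatim -
details may differ):

        if (TransactionIdIsValid(xmax_new_tuple))
        {
            if (infomask_old_tuple & HEAP_XMAX_IS_MULTI)
                GetMultiXactIdHintBits(xmax_new_tuple, &infomask_new_tuple,
                                       &infomask2_new_tuple);
            else
                infomask_new_tuple = HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY;
        }
        else
            infomask_new_tuple = HEAP_XMAX_INVALID;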

Oh, and the things GetMultiXactIdHintBits() returns imo aren't really
well described as "hint bits".

Greetings,

Andres Freund


