Thread: Re: pgsql: Fix a couple of bugs in MultiXactId freezing

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Noah Misch
On Sat, Nov 30, 2013 at 01:06:09AM +0000, Alvaro Herrera wrote:
> Fix a couple of bugs in MultiXactId freezing
>
> Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look
> into a multixact to check the members against cutoff_xid.

> !             /*
> !              * This is a multixact which is not marked LOCK_ONLY, but which
> !              * is newer than the cutoff_multi.  If the update_xid is below the
> !              * cutoff_xid point, then we can just freeze the Xmax in the
> !              * tuple, removing it altogether.  This seems simple, but there
> !              * are several underlying assumptions:
> !              *
> !              * 1. A tuple marked by an multixact containing a very old
> !              * committed update Xid would have been pruned away by vacuum; we
> !              * wouldn't be freezing this tuple at all.
> !              *
> !              * 2. There cannot possibly be any live locking members remaining
> !              * in the multixact.  This is because if they were alive, the
> !              * update's Xid would had been considered, via the lockers'
> !              * snapshot's Xmin, as part the cutoff_xid.

READ COMMITTED transactions can reset MyPgXact->xmin between commands,
defeating that assumption; see SnapshotResetXmin().  I have attached an
isolationtester spec demonstrating the problem.  The test spec additionally
covers a (probably-related) assertion failure, new in 9.3.2.

> !              *
> !              * 3. We don't create new MultiXacts via MultiXactIdExpand() that
> !              * include a very old aborted update Xid: in that function we only
> !              * include update Xids corresponding to transactions that are
> !              * committed or in-progress.
> !              */
> !             update_xid = HeapTupleGetUpdateXid(tuple);
> !             if (TransactionIdPrecedes(update_xid, cutoff_xid))
> !                 freeze_xmax = true;

That was the only concrete runtime problem I found during a study of the
newest heap_freeze_tuple() and heap_tuple_needs_freeze() code.  One thing that
leaves me unsure is the fact that vacuum_set_xid_limits() does no locking to
ensure a consistent result between GetOldestXmin() and GetOldestMultiXactId().
Transactions may start or end between those calls, making the
GetOldestMultiXactId() result represent a later set of transactions than the
GetOldestXmin() result.  I suspect that's fine.  New transactions have no
immediate effect on either cutoff, and transaction end can only increase a
cutoff.  Using a slightly-lower cutoff than the maximum safe cutoff is always
okay; consider vacuum_defer_cleanup_age.

Thanks,
nm

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Attachment

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Andres Freund
On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
> On Sat, Nov 30, 2013 at 01:06:09AM +0000, Alvaro Herrera wrote:
> > Fix a couple of bugs in MultiXactId freezing
> > 
> > Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look
> > into a multixact to check the members against cutoff_xid.
> 
> > !             /*
> > !              * This is a multixact which is not marked LOCK_ONLY, but which
> > !              * is newer than the cutoff_multi.  If the update_xid is below the
> > !              * cutoff_xid point, then we can just freeze the Xmax in the
> > !              * tuple, removing it altogether.  This seems simple, but there
> > !              * are several underlying assumptions:
> > !              *
> > !              * 1. A tuple marked by an multixact containing a very old
> > !              * committed update Xid would have been pruned away by vacuum; we
> > !              * wouldn't be freezing this tuple at all.
> > !              *
> > !              * 2. There cannot possibly be any live locking members remaining
> > !              * in the multixact.  This is because if they were alive, the
> > !              * update's Xid would had been considered, via the lockers'
> > !              * snapshot's Xmin, as part the cutoff_xid.
> 
> READ COMMITTED transactions can reset MyPgXact->xmin between commands,
> defeating that assumption; see SnapshotResetXmin().  I have attached an
> isolationtester spec demonstrating the problem.

Any idea how to cheat our way out of that one given the current way
heap_freeze_tuple() works (running on both primary and standby)? My only
idea was to MultiXactIdWait() if !InRecovery, but that's extremely grotty.
We can't even realistically create a new multixact with fewer members
with the current format of xl_heap_freeze.

> The test spec additionally
> covers a (probably-related) assertion failure, new in 9.3.2.

Too bad it's too late to do anything about it for 9.3.2. :(. At least the
last seems actually unrelated; I am not sure why it's 9.3.2
only. Alvaro, are you looking?

> That was the only concrete runtime problem I found during a study of the
> newest heap_freeze_tuple() and heap_tuple_needs_freeze() code.

I'd even be interested in fuzzy problems ;). If 9.3 hadn't been
released, the interactions between cutoff_xid/multi would have caused me
to say "back to the drawing board"... I'm not surprised if further
things are lurking there.

>  One thing that
> leaves me unsure is the fact that vacuum_set_xid_limits() does no locking to
> ensure a consistent result between GetOldestXmin() and GetOldestMultiXactId().
> Transactions may start or end between those calls, making the
> GetOldestMultiXactId() result represent a later set of transactions than the
> GetOldestXmin() result.  I suspect that's fine.  New transactions have no
> immediate effect on either cutoff, and transaction end can only increase a
> cutoff.  Using a slightly-lower cutoff than the maximum safe cutoff is always
> okay; consider vacuum_defer_cleanup_age.

Yes, that seems fine to me, with the same reasoning.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Noah Misch
On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote:
> On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
> > On Sat, Nov 30, 2013 at 01:06:09AM +0000, Alvaro Herrera wrote:
> > > Fix a couple of bugs in MultiXactId freezing
> > > 
> > > Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look
> > > into a multixact to check the members against cutoff_xid.
> > 
> > > !             /*
> > > !              * This is a multixact which is not marked LOCK_ONLY, but which
> > > !              * is newer than the cutoff_multi.  If the update_xid is below the
> > > !              * cutoff_xid point, then we can just freeze the Xmax in the
> > > !              * tuple, removing it altogether.  This seems simple, but there
> > > !              * are several underlying assumptions:
> > > !              *
> > > !              * 1. A tuple marked by an multixact containing a very old
> > > !              * committed update Xid would have been pruned away by vacuum; we
> > > !              * wouldn't be freezing this tuple at all.
> > > !              *
> > > !              * 2. There cannot possibly be any live locking members remaining
> > > !              * in the multixact.  This is because if they were alive, the
> > > !              * update's Xid would had been considered, via the lockers'
> > > !              * snapshot's Xmin, as part the cutoff_xid.
> > 
> > READ COMMITTED transactions can reset MyPgXact->xmin between commands,
> > defeating that assumption; see SnapshotResetXmin().  I have attached an
> > isolationtester spec demonstrating the problem.
> 
> Any idea how to cheat our way out of that one given the current way
> heap_freeze_tuple() works (running on both primary and standby)? My only
> idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty.
> We can't even realistically create a new multixact with fewer members
> with the current format of xl_heap_freeze.

Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple?  We'd then ensure all update XID
consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax consumers are
already expected to check HEAP_XMAX_INVALID first.  Seems doable, albeit yet
another injection of complexity.

> > The test spec additionally
> > covers a (probably-related) assertion failure, new in 9.3.2.
> 
> Too bad it's too late to do anthing about it for 9.3.2. :(. At least the
> last seems actually unrelated, I am not sure why it's 9.3.2
> only. Alvaro, are you looking?

(For clarity, the other problem demonstrated by the test spec is also a 9.3.2
regression.)

> > That was the only concrete runtime problem I found during a study of the
> > newest heap_freeze_tuple() and heap_tuple_needs_freeze() code.
> 
> I'd even be interested in fuzzy problems ;). If 9.3. wouldn't have been
> released the interactions between cutoff_xid/multi would have caused me
> to say "back to the drawing" board... I'm not suprised if further things
> are lurking there.

heap_freeze_tuple() of 9.2 had an XXX comment about the possibility of getting
spurious lock contention due to wraparound of the multixact space.  The
comment is gone, and that mechanism no longer poses a threat.  However, a
non-wrapped multixact containing wrapped locker XIDs (we don't freeze locker
XIDs, just updater XIDs) may cause similar spurious contention.

> +                /*
> +                 * The multixact has an update hidden within.  Get rid of it.
> +                 *
> +                 * If the update_xid is below the cutoff_xid, it necessarily
> +                 * must be an aborted transaction.  In a primary server, such
> +                 * an Xmax would have gotten marked invalid by
> +                 * HeapTupleSatisfiesVacuum, but in a replica that is not
> +                 * called before we are, so deal with it in the same way.
> +                 *
> +                 * If not below the cutoff_xid, then the tuple would have been
> +                 * pruned by vacuum, if the update committed long enough ago,
> +                 * and we wouldn't be freezing it; so it's either recently
> +                 * committed, or in-progress.  Deal with this by setting the
> +                 * Xmax to the update Xid directly and remove the IS_MULTI
> +                 * bit.  (We know there cannot be running lockers in this
> +                 * multi, because it's below the cutoff_multi value.)
> +                 */
> +
> +                if (TransactionIdPrecedes(update_xid, cutoff_xid))
> +                {
> +                    Assert(InRecovery || TransactionIdDidAbort(update_xid));
> +                    freeze_xmax = true;
> +                }
> +                else
> +                {
> +                    Assert(InRecovery || !TransactionIdIsInProgress(update_xid));

This assertion is at odds with the comment, but the assertion is okay for now.
If the updater is still in progress, its OldestMemberMXactId[] entry will have
held back cutoff_multi, and we won't be here.  Therefore, if we get here, the
tuple will always be HEAPTUPLE_RECENTLY_DEAD (recently-committed updater) or
HEAPTUPLE_LIVE (aborted updater, recent or not).

Numerous comments in the vicinity (e.g. ones at MultiXactStateData) reflect a
pre-9.3 world.  Most or all of that isn't new with the patch at hand, but it
does complicate study.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Tom Lane
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
>> The test spec additionally
>> covers a (probably-related) assertion failure, new in 9.3.2.

> Too bad it's too late to do anthing about it for 9.3.2. :(. At least the
> last seems actually unrelated, I am not sure why it's 9.3.2
> only. Alvaro, are you looking?

Is this bad enough that we need to re-wrap the release?
        regards, tom lane



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Andres Freund
On 2013-12-03 09:48:23 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
> >> The test spec additionally
> >> covers a (probably-related) assertion failure, new in 9.3.2.
> 
> > Too bad it's too late to do anthing about it for 9.3.2. :(. At least the
> > last seems actually unrelated, I am not sure why it's 9.3.2
> > only. Alvaro, are you looking?
> 
> Is this bad enough that we need to re-wrap the release?

Tentatively I'd say no; the only risk is losing locks AFAICS. That's
much better than corrupting rows as in 9.3.1. But I'll look into it in
a bit more detail as soon as I am off the phone call I am on.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Andres Freund
On 2013-12-03 09:16:18 -0500, Noah Misch wrote:
> On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote:
> > On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
> > > On Sat, Nov 30, 2013 at 01:06:09AM +0000, Alvaro Herrera wrote:
> > Any idea how to cheat our way out of that one given the current way
> > heap_freeze_tuple() works (running on both primary and standby)? My only
> > idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty.
> > We can't even realistically create a new multixact with fewer members
> > with the current format of xl_heap_freeze.
> 
> Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple?  We'd then ensure all update XID
> consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax consumers are
> already expected to check HEAP_XMAX_INVALID first.  Seems doable, albeit yet
> another injection of complexity.

I think it's pretty much checked that way already, but the problem seems
to be how to avoid checks on xid commit/abort in that case. I've
complained in 20131121200517.GM7240@alap2.anarazel.de that the old
pre-condition that multixacts aren't checked when they can't be relevant
(via OldestVisibleM*) isn't observed anymore.
So, if we re-introduce that condition again, we should be on the safe
side with that, right?

> > > The test spec additionally
> > > covers a (probably-related) assertion failure, new in 9.3.2.
> > 
> > Too bad it's too late to do anthing about it for 9.3.2. :(. At least the
> > last seems actually unrelated, I am not sure why it's 9.3.2
> > only. Alvaro, are you looking?
> 
> (For clarity, the other problem demonstrated by the test spec is also a 9.3.2
> regression.)

Yea, I just don't see why yet... Looking now.

> heap_freeze_tuple() of 9.2 had an XXX comment about the possibility of getting
> spurious lock contention due to wraparound of the multixact space.  The
> comment is gone, and that mechanism no longer poses a threat.  However, a
> non-wrapped multixact containing wrapped locker XIDs (we don't freeze locker
> XIDs, just updater XIDs) may cause similar spurious contention.

Yea, I noticed that that comment was missing as well. I think what we
should do now is to rework freezing in HEAD to make all this more
reasonable.

> Numerous comments in the vicinity (e.g. ones at MultiXactStateData) reflect a
> pre-9.3 world.  Most or all of that isn't new with the patch at hand, but it
> does complicate study.

Yea, Alvaro sent a patch for that somewhere, it seems a patch in the
series got lost when foreign key locks were originally applied.

I think we seriously need to put a good amount of work into the
multixact.c stuff in the next months. Otherwise it will be a maintenance
nightmare for a fair bit more time.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Andres Freund
On 2013-12-03 09:16:18 -0500, Noah Misch wrote:
> > > The test spec additionally
> > > covers a (probably-related) assertion failure, new in 9.3.2.
> > 
> > Too bad it's too late to do anthing about it for 9.3.2. :(. At least the
> > last seems actually unrelated, I am not sure why it's 9.3.2
> > only. Alvaro, are you looking?
> 
> (For clarity, the other problem demonstrated by the test spec is also a 9.3.2
> regression.)

The backtrace for the Assert() you found is:

#4  0x00000000004f1da5 in CreateMultiXactId (nmembers=2, members=0x1ce17d8)
    at /home/andres/src/postgresql/src/backend/access/transam/multixact.c:708
#5  0x00000000004f1aeb in MultiXactIdExpand (multi=6241831, xid=6019366, status=MultiXactStatusUpdate)
    at /home/andres/src/postgresql/src/backend/access/transam/multixact.c:462
#6  0x00000000004a5d8e in compute_new_xmax_infomask (xmax=6241831, old_infomask=4416, old_infomask2=16386,
    add_to_xmax=6019366, mode=LockTupleExclusive, is_update=1 '\001', result_xmax=0x7fffca02a700,
    result_infomask=0x7fffca02a6fe, result_infomask2=0x7fffca02a6fc)
    at /home/andres/src/postgresql/src/backend/access/heap/heapam.c:4651
#7  0x00000000004a2d27 in heap_update (relation=0x7f9fc45cc828, otid=0x7fffca02a8d0, newtup=0x1ce1740, cid=0,
    crosscheck=0x0, wait=1 '\001', hufd=0x7fffca02a850, lockmode=0x7fffca02a82c)
    at /home/andres/src/postgresql/src/backend/access/heap/heapam.c:3304
#8  0x0000000000646f04 in ExecUpdate (tupleid=0x7fffca02a8d0, oldtuple=0x0, slot=0x1ce12c0, planSlot=0x1ce0740,
    epqstate=0x1ce0120, estate=0x1cdfe98, canSetTag=1 '\001')
    at /home/andres/src/postgresql/src/backend/executor/nodeModifyTable.c:690

So imo it isn't really a new problem; it existed all along :/. We only
didn't hit it in your testcase before because, in some cases, we
spuriously thought that a tuple was in-progress if *any* member of the
old multi was still running, instead of just the updater. But I am
pretty sure it can also be reproduced in 9.3.1.

Imo the MultiXactIdSetOldestMember() call in heap_update() needs to be
moved outside of the if (satisfies_key). Everything else is vastly more
complex.
Alvaro, correct?

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Noah Misch
On Tue, Dec 03, 2013 at 04:08:23PM +0100, Andres Freund wrote:
> On 2013-12-03 09:16:18 -0500, Noah Misch wrote:
> > On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote:
> > > On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
> > > > On Sat, Nov 30, 2013 at 01:06:09AM +0000, Alvaro Herrera wrote:
> > > Any idea how to cheat our way out of that one given the current way
> > > heap_freeze_tuple() works (running on both primary and standby)? My only
> > > idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty.
> > > We can't even realistically create a new multixact with fewer members
> > > with the current format of xl_heap_freeze.
> > 
> > Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple?  We'd then ensure all update XID
> > consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax consumers are
> > already expected to check HEAP_XMAX_INVALID first.  Seems doable, albeit yet
> > another injection of complexity.
> 
> I think its pretty much checked that way already, but the problem seems
> to be how to avoid checks on xid commit/abort in that case. I've
> complained in 20131121200517.GM7240@alap2.anarazel.de that the old
> pre-condition that multixacts aren't checked when they can't be relevant
> (via OldestVisibleM*) isn't observed anymore.
> So, if we re-introduce that condition again, we should be on the safe
> side with that, right?

What specific commit/abort checks do you have in mind?

> > > > The test spec additionally
> > > > covers a (probably-related) assertion failure, new in 9.3.2.
> > > 
> > > Too bad it's too late to do anthing about it for 9.3.2. :(. At least the
> > > last seems actually unrelated, I am not sure why it's 9.3.2
> > > only. Alvaro, are you looking?
> > 
> > (For clarity, the other problem demonstrated by the test spec is also a 9.3.2
> > regression.)
> 
> Yea, I just don't see why yet... Looking now.

Sorry, my original report was rather terse.  I speak of the scenario exercised
by the second permutation in that isolationtester spec.  The multixact is
later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all.  9.3.2
does freeze it to InvalidTransactionId per the code I cited in my first
response on this thread, which wrongly removes a key lock.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Andres Freund
On 2013-12-03 10:29:54 -0500, Noah Misch wrote:
> Sorry, my original report was rather terse.  I speak of the scenario exercised
> by the second permutation in that isolationtester spec.  The multixact is
> later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all.  9.3.2
> does freeze it to InvalidTransactionId per the code I cited in my first
> response on this thread, which wrongly removes a key lock.

That one is clear, I was only confused about the Assert() you
reported. But I think I've explained that elsewhere.

I currently don't see a way to fix the erroneous freezing of lockers
(not the updater, though) without changing the WAL format or
synchronously waiting for all lockers to end. Both of which seem like
a no-go?

While it's still a major bug, it seems to still be much better than the
previous case of either inaccessible or reappearing rows.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Andres Freund
Hi,

On 2013-12-03 10:29:54 -0500, Noah Misch wrote:
> On Tue, Dec 03, 2013 at 04:08:23PM +0100, Andres Freund wrote:
> > On 2013-12-03 09:16:18 -0500, Noah Misch wrote:
> > > On Tue, Dec 03, 2013 at 11:56:07AM +0100, Andres Freund wrote:
> > > > On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
> > > > > On Sat, Nov 30, 2013 at 01:06:09AM +0000, Alvaro Herrera wrote:
> > > > Any idea how to cheat our way out of that one given the current way
> > > > heap_freeze_tuple() works (running on both primary and standby)? My only
> > > > idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty.
> > > > We can't even realistically create a new multixact with fewer members
> > > > with the current format of xl_heap_freeze.
> > > 
> > > Perhaps set HEAP_XMAX_LOCK_ONLY on the tuple?  We'd then ensure all update XID
> > > consumers check HEAP_XMAX_IS_LOCKED_ONLY() first, much like xmax consumers are
> > > already expected to check HEAP_XMAX_INVALID first.  Seems doable, albeit yet
> > > another injection of complexity.
> > 
> > I think its pretty much checked that way already, but the problem seems
> > to be how to avoid checks on xid commit/abort in that case. I've
> > complained in 20131121200517.GM7240@alap2.anarazel.de that the old
> > pre-condition that multixacts aren't checked when they can't be relevant
> > (via OldestVisibleM*) isn't observed anymore.
> > So, if we re-introduce that condition again, we should be on the safe
> > side with that, right?
> 
> What specific commit/abort checks do you have in mind?

MultiXactIdIsRunning() does a TransactionIdIsInProgress() for each
member which in turn does TransactionIdDidCommit(). Similar when locking
a tuple that's locked/updated without a multixact where we go for a
TransactionIdIsInProgress() in XactLockTableWait().

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Andres Freund
Hi Alvaro, Noah,


On 2013-12-03 15:57:10 +0100, Andres Freund wrote:
> On 2013-12-03 09:48:23 -0500, Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> > > On 2013-12-03 00:47:07 -0500, Noah Misch wrote:
> > >> The test spec additionally
> > >> covers a (probably-related) assertion failure, new in 9.3.2.
> > 
> > > Too bad it's too late to do anthing about it for 9.3.2. :(. At least the
> > > last seems actually unrelated, I am not sure why it's 9.3.2
> > > only. Alvaro, are you looking?
> > 
> > Is this bad enough that we need to re-wrap the release?
> 
> Tentatively I'd say no, the only risk is loosing locks afaics. Thats
> much bettter than corrupting rows as in 9.3.1. But I'll look into it in
> a bit more detail as soon as I am of the phone call I am on.

After looking, I think I am revising my opinion. The broken locking
behaviour (outside of freeze, which I don't see how we can fix in time)
is actually bad.
Would that stop us from making the release date with packages?

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Tom Lane
Andres Freund <andres@2ndquadrant.com> writes:
>> On 2013-12-03 09:48:23 -0500, Tom Lane wrote:
>>> Is this bad enough that we need to re-wrap the release?

> After looking, I think I am revising my opinion. The broken locking
> behaviour (outside of freeze, which I don't see how we can fix in time),
> is actually bad.
> Would that stop us from making the release date with packages?

That's hardly answerable when you haven't specified how long you
think it'd take to fix.

In general, though, I'm going to be exceedingly unhappy if this release
introduces new regressions.  If we have to put off the release to fix
something, maybe we'd better do so.  And we'd damn well better get it
right this time.
        regards, tom lane



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Andres Freund
On 2013-12-03 12:22:33 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> >> On 2013-12-03 09:48:23 -0500, Tom Lane wrote:
> >>> Is this bad enough that we need to re-wrap the release?
> 
> > After looking, I think I am revising my opinion. The broken locking
> > behaviour (outside of freeze, which I don't see how we can fix in time),
> > is actually bad.
> > Would that stop us from making the release date with packages?
> 
> That's hardly answerable when you haven't specified how long you
> think it'd take to fix.

There are two things that are broken as-is:

1) The freezing of multixacts: the new state is much better than the old
   one, because the old one corrupted data, while the new one sometimes
   removes locks when you explicitly FREEZE.

2) The broken locking behaviour in Noah's test without the FREEZE.

I don't see a realistic chance of fixing 1) in 9.3. Not even sure if it
can be done without changing the freeze WAL format.  But 2) should be
fixed, and basically is a one-liner + comments + test. Alvaro?

> In general, though, I'm going to be exceedingly unhappy if this release
> introduces new regressions.  If we have to put off the release to fix
> something, maybe we'd better do so.  And we'd damn well better get it
> right this time.

I think that's really hard for the multixacts stuff. There's lots of
stuff not really okay in there, but we can't do much about that now :(

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Tom Lane
Andres Freund <andres@2ndquadrant.com> writes:
> Any idea how to cheat our way out of that one given the current way
> heap_freeze_tuple() works (running on both primary and standby)? My only
> idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty.
> We can't even realistically create a new multixact with fewer members
> with the current format of xl_heap_freeze.

Maybe we should just bite the bullet and change the WAL format for
heap_freeze (inventing an all-new record type, not repurposing the old
one, and allowing WAL replay to continue to accept the old one).  The
implication for users would be that they'd have to update slave servers
before the master when installing the update; which is unpleasant, but
better than living with a known data corruption case.
        regards, tom lane



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Magnus Hagander
On Tue, Dec 3, 2013 at 7:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> Any idea how to cheat our way out of that one given the current way
>> heap_freeze_tuple() works (running on both primary and standby)? My only
>> idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty.
>> We can't even realistically create a new multixact with fewer members
>> with the current format of xl_heap_freeze.
>
> Maybe we should just bite the bullet and change the WAL format for
> heap_freeze (inventing an all-new record type, not repurposing the old
> one, and allowing WAL replay to continue to accept the old one).  The
> implication for users would be that they'd have to update slave servers
> before the master when installing the update; which is unpleasant, but
> better than living with a known data corruption case.

Agreed. It may suck, but it sucks less.

How badly will it break if they do the upgrade in the wrong order,
though? Will the slaves just stop (I assume this?), or is there a risk
of a wrong-order upgrade causing extra breakage? And if they do shut
down, would just upgrading the slave fix it, or would they then have to
rebuild the slave? (Actually, don't we recommend they always rebuild
the slave *anyway*? In which case the problem is even smaller.)

I think we've always told people to upgrade the slave first, and it's the logical thing that AFAIK most other systems require as well, so that's not an unreasonable requirement at all.

I assume we'd then get rid of the old record type completely in 9.4, right? 


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Noah Misch
On Tue, Dec 03, 2013 at 04:37:58PM +0100, Andres Freund wrote:
> On 2013-12-03 10:29:54 -0500, Noah Misch wrote:
> > Sorry, my original report was rather terse.  I speak of the scenario exercised
> > by the second permutation in that isolationtester spec.  The multixact is
> > later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all.  9.3.2
> > does freeze it to InvalidTransactionId per the code I cited in my first
> > response on this thread, which wrongly removes a key lock.
>
> That one is clear, I was only confused about the Assert() you
> reported. But I think I've explained that elsewhere.
>
> I currently don't see fixing the errorneous freezing of lockers (not the
> updater though) without changing the wal format or synchronously waiting
> for all lockers to end. Which both see like a no-go?

Not fixing it at all is the real no-go.  We'd take both of those undesirables
before just tolerating the lost locks in 9.3.

The attached patch illustrates the approach I was describing earlier.  It
fixes the test case discussed above.  I haven't verified that everything else
in the system is ready for it, so this is just for illustration purposes.

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Attachment

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From: Alvaro Herrera
Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Any idea how to cheat our way out of that one given the current way
> > heap_freeze_tuple() works (running on both primary and standby)? My only
> > idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty.
> > We can't even realistically create a new multixact with fewer members
> > with the current format of xl_heap_freeze.
> 
> Maybe we should just bite the bullet and change the WAL format for
> heap_freeze (inventing an all-new record type, not repurposing the old
> one, and allowing WAL replay to continue to accept the old one).  The
> implication for users would be that they'd have to update slave servers
> before the master when installing the update; which is unpleasant, but
> better than living with a known data corruption case.

That was my suggestion too (modulo, I admit, the bit about it being a
new, separate record type.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Dec 3, 2013 at 7:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Maybe we should just bite the bullet and change the WAL format for
>> heap_freeze (inventing an all-new record type, not repurposing the old
>> one, and allowing WAL replay to continue to accept the old one).  The
>> implication for users would be that they'd have to update slave servers
>> before the master when installing the update; which is unpleasant, but
>> better than living with a known data corruption case.

> Agreed. It may suck, but it sucks less.

> How badly will it break if they do the upgrade in the wrong order though.
> Will the slaves just stop (I assume this?) or is there a risk of a
> wrong-order upgrade causing extra breakage?

I assume what would happen is the slave would PANIC upon seeing a WAL
record code it didn't recognize.  Installing the updated version should
allow it to resume functioning.  Would be good to test this, but if it
doesn't work like that, that'd be another bug to fix IMO.  We've always
foreseen the possible need to do something like this, so it ought to
work reasonably cleanly.

> I assume we'd then get rid of the old record type completely in 9.4, right?

Yeah, we should be able to drop it in 9.4, since we'll surely have other
WAL format changes anyway.
        regards, tom lane



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Magnus Hagander
Date:

On Tue, Dec 3, 2013 at 7:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Dec 3, 2013 at 7:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Maybe we should just bite the bullet and change the WAL format for
>> heap_freeze (inventing an all-new record type, not repurposing the old
>> one, and allowing WAL replay to continue to accept the old one).  The
>> implication for users would be that they'd have to update slave servers
>> before the master when installing the update; which is unpleasant, but
>> better than living with a known data corruption case.

> Agreed. It may suck, but it sucks less.

> How badly will it break if they do the upgrade in the wrong order though.
> Will the slaves just stop (I assume this?) or is there a risk of a
> wrong-order upgrade causing extra breakage?

I assume what would happen is the slave would PANIC upon seeing a WAL
record code it didn't recognize.  Installing the updated version should
allow it to resume functioning.  Would be good to test this, but if it
doesn't work like that, that'd be another bug to fix IMO.  We've always
foreseen the possible need to do something like this, so it ought to
work reasonably cleanly.

Yeah, as long as that's tested and actually works, that sounds like an acceptable thing to deal with.
 
> I assume we'd then get rid of the old record type completely in 9.4, right?

Yeah, we should be able to drop it in 9.4, since we'll surely have other
WAL format changes anyway.

And even if not, there's no point in keeping it unless we actually support replication from 9.3 -> 9.4, I think, and I don't believe anybody has even considered working on that yet :) 

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Andres Freund
Date:
On 2013-12-03 13:14:38 -0500, Noah Misch wrote:
> On Tue, Dec 03, 2013 at 04:37:58PM +0100, Andres Freund wrote:
> > On 2013-12-03 10:29:54 -0500, Noah Misch wrote:
> > > Sorry, my original report was rather terse.  I speak of the scenario exercised
> > > by the second permutation in that isolationtester spec.  The multixact is
> > > later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all.  9.3.2
> > > does freeze it to InvalidTransactionId per the code I cited in my first
> > > response on this thread, which wrongly removes a key lock.
> > 
> > That one is clear, I was only confused about the Assert() you
> > reported. But I think I've explained that elsewhere.
> > 
> > I currently don't see fixing the errorneous freezing of lockers (not the
> > updater though) without changing the wal format or synchronously waiting
> > for all lockers to end. Which both see like a no-go?
> 
> Not fixing it at all is the real no-go.  We'd take both of those undesirables
> before just tolerating the lost locks in 9.3.

I think it's changing the wal format then.

> The attached patch illustrates the approach I was describing earlier.  It
> fixes the test case discussed above.  I haven't verified that everything else
> in the system is ready for it, so this is just for illustration purposes.

That might be better than the current state, because the potential damage
from such non-frozen locks would be limited to "could not access status of
transaction ..." errors (*).
But the problem I see with it is that a FOR UPDATE/NO KEY UPDATE lock
taken out by an UPDATE is different from the respective lock taken out by
SELECT FOR SHARE:
typedef enum
{
    MultiXactStatusForKeyShare = 0x00,
    MultiXactStatusForShare = 0x01,
    MultiXactStatusForNoKeyUpdate = 0x02,
    MultiXactStatusForUpdate = 0x03,
    /* an update that doesn't touch "key" columns */
    MultiXactStatusNoKeyUpdate = 0x04,
    /* other updates, and delete */
    MultiXactStatusUpdate = 0x05
} MultiXactStatus;

Ignoring the difference that way isn't going to fly nicely.

*: which already are possible because we do not check multis properly
against OldestVisibleMXactId[] anymore.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Noah Misch wrote:

> > On 2013-12-03 10:29:54 -0500, Noah Misch wrote:
> > > Sorry, my original report was rather terse.  I speak of the scenario exercised
> > > by the second permutation in that isolationtester spec.  The multixact is
> > > later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all.  9.3.2
> > > does freeze it to InvalidTransactionId per the code I cited in my first
> > > response on this thread, which wrongly removes a key lock.
> > 
> > That one is clear, I was only confused about the Assert() you
> > reported. But I think I've explained that elsewhere.
> > 
> > I currently don't see fixing the errorneous freezing of lockers (not the
> > updater though) without changing the wal format or synchronously waiting
> > for all lockers to end. Which both see like a no-go?
> 
> Not fixing it at all is the real no-go.  We'd take both of those undesirables
> before just tolerating the lost locks in 9.3.

Well, unless I misunderstand, this is only a problem in the case that
cutoff_multi is not yet past but cutoff_xid is; and that there are
locker transactions still running.  So it's really a corner case.  Not
saying it's impossible to hit, mind.

> The attached patch illustrates the approach I was describing earlier.  It
> fixes the test case discussed above.  I haven't verified that everything else
> in the system is ready for it, so this is just for illustration purposes.

Wow, this is scary.  I don't oppose it in principle, but we'd better go
over the whole thing once more to ensure every place that cares is
prepared to deal with this.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Andres Freund
Date:
On 2013-12-03 13:11:13 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Any idea how to cheat our way out of that one given the current way
> > heap_freeze_tuple() works (running on both primary and standby)? My only
> > idea was to MultiXactIdWait() if !InRecovery but that's extremly grotty.
> > We can't even realistically create a new multixact with fewer members
> > with the current format of xl_heap_freeze.
> 
> Maybe we should just bite the bullet and change the WAL format for
> heap_freeze (inventing an all-new record type, not repurposing the old
> one, and allowing WAL replay to continue to accept the old one).  The
> implication for users would be that they'd have to update slave servers
> before the master when installing the update; which is unpleasant, but
> better than living with a known data corruption case.

I wondered about that myself. How would you suggest the format to look
like?
ISTM per tuple we'd need:

* OffsetNumber off
* uint16 infomask
* TransactionId xmin
* TransactionId xmax

I don't see why we'd need infomask2, but so far being overly skimpy in
that place hasn't shown itself to be the greatest idea?
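For illustration, the four per-tuple fields listed above might pack into a record entry roughly like this. This is only a sketch with stand-in typedefs; the struct name is invented here and is not the record that was ultimately committed:

```c
#include <stdint.h>

/* Stand-ins for PostgreSQL's typedefs (OffsetNumber is a 16-bit line
 * pointer index, TransactionId a 32-bit XID), included so this sketch
 * is self-contained. */
typedef uint16_t OffsetNumber;
typedef uint32_t TransactionId;

/* One per-tuple entry for a hypothetical new-format freeze record,
 * holding exactly the four fields proposed above.  The real record
 * would be defined in heapam_xlog.h. */
typedef struct
{
    TransactionId xmin;     /* replacement xmin */
    TransactionId xmax;     /* replacement xmax */
    OffsetNumber off;       /* tuple's line pointer on the page */
    uint16_t infomask;      /* complete infomask to install */
} xl_heap_freeze_tuple_sketch;
```

With natural alignment this packs into 12 bytes per tuple, so a page's worth of freeze work stays compact even without transporting infomask2.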

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Andres Freund wrote:

> I wondered about that myself. How would you suggest the format to look
> like?
> ISTM per tuple we'd need:
> 
> * OffsetNumber off
> * uint16 infomask
> * TransactionId xmin
> * TransactionId xmax
> 
> I don't see why we'd need infomask2, but so far being overly skimpy in
> that place hasn't shown itself to be the greatest idea?

I don't see that we need the xmin; a simple bit flag indicating whether
the Xmin was frozen should be enough.

For xmax we need more detail, as you propose.  In infomask, are you
proposing to store the complete infomask, or just the flags being
changed?  Note we have a set of bits used in other wal records,
XLHL_XMAX_IS_MULTI and friends, which perhaps we can use here.
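The slimmer entry being suggested here, with the xmin replaced by a single "frozen" bit and XLHL_*-style bits describing the xmax, could look something like the following. All names and bit values are invented for illustration; an actual patch would presumably reuse the existing XLHL_* constants:

```c
#include <stdint.h>

/* Invented flag bits for a per-tuple freeze entry: record only "xmin
 * was frozen" as a bit rather than carrying a full xmin. */
#define SKETCH_FRZ_XMIN_FROZEN    0x01  /* set xmin to FrozenTransactionId */
#define SKETCH_FRZ_XMAX_IS_MULTI  0x02  /* new xmax is a MultiXactId */
#define SKETCH_FRZ_XMAX_LOCK_ONLY 0x04  /* xmax is locking-only */

typedef uint16_t OffsetNumber;
typedef uint32_t TransactionId;

typedef struct
{
    TransactionId xmax;     /* replacement xmax (possibly a multi) */
    OffsetNumber off;       /* tuple's line pointer on the page */
    uint8_t flags;          /* SKETCH_FRZ_* bits above */
} sketch_freeze_entry;
```

The trade-off versus carrying the complete infomask is that any bit not representable in the flags (XMIN/XMAX_COMMITTED, say) has to be reconstructed at replay time.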

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Noah Misch
Date:
On Tue, Dec 03, 2013 at 07:26:38PM +0100, Andres Freund wrote:
> On 2013-12-03 13:14:38 -0500, Noah Misch wrote:
> > On Tue, Dec 03, 2013 at 04:37:58PM +0100, Andres Freund wrote:
> > > I currently don't see fixing the errorneous freezing of lockers (not the
> > > updater though) without changing the wal format or synchronously waiting
> > > for all lockers to end. Which both see like a no-go?
> > 
> > Not fixing it at all is the real no-go.  We'd take both of those undesirables
> > before just tolerating the lost locks in 9.3.
> 
> I think it's changing the wal format then.

I'd rather have a readily-verifiable fix that changes WAL format than a
tricky fix that avoids doing so.  So, modulo not having seen the change, +1.

> > The attached patch illustrates the approach I was describing earlier.  It
> > fixes the test case discussed above.  I haven't verified that everything else
> > in the system is ready for it, so this is just for illustration purposes.
> 
> That might be better than the current state because the potential damage
> from such not frozen locks would be to get "could not access status of
> transaction ..."  errors (*).

> *: which already are possible because we do not check multis properly
> against OldestVisibleMXactId[] anymore.

Separate issue.  That patch adds to the ways we can end up with an extant
multixact referencing a locker XID no longer found in CLOG, but it doesn't
introduce that problem.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Andres Freund
Date:
On 2013-12-03 13:49:49 -0500, Noah Misch wrote:
> On Tue, Dec 03, 2013 at 07:26:38PM +0100, Andres Freund wrote:
> > On 2013-12-03 13:14:38 -0500, Noah Misch wrote:
> > > Not fixing it at all is the real no-go.  We'd take both of those undesirables
> > > before just tolerating the lost locks in 9.3.
> > 
> > I think it's changing the wal format then.
> 
> I'd rather have an readily-verifiable fix that changes WAL format than a
> tricky fix that avoids doing so.  So, modulo not having seen the change, +1.

Well, who's going to write that then? I can write something up, but I
really would not like to be solely responsible for it.

That means we cannot release 9.3 on schedule anyway, right?

> > > The attached patch illustrates the approach I was describing earlier.  It
> > > fixes the test case discussed above.  I haven't verified that everything else
> > > in the system is ready for it, so this is just for illustration purposes.
> > >
> > That might be better than the current state because the potential damage
> > from such not frozen locks would be to get "could not access status of
> > transaction ..."  errors (*).
> >
> > *: which already are possible because we do not check multis properly
> > against OldestVisibleMXactId[] anymore.
>
> Separate issue.  That patch adds to the ways we can end up with an extant
> multixact referencing an locker XID no longer found it CLOG, but it doesn't
> introduce that problem.

Sure, that was an argument in favor of your idea, not against it ;).

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Andres Freund
Date:
On 2013-12-03 15:40:44 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> 
> > I wondered about that myself. How would you suggest the format to look
> > like?
> > ISTM per tuple we'd need:
> > 
> > * OffsetNumber off
> > * uint16 infomask
> > * TransactionId xmin
> > * TransactionId xmax
> > 
> > I don't see why we'd need infomask2, but so far being overly skimpy in
> > that place hasn't shown itself to be the greatest idea?

> I don't see that we need the xmin; a simple bit flag indicating whether
> the Xmin was frozen should be enough.

Yea, that would work as well.

> For xmax we need more detail, as you propose.  In infomask, are you
> proposing to store the complete infomask, or just the flags being
> changed?  Note we have a set of bits used in other wal records,
> XLHL_XMAX_IS_MULTI and friends, which perhaps we can use here.

Tentatively the complete one. I don't think we'd win enough by using
compute_infobits/fix_infomask_from_infobits and we'd need to extend the
bits stored in there unless we are willing to live with not transporting
XMIN/XMAX_COMMITTED which doesn't seem like a good idea.

Btw, why is it currently ok to modify the tuple in heap_freeze_tuple()
without being in a critical section?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2013-12-03 13:49:49 -0500, Noah Misch wrote:
> > On Tue, Dec 03, 2013 at 07:26:38PM +0100, Andres Freund wrote:
> > > On 2013-12-03 13:14:38 -0500, Noah Misch wrote:
> > > > Not fixing it at all is the real no-go.  We'd take both of those undesirables
> > > > before just tolerating the lost locks in 9.3.
> > > 
> > > I think it's changing the wal format then.
> > 
> > I'd rather have an readily-verifiable fix that changes WAL format than a
> > tricky fix that avoids doing so.  So, modulo not having seen the change, +1.
> 
> Well, who's going to write that then? I can write something up, but I
> really would not like not to be solely responsible for it.

I will give this a shot.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Tue, Dec 03, 2013 at 07:26:38PM +0100, Andres Freund wrote:
>> On 2013-12-03 13:14:38 -0500, Noah Misch wrote:
>>> On Tue, Dec 03, 2013 at 04:37:58PM +0100, Andres Freund wrote:
>>>> I currently don't see fixing the errorneous freezing of lockers (not the
>>>> updater though) without changing the wal format or synchronously waiting
>>>> for all lockers to end. Which both see like a no-go?
> 
>>> Not fixing it at all is the real no-go.  We'd take both of those undesirables
>>> before just tolerating the lost locks in 9.3.

>> I think it's changing the wal format then.

> I'd rather have an readily-verifiable fix that changes WAL format than a
> tricky fix that avoids doing so.  So, modulo not having seen the change, +1.

Yeah, same here.

After some discussion, the core committee has concluded that we should go
ahead with the already-wrapped releases.  9.2.6 and below are good anyway,
and despite this issue 9.3.2 is an improvement over 9.3.1.  We'll plan to
do a 9.3.3 as soon as the multixact situation can be straightened out;
but let's learn from experience and not try to fix it in a panic.
        regards, tom lane



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Tom Lane wrote:

> After some discussion, the core committee has concluded that we should go
> ahead with the already-wrapped releases.  9.2.6 and below are good anyway,
> and despite this issue 9.3.2 is an improvement over 9.3.1.  We'll plan to
> do a 9.3.3 as soon as the multixact situation can be straightened out;
> but let's learn from experience and not try to fix it in a panic.

I would suggest we include this one fix in 9.3.2a.  This bug is more
serious than the others because it is triggered merely by checking HTSU on a
tuple containing running locker-only transactions and an aborted update.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> After some discussion, the core committee has concluded that we should go
>> ahead with the already-wrapped releases.  9.2.6 and below are good anyway,
>> and despite this issue 9.3.2 is an improvement over 9.3.1.  We'll plan to
>> do a 9.3.3 as soon as the multixact situation can be straightened out;
>> but let's learn from experience and not try to fix it in a panic.

> I would suggest we include this one fix in 9.3.2a.  This bug is more
> serious than the others because it happens because of checking HTSU on a
> tuple containing running locker-only transactions and an aborted update.

The effect is just that the lockers could lose their locks early, right?
While that's annoying, it's not *directly* a data corruption problem.
And I've lost any enthusiasm I might've had for quick fixes in this area.
I think it'd be better to wait a few days, think this over, and get the
other problem fixed as well.

In any case, I think we're already past the point where we could re-wrap
9.3.2; the tarballs have been in the hands of packagers for > 24 hours.
We'd have to call it 9.3.3.
        regards, tom lane



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Noah Misch wrote:
> On Tue, Dec 03, 2013 at 04:08:23PM +0100, Andres Freund wrote:

> > > (For clarity, the other problem demonstrated by the test spec is also a 9.3.2
> > > regression.)
> > 
> > Yea, I just don't see why yet... Looking now.
> 
> Sorry, my original report was rather terse.  I speak of the scenario exercised
> by the second permutation in that isolationtester spec.  The multixact is
> later than VACUUM's cutoff_multi, so 9.3.1 does not freeze it at all.  9.3.2
> does freeze it to InvalidTransactionId per the code I cited in my first
> response on this thread, which wrongly removes a key lock.

Attached is a patch to fix it.  It's a simple fix, really, but it
reverts the delete-abort-savept test changes we did in 1ce150b7bb.
(This is a more complete version of a patch I posted elsewhere in this
thread as a reply to Tom.)

I added a new isolation spec to test this specific case, and noticed
something that seems curious to me when that test is run in REPEATABLE
READ mode: when the UPDATE is aborted, the concurrent FOR UPDATE gets a
"can't serialize due to concurrent update", but when the UPDATE is
committed, FOR UPDATE works fine.  Shouldn't it happen pretty much
exactly the other way around?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Andres Freund
Date:
On 2013-12-03 15:46:09 -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > I'd rather have an readily-verifiable fix that changes WAL format than a
> > tricky fix that avoids doing so.  So, modulo not having seen the change, +1.
>
> Yeah, same here.

I am afraid it won't be *that* simple. We still need code to look into
multis, check whether all members are ok wrt. cutoff_xid and replace
them, either by the contained xid, or by a new multi with the still
living members. Ugly.

There's currently also the issue that heap_freeze_tuple() modifies the
tuple in place without a critical section. We're executing a
HeapTupleSatisfiesVacuum() before we get to WAL logging things, and that
has plenty of rope to hang itself on. So that doesn't really seem ok to me?

Attached is a pre-pre alpha patch for this. To fix the issue with the
missing critical section it splits freezing into
heap_prepare_freeze_tuple() and heap_execute_freeze_tuple(). The former
doesn't touch tuples and is executed on the primary, and the second
actually performs the modifications and is executed both during normal
processing and recovery.

Needs a fair bit of work:
* Should move parts of the multixact processing into multixact.c,
  specifically it shouldn't require CreateMultiXactId() to be exported.
* it relies on forward-declaring a struct in heapam.h that's actually
  defined in heapam_xlog.h - that's pretty ugly.
* any form of testing but make check/isolationcheck across SR.
* lots of the comments around need to be added/reworked
* has a simpler version of Alvaro's patch to HTSV in there
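The prepare/execute split can be modeled in miniature like so. This is a toy with stand-in types and deliberately simplified rules; the real functions additionally handle multixacts, infomask bits, and wraparound-aware XID comparisons, and the execute step runs inside a critical section and again at WAL replay:

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint32_t TransactionId;
#define FrozenTransactionId   ((TransactionId) 2)
#define InvalidTransactionId  ((TransactionId) 0)

/* Toy tuple header: only the fields the freeze plan touches. */
typedef struct
{
    TransactionId xmin;
    TransactionId xmax;
    uint16_t infomask;
} toy_tuple;

/* The "plan": everything the execute step needs, so the identical
 * mutation can be applied on the primary and replayed from WAL on a
 * standby. */
typedef struct
{
    TransactionId xmax;
    uint16_t infomask;
    bool freeze_xmin;
} toy_freeze_plan;

/* Decision phase: inspects the tuple but never modifies it, so it is
 * safe to run outside a critical section.  Returns true iff any change
 * is needed.  (Toy rule: plain "<" instead of wraparound-aware
 * comparisons, and no multixact handling.) */
static bool
toy_prepare_freeze(const toy_tuple *tup, TransactionId cutoff_xid,
                   toy_freeze_plan *frz)
{
    bool changed = false;

    frz->freeze_xmin = false;
    frz->xmax = tup->xmax;
    frz->infomask = tup->infomask;

    if (tup->xmin != FrozenTransactionId && tup->xmin < cutoff_xid)
    {
        frz->freeze_xmin = true;
        changed = true;
    }
    if (tup->xmax != InvalidTransactionId && tup->xmax < cutoff_xid)
    {
        frz->xmax = InvalidTransactionId;
        changed = true;
    }
    return changed;
}

/* Execution phase: applies the plan verbatim; run inside the critical
 * section on the primary, and again during recovery. */
static void
toy_execute_freeze(toy_tuple *tup, const toy_freeze_plan *frz)
{
    if (frz->freeze_xmin)
        tup->xmin = FrozenTransactionId;
    tup->xmax = frz->xmax;
    tup->infomask = frz->infomask;
}
```

The point of the split is that the decision (which may call visibility code) happens before the critical section, while the mutation itself is deterministic given the plan and hence safe to WAL-log and replay.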

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> Attached is a patch to fix it.


--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Andres Freund
Date:
On 2013-12-03 19:55:40 -0300, Alvaro Herrera wrote:
> I added a new isolation spec to test this specific case, and noticed
> something that seems curious to me when that test is run in REPEATABLE
> READ mode: when the UPDATE is aborted, the concurrent FOR UPDATE gets a
> "can't serialize due to concurrent update", but when the UPDATE is
> committed, FOR UPDATE works fine.  Shouldn't it happen pretty much
> exactly the other way around?

That's 247c76a989097f1b4ab6fae898f24e75aa27fc1b . Specifically the
DidCommit() branch in test_lockmode_for_conflict(). You forgot something
akin to

    /* locker has finished, all good to go */
    if (!ISUPDATE_from_mxstatus(status))
        return HeapTupleMayBeUpdated;

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Magnus Hagander
Date:

On Tue, Dec 3, 2013 at 7:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Dec 3, 2013 at 7:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Maybe we should just bite the bullet and change the WAL format for
>> heap_freeze (inventing an all-new record type, not repurposing the old
>> one, and allowing WAL replay to continue to accept the old one).  The
>> implication for users would be that they'd have to update slave servers
>> before the master when installing the update; which is unpleasant, but
>> better than living with a known data corruption case.

> Agreed. It may suck, but it sucks less.

> How badly will it break if they do the upgrade in the wrong order though.
> Will the slaves just stop (I assume this?) or is there a risk of a
> wrong-order upgrade causing extra breakage?

I assume what would happen is the slave would PANIC upon seeing a WAL
record code it didn't recognize.  Installing the updated version should
allow it to resume functioning.  Would be good to test this, but if it
doesn't work like that, that'd be another bug to fix IMO.  We've always
foreseen the possible need to do something like this, so it ought to
work reasonably cleanly.


I wonder if we should, for the future, have the START_REPLICATION command (or IDENTIFY_SYSTEM, which would probably make more sense - or even a new command like IDENTIFY_CLIENT; the point is, something in the replication protocol) have walreceiver send its version to the master. That way the walsender could identify a walreceiver that's too old and disconnect it right away - with a much nicer error message than a PANIC. Right now, walreceiver knows the version of the walsender (through pqserverversion), but AFAICT there is no way for the walsender to know which version of the receiver is connected.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Dec 3, 2013 at 7:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I assume what would happen is the slave would PANIC upon seeing a WAL
>> record code it didn't recognize.

> I wonder if we should for the future have the START_REPLICATION command (or
> the IDENTIFY_SYSTEM would probably make more sense - or even adding a new
> command like IDENTIFY_CLIENT. The point is, something in the replication
> protocol) have walreceiver include it's version sent to the master. That
> way we could have the walsender identify a walreceiver that's too old and
> disconnect it right away - with a much  nicer error message than a PANIC.

Meh.  That only helps for the case of streaming replication, and not for
the thirty-seven other ways that some WAL might arrive at something that
wants to replay it.

It might be worth doing anyway, but I can't get excited about it for this
scenario.
        regards, tom lane



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Magnus Hagander
Date:
On Wed, Dec 4, 2013 at 8:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Dec 3, 2013 at 7:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I assume what would happen is the slave would PANIC upon seeing a WAL
>> record code it didn't recognize.

> I wonder if we should for the future have the START_REPLICATION command (or
> the IDENTIFY_SYSTEM would probably make more sense - or even adding a new
> command like IDENTIFY_CLIENT. The point is, something in the replication
> protocol) have walreceiver include it's version sent to the master. That
> way we could have the walsender identify a walreceiver that's too old and
> disconnect it right away - with a much  nicer error message than a PANIC.

Meh.  That only helps for the case of streaming replication, and not for
the thirty-seven other ways that some WAL might arrive at something that
wants to replay it.

It might be worth doing anyway, but I can't get excited about it for this
scenario.

It does, but I bet it's by far one of the most common cases. I'd say it's that one and restore-from-backup that would cover a huge majority of all cases. If we can cover those, we don't have to be perfect - so unless it turns out to be ridiculously complicated, I think it would be worth having.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2013-12-03 19:55:40 -0300, Alvaro Herrera wrote:
> > I added a new isolation spec to test this specific case, and noticed
> > something that seems curious to me when that test is run in REPEATABLE
> > READ mode: when the UPDATE is aborted, the concurrent FOR UPDATE gets a
> > "can't serialize due to concurrent update", but when the UPDATE is
> > committed, FOR UPDATE works fine.  Shouldn't it happen pretty much
> > exactly the other way around?
>
> That's 247c76a989097f1b4ab6fae898f24e75aa27fc1b . Specifically the
> DidCommit() branch in test_lockmode_for_conflict(). You forgot something
> akin to
>         /* locker has finished, all good to go */
>         if (!ISUPDATE_from_mxstatus(status))
>             return HeapTupleMayBeUpdated;

So I did.  Here are two patches, one to fix this issue, and the other to
fix the issue above.  I intend to apply these two to 9.3 and master, and
then apply your freeze fix on top (which I'm cleaning up a bit -- will
resend later.)

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Andres Freund
Date:
Hi,

On 2013-12-05 10:42:35 -0300, Alvaro Herrera wrote:
> I intend to apply these two to 9.3 and master, and
> then apply your freeze fix on top (which I'm cleaning up a bit -- will
> resend later.)

I sure hope it gets cleaned up - it's an evening's hack, with a good
glass of wine on top. Do you agree with the general direction?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Here's a revamped version of this patch.  One thing I didn't do here is
revert the exporting of CreateMultiXactId, but I don't see any way to
avoid that.

Andres mentioned the idea of sharing some code between
heap_prepare_freeze_tuple and heap_tuple_needs_freeze, but I haven't
explored that.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Andres Freund
Date:
On 2013-12-09 19:14:58 -0300, Alvaro Herrera wrote:
> Here's a revamped version of this patch.  One thing I didn't do here is
> revert the exporting of CreateMultiXactId, but I don't see any way to
> avoid that.

I don't so much have a problem with exporting CreateMultiXactId(), just
with exporting it under its current name. It's already quirky to have
both MultiXactIdCreate() and CreateMultiXactId() in multixact.c, but
exporting it under that name imo goes too far.

> Andres mentioned the idea of sharing some code between
> heap_prepare_freeze_tuple and heap_tuple_needs_freeze, but I haven't
> explored that.

My idea would just be to have heap_tuple_needs_freeze() call
heap_prepare_freeze_tuple() and check whether it returns true. Yes,
that's slightly more expensive than the current
heap_tuple_needs_freeze(), but it's only called when we couldn't get a
cleanup lock on a page, so that seems ok.
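
Sketched as a self-contained toy (none of these are the real PostgreSQL
declarations - the types, the freeze rule, and the signatures are simplified
stand-ins), the delegation would look something like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Greatly simplified stand-ins: the real HeapTupleHeader, freeze state and
 * cutoff handling live in heapam.c; only the delegation pattern is real. */
typedef uint32_t TransactionId;
typedef struct { TransactionId xmin; TransactionId xmax; } HeapTupleHeaderData;
typedef struct { bool changed; } FreezeTupleState;

/* Stand-in for heap_prepare_freeze_tuple(): computes what freezing would
 * change and returns true if anything would.  (Toy rule: an xmin below the
 * cutoff needs freezing.) */
static bool
heap_prepare_freeze_tuple(const HeapTupleHeaderData *tuple,
                          TransactionId cutoff_xid,
                          FreezeTupleState *frz)
{
    frz->changed = (tuple->xmin < cutoff_xid);
    return frz->changed;
}

/* The suggestion above: needs_freeze just delegates and throws the plan
 * away.  Slightly more work than a dedicated check, but it only runs when
 * the cleanup lock could not be obtained. */
static bool
heap_tuple_needs_freeze(const HeapTupleHeaderData *tuple,
                        TransactionId cutoff_xid)
{
    FreezeTupleState frz;
    return heap_prepare_freeze_tuple(tuple, cutoff_xid, &frz);
}
```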


> ! static TransactionId
> ! FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
> !                   TransactionId cutoff_xid, MultiXactId cutoff_multi,
> !                   uint16 *flags)
> ! {

> !     if (!MultiXactIdIsValid(multi))
> !     {
> !         /* Ensure infomask bits are appropriately set/reset */
> !         *flags |= FRM_INVALIDATE_XMAX;
> !         return InvalidTransactionId;
> !     }

Maybe comment that we're sure to be only called when IS_MULTI is set,
which will be unset by FRM_INVALIDATE_XMAX? I wondered twice whether we
wouldn't just continually mark the buffer dirty this way.

> !     else if (MultiXactIdPrecedes(multi, cutoff_multi))
> !     {
> !         /*
> !          * This old multi cannot possibly have members still running.  If it
> !          * was a locker only, it can be removed without any further
> !          * consideration; but if it contained an update, we might need to
> !          * preserve it.
> !          */
> !         if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
> !         {
> !             *flags |= FRM_INVALIDATE_XMAX;
> !             return InvalidTransactionId;

Can you place an Assert(!MultiXactIdIsRunning(multi)) here?

> !         if (ISUPDATE_from_mxstatus(members[i].status) &&
> !             !TransactionIdDidAbort(members[i].xid))

It makes me wary to see a DidAbort() without a previous InProgress()
call.
Also, after we crashed, doesn't DidAbort() possibly return false for
transactions that were in progress before we crashed? At least that's
how I always understood it, and how tqual.c is written.

> !         /* We only keep lockers if they are still running */
> !         if (TransactionIdIsCurrentTransactionId(members[i].xid) ||

I don't think there's a need to check for
TransactionIdIsCurrentTransactionId() - vacuum can explicitly *not* be
run inside a transaction.


> ***************
> *** 5443,5458 **** heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
>                * xvac transaction succeeded.
>                */
>               if (tuple->t_infomask & HEAP_MOVED_OFF)
> !                 HeapTupleHeaderSetXvac(tuple, InvalidTransactionId);
>               else
> !                 HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
>   
>               /*
>                * Might as well fix the hint bits too; usually XMIN_COMMITTED
>                * will already be set here, but there's a small chance not.
>                */
>               Assert(!(tuple->t_infomask & HEAP_XMIN_INVALID));
> !             tuple->t_infomask |= HEAP_XMIN_COMMITTED;
>               changed = true;
>           }
>       }
> --- 5571,5586 ----
>                * xvac transaction succeeded.
>                */
>               if (tuple->t_infomask & HEAP_MOVED_OFF)
> !                 frz->frzflags |= XLH_FREEZE_XVAC;
>               else
> !                 frz->frzflags |= XLH_INVALID_XVAC;
>   

Hm. Isn't this case inverted? I.e. shouldn't you set XLH_FREEZE_XVAC and
XLH_INVALID_XVAC exactly the other way round? I really don't understand
the moved in/off, since the code has been gone longer than I've followed
the code...

> *** a/src/backend/access/rmgrdesc/mxactdesc.c
> --- b/src/backend/access/rmgrdesc/mxactdesc.c
> ***************
> *** 41,47 **** out_member(StringInfo buf, MultiXactMember *member)
>               appendStringInfoString(buf, "(upd) ");
>               break;
>           default:
> !             appendStringInfoString(buf, "(unk) ");
>               break;
>       }
>   }
> --- 41,47 ----
>               appendStringInfoString(buf, "(upd) ");
>               break;
>           default:
> !             appendStringInfo(buf, "(unk) ", member->status);
>               break;
>       }
>   }

That change doesn't look like it will do anything?


Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2013-12-09 19:14:58 -0300, Alvaro Herrera wrote:
> > Here's a revamped version of this patch.  One thing I didn't do here is
> > revert the exporting of CreateMultiXactId, but I don't see any way to
> > avoid that.
>
> I don't so much have a problem with exporting CreateMultiXactId(), just
> with exporting it under its current name. It's already quirky to have
> both MultiXactIdCreate and CreateMultiXactId() in multixact.c but
> exporting it imo goes too far.

MultiXactIdCreateFromMembers(int, MultiXactMember *) ?

> > Andres mentioned the idea of sharing some code between
> > heap_prepare_freeze_tuple and heap_tuple_needs_freeze, but I haven't
> > explored that.
>
> My idea would just be to have heap_tuple_needs_freeze() call
> heap_prepare_freeze_tuple() and check whether it returns true. Yes,
> that's slightly more expensive than the current
> heap_tuple_needs_freeze(), but it's only called when we couldn't get a
> cleanup lock on a page, so that seems ok.

Doesn't seem a completely bad idea, but let's leave it for a separate
patch.  This should be changed in master only IMV anyway, while the rest
of this patch is to be backpatched to 9.3.

> > !     if (!MultiXactIdIsValid(multi))
> > !     {
> > !         /* Ensure infomask bits are appropriately set/reset */
> > !         *flags |= FRM_INVALIDATE_XMAX;
> > !         return InvalidTransactionId;
> > !     }
>
> Maybe comment that we're sure to be only called when IS_MULTI is set,
> which will be unset by FRM_INVALIDATE_XMAX? I wondered twice whether we
> wouldn't just continually mark the buffer dirty this way.

Done.

> > !     else if (MultiXactIdPrecedes(multi, cutoff_multi))
> > !     {
> > !         /*
> > !          * This old multi cannot possibly have members still running.  If it

> Can you place an Assert(!MultiXactIdIsRunning(multi)) here?

Done.


> > !         if (ISUPDATE_from_mxstatus(members[i].status) &&
> > !             !TransactionIdDidAbort(members[i].xid))
>
> It makes me wary to see a DidAbort() without a previous InProgress()
> call.  Also, after we crashed, doesn't DidAbort() possibly return
> false for transactions that were in progress before we crashed? At
> least that's how I always understood it, and how tqual.c is written.

Yes, that's correct.  But note that here we're not doing a tuple
liveliness test, which is what tqual.c is doing.  What we do with this
info is to keep the Xid as part of the multi if it's still running or
committed.  We also keep it if the xact crashed, which is fine because
the Xid will be removed by some later step.  If we know for certain that
the update Xid is aborted, then we can ignore it, but this is just an
optimization and not needed for correctness.

That loop had a bug, so I restructured it.  (If the update xact had
aborted we wouldn't get to the "continue" and thus would treat it as a
locker-only.  I don't think that behavior would cause any visible
misbehavior but it's wrong nonetheless.)
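
The restructured loop's keep/drop rules can be modeled with this toy version
(the enum, structs, and transaction-state stubs are hypothetical; only the
decision logic follows the discussion above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the member-filtering loop in FreezeMultiXactId.  The types
 * and the two transaction-state predicates are stand-ins, not PostgreSQL
 * APIs. */
typedef uint32_t TransactionId;
typedef enum { MEMBER_LOCK, MEMBER_UPDATE } MemberKind;
typedef struct { TransactionId xid; MemberKind kind; } Member;

static bool xid_did_abort(TransactionId xid)  { return xid % 3 == 0; } /* stub */
static bool xid_is_running(TransactionId xid) { return xid % 3 == 1; } /* stub */

/*
 * Keep an update member unless its xact is known aborted (keeping a crashed
 * updater is fine: a later step removes it).  Keep lockers only while they
 * are still running.  Returns the number of surviving members.
 */
static int
filter_members(const Member *in, int n, Member *out)
{
    int nout = 0;

    for (int i = 0; i < n; i++)
    {
        if (in[i].kind == MEMBER_UPDATE)
        {
            if (!xid_did_abort(in[i].xid))
                out[nout++] = in[i];
            continue;           /* never fall through and treat an update
                                 * member as a locker -- the bug fixed here */
        }
        if (xid_is_running(in[i].xid))
            out[nout++] = in[i];
    }
    return nout;
}
```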

One interesting bit is that we might end up creating singleton
MultiXactIds when freezing, if there's no updater and there's a running
locker.  We could avoid this (i.e. mark the tuple as locked by a single
Xid) but it would complicate FreezeMultiXactId's API and it's unlikely
to occur with any frequency anyway.

> > !         /* We only keep lockers if they are still running */
> > !         if (TransactionIdIsCurrentTransactionId(members[i].xid) ||
>
> I don't think there's a need to check for
> TransactionIdIsCurrentTransactionId() - vacuum can explicitly *not* be
> run inside a transaction.

Keep in mind that freezing can also happen for tuples handled during a
table-rewrite operation such as CLUSTER.  I wouldn't place a bet that
you can't have a multi created by a transaction and later run CLUSTER on
the same table in the same transaction.  Maybe this is fine because at
that point we're holding an exclusive lock on the table, but it seems
fragile.  And the test is cheap anyway.

> > --- 5571,5586 ----
> >                * xvac transaction succeeded.
> >                */
> >               if (tuple->t_infomask & HEAP_MOVED_OFF)
> > !                 frz->frzflags |= XLH_FREEZE_XVAC;
> >               else
> > !                 frz->frzflags |= XLH_INVALID_XVAC;
> >
>
> Hm. Isn't this case inverted? I.e. shouldn't you set XLH_FREEZE_XVAC and
> XLH_INVALID_XVAC exactly the other way round? I really don't understand
> the moved in/off, since the code has been gone longer than I've followed
> the code...

Yep, fixed.

> > --- b/src/backend/access/rmgrdesc/mxactdesc.c
> > ***************
> > *** 41,47 **** out_member(StringInfo buf, MultiXactMember *member)
> >               appendStringInfoString(buf, "(upd) ");
> >               break;
> >           default:
> > !             appendStringInfoString(buf, "(unk) ");
> >               break;
> >       }
> >   }
> > --- 41,47 ----
> >               appendStringInfoString(buf, "(upd) ");
> >               break;
> >           default:
> > !             appendStringInfo(buf, "(unk) ", member->status);
> >               break;
> >       }
> >   }
>
> That change doesn't look like it will do anything?

Meh.  That was a leftover --- removed.  (I was toying with the "desc"
code because it misbehaves when applied on records as they are created,
as opposed to being applied on records as they are replayed.  I'm pretty
sure everyone already knows about this, and it's the reason why
everybody has refrained from examining arrays of things stored in followup
data records.  I was naive enough to write code that tries to decode the
followup record that contains the members of the multixact we're
creating, which works fine during replay but gets them completely wrong
during regular operation.  This is the third time I'm surprised by this
misbehavior; blame my bad memory for not remembering that it's not
supposed to work in the first place.)


Right now there is one case in this code that returns
FRM_INVALIDATE_XMAX when it's not strictly necessary, i.e. it would also
work to keep the Multi as is and return FRM_NOOP instead; and it also
returns FRM_NOOP in one case when we could return FRM_INVALIDATE_XMAX
instead.  Neither does any great damage, but there is a consideration
that future examiners of the tuple would have to resolve the MultiXact
by themselves (==> performance hit).  On the other hand, returning
INVALIDATE causes the block to be dirtied, which is undesirable if not
already dirty.  Maybe this can be optimized so that we return a separate
flag from FreezeMultiXactId when Xmax invalidation is optional, so that
we execute all such operations if and only if the block is already dirty
or being dirtied for other reasons.  That would provide the cleanup for
later onlookers, while not causing an unnecessary dirtying.
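
One way to model that "optional invalidation" idea (the
FRM_INVALIDATE_OPTIONAL flag is hypothetical; FRM_NOOP and
FRM_INVALIDATE_XMAX follow the patch's names but the values here are
illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag names follow the patch; the values and FRM_INVALIDATE_OPTIONAL are
 * hypothetical. */
#define FRM_NOOP                0x0001
#define FRM_INVALIDATE_XMAX     0x0002
#define FRM_INVALIDATE_OPTIONAL 0x8000  /* hypothetical: cleanup is optional */

/* Apply optional xmax invalidations only when the page is being dirtied
 * anyway: later onlookers get the cleanup, clean pages stay clean. */
static bool
should_invalidate_xmax(uint16_t flags, bool page_already_dirty)
{
    if ((flags & FRM_INVALIDATE_XMAX) == 0)
        return false;
    if (flags & FRM_INVALIDATE_OPTIONAL)
        return page_already_dirty;
    return true;                /* required for correctness */
}
```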


Attached are patches for this, for both 9.3 and master.  The 9.3 patch
keeps the original FREEZE record; I have tested that an unpatched
replica dies with:

PANIC:  heap2_redo: unknown op code 32
CONTEXT:  xlog redo UNKNOWN
LOG:  startup process (PID 316) was terminated by signal 6: Aborted

when the master is running the new code.  The message is ugly, but I
don't see any way to fix that.

For the master branch, I have removed the original FREEZE record
definition completely and bumped XLOG_PAGE_MAGIC.  This doesn't pose a
problem given that we have no replication between different major
versions.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Andres Freund
Date:
On 2013-12-11 14:00:05 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2013-12-09 19:14:58 -0300, Alvaro Herrera wrote:
> > I don't so much have a problem with exporting CreateMultiXactId(), just
> > with exporting it under its current name. It's already quirky to have
> > both MultiXactIdCreate and CreateMultiXactId() in multixact.c but
> > exporting it imo goes too far.
> 
> MultiXactIdCreateFromMembers(int, MultiXactMember *) ?

Works for me.

> > > Andres mentioned the idea of sharing some code between
> > > heap_prepare_freeze_tuple and heap_tuple_needs_freeze, but I haven't
> > > explored that.
> > 
> > My idea would just be to have heap_tuple_needs_freeze() call
> > heap_prepare_freeze_tuple() and check whether it returns true. Yes,
> > that's slightly more expensive than the current
> > heap_tuple_needs_freeze(), but it's only called when we couldn't get a
> > cleanup lock on a page, so that seems ok.
> 
> Doesn't seem a completely bad idea, but let's leave it for a separate
> patch.  This should be changed in master only IMV anyway, while the rest
> of this patch is to be backpatched to 9.3.

I am not so sure it shouldn't be backpatched together with this. We now
have similar complex logic in both functions.

> > > !         if (ISUPDATE_from_mxstatus(members[i].status) &&
> > > !             !TransactionIdDidAbort(members[i].xid))#
> > 
> > It makes me wary to see a DidAbort() without a previous InProgress()
> > call.  Also, after we crashed, doesn't DidAbort() possibly return
> > false for transactions that were in progress before we crashed? At
> > least that's how I always understood it, and how tqual.c is written.
> 
> Yes, that's correct.  But note that here we're not doing a tuple
> liveliness test, which is what tqual.c is doing.  What we do with this
> info is to keep the Xid as part of the multi if it's still running or
> committed.  We also keep it if the xact crashed, which is fine because
> the Xid will be removed by some later step.  If we know for certain that
> the update Xid is aborted, then we can ignore it, but this is just an
> optimization and not needed for correctness.

But why deviate that way? It doesn't seem to save us much?

> One interesting bit is that we might end up creating singleton
> MultiXactIds when freezing, if there's no updater and there's a running
> locker.  We could avoid this (i.e. mark the tuple as locked by a single
> Xid) but it would complicate FreezeMultiXactId's API and it's unlikely
> to occur with any frequency anyway.

Yea, that seems completely fine.

> > I don't think there's a need to check for
> > TransactionIdIsCurrentTransactionId() - vacuum can explicitly *not* be
> > run inside a transaction.
> 
> Keep in mind that freezing can also happen for tuples handled during a
> table-rewrite operation such as CLUSTER.

Good point.

> > >               if (tuple->t_infomask & HEAP_MOVED_OFF)
> > > !                 frz->frzflags |= XLH_FREEZE_XVAC;
> > >               else
> > > !                 frz->frzflags |= XLH_INVALID_XVAC;
> > >   
> > 
> > Hm. Isn't this case inverted? I.e. shouldn't you set XLH_FREEZE_XVAC and
> > XLH_INVALID_XVAC exactly the other way round? I really don't understand
> > the moved in/off, since the code has been gone longer than I've followed
> > the code...
> 
> Yep, fixed.

I wonder how many of the HEAP_MOVED_* cases around are actually
correct... What was the last version those were generated? 8.4?

> (I was toying with the "desc"
> code because it misbehaves when applied on records as they are created,
> as opposed to being applied on records as they are replayed.  I'm pretty
> sure everyone already knows about this, and it's the reason why
> everybody has refrained from examining arrays of things stored in followup
> data records.  I was naive enough to write code that tries to decode the
> followup record that contains the members of the multixact we're
> creating, which works fine during replay but gets them completely wrong
> during regular operation.  This is the third time I'm surprised by this
> misbehavior; blame my bad memory for not remembering that it's not
> supposed to work in the first place.)

I am not really sure what you are talking about. That you cannot
properly decode records before they have been processed by XLogInsert()?
If so, yes, that's pretty clear and I am pretty sure it will break in
lots of places if you try?

> Right now there is one case in this code that returns
> FRM_INVALIDATE_XMAX when it's not strictly necessary, i.e. it would also
> work to keep the Multi as is and return FRM_NOOP instead; and it also
> returns FRM_NOOP in one case when we could return FRM_INVALIDATE_XMAX
> instead.  Neither does any great damage, but there is a consideration
> that future examiners of the tuple would have to resolve the MultiXact
> by themselves (==> performance hit).  On the other hand, returning
> INVALIDATE causes the block to be dirtied, which is undesirable if not
> already dirty.

Otherwise it will be marked dirty the next time something reads the page, so I
don't think this is problematic.

> !     {
> !         if (ISUPDATE_from_mxstatus(members[i].status))
> !         {
> !             /*
> !              * It's an update; should we keep it?  If the transaction is known
> !              * aborted then it's okay to ignore it, otherwise not.  (Note this
> !              * is just an optimization and not needed for correctness, so it's
> !              * okay to get this test wrong; for example, in case an updater is
> !              * crashed, or a running transaction in the process of aborting.)
> !              */
> !             if (!TransactionIdDidAbort(members[i].xid))
> !             {
> !                 newmembers[nnewmembers++] = members[i];
> !                 Assert(!TransactionIdIsValid(update_xid));
> ! 
> !                 /*
> !                  * Tell caller to set HEAP_XMAX_COMMITTED hint while we have
> !                  * the Xid in cache.  Again, this is just an optimization, so
> !                  * it's not a problem if the transaction is still running and
> !                  * in the process of committing.
> !                  */
> !                 if (TransactionIdDidCommit(update_xid))
> !                     update_committed = true;
> ! 
> !                 update_xid = newmembers[i].xid;
> !             }

I don't think the conclusions here are correct - we might be setting
HEAP_XMAX_COMMITTED a smidge too early that way. If the updating
transaction is in progress, there's the situation that we have updated
the clog, but have not yet removed ourselves from the procarray. I.e. a
situation in which TransactionIdIsInProgress() and
TransactionIdDidCommit() both return true. Afaik it is only correct to
set HEAP_XMAX_COMMITTED once TransactionIdIsInProgress() returns false.
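
The window described here can be modeled with two booleans (a toy model, not
the real clog/procarray code):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the commit protocol window: a transaction first flips its
 * clog bit to committed, then removes itself from the procarray.  In
 * between, DidCommit() and IsInProgress() are both true. */
typedef struct
{
    bool clog_committed;        /* TransactionIdDidCommit() would say yes */
    bool in_procarray;          /* TransactionIdIsInProgress() would say yes */
} XactState;

/* Safe rule for a HEAP_XMAX_COMMITTED-style hint bit: only trust the clog
 * answer once the xact is gone from the procarray. */
static bool
may_set_committed_hint(const XactState *x)
{
    if (x->in_procarray)
        return false;
    return x->clog_committed;
}
```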

> !             /*
> !              * Checking for very old update Xids is critical: if the update
> !              * member of the multi is older than cutoff_xid, we must remove
> !              * it, because otherwise a later liveliness check could attempt
> !              * pg_clog access for a page that was truncated away by the
> !              * current vacuum.    Note that if the update had committed, we
> !              * wouldn't be freezing this tuple because it would have gotten
> !              * removed (HEAPTUPLE_DEAD) by HeapTupleSatisfiesVacuum; so it
> !              * either aborted or crashed.  Therefore, ignore this update_xid.
> !              */
> !             if (TransactionIdPrecedes(update_xid, cutoff_xid))
> !             {
> !                 update_xid = InvalidTransactionId;
> !                 update_committed = false;

I vote for an Assert(!TransactionIdDidCommit(update_xid)) here.

> !     else
> !     {
> !         /*
> !          * Create a new multixact with the surviving members of the previous
> !          * one, to set as new Xmax in the tuple.
> !          *
> !          * If this is the first possibly-multixact-able operation in the
> !          * current transaction, set my per-backend OldestMemberMXactId
> !          * setting. We can be certain that the transaction will never become a
> !          * member of any older MultiXactIds than that.
> !          */
> !         MultiXactIdSetOldestMember();
> !         xid = MultiXactIdCreateFromMembers(nnewmembers, newmembers);
> !         *flags |= FRM_RETURN_IS_MULTI;
> !     }

I worry that this MultiXactIdSetOldestMember() will be problematic in
longrunning vacuums like a anti-wraparound vacuum of a huge
table. There's no real need to set MultiXactIdSetOldestMember() here,
since we will not become the member of a multi. So I think you should
either move the Assert() in MultiXactIdCreateFromMembers() to its other
callers, or add a parameter to skip it.

> ! /*
> !  * heap_prepare_freeze_tuple
>    *
>    * Check to see whether any of the XID fields of a tuple (xmin, xmax, xvac)
> !  * are older than the specified cutoff XID and cutoff MultiXactId.    If so,
> !  * setup enough state (in the *frz output argument) to later execute and
> !  * WAL-log what we would need to do, and return TRUE.  Return FALSE if nothing
> !  * is to be changed.
> !  *
> !  * Caller is responsible for setting the offset field, if appropriate.    This
> !  * is only necessary if the freeze is to be WAL-logged.

I'd leave off that second sentence; if you want to freeze a whole page
but not WAL log it, you'd need to set offset as well...


>       if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
>       {
> !         else if (flags & FRM_RETURN_IS_MULTI)
>           {
> !             frz->t_infomask &= ~HEAP_XMAX_BITS;
> !             frz->xmax = newxmax;
> ! 
> !             GetMultiXactIdHintBits(newxmax,
> !                                    &frz->t_infomask,
> !                                    &frz->t_infomask2);
> !             changed = true;
>           }

I worry that all these multixact accesses will create huge performance
problems due to the inefficiency of the multixactid cache. If you scan a
huge table there very well might be millions of different multis we
touch and afaics most of them will end up in the multixactid cache. That
can't end well.
I think we need to either regularly delete that cache when it goes past,
say, 100 entries, or just bypass it entirely.
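
A toy model of the "delete past N entries" policy, using a fixed ring instead
of the open-coded list in multixact.c (everything here is illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LIMIT 100

typedef struct
{
    uint32_t    multi;          /* MultiXactId being cached */
    int         nmembers;       /* illustrative payload */
} CacheEntry;

/* Fixed-size ring: when full, the oldest entry is evicted before the new one
 * goes in, so the cache can never grow past CACHE_LIMIT entries. */
static CacheEntry cache[CACHE_LIMIT];
static int cache_start = 0;     /* index of the oldest live entry */
static int cache_count = 0;

static void
cache_put(uint32_t multi, int nmembers)
{
    if (cache_count == CACHE_LIMIT)
    {
        /* evict the oldest entry */
        cache_start = (cache_start + 1) % CACHE_LIMIT;
        cache_count--;
    }
    cache[(cache_start + cache_count) % CACHE_LIMIT] =
        (CacheEntry) { multi, nmembers };
    cache_count++;
}
```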

Greetings,

Andres Freund



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2013-12-11 14:00:05 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > On 2013-12-09 19:14:58 -0300, Alvaro Herrera wrote:

> > > > Andres mentioned the idea of sharing some code between
> > > > heap_prepare_freeze_tuple and heap_tuple_needs_freeze, but I haven't
> > > > explored that.
> > > 
> > > My idea would just be to have heap_tuple_needs_freeze() call
> > > heap_prepare_freeze_tuple() and check whether it returns true. Yes,
> > > that's slightly more expensive than the current
> > > heap_tuple_needs_freeze(), but it's only called when we couldn't get a
> > > cleanup lock on a page, so that seems ok.
> > 
> > Doesn't seem a completely bad idea, but let's leave it for a separate
> > patch.  This should be changed in master only IMV anyway, while the rest
> > of this patch is to be backpatched to 9.3.
> 
> I am not so sure it shouldn't be backpatched together with this. We now
> have similar complex logic in both functions.

Any other opinions on this?


> > > > !         if (ISUPDATE_from_mxstatus(members[i].status) &&
> > > > !             !TransactionIdDidAbort(members[i].xid))
> > > 
> > > It makes me wary to see a DidAbort() without a previous InProgress()
> > > call.  Also, after we crashed, doesn't DidAbort() possibly return
> > > false for transactions that were in progress before we crashed? At
> > > least that's how I always understood it, and how tqual.c is written.
> > 
> > Yes, that's correct.  But note that here we're not doing a tuple
> > liveliness test, which is what tqual.c is doing.  What we do with this
> > info is to keep the Xid as part of the multi if it's still running or
> > committed.  We also keep it if the xact crashed, which is fine because
> > the Xid will be removed by some later step.  If we know for certain that
> > the update Xid is aborted, then we can ignore it, but this is just an
> > optimization and not needed for correctness.
> 
> But why deviate that way? It doesn't seem to save us much?

Well, it does save something -- there not being a live update means we
are likely to be able to invalidate the Xmax completely if there are no
lockers; and even in the case where there are lockers, we will be able
to set LOCK_ONLY which means faster access in several places.


> > > >               if (tuple->t_infomask & HEAP_MOVED_OFF)
> > > > !                 frz->frzflags |= XLH_FREEZE_XVAC;
> > > >               else
> > > > !                 frz->frzflags |= XLH_INVALID_XVAC;
> > > >   
> > > 
> > > Hm. Isn't this case inverted? I.e. shouldn't you set XLH_FREEZE_XVAC and
> > > XLH_INVALID_XVAC exactly the other way round? I really don't understand
> > > the moved in/off, since the code has been gone longer than I've followed
> > > the code...
> > 
> > Yep, fixed.
> 
> I wonder how many of the HEAP_MOVED_* cases around are actually
> correct... What was the last version those were generated? 8.4?

8.4, yeah, before VACUUM FULL got rewritten.  I don't think anybody
tests these code paths, because it involves databases that were upgraded
straight from 8.4 and which in their 8.4 time saw VACUUM FULL executed.

I think we should be considering removing these things, or at least have
some mechanism to ensure they don't survive from pre-9.0 installs.

> > (I was toying with the "desc"
> > code because it misbehaves when applied on records as they are created,
> > as opposed to being applied on records as they are replayed.  I'm pretty
> > sure everyone already knows about this, and it's the reason why
> > everybody has refrained from examining arrays of things stored in followup
> > data records.  I was naive enough to write code that tries to decode the
> > followup record that contains the members of the multixact we're
> > creating, which works fine during replay but gets them completely wrong
> > during regular operation.  This is the third time I'm surprised by this
> > misbehavior; blame my bad memory for not remembering that it's not
> > supposed to work in the first place.)
> 
> I am not really sure what you are talking about. That you cannot
> properly decode records before they have been processed by XLogInsert()?
> If so, yes, that's pretty clear and I am pretty sure it will break in
> lots of places if you try?

Well, not sure about "lots of places".  The only misbehavior I have seen
is in those desc routines.  Of course, the redo routines might also
fail, but then there's no way for them to be running ...

> > Right now there is one case in this code that returns
> > FRM_INVALIDATE_XMAX when it's not strictly necessary, i.e. it would also
> > work to keep the Multi as is and return FRM_NOOP instead; and it also
> > returns FRM_NOOP in one case when we could return FRM_INVALIDATE_XMAX
> > instead.  Neither does any great damage, but there is a consideration
> > that future examiners of the tuple would have to resolve the MultiXact
> > by themselves (==> performance hit).  On the other hand, returning
> > INVALIDATE causes the block to be dirtied, which is undesirable if not
> > already dirty.
> 
> Otherwise it will be marked dirty the next time something reads the page, so I
> don't think this is problematic.

Not necessarily.  I mean, if somebody sees a multi, they might just
resolve it to its members and otherwise leave the page alone.  Or in
some cases not even resolve to members (if it's LOCK_ONLY and old enough
to be behind the oldest visible multi).

> > !     {
> > !         if (ISUPDATE_from_mxstatus(members[i].status))
> > !         {
> > !             /*
> > !              * It's an update; should we keep it?  If the transaction is known
> > !              * aborted then it's okay to ignore it, otherwise not.  (Note this
> > !              * is just an optimization and not needed for correctness, so it's
> > !              * okay to get this test wrong; for example, in case an updater is
> > !              * crashed, or a running transaction in the process of aborting.)
> > !              */
> > !             if (!TransactionIdDidAbort(members[i].xid))
> > !             {
> > !                 newmembers[nnewmembers++] = members[i];
> > !                 Assert(!TransactionIdIsValid(update_xid));
> > ! 
> > !                 /*
> > !                  * Tell caller to set HEAP_XMAX_COMMITTED hint while we have
> > !                  * the Xid in cache.  Again, this is just an optimization, so
> > !                  * it's not a problem if the transaction is still running and
> > !                  * in the process of committing.
> > !                  */
> > !                 if (TransactionIdDidCommit(update_xid))
> > !                     update_committed = true;
> > ! 
> > !                 update_xid = newmembers[i].xid;
> > !             }
> 
> I don't think the conclusions here are correct - we might be setting
> HEAP_XMAX_COMMITTED a smidge too early that way. If the updating
> transaction is in progress, there's the situation that we have updated
> the clog, but have not yet removed ourselves from the procarray. I.e. a
> situation in which TransactionIdIsInProgress() and
> TransactionIdDidCommit() both return true. Afaik it is only correct to
> set HEAP_XMAX_COMMITTED once TransactionIdIsInProgress() returns false.

Hmm ... Is there an actual difference?  I mean, a transaction that
marked itself as committed in pg_clog cannot return to any other state,
regardless of what happens elsewhere.

> > !             /*
> > !              * Checking for very old update Xids is critical: if the update
> > !              * member of the multi is older than cutoff_xid, we must remove
> > !              * it, because otherwise a later liveliness check could attempt
> > !              * pg_clog access for a page that was truncated away by the
> > !              * current vacuum.    Note that if the update had committed, we
> > !              * wouldn't be freezing this tuple because it would have gotten
> > !              * removed (HEAPTUPLE_DEAD) by HeapTupleSatisfiesVacuum; so it
> > !              * either aborted or crashed.  Therefore, ignore this update_xid.
> > !              */
> > !             if (TransactionIdPrecedes(update_xid, cutoff_xid))
> > !             {
> > !                 update_xid = InvalidTransactionId;
> > !                 update_committed = false;
> 
> I vote for an Assert(!TransactionIdDidCommit(update_xid)) here.

Will add.

> > !     else
> > !     {
> > !         /*
> > !          * Create a new multixact with the surviving members of the previous
> > !          * one, to set as new Xmax in the tuple.
> > !          *
> > !          * If this is the first possibly-multixact-able operation in the
> > !          * current transaction, set my per-backend OldestMemberMXactId
> > !          * setting. We can be certain that the transaction will never become a
> > !          * member of any older MultiXactIds than that.
> > !          */
> > !         MultiXactIdSetOldestMember();
> > !         xid = MultiXactIdCreateFromMembers(nnewmembers, newmembers);
> > !         *flags |= FRM_RETURN_IS_MULTI;
> > !     }
> 
> I worry that this MultiXactIdSetOldestMember() will be problematic in
> longrunning vacuums like a anti-wraparound vacuum of a huge
> table. There's no real need to set MultiXactIdSetOldestMember() here,
> since we will not become the member of a multi. So I think you should
> > either move the Assert() in MultiXactIdCreateFromMembers() to its other
> callers, or add a parameter to skip it.

I would like to have the Assert() work automatically, that is, check the
PROC_IN_VACUUM flag in MyProc->vacuumFlags ... but this probably won't
work with CLUSTER.  That said, I think we *should* call SetOldestMember
in CLUSTER.  So maybe both things should be conditional on
PROC_IN_VACUUM.

(Either way it will be ugly.)

> > ! /*
> > !  * heap_prepare_freeze_tuple
> >    *
> >    * Check to see whether any of the XID fields of a tuple (xmin, xmax, xvac)
> > !  * are older than the specified cutoff XID and cutoff MultiXactId.    If so,
> > !  * setup enough state (in the *frz output argument) to later execute and
> > !  * WAL-log what we would need to do, and return TRUE.  Return FALSE if nothing
> > !  * is to be changed.
> > !  *
> > !  * Caller is responsible for setting the offset field, if appropriate.    This
> > !  * is only necessary if the freeze is to be WAL-logged.
> 
> I'd leave off that second sentence, if you want to freeze a whole page
> but not WAL log it, you'd need to set offset as well...

I can buy that.


> I worry that all these multixact accesses will create huge performance
> problems due to the inefficiency of the multixactid cache. If you scan a
> huge table there very well might be millions of different multis we
> touch and afaics most of them will end up in the multixactid cache. That
> can't end well.
> I think we need to either regularly delete that cache when it goes past,
> say, 100 entries, or just bypass it entirely.

Delete the whole cache, or just prune it of the least recently used
entries?  Maybe the cache should be a dlist instead of the open-coded
stuff that's there now; that would enable pruning of the oldest entries.
I think a blanket deletion might be a cure worse than the disease.  I
see your point anyhow.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Andres Freund
Date:
On 2013-12-11 22:08:41 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2013-12-11 14:00:05 -0300, Alvaro Herrera wrote:
> > > Andres Freund wrote:
> > > > On 2013-12-09 19:14:58 -0300, Alvaro Herrera wrote:
> > > > > !         if (ISUPDATE_from_mxstatus(members[i].status) &&
> > > > > !             !TransactionIdDidAbort(members[i].xid))
> > > > 
> > > > It makes me wary to see a DidAbort() without a previous InProgress()
> > > > call.  Also, after we crashed, doesn't DidAbort() possibly return
> > > > false for transactions that were in progress before we crashed? At
> > > > least that's how I always understood it, and how tqual.c is written.
> > > 
> > > Yes, that's correct.  But note that here we're not doing a tuple
> > > liveness test, which is what tqual.c is doing.  What we do with this
> > > info is to keep the Xid as part of the multi if it's still running or
> > > committed.  We also keep it if the xact crashed, which is fine because
> > > the Xid will be removed by some later step.  If we know for certain that
> > > the update Xid is aborted, then we can ignore it, but this is just an
> > > optimization and not needed for correctness.
> > 
> > But why deviate that way? It doesn't seem to save us much?
> 
> Well, it does save something -- there not being a live update means we
> are likely to be able to invalidate the Xmax completely if there are no
> lockers; and even in the case where there are lockers, we will be able
> to set LOCK_ONLY which means faster access in several places.

What I mean is that we could just query TransactionIdIsInProgress() like
usual. In most cases it will be very cheap because of the RecentXmin
check at its beginning.

> > I am not really sure what you are talking about. That you cannot
> > properly decode records before they have been processed by XLogInsert()?
> > If so, yes, that's pretty clear and I am pretty sure it will break in
> > lots of places if you try?
> 
Well, not sure about "lots of places".  The only misbehavior I have seen
> is in those desc routines.  Of course, the redo routines might also
> fail, but then there's no way for them to be running ...

Hm. I would guess that e.g. displaying xl_xact_commit fails majorly.

> > > Right now there is one case in this code that returns
> > > FRM_INVALIDATE_XMAX when it's not strictly necessary, i.e. it would also
> > > work to keep the Multi as is and return FRM_NOOP instead; and it also
> > > returns FRM_NOOP in one case when we could return FRM_INVALIDATE_XMAX
> > > instead.  Neither does any great damage, but there is a consideration
> > > that future examiners of the tuple would have to resolve the MultiXact
> > > by themselves (==> performance hit).  On the other hand, returning
> > > INVALIDATE causes the block to be dirtied, which is undesirable if not
> > > already dirty.
> > 
> > Otherwise it will be marked dirty the next time somebody reads the page, so I
> > don't think this is problematic.
> 
> Not necessarily.  I mean, if somebody sees a multi, they might just
> resolve it to its members and otherwise leave the page alone.  Or in
> some cases not even resolve to members (if it's LOCK_ONLY and old enough
> to be behind the oldest visible multi).

But the work has to be done anyway, even if possibly slightly later?

> > > !                  * Tell caller to set HEAP_XMAX_COMMITTED hint while we have
> > > !                  * the Xid in cache.  Again, this is just an optimization, so
> > > !                  * it's not a problem if the transaction is still running and
> > > !                  * in the process of committing.
> > > !                  */
> > > !                 if (TransactionIdDidCommit(update_xid))
> > > !                     update_committed = true;
> > > ! 
> > > !                 update_xid = newmembers[i].xid;
> > > !             }
> > 
> > I don't think the conclusions here are correct - we might be setting
> HEAP_XMAX_COMMITTED a smidge too early that way. If the updating
> > transaction is in progress, there's the situation that we have updated
> > the clog, but have not yet removed ourselves from the procarray. I.e. a
> > situation in which TransactionIdIsInProgress() and
> > TransactionIdDidCommit() both return true. Afaik it is only correct to
> > set HEAP_XMAX_COMMITTED once TransactionIdIsInProgress() returns false.
> 
> Hmm ... Is there an actual difference?  I mean, a transaction that
> marked itself as committed in pg_clog cannot return to any other state,
> regardless of what happens elsewhere.

But it could lead to other transactions seeing the row as committed, even
though it isn't really yet.
tqual.c sayeth:

 * NOTE: must check TransactionIdIsInProgress (which looks in PGXACT array)
 * before TransactionIdDidCommit/TransactionIdDidAbort (which look in
 * pg_clog).  Otherwise we have a race condition: we might decide that a
 * just-committed transaction crashed, because none of the tests succeed.
 * xact.c is careful to record commit/abort in pg_clog before it unsets
 * MyPgXact->xid in PGXACT array.  That fixes that problem, but it also
 * means there is a window where TransactionIdIsInProgress and
 * TransactionIdDidCommit will both return true.  If we check only
 * TransactionIdDidCommit, we could consider a tuple committed when a
 * later GetSnapshotData call will still think the originating transaction
 * is in progress, which leads to application-level inconsistency.  The
 * upshot is that we gotta check TransactionIdIsInProgress first in all
 * code paths, except for a few cases where we are looking at
 * subtransactions of our own main transaction and so there can't be any
 * race condition.

I don't think there's any reason to deviate from this pattern here. For
old xids TransactionIdIsInProgress() should be really cheap.

> > I worry that this MultiXactIdSetOldestMember() will be problematic in
> > long-running vacuums like an anti-wraparound vacuum of a huge
> > table. There's no real need to call MultiXactIdSetOldestMember() here,
> > since we will not become the member of a multi. So I think you should
> > either move the Assert() in MultiXactIdCreateFromMembers() to its other
> > callers, or add a parameter to skip it.
> 
> I would like to have the Assert() work automatically, that is, check the
> PROC_IN_VACUUM flag in MyProc->vacuumflags ... but this probably won't
> work with CLUSTER.  That said, I think we *should* call SetOldestMember
> in CLUSTER.  So maybe both things should be conditional on
> PROC_IN_VACUUM.

Why should it be dependent on cluster? SetOldestMember() defines the
oldest multi we can be a member of. Even in cluster, the freezing will
not make us a member of a multi. If the transaction does something else
requiring SetOldestMember(), that will do it?

> > I worry that all these multixact accesses will create huge performance
> > problems due to the inefficiency of the multixactid cache. If you scan a
> > huge table there very well might be millions of different multis we
> > touch and afaics most of them will end up in the multixactid cache. That
> > can't end well.
> > I think we need to either regularly delete that cache when it goes past,
> > say, 100 entries, or just bypass it entirely.
> 
> Delete the whole cache, or just prune it of the least recently used
> entries?  Maybe the cache should be a dlist instead of the open-coded
> stuff that's there now; that would enable pruning of the oldest entries.
> I think a blanket deletion might be a cure worse than the disease.  I
> see your point anyhow.

I was thinking of just deleting the whole thing. Revamping the cache
mechanism to be more efficient is an important goal, but it imo
shouldn't be lumped together with this. Now you could argue that purging
the cache shouldn't be either - but in 9.3.2+ the worst case is
essentially O(n^2) in the number of rows in a table. I don't think that
can be acceptable.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2013-12-11 22:08:41 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:

> > > I worry that all these multixact accesses will create huge performance
> > > problems due to the inefficiency of the multixactid cache. If you scan a
> > > huge table there very well might be millions of different multis we
> > > touch and afaics most of them will end up in the multixactid cache. That
> > > can't end well.
> > > I think we need to either regularly delete that cache when it goes past,
> > > say, 100 entries, or just bypass it entirely.
> > 
> > Delete the whole cache, or just prune it of the least recently used
> > entries?  Maybe the cache should be a dlist instead of the open-coded
> > stuff that's there now; that would enable pruning of the oldest entries.
> > I think a blanket deletion might be a cure worse than the disease.  I
> > see your point anyhow.
> 
> I was thinking of just deleting the whole thing. Revamping the cache
> mechanism to be more efficient is an important goal, but it imo
> shouldn't be lumped together with this. Now you could argue that purging
> the cache shouldn't be either - but in 9.3.2+ the worst case is
> essentially O(n^2) in the number of rows in a table. I don't think that
> can be acceptable.

So I think this is the only remaining issue to make this patch
committable (I will address the other points in Andres' email.)  Since
there has been no other feedback on this thread, Andres and I discussed
the cache issue a bit over IM and we seem to agree that a patch to
revamp the cache should be a fairly localized change that could be
applied on both 9.3 and master, separately from this fix.  Doing cache
deletion seems more invasive, and would not provide better performance anyway.

Since having a potentially O(n^2) cache behavior but with working freeze
seems better than no O(n^2) but broken freeze, I'm going to apply this
patch shortly and then work on reworking the cache.

Are there other opinions?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2013-12-11 22:08:41 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:

> > > I worry that this MultiXactIdSetOldestMember() will be problematic in
> > > long-running vacuums like an anti-wraparound vacuum of a huge
> > > table. There's no real need to call MultiXactIdSetOldestMember() here,
> > > since we will not become the member of a multi. So I think you should
> > > either move the Assert() in MultiXactIdCreateFromMembers() to its other
> > > callers, or add a parameter to skip it.
> >
> > I would like to have the Assert() work automatically, that is, check the
> > PROC_IN_VACUUM flag in MyProc->vacuumflags ... but this probably won't
> > work with CLUSTER.  That said, I think we *should* call SetOldestMember
> > in CLUSTER.  So maybe both things should be conditional on
> > PROC_IN_VACUUM.
>
> Why should it be dependent on cluster? SetOldestMember() defines the
> oldest multi we can be a member of. Even in cluster, the freezing will
> not make us a member of a multi. If the transaction does something else
> requiring SetOldestMember(), that will do it?

One last thing (I hope).  It's not real easy to disable this check,
because it actually lives in GetNewMultiXactId.  It would uglify the API
a lot if we were to pass a flag down two layers of routines; and moving
it to higher-level routines doesn't seem all that appropriate either.
I'm thinking we can have a new flag in MyPgXact->vacuumFlags, so
heap_prepare_freeze_tuple does this:

        PG_TRY();
        {
            /* set flag to let multixact.c know what we're doing */
            MyPgXact->vacuumFlags |= PROC_FREEZING_MULTI;
            newxmax = FreezeMultiXactId(xid, tuple->t_infomask,
                                        cutoff_xid, cutoff_multi, &flags);
        }
        PG_CATCH();
        {
            MyPgXact->vacuumFlags &= ~PROC_FREEZING_MULTI;
            PG_RE_THROW();
        }
        PG_END_TRY();
        MyPgXact->vacuumFlags &= ~PROC_FREEZING_MULTI;

and GetNewMultiXactId tests it to avoid the assert:

    /*
     * MultiXactIdSetOldestMember() must have been called already, but don't
     * check while freezing MultiXactIds.
     */
    Assert((MyPgXact->vacuumFlags & PROC_FREEZING_MULTI) ||
           MultiXactIdIsValid(OldestMemberMXactId[MyBackendId]));

This avoids the API uglification issues, but introduces a setjmp call
for every tuple to be frozen.  I don't think this is an excessive cost
to pay; after all, this is going to happen only for tuples for which
heap_tuple_needs_freeze already returned true; and for those we're
already going to do a lot of other work.

Attached is the whole series of patches for 9.3.  (master is the same,
only with an additional patch that removes the legacy WAL record.)

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> One last thing (I hope).  It's not real easy to disable this check,
> because it actually lives in GetNewMultiXactId.  It would uglify the API
> a lot if we were to pass a flag down two layers of routines; and moving
> it to higher-level routines doesn't seem all that appropriate either.
> I'm thinking we can have a new flag in MyPgXact->vacuumFlags, so
> heap_prepare_freeze_tuple does this:
> 
>         PG_TRY();
>         {
>             /* set flag to let multixact.c know what we're doing */
>             MyPgXact->vacuumFlags |= PROC_FREEZING_MULTI;
>             newxmax = FreezeMultiXactId(xid, tuple->t_infomask,
>                                         cutoff_xid, cutoff_multi, &flags);
>         }

Uhm, actually we don't need a PG_TRY block at all for this to work: we
can rely on the flag being reset at transaction abort, if anything wrong
happens and we lose control.  So just set the flag, call
FreezeMultiXactId, reset flag.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Andres Freund
Date:
Hi,

On 2013-12-12 18:39:43 -0300, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> > One last thing (I hope).  It's not real easy to disable this check,
> > because it actually lives in GetNewMultiXactId.  It would uglify the API
> > a lot if we were to pass a flag down two layers of routines; and moving
> > it to higher-level routines doesn't seem all that appropriate
> > either.

Unfortunately I find that too ugly. Adding a flag in the procarray
because of an Assert() in a low-level routine? That's overkill.
What's the problem with moving the check to MultiXactIdCreate() and
MultiXactIdExpand() instead? Since those are the ones where it's
required to have called SetOldest() before, I don't see why that would
be inappropriate?

> > I'm thinking we can have a new flag in MyPgXact->vacuumFlags, so
> > heap_prepare_freeze_tuple does this:
> > 
> >         PG_TRY();
> >         {
> >             /* set flag to let multixact.c know what we're doing */
> >             MyPgXact->vacuumFlags |= PROC_FREEZING_MULTI;
> >             newxmax = FreezeMultiXactId(xid, tuple->t_infomask,
> >                                         cutoff_xid, cutoff_multi, &flags);
> >         }
> 
> Uhm, actually we don't need a PG_TRY block at all for this to work: we
> can rely on the flag being reset at transaction abort, if anything wrong
> happens and we lose control.  So just set the flag, call
> FreezeMultiXactId, reset flag.

I don't think that'd be correct for a CLUSTER in a subtransaction?  A
subtransaction's abort afaics doesn't call ProcArrayEndTransaction() and
thus doesn't clear vacuumFlags...

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> Unfortunately I find that too ugly. Adding a flag in the procarray
> because of an Assert() in a low-level routine? That's overkill.

If this flag doesn't need to be visible to other backends, it absolutely
does not belong in the procarray.
        regards, tom lane



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Andres Freund
Date:
On 2013-12-12 18:24:34 -0300, Alvaro Herrera wrote:
> +            /*
> +             * It's an update; should we keep it?  If the transaction is known
> +             * aborted then it's okay to ignore it, otherwise not.  (Note this
> +             * is just an optimization and not needed for correctness, so it's
> +             * okay to get this test wrong; for example, in case an updater is
> +             * crashed, or a running transaction in the process of aborting.)
> +             */
> +            if (!TransactionIdDidAbort(members[i].xid))
> +            {
> +                newmembers[nnewmembers++] = members[i];
> +                Assert(!TransactionIdIsValid(update_xid));
> +
> +                /*
> +                 * Tell caller to set HEAP_XMAX_COMMITTED hint while we have
> +                 * the Xid in cache.  Again, this is just an optimization, so
> +                 * it's not a problem if the transaction is still running and
> +                 * in the process of committing.
> +                 */
> +                if (TransactionIdDidCommit(update_xid))
> +                    update_committed = true;
> +
> +                update_xid = newmembers[i].xid;
> +            }

I still don't think this is ok. Freezing shouldn't set hint bits earlier
than tqual.c does. What's the problem with adding a
!TransactionIdIsInProgress()?

You also wrote:
On 2013-12-11 22:08:41 -0300, Alvaro Herrera wrote:
> Hmm ... Is there an actual difference?  I mean, a transaction that
> marked itself as committed in pg_clog cannot return to any other state,
> regardless of what happens elsewhere.

Hm, that's not actually true, I missed that so far: Think of async
commits and what we do in tqual.c:SetHintBits(). But I think we're safe
in this scenario, at least for the current callers. vacuumlazy.c will
WAL log the freezing and set the LSN while holding an exclusive lock,
therefor providing an LSN interlock. VACUUM FULL/CLUSTER will be safe,
even with wal_level=minimal, because the relation won't be visible until
it commits and it will contain a smgr pending delete forcing a
synchronous commit. But that should be documented.

> +            if (TransactionIdPrecedes(update_xid, cutoff_xid))
> +            {
> +                update_xid = InvalidTransactionId;
> +                update_committed = false;
> +
> +            }

Deserves an Assert().

> +    else if (TransactionIdIsValid(update_xid) && !has_lockers)
> +    {
> +        /*
> +         * If there's a single member and it's an update, pass it back alone
> +         * without creating a new Multi.  (XXX we could do this when there's a
> +         * single remaining locker, too, but that would complicate the API too
> +         * much; moreover, the case with the single updater is more
> +         * interesting, because those are longer-lived.)
> +         */
> +        Assert(nnewmembers == 1);
> +        *flags |= FRM_RETURN_IS_XID;
> +        if (update_committed)
> +            *flags |= FRM_MARK_COMMITTED;
> +        xid = update_xid;
> +    }

Afaics this will cause HEAP_KEYS_UPDATED to be reset, is that
problematic? I don't really remember what it's needed for TBH...

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2013-12-12 18:24:34 -0300, Alvaro Herrera wrote:
> > +            /*
> > +             * It's an update; should we keep it?  If the transaction is known
> > +             * aborted then it's okay to ignore it, otherwise not.  (Note this
> > +             * is just an optimization and not needed for correctness, so it's
> > +             * okay to get this test wrong; for example, in case an updater is
> > +             * crashed, or a running transaction in the process of aborting.)
> > +             */
> > +            if (!TransactionIdDidAbort(members[i].xid))
> > +            {
> > +                newmembers[nnewmembers++] = members[i];
> > +                Assert(!TransactionIdIsValid(update_xid));
> > +
> > +                /*
> > +                 * Tell caller to set HEAP_XMAX_COMMITTED hint while we have
> > +                 * the Xid in cache.  Again, this is just an optimization, so
> > +                 * it's not a problem if the transaction is still running and
> > +                 * in the process of committing.
> > +                 */
> > +                if (TransactionIdDidCommit(update_xid))
> > +                    update_committed = true;
> > +
> > +                update_xid = newmembers[i].xid;
> > +            }
> 
> I still don't think this is ok. Freezing shouldn't set hint bits earlier
> than tqual.c does. What's the problem with adding a
> !TransactionIdIsInProgress()?

I was supposed to tell you, and evidently forgot, that patch 0001 was
the same as previously submitted, and is modified by the subsequent
patches per your review comments.  Those comments should already be
handled in the later patches in the series I just posted.  The idea was
to spare you reading the whole thing all over again, but evidently that
backfired.  I think the new code doesn't suffer from the problem you
mention, and neither does the other one that I trimmed out.

> > +    else if (TransactionIdIsValid(update_xid) && !has_lockers)
> > +    {
> > +        /*
> > +         * If there's a single member and it's an update, pass it back alone
> > +         * without creating a new Multi.  (XXX we could do this when there's a
> > +         * single remaining locker, too, but that would complicate the API too
> > +         * much; moreover, the case with the single updater is more
> > +         * interesting, because those are longer-lived.)
> > +         */
> > +        Assert(nnewmembers == 1);
> > +        *flags |= FRM_RETURN_IS_XID;
> > +        if (update_committed)
> > +            *flags |= FRM_MARK_COMMITTED;
> > +        xid = update_xid;
> > +    }
> 
> Afaics this will cause HEAP_KEYS_UPDATED to be reset, is that
> problematic? I don't really remember what it's needed for TBH...

Hmm, will check that out.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> So I think this is the only remaining issue to make this patch
> committable (I will address the other points in Andres' email.)  Since
> there has been no other feedback on this thread, Andres and I discussed
> the cache issue a bit over IM and we seem to agree that a patch to
> revamp the cache should be a fairly localized change that could be
> applied on both 9.3 and master, separately from this fix.  Doing cache
> deletion seems more invasive, and would not provide better performance anyway.

Here's cache code with LRU superpowers (ahem.)

I settled on 256 as number of entries because it's in the same ballpark
as MaxHeapTuplesPerPage which seems a reasonable guideline to follow.

I considered the idea of avoiding palloc/pfree for cache entries
entirely, instead storing them in a static array which is referenced
from the dlist; unfortunately that doesn't work because each cache entry
is variable size, depending on number of members.  We could try to work
around that and allocate a large shared array for members, but that
starts to smell of over-engineering, so I punted.

I was going to 'perf' this, but then found out that I need to compile my
own linux-tools package for a home-compiled kernel ATM.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Andres Freund
Date:
On 2013-12-13 13:39:20 -0300, Alvaro Herrera wrote:
> Here's cache code with LRU superpowers (ahem.)

Heh.

> I settled on 256 as number of entries because it's in the same ballpark
> as MaxHeapTuplesPerPage which seems a reasonable guideline to follow.

Sounds ok.

> I considered the idea of avoiding palloc/pfree for cache entries
> entirely, instead storing them in a static array which is referenced
> from the dlist; unfortunately that doesn't work because each cache entry
> is variable size, depending on number of members.  We could try to work
> around that and allocate a large shared array for members, but that
> starts to smell of over-engineering, so I punted.

Good plan imo.

> *** 1326,1331 **** mXactCacheGetBySet(int nmembers, MultiXactMember *members)
> --- 1331,1337 ----
>           if (memcmp(members, entry->members, nmembers * sizeof(MultiXactMember)) == 0)
>           {
>               debug_elog3(DEBUG2, "CacheGet: found %u", entry->multi);
> +             dlist_move_head(&MXactCache, iter.cur);
>               return entry->multi;
>           }
>       }

That's only possible because we immediately abort the loop, otherwise
we'd corrupt the iterator. Maybe that deserves a comment.

> + 
> +             dlist_move_head(&MXactCache, iter.cur);
> + 

Heh. I forgot that we already had that bit; I was wondering whether you
had forgotten to include it in the patch ;)

>   static char *
> --- 1420,1435 ----
>       /* mXactCacheGetBySet assumes the entries are sorted, so sort them */
>       qsort(entry->members, nmembers, sizeof(MultiXactMember), mxactMemberComparator);
>   
> !     dlist_push_head(&MXactCache, &entry->node);
> !     if (MXactCacheMembers++ >= MAX_CACHE_ENTRIES)
> !     {
> !         dlist_node *node;
> ! 
> !         node = dlist_tail_node(&MXactCache);
> !         dlist_delete(dlist_tail_node(&MXactCache));
> !         MXactCacheMembers--;
> !         pfree(dlist_container(mXactCacheEnt, node, node));
> !     }
>   }

Duplicate dlist_tail_node(). Maybe add a debug_elog3(.. "CacheGet:
pruning %u from cache")?

I wondered before if we shouldn't introduce a layer above dlists that
supports keeping track of the number of elements, and maybe also has
support for LRU behaviour. Not as part of this patch, just generally.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2013-12-12 18:24:34 -0300, Alvaro Herrera wrote:

> > +    else if (TransactionIdIsValid(update_xid) && !has_lockers)
> > +    {
> > +        /*
> > +         * If there's a single member and it's an update, pass it back alone
> > +         * without creating a new Multi.  (XXX we could do this when there's a
> > +         * single remaining locker, too, but that would complicate the API too
> > +         * much; moreover, the case with the single updater is more
> > +         * interesting, because those are longer-lived.)
> > +         */
> > +        Assert(nnewmembers == 1);
> > +        *flags |= FRM_RETURN_IS_XID;
> > +        if (update_committed)
> > +            *flags |= FRM_MARK_COMMITTED;
> > +        xid = update_xid;
> > +    }
> 
> Afaics this will cause HEAP_KEYS_UPDATED to be reset, is that
> problematic? I don't really remember what it's needed for TBH...

So we do reset HEAP_KEYS_UPDATED, and in general that bit seems critical
for several things.  So it should be kept when a Xmax is carried over
from the pre-frozen version of the tuple.  But while reading through
that, I realize that we should also be keeping HEAP_HOT_UPDATED in that
case.  And particularly we should never clear HEAP_ONLY_TUPLE.

So I think heap_execute_freeze_tuple() is wrong, because it's resetting
the whole infomask to zero, and then setting it to only the bits that
heap_prepare_freeze_tuple decided that it needed set.  That seems bogus
to me.  heap_execute_freeze_tuple() should only clear a certain number
of bits, and then possibly set some of the same bits; but the remaining
flags should remain untouched.  So HEAP_KEYS_UPDATED, HEAP_UPDATED and
HEAP_HOT_UPDATED should be untouched by heap_execute_freeze_tuple;
heap_prepare_freeze_tuple needn't worry about querying those bits at
all.

Only when FreezeMultiXactId returns FRM_INVALIDATE_XMAX, and when the
Xmax is not a multi and it gets removed, should those three flags be
removed completely.

HEAP_ONLY_TUPLE should be untouched by the freezing protocol.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Andres Freund
Date:
On 2013-12-13 17:08:46 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Afaics this will cause HEAP_KEYS_UPDATED to be reset, is that
> > problematic? I don't really remember what it's needed for TBH...
> 
> So we do reset HEAP_KEYS_UPDATED, and in general that bit seems critical
> for several things.  So it should be kept when a Xmax is carried over
> from the pre-frozen version of the tuple.  But while reading through
> that, I realize that we should also be keeping HEAP_HOT_UPDATED in that
> case.  And particularly we should never clear HEAP_ONLY_TUPLE.

That's only for the multi->plain xid case tho, right?

> So I think heap_execute_freeze_tuple() is wrong, because it's resetting
> the whole infomask to zero, and then setting only the bits that
> heap_prepare_freeze_tuple decided needed to be set.  That seems bogus
> to me.  heap_execute_freeze_tuple() should clear only a fixed set of
> bits, and then possibly set some of those same bits; the remaining
> flags should remain untouched.

Uh, my version and the latest one you've sent initially copy the
original infomask into the freeze plan and then manipulate that copy.
That seems fine to me.  Am I missing something?
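The copy-then-manipulate scheme described here can be modeled minimally as follows. The struct and function names are hypothetical and the flag values illustrative; the point is only that a plan seeded with a verbatim copy of the tuple's infomask preserves unrelated bits such as HEAP_ONLY_TUPLE unless they are explicitly cleared.

```c
#include <stdint.h>

/* Illustrative values; real flags are in htup_details.h. */
#define HEAP_XMAX_INVALID  0x0800
#define HEAP_XMAX_IS_MULTI 0x1000
#define HEAP_ONLY_TUPLE    0x8000

/* Hypothetical freeze plan carrying the full replacement infomask. */
typedef struct freeze_plan
{
    uint16_t t_infomask;
} freeze_plan;

/* Seed the plan with a copy of the tuple's infomask, then edit only
 * the bits the freeze actually changes; execute would later write
 * plan->t_infomask back wholesale, so untouched bits survive. */
static void
prepare_freeze(freeze_plan *plan, uint16_t tuple_infomask)
{
    plan->t_infomask = tuple_infomask;              /* copy first ... */
    plan->t_infomask &= (uint16_t) ~HEAP_XMAX_IS_MULTI; /* ... then edit */
    plan->t_infomask |= HEAP_XMAX_INVALID;
}
```

Under this design the "clear some bits, keep the rest" property holds automatically, which is why starting from a copy of the original infomask addresses the concern about resetting it to zero.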

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
Noah Misch wrote:
> On Tue, Dec 03, 2013 at 07:26:38PM +0100, Andres Freund wrote:
> > On 2013-12-03 13:14:38 -0500, Noah Misch wrote:
> > > On Tue, Dec 03, 2013 at 04:37:58PM +0100, Andres Freund wrote:
> > > > I currently don't see fixing the erroneous freezing of lockers (not the
> > > > updater though) without changing the WAL format or synchronously waiting
> > > > for all lockers to end.  Which both seem like a no-go?
> > > 
> > > Not fixing it at all is the real no-go.  We'd take both of those undesirables
> > > before just tolerating the lost locks in 9.3.
> > 
> > I think it's changing the wal format then.
> 
> I'd rather have a readily verifiable fix that changes the WAL format than a
> tricky fix that avoids doing so.  So, modulo not having seen the change, +1.

I've committed a patch which hopefully fixes the problem using this
approach.  Thanks, Noah, for noticing the issue, and thanks, Andres, for
collaboration in getting the code in the right state.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Fix a couple of bugs in MultiXactId freezing

From
Alvaro Herrera
Date:
BTW, there are a couple of spec files floating around which perhaps we
should consider getting into the source repo (in some cleaned-up form).
Noah published one, Andres shared a couple more with me, and I think I
have two more.  They can't be made to work under normal circumstances,
because they depend on concurrent server activity.  But perhaps we
should add them anyway, listed in a separate schedule file, so that
any developer interested in messing with this stuff has them readily
available for testing.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
