Re: Small SSI issues - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Small SSI issues
Date
Msg-id 4DF1EC83020000250003E4B0@gw.wicourts.gov
Whole thread Raw
In response to Small SSI issues  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Small SSI issues
Re: Small SSI issues
List pgsql-hackers
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> Here's a bunch of small issues that I spotted:
Thanks for going over it again.  It is encouraging that you didn't
spot any *big* issues.
> * The oldserxid code is broken for non-default BLCKSZ.
>    o The warning will come either too early or too late
Good point.  That part is easily fixed -- if we want to keep the
warning, in light of the next few points.
>    o There is no safeguard against actually wrapping around the
>      SLRU, just the warning
Any thoughts on what we should do instead?  If someone holds open a
transaction long enough to burn through a billion transaction IDs
(or possibly less if someone uses a smaller BLCKSZ), should we
generate a FATAL error?  Of course, one other option would be to
allow more SLRU segment files, as you raised on another thread. 
Then this issue goes away because they would hit other, existing,
protections against transaction wraparound first.
>    o I'm suspicious of the OldSerXidPagePrecedesLogically() logic
>      with 32k BLCKSZ. In that case, OLDSERXID_MAX_PAGE is a larger
>      than necessary to cover the whole range of 2^32 transactions,
>      so at high XIDs, say 2^32-1, doesn't it incorrectly think
>      that low XIDs, say 10, are in the past, not the future?
I will look that over to see; but if this is broken, then one of the
other SLRU users is probably broken, because I think I stole this
code pretty directly from another spot.
>> /*
>>  * Keep a pointer to the currently-running serializable
>>  * transaction (if any) for quick reference.
>>  * TODO SSI: Remove volatile qualifier and the then-unnecessary
>>  * casts?
>>  */
>> static volatile SERIALIZABLEXACT *MySerializableXact = 
>>                     InvalidSerializableXact;
> 
> * So, should we remove it? In most places where contents of
>    MySerializableXact are modified, we're holding
>    SerializableXactHashLock in exclusive mode. However, there's
>    two exceptions:
> 
>    o GetSafeSnapshot() modifies MySerializableXact->flags without
>      any lock. It also inspects
>      MySerializableXact->possibleUnsafeConflicts without a lock.
>      What if somone sets some other flag in the flags bitmap just
>      when GetSafeSnapshot() is about to set
>      SXACT_FLAG_DEFERRABLE_WAITING? One of the flags might be lost
>      :-(.
> 
>    o CheckForSerializableConflictIn() sets SXACT_FLAG_DID_WRITE
>      without holding a lock. The same danger is here if someone
>      else tries to set some other flag concurrently.
> 
>    I think we should simply acquire SerializableXactHashLock in
>    GetSafeSnapshot(). It shouldn't be so much of a hotspot that it
>    would make any difference in performance.
>    CheckForSerializableConflictIn() might be called a lot,
>    however, so maybe we need to check if the flag is set first,
>    and only acquire the lock and set it if it's not set already.
OK.
Do checks such as that argue for keeping the volatile flag, or do
you think we can drop it if we make those changes?  (That would also
allow dropping a number of casts which exist just to avoid
warnings.)
> * Is the SXACT_FLAG_ROLLED_BACK flag necessary? It's only set in
>    ReleasePredicateLocks() for a fleeting moment while the
>    function releases all conflicts and locks held by the
>    transaction, and finally the sxact struct itself containing the
>    flag.
I think that one can go away.  It  had more of a point many months
ago before we properly sorted out what belongs in
PreCommit_CheckForSerializationFailure() and what belongs in
ReleasePredicateLocks().  The point at which we reached clarity on
that and moved things around, this flag probably became obsolete.
>    Also, isn't a transaction that's already been marked for death
>    the same as one that has already rolled back, for the purposes
>    of detecting conflicts?
Yes.
We should probably ignore any marked-for-death transaction during
conflict detection and serialization failure detection.  As a start,
anywhere there is now a check for rollback and not for this, we
should change it to this.  There may be some places this can be
checked which haven't yet been identified and touched.
> * The HavePartialClearThrough/CanPartialClearThrough mechanism
>   needs a better explanation. The only explanation currently is
>   this:
> 
>>  if (--(PredXact->WritableSxactCount) == 0)
>>  {
>>      /*
>>       * Release predicate locks and rw-conflicts in for
>>       * all committed transactions.  There are no longer any
>>       * transactions which might conflict with the locks and no
>>       * chance for new transactions to overlap.  Similarly,
>>       * existing conflicts in can't cause pivots, and any
>>       * conflicts in which could have completed a dangerous
>>       * structure would already have caused a rollback, so any
>>       * remaining ones must be benign.
>>       */
>>      PredXact->CanPartialClearThrough =
>>          PredXact->LastSxactCommitSeqNo;
>>  }
> 
>    If I understand that correctly, any predicate lock belonging to
>    any already committed transaction can be safely zapped away at
>    that instant.
Correct.
>    We don't do it immediately, because it might be expensive.
I think it has more to do with getting the LW locks right.  We make
the call to ClearOldPredicateLocks() farther down in the function,
so this is effectively setting things up for that call.
>    But what is the purpose of HavePartialClearThrough?
To avoid doing unnecessary work for completed transactions on which
we still need to keep some information but for which we were
previously able to clear predicate locks.  This relates to the
"mitigation" work discussed in these and other posts from around
that time:
http://archives.postgresql.org/pgsql-hackers/2010-10/msg01754.php
http://archives.postgresql.org/pgsql-hackers/2010-12/msg02119.php
I'm happy to work on modifications for any of this or to stay out of
your way if you want to.  Should I put together a patch for those
items where we seem to agree and have a clear way forward?
-Kevin


pgsql-hackers by date:

Previous
From: Alexey Klyukin
Date:
Subject: Re: Operator families vs. casts
Next
From: Alex Hunsaker
Date:
Subject: Re: Creating new remote branch in git?