Re: Small SSI issues - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Small SSI issues
Date
Msg-id 4DF265EE.4040209@enterprisedb.com
Whole thread Raw
In response to Re: Small SSI issues  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: Small SSI issues
List pgsql-hackers
On 10.06.2011 18:05, Kevin Grittner wrote:
> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  wrote:
>>     o There is no safeguard against actually wrapping around the
>>       SLRU, just the warning
>
> Any thoughts on what we should do instead?  If someone holds open a
> transaction long enough to burn through a billion transaction IDs
> (or possibly less if someone uses a smaller BLCKSZ), should we
> generate a FATAL error?

FATAL is a bit harsh, ERROR seems more appropriate.

>>     o I'm suspicious of the OldSerXidPagePrecedesLogically() logic
>>       with 32k BLCKSZ. In that case, OLDSERXID_MAX_PAGE is a larger
>>       than necessary to cover the whole range of 2^32 transactions,
>>       so at high XIDs, say 2^32-1, doesn't it incorrectly think
>>       that low XIDs, say 10, are in the past, not the future?
>
> I will look that over to see; but if this is broken, then one of the
> other SLRU users is probably broken, because I think I stole this
> code pretty directly from another spot.

It was copied from async.c, which doesn't have this problem because it's 
not mapping XIDs to the slru. There, the max queue size is determined by 
the *_MAX_PAGE, and you point to a particular location in the SLRU with 
simply page+offset.

>>> /*
>>>   * Keep a pointer to the currently-running serializable
>>>   * transaction (if any) for quick reference.
>>>   * TODO SSI: Remove volatile qualifier and the then-unnecessary
>>>   * casts?
>>>   */
>>> static volatile SERIALIZABLEXACT *MySerializableXact =
>>>                      InvalidSerializableXact;
>>
>> * So, should we remove it? In most places where contents of
>>     MySerializableXact are modified, we're holding
>>     SerializableXactHashLock in exclusive mode. However, there's
>>     two exceptions:
>>
>>     o GetSafeSnapshot() modifies MySerializableXact->flags without
>>       any lock. It also inspects
>>       MySerializableXact->possibleUnsafeConflicts without a lock.
>>       What if somone sets some other flag in the flags bitmap just
>>       when GetSafeSnapshot() is about to set
>>       SXACT_FLAG_DEFERRABLE_WAITING? One of the flags might be lost
>>       :-(.
>>
>>     o CheckForSerializableConflictIn() sets SXACT_FLAG_DID_WRITE
>>       without holding a lock. The same danger is here if someone
>>       else tries to set some other flag concurrently.
>>
>>     I think we should simply acquire SerializableXactHashLock in
>>     GetSafeSnapshot(). It shouldn't be so much of a hotspot that it
>>     would make any difference in performance.
>>     CheckForSerializableConflictIn() might be called a lot,
>>     however, so maybe we need to check if the flag is set first,
>>     and only acquire the lock and set it if it's not set already.
>
> OK.
>
> Do checks such as that argue for keeping the volatile flag, or do
> you think we can drop it if we make those changes?  (That would also
> allow dropping a number of casts which exist just to avoid
> warnings.)

I believe we can drop it, I'll double-check.

> I'm happy to work on modifications for any of this or to stay out of
> your way if you want to.  Should I put together a patch for those
> items where we seem to agree and have a clear way forward?

I'll fix the MySerializableXact locking issue, and come back to the 
other issues on Monday. If you have the time and energy to write a patch 
by then, feel free, but I can look into them otherwise.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: gcc 4.6 and hot standby
Next
From: Josh Berkus
Date:
Subject: Re: [v9.2] Start new timeline for PITR