Thread: [HACKERS] [GSOC] [Weekly report 2] Eliminate O(N^2) scaling from rw-conflicttracking in serializable transactions

 Hi, all. In the last week, I replaced linked list with hash table in SerializableXact. 
 I only replace inConflicts and outConflicts. The other linked lists, such as possibleUnsafeConflicts, I will modify them after other things work well.

There are still some bugs: the abort rate is much higher than before (from 11.3% to 72%). I'll figure out what's wrong in the next few days.

My design is as follow:

For hash table, key is the pointer of SerializableXact; Value is the RWConflictData object. 
Hashcode is generated based on the SerializableXact pointer. 
So, given a SerializableXact, we can quickly find if it is conflict with another SerializableXact.

Every SerializableXact has two tables: inConflictsTab and outConflictsTab. 
They are allocated and initialized when creating PredXactList (in the function InitPredicateLocks). 
When a SerializableXact object is released, it will release all its RWConflict objects, the hash tables are empty again. 
Thus They can be reused directly after releasing. 

NOTE: I stored RWConflictData in hash tables, instead of RWConflict object allocated by RWConflictPool. 
 After I remove other linked lists, the RWConflictPool can be omitted.

My code is on the :
https://github.com/liumx10/postgresql/commit/3fd9a7488de5ae19ce2ce19eae5f303153a079ff

There are 3 main modifications in summary:
1) Initializing hash table. Related functions: InitPredicateLocks

2) Setting, releasing and checking RWConflicts. Related functions: RWConflictExists, SetRWConflict, ReleaseRWConflict

3) There are some functions scanning all RWConflict in a SerializableXact.
Related functions: ReleasePredicateLocks, ReleaseOneSerializableXact, CheckForSerializableConflictOut, 
 CheckForSerializableConflictIn, OnConflict_CheckForSerializationFailure, PreCommit_CheckForSerializationFailure



Sincerely
--
Mengxing Liu





On 06/15/2017 08:51 AM, Mengxing Liu wrote:
> My design is as follow:
>
> For hash table, key is the pointer of SerializableXact; Value is the RWConflictData object.
> Hashcode is generated based on the SerializableXact pointer.
> So, given a SerializableXact, we can quickly find if it is conflict with another SerializableXact.
>
> Every SerializableXact has two tables: inConflictsTab and outConflictsTab.
> They are allocated and initialized when creating PredXactList (in the function InitPredicateLocks).
> When a SerializableXact object is released, it will release all its RWConflict objects, the hash tables are empty
again.
> Thus They can be reused directly after releasing.
>
> NOTE: I stored RWConflictData in hash tables, instead of RWConflict object allocated by RWConflictPool.
>  After I remove other linked lists, the RWConflictPool can be omitted.

Sounds good!

> My code is on the :
> https://github.com/liumx10/postgresql/commit/3fd9a7488de5ae19ce2ce19eae5f303153a079ff

Once you've ironed out the obvious bugs, make sure to also post it as a 
patch to this mailing list. For the sake of the archives, and to make it 
clear that you're submitting this for inclusion in PostgreSQL, under the 
PostgreSQL license.

Couple of little things caught my eye at a quick glance:

>  * Test whether a hash table is empty.
> + * I didn't find any function in dynamic hash supports the requirement.

You can do: hash_get_num_entries(hashp) == 0

> sprintf(name, "PredXact inConflictsTab %d", i);

Better to use snprintf() instead. sprintf() is safe here, but many 
static analysis tools will complain on any sight of plain sprintf(), so 
better to just never use it.

- Heikki