OK, I've worked out why I am seeing deadlocks etc. from parallel restore
on FK items.
In my original patch, I looked at all the dependencies of a candidate
item and compared them with the dependencies of the running items to
see if there was a potential locking clash. However, Tom, in his
admirable reworking of my patch, restricted the list of potentially
clashing items (lockDeps) to "TABLE" items, if any. This would probably
have been ok if we hadn't just beforehand transferred all TABLE
dependencies in POST_DATA items to the corresponding TABLE DATA item.
The result is that we get empty lockDeps lists on all items - I'm
surprised we haven't had more complaints about deadlock or failing locks.
A simple fix that would probably work would be to adjust the filter to
include TABLE DATA items, so the relevant statement would read:
    if (tocsByDumpId[depid - 1] &&
        (strcmp(tocsByDumpId[depid - 1]->desc, "TABLE") == 0 ||
         strcmp(tocsByDumpId[depid - 1]->desc, "TABLE DATA") == 0))
        lockids[nlockids++] = depid;
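To make the effect of the filter concrete, here is a rough standalone
sketch. It is not the pg_restore code: SketchTocEntry, is_lock_relevant
and collect_lock_deps are names I have made up for illustration, and the
struct is a cut-down stand-in for the real TOC entry. The
includeTableData flag switches between the current TABLE-only filter and
the adjusted one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for a TOC entry (not the real struct). */
typedef struct SketchTocEntry
{
    int         dumpId;     /* 1-based dump ID */
    const char *desc;       /* e.g. "TABLE", "TABLE DATA", "FK CONSTRAINT" */
    const int  *deps;       /* dump IDs this entry depends on */
    int         nDeps;      /* number of entries in deps */
} SketchTocEntry;

/* Should a dependency on this entry count as a lock dependency? */
int
is_lock_relevant(const SketchTocEntry *te, int includeTableData)
{
    if (te == NULL)
        return 0;
    if (strcmp(te->desc, "TABLE") == 0)
        return 1;
    /* The adjusted filter also accepts TABLE DATA entries. */
    return includeTableData && strcmp(te->desc, "TABLE DATA") == 0;
}

/*
 * Copy into lockids[] those dependencies of te that pass the filter.
 * tocsByDumpId is indexed by dumpId - 1 and may contain NULL slots.
 * lockids must have room for te->nDeps entries.
 * Returns the number of IDs stored.
 */
int
collect_lock_deps(const SketchTocEntry *te,
                  const SketchTocEntry *const *tocsByDumpId, int maxDumpId,
                  int includeTableData, int *lockids)
{
    int nlockids = 0;
    int i;

    assert(te != NULL && tocsByDumpId != NULL && lockids != NULL);

    for (i = 0; i < te->nDeps; i++)
    {
        int depid = te->deps[i];

        /* Skip IDs that fall outside the lookup array. */
        if (depid < 1 || depid > maxDumpId)
            continue;
        if (is_lock_relevant(tocsByDumpId[depid - 1], includeTableData))
            lockids[nlockids++] = depid;
    }
    return nlockids;
}
```

For an FK item whose table dependencies have already been repointed to
the two TABLE DATA items, calling collect_lock_deps with
includeTableData = 0 stores nothing, which is the empty lockDeps case
described above. With includeTableData = 1 both TABLE DATA IDs are
stored.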
Perhaps a better fix would move the code that sets up the lockDeps so
that it runs before we adjust the dependencies.
I'm moderately confident that either of these fixes will work, but I
think this demonstrates the need for lots of testing, especially with
complex data sets that have lots of dependencies and potentially
deadlocking items.
Thoughts?
cheers
andrew