Thread: parallel restore item dependencies

parallel restore item dependencies

From
Andrew Dunstan
Date:
OK, I've worked out why I am seeing deadlocks etc. from parallel restore 
on FK items.

In my original patch, I looked at all the dependencies of a candidate 
item ansd compared them with the dependencies of the running items to 
see if there was a potential locking clash. However, Tom in his 
admirable reworking of my patch, restricted the list of potential 
clashing items (lockDeps) to "TABLE" items, if any. This would probably 
have been ok if we hadn't just beforehand transferred all TABLE 
dependencies in POST_DATA items to the corresponding TABLE DATA item. 
The result is that we get empty lockDeps lists on all items - I'm 
surprised we haven't had more complaints about deadlock or failing locks.

A simple fix that would probably work would be to adjust the filter to 
include TABLE DATA items, so the relevant statement would read:
       if (tocsByDumpId[depid - 1] &&           (strcmp(tocsByDumpId[depid - 1]->desc, "TABLE") == 0 ||
strcmp(tocsByDumpId[depid- 1]->desc, "TABLE DATA") == 0))           lockids[nlockids++] = depid;
 

Perhaps a better fix would move the code that sets up the lockDeps so 
that it runs before we adjust the dependencies.

I'm moderately confident that either of these fixes will work, but I 
think this demonstrates the need for lots of testing, especially with 
complex data sets that have lots of dependencies and potentially 
deadlocking items.

thoughts?

cheers

andrew




Re: parallel restore item dependencies

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> OK, I've worked out why I am seeing deadlocks etc. from parallel restore 
> on FK items.

> In my original patch, I looked at all the dependencies of a candidate 
> item ansd compared them with the dependencies of the running items to 
> see if there was a potential locking clash. However, Tom in his 
> admirable reworking of my patch, restricted the list of potential 
> clashing items (lockDeps) to "TABLE" items, if any. This would probably 
> have been ok if we hadn't just beforehand transferred all TABLE 
> dependencies in POST_DATA items to the corresponding TABLE DATA item. 
> The result is that we get empty lockDeps lists on all items - I'm 
> surprised we haven't had more complaints about deadlock or failing locks.

[ scratches head... ]  I coulda sworn I tested that when I was hacking
it.  I'm running low on steam tonight but will think more about this
tomorrow.
        regards, tom lane


Re: parallel restore item dependencies

From
Tom Lane
Date:
I wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> In my original patch, I looked at all the dependencies of a candidate 
>> item ansd compared them with the dependencies of the running items to 
>> see if there was a potential locking clash. However, Tom in his 
>> admirable reworking of my patch, restricted the list of potential 
>> clashing items (lockDeps) to "TABLE" items, if any. This would probably 
>> have been ok if we hadn't just beforehand transferred all TABLE 
>> dependencies in POST_DATA items to the corresponding TABLE DATA item. 
>> The result is that we get empty lockDeps lists on all items - I'm 
>> surprised we haven't had more complaints about deadlock or failing locks.

> [ scratches head... ]  I coulda sworn I tested that when I was hacking
> it.  I'm running low on steam tonight but will think more about this
> tomorrow.

I think I have reconstructed what happened: I tested this code before
I decided that repointing the dependencies was a good idea, or else
reordered the sequence of operations in fix_dependencies after that.
It looks to me like the correct fix is just to look for TABLE DATA
not TABLE while setting up lockDeps[], since all the entry types we
care about are POST_DATA items.  Anyway, I've committed that, please
try it.
        regards, tom lane


Re: parallel restore item dependencies

From
Andrew Dunstan
Date:

Tom Lane wrote:
> I wrote:
>   
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>     
>>> In my original patch, I looked at all the dependencies of a candidate 
>>> item ansd compared them with the dependencies of the running items to 
>>> see if there was a potential locking clash. However, Tom in his 
>>> admirable reworking of my patch, restricted the list of potential 
>>> clashing items (lockDeps) to "TABLE" items, if any. This would probably 
>>> have been ok if we hadn't just beforehand transferred all TABLE 
>>> dependencies in POST_DATA items to the corresponding TABLE DATA item. 
>>> The result is that we get empty lockDeps lists on all items - I'm 
>>> surprised we haven't had more complaints about deadlock or failing locks.
>>>       
>
>   
>> [ scratches head... ]  I coulda sworn I tested that when I was hacking
>> it.  I'm running low on steam tonight but will think more about this
>> tomorrow.
>>     
>
> I think I have reconstructed what happened: I tested this code before
> I decided that repointing the dependencies was a good idea, or else
> reordered the sequence of operations in fix_dependencies after that.
> It looks to me like the correct fix is just to look for TABLE DATA
> not TABLE while setting up lockDeps[], since all the entry types we
> care about are POST_DATA items.  Anyway, I've committed that, please
> try it.
>
>             
>   

Passes test.

Thanks.

andrew