Re: WIP parallel restore patch - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: WIP parallel restore patch
Date
Msg-id 49259697.20905@dunslane.net
Whole thread Raw
In response to Re: WIP parallel restore patch  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers

Magnus Hagander wrote:
> Andrew Dunstan wrote:
>   
>> Attached is my latest parallel restore patch. I think it's functionally
>> complete for Unix.
>>
>> Many bugs have been fixed since the last patch, and the hardcoded
>> limitation to two table dependencies is removed. It seems fairly robust
>> in my recent testing.
>>
>> Remaining to be done:
>>
>> . code cleanup
>> . better error checking in a few places
>> . final decision re command line option names/defaults
>> . documentation
>> . Windows support.
>>     
>
> I've looked around this a bit, and it's fairly clear where the issue
> comes in with Windows - we get heap corruption. Most likely because we
> have multiple threads working on the same data somewhere.
>
> I notice for example that we're doing a shallow copy of the
> ArchiveHandle with a simple memcpy() for each thread. But that struct
> contains a number of things like file descriptors and pointers. Have you
> verified for each and every one of those that it actually doesn't get
> modified anywhere? If not, a deep copy may be needed to make sure of that.
>
> Other than that, are there any global variables that may be addressed
> from more than one worker? If so they need to be marked as TLS.
>
> And yes, I got hit by the lack of error checking a couple of times
> during my testing - it would probably be a good idea to add that as soon
> as possible, it helps a lot with the debugging.
>
> If I run it with just a single thread, it also crashes in PQfinish()
> called from die_horribly(), when trying to free conn->pgpass, which has
> a value (non-NULL) but is not a valid pointer. This crash happens in the
> worker thread, after it has logged that "fseek is required" - that's an
> indicator something being passed down to the thread is either wrong or
> being scribbled upon after the fact.
>
> I didn't dig into these questions specifically - since you have already
> been reading up on this code to do the patch you can probably reach the
> answer to them much quicker :-) So I'll stick to the questions.
>
>
>   

OK, Thanks, this will help. I thought I had caught the ArchiveHandle 
things, but there might be one or two I missed.

I'll try to have a new version in a few days.

cheers

andrew


pgsql-hackers by date:

Previous
From: "Pavan Deolasee"
Date:
Subject: Re: Hot Standby (commit fest version - v5)
Next
From: Tom Lane
Date:
Subject: Re: Problem with Bitmap Heap Scan