Re: Parallel copy - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Parallel copy
Date
Msg-id CALj2ACXttbVQa0L46nnHVJN7n70gazCTEKNm0dehh_D70Zc01Q@mail.gmail.com
In response to Re: Parallel copy  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thu, Oct 29, 2020 at 2:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> 4) Worker has to hop through all the processed chunks before getting
> the chunk which it can process.
>
> One more point, I have noticed that some time back [1], I have given
> one suggestion related to the way workers process the set of lines
> (aka chunk). I think you can try by increasing the chunk size to say
> 100, 500, 1000 and use some shared counter to remember the number of
> chunks processed.
>

Hi, I did some analysis comparing two ways for a worker to pick the next chunk: (a) a spinlock-protected worker write position, i.e. each worker acquires a spinlock on a shared write position to choose the next available chunk, and (b) hopping, i.e. each worker hops over the chunks to find the next available chunk position.

Use case: 10 million rows, 5.6 GB of data, 2 indexes on integer columns, 1 index on a text column. Results are of the form (number of workers, total execution time in sec, index insertion time in sec, time to get the worker write position in sec, buffer contention event count):

With spinlock:
(1,1126.443,1060.067,0.478,0), (2,669.343,630.769,0.306,26), (4,346.297,326.950,0.161,89), (8,209.600,196.417,0.088,291), (16,166.113,157.086,0.065,1468), (20,173.884,166.013,0.067,2700), (30,173.087,1166.565,0.0065,5346)
Without spinlock:
(1,1119.695,1054.586,0.496,0), (2,645.733,608.313,1.5,8), (4,340.620,320.344,1.6,58), (8,203.985,189.644,1.3,222), (16,142.997,133.045,1,813), (20,132.621,122.527,1.1,1215), (30,135.737,126.716,1.5,2901)
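
To make the numbers easier to compare, here is a small sketch that derives the speedup relative to one worker and the share of total time spent in index insertion. It uses only a subset of the rows above (1, 8, 16 and 20 workers), copied as reported; the helper name `summarize` is made up for illustration.

```python
# Rows copied from the results above, truncated to
# (workers, total exec time in sec, index insertion time in sec).
WITH_SPINLOCK = [
    (1, 1126.443, 1060.067),
    (8, 209.600, 196.417),
    (16, 166.113, 157.086),
    (20, 173.884, 166.013),
]
WITHOUT_SPINLOCK = [
    (1, 1119.695, 1054.586),
    (8, 203.985, 189.644),
    (16, 142.997, 133.045),
    (20, 132.621, 122.527),
]


def summarize(rows):
    """Return speedup vs. the first row and the index-insertion share per row."""
    base_total = rows[0][1]
    return [
        {
            "workers": workers,
            "speedup": base_total / total,
            "index_share": index_time / total,
        }
        for workers, total, index_time in rows
    ]


if __name__ == "__main__":
    for name, rows in (("with spinlock", WITH_SPINLOCK),
                       ("without spinlock", WITHOUT_SPINLOCK)):
        print(name)
        for r in summarize(rows):
            print(f"  workers={r['workers']:>2} "
                  f"speedup={r['speedup']:.2f}x "
                  f"index share={r['index_share']:.1%}")
```

In every one of these rows index insertion accounts for more than 90% of the total execution time, which is why contention at that point dominates the comparison.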

With the spinlock, each worker gets the required write position quickly and proceeds to index insertion, which becomes a single point of contention; that is where we observed more buffer lock contention. The reason is that all the workers reach the index insertion point at about the same time.

Without the spinlock, each worker spends some time hopping to get the write position while the other workers are inserting into the indexes. So the workers do not all reach the index insertion point at the same time, and hence there is less buffer lock contention.
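
The two ways of picking the next chunk can be sketched roughly as follows. This is only an illustrative Python model, not the patch code: a `threading.Lock` stands in for the spinlock, a per-chunk lock stands in for whatever atomic test-and-set guards a chunk's state, and the function names `run_shared_counter` and `run_hopping` are made up.

```python
import threading


def run_shared_counter(nworkers, nchunks):
    """Workers claim chunks from one lock-protected shared write position."""
    next_pos = [0]                      # shared write position
    lock = threading.Lock()             # stand-in for the spinlock
    claimed = [[] for _ in range(nworkers)]

    def worker(wid):
        while True:
            with lock:
                pos = next_pos[0]
                if pos >= nchunks:
                    return              # nothing left to claim
                next_pos[0] = pos + 1
            claimed[wid].append(pos)    # "process" the chunk outside the lock

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed


def run_hopping(nworkers, nchunks):
    """Workers walk the chunk array and hop over chunks already taken."""
    taken = [False] * nchunks
    flag_locks = [threading.Lock() for _ in range(nchunks)]  # assumed per-chunk guard
    claimed = [[] for _ in range(nworkers)]
    hops = [0] * nworkers

    def worker(wid):
        for pos in range(nchunks):
            with flag_locks[pos]:
                got = not taken[pos]
                if got:
                    taken[pos] = True
            if got:
                claimed[wid].append(pos)
            else:
                hops[wid] += 1          # chunk owned by someone else, hop past it

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed, hops


if __name__ == "__main__":
    claimed = run_shared_counter(4, 1000)
    print("shared counter, chunks per worker:", [len(c) for c in claimed])
    claimed, hops = run_hopping(4, 1000)
    print("hopping, chunks per worker:", [len(c) for c in claimed])
    print("hopping, total hops:", sum(hops))
```

In this model every chunk is claimed exactly once either way; the difference is that the hopping workers also pay for visiting chunks they cannot use, which is the extra time that staggers their arrival at index insertion.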

The same behaviour (explained above) is observed with different worker chunk counts (the default of 64, as well as 128, 512 and 1024), i.e. the number of tuples each worker caches in its local memory before inserting into the table.
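
For clarity, the worker chunk count acts like a batch size. A minimal sketch, assuming a hypothetical `insert_batch` callback in place of the real table insertion:

```python
def copy_with_local_cache(tuples, worker_chunk_count, insert_batch):
    """Cache tuples locally and hand them to insert_batch in groups.

    insert_batch is a hypothetical callback standing in for the table insert.
    Returns the number of batches flushed.
    """
    cache = []
    flushes = 0
    for tup in tuples:
        cache.append(tup)
        if len(cache) >= worker_chunk_count:
            insert_batch(cache)         # cache is full, insert it
            flushes += 1
            cache = []
    if cache:                           # insert the final partial batch
        insert_batch(cache)
        flushes += 1
    return flushes


if __name__ == "__main__":
    table = []
    n = copy_with_local_cache(range(150), 64, table.extend)
    print("batches flushed:", n, "rows inserted:", len(table))
```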

In summary: with the spinlock, we avoid workers waiting to get the next chunk, which also means that we do not create any contention point inside the parallel copy code. However, this exposes another choke point, i.e. index insertion when the table has indexes, which is outside the scope of the parallel copy code. We think it would be good to use a spinlock-protected worker write position, or an atomic variable for the worker write position (as it performs the same as the spinlock, or a little better on some platforms). Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
