Re: Parallel copy - Mailing list pgsql-hackers
From: vignesh C
Subject: Re: Parallel copy
Date:
Msg-id: CALDaNm13zK=JXfZWqZJsm3+2yagYDJc=eJBgE4i77-4PPNj7vw@mail.gmail.com
In response to: Re: Parallel copy (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Parallel copy
List: pgsql-hackers
Hi.
We have made a patch along the lines discussed in the previous mails. We could achieve up to a 9.87X performance improvement; the improvement varies from case to case.
Exec time in seconds, speedup over serial copy (0 workers) in parentheses:

Workers | copy from file, 2 indexes on integer columns, 1 index on text column | copy from stdin, 2 indexes on integer columns, 1 index on text column | copy from file, 1 gist index on text column | copy from file, 3 indexes on integer columns | copy from stdin, 3 indexes on integer columns
---|---|---|---|---|---
0 | 1162.772 (1X) | 1176.035 (1X) | 827.669 (1X) | 216.171 (1X) | 217.376 (1X)
1 | 1110.288 (1.05X) | 1120.556 (1.05X) | 747.384 (1.11X) | 174.242 (1.24X) | 163.492 (1.33X)
2 | 635.249 (1.83X) | 668.18 (1.76X) | 435.673 (1.9X) | 133.829 (1.61X) | 126.516 (1.72X)
4 | 336.835 (3.45X) | 346.768 (3.39X) | 236.406 (3.5X) | 105.767 (2.04X) | 107.382 (2.02X)
8 | 188.577 (6.17X) | 194.491 (6.04X) | 148.962 (5.56X) | 100.708 (2.15X) | 107.72 (2.01X)
16 | 126.819 (9.17X) | 146.402 (8.03X) | 119.923 (6.9X) | 97.996 (2.2X) | 106.531 (2.04X)
20 | 117.845 (9.87X) | 149.203 (7.88X) | 138.741 (5.96X) | 97.94 (2.21X) | 107.5 (2.02X)
30 | 127.554 (9.11X) | 161.218 (7.29X) | 172.443 (4.8X) | 98.232 (2.2X) | 108.778 (1.99X)
Posting the initial patch to get feedback.
Design of the Parallel Copy:
The backend to which the "COPY FROM" query is submitted acts as the leader. It is responsible for reading data from the file/stdin and for launching at most n workers, as specified with the PARALLEL 'n' option in the "COPY FROM" query. The leader populates the common data required for the workers' execution in the DSM and shares it with the workers. The leader then executes the before statement triggers, if any exist.

The leader populates the DSM chunk entries, each of which holds a start offset and a chunk size; while populating the chunk entries it reads as many blocks as required from the file into the DSM data blocks. Each block is 64K in size. The leader parses the data to identify a chunk; the existing logic from CopyReadLineText, which identifies the chunks, was used for this with some changes. The leader checks if a free chunk entry is available to copy the information into; if there is no free entry it waits till one is freed up by a worker, and then copies the identified chunk's information (offset & chunk size) into the DSM chunk entry. This process is repeated till the complete file is processed. Simultaneously, the workers cache chunks (50 at a time) in their local memory and release the chunk entries back to the leader for further populating. Each worker processes the chunks it has cached and inserts them into the table. The leader waits till all the populated chunks have been processed by the workers and then exits.
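As an illustration, with the patch applied a parallel load could be requested roughly as follows (the table name, file path, and worker count are made up for the example; the option spelling follows the PARALLEL 'n' form mentioned above and may differ slightly in the actual patch):

CREATE TABLE copy_perf_test(a integer, b integer, c text);
COPY copy_perf_test FROM '/tmp/copy_perf_test.csv' WITH (FORMAT csv, PARALLEL '4');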
In the future, we would like to add support for parallel copy on tables with referential integrity constraints and for parallelizing copy from binary format files.
The above-mentioned tests were run with CSV format, a file size of 5.1GB, and 10 million records in the table. The postgres configuration and system configuration used are attached in config.txt.
My colleague Bharath and I developed this patch. We would like to thank Amit, Dilip, Robert, Andres, Ants, Kuntal, Alastair, Tomas, David, Thomas, Andrew & Kyotaro for their thoughts/discussions/suggestions.
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Mon, May 18, 2020 at 10:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, May 15, 2020 at 6:49 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, May 15, 2020 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > My sense is that it would be a lot more sensible to do it at the
> > > *beginning* of the parallel operation. Once we do it once, we
> > > shouldn't ever do it again; that's how it works now. Deferring it
> > > until later seems much more likely to break things.
> >
> > AFAIU, we always increment the command counter after executing the
> > command. Why do we want to do it differently here?
>
> Hmm, now I'm starting to think that I'm confused about what is under
> discussion here. Which CommandCounterIncrement() are we talking about
> here?
>
The one we do after executing a non-readonly command. Let me try to
explain by example:
CREATE TABLE tab_fk_referenced_chk(refindex INTEGER PRIMARY KEY,
height real, weight real);
insert into tab_fk_referenced_chk values( 1, 1.1, 100);
CREATE TABLE tab_fk_referencing_chk(index INTEGER REFERENCES
tab_fk_referenced_chk(refindex), height real, weight real);
COPY tab_fk_referencing_chk(index,height,weight) FROM stdin WITH(
DELIMITER ',');
1,1.1,100
1,2.1,200
1,3.1,300
\.
In the above case, even though we are executing a single command from
the user's perspective, the currentCommandId will be four after the
command. One increment is for the copy command and the other three
increments are for locking the tuples in the PK table
(tab_fk_referenced_chk) corresponding to the three tuples in the FK
table (tab_fk_referencing_chk). Now, for parallel workers, it is
(theoretically) possible that the three tuples are processed by three
different workers which don't get synced as of now. The question was:
do we see any kind of problem with this, and if so, can we just sync
it up at the end of parallelism?
> > First, let me clarify the CTID I have used in my email are for the
> > table in which insertion is happening which means FK table. So, in
> > such a case, we can't have the same CTIDs queued for different
> > workers. Basically, we use CTID to fetch the row from FK table later
> > and form a query to lock (in KEY SHARE mode) the corresponding tuple
> > in PK table. Now, it is possible that two different workers try to
> > lock the same row of PK table. I am not clear what problem group
> > locking can have in this case because these are non-conflicting locks.
> > Can you please elaborate a bit more?
>
> I'm concerned about two workers trying to take the same lock at the
> same time. If that's prevented by the buffer locking then I think it's
> OK, but if it's prevented by a heavyweight lock then it's not going to
> work in this case.
>
We do take the buffer lock in exclusive mode before trying to acquire
a KEY SHARE lock on the tuple, so the two workers shouldn't try to
acquire it at the same time. I think you are trying to see whether,
in any case where two workers try to acquire a heavyweight lock like
a tuple lock or something like that to perform the operation, it will
create a problem, because due to group locking such an operation
would be allowed where it should not have been. But I don't think
anything of that sort is feasible in the COPY operation, and if it
is, then we probably need to carefully block it or find some solution
for it.
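(For reference, KEY SHARE locks taken by two sessions on the same PK
row do not conflict with each other; a minimal sketch using the tables
from the example above, run from two concurrent sessions:)

-- session 1: lock the referenced row in KEY SHARE mode
BEGIN;
SELECT * FROM tab_fk_referenced_chk WHERE refindex = 1 FOR KEY SHARE;

-- session 2: the same KEY SHARE lock does not block
BEGIN;
SELECT * FROM tab_fk_referenced_chk WHERE refindex = 1 FOR KEY SHARE;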
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com