Re: Parallel copy - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Parallel copy |
Date | |
Msg-id | CAA4eK1JRcigscJ_Yfgu8Q2571nh0GdG+xuAiwKVPtV1mq6ugWA@mail.gmail.com |
In response to | Re: Parallel copy (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Parallel copy, Re: Parallel copy |
List | pgsql-hackers |
On Fri, May 15, 2020 at 6:49 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, May 15, 2020 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > My sense is that it would be a lot more sensible to do it at the
> > > *beginning* of the parallel operation. Once we do it once, we
> > > shouldn't ever do it again; that's how it works now. Deferring it
> > > until later seems much more likely to break things.
> >
> > AFAIU, we always increment the command counter after executing the
> > command. Why do we want to do it differently here?
>
> Hmm, now I'm starting to think that I'm confused about what is under
> discussion here. Which CommandCounterIncrement() are we talking about
> here?
>

The one we do after executing a non-readonly command. Let me try to
explain by example:

CREATE TABLE tab_fk_referenced_chk(refindex INTEGER PRIMARY KEY,
height real, weight real);
insert into tab_fk_referenced_chk values( 1, 1.1, 100);
CREATE TABLE tab_fk_referencing_chk(index INTEGER REFERENCES
tab_fk_referenced_chk(refindex), height real, weight real);
COPY tab_fk_referencing_chk(index,height,weight) FROM stdin WITH(
DELIMITER ',');
1,1.1,100
1,2.1,200
1,3.1,300
\.

In the above case, even though we are executing a single command from
the user's perspective, the currentCommandId will be four after the
command. One increment is for the COPY command itself, and the other
three are for locking the tuple in the PK table (tab_fk_referenced_chk),
once for each of the three tuples in the FK table
(tab_fk_referencing_chk). Now, with parallel workers, it is
(theoretically) possible that the three tuples are processed by three
different workers whose command counters don't get synced as of now.
The question was whether we see any kind of problem with this and, if
so, whether we can just sync it up at the end of parallelism.

> > First, let me clarify the CTID I have used in my email are for the
> > table in which insertion is happening which means FK table. So, in
> > such a case, we can't have the same CTIDs queued for different
> > workers. Basically, we use CTID to fetch the row from FK table later
> > and form a query to lock (in KEY SHARE mode) the corresponding tuple
> > in PK table. Now, it is possible that two different workers try to
> > lock the same row of PK table. I am not clear what problem group
> > locking can have in this case because these are non-conflicting locks.
> > Can you please elaborate a bit more?
>
> I'm concerned about two workers trying to take the same lock at the
> same time. If that's prevented by the buffer locking then I think it's
> OK, but if it's prevented by a heavyweight lock then it's not going to
> work in this case.
>

We do take the buffer lock in exclusive mode before trying to acquire
the KEY SHARE lock on the tuple, so two workers shouldn't try to acquire
it at the same time. I think you are trying to see whether there is any
case in which two workers need a heavyweight lock (a tuple lock or
something like that) to perform the operation; that would create a
problem because group locking would allow an operation that should have
been blocked. But I don't think anything of that sort is feasible in the
COPY operation, and if it is, we probably need to carefully block it or
find some solution for it.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
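
The command-counter accounting described in the mail can also be observed
from SQL via the cmin system column. A minimal sketch, assuming the two
tables from the example above already exist and the PK row has been
inserted; the value 4 follows from the accounting in the mail rather than
from any documented interface:

BEGIN;
COPY tab_fk_referencing_chk(index,height,weight) FROM stdin WITH (DELIMITER ',');
1,1.1,100
1,2.1,200
1,3.1,300
\.
-- Rows written by COPY carry the command id the COPY itself ran with.
-- A follow-up insert in the same transaction should show cmin = 4 if the
-- counter was incremented once for COPY and once per FK check, as
-- described above.
INSERT INTO tab_fk_referencing_chk VALUES (1, 4.1, 400);
SELECT cmin, * FROM tab_fk_referencing_chk;
ROLLBACK;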
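
The non-conflicting nature of the row locks discussed above can likewise
be checked directly: FOR KEY SHARE locks taken by two sessions (or, in the
parallel case, two workers) on the same PK row do not block each other.
The query shape below only approximates what the RI trigger issues
internally:

-- Session 1:
BEGIN;
SELECT 1 FROM ONLY tab_fk_referenced_chk x WHERE refindex = 1 FOR KEY SHARE OF x;

-- Session 2, run concurrently: takes the same row-level lock without
-- blocking, because KEY SHARE locks do not conflict with each other.
BEGIN;
SELECT 1 FROM ONLY tab_fk_referenced_chk x WHERE refindex = 1 FOR KEY SHARE OF x;

-- By contrast, an UPDATE of the key column from a third session would
-- block until both KEY SHARE locks are released:
-- UPDATE tab_fk_referenced_chk SET refindex = 2 WHERE refindex = 1;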