Home > mailing lists

Re: Parallel copy - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Parallel copy
Date	May 18, 2020 07:48:01
Msg-id	CAA4eK1JRcigscJ_Yfgu8Q2571nh0GdG+xuAiwKVPtV1mq6ugWA@mail.gmail.com Whole thread Raw
In response to	Re: Parallel copy (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Parallel copy Re: Parallel copy
List	pgsql-hackers

Tree view

On Fri, May 15, 2020 at 6:49 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, May 15, 2020 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > My sense is that it would be a lot more sensible to do it at the
> > > *beginning* of the parallel operation. Once we do it once, we
> > > shouldn't ever do it again; that's how it works now. Deferring it
> > > until later seems much more likely to break things.
> >
> > AFAIU, we always increment the command counter after executing the
> > command.  Why do we want to do it differently here?
>
> Hmm, now I'm starting to think that I'm confused about what is under
> discussion here. Which CommandCounterIncrement() are we talking about
> here?
>

The one we do after executing a non-readonly command.  Let me try to
explain by example:

CREATE TABLE tab_fk_referenced_chk(refindex INTEGER PRIMARY KEY,
height real, weight real);
insert into tab_fk_referenced_chk values( 1, 1.1, 100);
CREATE TABLE tab_fk_referencing_chk(index INTEGER REFERENCES
tab_fk_referenced_chk(refindex), height real, weight real);

COPY tab_fk_referencing_chk(index,height,weight) FROM stdin WITH(
DELIMITER ',');
1,1.1,100
1,2.1,200
1,3.1,300
\.

In the above case, even though we are executing a single command from
the user perspective, but the currentCommandId will be four after the
command.  One increment will be for the copy command and the other
three increments are for locking tuple in PK table
(tab_fk_referenced_chk) for three tuples in FK table
(tab_fk_referencing_chk).  Now, for parallel workers, it is
(theoretically) possible that the three tuples are processed by three
different workers which don't get synced as of now.  The question was
do we see any kind of problem with this and if so can we just sync it
up at the end of parallelism.

> > First, let me clarify the CTID I have used in my email are for the
> > table in which insertion is happening which means FK table.  So, in
> > such a case, we can't have the same CTIDs queued for different
> > workers.  Basically, we use CTID to fetch the row from FK table later
> > and form a query to lock (in KEY SHARE mode) the corresponding tuple
> > in PK table.  Now, it is possible that two different workers try to
> > lock the same row of PK table.  I am not clear what problem group
> > locking can have in this case because these are non-conflicting locks.
> > Can you please elaborate a bit more?
>
> I'm concerned about two workers trying to take the same lock at the
> same time. If that's prevented by the buffer locking then I think it's
> OK, but if it's prevented by a heavyweight lock then it's not going to
> work in this case.
>

We do take buffer lock in exclusive mode before trying to acquire KEY
SHARE lock on the tuple, so the two workers shouldn't try to acquire
at the same time.  I think you are trying to see if in any case, two
workers try to acquire heavyweight lock like tuple lock or something
like that to perform the operation then it will create a problem
because due to group locking it will allow such an operation where it
should not have been.  But I don't think anything of that sort is
feasible in COPY operation and if it is then we probably need to
carefully block it or find some solution for it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: "Joe Wildish"
Date: 18 May 2020, 02:29:56
Subject: [PATCH] Add support to psql for edit-and-execute-command

From: Amit Kapila
Date: 18 May 2020, 07:59:34
Subject: Re: Fix a typo in slot.c

Re: Parallel copy - Mailing list pgsql-hackers

Previous

Next