Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Seq Scan
Date
Msg-id CAA4eK1+ArakRB4pAOftQrADh4KkBfVzAobaYWz9GKtiV80numw@mail.gmail.com
In response to Re: Parallel Seq Scan  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
On Mon, Mar 16, 2015 at 9:40 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
> On 13-03-2015 PM 11:03, Amit Kapila wrote:
> > On Fri, Mar 13, 2015 at 7:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >>
> >> I don't think this is the right fix; the point of that code is to
> >> remove a tuple queue from the funnel when it gets detached, which is a
> >> correct thing to want to do.  funnel->nextqueue should always be less
> >> than funnel->nqueues; how is that failing to be the case here?
> >>
> >
> > I could not reproduce the issue, nor is the exact scenario
> > mentioned in the mail.  However, what I think can lead to funnel->nextqueue
> > being greater than funnel->nqueues is something like below:
> >
> > Assume 5 queues, so funnel->nqueues will be 5, and
> > assume funnel->nextqueue is 2.  Now say 4 workers get
> > detached one by one; in such a case execution will always take the else
> > branch and never change funnel->nextqueue, whereas funnel->nqueues
> > will drop to 1.
> >
>
> Or if the just-detached queue happens to be the last one, we'll make
> shm_mq_receive() read from a potentially already-detached queue in the
> immediately next iteration.

Won't the last-queue case already be handled by the code below:
else
{
    --funnel->nqueues;
    if (funnel->nqueues == 0)
    {
        if (done != NULL)
            *done = true;
        return NULL;
    }

> That seems to be caused by not having updated
> funnel->nextqueue.  With the returned value being SHM_MQ_DETACHED, we'll again
> try to remove it from the queue.  In this case, it causes the third argument to
> memcpy to be negative, and hence the segfault.
>
>

In any case, I think we need some handling for such cases.

> I can't seem to really figure out the other problem of waiting forever in
> WaitLatch() 
>

The reason seems to be that, for certain scenarios, the way we set the latch before
exiting needs some more thought.  Currently we set the latch in
HandleParallelMessageInterrupt(), but that doesn't seem to be sufficient.

> By the way, you can try reproducing this with the example I posted on Friday.
>

Sure.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
