Home > mailing lists

Re: crashes due to setting max_parallel_workers=0 - Mailing list pgsql-hackers

From	Rushabh Lathia
Subject	Re: crashes due to setting max_parallel_workers=0
Date	March 28, 2017 12:07:39
Msg-id	CAGPqQf0CcH+cyU=WK+bUvwgFEVFdGPFaBC2RMRz_hPCEKEFhwg@mail.gmail.com Whole thread
In response to	crashes due to setting max_parallel_workers=0 (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: crashes due to setting max_parallel_workers=0
List	pgsql-hackers

Tree view

On Tue, Mar 28, 2017 at 10:00 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 03/27/2017 01:40 PM, Rushabh Lathia wrote:

...
I was doing more testing with the patch and I found one more server
crash with the patch around same area, when we forced the gather
merge for the scan having zero rows.

create table dept ( deptno numeric, dname varchar(20);
set parallel_tuple_cost =0;
set parallel_setup_cost =0;
set min_parallel_table_scan_size =0;
set min_parallel_index_scan_size =0;
set force_parallel_mode=regress;
explain analyze select * from dept order by deptno;

This is because for leader we don't initialize the slot into gm_slots. So
in case where launched worker is zero and table having zero rows, we
end up having NULL slot into gm_slots array.

Currently gather_merge_clear_slots() clear out the tuple table slots for each
gather merge input and returns clear slot. In the patch I modified function
gather_merge_clear_slots() to just clear out the tuple table slots and
always return NULL when All the queues and heap us exhausted.

Isn't that just another sign the code might be a bit too confusing? I see two main issues in the code:

1) allocating 'slots' as 'nreaders+1' elements, which seems like a good way to cause off-by-one errors

2) mixing objects with different life spans (slots for readers vs. slot for the leader) seems like a bad idea too

I wonder how much we gain by reusing the slot from the leader (I'd be surprised if it was at all measurable). David posted a patch reworking this, and significantly simplifying the GatherMerge node. Why not to accept that?

I think we all agree that we should get rid of nreaders from the GatherMergeState

and need to do some code re-factor. But if I understood correctly that Robert's

concern was to do that re-factor as separate commit.

I picked David's patch and started reviewing the changes. I applied that patch

on top of my v2 patch (which does the re-factor of gather_merge_clear_slots).

In David's patch, into gather_merge_init(), a loop where tuple array is getting

allocate, that loop need to only up to nworkers_launched. Because we don't

hold the tuple array for leader. I changed that and did some other simple

changes based on mine v2 patch. I also performed manual testing with
the changes.

Please find attached re-factor patch, which is v2 patch submitted for the

server crash fix. (Attaching both the patch here again, for easy of access).

Thanks,

Rushabh Lathia

www.EnterpriseDB.com

Attachment

pgsql-hackers by date:

From: Mithun Cy
Date: 28 March 2017, 12:00:55
Subject: Re: [POC] A better way to expand hash indexes.

From: Pavel Stehule
Date: 28 March 2017, 12:14:01
Subject: xmltable doc fix and example for XMLNAMESPACES

Re: crashes due to setting max_parallel_workers=0 - Mailing list pgsql-hackers

Attachment

Previous

Next