Re: [WIP] speeding up GIN build with parallel workers - Mailing list pgsql-hackers

From Constantin S. Pan
Subject Re: [WIP] speeding up GIN build with parallel workers
Msg-id 20160316031115.5856920c@monster
In response to Re: [WIP] speeding up GIN build with parallel workers  (David Steele <david@pgmasters.net>)
List pgsql-hackers
On Mon, 14 Mar 2016 08:42:26 -0400
David Steele <david@pgmasters.net> wrote:

> On 2/18/16 10:10 AM, Constantin S. Pan wrote:
> > On Wed, 17 Feb 2016 23:01:47 +0300
> > Oleg Bartunov <obartunov@gmail.com> wrote:
> >
> >> My feedback is (Mac OS X 10.11.3)
> >>
> >> set gin_parallel_workers=2;
> >> create index message_body_idx on messages using gin(body_tsvector);
> >> LOG:  worker process: parallel worker for PID 5689 (PID 6906) was
> >> terminated by signal 11: Segmentation fault
> >
> > Fixed this, try the new patch. The bug was in incorrect handling
> > of some GIN categories.
>
> Oleg, it looks like Constantin has updated the patch to address the
> issue you were seeing.  Do you have time to retest and review?
>
> Thanks,

Actually, there has been some progress since then. The updated
patch is attached.

1. Added another GUC parameter, 'gin_shared_mem', for setting the
amount of shared memory available to parallel GIN workers.

2. Changed the way results are merged. It now uses a shared
memory message queue.

3. Tested on some real data (a GIN index on email message body
tsvectors). Here are the build timings for different values of
'gin_shared_mem' and 'gin_parallel_workers' on a 4-CPU
machine. It seems 'gin_shared_mem' has no visible effect on
the build time.

wnum mem(MB) time(s)
   0      16     247
   1      16     256
   2      16     126
   4      16      89
   0      32     247
   1      32     270
   2      32     123
   4      32      92
   0      64     254
   1      64     272
   2      64     123
   4      64      88
   0     128     250
   1     128     263
   2     128     126
   4     128      85
   0     256     247
   1     256     269
   2     256     130
   4     256      88
   0     512     257
   1     512     275
   2     512     129
   4     512      92
   0    1024     255
   1    1024     273
   2    1024     130
   4    1024      90
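
To make the table easier to read, here is a small sketch that groups the
timings by worker count and computes the average time and the speedup
relative to the 0-worker case. It is only an illustration built from the
numbers above; the names TIMINGS and summarize are made up for this
example and have nothing to do with the patch itself.

```python
from statistics import mean

# Timings copied from the table above:
# (gin_parallel_workers, gin_shared_mem in MB, build time in seconds).
TIMINGS = [
    (0, 16, 247), (1, 16, 256), (2, 16, 126), (4, 16, 89),
    (0, 32, 247), (1, 32, 270), (2, 32, 123), (4, 32, 92),
    (0, 64, 254), (1, 64, 272), (2, 64, 123), (4, 64, 88),
    (0, 128, 250), (1, 128, 263), (2, 128, 126), (4, 128, 85),
    (0, 256, 247), (1, 256, 269), (2, 256, 130), (4, 256, 88),
    (0, 512, 257), (1, 512, 275), (2, 512, 129), (4, 512, 92),
    (0, 1024, 255), (1, 1024, 273), (2, 1024, 130), (4, 1024, 90),
]


def summarize(timings):
    """Group build times by worker count and compare to the 0-worker baseline."""
    by_workers = {}
    for workers, _mem_mb, seconds in timings:
        by_workers.setdefault(workers, []).append(seconds)

    # The 0-worker case is the baseline for the speedup column.
    baseline = mean(by_workers[0])

    summary = {}
    for workers, times in sorted(by_workers.items()):
        avg = mean(times)
        summary[workers] = {
            "mean_s": avg,
            "min_s": min(times),
            "max_s": max(times),
            "speedup": baseline / avg,
        }
    return summary


if __name__ == "__main__":
    print("workers  mean(s)  min(s)  max(s)  speedup")
    for workers, row in summarize(TIMINGS).items():
        print(f"{workers:7d}  {row['mean_s']:7.1f}  {row['min_s']:6d}  "
              f"{row['max_s']:6d}  {row['speedup']:7.2f}")
```

Averaged over the seven memory settings, this gives roughly 2x for 2
workers and roughly 2.8x for 4 workers, while 1 worker is slightly slower
than 0 workers. The spread within each worker count is small compared to
the differences between worker counts, which is consistent with
'gin_shared_mem' having no visible effect here.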

On Wed, 17 Feb 2016 12:26:05 -0800
Peter Geoghegan <pg@heroku.com> wrote:

> On Wed, Feb 17, 2016 at 7:55 AM, Constantin S. Pan <kvapen@gmail.com>
> wrote:
> > 4. Hit the 8x speedup limit. Made some analysis of the reasons (see
> > the attached plot or the data file).
>
> Did you actually compare this to the master branch? I wouldn't like to
> assume that the one worker case was equivalent. Obviously that's the
> really interesting baseline.

Yes, I compared it with the master branch. The case of 0 workers
is indeed equivalent to the master branch.

Regards,
Constantin
