Proposal: speeding up GIN build with parallel workers - Mailing list pgsql-hackers

From	Constantin S. Pan
Subject	Proposal: speeding up GIN build with parallel workers
Date	January 15, 2016 22:39:02
Msg-id	20160116013839.57cfcb37@thought
List	pgsql-hackers

Hi, Hackers.

Building a GIN index can take a lot of time and keeps one CPU core at
100 %, but we could easily make it use more than 100 %, especially since
we now have parallel workers in postgres.

The process of building GIN looks like this:

1. Accumulate a batch of index records into an rbtree in maintenance
work memory.

2. Dump the batch to disk.

3. Repeat.
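
For illustration, the loop above can be sketched in Python. This is only a toy model, not the GIN code: a dict stands in for the rbtree, an entry count stands in for the maintenance work memory limit, and the names `build_in_batches`, `extract_keys`, `batch_limit` and `dump` are made up for this sketch.

```python
from collections import defaultdict


def build_in_batches(rows, extract_keys, batch_limit, dump):
    """Accumulate (key -> item pointers) in memory and dump in batches.

    rows         -- iterable of (tid, value) pairs from the heap scan
    extract_keys -- hypothetical function: value -> iterable of index keys
    batch_limit  -- stand-in for the memory limit, counted in entries
    dump         -- callback that receives one sorted batch at a time
    """
    batch = defaultdict(list)  # stand-in for the rbtree
    entries = 0
    for tid, value in rows:
        for key in extract_keys(value):
            batch[key].append(tid)
            entries += 1
        if entries >= batch_limit:
            # Step 2: dump the batch in ascending key order.
            dump(sorted(batch.items()))
            batch.clear()
            entries = 0
    if batch:
        # Dump whatever is left after the scan ends.
        dump(sorted(batch.items()))


if __name__ == "__main__":
    rows = [(1, "a b"), (2, "b c"), (3, "a c")]
    build_in_batches(rows, str.split, 4, print)
```

Note that the same key can appear in more than one dumped batch, which is why dumping a batch has to deal with item pointers already stored for that key.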

I have a draft implementation which divides the whole process between
N parallel workers, see the patch attached. Instead of a full scan of
the relation, I give each worker a range of blocks to read.
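
One possible way to hand out block ranges is sketched below in Python. How the attached patch actually divides the relation is not described here, so the even split and the name `split_blocks` are assumptions made for illustration only.

```python
def split_blocks(nblocks, nworkers):
    """Return one half-open (start, end) block range per worker.

    The first (nblocks % nworkers) workers get one extra block, so the
    ranges differ in length by at most one block.
    """
    base, extra = divmod(nblocks, nworkers)
    ranges = []
    start = 0
    for i in range(nworkers):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


if __name__ == "__main__":
    # 10 blocks over 3 workers
    print(split_blocks(10, 3))
```

With more workers than blocks, the trailing workers get empty ranges such as (2, 2) and simply have nothing to scan.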

This speeds up the first step up to N times, but slows down the second
one, because when multiple workers dump item pointers for the same key,
each of them has to read and decode the results of the previous one.
That is a huge waste, but there is an idea for how to eliminate it.

When it comes to dumping the next batch, a worker does not do it
independently. Instead, it (and every other worker) sends the
accumulated index records to the parent (backend) in ascending key
order. The backend, which receives the records from the workers through
shared memory, can merge them and dump each of them once, without the
need to reread the records N-1 times.
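
The merge on the backend side can be sketched as a k-way merge of key-ordered streams. The Python below is a toy model under assumptions: plain lists stand in for the shared memory queues, integers stand in for item pointers, and `merge_worker_streams` is a made-up name, not something from the patch.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_worker_streams(streams):
    """Merge per-worker streams of (key, tids) sorted by ascending key.

    Yields each distinct key exactly once, with the item pointers from
    all workers combined, so nothing already dumped has to be reread.
    """
    # heapq.merge consumes the streams lazily and keeps key order.
    merged = heapq.merge(*streams, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        tids = sorted(tid for _, worker_tids in group for tid in worker_tids)
        yield key, tids


if __name__ == "__main__":
    worker1 = [("a", [1, 2]), ("c", [3])]
    worker2 = [("a", [10]), ("b", [11]), ("c", [12])]
    for key, tids in merge_worker_streams([worker1, worker2]):
        print(key, tids)
```

Since every worker sends its records in ascending key order, the backend only needs to look at the current head of each stream, and each key is written out in a single pass.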

In its current state the implementation is just a proof of concept
with all the configuration hardcoded, but it already works as is,
though it does not speed up the build process more than 4 times on my
configuration (12 CPUs). There is also a problem with temporary tables,
for which the parallel mode does not work.

Please leave your feedback.

Regards,

Constantin S. Pan
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

pgin.patch
