Home > mailing lists

large regression for parallel COPY - Mailing list pgsql-hackers

From	Robert Haas
Subject	large regression for parallel COPY
Date	March 30, 2016 19:50:28
Msg-id	CA+TgmoYoUQf9cGcpgyGNgZQHcY-gCcKRyAqQtDU8KFE4N6HVkA@mail.gmail.com Whole thread
Responses	Re: large regression for parallel COPY
List	pgsql-hackers

Tree view

On Thu, Mar 10, 2016 at 8:29 PM, Andres Freund <andres@anarazel.de> wrote:
> Allow to trigger kernel writeback after a configurable number of writes.

While testing out Dilip Kumar's relation extension patch today, I
discovered (with some help from Andres) that this causes nasty
regressions when doing parallel COPY on hydra (3.2.6-3.fc16.ppc64,
lousy disk subsystem).  What I did was (1) run pgbench -i -s 100, (2)
copy the results to a file, (3) truncate and drop the indexes on the
original table, and (4) try copying in one or more copies of the data
from the file.  Typical command line:

time pgbench -n -f f -t 1 -c 4 -j 4 && psql -c "select
pg_size_pretty(pg_relation_size('pgbench_accounts'));" && time psql -c
checkpoint && psql -c "truncate pgbench_accounts; checkpoint;"

With default settings against
96f8373cad5d6066baeb7a1c5a88f6f5c9661974, pgbench takes 9 to 9.5
minutes and the subsequent checkpoint takes 9 seconds.  After setting
, it takes 1 minute and 11 seconds and the subsequent checkpoint takes
11 seconds.   With a single copy of the data (that is, -c 1 -j 1 but
otherwise as above), it takes 28-29 seconds with default settings and
26-27 seconds with backend_flush_after=0, bgwriter_flush_after=0.  So
the difference is rather small with a straight-up COPY, but with 4
copies running at the same time, it's near enough to an order of
magnitude.

Andres reports that on his machine, non-zero *_flush_after settings
make things faster, not slower, so apparently this is
hardware-dependent or kernel-dependent.  Nevertheless, it seems to me
that we should try to get some broader testing here to see which
experience is typical.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Kevin Grittner
Date: 30 March 2016, 19:46:40
Subject: Re: snapshot too old, configured by time

From: Kevin Grittner
Date: 30 March 2016, 19:55:04
Subject: Re: snapshot too old, configured by time

large regression for parallel COPY - Mailing list pgsql-hackers

Previous

Next