postgres 8.4, COPY, and high concurrency - Mailing list pgsql-performance

From Jon Nelson
Subject postgres 8.4, COPY, and high concurrency
Date
Msg-id CAKuK5J28HKP7EqKaGGUQMT-FcpPCQQUHJ18OvOGwG9a7nLVS4w@mail.gmail.com
Responses Re: postgres 8.4, COPY, and high concurrency  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Re: postgres 8.4, COPY, and high concurrency  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
I was working on a data warehousing project where a fair number of files could be COPY'd more or less directly into tables. I have a fairly capable machine to work with, and I ran the load on 75% of its cores (24 of 32).

Performance was pretty bad. With 24 processes going, each backend (in COPY) spent 98% of its time in semop (as identified by strace). I tried larger and smaller shared buffers and all sorts of other tweaks, until I tried reducing the number of concurrent processes from 24 to 4.

Disk I/O went up (on average) at least 10X and strace reports that the top system calls are write (61%), recvfrom (25%), and lseek (14%) - pretty reasonable IMO.
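
For anyone who wants to repeat the 24-versus-4 comparison, here is a minimal sketch of how the number of concurrent COPY processes can be capped from a driver script. It is only an illustration, not the loader actually used: the database name `warehouse`, the helper names `copy_command` and `load_all`, and the (table, path) job list are all assumptions, and it presumes `psql` is on the PATH and each target table already exists.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def copy_command(table, path, dbname="warehouse"):
    """Build a psql invocation that loads one file into one table.

    The database name is a placeholder; psql's client-side \\copy is used
    so the file is read by the client rather than by the server.
    """
    return ["psql", "-d", dbname, "-c", f"\\copy {table} FROM '{path}'"]


def load_all(jobs, max_workers=4, run=subprocess.call):
    """Run one COPY per (table, path) job, at most max_workers at a time.

    Returns the exit codes in the same order as the jobs. The `run`
    argument exists only so the runner can be swapped out for a dry run.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda job: run(copy_command(*job)), jobs))
```

Calling `load_all(jobs, max_workers=24)` and then `load_all(jobs, max_workers=4)` on the same job list would reproduce the two cases described above; each worker thread just waits on its own psql child process, so the thread pool itself does no heavy work.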

Given that each COPY is into its own, newly-made table with no indices or foreign keys, etc., I would have expected the interaction among the backends to be minimal, but that doesn't appear to be the case. What is the likely cause of the semops?

I can't really try a newer version of postgres at this time (perhaps soon).

I'm using PG 8.4.13 on Scientific Linux 6.2 (x86_64), and the CPU is a 32-core Xeon E5-2680 @ 2.7 GHz.

--
Jon
