Re: INSERTing lots of data - Mailing list pgsql-general

From Greg Smith
Subject Re: INSERTing lots of data
Date
Msg-id 4C04824B.2080205@2ndquadrant.com
In response to INSERTing lots of data  (Joachim Worringen <joachim.worringen@iathh.de>)
Responses Re: INSERTing lots of data  (Joachim Worringen <joachim.worringen@iathh.de>)
Re: INSERTing lots of data  (Dimitri Fontaine <dimitri@2ndQuadrant.fr>)
List pgsql-general
Joachim Worringen wrote:
> my Python application (http://perfbase.tigris.org) repeatedly needs to
> insert lots of data into an existing, non-empty, potentially large
> table. Currently, the bottleneck is with the Python application, so I
> intend to multi-thread it. Each thread should work on a part of the
> input file.

You are wandering down a path followed by pgloader at one point:
http://pgloader.projects.postgresql.org/#toc6 and one that I fought with
briefly as well.  Simple multi-threading often provides minimal help in
scaling up insert performance here, because of Python's GIL.  Maybe we
can get Dimitri to chime in here; he did more of this than I did.

Two thoughts.  First, build a performance test case assuming it will
fail to scale upward, and look for the problems.  If you get lucky,
great, but don't assume this will work--it has proven more difficult
for others than you might expect.

Second, if you do end up being throttled by the GIL, you can probably
build a solution for your use case on Python 2.6/3.0 using the
multiprocessing module:  http://docs.python.org/library/multiprocessing.html
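A sketch of that approach, splitting the input into chunks and giving
each worker process its own connection.  Again, psycopg2, the input file
name, the chunk size, and the "results" table are assumptions made for
the example:

# Parallel inserts via multiprocessing instead of threads, so each
# worker runs in its own interpreter and is not limited by the GIL.
# Assumes psycopg2 and a table "results(v integer)" -- adjust as needed.
from multiprocessing import Pool

import psycopg2

DSN = "dbname=test"          # hypothetical connection string

def insert_chunk(chunk):
    # Connections cannot be shared across processes; open one per task.
    conn = psycopg2.connect(DSN)
    cur = conn.cursor()
    for line in chunk:
        cur.execute("INSERT INTO results (v) VALUES (%s)", (int(line),))
    conn.commit()
    conn.close()
    return len(chunk)

def chunked(seq, size):
    # Split the input into fixed-size pieces, one per worker task.
    return [seq[i:i + size] for i in range(0, len(seq), size)]

if __name__ == "__main__":
    with open("input.txt") as f:        # hypothetical input file
        lines = f.read().splitlines()
    pool = Pool(processes=4)
    counts = pool.map(insert_chunk, chunked(lines, 5000))
    pool.close()
    pool.join()
    print("inserted %d rows" % sum(counts))

The win here is that each worker is a separate process with its own
interpreter and database connection, so the per-row Python overhead
really does run in parallel.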

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

