Re: INSERTing lots of data - Mailing list pgsql-general

From Greg Smith
Subject Re: INSERTing lots of data
Date
Msg-id 4C04824B.2080205@2ndquadrant.com
In response to INSERTing lots of data  (Joachim Worringen <joachim.worringen@iathh.de>)
Responses Re: INSERTing lots of data  (Joachim Worringen <joachim.worringen@iathh.de>)
Re: INSERTing lots of data  (Dimitri Fontaine <dimitri@2ndQuadrant.fr>)
List pgsql-general
Joachim Worringen wrote:
> my Python application (http://perfbase.tigris.org) repeatedly needs to
> insert lots of data into an existing, non-empty, potentially large
> table. Currently, the bottleneck is with the Python application, so I
> intend to multi-thread it. Each thread should work on a part of the
> input file.

You are wandering down a path followed by pgloader at one point:
http://pgloader.projects.postgresql.org/#toc6 and one that I fought with
briefly as well.  Simple multi-threading often provides minimal help in
scaling up insert performance here, because of Python's GIL.  Maybe we
can get Dimitri to chime in here; he did more of this than I did.

Two thoughts.  First, build a performance test case assuming it will
fail to scale upward, and look for the problems.  If you get lucky,
great, but don't assume this will work--it has proven more difficult
for others than you might expect.

Second, if you do end up being throttled by the GIL, you can probably
build a solution for your use case on Python 2.6/3.0 using the
multiprocessing module:  http://docs.python.org/library/multiprocessing.html
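A sketch of that approach, splitting the input into chunks and giving
each worker process its own connection.  Again, psycopg2, the input file
name, the chunk size, and the "results" table are assumptions made for
the example:

# Parallel inserts via multiprocessing instead of threads, so each
# worker runs in its own interpreter and is not limited by the GIL.
# Assumes psycopg2 and a table "results(v integer)" -- adjust as needed.
from multiprocessing import Pool

import psycopg2

DSN = "dbname=test"          # hypothetical connection string

def insert_chunk(chunk):
    # Connections cannot be shared across processes; open one per task.
    conn = psycopg2.connect(DSN)
    cur = conn.cursor()
    for line in chunk:
        cur.execute("INSERT INTO results (v) VALUES (%s)", (int(line),))
    conn.commit()
    conn.close()
    return len(chunk)

def chunked(seq, size):
    # Split the input into fixed-size pieces, one per worker task.
    return [seq[i:i + size] for i in range(0, len(seq), size)]

if __name__ == "__main__":
    with open("input.txt") as f:        # hypothetical input file
        lines = f.read().splitlines()
    pool = Pool(processes=4)
    counts = pool.map(insert_chunk, chunked(lines, 5000))
    pool.close()
    pool.join()
    print("inserted %d rows" % sum(counts))

The win here is that each worker is a separate process with its own
interpreter and database connection, so the per-row Python overhead
really does run in parallel.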

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

