Joachim Worringen wrote:
> my Python application (http://perfbase.tigris.org) repeatedly needs to
> insert lots of data into an existing, non-empty, potentially large
> table. Currently, the bottleneck is with the Python application, so I
> intend to multi-thread it. Each thread should work on a part of the
> input file.
You are wandering down a path followed by pgloader at one point:
http://pgloader.projects.postgresql.org/#toc6 and one that I fought with
briefly as well. Simple multi-threading may give you only minimal help in
scaling up insert performance here, because Python's Global Interpreter
Lock (GIL) keeps the Python-side work serialized. Maybe we can get
Dimitri to chime in here; he did more of this than I did.
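
To make the GIL point concrete, here is a minimal sketch of the naive
threaded approach (psycopg2, the file name, table name, and DSN are my
own placeholders, not from your setup). The database round-trips release
the GIL inside the C driver, but the per-line Python parsing in each
thread still runs one thread at a time, so adding threads mostly adds
overhead:

    import threading
    import psycopg2

    def worker(lines):
        # Each thread gets its own connection; the parsing below is
        # CPU-bound Python code and is serialized by the GIL.
        conn = psycopg2.connect("dbname=perfbase")   # placeholder DSN
        cur = conn.cursor()
        for line in lines:
            fields = line.split()
            cur.execute("INSERT INTO results (value) VALUES (%s)",
                        (fields[0],))
        conn.commit()
        conn.close()

    lines = open("input.dat").readlines()
    n = 4
    threads = [threading.Thread(target=worker, args=(lines[i::n],))
               for i in range(n)]
    for t in threads: t.start()
    for t in threads: t.join()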
Two thoughts. First, build a performance test case assuming it will
fail to scale upward, and look for problems. If you get lucky, great,
but don't assume this will work--it has proven more difficult than
expected for others in the past.
Second, if you do end up being throttled by the GIL, you can probably
build a solution for Python 2.6/3.0 using the multiprocessing module for
your use case: http://docs.python.org/library/multiprocessing.html
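
As a rough sketch of what that could look like (again assuming psycopg2,
a placeholder DSN and table, and a trivial one-value-per-line input
format, none of which come from your message), each worker process opens
its own connection and inserts one chunk of the file:

    from multiprocessing import Pool
    import psycopg2

    DSN = "dbname=perfbase"   # placeholder connection string
    CHUNK_SIZE = 10000

    def insert_chunk(lines):
        # Worker processes must each open their own connection;
        # psycopg2 connections cannot be shared across processes.
        conn = psycopg2.connect(DSN)
        cur = conn.cursor()
        cur.executemany("INSERT INTO results (value) VALUES (%s)",
                        [(line.strip(),) for line in lines])
        conn.commit()
        conn.close()
        return len(lines)

    def chunks(f, size):
        # Yield successive lists of `size` lines from the input file.
        buf = []
        for line in f:
            buf.append(line)
            if len(buf) == size:
                yield buf
                buf = []
        if buf:
            yield buf

    if __name__ == '__main__':
        pool = Pool(processes=4)
        f = open("input.dat")
        inserted = sum(pool.imap_unordered(insert_chunk,
                                           chunks(f, CHUNK_SIZE)))
        pool.close()
        pool.join()
        f.close()
        print("inserted %d rows" % inserted)

Tune the process count and chunk size to whatever your server can
actually absorb; past four or so concurrent writers the bottleneck
usually moves back to the database side anyway.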
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us