Re: INSERTing lots of data - Mailing list pgsql-general

From Szymon Guz
Subject Re: INSERTing lots of data
Date
Msg-id AANLkTimvkkS7CYZtuIbIuTTiaHDJMLWkpPfji41Jh9-b@mail.gmail.com
In response to INSERTing lots of data  (Joachim Worringen <joachim.worringen@iathh.de>)
Responses Re: INSERTing lots of data  (Joachim Worringen <joachim.worringen@iathh.de>)
Re: INSERTing lots of data  (Martin Gainty <mgainty@hotmail.com>)
List pgsql-general
2010/5/28 Joachim Worringen <joachim.worringen@iathh.de>
Greetings,

my Python application (http://perfbase.tigris.org) repeatedly needs to insert lots of data into an existing, non-empty, potentially large table. Currently, the bottleneck is with the Python application, so I intend to multi-thread it. Each thread should work on a part of the input file.

I already multi-threaded the query part of the application, which requires using one connection per thread, because cursors are serialized over a single connection.

Provided that
- the threads use their own connection
- the threads perform all INSERTs within a single transaction
- the machine has enough resources

will I get a speedup? Or will table locking serialize things on the server side?

Suggestions for alternatives are welcome, but the data must go through the Python application via INSERTs (no bulk insert, COPY etc. possible)


Keep in mind the GIL (global interpreter lock) in some Python implementations, such as CPython: because of it, those threads could end up serialized at the Python level.
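One rough way to see whether the GIL affects your case is to time the same CPU-bound work once sequentially and once in threads. The sketch below is only an illustration: parse_rows is a made-up stand-in for whatever the application does to the input before inserting, and the function names are hypothetical.

```python
import threading
import time


def parse_rows(n):
    """Hypothetical stand-in for CPU-bound input processing."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(n, workers):
    """Run the work `workers` times in one thread; return (results, seconds)."""
    start = time.perf_counter()
    results = [parse_rows(n) for _ in range(workers)]
    return results, time.perf_counter() - start


def run_threaded(n, workers):
    """Run the same work in `workers` threads; return (results, seconds)."""
    results = [None] * workers

    def task(i):
        results[i] = parse_rows(n)

    threads = [threading.Thread(target=task, args=(i,)) for i in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start
```

If the threaded time is about the same as the sequential time for a large n, the Python side is probably being serialized, and threads alone will not remove that bottleneck.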

It is possible that those inserts will be faster. The speed depends on the table structure, on its constraints and triggers, and even on the database configuration. The best answer is to test it: write a simple multithreaded application, run the inserts, and measure the result.
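A minimal sketch of such a test could look like the following. It assumes a DB-API 2.0 driver and takes the connection factory as a parameter, so nothing here is tied to a particular library. All function names are hypothetical. Each thread opens its own connection and commits its chunk in a single transaction, as described in the question.

```python
import threading
import time


def split_chunks(rows, n):
    """Split rows into n contiguous chunks of nearly equal size."""
    size, extra = divmod(len(rows), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def insert_chunk(connect, sql, chunk, errors):
    """Insert one chunk over a private connection, in one transaction."""
    try:
        conn = connect()
        try:
            cur = conn.cursor()
            cur.executemany(sql, chunk)
            conn.commit()
        finally:
            conn.close()
    except Exception as exc:
        # Record the failure so the caller can re-raise it.
        errors.append(exc)


def timed_parallel_insert(connect, sql, rows, n_threads):
    """Insert rows using n_threads threads; return elapsed seconds."""
    errors = []
    threads = [
        threading.Thread(target=insert_chunk, args=(connect, sql, chunk, errors))
        for chunk in split_chunks(rows, n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    if errors:
        raise errors[0]
    return elapsed
```

With psycopg2, for example, connect might be something like lambda: psycopg2.connect(dsn) and sql an INSERT using %s placeholders. That choice of driver is an assumption, since the thread does not name one. Running the function with n_threads set to 1, 2, 4 and so on against a copy of the real table should show whether the extra threads help.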


regards
Szymon Guz
