speeding up inserts by pipelining

From: Todd Owen
Date:
I'm trying to populate a large temporary table (a few million rows), and running into performance woes. In my initial tests, inserting 100,000 rows took 2.5 minutes, which I managed to get down to 1 minute by (a) using an insert buffer, i.e. using SQL_ATTR_PARAMSET_SIZE with SQLExecute, and (b) turning on the "Server Side Prepare" option. However, this is still 10 times slower than writing the data to a file and then running "\copy" in psql.
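
(For anyone unfamiliar with (a), here is a minimal, hypothetical sketch of what column-wise array binding through SQL_ATTR_PARAMSET_SIZE looks like on the client side. The table and column names are invented, and connection setup and error checking are omitted.)

#include <sql.h>
#include <sqlext.h>

#define ROWS 1000                       /* parameter sets sent per SQLExecute() */

static SQLRETURN
bulk_insert(SQLHSTMT hstmt)
{
    static SQLINTEGER ids[ROWS];
    static SQLCHAR    names[ROWS][32];
    static SQLLEN     name_ind[ROWS];
    SQLUSMALLINT      status[ROWS];
    SQLULEN           processed = 0;
    int               i;

    for (i = 0; i < ROWS; i++)          /* dummy data for the sketch */
    {
        ids[i] = i;
        names[i][0] = 'x';
        names[i][1] = '\0';
        name_ind[i] = SQL_NTS;
    }

    /* column-wise array binding: one SQLExecute() covers ROWS parameter sets */
    SQLSetStmtAttr(hstmt, SQL_ATTR_PARAM_BIND_TYPE,
                   (SQLPOINTER) SQL_PARAM_BIND_BY_COLUMN, 0);
    SQLSetStmtAttr(hstmt, SQL_ATTR_PARAMSET_SIZE,
                   (SQLPOINTER) (SQLULEN) ROWS, 0);
    SQLSetStmtAttr(hstmt, SQL_ATTR_PARAM_STATUS_PTR, status, 0);
    SQLSetStmtAttr(hstmt, SQL_ATTR_PARAMS_PROCESSED_PTR, &processed, 0);

    SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_SLONG, SQL_INTEGER,
                     0, 0, ids, 0, NULL);
    SQLBindParameter(hstmt, 2, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_VARCHAR,
                     32, 0, names, sizeof(names[0]), name_ind);

    SQLPrepare(hstmt, (SQLCHAR *) "INSERT INTO my_temp(id, name) VALUES (?, ?)",
               SQL_NTS);
    return SQLExecute(hstmt);
}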

I know this complaint is not new :) Earlier posts on this mailing list have suggested concatenating values into a single string which can be transformed back into multiple rows using a custom server-side function. I've also seen some talk about SQLBulkOperations. But I'm unable to implement either of these, because the ODBC client is not our code; rather, it's an off-the-shelf data integration package running on Windows. The size of the insert buffer is one of the few configurable options.

At the moment, increasing the insert buffer has only limited effect because (as far as I can tell) each row still requires a round trip to the server. Network latency then becomes the major bottleneck, and indeed I've run some tests that show insert performance being much better when the client and server are both running on the same machine.
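
As a rough illustration (the latency figure is assumed, not measured here): at even 0.5 ms of round-trip time, 100,000 single-row round trips already account for about 50 seconds of the 1 minute above, and a few million rows would spend tens of minutes doing nothing but waiting on the network.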

There was a recent post on pgsql-general where somebody asked about pipelining inserts, i.e. sending multiple commands to the server without waiting for it to acknowledge the completion of each one. Tom Lane said (http://www.postgresql.org/message-id/26269.1356282651@sss.pgh.pa.us) that the wire protocol does support this, although libpq is "too simple-minded to deal with more than one query in flight". Of course libpq support should not matter to psqlODBC, which I understand implements most of the wire protocol natively.

Is there any chance of psqlODBC one day supporting query pipelining inside the scope of calls to SQLExecute? This doesn't seem like too much effort to implement, although the tricky part is probably the semantics in the case where some rows succeed and some fail (constraint violations, etc.). I'm tempted to give it a try myself, if I manage to get the build working under VS2010...

Regards,

Todd

Re: speeding up inserts by pipelining

From: Heikki Linnakangas
Date:
On 04.03.2013 12:01, Todd Owen wrote:
> Is there any chance of psqlODBC one day supporting query pipelining inside
> the scope of calls to SQLExecute? This doesn't seem like too much effort to
> implement, although the tricky part is probably the semantics in the case
> where some rows succeed and some fail (constraint violations, etc.). I'm
> tempted to give it a try myself, if I manage to get the build working under
> VS2010...

Yeah, seems reasonable. I think you'll want to send all the Bind and
Execute messages, followed by a single Sync message. That means all
the queries will run as a single transaction, and if any of them
fail, all will be rolled back. That seems acceptable, although I
don't know what the ODBC spec has to say about it.
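
To make that concrete, here is a rough, hypothetical sketch (not actual psqlODBC code) of how the batched messages might be framed, following the v3 frontend/backend protocol documentation. The helper functions, the buffer handling and the statement name "ins" are invented, and a Parse for "ins" is assumed to have been sent already (e.g. via server-side prepare):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>                  /* htonl(), htons() */

static size_t put_byte(char *buf, size_t off, char c)
{
    buf[off] = c;
    return off + 1;
}

static size_t put_int32(char *buf, size_t off, int32_t v)
{
    int32_t n = htonl(v);
    memcpy(buf + off, &n, 4);
    return off + 4;
}

static size_t put_int16(char *buf, size_t off, int16_t v)
{
    int16_t n = htons(v);
    memcpy(buf + off, &n, 2);
    return off + 2;
}

static size_t put_cstring(char *buf, size_t off, const char *s)
{
    size_t len = strlen(s) + 1;         /* protocol strings are NUL-terminated */
    memcpy(buf + off, s, len);
    return off + len;
}

/* Append Bind + Execute for one row of the prepared statement "ins". */
static size_t
append_row(char *buf, size_t off, const char *const *values, int nparams)
{
    size_t  len_pos;
    int     i;

    /* Bind: unnamed portal, statement "ins", text-format parameters */
    off = put_byte(buf, off, 'B');
    len_pos = off;                      /* length patched below */
    off += 4;
    off = put_cstring(buf, off, "");    /* portal name */
    off = put_cstring(buf, off, "ins"); /* prepared statement name */
    off = put_int16(buf, off, 0);       /* 0 param format codes => all text */
    off = put_int16(buf, off, (int16_t) nparams);
    for (i = 0; i < nparams; i++)
    {
        int32_t vlen = values[i] ? (int32_t) strlen(values[i]) : -1;

        off = put_int32(buf, off, vlen);    /* -1 means NULL */
        if (vlen > 0)
        {
            memcpy(buf + off, values[i], vlen);
            off += vlen;
        }
    }
    off = put_int16(buf, off, 0);       /* 0 result format codes */
    put_int32(buf, len_pos, (int32_t) (off - len_pos)); /* length includes itself */

    /* Execute: unnamed portal, no row limit */
    off = put_byte(buf, off, 'E');
    off = put_int32(buf, off, 4 + 1 + 4);
    off = put_cstring(buf, off, "");
    off = put_int32(buf, off, 0);

    return off;
}

/* After the last row, a single Sync closes the implicit transaction. */
static size_t append_sync(char *buf, size_t off)
{
    off = put_byte(buf, off, 'S');
    return put_int32(buf, off, 4);
}

On the read side the backend then replies with one BindComplete and one CommandComplete per row, followed by a single ReadyForQuery after the Sync.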

One tricky aspect is that if you just naively send all the queries
first, and then start to read the results, you can get a deadlock caused
by full network buffers. If the client doesn't read the responses before
it's sent all the queries, the client's receive buffer can fill up with
the responses from the backend, so that the backend blocks, trying to
send more responses. And at the same time, the client is blocked while
trying to send more queries, because the backend's receive buffer is
already full of incoming queries. I think you'll need to put the socket
into non-blocking mode, and use select()/poll().
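
A rough, hypothetical sketch of the shape of that loop (the function name is invented; response parsing and error reporting are omitted):

#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <errno.h>
#include <stddef.h>

static int
send_pipelined(int sock, const char *out, size_t outlen)
{
    size_t  sent = 0;
    char    inbuf[8192];

    /* non-blocking socket so neither send() nor recv() can wedge us */
    fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK);

    while (sent < outlen)
    {
        fd_set  rfds, wfds;

        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        FD_SET(sock, &rfds);
        FD_SET(sock, &wfds);

        if (select(sock + 1, &rfds, &wfds, NULL, NULL) < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }

        /* drain responses first, so the backend never blocks on its send */
        if (FD_ISSET(sock, &rfds))
        {
            ssize_t n = recv(sock, inbuf, sizeof(inbuf), 0);

            if (n == 0)
                return -1;              /* connection closed */
            if (n < 0 && errno != EWOULDBLOCK && errno != EAGAIN)
                return -1;
            /* ... hand inbuf off to the normal response parser ... */
        }

        if (FD_ISSET(sock, &wfds))
        {
            ssize_t n = send(sock, out + sent, outlen - sent, 0);

            if (n > 0)
                sent += (size_t) n;
            else if (n < 0 && errno != EWOULDBLOCK && errno != EAGAIN)
                return -1;
        }
    }
    return 0;       /* everything queued; keep reading until ReadyForQuery */
}

On Windows the same idea applies with ioctlsocket(FIONBIO) and WSAEWOULDBLOCK instead of fcntl() and EWOULDBLOCK.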

- Heikki