Thread: Slow network retrieves
I'm seeing what seems like slow retrieval times over the network.  I am
retrieving a single field of about 100-120 characters per record, and I am
getting about 3 seconds per 1000 records - it takes 30 seconds to retrieve
10,000 records.  That's only about 36 KBytes/sec.  This is a 100BT switched
network (not sure if it is vlan'd or through a router).  Echo time averages
3 ms.  The back end is pretty much idle.  It shows 'idle in transaction'.

05-08-2004.23:54:43 Records read: 10000
05-08-2004.23:55:17 Records read: 20000
05-08-2004.23:55:50 Records read: 30000
05-08-2004.23:56:22 Records read: 40000
05-08-2004.23:56:55 Records read: 50000
05-08-2004.23:57:32 Records read: 60000
05-08-2004.23:58:07 Records read: 70000
...

The code is an ecpg program like:

    EXEC SQL WHENEVER SQLERROR GOTO sql_error;
    EXEC SQL WHENEVER NOT FOUND DO break;

    EXEC SQL DECLARE message_cursor CURSOR FOR
         SELECT file_name FROM messages
          WHERE system_key = (select system_key from systems
                               where system_name = :systemName);
    EXEC SQL OPEN message_cursor;

    count = 0;
    while (1)
    {
        EXEC SQL FETCH message_cursor INTO :fileNameDB;

        memcpy (tempstr, fileNameDB.arr, fileNameDB.len);
        tempstr[fileNameDB.len] = '\0';

        [Action with tempstr removed for testing]

        count++;
        if ((count % 10000) == 0)
            logmsg ("Records read: %d", count);
    }

How can I speed this thing up?

Wes
<wespvp@syntegra.com> writes:
> I'm seeing what seems like slow retrieval times over the network.

Are you sure it is a network problem?  What performance do you get if you
run the same test program locally on the database machine?  How about
issuing the same sort of FETCH commands via a psql script?

			regards, tom lane
On 5/9/04 9:32 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Are you sure it is a network problem?  What performance do you get
> if you run the same test program locally on the database machine?
> How about issuing the same sort of FETCH commands via a psql script?

Yes, it is definitely due to network latency, even though that latency is
very small.  Here it is running locally:

05-09-2004.17:49:41 Records read: 10000
05-09-2004.17:49:41 Records read: 20000
05-09-2004.17:49:42 Records read: 30000
05-09-2004.17:49:42 Records read: 40000
05-09-2004.17:49:43 Records read: 50000
05-09-2004.17:49:43 Records read: 60000
05-09-2004.17:49:44 Records read: 70000
05-09-2004.17:49:45 Records read: 80000
05-09-2004.17:49:45 Records read: 90000
05-09-2004.17:49:46 Records read: 100000
05-09-2004.17:49:46 Records read: 110000
05-09-2004.17:49:47 Records read: 120000
05-09-2004.17:49:47 Records read: 130000
05-09-2004.17:49:48 Records read: 140000

My "outside looking in" observation is that every row has to be retrieved
(or stored) with a separate request, so network latency, however small,
becomes an issue when the volume is very high.

A Pro*C program I recently ported from Oracle to PostgreSQL showed this
difference.  In Pro*C you can load an array with rows to insert, then issue
a single INSERT request passing it the array.  As far as I can tell, in
PostgreSQL ecpg (or other) you have to execute one request per record.

Is there some way to batch insert/fetch requests?  How else can I improve
the performance?  COPY appears to work this way, but you can't control what
is returned and you have to know the column order.

Wes
<wespvp@syntegra.com> writes:
> On 5/9/04 9:32 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> Are you sure it is a network problem?

> Yes, it is definitely due to network latency, even though that latency is
> very small.  Here it is running locally:
> [ about 20000 records/sec ]

Okay, I just wanted to verify that we weren't overlooking any other sort of
bottleneck.  But the numbers you quote make sense as a network issue:
33 seconds for 10000 records is 3.3 msec per record, and since you say the
measured ping time is 3 msec, it appears that FETCH has just about the same
response time as a ping ;-).  So you can't really complain about it.  The
only way to do better will be to batch multiple fetches into one network
round trip.

> A Pro*C program I recently ported from Oracle to PostgreSQL showed this
> difference.  In Pro*C you can load an array with rows to insert, then
> issue a single INSERT request passing it the array.  As far as I can
> tell, in PostgreSQL ecpg (or other) you have to execute one request per
> record.

The usual way to batch multiple insertions is with COPY IN.  The usual way
to batch a fetch is just to SELECT the whole thing; or if that is too much
data to snarf at once, use a cursor with "FETCH n" requests.

I am not sure how either of these techniques maps into ecpg, though.  If
you want to use ecpg then I'd suggest bringing up the question on
pgsql-interfaces --- the ecpg gurus are more likely to be paying attention
over there.

> ... It appears that COPY works like this, but you can't control what is
> returned and you have to know the column order.

True, COPY OUT is only designed to return all the rows of a table.
However, in recent versions you can specify what columns you want in a
COPY.  It's still no substitute for SELECT...

			regards, tom lane
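At the SQL level (e.g. from psql), the batching described above looks
roughly like the sketch below.  The table and column names are taken from
the earlier post; the batch size of 1000 and the 'some-system' literal are
made-up illustrations:

    -- Batched fetch with a cursor (each FETCH is one round trip):
    BEGIN;
    DECLARE message_cursor CURSOR FOR
        SELECT file_name FROM messages
         WHERE system_key = (SELECT system_key FROM systems
                              WHERE system_name = 'some-system');
    FETCH 1000 FROM message_cursor;    -- up to 1000 rows per round trip
    FETCH 1000 FROM message_cursor;    -- repeat until it returns no rows
    CLOSE message_cursor;
    COMMIT;

    -- Batched insert path:
    COPY messages (file_name) FROM STDIN;

    -- Recent versions accept a column list on COPY OUT as well, but it
    -- still returns every row of the table; there is no WHERE clause:
    COPY messages (file_name) TO STDOUT;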
wespvp@syntegra.com wrote:
> The back end is pretty much idle.  It shows 'idle in transaction'.

Well, it's not so much idle as holding a transaction id!  That "idle in
transaction" is not your problem here, but I suggest you take a look at why
you have an "idle in transaction" backend; do you have backends sitting in
that state for days and days?

Regards
Gaetano Mendola
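One quick way to spot such backends is to query pg_stat_activity; this
sketch assumes stats_command_string is enabled so that current_query is
populated:

    SELECT procpid, usename, current_query
      FROM pg_stat_activity
     WHERE current_query LIKE '<IDLE> in transaction%';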
On 5/9/04 1:58 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> I am not sure how either of these techniques maps into ecpg, though.
> If you want to use ecpg then I'd suggest bringing up the question on
> pgsql-interfaces --- the ecpg gurus are more likely to be paying
> attention over there.

I got some sample code from someone on the pgsql-interfaces list showing
how to do bulk FETCHes.  It is pretty much the same as with Pro*C.  You
just can't use that for INSERT/UPDATE (hint, hint...).

I was able to improve the network retrieval rate from 1 million records per
hour to 1 million records per minute.

Wes
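For the archives, a bulk-FETCH loop in ecpg looks roughly like the sketch
below.  This is not the exact sample code from pgsql-interfaces; the batch
size of 1000, the 128-character column width, and the use of
sqlca.sqlerrd[2] for the per-batch row count are assumptions made for
illustration (logmsg and the sql_error label come from the earlier program):

    EXEC SQL INCLUDE sqlca;

    EXEC SQL BEGIN DECLARE SECTION;
        char systemName[64];
        char fileNames[1000][128];  /* up to 1000 rows per FETCH; sizes assumed */
    EXEC SQL END DECLARE SECTION;

    int  count = 0;
    long i, got;

    EXEC SQL WHENEVER SQLERROR GOTO sql_error;

    EXEC SQL DECLARE message_cursor CURSOR FOR
         SELECT file_name FROM messages
          WHERE system_key = (SELECT system_key FROM systems
                               WHERE system_name = :systemName);
    EXEC SQL OPEN message_cursor;

    while (1)
    {
        /* One network round trip brings back up to 1000 rows. */
        EXEC SQL FETCH 1000 FROM message_cursor INTO :fileNames;

        got = sqlca.sqlerrd[2];     /* rows actually returned by this FETCH */
        if (got == 0)
            break;

        for (i = 0; i < got; i++)
        {
            /* fileNames[i] holds one file_name value */
            count++;
            if ((count % 10000) == 0)
                logmsg ("Records read: %d", count);
        }
    }

    EXEC SQL CLOSE message_cursor;
    return;

sql_error:
    logmsg ("SQL error: %s", sqlca.sqlerrm.sqlerrmc);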