I found an issue like this in the archives from 2000, but no
answer...
I'm seeing what seem like slow retrieval times over the network. I am
retrieving a single field of about 100-120 characters per record, at a rate
of about 3 seconds per 1000 records - it takes 30 seconds to retrieve
10,000 records. That's only about 36 KBytes/sec.
This is a 100BT switched network (not sure if it is vlan'd or through a
router). Echo time averages 3ms, which matches the retrieval rate almost
exactly - 3ms per record.
The back end is pretty much idle. It shows 'idle in transaction'. If I run
the program on the system containing the database, it runs close to 2 orders
of magnitude faster.
On the remote system:
05-08-2004.23:54:43 Records read: 10000
05-08-2004.23:55:17 Records read: 20000
05-08-2004.23:55:50 Records read: 30000
05-08-2004.23:56:22 Records read: 40000
05-08-2004.23:56:55 Records read: 50000
05-08-2004.23:57:32 Records read: 60000
05-08-2004.23:58:07 Records read: 70000
...
The code is an ecpg program like:
EXEC SQL WHENEVER SQLERROR GOTO sql_error;
EXEC SQL WHENEVER NOT FOUND DO break;

EXEC SQL DECLARE message_cursor CURSOR FOR
    SELECT file_name FROM messages
    WHERE system_key = (SELECT system_key FROM systems
                        WHERE system_name = :systemName);
EXEC SQL OPEN message_cursor;

count = 0;
while (1) {
    EXEC SQL FETCH message_cursor INTO :fileNameDB;
    memcpy(tempstr, fileNameDB.arr, fileNameDB.len);
    tempstr[fileNameDB.len] = '\0';
    [Action with tempstr removed for testing]
    count++;
    if ((count % 10000) == 0)
        logmsg("Records read: %d", count);
}
My "outside looking in" observations seem to point to the fact that every
row has to be retrieved (or stored) with a separate request. Network
latency, however small, becomes an issue when the volume is very high.
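
One workaround I'm considering, in case it helps: drop down to libpq and
pull rows through the same cursor in batches, using FETCH 1000 so that each
network round trip returns 1000 rows instead of one. This is only a sketch -
the connection string is made up and error handling is stripped:

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn   *conn;
    PGresult *res;
    long      count = 0;
    int       i, n;

    conn = PQconnectdb("host=dbhost dbname=mydb");  /* hypothetical conninfo */
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn,
        "DECLARE message_cursor CURSOR FOR "
        "SELECT file_name FROM messages WHERE system_key = "
        "(SELECT system_key FROM systems WHERE system_name = 'name')"));

    for (;;)
    {
        /* One round trip brings back up to 1000 rows */
        res = PQexec(conn, "FETCH 1000 FROM message_cursor");
        n = PQntuples(res);
        for (i = 0; i < n; i++)
        {
            const char *file_name = PQgetvalue(res, i, 0);
            /* ... action with file_name ... */
            if ((++count % 10000) == 0)
                printf("Records read: %ld\n", count);
        }
        PQclear(res);
        if (n == 0)
            break;              /* cursor exhausted */
    }

    PQclear(PQexec(conn, "CLOSE message_cursor"));
    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}

If each FETCH costs one ~3ms round trip, that's 3ms per 1000 records
instead of 3ms per record, which should recover most of the gap I'm seeing.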
A (different) Pro*C program I recently ported from Oracle to PostgreSQL
showed the same difference. In Pro*C you can load an array with rows to
insert and then issue a single INSERT request, passing it the array; I
believe the same applies to FETCH. As far as I can tell, with PostgreSQL
ecpg (or anything else) you have to execute one request per record. The
program SEGV'd all over the place when I tried to keep the Pro*C array
code, and I had to do a major rework to remove the array logic.
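
For reference, the Pro*C array pattern I had to rip out looked roughly like
this (reconstructed from memory, identifiers illustrative) - the FOR clause
tells Oracle to insert :n_rows rows from the array in a single request:

EXEC SQL BEGIN DECLARE SECTION;
char file_names[1000][128];   /* one batch of rows to insert */
int  n_rows;                  /* number of array slots actually filled */
EXEC SQL END DECLARE SECTION;

/* ... fill file_names[0 .. n_rows-1] ... */

EXEC SQL FOR :n_rows
    INSERT INTO messages (file_name) VALUES (:file_names);

It's this single-request-for-N-rows behavior that I can't find an
equivalent for in ecpg.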
Is there some way to batch insert/fetch requests? How else can I improve
the performance? I can find nothing in the ecpg documentation indicating it
is possible. COPY appears to batch rows this way, but you can't control
what is returned - there is no WHERE clause - and you have to know the
column order. Since I need to retrieve only about 10 million records - 10%
of the database - dumping the whole table with COPY is not feasible.
Wes