Thread: performance libpq vs JDBC
I wrote a test program in C++ using libpq. It works as follows (pseudo code):
for ( int loop = 0; loop < 1000; ++loop ) {
    PQexec(m_conn, "BEGIN");
    const char* sql = "INSERT INTO pg_perf_test (id, text) VALUES ($1, $2)";
    PQprepare(m_conn, "stmtid", sql, 0, NULL);
    for ( int i = 0; i < 1000; ++i ) {
        // Set values etc.
        PQexecPrepared(m_conn, …);
    }
    PQexec(m_conn, "DEALLOCATE stmtid");
    PQexec(m_conn, "COMMIT");
}
I measured the duration of each iteration of the outer for-loop; the average was 450 ms per 1000 rows inserted.
After that, I wrote a test program in Java using JDBC. It works as follows:
// (assumes con.setAutoCommit(false) has been called beforehand)
for ( int loops = 0; loops < 1000; ++loops ) {
    String sql = "INSERT INTO pq_perf_test (id, text) VALUES (?, ?)";
    PreparedStatement stmt = con.prepareStatement(sql);
    for ( int i = 0; i < 1000; ++i ) {
        // Set values etc.
        stmt.addBatch();
    }
    stmt.executeBatch();
    con.commit();
    stmt.close();
}
I measured the duration of each iteration of the outer for-loop; the average was 100 ms per 1000 rows inserted.
This means that accessing PostgreSQL by JDBC is about 4-5 times faster than using libpq.
Comparable results were measured with analogous UPDATE and DELETE statements.
I need to improve the performance of my C++ code. Is there any way in libpq to reach the performance of JDBC for INSERT, UPDATE and DELETE statements (I cannot use COPY statements)? I didn't find anything comparable to PreparedStatement.executeBatch() in libpq.
Best regards,
Werner Scholtes
Can you try writing the libpq program using the COPY functions?
I hope it will be better than prepared statements.
Best Regards,
Divakar
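For illustration, a minimal sketch of the COPY route in libpq might look like this (it assumes the pg_perf_test table and columns from the test program above; error handling is trimmed):

// Minimal sketch, assuming the pg_perf_test table from the test program above:
// bulk-load rows with COPY ... FROM STDIN in text format.
#include <libpq-fe.h>
#include <cstdio>

int copy_rows(PGconn* conn)
{
    PGresult* res = PQexec(conn, "COPY pg_perf_test (id, text) FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN) {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return -1;
    }
    PQclear(res);

    char line[256];
    for (int i = 0; i < 1000; ++i) {
        // One tab-separated row per line; real data must be escaped for the COPY text format.
        int len = snprintf(line, sizeof(line), "%d\tsome text %d\n", i, i);
        if (PQputCopyData(conn, line, len) != 1)
            return -1;
    }
    if (PQputCopyEnd(conn, NULL) != 1)
        return -1;

    // Collect the final status of the COPY command.
    res = PQgetResult(conn);
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
    PQclear(res);
    return ok;
}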
Unfortunately I cannot use the COPY function, since I need JDBC-level performance for UPDATE and DELETE statements in my C++ libpq program as well.
I wonder how JDBC's PreparedStatement.addBatch() and PreparedStatement.executeBatch() work. They must use a more efficient protocol that sends many parameter sets for one prepared statement as a batch in a single network transmission. As far as I can see, PQexecPrepared does not allow more than one parameter set (the parameters for one row) per call. So libpq sends a single row to the server 1000 times, whereas JDBC sends 1000 rows once, which is much more efficient.
I assume that the PostgreSQL wire protocol allows multiple rows to be transmitted at once, but libpq doesn't expose an interface for it. Is that right?
If you have all the records before issuing the INSERT, you can do it like this: insert into xxx values (a,b,c), (d,e,f), ...;
An example: http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows
Best Regards,
Divakar
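For illustration, a rough sketch of building such a multi-row VALUES statement from libpq (the pg_perf_test table from the test program is assumed; values are escaped with PQescapeStringConn because a hand-built SQL string cannot use $n parameters):

// Rough sketch, assuming the pg_perf_test table from the test program above:
// build one INSERT ... VALUES (...), (...), ... statement and send it in a
// single round trip.
#include <libpq-fe.h>
#include <cstdio>
#include <cstring>
#include <string>

int insert_multi_row(PGconn* conn, int rows)
{
    std::string sql = "INSERT INTO pg_perf_test (id, text) VALUES ";
    char value[128];
    char escaped[260];
    char row[300];

    for (int i = 0; i < rows; ++i) {
        snprintf(value, sizeof(value), "some text %d", i);
        // Escape the literal before splicing it into the statement text.
        PQescapeStringConn(conn, escaped, value, strlen(value), NULL);
        snprintf(row, sizeof(row), "%s(%d, '%s')", i ? ", " : "", i, escaped);
        sql += row;
    }

    PGresult* res = PQexec(conn, sql.c_str());
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
    if (ok != 0)
        fprintf(stderr, "INSERT failed: %s", PQerrorMessage(conn));
    PQclear(res);
    return ok;
}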
What about UPDATE and DELETE? In the case of an update I have all the records to be updated, and in the case of a delete I have all the primary key values of the records to be deleted.
The only thing is that the criteria have to be the same for all rows.
If you have different criteria for different rows in an update or delete, you will have to fire two queries.
I mean, if you want to do
1. delete from xyz where a = 1
and
2. delete from xyz where a = 2
then you will have to run the query twice.
Best Regards,
Divakar
On 16/12/10 09:21, Werner Scholtes wrote:
> I assume that the PostgreSQL wire protocol allows multiple rows to be
> transmitted at once, but libpq doesn't expose an interface for it.
> Is that right?

Sounds wrong to me. The libpq client is the default reference implementation of the protocol. If there were large efficiencies that could be copied, they would be.

Anyway - you don't need to assume what's in the protocol. It's documented here:
http://www.postgresql.org/docs/9.0/static/protocol.html

I'd stick Wireshark or some other network analyser on the two sessions and see exactly what is different.

--
Richard Huxton
Archonet Ltd
Thanks a lot for your advice. I found the difference: my Java program sends one huge SQL string containing 1000 INSERT statements separated by ';' (without using prepared statements at all!), whereas my C++ program sends one INSERT statement with parameters to be prepared and, after that, 1000 parameter sets. Now I have refactored my C++ program to also send 1000 INSERT statements in one call to PQexec, and it reaches the same performance as my Java program.

I just wonder why anyone should use prepared statements at all?
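For reference, a rough sketch of that refactoring might look like this (table and column names are taken from the test program; PQescapeStringConn is used because the statements are assembled as plain SQL text):

// Rough sketch of the refactoring described above: concatenate many INSERT
// statements (separated by ';') and send them to the server in one PQexec call.
// Table/column names are taken from the pg_perf_test example.
#include <libpq-fe.h>
#include <cstdio>
#include <cstring>
#include <string>

int insert_multi_statement(PGconn* conn, int rows)
{
    std::string sql = "BEGIN; ";
    char value[128];
    char escaped[260];
    char stmt[400];

    for (int i = 0; i < rows; ++i) {
        snprintf(value, sizeof(value), "some text %d", i);
        PQescapeStringConn(conn, escaped, value, strlen(value), NULL);
        snprintf(stmt, sizeof(stmt),
                 "INSERT INTO pg_perf_test (id, text) VALUES (%d, '%s'); ", i, escaped);
        sql += stmt;
    }
    sql += "COMMIT;";

    // The whole string travels in one round trip; PQexec returns the last command's result.
    PGresult* res = PQexec(conn, sql.c_str());
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
    if (ok != 0)
        fprintf(stderr, "batch failed: %s", PQerrorMessage(conn));
    PQclear(res);
    return ok;
}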
On 16/12/10 12:28, Werner Scholtes wrote:
> Thanks a lot for your advice. I found the difference: my Java program
> sends one huge SQL string containing 1000 INSERT statements separated
> by ';' (without using prepared statements at all!), whereas my C++
> program sends one INSERT statement with parameters to be prepared and,
> after that, 1000 parameter sets. Now I have refactored my C++ program
> to also send 1000 INSERT statements in one call to PQexec, and it
> reaches the same performance as my Java program.

So - it was the network round-trip overhead. Like Divakar suggested, COPY or VALUES (),(),() would work too.

You mention multiple updates/deletes too. Perhaps the cleanest and fastest method would be to build a TEMP table containing the IDs/values required and join against that for your updates/deletes.

> I just wonder why anyone should use prepared statements at all?

Not everything is a simple INSERT. Preparing saves planning time on repeated SELECTs. It also provides some SQL-injection safety, since you provide parameters rather than building a SQL string.

--
Richard Huxton
Archonet Ltd
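For illustration, a rough sketch of that TEMP-table approach for a batch delete (the del_ids table name is invented for the example; the keys could just as well be loaded with COPY):

// Rough sketch of the TEMP-table approach suggested above: load the keys to
// delete into a temporary table, then delete with a join. The del_ids name is
// invented for the example; pg_perf_test is the table from the test program.
#include <libpq-fe.h>
#include <cstdio>
#include <string>

static int run(PGconn* conn, const std::string& sql)
{
    PGresult* res = PQexec(conn, sql.c_str());
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
    if (ok != 0)
        fprintf(stderr, "failed: %s", PQerrorMessage(conn));
    PQclear(res);
    return ok;
}

int delete_with_temp_table(PGconn* conn, const int* ids, int count)
{
    if (count <= 0) return 0;
    if (run(conn, "BEGIN") != 0) return -1;
    if (run(conn, "CREATE TEMP TABLE del_ids (id int) ON COMMIT DROP") != 0) return -1;

    // Fill the temp table with one multi-row INSERT (COPY would work as well).
    std::string sql = "INSERT INTO del_ids (id) VALUES ";
    char v[32];
    for (int i = 0; i < count; ++i) {
        snprintf(v, sizeof(v), "%s(%d)", i ? ", " : "", ids[i]);
        sql += v;
    }
    if (run(conn, sql) != 0) return -1;

    // One DELETE joined against the temp table replaces many single-row deletes.
    if (run(conn, "DELETE FROM pg_perf_test USING del_ids WHERE pg_perf_test.id = del_ids.id") != 0)
        return -1;

    return run(conn, "COMMIT");
}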
On Thu, Dec 16, 2010 at 7:14 AM, Richard Huxton <dev@archonet.com> wrote:
> On 16/12/10 09:21, Werner Scholtes wrote:
>> I assume that the PostgreSQL wire protocol allows multiple rows to be
>> transmitted at once, but libpq doesn't expose an interface for it.
>> Is that right?
>
> I'd stick Wireshark or some other network analyser on the two sessions and
> see exactly what is different.

There is only one explanation for the difference: they are slamming data across the wire without waiting for the result. libpq queries are synchronous: you send a query and wait for the result. This means that for very simple queries like the above you can become network bound.

In C/C++ you can work around this using a couple of different methods. COPY is of course the fastest, but extremely limiting in what it can do. We developed libpqtypes (I love talking about libpqtypes) to deal with this problem. In the attached example, it stacks data into an array on the client and sends it to the server, which unnests and inserts it. The attached example inserts a million rows in about 11 seconds on my workstation (client-side prepare could knock that down to 8 or so).

If you need to do something fancy, we typically create a receiving function on the server in PL/pgSQL which unnests the array and makes decisions, etc. This is extremely powerful, and you can compose and send very rich data to/from Postgres in a single query.

merlin

#include <stdio.h>
#include "libpq-fe.h"
#include "libpqtypes.h"

#define INS_COUNT 1000000

int main()
{
    int i;
    PGconn *conn = PQconnectdb("dbname=pg9");
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        printf("bad connection");
        return -1;
    }

    PQtypesRegister(conn);

    PGregisterType type = {"ins_test", NULL, NULL};
    PQregisterComposites(conn, &type, 1);

    PGparam *p = PQparamCreate(conn);
    PGarray arr;
    arr.param = PQparamCreate(conn);
    arr.ndims = 0;

    PGparam *t = PQparamCreate(conn);

    for (i = 0; i < INS_COUNT; i++)
    {
        PGint4 a = i;
        PGtext b = "some_text";
        PGtimestamp c;
        PGbytea d;

        d.len = 8;
        d.data = b;

        c.date.isbc = 0;
        c.date.year = 2000;
        c.date.mon = 0;
        c.date.mday = 19;
        c.time.hour = 10;
        c.time.min = 41;
        c.time.sec = 6;
        c.time.usec = 0;
        c.time.gmtoff = -18000;

        /* Stack one row into the per-row param, then append it to the array. */
        PQputf(t, "%int4 %text %timestamptz %bytea", a, b, &c, &d);
        PQputf(arr.param, "%ins_test", t);
        PQparamReset(t);
    }

    if (!PQputf(p, "%ins_test[]", &arr))
    {
        printf("putf failed: %s\n", PQgeterror());
        return -1;
    }

    /* One round trip: the whole array is unnested and inserted on the server. */
    res = PQparamExec(conn, p,
        "insert into ins_test select * from unnest($1) r(a, b, c, d)", 1);

    if (!res)
    {
        printf("got %s\n", PQgeterror());
        return -1;
    }

    PQclear(res);
    PQparamClear(p);
    PQfinish(conn);
    return 0;
}
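The example presupposes an ins_test table whose row type is registered as the composite; a plausible definition, inferred from the %int4 %text %timestamptz %bytea format string and the unnest(...) r(a, b, c, d) target list, might be created like this:

// Assumed setup for the libpqtypes example above (inferred, not stated in the original mail).
#include <libpq-fe.h>
#include <cstdio>

int create_ins_test(PGconn* conn)
{
    PGresult* res = PQexec(conn,
        "CREATE TABLE ins_test (a int4, b text, c timestamptz, d bytea)");
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
    if (ok != 0)
        fprintf(stderr, "create failed: %s", PQerrorMessage(conn));
    PQclear(res);
    return ok;
}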