Thread: performance libpq vs JDBC
I wrote a test program in C++ using libpq. It works as follows (pseudo code):
for ( int loop = 0; loop < 1000; ++loop ) {
    PQexec(m_conn, "BEGIN");
    const char* sql = "INSERT INTO pg_perf_test (id, text) VALUES ($1, $2)";
    PQprepare(m_conn, "stmtid", sql, 0, NULL);
    for ( int i = 0; i < 1000; ++i ) {
        // Set values etc.
        PQexecPrepared(m_conn, …);
    }
    PQexec(m_conn, "DEALLOCATE stmtid");
    PQexec(m_conn, "COMMIT");
}
I measured the duration of each iteration of the outer for-loop; the average was 450 ms per 1000 rows inserted.
After that, I wrote a test program in Java using JDBC. It works as follows:
// (assumes con.setAutoCommit(false) has been called beforehand)
for ( int loops = 0; loops < 1000; ++loops ) {
    String sql = "INSERT INTO pq_perf_test (id, text) VALUES (?, ?)";
    PreparedStatement stmt = con.prepareStatement(sql);
    for ( int i = 0; i < 1000; ++i ) {
        // Set values etc.
        stmt.addBatch();
    }
    stmt.executeBatch();
    con.commit();
    stmt.close();
}
I measured the duration of each iteration of the outer for-loop; the average was 100 ms per 1000 rows inserted.
This means that accessing PostgreSQL by JDBC is about 4-5 times faster than using libpq.
Comparable results were measured with analogous UPDATE and DELETE statements.
I need to improve the performance of my C++ code. Is there any way in libpq to reach the performance of JDBC for INSERT, UPDATE and DELETE statements (I cannot use COPY statements)? I didn't find anything comparable to PreparedStatement.executeBatch() in libpq.
Best regards,
Werner Scholtes
Can you try writing the libpq program using the COPY functions?
I hope it will be better than prepared statements.
Best Regards,
Divakar
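For illustration, a minimal sketch of the COPY route in libpq might look like this (it assumes the pg_perf_test table and columns from the test program above; error handling is trimmed):

// Minimal sketch, assuming the pg_perf_test table from the test program above:
// bulk-load rows with COPY ... FROM STDIN in text format.
#include <libpq-fe.h>
#include <cstdio>

int copy_rows(PGconn* conn)
{
    PGresult* res = PQexec(conn, "COPY pg_perf_test (id, text) FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN) {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return -1;
    }
    PQclear(res);

    char line[256];
    for (int i = 0; i < 1000; ++i) {
        // One tab-separated row per line; real data must be escaped for the COPY text format.
        int len = snprintf(line, sizeof(line), "%d\tsome text %d\n", i, i);
        if (PQputCopyData(conn, line, len) != 1)
            return -1;
    }
    if (PQputCopyEnd(conn, NULL) != 1)
        return -1;

    // Collect the final status of the COPY command.
    res = PQgetResult(conn);
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
    PQclear(res);
    return ok;
}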
Unfortunately I cannot use the COPY function, since I need JDBC-level performance for UPDATE and DELETE statements in my C++ libpq program as well.
I wonder how JDBC's PreparedStatement.addBatch() and PreparedStatement.executeBatch() work. They must use a more efficient protocol that sends many parameter sets for one prepared statement as a batch in a single network transmission. As far as I can see, PQexecPrepared does not allow more than one parameter set (the parameters for one row) per call. So libpq sends a single row to the server 1000 times, whereas JDBC sends 1000 rows once, which is much more efficient.
I assume that the PostgreSQL wire protocol allows multiple rows to be transmitted at once, but libpq doesn't expose an interface for it. Is that right?
If you have all the records before issuing the INSERT, you can do it like this: insert into xxx values (a,b,c), (d,e,f), ...;
An example: http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows
Best Regards,
Divakar
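For illustration, a rough sketch of building such a multi-row VALUES statement from libpq (the pg_perf_test table from the test program is assumed; values are escaped with PQescapeStringConn because a hand-built SQL string cannot use $n parameters):

// Rough sketch, assuming the pg_perf_test table from the test program above:
// build one INSERT ... VALUES (...), (...), ... statement and send it in a
// single round trip.
#include <libpq-fe.h>
#include <cstdio>
#include <cstring>
#include <string>

int insert_multi_row(PGconn* conn, int rows)
{
    std::string sql = "INSERT INTO pg_perf_test (id, text) VALUES ";
    char value[128];
    char escaped[260];
    char row[300];

    for (int i = 0; i < rows; ++i) {
        snprintf(value, sizeof(value), "some text %d", i);
        // Escape the literal before splicing it into the statement text.
        PQescapeStringConn(conn, escaped, value, strlen(value), NULL);
        snprintf(row, sizeof(row), "%s(%d, '%s')", i ? ", " : "", i, escaped);
        sql += row;
    }

    PGresult* res = PQexec(conn, sql.c_str());
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
    if (ok != 0)
        fprintf(stderr, "INSERT failed: %s", PQerrorMessage(conn));
    PQclear(res);
    return ok;
}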
What about UPDATE and DELETE? In the case of an update I have all the records to be updated, and in the case of a delete I have all the primary key values of the records to be deleted.
The only thing is that the criteria have to be the same for all rows.
If you have different criteria for different rows in an update or delete, you will have to fire two queries.
I mean, if you want to do
1. delete from xyz where a = 1
and
2. delete from xyz where a = 2
then you will have to run the query twice.
Best Regards,
Divakar
On 16/12/10 09:21, Werner Scholtes wrote:
> I assume that the PostgreSQL wire protocol allows multiple rows to be
> transmitted at once, but libpq doesn't expose an interface for it.
> Is that right?

Sounds wrong to me. The libpq client is the default reference implementation of the protocol. If there were large efficiencies that could be copied, they would be.

Anyway - you don't need to assume what's in the protocol. It's documented here:
http://www.postgresql.org/docs/9.0/static/protocol.html

I'd stick Wireshark or some other network analyser on the two sessions and see exactly what is different.

--
Richard Huxton
Archonet Ltd
Thanks a lot for your advice. I found the difference: my Java program sends one huge SQL string containing 1000 INSERT statements separated by ';' (without using prepared statements at all!), whereas my C++ program sends one INSERT statement with parameters to be prepared and, after that, 1000 parameter sets. Now I have refactored my C++ program to also send 1000 INSERT statements in one call to PQexec, and it reaches the same performance as my Java program.

I just wonder why anyone should use prepared statements at all?
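For reference, a rough sketch of that refactoring might look like this (table and column names are taken from the test program; PQescapeStringConn is used because the statements are assembled as plain SQL text):

// Rough sketch of the refactoring described above: concatenate many INSERT
// statements (separated by ';') and send them to the server in one PQexec call.
// Table/column names are taken from the pg_perf_test example.
#include <libpq-fe.h>
#include <cstdio>
#include <cstring>
#include <string>

int insert_multi_statement(PGconn* conn, int rows)
{
    std::string sql = "BEGIN; ";
    char value[128];
    char escaped[260];
    char stmt[400];

    for (int i = 0; i < rows; ++i) {
        snprintf(value, sizeof(value), "some text %d", i);
        PQescapeStringConn(conn, escaped, value, strlen(value), NULL);
        snprintf(stmt, sizeof(stmt),
                 "INSERT INTO pg_perf_test (id, text) VALUES (%d, '%s'); ", i, escaped);
        sql += stmt;
    }
    sql += "COMMIT;";

    // The whole string travels in one round trip; PQexec returns the last command's result.
    PGresult* res = PQexec(conn, sql.c_str());
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
    if (ok != 0)
        fprintf(stderr, "batch failed: %s", PQerrorMessage(conn));
    PQclear(res);
    return ok;
}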
On 16/12/10 12:28, Werner Scholtes wrote:
> Thanks a lot for your advice. I found the difference: my Java program
> sends one huge SQL string containing 1000 INSERT statements separated
> by ';' (without using prepared statements at all!), whereas my C++
> program sends one INSERT statement with parameters to be prepared and,
> after that, 1000 parameter sets. Now I have refactored my C++ program
> to also send 1000 INSERT statements in one call to PQexec, and it
> reaches the same performance as my Java program.

So - it was the network round-trip overhead. Like Divakar suggested, COPY or VALUES (),(),() would work too.

You mention multiple updates/deletes too. Perhaps the cleanest and fastest method would be to build a TEMP table containing the IDs/values required and join against that for your updates/deletes.

> I just wonder why anyone should use prepared statements at all?

Not everything is a simple INSERT. Preparing saves planning time on repeated SELECTs. It also provides some SQL-injection safety, since you provide parameters rather than building a SQL string.

--
Richard Huxton
Archonet Ltd
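For illustration, a rough sketch of that TEMP-table approach for a batch delete (the del_ids table name is invented for the example; the keys could just as well be loaded with COPY):

// Rough sketch of the TEMP-table approach suggested above: load the keys to
// delete into a temporary table, then delete with a join. The del_ids name is
// invented for the example; pg_perf_test is the table from the test program.
#include <libpq-fe.h>
#include <cstdio>
#include <string>

static int run(PGconn* conn, const std::string& sql)
{
    PGresult* res = PQexec(conn, sql.c_str());
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
    if (ok != 0)
        fprintf(stderr, "failed: %s", PQerrorMessage(conn));
    PQclear(res);
    return ok;
}

int delete_with_temp_table(PGconn* conn, const int* ids, int count)
{
    if (count <= 0) return 0;
    if (run(conn, "BEGIN") != 0) return -1;
    if (run(conn, "CREATE TEMP TABLE del_ids (id int) ON COMMIT DROP") != 0) return -1;

    // Fill the temp table with one multi-row INSERT (COPY would work as well).
    std::string sql = "INSERT INTO del_ids (id) VALUES ";
    char v[32];
    for (int i = 0; i < count; ++i) {
        snprintf(v, sizeof(v), "%s(%d)", i ? ", " : "", ids[i]);
        sql += v;
    }
    if (run(conn, sql) != 0) return -1;

    // One DELETE joined against the temp table replaces many single-row deletes.
    if (run(conn, "DELETE FROM pg_perf_test USING del_ids WHERE pg_perf_test.id = del_ids.id") != 0)
        return -1;

    return run(conn, "COMMIT");
}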
On Thu, Dec 16, 2010 at 7:14 AM, Richard Huxton <dev@archonet.com> wrote:
> On 16/12/10 09:21, Werner Scholtes wrote:
>> I assume that the PostgreSQL wire protocol allows multiple rows to be
>> transmitted at once, but libpq doesn't expose an interface for it.
>> Is that right?
>
> I'd stick Wireshark or some other network analyser on the two sessions and
> see exactly what is different.

There is only one explanation for the difference: they are slamming data across the wire without waiting for the result. libpq queries are synchronous: you send a query and wait for the result. This means that for very simple queries like the above you can become network bound.

In C/C++ you can work around this using a couple of different methods. COPY is of course the fastest, but extremely limiting in what it can do. We developed libpqtypes (I love talking about libpqtypes) to deal with this problem. In the attached example, it stacks data into an array on the client and sends it to the server, which unnests and inserts it. The attached example inserts a million rows in about 11 seconds on my workstation (client-side prepare could knock that down to 8 or so).

If you need to do something fancy, we typically create a receiving function on the server in PL/pgSQL which unnests the array and makes decisions, etc. This is extremely powerful, and you can compose and send very rich data to/from Postgres in a single query.

merlin

#include <stdio.h>
#include "libpq-fe.h"
#include "libpqtypes.h"

#define INS_COUNT 1000000

int main()
{
    int i;
    PGconn *conn = PQconnectdb("dbname=pg9");
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        printf("bad connection");
        return -1;
    }

    PQtypesRegister(conn);

    PGregisterType type = {"ins_test", NULL, NULL};
    PQregisterComposites(conn, &type, 1);

    PGparam *p = PQparamCreate(conn);
    PGarray arr;
    arr.param = PQparamCreate(conn);
    arr.ndims = 0;

    PGparam *t = PQparamCreate(conn);

    for (i = 0; i < INS_COUNT; i++)
    {
        PGint4 a = i;
        PGtext b = "some_text";
        PGtimestamp c;
        PGbytea d;

        d.len = 8;
        d.data = b;

        c.date.isbc = 0;
        c.date.year = 2000;
        c.date.mon = 0;
        c.date.mday = 19;
        c.time.hour = 10;
        c.time.min = 41;
        c.time.sec = 6;
        c.time.usec = 0;
        c.time.gmtoff = -18000;

        /* Stack one row into the per-row param, then append it to the array. */
        PQputf(t, "%int4 %text %timestamptz %bytea", a, b, &c, &d);
        PQputf(arr.param, "%ins_test", t);
        PQparamReset(t);
    }

    if (!PQputf(p, "%ins_test[]", &arr))
    {
        printf("putf failed: %s\n", PQgeterror());
        return -1;
    }

    /* One round trip: the whole array is unnested and inserted on the server. */
    res = PQparamExec(conn, p,
        "insert into ins_test select * from unnest($1) r(a, b, c, d)", 1);

    if (!res)
    {
        printf("got %s\n", PQgeterror());
        return -1;
    }

    PQclear(res);
    PQparamClear(p);
    PQfinish(conn);
    return 0;
}
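The example presupposes an ins_test table whose row type is registered as the composite; a plausible definition, inferred from the %int4 %text %timestamptz %bytea format string and the unnest(...) r(a, b, c, d) target list, might be created like this:

// Assumed setup for the libpqtypes example above (inferred, not stated in the original mail).
#include <libpq-fe.h>
#include <cstdio>

int create_ins_test(PGconn* conn)
{
    PGresult* res = PQexec(conn,
        "CREATE TABLE ins_test (a int4, b text, c timestamptz, d bytea)");
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
    if (ok != 0)
        fprintf(stderr, "create failed: %s", PQerrorMessage(conn));
    PQclear(res);
    return ok;
}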