Speed improvement - Group batch Insert - Rewrite the INSERT at the driver level (using a parameter) - Mailing list pgsql-jdbc

From J. W. Ulbts
Subject Speed improvement - Group batch Insert - Rewrite the INSERT at the driver level (using a parameter)
Date
Msg-id 828427796@web.de
Whole thread Raw
Responses Re: Speed improvement - Group batch Insert - Rewrite the INSERT at the driver level (using a parameter)  (Oliver Jowett <oliver@opencloud.com>)
List pgsql-jdbc
Hello!

It would be great if someone would add a parameter and code to the driver which would allow the JDBC driver to group a
listof INSERT INTO added to a batch (Prepared Statement and the addBatch(..) method). 
This way the INSERT INTO from the batch could be rewritten in a single INSERT INTO which is at least 3 times faster
comparedto the current behaviour. At least that was the result of my tests. 

To make clear what I mean a simple example.

The normal batch would execute this:
--------------------
INSERT INTO sometable (col1, col2) VALUES (1, 'word1');
INSERT INTO sometable (col1, col2) VALUES (2, 'word2');
INSERT INTO sometable (col1, col2) VALUES (3, 'word3');
INSERT INTO sometable (col1, col2) VALUES (4, 'word4');
--------------------

If the batch insert is rewritten by the driver it would generate and execute this:
--------------------
INSERT INTO sometable (col1, col2) VALUES (1, 'word1'), (2, 'word2'), (3, 'word3'),  (4, 'word4');
--------------------

If this is already possible with the PostgreSQL JDBC driver I must have missed it.

I ran into this by one of my users pointing me to the multi value/grouped insert statement, and then I finally saw that
MarkMatthews has just improved the MySQL JDBC driver in this area. He's talking about 10x performance increase but
that'sbasically because the MySQL server isn't optimizing as good as PostgreSQL. ;) 
I could "only" see an improvement by factor 5 with my test data and MySQL InnoDB (IMDb - Keywords.list; reading data
fromfile creating INSERT INTO all in one transaction). It's possible that MyISAM are up to 10x faster. 

In my tests where I added the values for the INSERT INTO statement into two Lists (ArrayList; one for the key column
andone for the data column) and generated the "INSERT INTO sometable (col1, col2) VALUES (x,y)[, (x,y)];" in my Java
codeit was 3 times faster compared to the normal PreparedStatement addBatch with it's batchExecute for PostgreSQL. 
I could get the same speed improvement with my manually generated INSERT INTO as with the new JDBC driver for MySQL
whereI turned on the "rewriteBatchedStatements" parameter. 
I tried column groups (the "(x,y)") with sizes between 250 and 5000 (in steps of 250: so 250, 500, 750,...,5000). The
speedwas pretty stable at 3x for PostgreSQL but the larger the group the more memory is used to store the data you are
tryingto insert. 


Here is the blog entry of Mark Matthews (MySQL Connector/J developer) about speeding up Batch Inserts:
http://www.jroller.com/mmatthews/entry/speeding_up_batch_inserts_for

The upcoming MySQL driver supporting this feature (Connector/J 5.1 GA and 6.0 Alpha) can be downloaded here:
http://downloads.mysql.com/snapshots.php

Currently the MySQL driver has an error if your prepared INSERT INTO statement ends with a ";" (which is not needed
exceptin a SQL script but several developers like myself normally also add this to the SQL Command in the Java code). 
The driver would create an INSERT INTO statment that looks like this:
INSERT INTO sometable1 (col1, col2) VALUES (1, 'word');, (2, 'word2'), ...


It would be great if someone could implement this into the JDBC driver and the grouping could be turned on by a URL
driverparameter like e.g MySQL Connector/J (?rewriteBatchedStatements=true). 
Maybe there are also comment about this. So what do you think?

--
Juergen
______________________________________________________
GRATIS für alle WEB.DE-Nutzer: Die maxdome Movie-FLAT!
Jetzt freischalten unter http://movieflat.web.de


pgsql-jdbc by date:

Previous
From: John R Pierce
Date:
Subject: Re: java 1.4 for how long ?
Next
From: Oliver Jowett
Date:
Subject: Re: Speed improvement - Group batch Insert - Rewrite the INSERT at the driver level (using a parameter)