[PERFORM] Improving PostgreSQL insert performance - Mailing list pgsql-performance

From Frits Jalvingh
Subject [PERFORM] Improving PostgreSQL insert performance
Date
Msg-id CAKhTGFX-ChBSjqENrAv8uqUR_H5PTvf14jG0cOAq6c-ami_7sQ@mail.gmail.com
Whole thread Raw
Responses Re: [PERFORM] Improving PostgreSQL insert performance
Re: [PERFORM] Improving PostgreSQL insert performance
List pgsql-performance
Hi all,

I am trying to improve the runtime of a big data warehouse application. One significant bottleneck found was insert performance, so I am investigating ways of getting Postgresql to insert data faster. I ran several tests on a fast machine to find out what performs best, and compared the results with the same actions in Oracle on that same machine.

So far I am finding that PostgreSQL insert performance is several times slower than Oracle performance, and I would be grateful for some help trying to decrease this gap...

To test I wrote a simple Java program which inserts data into a simple table, using statement batching and delayed commits. The table looks as follows:

create table h_test(
id_h integer
, source_system_id integer
, organisation_id integer
, load_dts timestamp without time zone
, boekingdetailid text
);
No constraints, no indexes.

The java program and PostgreSQL run on the same machine. The best results I've got are:

PostgreSQL inserts:

Commit size 50.000 and batch size 10.000
Inserted 1000000 rows in 7500 milliseconds, 142857.14285714287 rows per second
Inserted 1000000 rows in 7410 milliseconds, 142857.14285714287 rows per second

The exact same test done on Oracle (on the same machine) reports:

Inserted 1000000 rows in 1072 milliseconds, 1000000.0 rows per second

Increasing the row count in Oracle decreases this number a bit, but it's still fast:
Inserted 24000000 rows in 47155 milliseconds, 510638.2978723404 rows per second (oracle)

compared with:
Inserted 24000000 rows in 159929 milliseconds, 150943.3962264151 rows per second (postgresql)

I also created a small pg/sql stored procedure to insert the same 1 million rows, which runs in about 4 seconds, resulting in 250.000 rows a second. This is in the DB itself, but it still is twice as slow as Oracle with JDBC:
CREATE or replace function test() returns void AS $$
DECLARE
    count integer;
BEGIN
    for count in 1..1000000 loop
        insert into h_test(id_h,source_system_id,organisation_id,load_dts,boekingdetailid)
            values(count, 1, 12, now(), 'boe' || count || 'king' || count);
    end loop;
END;
$$ LANGUAGE plpgsql;


I already changed the following config parameters:
work_mem 512MB
synchronous_commit off
shared_buffers 512mb
commit_delay 100000
autovacuum_naptime 10min

Postgres version is 9.6.3 on Ubuntu 17.04 64 bit, on a i7-4790K with 16GB memory and an Intel 750 SSD. JDBC driver is postgresql-42.1.1.

(btw: the actual load I'm trying to improve will load more than 132 million rows, and will grow).

Any help is greatly appreciated!

Regards,

Frits

pgsql-performance by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: [PERFORM] index of only not null, use function index?
Next
From: "Kevin.Hughes@uk.fujitsu.com"
Date:
Subject: Re: [PERFORM] Client Server performance & UDS